Below is a list of all the commands I have had to use while working with Hadoop. If you know of any that are not listed here, or have updates to the ones below, please feel free to add them in the comments.
Move Files:
hadoop fs -mv /OLD_DIR/* /NEW_DIR/
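hadoop fs expands globs against HDFS itself, so you can move just a subset of files; for example, only the .log files (the extension here is hypothetical):
hadoop fs -mv /OLD_DIR/*.log /NEW_DIR/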
Sort files by size. Note this is for viewing information on the terminal only; it has no effect on the files or on how they are displayed via the web UI:
hdfs fsck /logs/ -files | grep "/FILE_DIR/" | grep -v "<dir>" | gawk '{print $2, $1;}' | sort -n
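If all you need is file sizes, a simpler sketch that I believe gives the same ordering is du piped through sort (du prints size then path):
hadoop fs -du hdfs:///FILE_DIR/ | sort -n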
Display file status information for a directory:
hdfs fsck /FILE_DIR/ -files
Remove folder with all files in it:
hadoop fs -rm -R hdfs:///DIR_TO_REMOVE
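Deleted files normally go to the trash if it is enabled; to bypass it and free the space immediately, add -skipTrash:
hadoop fs -rm -R -skipTrash hdfs:///DIR_TO_REMOVE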
Make folder:
hadoop fs -mkdir hdfs:///NEW_DIR
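Like the Linux mkdir -p, the -p flag creates any missing parent directories along the way:
hadoop fs -mkdir -p hdfs:///NEW_DIR/SUB_DIR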
Remove one file:
hadoop fs -rm hdfs:///DIR/FILENAME.EXTENSION
Copy all files from a local directory (outside of HDFS) into HDFS:
hadoop fs -copyFromLocal LOCAL_DIR hdfs:///DIR
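As far as I know, -put does the same thing for local sources and is what I see used more often; an equivalent form:
hadoop fs -put LOCAL_DIR hdfs:///DIR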
Copy files from HDFS to a local directory:
hadoop fs -copyToLocal hdfs:///DIR/REGPATTERN LOCAL_DIR
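-get works the same way; for example, pulling only the files that match a glob (the *.log pattern is just an example):
hadoop fs -get hdfs:///DIR/*.log LOCAL_DIR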
Kill a running MR job:
hadoop job -kill job_1461090210469_0003
You can also do this via the web UI on port 8088.
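If you do not know the job ID, list the running jobs first and copy it from there:
hadoop job -list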
Kill yarn application:
yarn application -kill application_1461778722971_0001
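As with MR jobs, you can list the running applications to find the ID:
yarn application -list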
Check the status of the DATANODES. Look at the "Under replicated blocks" field; if you have any, you should probably rebalance:
hdfs dfsadmin -report
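To pull out just that field, grep the report (the exact label may vary slightly between versions):
hdfs dfsadmin -report | grep "Under replicated blocks"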
Number of files in HDFS directory:
hadoop fs -count -q hdfs:///DIR
-q is optional; with it, the output columns are QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, FILE_NAME
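Without -q you get only the last four of those columns; for example:
hadoop fs -count hdfs:///DIR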
Rename directory:
hadoop fs -mv hdfs:///OLD_NAME hdfs:///NEW_NAME
Change replication factor on files:
hadoop fs -setrep -R 3 hdfs:///DIR
Here 3 is the replication factor. You can also target a single file instead of a directory, as in the example below.
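For a single file you can drop -R, and -w makes the command wait until the re-replication actually completes:
hadoop fs -setrep -w 3 hdfs:///DIR/FILENAME.EXTENSION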
Get YARN logs for an application. You can also view them via the web UI on port 8088:
yarn logs -applicationId application_1462141864581_0016
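The logs can get long, so I find it easier to dump them to a file first:
yarn logs -applicationId application_1462141864581_0016 > application.log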
Refresh Nodes:
hdfs dfsadmin -refreshNodes
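The usual reason to run this is decommissioning a datanode. A rough sketch of that flow, assuming dfs.hosts.exclude already points at the exclude file (the path and hostname below are examples):
echo "BAD_DATANODE_HOSTNAME" >> /etc/hadoop/conf/dfs.exclude
hdfs dfsadmin -refreshNodes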
Report of blocks and their locations:
hdfs fsck / -files -blocks -locations
Find out where a particular file is located with blocks:
hdfs fsck /DIR/FILENAME -files -locations -blocks
Fix under-replicated blocks. The first command collects the files with under-replicated blocks; the second sets replication to 2 for those files. You might have to restart the DFS to see a change from dfsadmin -report:
hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files
for hdfsfile in `cat /tmp/under_replicated_files`; do echo "Fixing $hdfsfile :"; hadoop fs -setrep 2 $hdfsfile; done
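A slightly more defensive version of the same loop, reading the list line by line so paths with odd characters survive (same replication factor of 2 as above):
while read -r hdfsfile; do
  echo "Fixing $hdfsfile"
  hadoop fs -setrep 2 "$hdfsfile"
done < /tmp/under_replicated_files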
Show all the classpaths associated to hadoop:
hadoop classpath
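One handy use is compiling a job against the Hadoop jars without hard-coding any paths (MyJob.java is a hypothetical source file):
javac -classpath $(hadoop classpath) MyJob.java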