Hadoop: Commands

Below is a list of all the commands I have had to use while working with Hadoop. If you have any others that are not listed here, or updates to the ones below, please feel free to add them.

Move Files:

 hadoop fs -mv /OLD_DIR/* /NEW_DIR/

Sort files by size. Note that this is for viewing information on the terminal only; it has no effect on the files or on how they are displayed via the web UI:

 hdfs fsck /logs/ -files | grep "/FILE_DIR/" | grep -v "<dir>" | gawk '{print $2, $1;}' | sort -n
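
Each -files output line begins with the path followed by its size in bytes, so gawk swaps them and sort -n orders the result numerically by size. To see only the five largest files, for example, you can append tail:

 hdfs fsck /logs/ -files | grep "/FILE_DIR/" | grep -v "<dir>" | gawk '{print $2, $1;}' | sort -n | tail -5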

Display file and block health information for a directory:

 hdfs fsck /FILE_DIR/ -files

Remove a folder and all the files in it:

 hadoop fs -rm -R hdfs:///DIR_TO_REMOVE
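
If HDFS trash is enabled, -rm only moves files to the trash directory; -skipTrash deletes them immediately:

 hadoop fs -rm -R -skipTrash hdfs:///DIR_TO_REMOVE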

Make folder:

 hadoop fs -mkdir hdfs:///NEW_DIR
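
To create nested directories in one step, -p creates any missing parents (like the Unix mkdir -p):

 hadoop fs -mkdir -p hdfs:///NEW_DIR/SUB_DIR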

Remove one file:

 hadoop fs -rm hdfs:///DIR/FILENAME.EXTENSION

Copy all files from a local directory (outside HDFS) into HDFS:

 hadoop fs -copyFromLocal LOCAL_DIR hdfs:///DIR
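
hadoop fs -put does the same thing for local sources; for example (the paths here are hypothetical):

 hadoop fs -put /var/log/myapp hdfs:///logs/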

Copy files from HDFS to local directory:

 hadoop fs -copyToLocal hdfs:///DIR/REGPATTERN LOCAL_DIR
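
hadoop fs -get is the equivalent shorthand in this direction:

 hadoop fs -get hdfs:///DIR/REGPATTERN LOCAL_DIR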

Kill a running MR job:

 hadoop job -kill job_1461090210469_0003

You can also do this via the web UI on port 8088.
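
To look up the job ID in the first place, list the running jobs:

 hadoop job -list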

Kill a YARN application:

 yarn application -kill application_1461778722971_0001
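
To find the application ID, list applications; -appStates filters by state:

 yarn application -list -appStates RUNNING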

Check the status of the datanodes. Look at the "Under Replicated blocks" field; if it is non-zero you should probably rebalance:

 hdfs dfsadmin -report
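
To rebalance, run the balancer; -threshold sets the allowed spread of disk usage across datanodes in percent (10 is the default):

 hdfs balancer -threshold 10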

Count the number of files in an HDFS directory:

 hadoop fs -count -q hdfs:///DIR

-q is optional; it adds the columns QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA ahead of DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME.
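
Without -q you get just the last four columns:

 hadoop fs -count hdfs:///DIR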

Rename directory:

 hadoop fs -mv hdfs:///OLD_NAME hdfs:///NEW_NAME

Change replication factor on files:

 hadoop fs -setrep -R 3 hdfs:///DIR

Here 3 is the target replication factor, and -R applies it recursively. You can also set it on a single file instead of a whole directory, as shown below.
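
For example, to set the replication factor of a single file (the filename is hypothetical); adding -w makes the command wait until the new replication is actually reached:

 hadoop fs -setrep -w 3 hdfs:///DIR/FILENAME.EXTENSION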

Get the YARN logs for an application. You can also view them via the web UI on port 8088:

 yarn logs -applicationId application_1462141864581_0016
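
To keep the output for later searching, redirect it to a file (the filename here is hypothetical):

 yarn logs -applicationId application_1462141864581_0016 > application.log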

Refresh the set of datanodes (re-reads the include/exclude host files):

 hdfs dfsadmin -refreshNodes

Report of blocks and their locations:

 hdfs fsck / -files -blocks -locations

Find out where a particular file's blocks are located:

 hdfs fsck /DIR/FILENAME -files -locations -blocks

Fix under-replicated blocks. The first command collects the paths of the under-replicated files; the second sets replication to 2 for each of them. You might have to restart the DFS before the change shows up in dfsadmin -report:

 hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files

 for hdfsfile in `cat /tmp/under_replicated_files`; do echo "Fixing $hdfsfile :"; hadoop fs -setrep 2 "$hdfsfile"; done
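
To verify the fix, re-run the check and count the remaining under-replicated blocks; the number should drop to zero once re-replication finishes:

 hdfs fsck / | grep 'Under replicated' | wc -l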

Show all the classpath entries associated with Hadoop:

 hadoop classpath
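
On newer releases (2.6+), --glob expands the wildcard entries into the actual jar paths:

 hadoop classpath --glob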