Hadoop: Rack Awareness

If you want your multi-node cluster to be rack aware, you need to do a few things. The following is done on the master (namenode) only. Start by creating the script:

nano /home/myuser/rack.sh

With the following contents:

#!/bin/bash

# Adjust/Add the property "net.topology.script.file.name"
# to core-site.xml with the "absolute" path to this
# file. ENSURE the file is "executable".

# Supply appropriate rack prefix
RACK_PREFIX=myrackprefix

# To test, supply a hostname as script input:
if [ $# -gt 0 ]; then

  CTL_FILE=${CTL_FILE:-"rack.data"}

  HADOOP_CONF=${HADOOP_CONF:-"/home/myuser"}

  if [ ! -f "${HADOOP_CONF}/${CTL_FILE}" ]; then
    echo -n "/$RACK_PREFIX/rack "
    exit 0
  fi

  while [ $# -gt 0 ] ; do
    nodeArg=$1
    exec< "${HADOOP_CONF}/${CTL_FILE}"
    result=""
    while read line ; do
      ar=( $line )
      if [ "${ar[0]}" = "$nodeArg" ] ; then
        result="${ar[1]}"
      fi
    done
    shift
    if [ -z "$result" ] ; then
      echo -n "/$RACK_PREFIX/rack "
    else
      echo -n "/$RACK_PREFIX/rack_$result "
    fi
  done

else
  echo -n "/$RACK_PREFIX/rack "
fi

Set execute permissions

sudo chmod 755 /home/myuser/rack.sh
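
The script's own comment suggests testing it by passing a hostname as input. A minimal sketch of such a check follows; try_rack_script is a hypothetical helper name, and datanode1_ip is a placeholder for a real entry. Until the data file from the next step exists, the script falls back to the default rack, /myrackprefix/rack.

```shell
#!/bin/bash
# Sketch: run a topology script by hand, as the script's comment suggests.
# try_rack_script is a hypothetical helper, not part of Hadoop.
try_rack_script() {
    local script=$1
    shift
    if [ ! -x "$script" ]; then
        echo "not executable: $script" >&2
        return 1
    fi
    # The script prints its result without a trailing newline,
    # so add one to keep the terminal readable.
    "$script" "$@"
    echo
}

# Path taken from the steps above; datanode1_ip is a placeholder host.
try_rack_script /home/myuser/rack.sh datanode1_ip || true
```

If the helper reports that the file is not executable, repeat the chmod step before going further.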

Create the data file that has your rack information. With the defaults in the script above, this is /home/myuser/rack.data. Each line holds a host followed by its rack number; keep a single space between the host and the rack.

namenode_ip 1
secondarynode_ip 2
datanode1_ip 1
datanode2_ip 2
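
Since the layout of this file matters, it may help to check it before pointing Hadoop at it. The following is a sketch only: check_rack_data is a hypothetical helper name, and the sample file is a throwaway created just for the demonstration. It flags any non-empty line that does not have exactly two whitespace-separated fields.

```shell
#!/bin/bash
# Sketch: flag lines in a rack data file that do not have exactly
# two whitespace-separated fields (host, rack).
# check_rack_data is a hypothetical helper, not part of Hadoop.
check_rack_data() {
    awk 'NF > 0 && NF != 2 {
             printf "line %d: expected 2 fields, got %d\n", NR, NF
             bad = 1
         }
         END { exit bad + 0 }' "$1"
}

# Demonstration against a temporary file with one malformed line.
tmp=$(mktemp)
printf '%s\n' 'namenode_ip 1' 'datanode1_ip' 'datanode2_ip 2' > "$tmp"
check_rack_data "$tmp" || echo "rack data needs fixing"
rm -f "$tmp"
```

To check the real file, call check_rack_data /home/myuser/rack.data instead of using the temporary file.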

The last step is to update the core-site.xml file located in your hadoop directory.

nano /usr/local/hadoop/etc/hadoop/core-site.xml

Add the following property inside the <configuration> element, with the value set to the absolute path of your rack.sh file.

  <property>
    <name>net.topology.script.file.name</name>
    <value>/home/myuser/rack.sh</value>
  </property>
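
Once the property is in place, one way to confirm the result is to ask the NameNode for its view of the topology with hdfs dfsadmin -printTopology. The sketch below assumes the hdfs command is on your PATH and that the NameNode has been restarted since core-site.xml was edited (the setting is typically only read at startup); show_topology is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: print the rack assignments the NameNode currently reports.
# Assumes hdfs is on PATH and the NameNode was restarted after the
# core-site.xml change. show_topology is a hypothetical helper.
show_topology() {
    local hdfs_cmd=${1:-hdfs}
    if ! command -v "$hdfs_cmd" >/dev/null 2>&1; then
        echo "$hdfs_cmd not found on PATH" >&2
        return 1
    fi
    "$hdfs_cmd" dfsadmin -printTopology
}

show_topology || true
```

With the sample data file above, the datanodes should appear grouped under racks named with the /myrackprefix/rack_ pattern rather than under a single default rack.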