Hadoop: Rack Awareness

To make a multi-node cluster rack aware, you need to do a few things. Perform the following steps on the master (namenode) only.

    nano /home/myuser/rack.sh

Give it the following contents:

    #!/bin/bash

    # Adjust/add the property "net.topology.script.file.name"
    # in core-site.xml with the absolute path to this
    # file. ENSURE the file is executable.

    # Supply an appropriate rack prefix
    RACK_PREFIX=myrackprefix

    # To test, supply a hostname as script input:
    if [ $# -gt 0 ]; then

        CTL_FILE=${CTL_FILE:-"rack.data"}
        HADOOP_CONF=${HADOOP_CONF:-"/home/myuser"}

        # No data file: fall back to the default rack
        if [ ! -f "${HADOOP_CONF}/${CTL_FILE}" ]; then
            echo -n "/$RACK_PREFIX/rack "
            exit 0
        fi

        # Resolve each host argument in turn
        while [ $# -gt 0 ]; do
            nodeArg=$1
            result=""
            # Each line of the data file is "host rack"; the last match wins
            while read -r host rack _; do
                if [ "$host" = "$nodeArg" ]; then
                    result="$rack"
                fi
            done < "${HADOOP_CONF}/${CTL_FILE}"
            shift
            if [ -z "$result" ]; then
                echo -n "/$RACK_PREFIX/rack "
            else
                echo -n "/$RACK_PREFIX/rack_$result "
            fi
        done

    else
        echo -n "/$RACK_PREFIX/rack "
    fi
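
Before pointing Hadoop at the script, it can help to check the lookup on its own. The following is a minimal sketch of the same host-to-rack lookup, wrapped in a hypothetical `lookup_rack` function and run against a throwaway data file. The function name, the temporary file, and the 10.0.0.x addresses are illustrative assumptions, not part of the script above.

```shell
#!/bin/bash
# Stand-alone sketch of the lookup logic; all names here are illustrative.
RACK_PREFIX=myrackprefix

# lookup_rack DATA_FILE HOST: print the rack path for HOST,
# or the default rack when HOST is not listed in DATA_FILE.
lookup_rack() {
    local data_file=$1 host=$2 result="" name rack
    while read -r name rack _; do
        if [ "$name" = "$host" ]; then
            result=$rack
        fi
    done < "$data_file"
    if [ -z "$result" ]; then
        printf '/%s/rack\n' "$RACK_PREFIX"
    else
        printf '/%s/rack_%s\n' "$RACK_PREFIX" "$result"
    fi
}

# Build a throwaway data file with made-up addresses.
tmp_data=$(mktemp)
printf '%s\n' '10.0.0.1 1' '10.0.0.2 2' > "$tmp_data"

lookup_rack "$tmp_data" 10.0.0.2   # listed host
lookup_rack "$tmp_data" 10.0.0.9   # unlisted host, gets the default rack
```

With the sample data, the first call prints /myrackprefix/rack_2 and the second prints /myrackprefix/rack. The real script can be exercised the same way by passing one or more hosts as arguments.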

Set execute permissions:

    sudo chmod 755 /home/myuser/rack.sh

Create the data file that holds your rack information. By default the script looks for it at /home/myuser/rack.data. Put one host and its rack number on each line, separated by a single space.

    namenode_ip 1
    secondarynode_ip 2
    datanode1_ip 1
    datanode2_ip 2
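
Since a malformed line silently falls back to the default rack, a quick format check can save some debugging. The following is a rough sketch, assuming a hypothetical helper named `check_rack_data`. It only verifies that each non-empty line has exactly two whitespace-separated fields, and it does not check that the hosts exist. The sample addresses are made up.

```shell
#!/bin/bash
# check_rack_data FILE: succeed only if every non-empty line has exactly
# two whitespace-separated fields (host, then rack). Hypothetical helper.
check_rack_data() {
    awk 'NF > 0 && NF != 2 {
             print "line " NR ": expected \"host rack\", got: " $0
             bad = 1
         }
         END { exit (bad ? 1 : 0) }' "$1"
}

# Throwaway sample with made-up addresses; the third line is malformed.
sample=$(mktemp)
printf '%s\n' '10.0.0.1 1' '10.0.0.2 2' '10.0.0.3' > "$sample"

if check_rack_data "$sample"; then
    echo "rack data looks well formed"
else
    echo "rack data needs fixing"
fi
```

For the sample above, the check reports line 3 and then prints "rack data needs fixing". Pointing the function at your own data file works the same way.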

The last step is to update the core-site.xml file located in your Hadoop configuration directory.

    nano /usr/local/hadoop/etc/hadoop/core-site.xml

Add the following property inside the <configuration> element, with the value set to the location of your rack.sh file.

    <property>
      <name>net.topology.script.file.name</name>
      <value>/home/myuser/rack.sh</value>
    </property>
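
To confirm that the cluster picked up the setting, the stock `hdfs getconf -confKey` and `hdfs dfsadmin -printTopology` commands can be used. The wrapper below is only a sketch: the function name `show_topology` is hypothetical, and it assumes the Hadoop binaries are on your PATH and that the namenode has been restarted since the change.

```shell
#!/bin/bash
# show_topology: print the configured topology script path and the rack
# layout the namenode reports. Hypothetical wrapper around stock hdfs commands.
show_topology() {
    if ! command -v hdfs >/dev/null 2>&1; then
        echo "hdfs not found on PATH" >&2
        return 127
    fi
    # Should print the path set in core-site.xml, e.g. /home/myuser/rack.sh
    hdfs getconf -confKey net.topology.script.file.name &&
        # Lists each rack with the datanodes assigned to it
        hdfs dfsadmin -printTopology
}
```

Run `show_topology` on the master. If every datanode still appears under the default rack, recheck the data file and the execute permission on the script.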
