If you want your multi-node cluster to be rack aware, you need to do a few things. All of the following steps are done on the master (namenode) only. Start by creating the topology script:
nano /home/myuser/rack.sh
Give it the following contents:
#!/bin/bash
# Adjust/Add the property "net.topology.script.file.name"
# to core-site.xml with the "absolute" path to this
# file. ENSURE the file is "executable".

# Supply appropriate rack prefix
RACK_PREFIX=myrackprefix

# To test, supply a hostname as script input:
if [ $# -gt 0 ]; then

  CTL_FILE=${CTL_FILE:-"rack.data"}
  HADOOP_CONF=${HADOOP_CONF:-"/home/myuser"}

  if [ ! -f ${HADOOP_CONF}/${CTL_FILE} ]; then
    echo -n "/$RACK_PREFIX/rack "
    exit 0
  fi

  while [ $# -gt 0 ] ; do
    nodeArg=$1
    exec< ${HADOOP_CONF}/${CTL_FILE}
    result=""
    while read line ; do
      ar=( $line )
      if [ "${ar[0]}" = "$nodeArg" ] ; then
        result="${ar[1]}"
      fi
    done
    shift
    if [ -z "$result" ] ; then
      echo -n "/$RACK_PREFIX/rack "
    else
      echo -n "/$RACK_PREFIX/rack_$result "
    fi
  done

else
  echo -n "/$RACK_PREFIX/rack "
fi
Set execute permissions
sudo chmod 755 /home/myuser/rack.sh
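The comment in the script says you can test it by passing a hostname as input. Here is a minimal sketch of that, wrapped in a small helper so you can check several hosts at once. The function name show_racks is my own and not part of Hadoop. Until rack.data exists, the script maps every host to the default rack, /myrackprefix/rack.

```shell
#!/bin/bash
# Hypothetical helper (not part of Hadoop or the original setup):
# runs a topology script once per host and prints "host -> rack".
show_racks() {
    local script=$1
    shift
    local host rack
    for host in "$@"; do
        # The topology script prints the rack path followed by one space.
        rack=$("$script" "$host")
        # Drop that trailing space so the output is easier to read.
        printf '%s -> %s\n' "$host" "${rack% }"
    done
}
```

For example: show_racks /home/myuser/rack.sh datanode1_ip datanode2_ip. The script also honors the HADOOP_CONF and CTL_FILE environment variables, so you can point it at a test copy of the data file without editing it.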
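Create the data file that holds your rack information. The script above looks for it at /home/myuser/rack.data by default. Put one host per line, followed by a single space and the rack number, and be careful not to put extra spaces between the host and the rack.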
namenode_ip 1
secondarynode_ip 2
datanode1_ip 1
datanode2_ip 2
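Since the format of this file matters, a quick sanity check can save some debugging. The sketch below assumes the file should have exactly two whitespace-separated fields on every non-empty line (host, then rack). The function name check_rack_data is hypothetical, not something Hadoop provides.

```shell
#!/bin/bash
# Hypothetical helper (not part of Hadoop): checks that every non-empty
# line of a rack data file has exactly two whitespace-separated fields.
check_rack_data() {
    local file=$1
    if [ ! -f "$file" ]; then
        echo "missing file: $file" >&2
        return 2
    fi
    # awk splits each line on runs of whitespace; NF is the field count.
    # Report every offending line, then exit non-zero if any were found.
    awk 'NF > 0 && NF != 2 {
             printf "line %d: expected 2 fields, found %d\n", NR, NF
             bad = 1
         }
         END { exit (bad ? 1 : 0) }' "$file"
}
```

Run it as check_rack_data /home/myuser/rack.data. It prints nothing and returns 0 when the file looks fine.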
The last step is to update the core-site.xml file located in your Hadoop configuration directory.
nano /usr/local/hadoop/etc/hadoop/core-site.xml
Add the following property, setting the value to the absolute path of your rack.sh file.
<property>
  <name>net.topology.script.file.name</name>
  <value>/home/myuser/rack.sh</value>
</property>
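To double-check the wiring, you can read the configured path back out of core-site.xml and confirm it points at an executable file. This is only a rough sketch: it pulls the value out with tr and sed rather than a real XML parser, so it assumes the property is written in the simple name-then-value form used in this post. Both function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers (not part of Hadoop). Assumes the property is written
# as <name>...</name> immediately followed by <value>...</value>.

# Print the value of net.topology.script.file.name from a core-site.xml file.
topology_script_path() {
    local conf=$1
    # Flatten the file to one line, then capture the <value> after the name.
    tr -d '\n' < "$conf" |
        sed -n 's|.*<name>net\.topology\.script\.file\.name</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p'
}

# Print the configured path if it is an executable file, otherwise fail.
check_topology_script() {
    local conf=$1 script
    script=$(topology_script_path "$conf")
    if [ -z "$script" ]; then
        echo "net.topology.script.file.name is not set in $conf" >&2
        return 1
    fi
    if [ ! -f "$script" ] || [ ! -x "$script" ]; then
        echo "$script is missing or not executable" >&2
        return 1
    fi
    echo "$script"
}
```

Run it as check_topology_script /usr/local/hadoop/etc/hadoop/core-site.xml. Once the namenode has been restarted with the new configuration, hdfs dfsadmin -printTopology should show the racks as the namenode sees them.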