This tutorial guides you through installing Hadoop on Hortonworks using a multi-node cluster setup with Ubuntu OS.
Hosts File:
Ensure the hosts file on every server lists the FQDN of every server that will be in the cluster.
sudo nano /etc/hosts
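A quick way to confirm the hosts file is complete is to check it against the list of servers you plan to use. The sketch below is just one way to do that; the function name missing_hosts and the example.com host names are made up for illustration.

```shell
# Print every FQDN from the arguments that has no entry in the given hosts file.
# Usage (hypothetical names): missing_hosts /etc/hosts ambari.example.com namenode.example.com
missing_hosts() {
    hosts_file="$1"
    shift
    for fqdn in "$@"; do
        # Look for the name as a whole field after the IP address,
        # ignoring comment lines and trailing comments.
        if ! awk -v name="$fqdn" '
            /^[[:space:]]*#/ { next }
            {
                for (i = 2; i <= NF; i++) {
                    if ($i ~ /^#/) break
                    if ($i == name) found = 1
                }
            }
            END { exit (found ? 0 : 1) }
        ' "$hosts_file"; then
            echo "$fqdn"
        fi
    done
}
```

If the function prints nothing, every name you passed in already has an entry.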
SSH (Ambari Server)
Run the following on the Ambari Server, repeating the ssh-copy-id and ssh steps for every server in the Hadoop cluster.
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
ssh-copy-id -i ~/.ssh/id_rsa.pub ##USER##@##FQDN##
ssh ##USER##@##FQDN##
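With more than a couple of servers, typing ssh-copy-id for each one gets tedious. Here is a minimal sketch that loops over a file with one FQDN per line. The function name copy_key_to_hosts, the DRY_RUN variable and the host list file are my own assumptions, not part of Ambari or OpenSSH.

```shell
# Copy the public key to every host listed in a file (one FQDN per line).
# Set DRY_RUN=1 to print the commands instead of running them.
copy_key_to_hosts() {
    user="$1"
    host_list="$2"
    # Read the list on file descriptor 3 so ssh-copy-id keeps the terminal for prompts.
    while IFS= read -r fqdn <&3; do
        # Skip blank lines and comment lines.
        case "$fqdn" in
            ''|'#'*) continue ;;
        esac
        ${DRY_RUN:+echo} ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "$user@$fqdn"
    done 3< "$host_list"
}
```

Running it with DRY_RUN=1 first lets you check the generated commands before any key is copied.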
Pre-Requisites: (not Ambari Server)
Java 8:
sudo apt-get install openjdk-8-jdk
Chrony:
sudo apt-get install chrony
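After the installs, it is worth confirming that the tools actually ended up on the PATH of each server. This is a small sketch; the function name missing_commands is hypothetical, and I am assuming java and chronyc are the commands you want to look for.

```shell
# Print each command name from the arguments that cannot be found on the PATH.
# Usage: missing_commands java chronyc
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}
```

No output means everything you asked about was found.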
Disable Transparent Huge Pages:
sudo su
echo never > /sys/kernel/mm/transparent_hugepage/enabled
exit
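To check that the change took effect, you can read the same file back. The kernel marks the active value with square brackets, for example "always madvise [never]". The sketch below pulls out the bracketed value; the function name thp_state is my own.

```shell
# Print the active Transparent Huge Pages setting (the value in square brackets).
# An optional argument overrides the file to read, which is handy for testing.
thp_state() {
    thp_file="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$thp_file"
}
```

On a server where the step above worked, thp_state should print never.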
Install HDFS Service:
You will need to log in to Ambari Server and click “Launch Install Wizard”. For the most part you will just follow the prompts. The major hurdle is in the “Install Options” section: make sure you enter each host by its FQDN (e.g. host.domain.com). You will also need the SSH private key that you generated on the Ambari Server during the SSH step, located at /home/##USER##/.ssh/id_rsa. Make sure you also set the SSH User Account to the user you used when creating the key. If a step fails for any reason, you can click its status to find out what failed and rectify the problem. As long as you did the pre-requisites you should be fine.
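Since the wizard wants fully qualified names, a rough sanity check before pasting your host list can save a failed attempt. This is only a sketch with a deliberately simple pattern (dot-separated labels ending in a letters-only top-level domain); the function name is_fqdn is hypothetical.

```shell
# Succeed if the argument looks like a fully qualified domain name.
# The pattern is an approximation, not a complete validation of DNS names.
is_fqdn() {
    printf '%s\n' "$1" |
        grep -Eq '^([A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?\.)+[A-Za-z]{2,}$'
}
```

For example, is_fqdn host.domain.com succeeds, while a bare short name such as host does not.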
ZooKeeper / Ambari Metrics
As you install HDFS you will notice that Ambari Metrics and ZooKeeper get installed automatically. This is a good thing and you want it. ZooKeeper keeps all configs in sync and Ambari Metrics lets you easily monitor the system.
Assign Masters
You will need to set up how you want your masters to look. I usually have three ZooKeeper servers. Your Secondary NameNode should go on a separate server from the NameNode. But it is totally up to you how you design your cluster. Have fun!
Assign Slaves / Clients
For slaves (aka DataNodes), I don’t put any on my NameNode, Secondary NameNode or ZooKeeper servers. I leave my DataNodes to perform that role alone. I also install clients on the NameNode, Secondary NameNode and all DataNodes. It is up to you how you configure it; just have fun while doing it!
Key Config Optional Changes
Once you get to the “Customize Services” section, you can for the most part leave everything as is and just fill in the password areas. But I do recommend reviewing the following and updating as needed.
- HDFS: NameNode/DataNode/Secondary NameNode directories
- ZooKeeper: ZooKeeper directory
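When reviewing these directories, it helps to know how much free space sits behind each path you are considering. A minimal sketch using df follows; the function name avail_kb is my own, and the path in the usage comment is only an example of a directory you might pick.

```shell
# Print the free space, in kilobytes, of the filesystem that holds the given path.
# Usage (example path, adjust to your layout): avail_kb /hadoop/hdfs/data
avail_kb() {
    # -P forces the portable one-line-per-filesystem format, -k reports kilobytes.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

The path must already exist, so run it against the mount point if the directory has not been created yet.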
Deploy
Deploy should work with no issues. If there are issues, sometimes you don’t need to worry about them, such as a connection issue. As long as the service installed, a service that didn’t start right away because of a connection issue may start once the deploy completes. You should also note that Ambari Metrics shows errors directly after starting. That is expected and nothing to worry about; it will clear itself.
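Because a service can come up a little late, a single failed check right after deploy does not tell you much. The sketch below retries a command a few times before giving up. The function name retry is my own, and using hdfs dfsadmin -report as the health check is an assumption about what you want to verify.

```shell
# Run a command up to N times, sleeping between attempts, until it succeeds.
# Usage: retry 5 30 hdfs dfsadmin -report
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Stop as soon as the command succeeds.
        if "$@"; then
            return 0
        fi
        # Only sleep when another attempt is still to come.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

If the command still fails after the last attempt, retry returns a non-zero status and it is time to look at the status in Ambari.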