Sqoop2: Kerberize Installation

In this tutorial I will show you how to kerberize a Sqoop installation. Before you begin, ensure you have installed Sqoop.

This assumes your hostname is “hadoop”.

Create Kerberos Principals

cd /etc/security/keytabs
sudo kadmin.local
addprinc -randkey sqoop/hadoop@REALM.CA
xst -kt sqoop.service.keytab sqoop/hadoop@REALM.CA
addprinc -randkey sqoopHTTP/hadoop@REALM.CA
xst -kt sqoopHTTP.service.keytab sqoopHTTP/hadoop@REALM.CA
q
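
The interactive session above can also be scripted. The following is a minimal sketch that assumes the same principal and keytab naming as above. The helper name kadmin_queries and the SQOOP_HOST / SQOOP_REALM variables are hypothetical, and the loop only prints the queries so that you can review them first.

```shell
# Hypothetical helper: print the kadmin queries used above for one service.
# Arguments: service name, hostname, Kerberos realm.
kadmin_queries() {
    principal="$1/$2@$3"
    printf 'addprinc -randkey %s\n' "$principal"
    printf 'xst -kt %s.service.keytab %s\n' "$1" "$principal"
}

# Assumed values; replace with your own hostname and realm.
SQOOP_HOST="${SQOOP_HOST:-hadoop}"
SQOOP_REALM="${SQOOP_REALM:-REALM.CA}"

# Print the queries for both service principals created in this tutorial.
for service in sqoop sqoopHTTP; do
    kadmin_queries "$service" "$SQOOP_HOST" "$SQOOP_REALM"
done
```

Each printed line can then be passed to sudo kadmin.local -q "<query>" from /etc/security/keytabs, which runs a single query without opening the interactive prompt.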

Set Keytab Permissions/Ownership

sudo chown root:hadoopuser /etc/security/keytabs/*
sudo chmod 750 /etc/security/keytabs/*
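
To confirm that the mode took effect, something like the following could be run afterwards. This is a sketch that assumes GNU stat (as shipped on Ubuntu); check_mode is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if FILE has exactly the octal MODE.
# Relies on GNU stat's -c option (coreutils), so it is Linux-specific.
check_mode() {
    actual="$(stat -c '%a' "$1")" || return 2
    if [ "$actual" = "$2" ]; then
        echo "OK   $1 ($actual)"
    else
        echo "FAIL $1 (mode $actual, expected $2)"
        return 1
    fi
}

# Check every keytab in the directory used above; a no-op if none exist.
for keytab in /etc/security/keytabs/*.keytab; do
    [ -e "$keytab" ] || continue
    check_mode "$keytab" 750
done
```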

Configuration

Configure Kerberos with Sqoop

cd /usr/local/sqoop/conf/
nano sqoop.properties

#uncomment the following
org.apache.sqoop.security.authentication.type=KERBEROS
org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.KerberosAuthenticationHandler

#update to the following
org.apache.sqoop.security.authentication.kerberos.principal=sqoop/hadoop@REALM.CA
org.apache.sqoop.security.authentication.kerberos.keytab=/etc/security/keytabs/sqoop.service.keytab
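
The same edits can be applied without opening an editor. Below is a minimal sketch, assuming the properties appear in the file as key=value lines that may be commented out with #. The helper set_property is hypothetical, and the demonstration runs against a scratch file, not the real sqoop.properties.

```shell
# Hypothetical helper: set KEY=VALUE in a properties FILE.
# Any existing line for KEY (commented out or not) is dropped and the new
# line is appended; order does not matter in a Java properties file.
set_property() (
    file="$1"; key="$2"; value="$3"
    tmp="$(mktemp)" || exit 1
    awk -v k="$key" '
        {
            line = $0
            sub(/^[ \t]*#?[ \t]*/, "", line)   # strip a leading "#" if present
            if (index(line, k "=") == 1) next  # skip old entries for this key
            print
        }' "$file" > "$tmp" || exit 1
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file"   # rewrite in place, keeping the file's owner and mode
    rm -f "$tmp"
)

# Demonstration on a scratch file that mimics the commented-out defaults.
demo="$(mktemp)"
cat > "$demo" <<'EOF'
# Sqoop security settings
#org.apache.sqoop.security.authentication.type=KERBEROS
org.apache.sqoop.security.authentication.kerberos.keytab=/old/path
EOF

set_property "$demo" org.apache.sqoop.security.authentication.type KERBEROS
set_property "$demo" org.apache.sqoop.security.authentication.kerberos.keytab \
    /etc/security/keytabs/sqoop.service.keytab
cat "$demo"
```

To apply it for real, point set_property at /usr/local/sqoop/conf/sqoop.properties, ideally after making a backup copy of the file.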

Sqoop2: Installation

We are going to install Sqoop. Ensure you have Hadoop installed already.

This assumes your hostname is “hadoop”.

Install Java JDK

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install default-jdk

Download Sqoop:

wget https://archive.apache.org/dist/sqoop/1.99.7/sqoop-1.99.7-bin-hadoop200.tar.gz
tar -zxvf sqoop-1.99.7-bin-hadoop200.tar.gz
sudo mv sqoop-1.99.7-bin-hadoop200 /usr/local/sqoop/
sudo chown -R root:hadoopuser /usr/local/sqoop/

Setup .bashrc:

nano ~/.bashrc

Add the following to the end of the file.

#SQOOP VARIABLES START
export SQOOP_HOME=/usr/local/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
export SQOOP_CONF_DIR=$SQOOP_HOME/conf
export SQOOP_CLASS_PATH=$SQOOP_CONF_DIR
#SQOOP VARIABLES STOP

source ~/.bashrc
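
A quick check that the new variables are visible in the current shell can save confusion later. The sketch below assumes the variable names and install location used above; path_contains is a hypothetical helper.

```shell
# Hypothetical helper: succeed if DIR is one of the colon-separated
# entries in the PATH-style string LIST.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Fall back to the install location used in this tutorial if unset.
sqoop_home="${SQOOP_HOME:-/usr/local/sqoop}"

if path_contains "$sqoop_home/bin" "$PATH"; then
    echo "PATH includes $sqoop_home/bin"
else
    echo "PATH is missing $sqoop_home/bin; re-run: source ~/.bashrc"
fi
```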

Initialise Repository

cd /usr/local/sqoop
./bin/sqoop2-tool upgrade

Modify sqoop.sh

If you are running Hadoop on the same server as the Sqoop server, you will need to modify sqoop.sh, because Sqoop needs to be pointed at the lib directories for common, hdfs, mapreduce and yarn.

nano /usr/local/sqoop/bin/sqoop.sh

#Modify these lines
  HADOOP_COMMON_HOME=${HADOOP_COMMON_HOME:-${HADOOP_HOME}/share/hadoop/common}
  HADOOP_HDFS_HOME=${HADOOP_HDFS_HOME:-${HADOOP_HOME}/share/hadoop/hdfs}
  HADOOP_MAPRED_HOME=${HADOOP_MAPRED_HOME:-${HADOOP_HOME}/share/hadoop/mapreduce}
  HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-${HADOOP_HOME}/share/hadoop/yarn}

#TO

  HADOOP_COMMON_HOME=${HADOOP_HOME}/share/hadoop/common
  HADOOP_HDFS_HOME=${HADOOP_HOME}/share/hadoop/hdfs
  HADOOP_MAPRED_HOME=${HADOOP_HOME}/share/hadoop/mapreduce
  HADOOP_YARN_HOME=${HADOOP_HOME}/share/hadoop/yarn
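
The difference between the two forms is the :- default expansion: ${VAR:-default} keeps whatever VAR already holds and only falls back to the default when VAR is unset or empty. If your environment already exports HADOOP_COMMON_HOME (for example, set to the Hadoop root), the original lines never switch to the share/hadoop directories. Here is a small sketch of that behaviour, using assumed paths purely for illustration:

```shell
# Assumed values for illustration only; your environment may differ.
HADOOP_HOME=/usr/local/hadoop
HADOOP_COMMON_HOME=/usr/local/hadoop

# Original form: the existing value wins, so the default is ignored.
original="${HADOOP_COMMON_HOME:-${HADOOP_HOME}/share/hadoop/common}"

# Modified form: always derived from HADOOP_HOME.
modified="${HADOOP_HOME}/share/hadoop/common"

echo "original: $original"
echo "modified: $modified"
```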

Configuration

nano /usr/local/sqoop/conf/sqoop.properties
#Update the following line
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/usr/local/hadoop/etc/hadoop/
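
Before starting the server it can be worth confirming that this directory actually holds the Hadoop site files. The sketch below assumes the standard Hadoop file names; check_hadoop_conf is a hypothetical helper, and a missing file is a hint to investigate, not necessarily an error.

```shell
# Hypothetical helper: report which standard Hadoop site files are missing
# from DIR. Returns non-zero if any are absent.
check_hadoop_conf() {
    missing=0
    for name in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
        if [ ! -f "$1/$name" ]; then
            echo "missing: $1/$name"
            missing=1
        fi
    done
    return "$missing"
}

# Path taken from the property above, without the trailing slash.
check_hadoop_conf /usr/local/hadoop/etc/hadoop || echo "check the configuration directory"
```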

Start Sqoop Server

./bin/sqoop2-server start

References

https://linoxide.com/tools/install-apache-sqoop-ubuntu-16-04/