Spark Installation on Hadoop

In this tutorial I will show you how to use Kerberos/SSL with Spark integrated with YARN. I will use self-signed certs for this example. Before you begin, ensure you have installed the Kerberos Server and Hadoop.

This assumes your hostname is “hadoop”

Create Kerberos Principals

cd /etc/security/keytabs/

sudo kadmin.local

#You can list principals
listprincs

#Create the following principals
addprinc -randkey spark/hadoop@REALM.CA

#Create the keytab files.
#You will need this for Spark to be able to log in
xst -k spark.service.keytab spark/hadoop@REALM.CA

Set Keytab Permissions/Ownership

sudo chown root:hadoopuser /etc/security/keytabs/*
sudo chmod 750 /etc/security/keytabs/*

Download

Go to Apache Spark Download and get the link for Spark.

wget http://apache.forsale.plus/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
tar -xvf spark-2.4.4-bin-hadoop2.7.tgz
mv spark-2.4.4-bin-hadoop2.7 /usr/local/spark/

Update .bashrc

sudo nano ~/.bashrc

#Ensure we have the following in the Hadoop section
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop

#Add the following

#SPARK VARIABLES START
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native:$LD_LIBRARY_PATH
#SPARK VARIABLES STOP

source ~/.bashrc

Setup Configuration

cd /usr/local/spark/conf
mv spark-defaults.conf.template spark-defaults.conf
nano spark-defaults.conf

#Add to the end
spark.master                            yarn
spark.yarn.historyServer.address        ${hadoopconf-yarn.resourcemanager.hostname}:18080
spark.yarn.keytab                       /etc/security/keytabs/spark.service.keytab
spark.yarn.principal                    spark/hadoop@REALM.CA
spark.yarn.access.hadoopFileSystems     hdfs://NAMENODE:54310
spark.authenticate                      true
spark.driver.bindAddress                0.0.0.0
spark.authenticate.enableSaslEncryption true
spark.eventLog.enabled                  true
spark.eventLog.dir                      hdfs://NAMENODE:54310/user/spark/applicationHistory
spark.history.fs.logDirectory           hdfs://NAMENODE:54310/user/spark/applicationHistory
spark.history.fs.update.interval        10s
spark.history.ui.port                   18080

#SSL
spark.ssl.enabled                       true
spark.ssl.keyPassword                   PASSWORD
spark.ssl.keyStore                      /etc/security/serverKeys/keystore.jks
spark.ssl.keyStorePassword              PASSWORD
spark.ssl.keyStoreType                  JKS
spark.ssl.trustStore                    /etc/security/serverKeys/truststore.jks
spark.ssl.trustStorePassword            PASSWORD
spark.ssl.trustStoreType                JKS

Kinit

kinit -kt /etc/security/keytabs/spark.service.keytab spark/hadoop@REALM.CA
klist
hdfs dfs -mkdir /user/spark/
hdfs dfs -mkdir /user/spark/applicationHistory
hdfs dfs -ls /user/spark
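
With a valid ticket and the application history directory in place, you can verify the YARN integration by submitting the bundled SparkPi example. This is a minimal sketch; the examples jar name assumes the stock spark-2.4.4-bin-hadoop2.7 layout, so adjust the path if your build differs. Once the history server (started below) is running, the finished job will also show up in its UI.

spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.4.jar 10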

Start The Service

$SPARK_HOME/sbin/start-history-server.sh

Stop The Service

$SPARK_HOME/sbin/stop-history-server.sh

Spark History Server Web UI

Once the history server is running you can browse its web UI on the port configured above (spark.history.ui.port, 18080).

References

I used a lot of different resources and reference material on this. Below are just a few I used.

https://spark.apache.org/docs/latest/running-on-yarn.html#configuration

https://spark.apache.org/docs/latest/security.html

https://www.linode.com/docs/databases/hadoop/install-configure-run-spark-on-top-of-hadoop-yarn-cluster/

Sqoop2: Kerberize Installation

In this tutorial I will show you how to kerberize a Sqoop2 installation. Before you begin, ensure you have installed Sqoop2.

This assumes your hostname is “hadoop”

Create Kerberos Principals

cd /etc/security/keytabs
sudo kadmin.local
addprinc -randkey sqoop/hadoop@REALM.CA
xst -k sqoop.service.keytab sqoop/hadoop@REALM.CA
addprinc -randkey sqoopHTTP/hadoop@REALM.CA
xst -k sqoopHTTP.service.keytab sqoopHTTP/hadoop@REALM.CA
q

Set Keytab Permissions/Ownership

sudo chown root:hadoopuser /etc/security/keytabs/*
sudo chmod 750 /etc/security/keytabs/*

Configuration

Configure Kerberos with Sqoop

cd /usr/local/sqoop/conf/
nano sqoop.properties

#uncomment the following
org.apache.sqoop.security.authentication.type=KERBEROS
org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.KerberosAuthenticationHandler

#update to the following
org.apache.sqoop.security.authentication.kerberos.principal=sqoop/hadoop@REALM.CA
org.apache.sqoop.security.authentication.kerberos.keytab=/etc/security/keytabs/sqoop.service.keytab
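
Before restarting Sqoop2, you can confirm that the keytab and principal actually work. This is just a sanity check using the same kinit pattern as the other tutorials.

kinit -kt /etc/security/keytabs/sqoop.service.keytab sqoop/hadoop@REALM.CA
klist
kdestroy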

Kerberos: Commands

In this tutorial I will give you a few useful commands for working with Kerberos. If you haven’t installed Kerberos yet go here. I will keep this updated as time goes on. Also note that the commands below have a variety of additional options; check the documentation for the full list.

Admin

This will open the Kerberos V5 administration system.

kadmin.local
Add Principal

This will add a new principal. -randkey is optional; when specified, the encryption key is chosen at random instead of being derived from a password. Be sure to change USER to whatever your user is.

addprinc -randkey USER/_HOST@REALM.CA
Create KeyTab

This will create a keytab in the directory where you run the command. You should put it in the /etc/security/keytabs/ folder. You can also specify the full path (IE: /etc/security/keytabs/USER.keytab). Be sure to change USER to whatever your user is.

xst -k USER.keytab USER/_HOST@REALM.CA
Kinit

The -kt option uses the specified keytab to obtain a ticket.

kinit -kt /etc/security/keytabs/USER.keytab USER/_HOST@REALM.CA
Klist

If you want to see what tickets have been granted, you can issue the command below.

klist
Inline Commands

You can run Kerberos commands inline without first opening kadmin.local. To do so, specify the “-q” option followed by the command to issue in quotes. See below.

kadmin.local -q "addprinc -randkey USER/_HOST@REALM.CA"
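
For example, you can export a keytab or delete a principal the same way. These are standard kadmin commands, shown with the same placeholder names as above; -force simply skips the delete confirmation prompt.

#Create the keytab inline
kadmin.local -q "xst -k /etc/security/keytabs/USER.keytab USER/_HOST@REALM.CA"

#Delete a principal without being prompted for confirmation
kadmin.local -q "delprinc -force USER/_HOST@REALM.CA"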

NiFi: Kerberized Kafka Consumer Processor

In this tutorial I will guide you through adding a Kafka consumer processor to a Kerberized NiFi instance.

For this tutorial you will need an Avro schema called “person”; its contents are as follows.

{
     "type": "record",
     "namespace": "com.example",
     "name": "FullName",
     "fields": [
       { "name": "first_name", "type": "string" },
       { "name": "last_name", "type": "string" }
     ]
}

When ready you can publish this record to Kafka using the Kafka producer; a console-producer example follows below.

{ "first_name": "John", "last_name": "Smith" }

First we need to drag the processor onto the grid.

Next we need to select the Kafka consumer processor.

Next we configure the processor.

We will need to create 5 controller services.
First is the Kerberos Service

Next is the SSL Service

Next is the Json Record Reader

Next is the Avro Registry

Next is the Json Record Writer

Now you have finished configuring the services. Ensure your final Kafka Consumer configuration looks like this and you are ready.

Next we need to enable all the controller services

We need to start the processor to start receiving data

Now you can publish the record I gave you earlier. As you can see, the data starts flowing in.

You can now view the queue to see the data.

We are done now and you can start using the consumer.

Phoenix: Kerberize Installation

In this tutorial I will show you how to use Kerberos with Phoenix. Before you begin ensure you have installed Kerberos Server, Hadoop, HBase and Zookeeper.

This assumes your hostname is “hadoop”

Install Phoenix

wget http://apache.forsale.plus/phoenix/apache-phoenix-5.0.0-HBase-2.0/bin/apache-phoenix-5.0.0-HBase-2.0-bin.tar.gz
tar -zxvf apache-phoenix-5.0.0-HBase-2.0-bin.tar.gz
sudo mv apache-phoenix-5.0.0-HBase-2.0-bin /usr/local/phoenix/
cd /usr/local/phoenix/

Setup .bashrc:

 sudo nano ~/.bashrc

Add the following to the end of the file.

#PHOENIX VARIABLES START
export PHOENIX_HOME=/usr/local/phoenix
export PHOENIX_CLASSPATH=$PHOENIX_HOME/*
export PATH=$PATH:$PHOENIX_HOME/bin
#PHOENIX VARIABLES END

 source ~/.bashrc

Link Files

ln -sf $HBASE_CONF_DIR/hbase-site.xml $PHOENIX_HOME/bin/hbase-site.xml
ln -sf $HADOOP_CONF_DIR/core-site.xml $PHOENIX_HOME/bin/core-site.xml
ln -sf $PHOENIX_HOME/phoenix-5.0.0-HBase-2.0-server.jar $HBASE_HOME/lib/phoenix-5.0.0-HBase-2.0-server.jar

hbase-env.sh

nano /usr/local/hbase/conf/hbase-env.sh

#Ensure the following env variables are set

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/usr/local/hadoop/etc/hadoop}
export PHOENIX_CLASSPATH=${PHOENIX_CLASSPATH:-/usr/local/phoenix}
export HBASE_CLASSPATH="$HBASE_CLASSPATH:$CLASSPATH:$HADOOP_CONF_DIR:$PHOENIX_CLASSPATH/phoenix-5.0.0-HBase-2.0-server.jar:$PHOENIX_CLASSPATH/phoenix-core-5.0.0-HBase-2.0.jar:$PHOENIX_CLASSPATH/phoenix-5.0.0-HBase-2.0-client.jar"

hbase-site.xml

nano /usr/local/hbase/conf/hbase-site.xml

#Add the following properties

<property>
	<name>phoenix.functions.allowUserDefinedFunctions</name>
	<value>true</value>
	<description>enable UDF functions</description>
</property>
<property>
	<name>hbase.regionserver.wal.codec</name>
	<value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
<property>
	<name>hbase.region.server.rpc.scheduler.factory.class</name>
	<value>org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory</value>
	<description>Factory to create the Phoenix RPC Scheduler that uses separate queues for index and metadata updates</description>
</property>
<property>
	<name>hbase.rpc.controllerfactory.class</name>
	<value>org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory</value>
	<description>Factory to create the Phoenix RPC Scheduler that uses separate queues for index and metadata updates</description>
</property>
<property>
	<name>hbase.defaults.for.version.skip</name>
	<value>true</value>
</property>
<property>
	<name>phoenix.queryserver.http.port</name>
	<value>8765</value>
</property>
<property>
	<name>phoenix.queryserver.serialization</name>
	<value>PROTOBUF</value>
</property>
<property>
	<name>phoenix.queryserver.keytab.file</name>
	<value>/etc/security/keytabs/hbase.service.keytab</value>
</property>
<property>
	<name>phoenix.queryserver.kerberos.principal</name>
	<value>hbase/hadoop@REALM.CA</value>
</property>
<property>
	<name>phoenix.queryserver.http.keytab.file</name>
	<value>/etc/security/keytabs/hbaseHTTP.service.keytab</value>
</property>
<property>
	<name>phoenix.queryserver.http.kerberos.principal</name>
	<value>hbaseHTTP/hadoop@REALM.CA</value>
</property>
<property>
	<name>phoenix.queryserver.dns.nameserver</name>
	<value>hadoop</value>
</property>
<property>
	<name>phoenix.queryserver.dns.interface</name>
	<value>enp0s3</value>
</property>
<property>
		<name>phoenix.schema.mapSystemTablesToNamespace</name>
		<value>true</value>
</property>
<property>
		<name>phoenix.schema.isNamespaceMappingEnabled</name>
		<value>true</value>
</property>

sqlline.py

sqlline.py hadoop:2181:/hbase-secure:hbase/hadoop@REALM.CA:/etc/security/keytabs/hbase.service.keytab

HBASE & Java: Connecting Secure

In this tutorial I will show you how to connect to a secure HBase cluster using Java. It’s rather straightforward.

Import SSL Cert to Java:

Follow this tutorial on “Installing unlimited strength encryption Java libraries”.

If on Windows do the following

#Import it
"C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -import -file hadoop.csr -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts" -alias "hadoop"

#Check it
"C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -list -v -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts"

#If you want to delete it
"C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -delete -alias hadoop -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts"

POM.xml

<dependency>
	<groupId>org.apache.hbase</groupId>
	<artifactId>hbase-client</artifactId>
	<version>2.1.0</version>
</dependency>
<dependency>
	<groupId>org.apache.hbase</groupId>
	<artifactId>hbase</artifactId>
	<version>2.1.0</version>
	<type>pom</type>
</dependency>

Imports:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.security.UserGroupInformation;

Initiate Kerberos Authentication

System.setProperty("java.security.auth.login.config", "C:\\data\\kafkaconnect\\kafka\\src\\main\\resources\\client_jaas.conf");
System.setProperty("https.protocols", "TLSv1,TLSv1.1,TLSv1.2");
System.setProperty("java.security.krb5.conf", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\krb5.conf");
System.setProperty("java.security.krb5.realm", "REALM.CA");
System.setProperty("java.security.krb5.kdc", "REALM.CA");
System.setProperty("sun.security.krb5.debug", "false");
System.setProperty("javax.net.debug", "false");
System.setProperty("javax.net.ssl.keyStorePassword", "changeit");
System.setProperty("javax.net.ssl.keyStore", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\cacerts");
System.setProperty("javax.net.ssl.trustStore", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\cacerts");
System.setProperty("javax.net.ssl.trustStorePassword", "changeit");
System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");

Config:

We will use the basic configuration here. You should secure the cluster and use appropriate settings for that.

// Setup the configuration object.
final Configuration config = HBaseConfiguration.create();
config.set("hbase.zookeeper.quorum", "hadoop");
config.set("hbase.zookeeper.property.clientPort", "2181");
config.set("hadoop.security.authentication", "kerberos");
config.set("hbase.security.authentication", "kerberos");
config.set("hbase.cluster.distributed", "true");
config.set("hbase.rpc.protection", "integrity");
config.set("zookeeper.znode.parent", "/hbase-secure");
config.set("hbase.master.kerberos.principal", "hbase/hadoop@REALM.CA");
config.set("hbase.regionserver.kerberos.principal", "hbase/hadoop@REALM.CA");

Connect:

Now we create the connection.

UserGroupInformation.setConfiguration(config);
UserGroupInformation.setLoginUser(UserGroupInformation.loginUserFromKeytabAndReturnUGI("hbase/hadoop@REALM.CA", "c:\\data\\hbase.service.keytab"));

System.out.println(UserGroupInformation.getLoginUser());
System.out.println(UserGroupInformation.getCurrentUser());

Connection conn = ConnectionFactory.createConnection(config);

//Later when we are done we will want to close the connection.
conn.close();

Hbase Admin:

Retrieve an Admin implementation to administer the HBase cluster, if you need it.

Admin admin = conn.getAdmin();
//Later when we are done we will want to close the connection.
admin.close();

HBase: Kerberize/SSL Installation

In this tutorial I will show you how to use Kerberos/SSL with HBase. I will use self signed certs for this example. Before you begin ensure you have installed Kerberos Server, Hadoop and Zookeeper.

This assumes your hostname is “hadoop”

We will install a Master, a RegionServer and a REST client.

Create Kerberos Principals

cd /etc/security/keytabs/

sudo kadmin.local

#You can list principals
listprincs

#Create the following principals
addprinc -randkey hbase/hadoop@REALM.CA
addprinc -randkey hbaseHTTP/hadoop@REALM.CA

#Create the keytab files.
#You will need these for HBase to be able to log in
xst -k hbase.service.keytab hbase/hadoop@REALM.CA
xst -k hbaseHTTP.service.keytab hbaseHTTP/hadoop@REALM.CA

Set Keytab Permissions/Ownership

sudo chown root:hadoopuser /etc/security/keytabs/*
sudo chmod 750 /etc/security/keytabs/*

Install HBase

wget http://apache.forsale.plus/hbase/2.1.0/hbase-2.1.0-bin.tar.gz
tar -zxvf hbase-2.1.0-bin.tar.gz
sudo mv hbase-2.1.0 /usr/local/hbase/
cd /usr/local/hbase/conf/

Setup .bashrc:

 sudo nano ~/.bashrc

Add the following to the end of the file.

#HBASE VARIABLES START
export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin
export HBASE_CONF_DIR=$HBASE_HOME/conf
#HBASE VARIABLES END

 source ~/.bashrc

hbase_client_jaas.conf

Client {
        com.sun.security.auth.module.Krb5LoginModule required
        useKeyTab=false
        useTicketCache=true;
};

hbase_server_jaas.conf

Client {
        com.sun.security.auth.module.Krb5LoginModule required
        useKeyTab=true
        useTicketCache=false
        keyTab="/etc/security/keytabs/hbase.service.keytab"
        principal="hbase/hadoop@REALM.CA";
};

regionservers

hadoop

hbase-env.sh

Add or modify the following settings.

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
export HBASE_CONF_DIR=${HBASE_CONF_DIR:-/usr/local/hbase/conf}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/usr/local/hadoop/etc/hadoop}
export HBASE_CLASSPATH="$CLASSPATH:$HADOOP_CONF_DIR"
export HBASE_REGIONSERVERS=${HBASE_CONF_DIR}/regionservers
export HBASE_LOG_DIR=${HBASE_HOME}/logs
export HBASE_PID_DIR=/home/hadoopuser
export HBASE_MANAGES_ZK=false
export HBASE_OPTS="-Djava.security.auth.login.config=$HBASE_CONF_DIR/hbase_client_jaas.conf"
export HBASE_MASTER_OPTS="-Djava.security.auth.login.config=$HBASE_CONF_DIR/hbase_server_jaas.conf"
export HBASE_REGIONSERVER_OPTS="-Djava.security.auth.login.config=$HBASE_CONF_DIR/hbase_server_jaas.conf"

hbase-site.xml

<configuration>
	<property>
		<name>hbase.rootdir</name>
		<value>hdfs://hadoop:54310/hbase</value>
	</property>
	<property>
		<name>hbase.zookeeper.property.dataDir</name>
		<value>/usr/local/zookeeper/data</value>
	</property>
	<property>
		<name>hbase.cluster.distributed</name>
		<value>true</value>
	</property>
	<property>
		<name>hbase.regionserver.kerberos.principal</name>
		<value>hbase/_HOST@REALM.CA</value>
	</property>
	<property>
		<name>hbase.regionserver.keytab.file</name>
		<value>/etc/security/keytabs/hbase.service.keytab</value>
	</property>
	<property>
		<name>hbase.master.kerberos.principal</name>
		<value>hbase/_HOST@REALM.CA</value>
	</property>
	<property>
		<name>hbase.master.keytab.file</name>
		<value>/etc/security/keytabs/hbase.service.keytab</value>
	</property>
	<property>
		<name>hbase.security.authentication.spnego.kerberos.principal</name>
		<value>hbaseHTTP/_HOST@REALM.CA</value>
	</property>
	<property>
		<name>hbase.security.authentication.spnego.kerberos.keytab</name>
		<value>/etc/security/keytabs/hbaseHTTP.service.keytab</value>
	</property>
	<property>
		<name>hbase.security.authentication</name>
		<value>kerberos</value>
	</property>
	<property>
		<name>hbase.security.authorization</name>
		<value>true</value>
	</property>
	<property>
		<name>hbase.coprocessor.region.classes</name>
		<value>org.apache.hadoop.hbase.security.token.TokenProvider</value>
	</property>
	<property>
		<name>hbase.rpc.protection</name>
		<value>integrity</value>
	</property>
	<property>
		<name>hbase.rpc.engine</name>
		<value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
	</property>
	<property>
		<name>hbase.coprocessor.master.classes</name>
		<value>org.apache.hadoop.hbase.security.access.AccessController</value>
	</property>
	<property>
		<name>hbase.coprocessor.region.classes</name>
		<value>org.apache.hadoop.hbase.security.token.TokenProvider,org.apache.hadoop.hbase.security.access.AccessController</value>
	</property>
	<property>
		<name>hbase.security.authentication.ui</name>
		<value>kerberos</value>
		<description>Controls what kind of authentication should be used for the HBase web UIs.</description>
	</property>
	<property>
		<name>hbase.master.port</name>
		<value>16000</value>
	</property>
	<property>
		<name>hbase.master.info.bindAddress</name>
		<value>0.0.0.0</value>
	</property>
	<property>
		<name>hbase.master.info.port</name>
		<value>16010</value>
	</property>
	<property>
		<name>hbase.regionserver.hostname</name>
		<value>hadoop</value>
	</property>
	<property>
		<name>hbase.regionserver.port</name>
		<value>16020</value>
	</property>
	<property>
		<name>hbase.regionserver.info.port</name>
		<value>16030</value>
	</property>
	<property>
		<name>hbase.regionserver.info.bindAddress</name>
		<value>0.0.0.0</value>
	</property>
	<property>
		<name>hbase.master.ipc.address</name>
		<value>0.0.0.0</value>
	</property>
	<property>
		<name>hbase.regionserver.ipc.address</name>
		<value>0.0.0.0</value>
	</property>
	<property>
		<name>hbase.ssl.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>hadoop.ssl.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>ssl.server.keystore.keypassword</name>
		<value>startrek</value>
	</property>
	<property>
		<name>ssl.server.keystore.password</name>
		<value>startrek</value>
	</property>
	<property>
		<name>ssl.server.keystore.location</name>
		<value>/etc/security/serverKeys/keystore.jks</value>
	</property>
	<property>
		<name>hbase.rest.ssl.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>hbase.rest.ssl.keystore.store</name>
		<value>/etc/security/serverKeys/keystore.jks</value>
	</property>
	<property>
		<name>hbase.rest.ssl.keystore.password</name>
		<value>startrek</value>
	</property>
	<property>
		<name>hbase.rest.ssl.keystore.keypassword</name>
		<value>startrek</value>
	</property>
	<property>
		<name>hbase.superuser</name>
		<value>hduser</value>
	</property>
	<property>
		<name>hbase.tmp.dir</name>
		<value>/tmp/hbase-${user.name}</value>
	</property>
	<property>
		<name>hbase.local.dir</name>
		<value>${hbase.tmp.dir}/local</value>
	</property>
	<property>
		<name>hbase.zookeeper.property.clientPort</name>
		<value>2181</value>
	</property>
	<property>
		<name>hbase.unsafe.stream.capability.enforce</name>
		<value>false</value>
	</property>
	<property>
		<name>hbase.zookeeper.quorum</name>
		<value>hadoop</value>
	</property>
	<property>
		<name>zookeeper.znode.parent</name>
		<value>/hbase-secure</value>
	</property>
	<property>
		<name>hbase.regionserver.dns.interface</name>
		<value>enp0s3</value>
	</property>
        <property>
                <name>hbase.rest.authentication.type</name>
                <value>kerberos</value>
        </property>
        <property>
                <name>hadoop.proxyuser.HTTP.groups</name>
                <value>*</value>
        </property>
        <property>
                <name>hadoop.proxyuser.HTTP.hosts</name>
                <value>*</value>
        </property>
        <property>
                <name>hbase.rest.authentication.kerberos.keytab</name>
                <value>/etc/security/keytabs/hbaseHTTP.service.keytab</value>
        </property>
        <property>
                <name>hbase.rest.authentication.kerberos.principal</name>
                <value>hbaseHTTP/_HOST@REALM.CA</value>
        </property>
        <property>
                <name>hbase.rest.kerberos.principal</name>
                <value>hbase/_HOST@REALM.CA</value>
        </property>
        <property>
                <name>hbase.rest.keytab.file</name>
                <value>/etc/security/keytabs/hbase.service.keytab</value>
        </property>
</configuration>

Change Ownership of HBase files

sudo chown hadoopuser:hadoopuser -R /usr/local/hbase/*

Hadoop HDFS Config Changes

You will need to add the following properties to Hadoop’s core-site.xml file.

nano /usr/local/hadoop/etc/hadoop/core-site.xml

<property>
	<name>hadoop.proxyuser.hbase.hosts</name>
	<value>*</value>
</property>
<property>
	<name>hadoop.proxyuser.hbase.groups</name>
	<value>*</value>
</property>
<property>
	<name>hadoop.proxyuser.HTTP.hosts</name>
	<value>*</value>
</property>
<property>
	<name>hadoop.proxyuser.HTTP.groups</name>
	<value>*</value>
</property>

AutoStart

crontab -e

@reboot /usr/local/hbase/bin/hbase-daemon.sh --config /usr/local/hbase/conf/ start master
@reboot /usr/local/hbase/bin/hbase-daemon.sh --config /usr/local/hbase/conf/ start regionserver
@reboot /usr/local/hbase/bin/hbase-daemon.sh --config /usr/local/hbase/conf/ start rest --infoport 17001 -p 17000
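
The same scripts can be run by hand (as the user that owns the HBase install) if you want to bring the daemons up once before relying on the cron entries; these are the exact commands from the crontab above, minus @reboot.

/usr/local/hbase/bin/hbase-daemon.sh --config /usr/local/hbase/conf/ start master
/usr/local/hbase/bin/hbase-daemon.sh --config /usr/local/hbase/conf/ start regionserver
/usr/local/hbase/bin/hbase-daemon.sh --config /usr/local/hbase/conf/ start rest --infoport 17001 -p 17000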

Validation

kinit -kt /etc/security/keytabs/hbase.service.keytab hbase/hadoop@REALM.CA
hbase shell
status 'detailed'
whoami
kdestroy

References

https://hbase.apache.org/0.94/book/security.html
https://pivotalhd-210.docs.pivotal.io/doc/2100/webhelp/topics/ConfiguringSecureHBase.html
https://ambari.apache.org/1.2.5/installing-hadoop-using-ambari/content/ambari-kerb-2-3-2-1.html
https://hbase.apache.org/book.html#_using_secure_http_https_for_the_web_ui

Zookeeper Kerberos Installation

We are going to install Zookeeper. Ensure you have installed Kerberos first.

This assumes your hostname is “hadoop”

Install Java JDK

apt-get update
apt-get upgrade
apt-get install default-jdk

Download Zookeeper:

wget http://apache.forsale.plus/zookeeper/zookeeper-3.4.13/zookeeper-3.4.13.tar.gz
tar -zxvf zookeeper-3.4.13.tar.gz
sudo mv zookeeper-3.4.13 /usr/local/zookeeper/
sudo chown -R root:hadoopuser /usr/local/zookeeper/

Setup .bashrc:

 sudo nano ~/.bashrc

Add the following to the end of the file.

#ZOOKEEPER VARIABLES START
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
#ZOOKEEPER VARIABLES STOP

 source ~/.bashrc

Create Kerberos Principals

cd /etc/security/keytabs
sudo kadmin.local
addprinc -randkey zookeeper/hadoop@REALM.CA
xst -k zookeeper.service.keytab zookeeper/hadoop@REALM.CA
q

Set Keytab Permissions/Ownership

sudo chown root:hadoopuser /etc/security/keytabs/*
sudo chmod 750 /etc/security/keytabs/*

zoo.cfg

cd /usr/local/zookeeper/conf/
cp zoo_sample.cfg zoo.cfg
nano zoo.cfg

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/usr/local/zookeeper/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to “0” to disable auto purge feature
#autopurge.purgeInterval=1

server.1=hadoop:2888:3888

authProvider.1 = org.apache.zookeeper.server.auth.SASLAuthenticationProvider
kerberos.removeHostFromPrincipal = true
kerberos.removeRealmFromPrincipal = true
jaasLoginRenew=3600000

java.env

cd /usr/local/zookeeper/conf/
touch java.env
nano java.env

ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
ZOO_LOG_DIR="/usr/local/zookeeper/logs"

zookeeper_client_jaas.conf

cd /usr/local/zookeeper/conf/
touch zookeeper_client_jaas.conf
nano zookeeper_client_jaas.conf

Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=false
useTicketCache=true;
};

zookeeper_jaas.conf

cd /usr/local/zookeeper/conf/
touch zookeeper_jaas.conf
nano zookeeper_jaas.conf

Server {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
storeKey=true
useTicketCache=false
keyTab="/etc/security/keytabs/zookeeper.service.keytab"
principal="zookeeper/hadoop@REALM.CA";
};

zkServer.sh

cd /usr/local/zookeeper/bin/
nano zkServer.sh

#Add the following at the top

export CLIENT_JVMFLAGS="-Djava.security.auth.login.config=/usr/local/zookeeper/conf/zookeeper_client_jaas.conf"
export SERVER_JVMFLAGS="-Xmx1024m -Djava.security.auth.login.config=/usr/local/zookeeper/conf/zookeeper_jaas.conf"

zkCli.sh

cd /usr/local/zookeeper/bin/
nano zkCli.sh

#Add the following at the top

export CLIENT_JVMFLAGS="-Djava.security.auth.login.config=/usr/local/zookeeper/conf/zookeeper_client_jaas.conf"
export SERVER_JVMFLAGS="-Xmx1024m -Djava.security.auth.login.config=/usr/local/zookeeper/conf/zookeeper_jaas.conf"

MkDir

mkdir /usr/local/zookeeper/data/
mkdir /usr/local/zookeeper/logs/

echo "1" > /usr/local/zookeeper/data/myid

sudo chown -R hduser:hduser /usr/local/zookeeper

Auto Start

crontab -e

#Add the following
@reboot /usr/local/zookeeper/bin/zkServer.sh start

Run Client

kinit -kt /etc/security/keytabs/zookeeper.service.keytab zookeeper/hadoop@REALM.CA
./zkCli.sh -server 127.0.0.1:2181

#Now you can list all directories
ls /

#Or delete directories

rmr /folder

References

https://my-bigdata-blog.blogspot.com/2017/07/apache-Zookeeper-install-Ubuntu.html
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_command-line-installation/content/zookeeper_configuration.html
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_command-line-installation/content/securing_zookeeper_with_kerberos.html

Kafka & Java: Secured Consumer Read Record

In this tutorial I will show you how to read a record from Kafka. Before you begin you will need Maven/Eclipse all set up and a project ready to go. If you haven’t installed Kerberized Kafka yet, please do so.

Import SSL Cert to Java:

Follow this tutorial on “Installing unlimited strength encryption Java libraries”.

If on Windows do the following

#Import it
"C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -import -file hadoop.csr -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts" -alias "hadoop"

#Check it
"C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -list -v -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts"

#If you want to delete it
"C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -delete -alias hadoop -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts"

POM.xml

<dependency>
	<groupId>org.apache.kafka</groupId>
	<artifactId>kafka-clients</artifactId>
	<version>1.1.0</version>
</dependency>

Imports

import org.apache.kafka.clients.consumer.*;
import java.util.Properties;
import java.io.InputStream;
import java.util.Arrays;

Consumer JAAS Conf (client_jaas.conf)

KafkaClient {
    com.sun.security.auth.module.Krb5LoginModule required
    useTicketCache=false
    refreshKrb5Config=true
    debug=true
    useKeyTab=true
    storeKey=true
    keyTab="c:\\data\\kafka.service.keytab"
    principal="kafka/hadoop@REALM.CA";
};

Consumer Props File

You can go here to view all the options for consumer properties.

bootstrap.servers=hadoop:9094
group.id=test

security.protocol=SASL_SSL
sasl.kerberos.service.name=kafka

#offset will be periodically committed in the background
enable.auto.commit=true

# The deserializer for the key
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer

# The deserializer for the value
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer

# heartbeat to detect worker failures
session.timeout.ms=10000

#Automatically reset offset to earliest offset
auto.offset.reset=earliest

Initiate Kerberos Authentication

System.setProperty("java.security.auth.login.config", "C:\\data\\kafkaconnect\\kafka\\src\\main\\resources\\client_jaas.conf");
System.setProperty("https.protocols", "TLSv1,TLSv1.1,TLSv1.2");
System.setProperty("java.security.krb5.conf", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\krb5.conf");
System.setProperty("java.security.krb5.realm", "REALM.CA");
System.setProperty("java.security.krb5.kdc", "REALM.CA");
System.setProperty("sun.security.krb5.debug", "false");
System.setProperty("javax.net.debug", "false");
System.setProperty("javax.net.ssl.keyStorePassword", "changeit");
System.setProperty("javax.net.ssl.keyStore", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\cacerts");
System.setProperty("javax.net.ssl.trustStore", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\cacerts");
System.setProperty("javax.net.ssl.trustStorePassword", "changeit");
System.setProperty("javax.security.auth.useSubjectCredsOnly", "true");

Consumer Connection/Send

The record we will read will just be a string for both key and value.

Consumer<String, String> consumer = null;

try {
	ClassLoader classLoader = getClass().getClassLoader();

	try (InputStream props = classLoader.getResourceAsStream("consumer.props")) {
		Properties properties = new Properties();
		properties.load(props);
		consumer = new KafkaConsumer<>(properties);
	}
	
	System.out.println("Consumer Created");

	// Subscribe to the topic.
	consumer.subscribe(Arrays.asList("testTopic"));

	while (true) {
		final ConsumerRecords<String, String> consumerRecords = consumer.poll(1000);
		
		if (consumerRecords.count() == 0) {
			//Stop once there are no more records to read
			break;
		}

		consumerRecords.forEach(record -> {
			System.out.printf("Consumer Record:(%s, %s, %d, %d)\n", record.key(), record.value(), record.partition(), record.offset());
		});

		//Commit offsets returned on the last poll() for all the subscribed list of topics and partition
		consumer.commitAsync();
	}
} finally {
	if (consumer != null) { consumer.close(); }
}
System.out.println("Consumer Closed");

References

I used kafka-sample-programs as a guide for setting up props.

Kafka: Kerberize/SSL

In this tutorial I will show you how to use Kerberos/SSL with Kafka. I will use self-signed certs for this example. Before you begin ensure you have installed the Kerberos Server and Kafka.

If you don’t want to use the built-in Zookeeper you can set up your own; to do that, follow this tutorial.

This assumes your hostname is “hadoop”

Create Kerberos Principals

cd /etc/security/keytabs/

sudo kadmin.local

#You can list principals
listprincs

#Create the following principals
addprinc -randkey kafka/hadoop@REALM.CA
addprinc -randkey zookeeper/hadoop@REALM.CA

#Create the keytab files.
#You will need these for Kafka and Zookeeper to be able to log in
xst -k kafka.service.keytab kafka/hadoop@REALM.CA
xst -k zookeeper.service.keytab zookeeper/hadoop@REALM.CA

Set Keytab Permissions/Ownership

sudo chown root:hadoopuser /etc/security/keytabs/*
sudo chmod 750 /etc/security/keytabs/*

Hosts Update

sudo nano /etc/hosts

#Remove 127.0.1.1 line

#Change 127.0.0.1 to the following
127.0.0.1 realm.ca hadoop localhost

Ubuntu Firewall

sudo ufw disable

SSL

Setup SSL Directories if you have not previously done so.

sudo mkdir -p /etc/security/serverKeys
sudo chown -R root:hadoopuser /etc/security/serverKeys/
sudo chmod 755 /etc/security/serverKeys/

cd /etc/security/serverKeys

Setup Keystore

sudo keytool -genkey -alias NAMENODE -keyalg RSA -keysize 1024 -dname "CN=NAMENODE,OU=ORGANIZATION_UNIT,C=canada" -keypass PASSWORD -keystore /etc/security/serverKeys/keystore.jks -storepass PASSWORD
sudo keytool -export -alias NAMENODE -keystore /etc/security/serverKeys/keystore.jks -rfc -file /etc/security/serverKeys/NAMENODE.csr -storepass PASSWORD

Setup Truststore

sudo keytool -import -noprompt -alias NAMENODE -file /etc/security/serverKeys/NAMENODE.csr -keystore /etc/security/serverKeys/truststore.jks -storepass PASSWORD

Generate Self Signed Certifcate

sudo openssl genrsa -out /etc/security/serverKeys/NAMENODE.key 2048

sudo openssl req -x509 -new -key /etc/security/serverKeys/NAMENODE.key -days 300 -out /etc/security/serverKeys/NAMENODE.pem

sudo keytool -keystore /etc/security/serverKeys/keystore.jks -alias NAMENODE -certreq -file /etc/security/serverKeys/NAMENODE.cert -storepass PASSWORD -keypass PASSWORD

sudo openssl x509 -req -CA /etc/security/serverKeys/NAMENODE.pem -CAkey /etc/security/serverKeys/NAMENODE.key -in /etc/security/serverKeys/NAMENODE.cert -out /etc/security/serverKeys/NAMENODE.signed -days 300 -CAcreateserial
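
Before wiring these files into Kafka, you can sanity-check what ended up in the keystore and truststore. keytool -list works for both; use the same store password you chose above.

sudo keytool -list -v -keystore /etc/security/serverKeys/keystore.jks -storepass PASSWORD
sudo keytool -list -v -keystore /etc/security/serverKeys/truststore.jks -storepass PASSWORD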

Setup File Permissions

sudo chmod 440 /etc/security/serverKeys/*
sudo chown root:hadoopuser /etc/security/serverKeys/*

Edit server.properties Config

cd /usr/local/kafka/config

sudo nano server.properties

#Edit or Add the following properties.
ssl.endpoint.identification.algorithm=HTTPS
ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1
ssl.key.password=PASSWORD
ssl.keystore.location=/etc/security/serverKeys/keystore.jks
ssl.keystore.password=PASSWORD
ssl.truststore.location=/etc/security/serverKeys/truststore.jks
ssl.truststore.password=PASSWORD
listeners=SASL_SSL://:9094
security.inter.broker.protocol=SASL_SSL
ssl.client.auth=required
authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
ssl.keystore.type=JKS
ssl.truststore.type=JKS
sasl.kerberos.service.name=kafka
zookeeper.connect=hadoop:2181
sasl.mechanism.inter.broker.protocol=GSSAPI
sasl.enabled.mechanisms=GSSAPI

Edit zookeeper.properties Config

sudo nano zookeeper.properties

#Edit or Add the following properties.

server.1=hadoop:2888:3888
clientPort=2181
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
requireClientAuthScheme=SASL
jaasLoginRenew=3600000

Edit producer.properties Config

sudo nano producer.properties

bootstrap.servers=hadoop:9094
security.protocol=SASL_SSL
sasl.kerberos.service.name=kafka
ssl.truststore.location=/etc/security/serverKeys/truststore.jks
ssl.truststore.password=PASSWORD
ssl.keystore.location=/etc/security/serverKeys/keystore.jks
ssl.keystore.password=PASSWORD
ssl.key.password=PASSWORD
sasl.mechanism=GSSAPI

Edit consumer.properties Config

sudo nano consumer.properties

zookeeper.connect=hadoop:2181
bootstrap.servers=hadoop:9094
group.id=securing-kafka-group
security.protocol=SASL_SSL
sasl.kerberos.service.name=kafka
ssl.truststore.location=/etc/security/serverKeys/truststore.jks
ssl.truststore.password=PASSWORD
sasl.mechanism=GSSAPI

Add zookeeper_jaas.conf Config

sudo nano zookeeper_jaas.conf

Server {
  com.sun.security.auth.module.Krb5LoginModule required
  debug=true
  useKeyTab=true
  keyTab="/etc/security/keytabs/zookeeper.service.keytab"
  storeKey=true
  useTicketCache=true
  refreshKrb5Config=true
  principal="zookeeper/hadoop@REALM.CA";
};

Add kafkaserver_jaas.conf Config

sudo nano kafkaserver_jaas.conf

KafkaServer {
    com.sun.security.auth.module.Krb5LoginModule required
    debug=true
    useKeyTab=true
    storeKey=true
    refreshKrb5Config=true
    keyTab="/etc/security/keytabs/kafka.service.keytab"
    principal="kafka/hadoop@REALM.CA";
};

KafkaClient {
    com.sun.security.auth.module.Krb5LoginModule required
    useTicketCache=true
    refreshKrb5Config=true
    debug=true
    useKeyTab=true
    storeKey=true
    keyTab="/etc/security/keytabs/kafka.service.keytab"
    principal="kafka/hadoop@REALM.CA";
};

Edit kafka-server-start.sh

cd /usr/local/kafka/bin/

sudo nano kafka-server-start.sh

jaas="$base_dir/../config/kafkaserver_jaas.conf"

export KAFKA_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=$jaas"

Edit zookeeper-server-start.sh

sudo nano zookeeper-server-start.sh

jaas="$base_dir/../config/zookeeper_jaas.conf"

export KAFKA_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=$jaas"
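
With the JAAS configs wired in, you can restart Zookeeper and the broker and create a topic to test with. This is a minimal sketch assuming the /usr/local/kafka layout used above; the -daemon flag simply runs the scripts in the background.

cd /usr/local/kafka

#Start Zookeeper and then the broker
sudo bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
sudo bin/kafka-server-start.sh -daemon config/server.properties

#Create a topic to test with
bin/kafka-topics.sh --create --zookeeper hadoop:2181 --replication-factor 1 --partitions 1 --topic TOPIC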

Kafka-ACL

cd /usr/local/kafka/bin/

#Grant topic access and cluster access
./kafka-acls.sh  --operation All --allow-principal User:kafka --authorizer-properties zookeeper.connect=hadoop:2181 --add --cluster
./kafka-acls.sh  --operation All --allow-principal User:kafka --authorizer-properties zookeeper.connect=hadoop:2181 --add --topic TOPIC

#Grant all groups for a specific topic
./kafka-acls.sh --operation All --allow-principal User:kafka --authorizer-properties zookeeper.connect=hadoop:2181 --add --topic TOPIC --group *

#If you want to remove cluster access
./kafka-acls.sh --authorizer-properties zookeeper.connect=hadoop:2181 --remove --cluster

#If you want to remove topic access
./kafka-acls.sh --authorizer-properties zookeeper.connect=hadoop:2181 --remove --topic TOPIC

#List access for cluster
./kafka-acls.sh --list --authorizer-properties zookeeper.connect=hadoop:2181 --cluster

#List access for topic
./kafka-acls.sh --list --authorizer-properties zookeeper.connect=hadoop:2181 --topic TOPIC

kafka-console-producer.sh

If you want to test using the console producer you need to make these changes.

cd /usr/local/kafka/bin/
nano kafka-console-producer.sh

#Add the below before the last line

base_dir=$(dirname $0)
jaas="$base_dir/../config/kafkaserver_jaas.conf"
export KAFKA_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=$jaas"


#Now you can run the console producer
./kafka-console-producer.sh --broker-list hadoop:9094 --topic TOPIC --producer.config ../config/producer.properties

kafka-console-consumer.sh

If you want to test using the console consumer you need to make these changes.

cd /usr/local/kafka/bin/
nano kafka-console-consumer.sh

#Add the below before the last line

base_dir=$(dirname $0)
jaas="$base_dir/../config/kafkaserver_jaas.conf"
export KAFKA_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=$jaas"


#Now you can run the console consumer
./kafka-console-consumer.sh --bootstrap-server hadoop:9094 --topic TOPIC --consumer.config ../config/consumer.properties --from-beginning

References

https://www.confluent.io/blog/apache-kafka-security-authorization-authentication-encryption/
https://github.com/confluentinc/securing-kafka-blog/blob/master/manifests/default.pp

Hive & Java: Connect to Remote Kerberos Hive using KeyTab

In this tutorial I will show you how to connect to a remote Kerberos Hive cluster using Java. If you haven’t installed Hive yet, follow the tutorial.

Import SSL Cert to Java:

Follow this tutorial on “Installing unlimited strength encryption Java libraries”.

If on Windows do the following

#Import it
"C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -import -file hadoop.csr -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts" -alias "hadoop"

#Check it
"C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -list -v -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts"

#If you want to delete it
"C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -delete -alias hadoop -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts"

POM.xml:

<dependency>
	<groupId>org.apache.hive</groupId>
	<artifactId>hive-jdbc</artifactId>
	<version>2.3.3</version>
	<exclusions>
		<exclusion>
			<groupId>jdk.tools</groupId>
			<artifactId>jdk.tools</artifactId>
		</exclusion>
	</exclusions>
</dependency>

Imports:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;

Connect:

// Setup the configuration object.
final Configuration config = new Configuration();

config.set("fs.defaultFS", "swebhdfs://hadoop:50470");
config.set("hadoop.security.authentication", "kerberos");
config.set("hadoop.rpc.protection", "integrity");

System.setProperty("https.protocols", "TLSv1,TLSv1.1,TLSv1.2");
System.setProperty("java.security.krb5.conf", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\krb5.conf");
System.setProperty("java.security.krb5.realm", "REALM.CA");
System.setProperty("java.security.krb5.kdc", "REALM.CA");
System.setProperty("sun.security.krb5.debug", "true");
System.setProperty("javax.net.debug", "all");
System.setProperty("javax.net.ssl.keyStorePassword","changeit");
System.setProperty("javax.net.ssl.keyStore","C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\cacerts");
System.setProperty("javax.net.ssl.trustStore", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\cacerts");
System.setProperty("javax.net.ssl.trustStorePassword","changeit");
System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");

UserGroupInformation.setConfiguration(config);
UserGroupInformation.setLoginUser(UserGroupInformation.loginUserFromKeytabAndReturnUGI("hive/hadoop@REALM.CA", "c:\\data\\hive.service.keytab"));

System.out.println(UserGroupInformation.getLoginUser());
System.out.println(UserGroupInformation.getCurrentUser());

//Add the hive driver
Class.forName("org.apache.hive.jdbc.HiveDriver");

//Connect to hive jdbc
Connection connection = DriverManager.getConnection("jdbc:hive2://hadoop:10000/default;principal=hive/hadoop@REALM.CA");
Statement statement = connection.createStatement();

//Create a table
String createTableSql = "CREATE TABLE IF NOT EXISTS "
		+" employee ( eid int, name String, "
		+" salary String, designation String)"
		+" COMMENT 'Employee details'"
		+" ROW FORMAT DELIMITED"
		+" FIELDS TERMINATED BY '\t'"
		+" LINES TERMINATED BY '\n'"
		+" STORED AS TEXTFILE";

System.out.println("Creating Table: " + createTableSql);
statement.executeUpdate(createTableSql);

//Show all the tables to ensure we successfully added the table
String showTablesSql = "show tables";
System.out.println("Show All Tables: " + showTablesSql);
ResultSet res = statement.executeQuery(showTablesSql);

while (res.next()) {
	System.out.println(res.getString(1));
}

//Drop the table
String dropTablesSql = "DROP TABLE IF EXISTS employee";

System.out.println("Dropping Table: " + dropTablesSql);
statement.executeUpdate(dropTablesSql);

System.out.println("Finish!");

NiFi: Kerberize/SSL

In this tutorial I will show you how to use Kerberos/SSL with NiFi. I will use self signed certs for this example. Before you begin ensure you have installed Kerberos Server and NiFi.

This assumes your hostname is “hadoop”

Create Kerberos Principals

cd /etc/security/keytabs/

sudo kadmin.local

#You can list principals
listprincs

#Create the following principals
addprinc -randkey nifi/hadoop@REALM.CA
addprinc -randkey nifi-spnego/hadoop@REALM.CA
#Notice this user does not have -randkey because we are a login user
#Also notice that this user does not have a keytab created
addprinc admin/hadoop@REALM.CA


#Create the keytab files.
#You will need these for NiFi to be able to log in
xst -k nifi.service.keytab nifi/hadoop@REALM.CA
xst -k nifi-spnego.service.keytab nifi-spnego/hadoop@REALM.CA

Set Keytab Permissions/Ownership

sudo chown root:hadoopuser /etc/security/keytabs/*
sudo chmod 750 /etc/security/keytabs/*

Stop NiFi

sudo service nifi stop

Hosts Update

sudo nano /etc/hosts

#Remove 127.0.1.1 line

#Change 127.0.0.1 to the following
127.0.0.1 realm.ca hadoop localhost

Ubuntu Firewall

sudo ufw disable

sysctl.conf

Disable ipv6 as it causes issues in getting your server up and running.

nano /etc/sysctl.conf

Add the following to the end and save

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
#Change eth0 to what ifconfig has
net.ipv6.conf.eth0.disable_ipv6 = 1

Apply the sysctl Changes

sysctl -p
cat /proc/sys/net/ipv6/conf/all/disable_ipv6
reboot

TrustStore / KeyStore

#Creating your Certificate Authority
sudo mkdir -p /etc/security/serverKeys
sudo chown -R root:hduser /etc/security/serverKeys/
sudo chmod 750 /etc/security/serverKeys/
 
cd /etc/security/serverKeys

sudo openssl genrsa -aes128 -out nifi.key 4096
sudo openssl req -x509 -new -key nifi.key -days 1095 -out nifi.pem
sudo openssl rsa -check -in nifi.key #check it
sudo openssl x509 -outform der -in nifi.pem -out nifi.der
sudo keytool -import -keystore truststore.jks -file nifi.der -alias nifi
#***You must type 'yes' to trust this certificate.
sudo keytool -v -list -keystore truststore.jks

#Creating your Server Keystore
sudo keytool -genkey -alias nifi -keyalg RSA -keystore keystore.jks -keysize 2048
sudo keytool -certreq -alias nifi -keystore keystore.jks -file nifi.csr
sudo openssl x509 -sha256 -req -in nifi.csr -CA nifi.pem -CAkey nifi.key -CAcreateserial -out nifi.crt -days 730
sudo keytool -import -keystore keystore.jks -file nifi.pem
sudo keytool -import -trustcacerts -alias nifi -file nifi.crt -keystore keystore.jks

sudo chown -R root:hduser /etc/security/serverKeys/*
sudo chmod 750 /etc/security/serverKeys/*

nifi.properties

cd /usr/local/nifi/conf/
nano nifi.properties

#Find "# Site to Site properties" and change the following properties to what is below

nifi.remote.input.host=
nifi.remote.input.secure=true
nifi.remote.input.socket.port=9096
nifi.remote.input.http.enabled=false

#Find "# web properties #" and change the following properties to what is below

nifi.web.http.host=
nifi.web.http.port=
nifi.web.https.host=0.0.0.0
nifi.web.https.port=9095

#Find "# security properties #" and change the following properties to what is below

nifi.security.keystore=/etc/security/serverKeys/keystore.jks
nifi.security.keystoreType=JKS
nifi.security.keystorePasswd=PASSWORD
nifi.security.keyPasswd=PASSWORD
nifi.security.truststore=/etc/security/serverKeys/truststore.jks
nifi.security.truststoreType=JKS
nifi.security.truststorePasswd=PASSWORD
nifi.security.needClientAuth=true
nifi.security.user.authorizer=managed-authorizer
nifi.security.user.login.identity.provider=kerberos-provider

#Find "# Core Properties #" and change the following properties to what is below

nifi.authorizer.configuration.file=./conf/authorizers.xml
nifi.login.identity.provider.configuration.file=./conf/login-identity-providers.xml

#Find "# kerberos #" and change the following properties to what is below

nifi.kerberos.krb5.file=/etc/krb5.conf

#Find "# kerberos service principal #" and change the following properties to what is below

nifi.kerberos.service.principal=nifi/hadoop@REALM.CA
nifi.kerberos.service.keytab.location=/etc/security/keytabs/nifi.service.keytab

#Find "# kerberos spnego principal #" and change the following properties to what is below

nifi.kerberos.spnego.principal=nifi-spnego/hadoop@REALM.CA
nifi.kerberos.spnego.keytab.location=/etc/security/keytabs/nifi-spnego.service.keytab
nifi.kerberos.spnego.authentication.expiration=12 hours

#Find "# cluster common properties (all nodes must have same values) #" and change the following properties to what is below

nifi.cluster.protocol.is.secure=true

login-identity-providers.xml

nano login-identity-providers.xml

#Find "kerberos-provider"
<provider>
	<identifier>kerberos-provider</identifier>
	<class>org.apache.nifi.kerberos.KerberosProvider</class>
	<property name="Default Realm">REALM.CA</property>
	<property name="Kerberos Config File">/etc/krb5.conf</property>
	<property name="Authentication Expiration">12 hours</property>
</provider>

authorizers.xml

nano authorizers.xml

#Find "file-provider"
<authorizer>
	<identifier>file-provider</identifier>
	<class>org.apache.nifi.authorization.FileAuthorizer</class>
	<property name="Authorizations File">./conf/authorizations.xml</property>
	<property name="Users File">./conf/users.xml</property>
	<property name="Initial Admin Identity">admin/hadoop@REALM.CA</property>
	<property name="Legacy Authorized Users File"></property>

	<property name="Node Identity 1"></property>
</authorizer>

Start Nifi

sudo service nifi start
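
Startup can take a minute or two with Kerberos and TLS enabled. Tailing the application log is the easiest way to watch it come up; the path assumes the /usr/local/nifi install used in this tutorial.

sudo tail -f /usr/local/nifi/logs/nifi-app.log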

NiFi Web Login

Issues:

  • If you get the error “No applicable policies could be found” after logging in and no GUI is shown, stop the NiFi service and restart it. Then you should be good.
  • If you can log in but still don’t have any policies, you will need to update “authorizations.xml” and add the lines below, making sure to change the resource process group id to the root process group id and the user identifier to your admin user’s id.
nano /usr/local/nifi/conf/authorizations.xml

<policy identifier="1c897e9d-3dd5-34ca-ae3d-75fb5ee3e1a5" resource="/data/process-groups/##CHANGE TO ROOT ID##" action="R">
	<user identifier="##CHANGE TO USER ID##"/>
</policy>
<policy identifier="91c64c2d-7848-371d-9d5f-db71138b152f" resource="/data/process-groups/##CHANGE TO ROOT ID##" action="W">
	<user identifier="##CHANGE TO USER ID##"/>
</policy>
<policy identifier="7aeb4d67-e2e1-3a3e-a8fa-94576f35539e" resource="/process-groups/##CHANGE TO ROOT ID##" action="R">
	<user identifier="##CHANGE TO USER ID##"/>
</policy>
<policy identifier="f5b620e0-b094-3f70-9542-dd6920ad5bd9" resource="/process-groups/##CHANGE TO ROOT ID##" action="W">
	<user identifier="##CHANGE TO USER ID##"/>
</policy>

References

https://community.hortonworks.com/articles/34147/nifi-security-user-authentication-with-kerberos.html

https://community.hortonworks.com/content/supportkb/151106/nifi-how-to-create-your-own-certs-for-securing-nif.html

Hadoop & Java: Connect to Remote Kerberos HDFS using KeyTab

In this tutorial I will show you how to connect to a remote Kerberos HDFS cluster using Java. If you haven’t installed HDFS with Kerberos yet, follow the tutorial.

Import SSL Cert to Java:

Follow this tutorial on “Installing unlimited strength encryption Java libraries”.

If on Windows do the following

#Import it
"C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -import -file hadoop.csr -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts" -alias "hadoop"

#Check it
"C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -list -v -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts"

#If you want to delete it
"C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -delete -alias hadoop -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts"

POM.xml:

<dependency>
	<groupId>org.apache.hadoop</groupId>
	<artifactId>hadoop-client</artifactId>
	<version>2.9.1</version>
</dependency>

Imports:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

Connect:

// Setup the configuration object.
final Configuration config = new Configuration();

config.set("fs.defaultFS", "swebhdfs://hadoop:50470");
config.set("hadoop.security.authentication", "kerberos");
config.set("hadoop.rpc.protection", "integrity");

System.setProperty("https.protocols", "TLSv1,TLSv1.1,TLSv1.2");
System.setProperty("java.security.krb5.conf", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\krb5.conf");
System.setProperty("java.security.krb5.realm", "REALM.CA");
System.setProperty("java.security.krb5.kdc", "REALM.CA");
System.setProperty("sun.security.krb5.debug", "true");
System.setProperty("javax.net.debug", "all");
System.setProperty("javax.net.ssl.keyStorePassword","YOURPASSWORD");
System.setProperty("javax.net.ssl.keyStore","C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\cacerts");
System.setProperty("javax.net.ssl.trustStore", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\cacerts");
System.setProperty("javax.net.ssl.trustStorePassword","YOURPASSWORD");
System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");

UserGroupInformation.setConfiguration(config);
UserGroupInformation.setLoginUser(UserGroupInformation.loginUserFromKeytabAndReturnUGI("myuser/hadoop@REALM.CA", "c:\\data\\myuser.keytab"));

System.out.println(UserGroupInformation.getLoginUser());
System.out.println(UserGroupInformation.getCurrentUser());

HDFS/Yarn/MapRed: Kerberize/SSL

In this tutorial I will show you how to use Kerberos/SSL with HDFS/Yarn/MapRed. I will use self signed certs for this example. Before you begin ensure you have installed Kerberos Server and Hadoop.

This assumes your hostname is “hadoop”

Create Kerberos Principals

cd /etc/security/keytabs/

sudo kadmin.local

#You can list principals
listprincs

#Create the following principals
addprinc -randkey nn/hadoop@REALM.CA
addprinc -randkey jn/hadoop@REALM.CA
addprinc -randkey dn/hadoop@REALM.CA
addprinc -randkey sn/hadoop@REALM.CA
addprinc -randkey nm/hadoop@REALM.CA
addprinc -randkey rm/hadoop@REALM.CA
addprinc -randkey jhs/hadoop@REALM.CA
addprinc -randkey HTTP/hadoop@REALM.CA

#We are going to create a user to access with later
addprinc -pw hadoop myuser/hadoop@REALM.CA
xst -k myuser.keytab myuser/hadoop@REALM.CA

#Create the keytab files.
#You will need these for Hadoop to be able to login
xst -k nn.service.keytab nn/hadoop@REALM.CA
xst -k jn.service.keytab jn/hadoop@REALM.CA
xst -k dn.service.keytab dn/hadoop@REALM.CA
xst -k sn.service.keytab sn/hadoop@REALM.CA
xst -k nm.service.keytab nm/hadoop@REALM.CA
xst -k rm.service.keytab rm/hadoop@REALM.CA
xst -k jhs.service.keytab jhs/hadoop@REALM.CA
xst -k spnego.service.keytab HTTP/hadoop@REALM.CA
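
You can verify that each keytab contains the expected principal before handing it to Hadoop; klist can read a keytab file directly.

#List the entries in a keytab
sudo klist -kt /etc/security/keytabs/nn.service.keytab
sudo klist -kt /etc/security/keytabs/spnego.service.keytab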

Set Keytab Permissions/Ownership

sudo chown root:hadoopuser /etc/security/keytabs/*
sudo chmod 750 /etc/security/keytabs/*

Stop the Cluster

stop-dfs.sh
stop-yarn.sh
mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR stop historyserver

Hosts Update

sudo nano /etc/hosts

#Remove 127.0.1.1 line

#Change 127.0.0.1 to the following
#Notice that realm.ca is there because we need to tell the system where that host resides
127.0.0.1 realm.ca hadoop localhost

hadoop-env.sh

We don’t set the HADOOP_SECURE_DN_USER because we are going to use Kerberos

sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh

#Locate "export ${HADOOP_SECURE_DN_USER}=${HADOOP_SECURE_DN_USER}"
#and change to

export HADOOP_SECURE_DN_USER=

core-site.xml

nano /usr/local/hadoop/etc/hadoop/core-site.xml

<configuration>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://NAMENODE:54310</value>
		<description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming
		the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
	</property>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/app/hadoop/tmp</value>
	</property>
	<property>
		<name>hadoop.proxyuser.hadoopuser.hosts</name>
		<value>*</value>
	</property>
	<property>
		<name>hadoop.proxyuser.hadoopuser.groups</name>
		<value>*</value>
	</property>
	<property>
		<name>hadoop.security.authentication</name>
		<value>kerberos</value> <!-- A value of "simple" would disable security. -->
	</property>
	<property>
		<name>hadoop.security.authorization</name>
		<value>true</value>
	</property>
	<property>
		<name>hadoop.security.auth_to_local</name>
		<value>
		RULE:[2:$1@$0](nn/.*@.*REALM.CA)s/.*/hdfs/
		RULE:[2:$1@$0](jn/.*@.*REALM.CA)s/.*/hdfs/
		RULE:[2:$1@$0](dn/.*@.*REALM.CA)s/.*/hdfs/
		RULE:[2:$1@$0](sn/.*@.*REALM.CA)s/.*/hdfs/
		RULE:[2:$1@$0](nm/.*@.*REALM.CA)s/.*/yarn/
		RULE:[2:$1@$0](rm/.*@.*REALM.CA)s/.*/yarn/
		RULE:[2:$1@$0](jhs/.*@.*REALM.CA)s/.*/mapred/
		DEFAULT
		</value>
	</property>
	<property>
		<name>hadoop.rpc.protection</name>
		<value>integrity</value>
	</property>
	<property>
		<name>hadoop.ssl.require.client.cert</name>
		<value>false</value>
	</property>
	<property>
		<name>hadoop.ssl.hostname.verifier</name>
		<value>DEFAULT</value>
	</property>
	<property>
		<name>hadoop.ssl.keystores.factory.class</name>
		<value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value>
	</property>
	<property>
		<name>hadoop.ssl.server.conf</name>
		<value>ssl-server.xml</value>
	</property>
	<property>
		<name>hadoop.ssl.client.conf</name>
		<value>ssl-client.xml</value>
	</property>
</configuration>
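
You can check the auth_to_local rules by asking Hadoop how it maps a principal to a local user. Hadoop ships a small helper class for this; the commands below are a sketch and assume the rules above, so nn should map to hdfs, rm to yarn and jhs to mapred.

#Print the local user each principal maps to
hadoop org.apache.hadoop.security.HadoopKerberosName nn/hadoop@REALM.CA
hadoop org.apache.hadoop.security.HadoopKerberosName rm/hadoop@REALM.CA
hadoop org.apache.hadoop.security.HadoopKerberosName jhs/hadoop@REALM.CA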

ssl-server.xml

Copy ssl-server.xml.example to ssl-server.xml

cp /usr/local/hadoop/etc/hadoop/ssl-server.xml.example /usr/local/hadoop/etc/hadoop/ssl-server.xml

nano /usr/local/hadoop/etc/hadoop/ssl-server.xml

Update properties

<configuration>
	<property>
		<name>ssl.server.truststore.location</name>
		<value>/etc/security/serverKeys/truststore.jks</value>
		<description>Truststore to be used by NN and DN. Must be specified.</description>
	</property>
	<property>
		<name>ssl.server.truststore.password</name>
		<value>PASSWORD</value>
		<description>Optional. Default value is "".</description>
	</property>
	<property>
		<name>ssl.server.truststore.type</name>
		<value>jks</value>
		<description>Optional. The keystore file format, default value is "jks".</description>
	</property>
	<property>
		<name>ssl.server.truststore.reload.interval</name>
		<value>10000</value>
		<description>Truststore reload check interval, in milliseconds. Default value is 10000 (10 seconds).</description>
	</property>
	<property>
		<name>ssl.server.keystore.location</name>
		<value>/etc/security/serverKeys/keystore.jks</value>
		<description>Keystore to be used by NN and DN. Must be specified.</description>
	</property>
	<property>
		<name>ssl.server.keystore.password</name>
		<value>PASSWORD</value>
		<description>Must be specified.</description>
	</property>
	<property>
		<name>ssl.server.keystore.keypassword</name>
		<value>PASSWORD</value>
		<description>Must be specified.</description>
	</property>
	<property>
		<name>ssl.server.keystore.type</name>
		<value>jks</value>
		<description>Optional. The keystore file format, default value is "jks".</description>
	</property>
	<property>
		<name>ssl.server.exclude.cipher.list</name>
		<value>TLS_ECDHE_RSA_WITH_RC4_128_SHA,SSL_DHE_RSA_EXPORT_WITH_DES40_CBC_SHA,
		SSL_RSA_WITH_DES_CBC_SHA,SSL_DHE_RSA_WITH_DES_CBC_SHA,
		SSL_RSA_EXPORT_WITH_RC4_40_MD5,SSL_RSA_EXPORT_WITH_DES40_CBC_SHA,
		SSL_RSA_WITH_RC4_128_MD5</value>
		<description>Optional. The weak security cipher suites that you want excluded from SSL communication.</description>
	</property>
</configuration>

ssl-client.xml

Copy ssl-client.xml.example to ssl-client.xml

cp /usr/local/hadoop/etc/hadoop/ssl-client.xml.example /usr/local/hadoop/etc/hadoop/ssl-client.xml

nano /usr/local/hadoop/etc/hadoop/ssl-client.xml

Update properties

<configuration>
	<property>
		<name>ssl.client.truststore.location</name>
		<value>/etc/security/serverKeys/truststore.jks</value>
		<description>Truststore to be used by clients like distcp. Must be specified.</description>
	</property>
	<property>
		<name>ssl.client.truststore.password</name>
		<value>PASSWORD</value>
		<description>Optional. Default value is "".</description>
	</property>
	<property>
		<name>ssl.client.truststore.type</name>
		<value>jks</value>
		<description>Optional. The keystore file format, default value is "jks".</description>
	</property>
	<property>
		<name>ssl.client.truststore.reload.interval</name>
		<value>10000</value>
		<description>Truststore reload check interval, in milliseconds. Default value is 10000 (10 seconds).</description>
	</property>
	<property>
		<name>ssl.client.keystore.location</name>
		<value></value>
		<description>Keystore to be used by clients like distcp. Must be specified.</description>
	</property>
	<property>
		<name>ssl.client.keystore.password</name>
		<value></value>
		<description>Optional. Default value is "".</description>
	</property>
	<property>
		<name>ssl.client.keystore.keypassword</name>
		<value></value>
		<description>Optional. Default value is "".</description>
	</property>
	<property>
		<name>ssl.client.keystore.type</name>
		<value>jks</value>
		<description>Optional. The keystore file format, default value is "jks".</description>
	</property>
</configuration>

mapred-site.xml

Just add the following to the config so the job history server knows which Kerberos keytab and principal to use and serves its web UI over HTTPS only.

nano /usr/local/hadoop/etc/hadoop/mapred-site.xml

<property>
	<name>mapreduce.jobhistory.keytab</name>
	<value>/etc/security/keytabs/jhs.service.keytab</value>
</property>
<property>
	<name>mapreduce.jobhistory.principal</name>
	<value>jhs/_HOST@REALM.CA</value>
</property>
<property>
	<name>mapreduce.jobhistory.http.policy</name>
	<value>HTTPS_ONLY</value>
</property>

hdfs-site.xml

Add the following properties

nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml

<property>
	<name>dfs.http.policy</name>
	<value>HTTPS_ONLY</value>
</property>
<property>
	<name>hadoop.ssl.enabled</name>
	<value>true</value>
</property>
<property>
	<name>dfs.datanode.https.address</name>
	<value>NAMENODE:50475</value>
</property>
<property>
	<name>dfs.namenode.https-address</name>
	<value>NAMENODE:50470</value>
	<description>Your NameNode hostname for HTTPS access.</description>
</property>
<property>
	<name>dfs.namenode.secondary.https-address</name>
	<value>NAMENODE:50091</value>
	<description>Your Secondary NameNode hostname for HTTPS access.</description>
</property>
<property>
	<name>dfs.namenode.https-bind-host</name>
	<value>0.0.0.0</value>
</property>
<property>
	<name>dfs.block.access.token.enable</name>
	<value>true</value>
	<description>If "true", access tokens are used as capabilities for accessing datanodes. If "false", no access tokens are checked on accessing datanodes.</description>
</property>
<property>
	<name>dfs.namenode.kerberos.principal</name>
	<value>nn/_HOST@REALM.CA</value>
	<description> Kerberos principal name for the NameNode</description>
</property>
<property>
	<name>dfs.secondary.namenode.kerberos.principal</name>
	<value>sn/_HOST@REALM.CA</value>
	<description>Kerberos principal name for the secondary NameNode.</description>
</property>
<property>
	<name>dfs.web.authentication.kerberos.keytab</name>
	<value>/etc/security/keytabs/spnego.service.keytab</value>
	<description>The Kerberos keytab file with the credentials for the HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint.</description>
</property>
<property>
	<name>dfs.namenode.keytab.file</name>
	<value>/etc/security/keytabs/nn.service.keytab</value>
	<description>Combined keytab file containing the namenode service and host principals.</description>
</property>
<property>
	<name>dfs.datanode.keytab.file</name>
	<value>/etc/security/keytabs/dn.service.keytab</value>
	<description>The filename of the keytab file for the DataNode.</description>
</property>
<property>
	<name>dfs.datanode.kerberos.principal</name>
	<value>dn/_HOST@REALM.CA</value>
	<description>The Kerberos principal that the DataNode runs as. "_HOST" is replaced by the real host name.</description>
</property>
<property>
	<name>dfs.namenode.kerberos.internal.spnego.principal</name>
	<value>${dfs.web.authentication.kerberos.principal}</value>
</property>
<property>
	<name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>
	<value>${dfs.web.authentication.kerberos.principal}</value>
</property>
<property>
	<name>dfs.web.authentication.kerberos.principal</name>
	<value>HTTP/_HOST@REALM.CA</value>
	<description>The HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint.</description>          
</property>
<property>
	<name>dfs.data.transfer.protection</name>
	<value>integrity</value>
</property>
<property>
	<name>dfs.datanode.address</name>
	<value>NAMENODE:50010</value>
</property>
<property>
	<name>dfs.secondary.namenode.keytab.file</name>
	<value>/etc/security/keytabs/sn.service.keytab</value>
</property>
<property>
	<name>dfs.webhdfs.enabled</name>
	<value>true</value>
</property>

Remove the following properties

dfs.namenode.http-address
dfs.namenode.secondary.http-address
dfs.namenode.http-bind-host
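
A quick grep confirms none of them are still defined; it should return nothing once they have been removed.

grep -E "dfs.namenode.http-address|dfs.namenode.secondary.http-address|dfs.namenode.http-bind-host" /usr/local/hadoop/etc/hadoop/hdfs-site.xml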

yarn-site.xml

Add the following properties

nano /usr/local/hadoop/etc/hadoop/yarn-site.xml

<property>
	<name>yarn.http.policy</name>
	<value>HTTPS_ONLY</value>
</property>
<property>
	<name>yarn.resourcemanager.webapp.https.address</name>
	<value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<property>
	<name>yarn.resourcemanager.hostname</name>
	<value>NAMENODE</value>
</property>
<property>
	<name>yarn.nodemanager.bind-host</name>
	<value>0.0.0.0</value>
</property>
<property>
	<name>yarn.nodemanager.webapp.address</name>
	<value>${yarn.nodemanager.hostname}:8042</value>
</property>
<property>
	<name>yarn.resourcemanager.principal</name>
	<value>rm/_HOST@REALM.CA</value>
</property>
<property>
	<name>yarn.resourcemanager.keytab</name>
	<value>/etc/security/keytabs/rm.service.keytab</value>
</property>
<property>
	<name>yarn.nodemanager.principal</name>
	<value>nm/_HOST@REALM.CA</value>
</property>
<property>
	<name>yarn.nodemanager.keytab</name>
	<value>/etc/security/keytabs/nm.service.keytab</value>
</property>
<property>
	<name>yarn.nodemanager.hostname</name>
	<value>NAMENODE</value>
</property>
<property>
	<name>yarn.resourcemanager.bind-host</name>
	<value>0.0.0.0</value>
</property>
<property>
	<name>yarn.timeline-service.bind-host</name>
	<value>0.0.0.0</value>
</property>

Remove the following properties

yarn.resourcemanager.webapp.address

SSL

Setup SSL Directories

sudo mkdir -p /etc/security/serverKeys
sudo chown -R root:hadoopuser /etc/security/serverKeys/
sudo chmod 755 /etc/security/serverKeys/

cd /etc/security/serverKeys

Setup Keystore

sudo keytool -genkey -alias NAMENODE -keyalg RSA -keysize 1024 -dname "CN=NAMENODE,OU=ORGANIZATION_UNIT,C=canada" -keypass PASSWORD -keystore /etc/security/serverKeys/keystore.jks -storepass PASSWORD
sudo keytool -export -alias NAMENODE -keystore /etc/security/serverKeys/keystore.jks -rfc -file /etc/security/serverKeys/NAMENODE.csr -storepass PASSWORD

Setup Truststore

sudo keytool -import -noprompt -alias NAMENODE -file /etc/security/serverKeys/NAMENODE.csr -keystore /etc/security/serverKeys/truststore.jks -storepass PASSWORD
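
You can list both stores to confirm the NAMENODE entry is present before wiring them into Hadoop. The passwords are the ones used above.

#List the entries in the keystore and the truststore
sudo keytool -list -keystore /etc/security/serverKeys/keystore.jks -storepass PASSWORD
sudo keytool -list -keystore /etc/security/serverKeys/truststore.jks -storepass PASSWORD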

Generate Self-Signed Certificate

sudo openssl genrsa -out /etc/security/serverKeys/NAMENODE.key 2048

sudo openssl req -x509 -new -key /etc/security/serverKeys/NAMENODE.key -days 300 -out /etc/security/serverKeys/NAMENODE.pem

sudo keytool -keystore /etc/security/serverKeys/keystore.jks -alias NAMENODE -certreq -file /etc/security/serverKeys/NAMENODE.cert -storepass PASSWORD -keypass PASSWORD

sudo openssl x509 -req -CA /etc/security/serverKeys/NAMENODE.pem -CAkey /etc/security/serverKeys/NAMENODE.key -in /etc/security/serverKeys/NAMENODE.cert -out /etc/security/serverKeys/NAMENODE.signed -days 300 -CAcreateserial
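
As a sanity check you can inspect the signed certificate and verify it against the self-signed CA certificate that was just created.

#Show the subject, issuer and validity dates of the signed certificate
sudo openssl x509 -in /etc/security/serverKeys/NAMENODE.signed -noout -subject -issuer -dates

#Verify the signed certificate chains to the CA
sudo openssl verify -CAfile /etc/security/serverKeys/NAMENODE.pem /etc/security/serverKeys/NAMENODE.signed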

Setup File Permissions

sudo chmod 440 /etc/security/serverKeys/*
sudo chown root:hadoopuser /etc/security/serverKeys/*

Start the Cluster

start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver
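
Once the daemons are up you can confirm they are all running. jps ships with the JDK and lists the running Java processes.

#You should see NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager and JobHistoryServer
jps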

Create User Directory

kinit -kt /etc/security/keytabs/myuser.keytab myuser/hadoop@REALM.CA
#ensure the login worked
klist

#Create hdfs directory now
hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/myuser

#remove kerberos ticket
kdestroy
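
With the ticket destroyed, HDFS commands should now be rejected, which is a quick way to confirm Kerberos is actually being enforced. Re-authenticating with the keytab makes the same command work again.

#This should fail with a GSS/Kerberos authentication error because there is no ticket
hdfs dfs -ls /user/myuser

#Re-authenticate and the same command succeeds
kinit -kt /etc/security/keytabs/myuser.keytab myuser/hadoop@REALM.CA
hdfs dfs -ls /user/myuser
kdestroy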

URL

https://NAMENODE:50470
https://NAMENODE:50475
https://NAMENODE:8090
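
Because these endpoints use the self-signed certificate, a browser will warn about trust until you import the certificate. From the command line you can at least confirm TLS is being served; the page itself may still require authentication depending on your web authentication settings, but the handshake confirms SSL is working.

#Skip certificate verification (-k) since the certificate is self signed
curl -k -v https://NAMENODE:50470/ > /dev/null

#Or inspect the certificate being presented
openssl s_client -connect NAMENODE:50470 < /dev/null | openssl x509 -noout -subject -dates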

References

https://www.ibm.com/support/knowledgecenter/en/SSPT3X_4.2.0/com.ibm.swg.im.infosphere.biginsights.admin.doc/doc/admin_ssl_hbase_mr_yarn_hdfs_web.html

Kerberos Server Installation

In this tutorial I will show you how to install Kerberos server on Ubuntu 16.04.

sudo apt install krb5-kdc krb5-admin-server krb5-config -y

Enter your realm. I will use REALM.CA

Enter your servers. I will use localhost

Enter your administrative server. I will use localhost

Now you can click OK and the installation will continue.

Next we can create our new realm

sudo krb5_newrealm

Enter your password then confirm it.

Now we can edit our kadm5.acl to grant admin privileges. Uncomment the “*/admin *” line.

sudo nano /etc/krb5kdc/kadm5.acl

Now we make our keytabs directory and grant the necessary permissions.

sudo mkdir -p /etc/security/keytabs/
sudo chown root:hduser /etc/security/keytabs
sudo chmod 750 /etc/security/keytabs

Now we edit our krb5.conf file

sudo nano /etc/krb5.conf

Ensure it looks like the below

[libdefaults]
        default_realm = REALM.CA


[realms]
        REALM.CA = {
                kdc = localhost
                admin_server = localhost
        }


[domain_realm]
        .realm.ca = REALM.CA
        realm.ca = REALM.CA

Now we can restart the kerberos services

sudo service krb5-kdc restart; service krb5-admin-server restart
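
A quick way to confirm the KDC and admin server came back up is to list the principals that krb5_newrealm created.

#You should at least see the built in principals such as krbtgt/REALM.CA@REALM.CA
sudo kadmin.local -q "listprincs"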

If you get the error “GSS-API (or Kerberos) error while initializing kadmin interface” when you attempt to use kadmin after creating a principal, it is likely a lack of entropy. The following installs rng-tools and haveged to generate more.

sudo RUNLEVEL=1 apt-get install rng-tools
cat /dev/random | rngtest -c 1000
sudo apt-get install haveged
cat /proc/sys/kernel/random/entropy_avail
cat /dev/random | rngtest -c 1000
haveged -n 2g -f - | dd of=/dev/null

Uninstallation

sudo apt remove --purge krb5-kdc krb5-admin-server krb5-config -y
sudo rm -rf /var/lib/krb5kdc

References
I used the following references as a guide.

http://blog.ruanbekker.com/blog/2017/10/18/setup-kerberos-server-and-client-on-ubuntu/ 
http://csetutorials.com/setup-kerberos-ubuntu.html  

HortonWorks: Kerberize Ambari Server

You may want to integrate Kerberos authentication into your Ambari Server implementation. If you do, follow the next few steps. It’s that easy.

Step 1: Stop Ambari Server

sudo ambari-server stop

Step 2: Create keytab file

ktutil
 
addent -password -p ##USER##@##DOMAIN##.COM -k 1 -e RC4-HMAC
 
# Enter password
 
wkt ##USER##.keytab
q
 
sudo mkdir /etc/security/keytabs
sudo mv ##USER##.keytab /etc/security/keytabs

Step 3: Test the keytab. You should see the ticket once you run klist.

kinit -kt /etc/security/keytabs/##USER##.keytab -a ##USER##@##DOMAIN##.COM
klist

Step 4: Run Ambari Server Kerberos Setup

sudo ambari-server setup-kerberos

Follow the prompts. Answer true to enabling Kerberos. The keytab file will be /etc/security/keytabs/##USER##.keytab. You should be able to leave the rest as defaults. Save the settings and you are done.

Step 5: Remove the Kerberos ticket you created so that you can make sure Kerberos authentication is working correctly.

kdestroy

Step 6: Start Ambari Server

sudo ambari-server start

Step 7: Validate Kerberos. You should see your ticket get created and you should now be able to log in with no issues.

klist