In this tutorial I will show you how to use Kerberos and SSL with Spark running on YARN. I will use self-signed certificates for this example. Before you begin, ensure you have installed a Kerberos server and Hadoop.
This tutorial assumes your hostname is “hadoop”.
Create Kerberos Principals
cd /etc/security/keytabs/
sudo kadmin.local

#You can list principals
listprincs

#Create the following principals
addprinc -randkey spark/hadoop@REALM.CA

#Create the keytab files.
#You will need these for Hadoop to be able to log in
xst -k spark.service.keytab spark/hadoop@REALM.CA

#Leave the kadmin prompt before running the next commands
quit
Set Keytab Permissions/Ownership
sudo chown root:hadoopuser /etc/security/keytabs/*
sudo chmod 750 /etc/security/keytabs/*
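To confirm the ownership and mode changes took effect, a small helper like the one below can list each keytab with its mode and owner. This is only a sketch: the function name keytab_report is made up for illustration, and it assumes GNU stat (the -c option), as found on most Linux distributions.

```shell
# Print "mode owner:group path" for every regular file in a keytab directory.
# keytab_report is a hypothetical helper name; the directory defaults to the
# one used in this tutorial. Assumes GNU stat (-c format option).
keytab_report() {
    dir="${1:-/etc/security/keytabs}"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        stat -c '%a %U:%G %n' "$f"
    done
}
```

Running keytab_report with no arguments after the chown and chmod commands above should show each file with mode 750 and owner root:hadoopuser.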
Download
Go to the Apache Spark download page and copy the link for the Spark release you want.
wget http://apache.forsale.plus/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
tar -xvf spark-2.4.4-bin-hadoop2.7.tgz
mv spark-2.4.4-bin-hadoop2.7 /usr/local/spark/
Update .bashrc
sudo nano ~/.bashrc

#Ensure we have the following in the Hadoop section
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop

#Add the following
#SPARK VARIABLES START
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native:$LD_LIBRARY_PATH
#SPARK VARIABLES STOP

#After saving the file, reload it
source ~/.bashrc
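Before moving on, it can help to check that the variables from .bashrc are actually visible in the current shell. The helper below is a minimal sketch; check_spark_env is a hypothetical name, and it only checks the three things this tutorial relies on.

```shell
# Report problems with the Spark-related environment set up in ~/.bashrc.
# check_spark_env is a hypothetical helper; it returns 0 when all checks pass.
check_spark_env() {
    status=0
    if [ -z "${SPARK_HOME:-}" ] || [ ! -x "$SPARK_HOME/bin/spark-submit" ]; then
        echo "SPARK_HOME is unset or has no executable bin/spark-submit" >&2
        status=1
    fi
    if [ -z "${HADOOP_CONF_DIR:-}" ] || [ ! -d "$HADOOP_CONF_DIR" ]; then
        echo "HADOOP_CONF_DIR is unset or not a directory" >&2
        status=1
    fi
    if ! command -v spark-submit >/dev/null 2>&1; then
        echo "spark-submit is not on PATH" >&2
        status=1
    fi
    return "$status"
}
```

If check_spark_env prints nothing and returns 0, the shell is ready for the configuration step.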
Setup Configuration
cd /usr/local/spark/conf
mv spark-defaults.conf.template spark-defaults.conf
nano spark-defaults.conf

#Add to the end
spark.master yarn
spark.yarn.historyServer.address ${hadoopconf-yarn.resourcemanager.hostname}:18080
spark.yarn.keytab /etc/security/keytabs/spark.service.keytab
spark.yarn.principal spark/hadoop@REALM.CA
spark.yarn.access.hadoopFileSystems hdfs://NAMENODE:54310
spark.authenticate true
spark.driver.bindAddress 0.0.0.0
spark.authenticate.enableSaslEncryption true
spark.eventLog.enabled true
spark.eventLog.dir hdfs://NAMENODE:54310/user/spark/applicationHistory
spark.history.fs.logDirectory hdfs://NAMENODE:54310/user/spark/applicationHistory
spark.history.fs.update.interval 10s
spark.history.ui.port 18080

#SSL
spark.ssl.enabled true
spark.ssl.keyPassword PASSWORD
spark.ssl.keyStore /etc/security/serverKeys/keystore.jks
spark.ssl.keyStorePassword PASSWORD
spark.ssl.keyStoreType JKS
spark.ssl.trustStore /etc/security/serverKeys/truststore.jks
spark.ssl.trustStorePassword PASSWORD
spark.ssl.trustStoreType JKS
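The SSL settings above point at keystore.jks and truststore.jks. If you do not already have these from your Hadoop setup, one way to create a self-signed pair is with the JDK keytool command, roughly as sketched below. The function name make_selfsigned_stores, the alias, the 365-day validity and the CN value are my own assumptions for illustration; adjust them to your environment. Note that keytool requires passwords of at least 6 characters.

```shell
# Create a self-signed key pair in keystore.jks, export its certificate, and
# import that certificate into truststore.jks.
# make_selfsigned_stores is a hypothetical helper; arguments are assumptions:
#   $1 directory for the stores (e.g. /etc/security/serverKeys)
#   $2 hostname used as alias and certificate CN (e.g. hadoop)
#   $3 password (the PASSWORD value used in spark-defaults.conf)
make_selfsigned_stores() {
    dir="$1"
    host="$2"
    pass="$3"

    keytool -genkeypair -alias "$host" -keyalg RSA -keysize 2048 \
        -validity 365 -dname "CN=$host" \
        -keystore "$dir/keystore.jks" -storetype JKS \
        -storepass "$pass" -keypass "$pass" &&
    keytool -exportcert -alias "$host" -rfc \
        -keystore "$dir/keystore.jks" -storepass "$pass" \
        -file "$dir/$host.crt" &&
    keytool -importcert -alias "$host" -noprompt \
        -file "$dir/$host.crt" \
        -keystore "$dir/truststore.jks" -storetype JKS \
        -storepass "$pass"
}
```

For the paths used in this tutorial the call would look like make_selfsigned_stores /etc/security/serverKeys hadoop PASSWORD, run as a user that can write to that directory.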
Kinit
kinit -kt /etc/security/keytabs/spark.service.keytab spark/hadoop@REALM.CA
klist
hdfs dfs -mkdir /user/spark/
hdfs dfs -mkdir /user/spark/applicationHistory
hdfs dfs -ls /user/spark
Start The Service
$SPARK_HOME/sbin/start-history-server.sh
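With the history server running, a quick way to check the whole setup is to submit the SparkPi example that ships with the Spark binary distribution and then look for it in the history UI. The sketch below assumes the examples jar lives under $SPARK_HOME/examples/jars and that you still hold a valid Kerberos ticket from the Kinit step; find_examples_jar and run_sparkpi are hypothetical helper names.

```shell
# Locate the examples jar shipped with the Spark binary distribution.
# Assumes the usual layout: $SPARK_HOME/examples/jars/spark-examples_*.jar
find_examples_jar() {
    for jar in "$SPARK_HOME"/examples/jars/spark-examples_*.jar; do
        [ -f "$jar" ] && { echo "$jar"; return 0; }
    done
    return 1
}

# Submit SparkPi to YARN as a smoke test. Needs a valid Kerberos ticket.
run_sparkpi() {
    jar=$(find_examples_jar) || { echo "examples jar not found" >&2; return 1; }
    spark-submit --class org.apache.spark.examples.SparkPi \
        --master yarn "$jar" 10
}
```

After run_sparkpi finishes, the application should appear in the history server once the event log under /user/spark/applicationHistory has been picked up.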
Stop The Service
$SPARK_HOME/sbin/stop-history-server.sh
References
I used many different resources and reference materials for this tutorial. Below are just a few of them.
https://spark.apache.org/docs/latest/running-on-yarn.html#configuration
https://spark.apache.org/docs/latest/security.html