In this tutorial I will show you how to use Kerberos/SSL with Spark integrated with YARN. I will use self-signed certificates for this example. Before you begin, ensure you have installed a Kerberos server and Hadoop.
This assumes your hostname is “hadoop”.
Create Kerberos Principals
cd /etc/security/keytabs/
sudo kadmin.local
#You can list principals
listprincs
#Create the following principals
addprinc -randkey spark/hadoop@REALM.CA
#Create the keytab files.
#You will need these for Hadoop to be able to log in
xst -k spark.service.keytab spark/hadoop@REALM.CA
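The interactive session above could also be scripted. The sketch below only prints the `kadmin.local -q` invocations for a given service, host, and realm so they can be reviewed before running them by hand. The function name `print_principal_cmds` and its arguments are hypothetical, and it assumes the `-q` query option of MIT Kerberos.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not run) the kadmin.local commands
# for one service principal. Arguments: service, host, realm.
print_principal_cmds() {
  local service="$1" host="$2" realm="$3"
  local principal="${service}/${host}@${realm}"
  # addprinc -randkey creates the principal with a random key
  printf 'sudo kadmin.local -q "addprinc -randkey %s"\n' "$principal"
  # xst -k exports the key into a keytab file in the current directory
  printf 'sudo kadmin.local -q "xst -k %s.service.keytab %s"\n' "$service" "$principal"
}

# Same service, hostname, and realm as used in this tutorial
print_principal_cmds spark hadoop REALM.CA
```

Printing the commands first keeps the sketch safe to run on a machine without a KDC; the printed lines can then be pasted into a shell on the Kerberos server.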
Set Keytab Permissions/Ownership
sudo chown root:hadoopuser /etc/security/keytabs/*
sudo chmod 750 /etc/security/keytabs/*
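After changing ownership and mode it can be worth confirming that no keytab is accessible to users outside the owner and group. The following is a minimal sketch, assuming GNU `find` (for the `-perm /mode` test); the helper name `world_accessible_files` is hypothetical, and the demo runs against a scratch directory rather than `/etc/security/keytabs`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lists regular files directly inside a directory that
# are readable, writable, or executable by "other" users.
# -perm /007 (GNU find) matches when any of the three "other" bits is set.
world_accessible_files() {
  local dir="$1"
  find "$dir" -maxdepth 1 -type f -perm /007
}

# Example against a scratch directory instead of /etc/security/keytabs
demo_dir="$(mktemp -d)"
touch "$demo_dir/ok.keytab" "$demo_dir/open.keytab"
chmod 750 "$demo_dir/ok.keytab"     # same mode as used in this tutorial
chmod 644 "$demo_dir/open.keytab"   # readable by everyone
world_accessible_files "$demo_dir"  # prints only the open.keytab path
```

An empty result for the real keytab directory would indicate that the `chmod 750` step took effect on every file.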
Go to the Apache Spark download page and copy the download link for Spark.
#Put the download link you copied after wget
wget
tar -xvf spark-2.4.4-bin-hadoop2.7.tgz
mv spark-2.4.4-bin-hadoop2.7 /usr/local/spark/
Update .bashrc
sudo nano ~/.bashrc
#Ensure we have the following in the Hadoop section
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
#Add the following
#SPARK VARIABLES START
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native:$LD_LIBRARY_PATH
#SPARK VARIABLES STOP
source ~/.bashrc
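Once the file has been sourced, a quick check can confirm that the Spark `bin` directory really ended up on `PATH`. This is a small sketch; `path_contains` is a hypothetical helper, and the fallback value for `SPARK_HOME` simply mirrors the one exported above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the directory given as $1 is an entry in
# the colon-separated list given as $2.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Fall back to the tutorial's value only when SPARK_HOME is not set
SPARK_HOME="${SPARK_HOME:-/usr/local/spark}"
if path_contains "$SPARK_HOME/bin" "$PATH"; then
  echo "spark bin is on PATH"
else
  echo "spark bin is missing from PATH"
fi
```

Wrapping both strings in colons before matching avoids false positives from directories that merely share a prefix, such as `/usr/local/spark/bin2`.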
Setup Configuration
cd /usr/local/spark/conf
mv spark-defaults.conf.template spark-defaults.conf
nano spark-defaults.conf
#Add to the end
spark.master yarn
spark.yarn.historyServer.address ${hadoopconf-yarn.resourcemanager.hostname}:18080
spark.yarn.keytab /etc/security/keytabs/spark.service.keytab
spark.yarn.principal spark/hadoop@REALM.CA
spark.yarn.access.hadoopFileSystems hdfs://NAMENODE:54310
spark.authenticate true
spark.driver.bindAddress
spark.authenticate.enableSaslEncryption true
spark.eventLog.enabled true
spark.eventLog.dir hdfs://NAMENODE:54310/user/spark/applicationHistory
spark.history.fs.logDirectory hdfs://NAMENODE:54310/user/spark/applicationHistory
spark.history.fs.update.interval 10s
spark.history.ui.port 18080
#SSL
spark.ssl.enabled true
spark.ssl.keyPassword PASSWORD
spark.ssl.keyStore /etc/security/serverKeys/keystore.jks
spark.ssl.keyStorePassword PASSWORD
spark.ssl.keyStoreType JKS
spark.ssl.trustStore /etc/security/serverKeys/truststore.jks
spark.ssl.trustStorePassword PASSWORD
spark.ssl.trustStoreType JKS
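With this many properties it is easy to leave one without a value, as `spark.driver.bindAddress` appears above. The sketch below lists such entries. It assumes the whitespace-separated `key value` layout used in this file (not the `key=value` form), and the helper name `keys_without_value` is hypothetical; the demo uses a scratch file rather than the real configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints every property in a spark-defaults.conf style
# file that has a key but no value. Comment and blank lines are skipped.
keys_without_value() {
  awk '!/^[[:space:]]*#/ && NF == 1 { print $1 }' "$1"
}

# Example against a scratch file rather than the real configuration
demo_conf="$(mktemp)"
cat > "$demo_conf" <<'EOF'
#SSL
spark.master yarn
spark.authenticate true
spark.driver.bindAddress
EOF
keys_without_value "$demo_conf"   # prints spark.driver.bindAddress
```

Pointing the function at `/usr/local/spark/conf/spark-defaults.conf` would show which entries still need a value filled in for your environment.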
kinit -kt /etc/security/keytabs/spark.service.keytab spark/hadoop@REALM.CA
klist
hdfs dfs -mkdir /user/spark/
hdfs dfs -mkdir /user/spark/applicationHistory
hdfs dfs -ls /user/spark
Start The Service
Stop The Service
I used a lot of different resources and reference material on this. Below are just a few I used.