In this tutorial I will show you how to use Kerberos/SSL with Spark integrated with YARN. I will use self-signed certificates for this example. Before you begin, ensure you have installed a Kerberos server and Hadoop.
This tutorial assumes your hostname is “hadoop”.
Create Kerberos Principals
- cd /etc/security/keytabs/
- sudo kadmin.local
- #You can list the existing principals
- listprincs
- #Create the following principal
- addprinc -randkey spark/hadoop@REALM.CA
- #Create the keytab file.
- #You will need it so the Spark service can log in without a password
- xst -k spark.service.keytab spark/hadoop@REALM.CA
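The interactive kadmin session above can also be scripted. The following is a minimal sketch, assuming `kadmin.local` is on the PATH and the script runs as root on the KDC host. The helper names `build_principal` and `create_spark_keytab` are made up for this example and are not part of Kerberos.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Build a service principal name of the form service/host@REALM.
build_principal() {
  printf '%s/%s@%s\n' "$1" "$2" "$3"
}

# Create the principal with a random key and export it to a keytab.
# kadmin.local's -q option runs a single query non-interactively.
create_spark_keytab() {
  principal=$(build_principal spark "$1" "$2")
  keytab=/etc/security/keytabs/spark.service.keytab
  kadmin.local -q "addprinc -randkey $principal" &&
    kadmin.local -q "xst -k $keytab $principal"
}

# Example (run as root on the KDC host):
#   create_spark_keytab hadoop REALM.CA
```

Running `klist -kt /etc/security/keytabs/spark.service.keytab` afterwards should list the entries stored in the keytab.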
Set Keytab Permissions/Ownership
- sudo chown root:hadoopuser /etc/security/keytabs/*
- sudo chmod 750 /etc/security/keytabs/*
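To confirm the permissions took effect, a small check like the one below may help. It is only a sketch: `keytab_is_private` is a hypothetical helper name, and it checks nothing beyond the "other" permission bits.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the file exists and "others"
# have no read, write, or execute access to it.
keytab_is_private() {
  [ -e "$1" ] || return 1
  # find prints the path when any of the other-r/w/x bits is set.
  exposed=$(find "$1" -prune \( -perm -004 -o -perm -002 -o -perm -001 \) -print)
  [ -z "$exposed" ]
}

# Example:
#   keytab_is_private /etc/security/keytabs/spark.service.keytab \
#     || echo "keytab is readable by other users"
```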
Download
Go to the Apache Spark download page and copy the link for the Spark release you want.
- wget http://apache.forsale.plus/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
- tar -xvf spark-2.4.4-bin-hadoop2.7.tgz
- sudo mv spark-2.4.4-bin-hadoop2.7 /usr/local/spark/
Update .bashrc
- nano ~/.bashrc
- #Ensure we have the following in the Hadoop section
- export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
- #Add the following
- #SPARK VARIABLES START
- export SPARK_HOME=/usr/local/spark
- export PATH=$PATH:$SPARK_HOME/bin
- export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native:$LD_LIBRARY_PATH
- #SPARK VARIABLES STOP
- source ~/.bashrc
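After sourcing .bashrc, it can be worth checking that the variables are actually visible in the current shell. The sketch below assumes nothing beyond a POSIX shell. `path_contains` and `check_spark_env` are hypothetical helper names introduced for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given directory is an entry in PATH.
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Report the first missing piece of the Spark environment, if any.
check_spark_env() {
  [ -n "${SPARK_HOME:-}" ] || { echo "SPARK_HOME is not set"; return 1; }
  [ -n "${HADOOP_CONF_DIR:-}" ] || { echo "HADOOP_CONF_DIR is not set"; return 1; }
  path_contains "$SPARK_HOME/bin" || { echo "$SPARK_HOME/bin is not in PATH"; return 1; }
  echo "Spark environment looks OK"
}

# Example:
#   check_spark_env && spark-submit --version
```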
Setup Configuration
- cd /usr/local/spark/conf
- mv spark-defaults.conf.template spark-defaults.conf
- nano spark-defaults.conf
- #Add to the end
- spark.master yarn
- spark.yarn.historyServer.address ${hadoopconf-yarn.resourcemanager.hostname}:18080
- spark.yarn.keytab /etc/security/keytabs/spark.service.keytab
- spark.yarn.principal spark/hadoop@REALM.CA
- spark.yarn.access.hadoopFileSystems hdfs://NAMENODE:54310
- spark.authenticate true
- spark.driver.bindAddress 0.0.0.0
- spark.authenticate.enableSaslEncryption true
- spark.eventLog.enabled true
- spark.eventLog.dir hdfs://NAMENODE:54310/user/spark/applicationHistory
- spark.history.fs.logDirectory hdfs://NAMENODE:54310/user/spark/applicationHistory
- spark.history.fs.update.interval 10s
- spark.history.ui.port 18080
- #SSL
- spark.ssl.enabled true
- spark.ssl.keyPassword PASSWORD
- spark.ssl.keyStore /etc/security/serverKeys/keystore.jks
- spark.ssl.keyStorePassword PASSWORD
- spark.ssl.keyStoreType JKS
- spark.ssl.trustStore /etc/security/serverKeys/truststore.jks
- spark.ssl.trustStorePassword PASSWORD
- spark.ssl.trustStoreType JKS
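With this many settings it is easy to mistype a key. The following sketch reads a value back out of the file so you can spot-check it. `conf_get` is a hypothetical helper, and it assumes the simple "key, whitespace, value" layout used above (it does not handle the `key=value` form).

```shell
#!/bin/sh
# Hypothetical helper: print the value of a key from a
# spark-defaults.conf-style file ("key<whitespace>value" per line).
# Lines starting with "#" never match a key, so comments are skipped.
conf_get() {
  awk -v key="$2" '
    $1 == key {
      $1 = ""                # drop the key, keep the rest of the line
      sub(/^[ \t]+/, "")     # trim the separator left behind
      print
      exit
    }
  ' "$1"
}

# Example:
#   conf_get /usr/local/spark/conf/spark-defaults.conf spark.master
#   conf_get /usr/local/spark/conf/spark-defaults.conf spark.ssl.keyStore
```

An empty result means the key is missing from the file or misspelled.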
Kinit
- kinit -kt /etc/security/keytabs/spark.service.keytab spark/hadoop@REALM.CA
- klist
- hdfs dfs -mkdir /user/spark/
- hdfs dfs -mkdir /user/spark/applicationHistory
- hdfs dfs -ls /user/spark
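If you run these steps from a script, you may want to request a ticket only when one is not already cached. Here is a minimal sketch, assuming the MIT Kerberos client tools are installed. `ensure_ticket` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: obtain a ticket from the keytab only when the
# credential cache has no valid ticket. "klist -s" prints nothing and
# reports the state of the cache through its exit status.
ensure_ticket() {
  if klist -s; then
    echo "valid ticket already present"
  else
    kinit -kt "$1" "$2"
  fi
}

# Example:
#   ensure_ticket /etc/security/keytabs/spark.service.keytab spark/hadoop@REALM.CA
#   hdfs dfs -mkdir -p /user/spark/applicationHistory
```

The `-p` flag in the example creates the parent directory as well and does not fail if the directory already exists.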
Start The Service
- $SPARK_HOME/sbin/start-history-server.sh
Stop The Service
- $SPARK_HOME/sbin/stop-history-server.sh
References
I used many different resources and reference materials for this tutorial. Below are just a few of them.
https://spark.apache.org/docs/latest/running-on-yarn.html#configuration
https://spark.apache.org/docs/latest/security.html