Spark Installation on Hadoop

In this tutorial I will show you how to use Kerberos/SSL with Spark integrated with YARN. I will use self-signed certs for this example. Before you begin, ensure you have installed the Kerberos Server and Hadoop.

This assumes your hostname is “hadoop”

Create Kerberos Principals

  1. cd /etc/security/keytabs/
  2.  
  3. sudo kadmin.local
  4.  
  5. #You can list principals
  6. listprincs
  7.  
  8. #Create the following principals
  9. addprinc -randkey spark/hadoop@REALM.CA
  10.  
  11. #Create the keytab files.
  12. #You will need these for Hadoop to be able to login
  13. xst -k spark.service.keytab spark/hadoop@REALM.CA
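
If you want to confirm the keytab was written correctly, you can list its entries after exiting kadmin.local. A quick check, assuming the keytab was created in /etc/security/keytabs/ as above:

  1. sudo klist -kt /etc/security/keytabs/spark.service.keytab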

Set Keytab Permissions/Ownership

  1. sudo chown root:hadoopuser /etc/security/keytabs/*
  2. sudo chmod 750 /etc/security/keytabs/*

Download

Go to the Apache Spark download page and get the link for the release you want.

  1. wget http://apache.forsale.plus/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
  2. tar -xvf spark-2.4.4-bin-hadoop2.7.tgz
  3. mv spark-2.4.4-bin-hadoop2.7 /usr/local/spark

Update .bashrc

  1. sudo nano ~/.bashrc
  2.  
  3. #Ensure we have the following in the Hadoop section
  4. export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
  5.  
  6. #Add the following
  7.  
  8. #SPARK VARIABLES START
  9. export SPARK_HOME=/usr/local/spark
  10. export PATH=$PATH:$SPARK_HOME/bin
  11. export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native:$LD_LIBRARY_PATH
  12. #SPARK VARIABLES STOP
  13.  
  14. source ~/.bashrc
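
As a quick check, confirm that SPARK_HOME resolves and that the Spark binaries are now on your PATH:

  1. echo $SPARK_HOME
  2. spark-submit --version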

Setup Configuration

  1. cd /usr/local/spark/conf
  2. mv spark-defaults.conf.template spark-defaults.conf
  3. nano spark-defaults.conf
  4.  
  5. #Add to the end
  6. spark.master yarn
  7. spark.yarn.historyServer.address ${hadoopconf-yarn.resourcemanager.hostname}:18080
  8. spark.yarn.keytab /etc/security/keytabs/spark.service.keytab
  9. spark.yarn.principal spark/hadoop@REALM.CA
  10. spark.yarn.access.hadoopFileSystems hdfs://NAMENODE:54310
  11. spark.authenticate true
  12. spark.driver.bindAddress 0.0.0.0
  13. spark.authenticate.enableSaslEncryption true
  14. spark.eventLog.enabled true
  15. spark.eventLog.dir hdfs://NAMENODE:54310/user/spark/applicationHistory
  16. spark.history.fs.logDirectory hdfs://NAMENODE:54310/user/spark/applicationHistory
  17. spark.history.fs.update.interval 10s
  18. spark.history.ui.port 18080
  19.  
  20. #SSL
  21. spark.ssl.enabled true
  22. spark.ssl.keyPassword PASSWORD
  23. spark.ssl.keyStore /etc/security/serverKeys/keystore.jks
  24. spark.ssl.keyStorePassword PASSWORD
  25. spark.ssl.keyStoreType JKS
  26. spark.ssl.trustStore /etc/security/serverKeys/truststore.jks
  27. spark.ssl.trustStorePassword PASSWORD
  28. spark.ssl.trustStoreType JKS

Kinit

  1. kinit -kt /etc/security/keytabs/spark.service.keytab spark/hadoop@REALM.CA
  2. klist
  3. hdfs dfs -mkdir /user/spark/
  4. hdfs dfs -mkdir /user/spark/applicationHistory
  5. hdfs dfs -ls /user/spark
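
At this point you can sanity-check the whole Kerberos/YARN integration by submitting the bundled SparkPi example while still holding the ticket from the kinit above. A minimal sketch; the examples jar name assumes the spark-2.4.4-bin-hadoop2.7 download from earlier, so adjust the version if yours differs:

  1. spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.4.jar 10
  2.  
  3. #Check that the application finished successfully
  4. yarn application -list -appStates FINISHED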

Start The Service

  1. $SPARK_HOME/sbin/start-history-server.sh

Stop The Service

  1. $SPARK_HOME/sbin/stop-history-server.sh

Spark History Server Web UI
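
Once the history server is running, its web UI listens on the port set by spark.history.ui.port (18080 above). A quick check from the command line; note that with spark.ssl.enabled the UI is served over HTTPS and, depending on your Spark version, may redirect to a different port, so -L follows any redirect and -k skips verification of the self-signed cert:

  1. curl -k -L http://hadoop:18080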

References

I used a lot of different resources and reference material for this. Below are just a few of them.

https://spark.apache.org/docs/latest/running-on-yarn.html#configuration

https://spark.apache.org/docs/latest/security.html

https://www.linode.com/docs/databases/hadoop/install-configure-run-spark-on-top-of-hadoop-yarn-cluster/

Sqoop2: Kerberize Installation

In this tutorial I will show you how to kerberize your Sqoop2 installation. Before you begin, ensure you have installed Sqoop2.

This assumes your hostname is “hadoop”

Create Kerberos Principals

  1. cd /etc/security/keytabs
  2. sudo kadmin.local
  3. addprinc -randkey sqoop/hadoop@REALM.CA
  4. xst -k sqoop.service.keytab sqoop/hadoop@REALM.CA
  5. addprinc -randkey sqoopHTTP/hadoop@REALM.CA
  6. xst -k sqoopHTTP.service.keytab sqoopHTTP/hadoop@REALM.CA
  7. q

Set Keytab Permissions/Ownership

  1. sudo chown root:hadoopuser /etc/security/keytabs/*
  2. sudo chmod 750 /etc/security/keytabs/*

Configuration

Configure Kerberos with Sqoop

  1. cd /usr/local/sqoop/conf/
  2. nano sqoop.properties
  3.  
  4. #uncomment the following
  5. org.apache.sqoop.security.authentication.type=KERBEROS
  6. org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.KerberosAuthenticationHandler
  7.  
  8. #update to the following
  9. org.apache.sqoop.security.authentication.kerberos.principal=sqoop/hadoop@REALM.CA
  10. org.apache.sqoop.security.authentication.kerberos.keytab=/etc/security/keytabs/sqoop.service.keytab
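
After saving the properties, verify the keytab works and restart the Sqoop2 server so it picks up the Kerberos settings. A sketch, assuming the sqoop2-server script from the Sqoop2 distribution is on your PATH:

  1. kinit -kt /etc/security/keytabs/sqoop.service.keytab sqoop/hadoop@REALM.CA
  2. klist
  3. sqoop2-server stop
  4. sqoop2-server start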

Kerberos: Commands

In this tutorial I will give you a few useful commands for working with Kerberos. If you haven’t installed Kerberos yet, see the Kerberos Server Installation tutorial. I will keep this updated as time goes on. Also note that the commands below support a variety of options; check the documentation for the details.

Admin

This will open the Kerberos V5 administration system.

  1. kadmin.local

Add Principal

This will add a new principal. -randkey is optional; when specified, the key will be chosen at random instead of being derived from a password. Be sure to change USER to whatever your user is.

  1. addprinc -randkey USER/_HOST@REALM.CA

Create KeyTab

This will create a keytab in the directory where you run the command. You should put it in the /etc/security/keytabs/ folder. You can also specify the full path (i.e. /etc/security/keytabs/USER.keytab). Be sure to change USER to whatever your user is.

  1. xst -k USER.keytab USER/_HOST@REALM.CA

Kinit

When using -kt, kinit uses the specified keytab to obtain a ticket.

  1. kinit -kt /etc/security/keytabs/USER.keytab USER/_HOST@REALM.CA

Klist

If you want to see what tickets have been granted, issue the command below.

  1. klist

Inline Commands

You can run Kerberos commands inline without first opening kadmin.local. To do so, specify the “-q” option followed by the command to issue, in quotes. See below.

  1. kadmin.local -q "addprinc -randkey USER/_HOST@REALM.CA"
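
For example, you can create a principal and export its keytab in one go without entering the interactive shell (myservice is just a placeholder principal here):

  1. kadmin.local -q "addprinc -randkey myservice/hadoop@REALM.CA"
  2. kadmin.local -q "xst -k /etc/security/keytabs/myservice.service.keytab myservice/hadoop@REALM.CA"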

Hadoop & Java: Connect to Remote Kerberos HDFS using KeyTab

In this tutorial I will show you how to connect to a remote Kerberized HDFS cluster using Java. If you haven’t installed HDFS with Kerberos yet, follow the HDFS/Yarn/MapRed: Kerberize/SSL tutorial.

Import SSL Cert to Java:

Follow this tutorial on “Installing unlimited strength encryption Java libraries”.

If on Windows, do the following

  1. #Import it
  2. "C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -import -file hadoop.csr -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts" -alias "hadoop"
  3.  
  4. #Check it
  5. "C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -list -v -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts"
  6.  
  7. #If you want to delete it
  8. "C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -delete -alias hadoop -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts"

pom.xml:

  1. <dependency>
  2. <groupId>org.apache.hadoop</groupId>
  3. <artifactId>hadoop-client</artifactId>
  4. <version>2.9.1</version>
  5. </dependency>

Imports:

  1. import org.apache.hadoop.conf.Configuration;
  2. import org.apache.hadoop.fs.FileStatus;
  3. import org.apache.hadoop.fs.FileSystem;
  4. import org.apache.hadoop.fs.Path;
  5. import org.apache.hadoop.security.UserGroupInformation;

Connect:

  1. // Setup the configuration object.
  2. final Configuration config = new Configuration();
  3.  
  4. config.set("fs.defaultFS", "swebhdfs://hadoop:50470");
  5. config.set("hadoop.security.authentication", "kerberos");
  6. config.set("hadoop.rpc.protection", "integrity");
  7.  
  8. System.setProperty("https.protocols", "TLSv1,TLSv1.1,TLSv1.2");
  9. System.setProperty("java.security.krb5.conf", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\krb5.conf");
  10. System.setProperty("java.security.krb5.realm", "REALM.CA");
  11. System.setProperty("java.security.krb5.kdc", "REALM.CA");
  12. System.setProperty("sun.security.krb5.debug", "true");
  13. System.setProperty("javax.net.debug", "all");
  14. System.setProperty("javax.net.ssl.keyStorePassword","YOURPASSWORD");
  15. System.setProperty("javax.net.ssl.keyStore","C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\cacerts");
  16. System.setProperty("javax.net.ssl.trustStore", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\cacerts");
  17. System.setProperty("javax.net.ssl.trustStorePassword","YOURPASSWORD");
  18. System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");
  19.  
  20. UserGroupInformation.setConfiguration(config);
  21. UserGroupInformation.setLoginUser(UserGroupInformation.loginUserFromKeytabAndReturnUGI("myuser/hadoop@REALM.CA", "c:\\data\\myuser.keytab"));
  22.  
  23. System.out.println(UserGroupInformation.getLoginUser());
  24. System.out.println(UserGroupInformation.getCurrentUser());

HDFS/Yarn/MapRed: Kerberize/SSL

In this tutorial I will show you how to use Kerberos/SSL with HDFS/Yarn/MapRed. I will use self-signed certs for this example. Before you begin, ensure you have installed the Kerberos Server and Hadoop.

This assumes your hostname is “hadoop”

Create Kerberos Principals

  1. cd /etc/security/keytabs/
  2.  
  3. sudo kadmin.local
  4.  
  5. #You can list principals
  6. listprincs
  7.  
  8. #Create the following principals
  9. addprinc -randkey nn/hadoop@REALM.CA
  10. addprinc -randkey jn/hadoop@REALM.CA
  11. addprinc -randkey dn/hadoop@REALM.CA
  12. addprinc -randkey sn/hadoop@REALM.CA
  13. addprinc -randkey nm/hadoop@REALM.CA
  14. addprinc -randkey rm/hadoop@REALM.CA
  15. addprinc -randkey jhs/hadoop@REALM.CA
  16. addprinc -randkey HTTP/hadoop@REALM.CA
  17.  
  18. #We are going to create a user to access with later
  19. addprinc -pw hadoop myuser/hadoop@REALM.CA
  20. xst -k myuser.keytab myuser/hadoop@REALM.CA
  21.  
  22. #Create the keytab files.
  23. #You will need these for Hadoop to be able to login
  24. xst -k nn.service.keytab nn/hadoop@REALM.CA
  25. xst -k jn.service.keytab jn/hadoop@REALM.CA
  26. xst -k dn.service.keytab dn/hadoop@REALM.CA
  27. xst -k sn.service.keytab sn/hadoop@REALM.CA
  28. xst -k nm.service.keytab nm/hadoop@REALM.CA
  29. xst -k rm.service.keytab rm/hadoop@REALM.CA
  30. xst -k jhs.service.keytab jhs/hadoop@REALM.CA
  31. xst -k spnego.service.keytab HTTP/hadoop@REALM.CA
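
If you want to verify the keytabs before handing them to Hadoop, you can list the entries of any of them after exiting kadmin.local, for example the NameNode one:

  1. sudo klist -kt /etc/security/keytabs/nn.service.keytab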

Set Keytab Permissions/Ownership

  1. sudo chown root:hadoopuser /etc/security/keytabs/*
  2. sudo chmod 750 /etc/security/keytabs/*

Stop the Cluster

  1. stop-dfs.sh
  2. stop-yarn.sh
  3. mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR stop historyserver

Hosts Update

  1. sudo nano /etc/hosts
  2.  
  3. #Remove 127.0.1.1 line
  4.  
  5. #Change 127.0.0.1 to the following
  6. #Notice that realm.ca is included because we need to tell Kerberos where that host resides
  7. 127.0.0.1 realm.ca hadoop localhost

hadoop-env.sh

We don’t set HADOOP_SECURE_DN_USER because we are going to use Kerberos.

  1. sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
  2.  
  3. #Locate "export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER}"
  4. #and change to
  5.  
  6. export HADOOP_SECURE_DN_USER=

core-site.xml

  1. nano /usr/local/hadoop/etc/hadoop/core-site.xml
  2.  
  3. <configuration>
  4. <property>
  5. <name>fs.defaultFS</name>
  6. <value>hdfs://NAMENODE:54310</value>
  7. <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming
  8. the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
  9. </property>
  10. <property>
  11. <name>hadoop.tmp.dir</name>
  12. <value>/app/hadoop/tmp</value>
  13. </property>
  14. <property>
  15. <name>hadoop.proxyuser.hadoopuser.hosts</name>
  16. <value>*</value>
  17. </property>
  18. <property>
  19. <name>hadoop.proxyuser.hadoopuser.groups</name>
  20. <value>*</value>
  21. </property>
  22. <property>
  23. <name>hadoop.security.authentication</name>
  24. <value>kerberos</value> <!-- A value of "simple" would disable security. -->
  25. </property>
  26. <property>
  27. <name>hadoop.security.authorization</name>
  28. <value>true</value>
  29. </property>
  30. <property>
  31. <name>hadoop.security.auth_to_local</name>
  32. <value>
  33. RULE:[2:$1@$0](nn/.*@.*REALM.CA)s/.*/hdfs/
  34. RULE:[2:$1@$0](jn/.*@.*REALM.CA)s/.*/hdfs/
  35. RULE:[2:$1@$0](dn/.*@.*REALM.CA)s/.*/hdfs/
  36. RULE:[2:$1@$0](sn/.*@.*REALM.CA)s/.*/hdfs/
  37. RULE:[2:$1@$0](nm/.*@.*REALM.CA)s/.*/yarn/
  38. RULE:[2:$1@$0](rm/.*@.*REALM.CA)s/.*/yarn/
  39. RULE:[2:$1@$0](jhs/.*@.*REALM.CA)s/.*/mapred/
  40. DEFAULT
  41. </value>
  42. </property>
  43. <property>
  44. <name>hadoop.rpc.protection</name>
  45. <value>integrity</value>
  46. </property>
  47. <property>
  48. <name>hadoop.ssl.require.client.cert</name>
  49. <value>false</value>
  50. </property>
  51. <property>
  52. <name>hadoop.ssl.hostname.verifier</name>
  53. <value>DEFAULT</value>
  54. </property>
  55. <property>
  56. <name>hadoop.ssl.keystores.factory.class</name>
  57. <value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value>
  58. </property>
  59. <property>
  60. <name>hadoop.ssl.server.conf</name>
  61. <value>ssl-server.xml</value>
  62. </property>
  63. <property>
  64. <name>hadoop.ssl.client.conf</name>
  65. <value>ssl-client.xml</value>
  66. </property>
  67. </configuration>
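
You can check that the auth_to_local rules map each service principal to the expected local user with the HadoopKerberosName helper that ships with Hadoop; a quick spot check (the output should show the principal resolving to hdfs):

  1. hadoop org.apache.hadoop.security.HadoopKerberosName nn/hadoop@REALM.CA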

ssl-server.xml

Copy ssl-server.xml.example to ssl-server.xml

  1. cp /usr/local/hadoop/etc/hadoop/ssl-server.xml.example /usr/local/hadoop/etc/hadoop/ssl-server.xml
  2.  
  3. nano /usr/local/hadoop/etc/hadoop/ssl-server.xml

Update properties

  1. <configuration>
  2. <property>
  3. <name>ssl.server.truststore.location</name>
  4. <value>/etc/security/serverKeys/truststore.jks</value>
  5. <description>Truststore to be used by NN and DN. Must be specified.</description>
  6. </property>
  7. <property>
  8. <name>ssl.server.truststore.password</name>
  9. <value>PASSWORD</value>
  10. <description>Optional. Default value is "".</description>
  11. </property>
  12. <property>
  13. <name>ssl.server.truststore.type</name>
  14. <value>jks</value>
  15. <description>Optional. The keystore file format, default value is "jks".</description>
  16. </property>
  17. <property>
  18. <name>ssl.server.truststore.reload.interval</name>
  19. <value>10000</value>
  20. <description>Truststore reload check interval, in milliseconds. Default value is 10000 (10 seconds).</description>
  21. </property>
  22. <property>
  23. <name>ssl.server.keystore.location</name>
  24. <value>/etc/security/serverKeys/keystore.jks</value>
  25. <description>Keystore to be used by NN and DN. Must be specified.</description>
  26. </property>
  27. <property>
  28. <name>ssl.server.keystore.password</name>
  29. <value>PASSWORD</value>
  30. <description>Must be specified.</description>
  31. </property>
  32. <property>
  33. <name>ssl.server.keystore.keypassword</name>
  34. <value>PASSWORD</value>
  35. <description>Must be specified.</description>
  36. </property>
  37. <property>
  38. <name>ssl.server.keystore.type</name>
  39. <value>jks</value>
  40. <description>Optional. The keystore file format, default value is "jks".</description>
  41. </property>
  42. <property>
  43. <name>ssl.server.exclude.cipher.list</name>
  44. <value>TLS_ECDHE_RSA_WITH_RC4_128_SHA,SSL_DHE_RSA_EXPORT_WITH_DES40_CBC_SHA,
  45. SSL_RSA_WITH_DES_CBC_SHA,SSL_DHE_RSA_WITH_DES_CBC_SHA,
  46. SSL_RSA_EXPORT_WITH_RC4_40_MD5,SSL_RSA_EXPORT_WITH_DES40_CBC_SHA,
  47. SSL_RSA_WITH_RC4_128_MD5</value>
  48. <description>Optional. The weak security cipher suites that you want excluded from SSL communication.</description>
  49. </property>
  50. </configuration>

ssl-client.xml

Copy ssl-client.xml.example to ssl-client.xml

  1. cp /usr/local/hadoop/etc/hadoop/ssl-client.xml.example /usr/local/hadoop/etc/hadoop/ssl-client.xml
  2.  
  3. nano /usr/local/hadoop/etc/hadoop/ssl-client.xml

Update properties

  1. <configuration>
  2. <property>
  3. <name>ssl.client.truststore.location</name>
  4. <value>/etc/security/serverKeys/truststore.jks</value>
  5. <description>Truststore to be used by clients like distcp. Must be specified.</description>
  6. </property>
  7. <property>
  8. <name>ssl.client.truststore.password</name>
  9. <value>PASSWORD</value>
  10. <description>Optional. Default value is "".</description>
  11. </property>
  12. <property>
  13. <name>ssl.client.truststore.type</name>
  14. <value>jks</value>
  15. <description>Optional. The keystore file format, default value is "jks".</description>
  16. </property>
  17. <property>
  18. <name>ssl.client.truststore.reload.interval</name>
  19. <value>10000</value>
  20. <description>Truststore reload check interval, in milliseconds. Default value is 10000 (10 seconds).</description>
  21. </property>
  22. <property>
  23. <name>ssl.client.keystore.location</name>
  24. <value></value>
  25. <description>Keystore to be used by clients like distcp. Must be specified.</description>
  26. </property>
  27. <property>
  28. <name>ssl.client.keystore.password</name>
  29. <value></value>
  30. <description>Optional. Default value is "".</description>
  31. </property>
  32. <property>
  33. <name>ssl.client.keystore.keypassword</name>
  34. <value></value>
  35. <description>Optional. Default value is "".</description>
  36. </property>
  37. <property>
  38. <name>ssl.client.keystore.type</name>
  39. <value>jks</value>
  40. <description>Optional. The keystore file format, default value is "jks".</description>
  41. </property>
  42. </configuration>

mapred-site.xml

Just add the following to the config so the job history server knows which Kerberos keytab and principal to use.

  1. nano /usr/local/hadoop/etc/hadoop/mapred-site.xml
  2.  
  3. <property>
  4. <name>mapreduce.jobhistory.keytab</name>
  5. <value>/etc/security/keytabs/jhs.service.keytab</value>
  6. </property>
  7. <property>
  8. <name>mapreduce.jobhistory.principal</name>
  9. <value>jhs/_HOST@REALM.CA</value>
  10. </property>
  11. <property>
  12. <name>mapreduce.jobhistory.http.policy</name>
  13. <value>HTTPS_ONLY</value>
  14. </property>

hdfs-site.xml

Add the following properties

  1. nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
  2.  
  3. <property>
  4. <name>dfs.http.policy</name>
  5. <value>HTTPS_ONLY</value>
  6. </property>
  7. <property>
  8. <name>hadoop.ssl.enabled</name>
  9. <value>true</value>
  10. </property>
  11. <property>
  12. <name>dfs.datanode.https.address</name>
  13. <value>NAMENODE:50475</value>
  14. </property>
  15. <property>
  16. <name>dfs.namenode.https-address</name>
  17. <value>NAMENODE:50470</value>
  18. <description>Your NameNode hostname for http access.</description>
  19. </property>
  20. <property>
  21. <name>dfs.namenode.secondary.https-address</name>
  22. <value>NAMENODE:50091</value>
  23. <description>Your Secondary NameNode hostname for http access.</description>
  24. </property>
  25. <property>
  26. <name>dfs.namenode.https-bind-host</name>
  27. <value>0.0.0.0</value>
  28. </property>
  29. <property>
  30. <name>dfs.block.access.token.enable</name>
  31. <value>true</value>
  32. <description>If "true", access tokens are used as capabilities for accessing datanodes. If "false", no access tokens are checked on accessing datanodes.</description>
  33. </property>
  34. <property>
  35. <name>dfs.namenode.kerberos.principal</name>
  36. <value>nn/_HOST@REALM.CA</value>
  37. <description> Kerberos principal name for the NameNode</description>
  38. </property>
  39. <property>
  40. <name>dfs.secondary.namenode.kerberos.principal</name>
  41. <value>sn/_HOST@REALM.CA</value>
  42. <description>Kerberos principal name for the secondary NameNode.</description>
  43. </property>
  44. <property>
  45. <name>dfs.web.authentication.kerberos.keytab</name>
  46. <value>/etc/security/keytabs/spnego.service.keytab</value>
  47. <description>The Kerberos keytab file with the credentials for the HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint.</description>
  48. </property>
  49. <property>
  50. <name>dfs.namenode.keytab.file</name>
  51. <value>/etc/security/keytabs/nn.service.keytab</value>
  52. <description>Combined keytab file containing the namenode service and host principals.</description>
  53. </property>
  54. <property>
  55. <name>dfs.datanode.keytab.file</name>
  56. <value>/etc/security/keytabs/dn.service.keytab</value>
  57. <description>The filename of the keytab file for the DataNode.</description>
  58. </property>
  59. <property>
  60. <name>dfs.datanode.kerberos.principal</name>
  61. <value>dn/_HOST@REALM.CA</value>
  62. <description>The Kerberos principal that the DataNode runs as. "_HOST" is replaced by the real host name.</description>
  63. </property>
  64. <property>
  65. <name>dfs.namenode.kerberos.internal.spnego.principal</name>
  66. <value>${dfs.web.authentication.kerberos.principal}</value>
  67. </property>
  68. <property>
  69. <name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>
  70. <value>${dfs.web.authentication.kerberos.principal}</value>
  71. </property>
  72. <property>
  73. <name>dfs.web.authentication.kerberos.principal</name>
  74. <value>HTTP/_HOST@REALM.CA</value>
  75. <description>The HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint.</description>
  76. </property>
  77. <property>
  78. <name>dfs.data.transfer.protection</name>
  79. <value>integrity</value>
  80. </property>
  81. <property>
  82. <name>dfs.datanode.address</name>
  83. <value>NAMENODE:50010</value>
  84. </property>
  85. <property>
  86. <name>dfs.secondary.namenode.keytab.file</name>
  87. <value>/etc/security/keytabs/sn.service.keytab</value>
  88. </property>
  89. <property>
  90. <name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>
  91. <value>HTTP/_HOST@REALM.CA</value>
  92. </property>
  93. <property>
  94. <name>dfs.webhdfs.enabled</name>
  95. <value>true</value>
  96. </property>

Remove the following properties

  1. dfs.namenode.http-address
  2. dfs.namenode.secondary.http-address
  3. dfs.namenode.http-bind-host

yarn-site.xml

Add the following properties

  1. nano /usr/local/hadoop/etc/hadoop/yarn-site.xml
  2.  
  3. <property>
  4. <name>yarn.http.policy</name>
  5. <value>HTTPS_ONLY</value>
  6. </property>
  7. <property>
  8. <name>yarn.resourcemanager.webapp.https.address</name>
  9. <value>${yarn.resourcemanager.hostname}:8090</value>
  10. </property>
  11. <property>
  12. <name>yarn.resourcemanager.hostname</name>
  13. <value>NAMENODE</value>
  14. </property>
  15. <property>
  16. <name>yarn.nodemanager.bind-host</name>
  17. <value>0.0.0.0</value>
  18. </property>
  19. <property>
  20. <name>yarn.nodemanager.webapp.address</name>
  21. <value>${yarn.nodemanager.hostname}:8042</value>
  22. </property>
  23. <property>
  24. <name>yarn.resourcemanager.principal</name>
  25. <value>rm/_HOST@REALM.CA</value>
  26. </property>
  27. <property>
  28. <name>yarn.resourcemanager.keytab</name>
  29. <value>/etc/security/keytabs/rm.service.keytab</value>
  30. </property>
  31. <property>
  32. <name>yarn.nodemanager.principal</name>
  33. <value>nm/_HOST@REALM.CA</value>
  34. </property>
  35. <property>
  36. <name>yarn.nodemanager.keytab</name>
  37. <value>/etc/security/keytabs/nm.service.keytab</value>
  38. </property>
  39. <property>
  40. <name>yarn.nodemanager.hostname</name>
  41. <value>NAMENODE</value>
  42. </property>
  43. <property>
  44. <name>yarn.resourcemanager.bind-host</name>
  45. <value>0.0.0.0</value>
  46. </property>
  47. <property>
  48. <name>yarn.timeline-service.bind-host</name>
  49. <value>0.0.0.0</value>
  50. </property>

Remove the following properties

  1. yarn.resourcemanager.webapp.address

SSL

Setup SSL Directories

  1. sudo mkdir -p /etc/security/serverKeys
  2. sudo chown -R root:hadoopuser /etc/security/serverKeys/
  3. sudo chmod 755 /etc/security/serverKeys/
  4.  
  5. cd /etc/security/serverKeys

Setup Keystore

  1. sudo keytool -genkey -alias NAMENODE -keyalg RSA -keysize 1024 -dname "CN=NAMENODE,OU=ORGANIZATION_UNIT,C=canada" -keypass PASSWORD -keystore /etc/security/serverKeys/keystore.jks -storepass PASSWORD
  2. sudo keytool -export -alias NAMENODE -keystore /etc/security/serverKeys/keystore.jks -rfc -file /etc/security/serverKeys/NAMENODE.csr -storepass PASSWORD

Setup Truststore

  1. sudo keytool -import -noprompt -alias NAMENODE -file /etc/security/serverKeys/NAMENODE.csr -keystore /etc/security/serverKeys/truststore.jks -storepass PASSWORD
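
You can confirm both stores contain what you expect (the keystore should show a PrivateKeyEntry and the truststore a trustedCertEntry) before the cluster starts using them:

  1. sudo keytool -list -v -keystore /etc/security/serverKeys/keystore.jks -storepass PASSWORD
  2. sudo keytool -list -v -keystore /etc/security/serverKeys/truststore.jks -storepass PASSWORD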

Generate Self-Signed Certificate

  1. sudo openssl genrsa -out /etc/security/serverKeys/NAMENODE.key 2048
  2.  
  3. sudo openssl req -x509 -new -key /etc/security/serverKeys/NAMENODE.key -days 300 -out /etc/security/serverKeys/NAMENODE.pem
  4.  
  5. sudo keytool -keystore /etc/security/serverKeys/keystore.jks -alias NAMENODE -certreq -file /etc/security/serverKeys/NAMENODE.cert -storepass PASSWORD -keypass PASSWORD
  6.  
  7. sudo openssl x509 -req -CA /etc/security/serverKeys/NAMENODE.pem -CAkey /etc/security/serverKeys/NAMENODE.key -in /etc/security/serverKeys/NAMENODE.cert -out /etc/security/serverKeys/NAMENODE.signed -days 300 -CAcreateserial

Setup File Permissions

  1. sudo chmod 440 /etc/security/serverKeys/*
  2. sudo chown root:hadoopuser /etc/security/serverKeys/*

Start the Cluster

  1. start-dfs.sh
  2. start-yarn.sh
  3. mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver
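
To confirm the daemons came up, check the running Java processes and, if you like, hit the NameNode HTTPS endpoint (expect an authentication challenge since SPNEGO is enabled; -k is needed because the cert is self-signed):

  1. jps
  2. curl -k https://NAMENODE:50470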

Create User Directory

  1. kinit -kt /etc/security/keytabs/myuser.keytab myuser/hadoop@REALM.CA
  2. #ensure the login worked
  3. klist
  4.  
  5. #Create hdfs directory now
  6. hdfs dfs -mkdir /user
  7. hdfs dfs -mkdir /user/myuser
  8.  
  9. #remove kerberos ticket
  10. kdestroy

URL

https://NAMENODE:50470
https://NAMENODE:50475
https://NAMENODE:8090

References

https://www.ibm.com/support/knowledgecenter/en/SSPT3X_4.2.0/com.ibm.swg.im.infosphere.biginsights.admin.doc/doc/admin_ssl_hbase_mr_yarn_hdfs_web.html

Kerberos Server Installation

In this tutorial I will show you how to install a Kerberos server on Ubuntu 16.04.

  1. sudo apt install krb5-kdc krb5-admin-server krb5-config -y

Enter your realm. I will use REALM.CA

Enter your servers. I will use localhost

Enter your administrative server. I will use localhost

Now you can click Ok and installation will continue.

Next we can create our new realm

  1. sudo krb5_newrealm

Enter your password then confirm it.

Now we can edit our kadm5.acl to grant admin privileges. Uncomment “*/admin *”.

  1. sudo nano /etc/krb5kdc/kadm5.acl

Now we make our keytabs directory and grant the necessary permissions.

  1. sudo mkdir -p /etc/security/keytabs/
  2. sudo chown root:hduser /etc/security/keytabs
  3. sudo chmod 750 /etc/security/keytabs

Now we edit our krb5.conf file

  1. sudo nano /etc/krb5.conf

Ensure it looks like the below

  1. [libdefaults]
  2. default_realm = REALM.CA
  3.  
  4.  
  5. [realms]
  6. REALM.CA = {
  7. kdc = localhost
  8. admin_server = localhost
  9. }
  10.  
  11.  
  12. [domain_realm]
  13. .realm.ca = REALM.CA
  14. realm.ca = REALM.CA

Now we can restart the Kerberos services.

  1. sudo service krb5-kdc restart; sudo service krb5-admin-server restart
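
To verify the KDC and admin server are up, you can list the default principals that krb5_newrealm created:

  1. sudo kadmin.local -q "listprincs"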

If, after creating a principal, you get the error “GSS-API (or Kerberos) error while initializing kadmin interface” when you attempt to use kadmin, then do the following.

  1. sudo RUNLEVEL=1 apt-get install rng-tools
  2. cat /dev/random | rngtest -c 1000
  3. sudo apt-get install haveged
  4. cat /proc/sys/kernel/random/entropy_avail
  5. cat /dev/random | rngtest -c 1000
  6. haveged -n 2g -f - | dd of=/dev/null

Uninstallation

  1. sudo apt remove --purge krb5-kdc krb5-admin-server krb5-config -y
  2. sudo rm -rf /var/lib/krb5kdc

References

I used the following references as a guide.

http://blog.ruanbekker.com/blog/2017/10/18/setup-kerberos-server-and-client-on-ubuntu/ 
http://csetutorials.com/setup-kerberos-ubuntu.html