Phoenix: Kerberize Installation

In this tutorial I will show you how to use Kerberos with Phoenix. Before you begin, ensure you have installed the Kerberos server, Hadoop, HBase and Zookeeper.

This assumes your hostname is “hadoop”

Install Phoenix

  1. wget http://apache.forsale.plus/phoenix/apache-phoenix-5.0.0-HBase-2.0/bin/apache-phoenix-5.0.0-HBase-2.0-bin.tar.gz
  2. tar -zxvf apache-phoenix-5.0.0-HBase-2.0-bin.tar.gz
  3. sudo mv apache-phoenix-5.0.0-HBase-2.0-bin /usr/local/phoenix/
  4. cd /usr/local/phoenix/

Setup .bashrc:

  1. sudo nano ~/.bashrc

Add the following to the end of the file.

#PHOENIX VARIABLES START
export PHOENIX_HOME=/usr/local/phoenix
export PHOENIX_CLASSPATH=$PHOENIX_HOME/*
export PATH=$PATH:$PHOENIX_HOME/bin
#PHOENIX VARIABLES END

  1. source ~/.bashrc

Link Files

  1. ln -sf $HBASE_CONF_DIR/hbase-site.xml $PHOENIX_HOME/bin/hbase-site.xml
  2. ln -sf $HADOOP_CONF_DIR/core-site.xml $PHOENIX_HOME/bin/core-site.xml
  3. ln -sf $PHOENIX_HOME/phoenix-5.0.0-HBase-2.0-server.jar $HBASE_HOME/lib/phoenix-5.0.0-HBase-2.0-server.jar

hbase-env.sh

  1. nano /usr/local/hbase/conf/hbase-env.sh
  2.  
  3. #Ensure the following env variables are set
  4.  
  5. export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/usr/local/hadoop/etc/hadoop}
  6. export PHOENIX_CLASSPATH=${PHOENIX_CLASSPATH:-/usr/local/phoenix}
  7. export HBASE_CLASSPATH="$HBASE_CLASSPATH:$CLASSPATH:$HADOOP_CONF_DIR:$PHOENIX_CLASSPATH/phoenix-5.0.0-HBase-2.0-server.jar:$PHOENIX_CLASSPATH/phoenix-core-5.0.0-HBase-2.0.jar:$PHOENIX_CLASSPATH/phoenix-5.0.0-HBase-2.0-client.jar"

hbase-site.xml

  1. nano /usr/local/hbase/conf/hbase-site.xml
  2.  
  3. #Add the following properties
  4.  
  5. <property>
  6. <name>phoenix.functions.allowUserDefinedFunctions</name>
  7. <value>true</value>
  8. <description>enable UDF functions</description>
  9. </property>
  10. <property>
  11. <name>hbase.regionserver.wal.codec</name>
  12. <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
  13. </property>
  14. <property>
  15. <name>hbase.region.server.rpc.scheduler.factory.class</name>
  16. <value>org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory</value>
  17. <description>Factory to create the Phoenix RPC Scheduler that uses separate queues for index and metadata updates</description>
  18. </property>
  19. <property>
  20. <name>hbase.rpc.controllerfactory.class</name>
  21. <value>org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory</value>
  22. <description>Factory to create the Phoenix RPC Scheduler that uses separate queues for index and metadata updates</description>
  23. </property>
  24. <property>
  25. <name>hbase.defaults.for.version.skip</name>
  26. <value>true</value>
  27. </property>
  28. <property>
  29. <name>phoenix.queryserver.http.port</name>
  30. <value>8765</value>
  31. </property>
  32. <property>
  33. <name>phoenix.queryserver.serialization</name>
  34. <value>PROTOBUF</value>
  35. </property>
  36. <property>
  37. <name>phoenix.queryserver.keytab.file</name>
  38. <value>/etc/security/keytabs/hbase.service.keytab</value>
  39. </property>
  40. <property>
  41. <name>phoenix.queryserver.kerberos.principal</name>
  42. <value>hbase/hadoop@REALM.CA</value>
  43. </property>
  44. <property>
  45. <name>phoenix.queryserver.http.keytab.file</name>
  46. <value>/etc/security/keytabs/hbaseHTTP.service.keytab</value>
  47. </property>
  48. <property>
  49. <name>phoenix.queryserver.http.kerberos.principal</name>
  50. <value>hbaseHTTP/hadoop@REALM.CA</value>
  51. </property>
  52. <property>
  53. <name>phoenix.queryserver.dns.nameserver</name>
  54. <value>hadoop</value>
  55. </property>
  56. <property>
  57. <name>phoenix.queryserver.dns.interface</name>
  58. <value>enp0s3</value>
  59. </property>
  60. <property>
  61. <name>phoenix.schema.mapSystemTablesToNamespace</name>
  62. <value>true</value>
  63. </property>
  64. <property>
  65. <name>phoenix.schema.isNamespaceMappingEnabled</name>
  66. <value>true</value>
  67. </property>

sqlline.py

  1. sqlline.py hadoop:2181:/hbase-secure:hbase/hadoop@REALM.CA:/etc/security/keytabs/hbase.service.keytab
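If you would rather connect from Java than from sqlline, the Phoenix thick JDBC driver accepts the same quorum:port:znode:principal:keytab string. Below is a minimal sketch, assuming the Phoenix client jar is on the classpath; the SYSTEM.CATALOG query is only a smoke test and any table or query of your own would do.

  1. import java.sql.Connection;
  2. import java.sql.DriverManager;
  3. import java.sql.ResultSet;
  4. import java.sql.Statement;
  5.  
  6. //Same details as the sqlline.py example: quorum:port:znode:principal:keytab
  7. String url = "jdbc:phoenix:hadoop:2181:/hbase-secure:hbase/hadoop@REALM.CA:/etc/security/keytabs/hbase.service.keytab";
  8.  
  9. try (Connection conn = DriverManager.getConnection(url);
  10. Statement stmt = conn.createStatement()) {
  11. //Smoke test: list some table names from the Phoenix catalog
  12. ResultSet rs = stmt.executeQuery("SELECT TABLE_NAME FROM SYSTEM.CATALOG LIMIT 10");
  13. while (rs.next()) {
  14. System.out.println(rs.getString(1));
  15. }
  16. }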

 

HBASE & Java: Connecting Secure

In this tutorial I will show you how to connect to a secure HBase cluster using Java. It’s rather straightforward.

Import SSL Cert to Java:

Follow this tutorial on “Installing unlimited strength encryption Java libraries”.

If on Windows do the following

  1. #Import it
  2. "C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -import -file hadoop.csr -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts" -alias "hadoop"
  3.  
  4. #Check it
  5. "C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -list -v -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts"
  6.  
  7. #If you want to delete it
  8. "C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -delete -alias hadoop -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts"

POM.xml

  1. <dependency>
  2. <groupId>org.apache.hbase</groupId>
  3. <artifactId>hbase-client</artifactId>
  4. <version>2.1.0</version>
  5. </dependency>
  6. <dependency>
  7. <groupId>org.apache.hbase</groupId>
  8. <artifactId>hbase</artifactId>
  9. <version>2.1.0</version>
  10. <type>pom</type>
  11. </dependency>

Imports:

  1. import org.apache.hadoop.conf.Configuration;
  2. import org.apache.hadoop.hbase.HBaseConfiguration;
  3. import org.apache.hadoop.hbase.client.Admin;
  4. import org.apache.hadoop.hbase.client.Connection;
  5. import org.apache.hadoop.hbase.client.ConnectionFactory;
  6. import org.apache.hadoop.security.UserGroupInformation;

Initiate Kerberos Authentication

  1. System.setProperty("java.security.auth.login.config", "C:\\data\\kafkaconnect\\kafka\\src\\main\\resources\\client_jaas.conf");
  2. System.setProperty("https.protocols", "TLSv1,TLSv1.1,TLSv1.2");
  3. System.setProperty("java.security.krb5.conf", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\krb5.conf");
  4. System.setProperty("java.security.krb5.realm", "REALM.CA");
  5. System.setProperty("java.security.krb5.kdc", "REALM.CA");
  6. System.setProperty("sun.security.krb5.debug", "false");
  7. System.setProperty("javax.net.debug", "false");
  8. System.setProperty("javax.net.ssl.keyStorePassword", "changeit");
  9. System.setProperty("javax.net.ssl.keyStore", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\cacerts");
  10. System.setProperty("javax.net.ssl.trustStore", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\cacerts");
  11. System.setProperty("javax.net.ssl.trustStorePassword", "changeit");
  12. System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");

Config:

We will use the basic configuration here. You should secure the cluster and use appropriate settings for that.

  1. // Setup the configuration object.
  2. final Configuration config = HBaseConfiguration.create();
  3. config.set("hbase.zookeeper.quorum", "hadoop");
  4. config.set("hbase.zookeeper.property.clientPort", "2181");
  5. config.set("hadoop.security.authentication", "kerberos");
  6. config.set("hbase.security.authentication", "kerberos");
  7. config.set("hbase.cluster.distributed", "true");
  8. config.set("hbase.rpc.protection", "integrity");
  9. config.set("zookeeper.znode.parent", "/hbase-secure");
  10. config.set("hbase.master.kerberos.principal", "hbase/hadoop@REALM.CA");
  11. config.set("hbase.regionserver.kerberos.principal", "hbase/hadoop@REALM.CA");

Connect:

Now we create the connection.

  1. UserGroupInformation.setConfiguration(config);
  2. UserGroupInformation.setLoginUser(UserGroupInformation.loginUserFromKeytabAndReturnUGI("hbase/hadoop@REALM.CA", "c:\\data\\hbase.service.keytab"));
  3.  
  4. System.out.println(UserGroupInformation.getLoginUser());
  5. System.out.println(UserGroupInformation.getCurrentUser());
  6.  
  7. Connection conn = ConnectionFactory.createConnection(config);
  8.  
  9. //Later when we are done we will want to close the connection.
  10. conn.close();

Hbase Admin:

If you need it, you can retrieve an Admin implementation to administer the HBase cluster.

  1. Admin admin = conn.getAdmin();
  2. //Later when we are done we will want to close the admin.
  3. admin.close();
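For example, as a quick sanity check you can list the tables the cluster exposes before closing the admin. A minimal sketch, assuming the admin object from above (TableName comes from org.apache.hadoop.hbase):

  1. //List the tables visible to this Kerberos-authenticated user
  2. for (TableName tableName : admin.listTableNames()) {
  3. System.out.println("Table: " + tableName.getNameAsString());
  4. }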

HBase: Kerberize/SSL Installation

In this tutorial I will show you how to use Kerberos/SSL with HBase. I will use self signed certs for this example. Before you begin ensure you have installed Kerberos Server, Hadoop and Zookeeper.

This assumes your hostname is “hadoop”

We will install a Master, RegionServer and Rest Client

Create Kerberos Principals

  1. cd /etc/security/keytabs/
  2.  
  3. sudo kadmin.local
  4.  
  5. #You can list principals
  6. listprincs
  7.  
  8. #Create the following principals
  9. addprinc -randkey hbase/hadoop@REALM.CA
  10. addprinc -randkey hbaseHTTP/hadoop@REALM.CA
  11.  
  12. #Create the keytab files.
  13. #You will need these for Hadoop to be able to login
  14. xst -k hbase.service.keytab hbase/hadoop@REALM.CA
  15. xst -k hbaseHTTP.service.keytab hbaseHTTP/hadoop@REALM.CA

Set Keytab Permissions/Ownership

  1. sudo chown root:hadoopuser /etc/security/keytabs/*
  2. sudo chmod 750 /etc/security/keytabs/*

Install HBase

  1. wget http://apache.forsale.plus/hbase/2.1.0/hbase-2.1.0-bin.tar.gz
  2. tar -zxvf hbase-2.1.0-bin.tar.gz
  3. sudo mv hbase-2.1.0 /usr/local/hbase/
  4. cd /usr/local/hbase/conf/

Setup .bashrc:

  1. sudo nano ~/.bashrc

Add the following to the end of the file.

#HBASE VARIABLES START
export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin
export HBASE_CONF_DIR=$HBASE_HOME/conf
#HBASE VARIABLES END

  1. source ~/.bashrc

hbase_client_jaas.conf

  1. Client {
  2. com.sun.security.auth.module.Krb5LoginModule required
  3. useKeyTab=false
  4. useTicketCache=true;
  5. };

hbase_server_jaas.conf

  1. Client {
  2. com.sun.security.auth.module.Krb5LoginModule required
  3. useKeyTab=true
  4. useTicketCache=false
  5. keyTab="/etc/security/keytabs/hbase.service.keytab"
  6. principal="hbase/hadoop@REALM.CA";
  7. };

regionservers

  1. hadoop

hbase-env.sh

Add or modify the following settings.

  1. export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
  2. export HBASE_CONF_DIR=${HBASE_CONF_DIR:-/usr/local/hbase/conf}
  3. export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/usr/local/hadoop/etc/hadoop}
  4. export HBASE_CLASSPATH="$CLASSPATH:$HADOOP_CONF_DIR"
  5. export HBASE_REGIONSERVERS=${HBASE_CONF_DIR}/regionservers
  6. export HBASE_LOG_DIR=${HBASE_HOME}/logs
  7. export HBASE_PID_DIR=/home/hadoopuser
  8. export HBASE_MANAGES_ZK=false
  9. export HBASE_OPTS="-Djava.security.auth.login.config=$HBASE_CONF_DIR/hbase_client_jaas.conf"
  10. export HBASE_MASTER_OPTS="-Djava.security.auth.login.config=$HBASE_CONF_DIR/hbase_server_jaas.conf"
  11. export HBASE_REGIONSERVER_OPTS="-Djava.security.auth.login.config=$HBASE_CONF_DIR/hbase_server_jaas.conf"

hbase-site.xml

  1. <configuration>
  2. <property>
  3. <name>hbase.rootdir</name>
  4. <value>hdfs://hadoop:54310/hbase</value>
  5. </property>
  6. <property>
  7. <name>hbase.zookeeper.property.dataDir</name>
  8. <value>/usr/local/zookeeper/data</value>
  9. </property>
  10. <property>
  11. <name>hbase.cluster.distributed</name>
  12. <value>true</value>
  13. </property>
  14. <property>
  15. <name>hbase.regionserver.kerberos.principal</name>
  16. <value>hbase/_HOST@REALM.CA</value>
  17. </property>
  18. <property>
  19. <name>hbase.regionserver.keytab.file</name>
  20. <value>/etc/security/keytabs/hbase.service.keytab</value>
  21. </property>
  22. <property>
  23. <name>hbase.master.kerberos.principal</name>
  24. <value>hbase/_HOST@REALM.CA</value>
  25. </property>
  26. <property>
  27. <name>hbase.master.keytab.file</name>
  28. <value>/etc/security/keytabs/hbase.service.keytab</value>
  29. </property>
  30. <property>
  31. <name>hbase.security.authentication.spnego.kerberos.principal</name>
  32. <value>hbaseHTTP/_HOST@REALM.CA</value>
  33. </property>
  34. <property>
  35. <name>hbase.security.authentication.spnego.kerberos.keytab</name>
  36. <value>/etc/security/keytabs/hbaseHTTP.service.keytab</value>
  37. </property>
  38. <property>
  39. <name>hbase.security.authentication</name>
  40. <value>kerberos</value>
  41. </property>
  42. <property>
  43. <name>hbase.security.authorization</name>
  44. <value>true</value>
  45. </property>
  46. <property>
  47. <name>hbase.coprocessor.region.classes</name>
  48. <value>org.apache.hadoop.hbase.security.token.TokenProvider</value>
  49. </property>
  50. <property>
  51. <name>hbase.rpc.protection</name>
  52. <value>integrity</value>
  53. </property>
  54. <property>
  55. <name>hbase.rpc.engine</name>
  56. <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
  57. </property>
  58. <property>
  59. <name>hbase.coprocessor.master.classes</name>
  60. <value>org.apache.hadoop.hbase.security.access.AccessController</value>
  61. </property>
  62. <property>
  63. <name>hbase.coprocessor.region.classes</name>
  64. <value>org.apache.hadoop.hbase.security.token.TokenProvider,org.apache.hadoop.hbase.security.access.AccessController</value>
  65. </property>
  66. <property>
  67. <name>hbase.security.authentication.ui</name>
  68. <value>kerberos</value>
  69. <description>Controls what kind of authentication should be used for the HBase web UIs.</description>
  70. </property>
  71. <property>
  72. <name>hbase.master.port</name>
  73. <value>16000</value>
  74. </property>
  75. <property>
  76. <name>hbase.master.info.bindAddress</name>
  77. <value>0.0.0.0</value>
  78. </property>
  79. <property>
  80. <name>hbase.master.info.port</name>
  81. <value>16010</value>
  82. </property>
  83. <property>
  84. <name>hbase.regionserver.hostname</name>
  85. <value>hadoop</value>
  86. </property>
  87. <property>
  88. <name>hbase.regionserver.port</name>
  89. <value>16020</value>
  90. </property>
  91. <property>
  92. <name>hbase.regionserver.info.port</name>
  93. <value>16030</value>
  94. </property>
  95. <property>
  96. <name>hbase.regionserver.info.bindAddress</name>
  97. <value>0.0.0.0</value>
  98. </property>
  99. <property>
  100. <name>hbase.master.ipc.address</name>
  101. <value>0.0.0.0</value>
  102. </property>
  103. <property>
  104. <name>hbase.regionserver.ipc.address</name>
  105. <value>0.0.0.0</value>
  106. </property>
  107. <property>
  108. <name>hbase.ssl.enabled</name>
  109. <value>true</value>
  110. </property>
  111. <property>
  112. <name>hadoop.ssl.enabled</name>
  113. <value>true</value>
  114. </property>
  115. <property>
  116. <name>ssl.server.keystore.keypassword</name>
  117. <value>startrek</value>
  118. </property>
  119. <property>
  120. <name>ssl.server.keystore.password</name>
  121. <value>startrek</value>
  122. </property>
  123. <property>
  124. <name>ssl.server.keystore.location</name>
  125. <value>/etc/security/serverKeys/keystore.jks</value>
  126. </property>
  127. <property>
  128. <name>hbase.rest.ssl.enabled</name>
  129. <value>true</value>
  130. </property>
  131. <property>
  132. <name>hbase.rest.ssl.keystore.store</name>
  133. <value>/etc/security/serverKeys/keystore.jks</value>
  134. </property>
  135. <property>
  136. <name>hbase.rest.ssl.keystore.password</name>
  137. <value>startrek</value>
  138. </property>
  139. <property>
  140. <name>hbase.rest.ssl.keystore.keypassword</name>
  141. <value>startrek</value>
  142. </property>
  143. <property>
  144. <name>hbase.superuser</name>
  145. <value>hduser</value>
  146. </property>
  147. <property>
  148. <name>hbase.tmp.dir</name>
  149. <value>/tmp/hbase-${user.name}</value>
  150. </property>
  151. <property>
  152. <name>hbase.local.dir</name>
  153. <value>${hbase.tmp.dir}/local</value>
  154. </property>
  155. <property>
  156. <name>hbase.zookeeper.property.clientPort</name>
  157. <value>2181</value>
  158. </property>
  159. <property>
  160. <name>hbase.unsafe.stream.capability.enforce</name>
  161. <value>false</value>
  162. </property>
  163. <property>
  164. <name>hbase.zookeeper.quorum</name>
  165. <value>hadoop</value>
  166. </property>
  167. <property>
  168. <name>zookeeper.znode.parent</name>
  169. <value>/hbase-secure</value>
  170. </property>
  171. <property>
  172. <name>hbase.regionserver.dns.interface</name>
  173. <value>enp0s3</value>
  174. </property>
  175. <property>
  176. <name>hbase.rest.authentication.type</name>
  177. <value>kerberos</value>
  178. </property>
  179. <property>
  180. <name>hadoop.proxyuser.HTTP.groups</name>
  181. <value>*</value>
  182. </property>
  183. <property>
  184. <name>hadoop.proxyuser.HTTP.hosts</name>
  185. <value>*</value>
  186. </property>
  187. <property>
  188. <name>hbase.rest.authentication.kerberos.keytab</name>
  189. <value>/etc/security/keytabs/hbaseHTTP.service.keytab</value>
  190. </property>
  191. <property>
  192. <name>hbase.rest.authentication.kerberos.principal</name>
  193. <value>hbaseHTTP/_HOST@REALM.CA</value>
  194. </property>
  195. <property>
  196. <name>hbase.rest.kerberos.principal</name>
  197. <value>hbase/_HOST@REALM.CA</value>
  198. </property>
  199. <property>
  200. <name>hbase.rest.keytab.file</name>
  201. <value>/etc/security/keytabs/hbase.service.keytab</value>
  202. </property>
  203. </configuration>

Change Ownership of HBase files

  1. sudo chown hadoopuser:hadoopuser -R /usr/local/hbase/*

Hadoop HDFS Config Changes

You will need to add the following proxy user properties to Hadoop’s core-site.xml file.

  1. nano /usr/local/hadoop/etc/hadoop/core-site.xml
  2.  
  3. <property>
  4. <name>hadoop.proxyuser.hbase.hosts</name>
  5. <value>*</value>
  6. </property>
  7. <property>
  8. <name>hadoop.proxyuser.hbase.groups</name>
  9. <value>*</value>
  10. </property>
  11. <property>
  12. <name>hadoop.proxyuser.HTTP.hosts</name>
  13. <value>*</value>
  14. </property>
  15. <property>
  16. <name>hadoop.proxyuser.HTTP.groups</name>
  17. <value>*</value>
  18. </property>

AutoStart

  1. crontab -e
  2.  
  3. @reboot /usr/local/hbase/bin/hbase-daemon.sh --config /usr/local/hbase/conf/ start master
  4. @reboot /usr/local/hbase/bin/hbase-daemon.sh --config /usr/local/hbase/conf/ start regionserver
  5. @reboot /usr/local/hbase/bin/hbase-daemon.sh --config /usr/local/hbase/conf/ start rest --infoport 17001 -p 17000

Validation

  1. kinit -kt /etc/security/keytabs/hbase.service.keytab hbase/hadoop@REALM.CA
  2. hbase shell
  3. status 'detailed'
  4. whoami
  5. kdestroy

References

https://hbase.apache.org/0.94/book/security.html
https://pivotalhd-210.docs.pivotal.io/doc/2100/webhelp/topics/ConfiguringSecureHBase.html
https://ambari.apache.org/1.2.5/installing-hadoop-using-ambari/content/ambari-kerb-2-3-2-1.html
https://hbase.apache.org/book.html#_using_secure_http_https_for_the_web_ui

Zookeeper Kerberos Installation

We are going to install Zookeeper. Ensure you have installed the Kerberos server first.

This assumes your hostname is “hadoop”

Install Java JDK

  1. apt-get update
  2. apt-get upgrade
  3. apt-get install default-jdk

Download Zookeeper:

  1. wget http://apache.forsale.plus/zookeeper/zookeeper-3.4.13/zookeeper-3.4.13.tar.gz
  2. tar -zxvf zookeeper-3.4.13.tar.gz
  3. sudo mv zookeeper-3.4.13 /usr/local/zookeeper/
  4. sudo chown -R root:hadoopuser /usr/local/zookeeper/

Setup .bashrc:

  1. sudo nano ~/.bashrc

Add the following to the end of the file.

#ZOOKEEPER VARIABLES START
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
#ZOOKEEPER VARIABLES STOP

  1. source ~/.bashrc

Create Kerberos Principals

  1. cd /etc/security/keytabs
  2. sudo kadmin.local
  3. addprinc -randkey zookeeper/hadoop@REALM.CA
  4. xst -k zookeeper.service.keytab zookeeper/hadoop@REALM.CA
  5. q

Set Keytab Permissions/Ownership

  1. sudo chown root:hadoopuser /etc/security/keytabs/*
  2. sudo chmod 750 /etc/security/keytabs/*

zoo.cfg

  1. cd /usr/local/zookeeper/conf/
  2. cp zoo_sample.cfg zoo.cfg
  3. nano zoo.cfg

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/usr/local/zookeeper/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

server.1=hadoop:2888:3888

authProvider.1 = org.apache.zookeeper.server.auth.SASLAuthenticationProvider
kerberos.removeHostFromPrincipal = true
kerberos.removeRealmFromPrincipal = true
jaasLoginRenew=3600000

java.env

  1. cd /usr/local/zookeeper/conf/
  2. touch java.env
  3. nano java.env

ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
ZOO_LOG_DIR="/usr/local/zookeeper/logs"

zookeeper_client_jaas.conf

  1. cd /usr/local/zookeeper/conf/
  2. touch zookeeper_client_jaas.conf
  3. nano zookeeper_client_jaas.conf

Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=false
useTicketCache=true;
};

zookeeper_jaas.conf

  1. cd /usr/local/zookeeper/conf/
  2. touch zookeeper_jaas.conf
  3. nano zookeeper_jaas.conf

Server {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
storeKey=true
useTicketCache=false
keyTab="/etc/security/keytabs/zookeeper.service.keytab"
principal="zookeeper/hadoop@REALM.CA";
};

zkServer.sh

  1. cd /usr/local/zookeeper/bin/
  2. nano zkServer.sh
  3.  
  4. #Add the following at the top
  5.  
  6. export CLIENT_JVMFLAGS="-Djava.security.auth.login.config=/usr/local/zookeeper/conf/zookeeper_client_jaas.conf"
  7. export SERVER_JVMFLAGS="-Xmx1024m -Djava.security.auth.login.config=/usr/local/zookeeper/conf/zookeeper_jaas.conf"

zkCli.sh

  1. cd /usr/local/zookeeper/bin/
  2. nano zkCli.sh
  3.  
  4. #Add the following at the top
  5.  
  6. export CLIENT_JVMFLAGS="-Djava.security.auth.login.config=/usr/local/zookeeper/conf/zookeeper_client_jaas.conf"
  7. export SERVER_JVMFLAGS="-Xmx1024m -Djava.security.auth.login.config=/usr/local/zookeeper/conf/zookeeper_jaas.conf"

MkDir

  1. mkdir /usr/local/zookeeper/data/
  2. mkdir /usr/local/zookeeper/logs/
  3.  
  4. echo "1" > /usr/local/zookeeper/data/myid
  5.  
  6. sudo chown -R hduser:hduser /usr/local/zookeeper

Auto Start

  1. crontab -e
  2.  
  3. #Add the following
  4. @reboot /usr/local/zookeeper/bin/zkServer.sh start

Run Client

  1. kinit -kt /etc/security/keytabs/zookeeper.service.keytab zookeeper/hadoop@REALM.CA
  2. ./zkCli.sh -server 127.0.0.1:2181
  3.  
  4. #Now you can list all directories
  5. ls /
  6.  
  7. #Or delete directories
  8.  
  9. rmr /folder

References

https://my-bigdata-blog.blogspot.com/2017/07/apache-Zookeeper-install-Ubuntu.html
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_command-line-installation/content/zookeeper_configuration.html
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_command-line-installation/content/securing_zookeeper_with_kerberos.html

 

 

 

Kafka & Java: Secured Consumer Read Record

In this tutorial I will show you how to read a record from Kafka. Before you begin you will need Maven/Eclipse all set up and a project ready to go. If you haven’t installed Kerberized Kafka yet, please do so.

Import SSL Cert to Java:

Follow this tutorial on “Installing unlimited strength encryption Java libraries”.

If on Windows do the following

  1. #Import it
  2. "C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -import -file hadoop.csr -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts" -alias "hadoop"
  3.  
  4. #Check it
  5. "C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -list -v -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts"
  6.  
  7. #If you want to delete it
  8. "C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -delete -alias hadoop -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts"

POM.xml

  1. <dependency>
  2. <groupId>org.apache.kafka</groupId>
  3. <artifactId>kafka-clients</artifactId>
  4. <version>1.1.0</version>
  5. </dependency>

Imports

  1. import org.apache.kafka.clients.consumer.*;
  2. import java.util.Properties;
  3. import java.io.InputStream;
  4. import java.util.Arrays;

Consumer JAAS Conf (client_jaas.conf)

  1. KafkaClient {
  2. com.sun.security.auth.module.Krb5LoginModule required
  3. useTicketCache=false
  4. refreshKrb5Config=true
  5. debug=true
  6. useKeyTab=true
  7. storeKey=true
  8. keyTab="c:\\data\\kafka.service.keytab"
  9. principal="kafka/hadoop@REALM.CA";
  10. };

Consumer Props File

You can go here to view all the options for consumer properties.

  1. bootstrap.servers=hadoop:9094
  2. group.id=test
  3.  
  4. security.protocol=SASL_SSL
  5. sasl.kerberos.service.name=kafka
  6.  
  7. #offset will be periodically committed in the background
  8. enable.auto.commit=true
  9.  
  10. # The deserializer for the key
  11. key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
  12.  
  13. # The deserializer for the value
  14. value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
  15.  
  16. # heartbeat to detect worker failures
  17. session.timeout.ms=10000
  18.  
  19. #Automatically reset offset to earliest offset
  20. auto.offset.reset=earliest

Initiate Kerberos Authentication

  1. System.setProperty("java.security.auth.login.config", "C:\\data\\kafkaconnect\\kafka\\src\\main\\resources\\client_jaas.conf");
  2. System.setProperty("https.protocols", "TLSv1,TLSv1.1,TLSv1.2");
  3. System.setProperty("java.security.krb5.conf", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\krb5.conf");
  4. System.setProperty("java.security.krb5.realm", "REALM.CA");
  5. System.setProperty("java.security.krb5.kdc", "REALM.CA");
  6. System.setProperty("sun.security.krb5.debug", "false");
  7. System.setProperty("javax.net.debug", "false");
  8. System.setProperty("javax.net.ssl.keyStorePassword", "changeit");
  9. System.setProperty("javax.net.ssl.keyStore", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\cacerts");
  10. System.setProperty("javax.net.ssl.trustStore", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\cacerts");
  11. System.setProperty("javax.net.ssl.trustStorePassword", "changeit");
  12. System.setProperty("javax.security.auth.useSubjectCredsOnly", "true");

Consumer Connection/Send

The record we will read will just be a string for both key and value.

  1. Consumer<String, String> consumer = null;
  2.  
  3. try {
  4. ClassLoader classLoader = getClass().getClassLoader();
  5.  
  6. try (InputStream props = classLoader.getResourceAsStream("consumer.props")) {
  7. Properties properties = new Properties();
  8. properties.load(props);
  9. consumer = new KafkaConsumer<>(properties);
  10. }
  11. System.out.println("Consumer Created");
  12.  
  13. // Subscribe to the topic.
  14. consumer.subscribe(Arrays.asList("testTopic"));
  15.  
  16. while (true) {
  17. final ConsumerRecords<String, String> consumerRecords = consumer.poll(1000);
  18. if (consumerRecords.count() == 0) {
  19. //Keep reading till no records
  20. break;
  21. }
  22.  
  23. consumerRecords.forEach(record -> {
  24. System.out.printf("Consumer Record:(%s, %s, %d, %d)\n", record.key(), record.value(), record.partition(), record.offset());
  25. });
  26.  
  27. //Commit offsets returned on the last poll() for all the subscribed list of topics and partition
  28. consumer.commitAsync();
  29. }
  30. } finally {
  31. if (consumer != null) consumer.close();
  32. }
  33. System.out.println("Consumer Closed");

References

I used kafka-sample-programs as a guide for setting up props.

Kafka: Kerberize/SSL

In this tutorial I will show you how to use Kerberos/SSL with Kafka. I will use self-signed certs for this example. Before you begin ensure you have installed Kerberos Server and Kafka.

If you don’t want to use the built-in Zookeeper you can set up your own. To do that, follow this tutorial.

This assumes your hostname is “hadoop”

Create Kerberos Principals

  1. cd /etc/security/keytabs/
  2.  
  3. sudo kadmin.local
  4.  
  5. #You can list principals
  6. listprincs
  7.  
  8. #Create the following principals
  9. addprinc -randkey kafka/hadoop@REALM.CA
  10. addprinc -randkey zookeeper/hadoop@REALM.CA
  11.  
  12. #Create the keytab files.
  13. #You will need these for Hadoop to be able to login
  14. xst -k kafka.service.keytab kafka/hadoop@REALM.CA
  15. xst -k zookeeper.service.keytab zookeeper/hadoop@REALM.CA

Set Keytab Permissions/Ownership

  1. sudo chown root:hadoopuser /etc/security/keytabs/*
  2. sudo chmod 750 /etc/security/keytabs/*

Hosts Update

  1. sudo nano /etc/hosts
  2.  
  3. #Remove 127.0.1.1 line
  4.  
  5. #Change 127.0.0.1 to the following
  6. 127.0.0.1 realm.ca hadoop localhost

Ubuntu Firewall

  1. sudo ufw disable

SSL

Setup SSL Directories if you have not previously done so.

  1. sudo mkdir -p /etc/security/serverKeys
  2. sudo chown -R root:hadoopuser /etc/security/serverKeys/
  3. sudo chmod 755 /etc/security/serverKeys/
  4.  
  5. cd /etc/security/serverKeys

Setup Keystore

  1. sudo keytool -genkey -alias NAMENODE -keyalg RSA -keysize 1024 -dname "CN=NAMENODE,OU=ORGANIZATION_UNIT,C=canada" -keypass PASSWORD -keystore /etc/security/serverKeys/keystore.jks -storepass PASSWORD
  2. sudo keytool -export -alias NAMENODE -keystore /etc/security/serverKeys/keystore.jks -rfc -file /etc/security/serverKeys/NAMENODE.csr -storepass PASSWORD

Setup Truststore

  1. sudo keytool -import -noprompt -alias NAMENODE -file /etc/security/serverKeys/NAMENODE.csr -keystore /etc/security/serverKeys/truststore.jks -storepass PASSWORD

Generate Self-Signed Certificate

  1. sudo openssl genrsa -out /etc/security/serverKeys/NAMENODE.key 2048
  2.  
  3. sudo openssl req -x509 -new -key /etc/security/serverKeys/NAMENODE.key -days 300 -out /etc/security/serverKeys/NAMENODE.pem
  4.  
  5. sudo keytool -keystore /etc/security/serverKeys/keystore.jks -alias NAMENODE -certreq -file /etc/security/serverKeys/NAMENODE.cert -storepass PASSWORD -keypass PASSWORD
  6.  
  7. sudo openssl x509 -req -CA /etc/security/serverKeys/NAMENODE.pem -CAkey /etc/security/serverKeys/NAMENODE.key -in /etc/security/serverKeys/NAMENODE.cert -out /etc/security/serverKeys/NAMENODE.signed -days 300 -CAcreateserial

Setup File Permissions

  1. sudo chmod 440 /etc/security/serverKeys/*
  2. sudo chown root:hadoopuser /etc/security/serverKeys/*

Edit server.properties Config

  1. cd /usr/local/kafka/config
  2.  
  3. sudo nano server.properties
  4.  
  5. #Edit or Add the following properties.
  6. ssl.endpoint.identification.algorithm=HTTPS
  7. ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1
  8. ssl.key.password=PASSWORD
  9. ssl.keystore.location=/etc/security/serverKeys/keystore.jks
  10. ssl.keystore.password=PASSWORD
  11. ssl.truststore.location=/etc/security/serverKeys/truststore.jks
  12. ssl.truststore.password=PASSWORD
  13. listeners=SASL_SSL://:9094
  14. security.inter.broker.protocol=SASL_SSL
  15. ssl.client.auth=required
  16. authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
  17. ssl.keystore.type=JKS
  18. ssl.truststore.type=JKS
  19. sasl.kerberos.service.name=kafka
  20. zookeeper.connect=hadoop:2181
  21. sasl.mechanism.inter.broker.protocol=GSSAPI
  22. sasl.enabled.mechanisms=GSSAPI

Edit zookeeper.properties Config

  1. sudo nano zookeeper.properties
  2.  
  3. #Edit or Add the following properties.
  4.  
  5. server.1=hadoop:2888:3888
  6. clientPort=2181
  7. authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
  8. requireClientAuthScheme=SASL
  9. jaasLoginRenew=3600000

Edit producer.properties Config

  1. sudo nano producer.properties
  2.  
  3. bootstrap.servers=hadoop:9094
  4. security.protocol=SASL_SSL
  5. sasl.kerberos.service.name=kafka
  6. ssl.truststore.location=/etc/security/serverKeys/truststore.jks
  7. ssl.truststore.password=PASSWORD
  8. ssl.keystore.location=/etc/security/serverKeys/keystore.jks
  9. ssl.keystore.password=PASSWORD
  10. ssl.key.password=PASSWORD
  11. sasl.mechanism=GSSAPI

Edit consumer.properties Config

  1. sudo nano consumer.properties
  2.  
  3. zookeeper.connect=hadoop:2181
  4. bootstrap.servers=hadoop:9094
  5. group.id=securing-kafka-group
  6. security.protocol=SASL_SSL
  7. sasl.kerberos.service.name=kafka
  8. ssl.truststore.location=/etc/security/serverKeys/truststore.jks
  9. ssl.truststore.password=PASSWORD
  10. sasl.mechanism=GSSAPI

Add zookeeper_jaas.conf Config

  1. sudo nano zookeeper_jaas.conf
  2.  
  3. Server {
  4. com.sun.security.auth.module.Krb5LoginModule required
  5. debug=true
  6. useKeyTab=true
  7. keyTab="/etc/security/keytabs/zookeeper.service.keytab"
  8. storeKey=true
  9. useTicketCache=true
  10. refreshKrb5Config=true
  11. principal="zookeeper/hadoop@REALM.CA";
  12. };

Add kafkaserver_jaas.conf Config

  1. sudo nano kafkaserver_jaas.conf
  2.  
  3. KafkaServer {
  4. com.sun.security.auth.module.Krb5LoginModule required
  5. debug=true
  6. useKeyTab=true
  7. storeKey=true
  8. refreshKrb5Config=true
  9. keyTab="/etc/security/keytabs/kafka.service.keytab"
  10. principal="kafka/hadoop@REALM.CA";
  11. };
  12.  
  13. KafkaClient {
  14. com.sun.security.auth.module.Krb5LoginModule required
  15. useTicketCache=true
  16. refreshKrb5Config=true
  17. debug=true
  18. useKeyTab=true
  19. storeKey=true
  20. keyTab="/etc/security/keytabs/kafka.service.keytab"
  21. principal="kafka/hadoop@REALM.CA";
  22. };

Edit kafka-server-start.sh

  1. cd /usr/local/kafka/bin/
  2.  
  3. sudo nano kafka-server-start.sh
  4.  
  5. jaas="$base_dir/../config/kafkaserver_jaas.conf"
  6.  
  7. export KAFKA_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=$jaas"

Edit zookeeper-server-start.sh

  1. sudo nano zookeeper-server-start.sh
  2.  
  3. jaas="$base_dir/../config/zookeeper_jaas.conf"
  4.  
  5. export KAFKA_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=$jaas"

Kafka-ACL

  1. cd /usr/local/kafka/bin/
  2.  
  3. #Grant topic access and cluster access
  4. ./kafka-acls.sh --operation All --allow-principal User:kafka --authorizer-properties zookeeper.connect=hadoop:2181 --add --cluster
  5. ./kafka-acls.sh --operation All --allow-principal User:kafka --authorizer-properties zookeeper.connect=hadoop:2181 --add --topic TOPIC
  6.  
  7. #Grant all groups for a specific topic
  8. ./kafka-acls.sh --operation All --allow-principal User:kafka --authorizer-properties zookeeper.connect=hadoop:2181 --add --topic TOPIC --group *
  9.  
  10. #If you want to remove cluster access
  11. ./kafka-acls.sh --authorizer-properties zookeeper.connect=hadoop:2181 --remove --cluster
  12.  
  13. #If you want to remove topic access
  14. ./kafka-acls.sh --authorizer-properties zookeeper.connect=hadoop:2181 --remove --topic TOPIC
  15.  
  16. #List access for cluster
  17. ./kafka-acls.sh --list --authorizer-properties zookeeper.connect=hadoop:2181 --cluster
  18.  
  19. #List access for topic
  20. ./kafka-acls.sh --list --authorizer-properties zookeeper.connect=hadoop:2181 --topic TOPIC

kafka-console-producer.sh

If you want to test using the console producer you need to make these changes.

  1. cd /usr/local/kafka/bin/
  2. nano kafka-console-producer.sh
  3.  
  4. #Add the below before the last line
  5.  
  6. base_dir=$(dirname $0)
  7. jaas="$base_dir/../config/kafkaserver_jaas.conf"
  8. export KAFKA_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=$jaas"
  9.  
  10.  
  11. #Now you can run the console producer
  12. ./kafka-console-producer.sh --broker-list hadoop:9094 --topic TOPIC --producer.config ../config/producer.properties

kafka-console-consumer.sh

If you want to test using the console consumer you need to make these changes.

  1. cd /usr/local/kafka/bin/
  2. nano kafka-console-consumer.sh
  3.  
  4. #Add the below before the last line
  5.  
  6. base_dir=$(dirname $0)
  7. jaas="$base_dir/../config/kafkaserver_jaas.conf"
  8. export KAFKA_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=$jaas"
  9.  
  10.  
  11. #Now you can run the console consumer
  12. ./kafka-console-consumer.sh --bootstrap-server hadoop:9094 --topic TOPIC --consumer.config ../config/consumer.properties --from-beginning

References

https://www.confluent.io/blog/apache-kafka-security-authorization-authentication-encryption/
https://github.com/confluentinc/securing-kafka-blog/blob/master/manifests/default.pp

Kafka & Java: Consumer Seek To Beginning

This is a quick tutorial on how to seek to the beginning using a Kafka consumer. If you haven’t set up the consumer yet, follow this tutorial.

This is all that is required once you have set up the consumer. It resets the Kafka offsets for the topic of your choice back to the beginning, so once you start reading you will get all records.

  1. consumer.seekToBeginning(consumer.assignment());
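One caveat: if you used subscribe() rather than assign(), consumer.assignment() is empty until the group coordinator has handed out partitions, so call poll() once (or use a ConsumerRebalanceListener) before seeking. A minimal sketch, assuming the consumer and topic from the consumer tutorial:

  1. consumer.subscribe(Arrays.asList("testTopic"));
  2.  
  3. //With subscribe() the assignment is empty until the first poll()
  4. consumer.poll(0);
  5.  
  6. //Rewind every assigned partition to the earliest offset
  7. consumer.seekToBeginning(consumer.assignment());
  8.  
  9. //Subsequent polls start from the beginning of each partition
  10. ConsumerRecords<String, String> records = consumer.poll(1000);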

Hive & Java: Connect to Remote Kerberos Hive using KeyTab

In this tutorial I will show you how to connect to a remote Kerberized Hive cluster using Java. If you haven’t installed Hive yet, follow the tutorial.

Import SSL Cert to Java:

Follow this tutorial on “Installing unlimited strength encryption Java libraries”.

If on Windows do the following

  1. #Import it
  2. "C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -import -file hadoop.csr -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts" -alias "hadoop"
  3.  
  4. #Check it
  5. "C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -list -v -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts"
  6.  
  7. #If you want to delete it
  8. "C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -delete -alias hadoop -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts"

POM.xml:

  1. <dependency>
  2. <groupId>org.apache.hive</groupId>
  3. <artifactId>hive-jdbc</artifactId>
  4. <version>2.3.3</version>
  5. <exclusions>
  6. <exclusion>
  7. <groupId>jdk.tools</groupId>
  8. <artifactId>jdk.tools</artifactId>
  9. </exclusion>
  10. </exclusions>
  11. </dependency>

Imports:

  1. import org.apache.hadoop.conf.Configuration;
  2. import org.apache.hadoop.security.UserGroupInformation;
  3. import java.sql.SQLException;
  4. import java.sql.Connection;
  5. import java.sql.ResultSet;
  6. import java.sql.Statement;
  7. import java.sql.DriverManager;

Connect:

  1. // Setup the configuration object.
  2. final Configuration config = new Configuration();
  3.  
  4. config.set("fs.defaultFS", "swebhdfs://hadoop:50470");
  5. config.set("hadoop.security.authentication", "kerberos");
  6. config.set("hadoop.rpc.protection", "integrity");
  7.  
  8. System.setProperty("https.protocols", "TLSv1,TLSv1.1,TLSv1.2");
  9. System.setProperty("java.security.krb5.conf", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\krb5.conf");
  10. System.setProperty("java.security.krb5.realm", "REALM.CA");
  11. System.setProperty("java.security.krb5.kdc", "REALM.CA");
  12. System.setProperty("sun.security.krb5.debug", "true");
  13. System.setProperty("javax.net.debug", "all");
  14. System.setProperty("javax.net.ssl.keyStorePassword","changeit");
  15. System.setProperty("javax.net.ssl.keyStore","C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\cacerts");
  16. System.setProperty("javax.net.ssl.trustStore", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\cacerts");
  17. System.setProperty("javax.net.ssl.trustStorePassword","changeit");
  18. System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");
  19.  
  20. UserGroupInformation.setConfiguration(config);
  21. UserGroupInformation.setLoginUser(UserGroupInformation.loginUserFromKeytabAndReturnUGI("hive/hadoop@REALM.CA", "c:\\data\\hive.service.keytab"));
  22.  
  23. System.out.println(UserGroupInformation.getLoginUser());
  24. System.out.println(UserGroupInformation.getCurrentUser());
  25.  
  26. //Add the hive driver
  27. Class.forName("org.apache.hive.jdbc.HiveDriver");
  28.  
  29. //Connect to hive jdbc
  30. Connection connection = DriverManager.getConnection("jdbc:hive2://hadoop:10000/default;principal=hive/hadoop@REALM.CA");
  31. Statement statement = connection.createStatement();
  32.  
  33. //Create a table
  34. String createTableSql = "CREATE TABLE IF NOT EXISTS "
  35. +" employee ( eid int, name String, "
  36. +" salary String, designation String)"
  37. +" COMMENT 'Employee details'"
  38. +" ROW FORMAT DELIMITED"
  39. +" FIELDS TERMINATED BY '\t'"
  40. +" LINES TERMINATED BY '\n'"
  41. +" STORED AS TEXTFILE";
  42.  
  43. System.out.println("Creating Table: " + createTableSql);
  44. statement.executeUpdate(createTableSql);
  45.  
  46. //Show all the tables to ensure we successfully added the table
  47. String showTablesSql = "show tables";
  48. System.out.println("Show All Tables: " + showTablesSql);
  49. ResultSet res = statement.executeQuery(showTablesSql);
  50.  
  51. while (res.next()) {
  52. System.out.println(res.getString(1));
  53. }
  54.  
  55. //Drop the table
  56. String dropTablesSql = "DROP TABLE IF EXISTS employee";
  57.  
  58. System.out.println("Dropping Table: " + dropTablesSql);
  59. statement.executeUpdate(dropTablesSql);
  60.  
  61. System.out.println("Finish!");
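When you are finished, close the JDBC resources so the HiveServer2 session is released. A minimal sketch, reusing the objects created above:

  1. //Close the JDBC resources to release the HiveServer2 session
  2. res.close();
  3. statement.close();
  4. connection.close();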

NiFi: Kerberize/SSL

In this tutorial I will show you how to use Kerberos/SSL with NiFi. I will use self signed certs for this example. Before you begin ensure you have installed Kerberos Server and NiFi.

This assumes your hostname is “hadoop”

Create Kerberos Principals

  1. cd /etc/security/keytabs/
  2.  
  3. sudo kadmin.local
  4.  
  5. #You can list principals
  6. listprincs
  7.  
  8. #Create the following principals
  9. addprinc -randkey nifi/hadoop@REALM.CA
  10. addprinc -randkey nifi-spnego/hadoop@REALM.CA
  11. #Notice this user does not have -randkey because we are a login user
  12. #Also notice that this user does not have a keytab created
  13. addprinc admin/hadoop@REALM.CA
  14.  
  15.  
  16. #Create the keytab files.
  17. #You will need these for Hadoop to be able to login
  18. xst -k nifi.service.keytab nifi/hadoop@REALM.CA
  19. xst -k nifi-spnego.service.keytab nifi-spnego/hadoop@REALM.CA

Set Keytab Permissions/Ownership

  1. sudo chown root:hadoopuser /etc/security/keytabs/*
  2. sudo chmod 750 /etc/security/keytabs/*

Stop NiFi

  1. sudo service nifi stop

Hosts Update

  1. sudo nano /etc/hosts
  2.  
  3. #Remove 127.0.1.1 line
  4.  
  5. #Change 127.0.0.1 to the following
  6. 127.0.0.1 realm.ca hadoop localhost

Ubuntu Firewall

  1. sudo ufw disable

sysctl.conf

Disable ipv6 as it causes issues in getting your server up and running.

  1. nano /etc/sysctl.conf

Add the following to the end and save

  1. net.ipv6.conf.all.disable_ipv6 = 1
  2. net.ipv6.conf.default.disable_ipv6 = 1
  3. net.ipv6.conf.lo.disable_ipv6 = 1
  4. #Change eth0 to what ifconfig has
  5. net.ipv6.conf.eth0.disable_ipv6 = 1

Close sysctl

  1. sysctl -p
  2. cat /proc/sys/net/ipv6/conf/all/disable_ipv6
  3. reboot

TrustStore / KeyStore

  1. #Creating your Certificate Authority
  2. sudo mkdir -p /etc/security/serverKeys
  3. sudo chown -R root:hduser /etc/security/serverKeys/
  4. sudo chmod 750 /etc/security/serverKeys/
  5. cd /etc/security/serverKeys
  6.  
  7. sudo openssl genrsa -aes128 -out nifi.key 4096
  8. sudo openssl req -x509 -new -key nifi.key -days 1095 -out nifi.pem
  9. sudo openssl rsa -check -in nifi.key #check it
  10. sudo openssl x509 -outform der -in nifi.pem -out nifi.der
  11. sudo keytool -import -keystore truststore.jks -file nifi.der -alias nifi
  12. #***You must type 'yes' to trust this certificate.
  13. sudo keytool -v -list -keystore truststore.jks
  14.  
  15. #Creating your Server Keystore
  16. sudo keytool -genkey -alias nifi -keyalg RSA -keystore keystore.jks -keysize 2048
  17. sudo keytool -certreq -alias nifi -keystore keystore.jks -file nifi.csr
  18. sudo openssl x509 -sha256 -req -in nifi.csr -CA nifi.pem -CAkey nifi.key -CAcreateserial -out nifi.crt -days 730
  19. sudo keytool -import -keystore keystore.jks -file nifi.pem
  20. sudo keytool -import -trustcacerts -alias nifi -file nifi.crt -keystore keystore.jks
  21.  
  22. sudo chown -R root:hduser /etc/security/serverKeys/*
  23. sudo chmod 750 /etc/security/serverKeys/*

nifi.properties

  1. cd /usr/local/nifi/conf/
  2. nano nifi.properties
  3.  
  4. #Find "# Site to Site properties" and change the following properties to what is below
  5.  
  6. nifi.remote.input.host=
  7. nifi.remote.input.secure=true
  8. nifi.remote.input.socket.port=9096
  9. nifi.remote.input.http.enabled=false
  10.  
  11. #Find "# web properties #" and change the following properties to what is below
  12.  
  13. nifi.web.http.host=
  14. nifi.web.http.port=
  15. nifi.web.https.host=0.0.0.0
  16. nifi.web.https.port=9095
  17.  
  18. #Find "# security properties #" and change the following properties to what is below
  19.  
  20. nifi.security.keystore=/etc/security/serverKeys/keystore.jks
  21. nifi.security.keystoreType=JKS
  22. nifi.security.keystorePasswd=PASSWORD
  23. nifi.security.keyPasswd=PASSWORD
  24. nifi.security.truststore=/etc/security/serverKeys/truststore.jks
  25. nifi.security.truststoreType=JKS
  26. nifi.security.truststorePasswd=PASSWORD
  27. nifi.security.needClientAuth=true
  28. nifi.security.user.authorizer=managed-authorizer
  29. nifi.security.user.login.identity.provider=kerberos-provider
  30.  
  31. #Find "# Core Properties #" and change the following properties to what is below
  32.  
  33. nifi.authorizer.configuration.file=./conf/authorizers.xml
  34. nifi.login.identity.provider.configuration.file=./conf/login-identity-providers.xml
  35.  
  36. #Find "# kerberos #" and change the following properties to what is below
  37.  
  38. nifi.kerberos.krb5.file=/etc/krb5.conf
  39.  
  40. #Find "# kerberos service principal #" and change the following properties to what is below
  41.  
  42. nifi.kerberos.service.principal=nifi/hadoop@REALM.CA
  43. nifi.kerberos.service.keytab.location=/etc/security/keytabs/nifi.service.keytab
  44.  
  45. #Find "# kerberos spnego principal #" and change the following properties to what is below
  46.  
  47. nifi.kerberos.spnego.principal=nifi-spnego/hadoop@REALM.CA
  48. nifi.kerberos.spnego.keytab.location=/etc/security/keytabs/nifi-spnego.service.keytab
  49. nifi.kerberos.spnego.authentication.expiration=12 hours
  50.  
  51. #Find "# cluster common properties (all nodes must have same values) #" and change the following properties to what is below
  52.  
  53. nifi.cluster.protocol.is.secure=true

login-identity-providers.xml

  1. nano login-identity-providers.xml
  2.  
  3. #Find "kerberos-provider"
  4. <provider>
  5. <identifier>kerberos-provider</identifier>
  6. <class>org.apache.nifi.kerberos.KerberosProvider</class>
  7. <property name="Default Realm">REALM.CA</property>
  8. <property name="Kerberos Config File">/etc/krb5.conf</property>
  9. <property name="Authentication Expiration">12 hours</property>
  10. </provider>

authorizers.xml

  1. nano authorizers.xml
  2.  
  3. #Find "file-provider"
  4. <authorizer>
  5. <identifier>file-provider</identifier>
  6. <class>org.apache.nifi.authorization.FileAuthorizer</class>
  7. <property name="Authorizations File">./conf/authorizations.xml</property>
  8. <property name="Users File">./conf/users.xml</property>
  9. <property name="Initial Admin Identity">admin/hadoop@REALM.CA</property>
  10. <property name="Legacy Authorized Users File"></property>
  11.  
  12. <property name="Node Identity 1"></property>
  13. </authorizer>

Start Nifi

  1. sudo service nifi start

NiFi Web Login

Issues:

  • If you get the error “No applicable policies could be found” after logging in and no GUI is shown, stop the NiFi service and restart it. Then you should be good.
  • If you can then log in but still don’t have any policies, you will need to update “authorizations.xml” and add the lines shown below, making sure to change the resource process group id to your root process group id and the user identifier to your user’s id.
  1. nano /usr/local/nifi/conf/authorizations.xml
  2.  
  3. <policy identifier="1c897e9d-3dd5-34ca-ae3d-75fb5ee3e1a5" resource="/data/process-groups/##CHANGE TO ROOT ID##" action="R">
  4. <user identifier="##CHANGE TO USER ID##"/>
  5. </policy>
  6. <policy identifier="91c64c2d-7848-371d-9d5f-db71138b152f" resource="/data/process-groups/##CHANGE TO ROOT ID##" action="W">
  7. <user identifier="##CHANGE TO USER ID##"/>
  8. </policy>
  9. <policy identifier="7aeb4d67-e2e1-3a3e-a8fa-94576f35539e" resource="/process-groups/##CHANGE TO ROOT ID##" action="R">
  10. <user identifier="##CHANGE TO USER ID##"/>
  11. </policy>
  12. <policy identifier="f5b620e0-b094-3f70-9542-dd6920ad5bd9" resource="/process-groups/##CHANGE TO ROOT ID##" action="W">
  13. <user identifier="##CHANGE TO USER ID##"/>
  14. </policy>

References

https://community.hortonworks.com/articles/34147/nifi-security-user-authentication-with-kerberos.html

https://community.hortonworks.com/content/supportkb/151106/nifi-how-to-create-your-own-certs-for-securing-nif.html

Scala: Basic Class Creation

In this tutorial I will show you how to create your first Scala class and then use it. I am just beginning with Scala at the time of this writing. Review the Scala style guide.

So the first thing we want to do is determine what we want the class to represent. In this tutorial I am just going to play around and use a Person. We will create a constructor, getters, setters, toString and, finally, a method that combines some properties.

Create your class.

  1. class Person {
  2. }

We could have added variables to the Person declaration. But I thought I’d leave that out for now.

Create our private first and last name.

  1. private var _firstName: String = null

When we set variables as private in the class they are not accessible from outside the class. Notice how the variable starts with “_”; this is just one of the Scala naming conventions.

Create our constructor

  1. /**
  2. * @constructor Creates a person with first/last name
  3. * @param firstName the persons first name
  4. * @param lastName the persons last name
  5. */
  6. def this(firstName: String, lastName: String) {
  7. this()
  8. _firstName = firstName
  9. _lastName = lastName
  10. }

This is where we can set the first and last name when we instantiate our object.

Create a getter

  1. def firstName = _firstName

Create a setter

  1. def firstName_=(firstName: String) {
  2. _firstName = firstName
  3. }

Override toString

  1. override def toString = s"firstName = $firstName"

Notice how there is an “s” before the string and we have $firstName in it. That interpolates the variable itself.

Create a Method

  1. def fullName: String = {
  2. return s"$firstName $lastName"
  3. }

This will just give you the full name of the person.

Putting it all together

  1. package models
  2.  
  3. class Person {
  4. private var _firstName: String = null
  5. private var _lastName: String = null
  6. /**
  7. * @constructor Creates a person with first/last name
  8. * @param firstName the persons first name
  9. * @param lastName the persons last name
  10. */
  11. def this(firstName: String, lastName: String) {
  12. this()
  13. _firstName = firstName
  14. _lastName = lastName
  15. }
  16. //Getter
  17. def firstName = _firstName
  18. def lastName = _lastName
  19. //Setter
  20. def firstName_=(firstName: String) {
  21. _firstName = firstName
  22. }
  23. def lastName_=(lastName: String) {
  24. _lastName = lastName
  25. }
  26. def fullName: String = {
  27. return s"$firstName $lastName"
  28. }
  29. override def toString = s"firstName = $firstName, lastName = $lastName"
  30. }

So what I have shown you above will get you started on creating your first class, but you could make it a lot cleaner with less code. It’s entirely up to you how you want to proceed and what you feel comfortable with.

  1. package models
  2.  
  3. class PersonCondensed {
  4. var firstName:String = null
  5. var lastName:String = null
  6. /**
  7. * @constructor Creates a person with first/last name
  8. * @param firstName the persons first name
  9. * @param lastName the persons last name
  10. */
  11. def this(firstName: String, lastName: String) {
  12. this()
  13. this.firstName = firstName
  14. this.lastName = lastName
  15. }
  16. def fullName: String = {
  17. return s"$firstName $lastName"
  18. }
  19. override def toString = s"firstName = $firstName, lastName = $lastName"
  20. }

Using our class

Here are the three different ways of calling our classes we did above.

  1. import models.Person
  2. import models.PersonCondensed
  3.  
  4. object Test {
  5. def main(args: Array[String]) {
  6. val person = new Person()
  7. person.firstName_=("John")
  8. person.lastName_=("Smith")
  9. println(person.fullName)
  10. println(person.toString())
  11. val person2 = new Person("John", "Smith")
  12. println(person2.fullName)
  13. println(person2.toString())
  14. val person3 = new PersonCondensed()
  15. person3.firstName=("John")
  16. person3.lastName=("Smith")
  17. println(person3.firstName)
  18. println(person3.lastName)
  19. println(person3.fullName)
  20. println(person3.toString())
  21. }
  22. }

Hadoop & Java: Connect to Remote Kerberos HDFS using KeyTab

In this tutorial I will show you how to connect to a remote Kerberized HDFS cluster using Java. If you haven’t installed HDFS with Kerberos yet, follow the tutorial.

Import SSL Cert to Java:

Follow this tutorial on “Installing unlimited strength encryption Java libraries”.

If on Windows do the following

  1. #Import it
  2. "C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -import -file hadoop.csr -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts" -alias "hadoop"
  3.  
  4. #Check it
  5. "C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -list -v -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts"
  6.  
  7. #If you want to delete it
  8. "C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -delete -alias hadoop -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts"

POM.xml:

  1. <dependency>
  2. <groupId>org.apache.hadoop</groupId>
  3. <artifactId>hadoop-client</artifactId>
  4. <version>2.9.1</version>
  5. </dependency>

Imports:

  1. import org.apache.hadoop.conf.Configuration;
  2. import org.apache.hadoop.fs.FileStatus;
  3. import org.apache.hadoop.fs.FileSystem;
  4. import org.apache.hadoop.fs.Path;
  5. import org.apache.hadoop.security.UserGroupInformation;

Connect:

  1. // Setup the configuration object.
  2. final Configuration config = new Configuration();
  3.  
  4. config.set("fs.defaultFS", "swebhdfs://hadoop:50470");
  5. config.set("hadoop.security.authentication", "kerberos");
  6. config.set("hadoop.rpc.protection", "integrity");
  7.  
  8. System.setProperty("https.protocols", "TLSv1,TLSv1.1,TLSv1.2");
  9. System.setProperty("java.security.krb5.conf", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\krb5.conf");
  10. System.setProperty("java.security.krb5.realm", "REALM.CA");
  11. System.setProperty("java.security.krb5.kdc", "REALM.CA");
  12. System.setProperty("sun.security.krb5.debug", "true");
  13. System.setProperty("javax.net.debug", "all");
  14. System.setProperty("javax.net.ssl.keyStorePassword","YOURPASSWORD");
  15. System.setProperty("javax.net.ssl.keyStore","C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\cacerts");
  16. System.setProperty("javax.net.ssl.trustStore", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\cacerts");
  17. System.setProperty("javax.net.ssl.trustStorePassword","YOURPASSWORD");
  18. System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");
  19.  
  20. UserGroupInformation.setConfiguration(config);
  21. UserGroupInformation.setLoginUser(UserGroupInformation.loginUserFromKeytabAndReturnUGI("myuser/hadoop@REALM.CA", "c:\\data\\myuser.keytab"));
  22.  
  23. System.out.println(UserGroupInformation.getLoginUser());
  24. System.out.println(UserGroupInformation.getCurrentUser());
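
The imports above pull in FileSystem, FileStatus and Path, but the snippet stops after the login. As a minimal sketch (not part of the original code, and assuming the /user/myuser home directory created in the HDFS tutorial below exists), you can confirm the connection works by listing that directory:

// Sketch: obtain a FileSystem handle using the Kerberos-enabled configuration
// and list the contents of the user's home directory (the path is an assumption).
final FileSystem fileSystem = FileSystem.get(config);

for (final FileStatus status : fileSystem.listStatus(new Path("/user/myuser"))) {
    System.out.println(status.getPath() + " " + status.getLen());
}

fileSystem.close();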

HDFS/Yarn/MapRed: Kerberize/SSL

In this tutorial I will show you how to use Kerberos/SSL with HDFS/Yarn/MapRed. I will use self signed certs for this example. Before you begin ensure you have installed Kerberos Server and Hadoop.

This assumes your hostname is “hadoop”

Create Kerberos Principals

  1. cd /etc/security/keytabs/
  2.  
  3. sudo kadmin.local
  4.  
  5. #You can list principals
  6. listprincs
  7.  
  8. #Create the following principals
  9. addprinc -randkey nn/hadoop@REALM.CA
  10. addprinc -randkey jn/hadoop@REALM.CA
  11. addprinc -randkey dn/hadoop@REALM.CA
  12. addprinc -randkey sn/hadoop@REALM.CA
  13. addprinc -randkey nm/hadoop@REALM.CA
  14. addprinc -randkey rm/hadoop@REALM.CA
  15. addprinc -randkey jhs/hadoop@REALM.CA
  16. addprinc -randkey HTTP/hadoop@REALM.CA
  17.  
  18. #We are going to create a user to access the cluster with later
  19. addprinc -pw hadoop myuser/hadoop@REALM.CA
  20. xst -k myuser.keytab myuser/hadoop@REALM.CA
  21.  
  22. #Create the keytab files.
  23. #You will need these for Hadoop to be able to login
  24. xst -k nn.service.keytab nn/hadoop@REALM.CA
  25. xst -k jn.service.keytab jn/hadoop@REALM.CA
  26. xst -k dn.service.keytab dn/hadoop@REALM.CA
  27. xst -k sn.service.keytab sn/hadoop@REALM.CA
  28. xst -k nm.service.keytab nm/hadoop@REALM.CA
  29. xst -k rm.service.keytab rm/hadoop@REALM.CA
  30. xst -k jhs.service.keytab jhs/hadoop@REALM.CA
  31. xst -k spnego.service.keytab HTTP/hadoop@REALM.CA

Set Keytab Permissions/Ownership

  1. sudo chown root:hadoopuser /etc/security/keytabs/*
  2. sudo chmod 750 /etc/security/keytabs/*

Stop the Cluster

  1. stop-dfs.sh
  2. stop-yarn.sh
  3. mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR stop historyserver

Hosts Update

  1. sudo nano /etc/hosts
  2.  
  3. #Remove 127.0.1.1 line
  4.  
  5. #Change 127.0.0.1 to the following
  6. #Notice that realm.ca is there because we need to tell Hadoop where that host resides
  7. 127.0.0.1 realm.ca hadoop localhost

hadoop-env.sh

We leave HADOOP_SECURE_DN_USER empty because we are using Kerberos (with SASL data transfer protection) rather than a privileged DataNode user.

  1. sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
  2.  
  3. #Locate "export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER}"
  4. #and change to
  5.  
  6. export HADOOP_SECURE_DN_USER=

core-site.xml

  1. nano /usr/local/hadoop/etc/hadoop/core-site.xml
  2.  
  3. <configuration>
  4. <property>
  5. <name>fs.defaultFS</name>
  6. <value>hdfs://NAMENODE:54310</value>
  7. <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming
  8. the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
  9. </property>
  10. <property>
  11. <name>hadoop.tmp.dir</name>
  12. <value>/app/hadoop/tmp</value>
  13. </property>
  14. <property>
  15. <name>hadoop.proxyuser.hadoopuser.hosts</name>
  16. <value>*</value>
  17. </property>
  18. <property>
  19. <name>hadoop.proxyuser.hadoopuser.groups</name>
  20. <value>*</value>
  21. </property>
  22. <property>
  23. <name>hadoop.security.authentication</name>
  24. <value>kerberos</value> <!-- A value of "simple" would disable security. -->
  25. </property>
  26. <property>
  27. <name>hadoop.security.authorization</name>
  28. <value>true</value>
  29. </property>
  30. <property>
  31. <name>hadoop.security.auth_to_local</name>
  32. <value>
  33. RULE:[2:$1@$0](nn/.*@.*REALM.CA)s/.*/hdfs/
  34. RULE:[2:$1@$0](jn/.*@.*REALM.CA)s/.*/hdfs/
  35. RULE:[2:$1@$0](dn/.*@.*REALM.CA)s/.*/hdfs/
  36. RULE:[2:$1@$0](sn/.*@.*REALM.CA)s/.*/hdfs/
  37. RULE:[2:$1@$0](nm/.*@.*REALM.CA)s/.*/yarn/
  38. RULE:[2:$1@$0](rm/.*@.*REALM.CA)s/.*/yarn/
  39. RULE:[2:$1@$0](jhs/.*@.*REALM.CA)s/.*/mapred/
  40. DEFAULT
  41. </value>
  42. </property>
  43. <property>
  44. <name>hadoop.rpc.protection</name>
  45. <value>integrity</value>
  46. </property>
  47. <property>
  48. <name>hadoop.ssl.require.client.cert</name>
  49. <value>false</value>
  50. </property>
  51. <property>
  52. <name>hadoop.ssl.hostname.verifier</name>
  53. <value>DEFAULT</value>
  54. </property>
  55. <property>
  56. <name>hadoop.ssl.keystores.factory.class</name>
  57. <value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value>
  58. </property>
  59. <property>
  60. <name>hadoop.ssl.server.conf</name>
  61. <value>ssl-server.xml</value>
  62. </property>
  63. <property>
  64. <name>hadoop.ssl.client.conf</name>
  65. <value>ssl-client.xml</value>
  66. </property>
  67. </configuration>
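
If you want to sanity check the auth_to_local rules above, here is a small Java sketch (an addition of mine, not part of the original tutorial; it assumes this core-site.xml is on the classpath) that prints the short name a principal maps to:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.HadoopKerberosName;

public class AuthToLocalCheck {
  public static void main(final String[] args) throws Exception {
    //Loads core-site.xml (and therefore hadoop.security.auth_to_local) from the classpath
    final Configuration conf = new Configuration();
    HadoopKerberosName.setConfiguration(conf);

    //Expect "hdfs" for the NameNode principal and "yarn" for the ResourceManager principal
    System.out.println(new HadoopKerberosName("nn/hadoop@REALM.CA").getShortName());
    System.out.println(new HadoopKerberosName("rm/hadoop@REALM.CA").getShortName());
  }
}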

ssl-server.xml

Copy ssl-server.xml.example to ssl-server.xml

  1. cp /usr/local/hadoop/etc/hadoop/ssl-server.xml.example /usr/local/hadoop/etc/hadoop/ssl-server.xml
  2.  
  3. nano /usr/local/hadoop/etc/hadoop/ssl-server.xml

Update properties

  1. <configuration>
  2. <property>
  3. <name>ssl.server.truststore.location</name>
  4. <value>/etc/security/serverKeys/truststore.jks</value>
  5. <description>Truststore to be used by NN and DN. Must be specified.</description>
  6. </property>
  7. <property>
  8. <name>ssl.server.truststore.password</name>
  9. <value>PASSWORD</value>
  10. <description>Optional. Default value is "".</description>
  11. </property>
  12. <property>
  13. <name>ssl.server.truststore.type</name>
  14. <value>jks</value>
  15. <description>Optional. The keystore file format, default value is "jks".</description>
  16. </property>
  17. <property>
  18. <name>ssl.server.truststore.reload.interval</name>
  19. <value>10000</value>
  20. <description>Truststore reload check interval, in milliseconds. Default value is 10000 (10 seconds).</description>
  21. </property>
  22. <property>
  23. <name>ssl.server.keystore.location</name>
  24. <value>/etc/security/serverKeys/keystore.jks</value>
  25. <description>Keystore to be used by NN and DN. Must be specified.</description>
  26. </property>
  27. <property>
  28. <name>ssl.server.keystore.password</name>
  29. <value>PASSWORD</value>
  30. <description>Must be specified.</description>
  31. </property>
  32. <property>
  33. <name>ssl.server.keystore.keypassword</name>
  34. <value>PASSWORD</value>
  35. <description>Must be specified.</description>
  36. </property>
  37. <property>
  38. <name>ssl.server.keystore.type</name>
  39. <value>jks</value>
  40. <description>Optional. The keystore file format, default value is "jks".</description>
  41. </property>
  42. <property>
  43. <name>ssl.server.exclude.cipher.list</name>
  44. <value>TLS_ECDHE_RSA_WITH_RC4_128_SHA,SSL_DHE_RSA_EXPORT_WITH_DES40_CBC_SHA,
  45. SSL_RSA_WITH_DES_CBC_SHA,SSL_DHE_RSA_WITH_DES_CBC_SHA,
  46. SSL_RSA_EXPORT_WITH_RC4_40_MD5,SSL_RSA_EXPORT_WITH_DES40_CBC_SHA,
  47. SSL_RSA_WITH_RC4_128_MD5</value>
  48. <description>Optional. The weak security cipher suites that you want excluded from SSL communication.</description>
  49. </property>
  50. </configuration>

ssl-client.xml

Copy ssl-client.xml.example to ssl-client.xml

  1. cp /usr/local/hadoop/etc/hadoop/ssl-client.xml.example /usr/local/hadoop/etc/hadoop/ssl-client.xml
  2.  
  3. nano /usr/local/hadoop/etc/hadoop/ssl-client.xml

Update properties

  1. <configuration>
  2. <property>
  3. <name>ssl.client.truststore.location</name>
  4. <value>/etc/security/serverKeys/truststore.jks</value>
  5. <description>Truststore to be used by clients like distcp. Must be specified.</description>
  6. </property>
  7. <property>
  8. <name>ssl.client.truststore.password</name>
  9. <value>PASSWORD</value>
  10. <description>Optional. Default value is "".</description>
  11. </property>
  12. <property>
  13. <name>ssl.client.truststore.type</name>
  14. <value>jks</value>
  15. <description>Optional. The keystore file format, default value is "jks".</description>
  16. </property>
  17. <property>
  18. <name>ssl.client.truststore.reload.interval</name>
  19. <value>10000</value>
  20. <description>Truststore reload check interval, in milliseconds. Default value is 10000 (10 seconds).</description>
  21. </property>
  22. <property>
  23. <name>ssl.client.keystore.location</name>
  24. <value></value>
  25. <description>Keystore to be used by clients like distcp. Must be specified.</description>
  26. </property>
  27. <property>
  28. <name>ssl.client.keystore.password</name>
  29. <value></value>
  30. <description>Optional. Default value is "".</description>
  31. </property>
  32. <property>
  33. <name>ssl.client.keystore.keypassword</name>
  34. <value></value>
  35. <description>Optional. Default value is "".</description>
  36. </property>
  37. <property>
  38. <name>ssl.client.keystore.type</name>
  39. <value>jks</value>
  40. <description>Optional. The keystore file format, default value is "jks".</description>
  41. </property>
  42. </configuration>

mapred-site.xml

Add the following to the config so MapReduce knows which Kerberos keytab and principal the Job History Server should use.

  1. nano /usr/local/hadoop/etc/hadoop/mapred-site.xml
  2.  
  3. <property>
  4. <name>mapreduce.jobhistory.keytab</name>
  5. <value>/etc/security/keytabs/jhs.service.keytab</value>
  6. </property>
  7. <property>
  8. <name>mapreduce.jobhistory.principal</name>
  9. <value>jhs/_HOST@REALM.CA</value>
  10. </property>
  11. <property>
  12. <name>mapreduce.jobhistory.http.policy</name>
  13. <value>HTTPS_ONLY</value>
  14. </property>

hdfs-site.xml

Add the following properties

  1. nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
  2.  
  3. <property>
  4. <name>dfs.http.policy</name>
  5. <value>HTTPS_ONLY</value>
  6. </property>
  7. <property>
  8. <name>hadoop.ssl.enabled</name>
  9. <value>true</value>
  10. </property>
  11. <property>
  12. <name>dfs.datanode.https.address</name>
  13. <value>NAMENODE:50475</value>
  14. </property>
  15. <property>
  16. <name>dfs.namenode.https-address</name>
  17. <value>NAMENODE:50470</value>
  18. <description>Your NameNode hostname for https access.</description>
  19. </property>
  20. <property>
  21. <name>dfs.namenode.secondary.https-address</name>
  22. <value>NAMENODE:50091</value>
  23. <description>Your Secondary NameNode hostname for https access.</description>
  24. </property>
  25. <property>
  26. <name>dfs.namenode.https-bind-host</name>
  27. <value>0.0.0.0</value>
  28. </property>
  29. <property>
  30. <name>dfs.block.access.token.enable</name>
  31. <value>true</value>
  32. <description>If "true", access tokens are used as capabilities for accessing DataNodes. If "false", no access tokens are checked when accessing DataNodes.</description>
  33. </property>
  34. <property>
  35. <name>dfs.namenode.kerberos.principal</name>
  36. <value>nn/_HOST@REALM.CA</value>
  37. <description> Kerberos principal name for the NameNode</description>
  38. </property>
  39. <property>
  40. <name>dfs.secondary.namenode.kerberos.principal</name>
  41. <value>sn/_HOST@REALM.CA</value>
  42. <description>Kerberos principal name for the secondary NameNode.</description>
  43. </property>
  44. <property>
  45. <name>dfs.web.authentication.kerberos.keytab</name>
  46. <value>/etc/security/keytabs/spnego.service.keytab</value>
  47. <description>The Kerberos keytab file with the credentials for the HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint.</description>
  48. </property>
  49. <property>
  50. <name>dfs.namenode.keytab.file</name>
  51. <value>/etc/security/keytabs/nn.service.keytab</value>
  52. <description>Combined keytab file containing the namenode service and host principals.</description>
  53. </property>
  54. <property>
  55. <name>dfs.datanode.keytab.file</name>
  56. <value>/etc/security/keytabs/dn.service.keytab</value>
  57. <description>The filename of the keytab file for the DataNode.</description>
  58. </property>
  59. <property>
  60. <name>dfs.datanode.kerberos.principal</name>
  61. <value>dn/_HOST@REALM.CA</value>
  62. <description>The Kerberos principal that the DataNode runs as. "_HOST" is replaced by the real host name.</description>
  63. </property>
  64. <property>
  65. <name>dfs.namenode.kerberos.internal.spnego.principal</name>
  66. <value>${dfs.web.authentication.kerberos.principal}</value>
  67. </property>
  68. <property>
  69. <name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>
  70. <value>${dfs.web.authentication.kerberos.principal}</value>
  71. </property>
  72. <property>
  73. <name>dfs.web.authentication.kerberos.principal</name>
  74. <value>HTTP/_HOST@REALM.CA</value>
  75. <description>The HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint.</description>
  76. </property>
  77. <property>
  78. <name>dfs.data.transfer.protection</name>
  79. <value>integrity</value>
  80. </property>
  81. <property>
  82. <name>dfs.datanode.address</name>
  83. <value>NAMENODE:50010</value>
  84. </property>
  85. <property>
  86. <name>dfs.secondary.namenode.keytab.file</name>
  87. <value>/etc/security/keytabs/sn.service.keytab</value>
  88. </property>
  89. <property>
  90. <name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>
  91. <value>HTTP/_HOST@REALM.CA</value>
  92. </property>
  93. <property>
  94. <name>dfs.webhdfs.enabled</name>
  95. <value>true</value>
  96. </property>

Remove the following properties

dfs.namenode.http-address
dfs.namenode.secondary.http-address
dfs.namenode.http-bind-host

yarn-site.xml

Add the following properties

nano /usr/local/hadoop/etc/hadoop/yarn-site.xml

<property>
	<name>yarn.http.policy</name>
	<value>HTTPS_ONLY</value>
</property>
<property>
	<name>yarn.resourcemanager.webapp.https.address</name>
	<value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<property>
	<name>yarn.resourcemanager.hostname</name>
	<value>NAMENODE</value>
</property>
<property>
	<name>yarn.nodemanager.bind-host</name>
	<value>0.0.0.0</value>
</property>
<property>
	<name>yarn.nodemanager.webapp.address</name>
	<value>${yarn.nodemanager.hostname}:8042</value>
</property>
<property>
	<name>yarn.resourcemanager.principal</name>
	<value>rm/_HOST@REALM.CA</value>
</property>
<property>
	<name>yarn.resourcemanager.keytab</name>
	<value>/etc/security/keytabs/rm.service.keytab</value>
</property>
<property>
	<name>yarn.nodemanager.principal</name>
	<value>nm/_HOST@REALM.CA</value>
</property>
<property>
	<name>yarn.nodemanager.keytab</name>
	<value>/etc/security/keytabs/nm.service.keytab</value>
</property>
<property>
	<name>yarn.nodemanager.hostname</name>
	<value>NAMENODE</value>
</property>
<property>
	<name>yarn.resourcemanager.bind-host</name>
	<value>0.0.0.0</value>
</property>
<property>
	<name>yarn.timeline-service.bind-host</name>
	<value>0.0.0.0</value>
</property>

Remove the following properties

yarn.resourcemanager.webapp.address

SSL

Setup SSL Directories

sudo mkdir -p /etc/security/serverKeys
sudo chown -R root:hadoopuser /etc/security/serverKeys/
sudo chmod 755 /etc/security/serverKeys/

cd /etc/security/serverKeys

Setup Keystore

sudo keytool -genkey -alias NAMENODE -keyalg RSA -keysize 1024 -dname "CN=NAMENODE,OU=ORGANIZATION_UNIT,C=canada" -keypass PASSWORD -keystore /etc/security/serverKeys/keystore.jks -storepass PASSWORD
sudo keytool -export -alias NAMENODE -keystore /etc/security/serverKeys/keystore.jks -rfc -file /etc/security/serverKeys/NAMENODE.csr -storepass PASSWORD

Setup Truststore

sudo keytool -import -noprompt -alias NAMENODE -file /etc/security/serverKeys/NAMENODE.csr -keystore /etc/security/serverKeys/truststore.jks -storepass PASSWORD

Generate Self Signed Certificate

sudo openssl genrsa -out /etc/security/serverKeys/NAMENODE.key 2048

sudo openssl req -x509 -new -key /etc/security/serverKeys/NAMENODE.key -days 300 -out /etc/security/serverKeys/NAMENODE.pem

sudo keytool -keystore /etc/security/serverKeys/keystore.jks -alias NAMENODE -certreq -file /etc/security/serverKeys/NAMENODE.cert -storepass PASSWORD -keypass PASSWORD

sudo openssl x509 -req -CA /etc/security/serverKeys/NAMENODE.pem -CAkey /etc/security/serverKeys/NAMENODE.key -in /etc/security/serverKeys/NAMENODE.cert -out /etc/security/serverKeys/NAMENODE.signed -days 300 -CAcreateserial

Setup File Permissions

sudo chmod 440 /etc/security/serverKeys/*
sudo chown root:hadoopuser /etc/security/serverKeys/*

Start the Cluster

start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver

Create User Directory

kinit -kt /etc/security/keytabs/myuser.keytab myuser/hadoop@REALM.CA
#ensure the login worked
klist

#Create hdfs directory now
hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/myuser

#remove kerberos ticket
kdestroy

URL

https://NAMENODE:50470
https://NAMENODE:50475
https://NAMENODE:8090

References

https://www.ibm.com/support/knowledgecenter/en/SSPT3X_4.2.0/com.ibm.swg.im.infosphere.biginsights.admin.doc/doc/admin_ssl_hbase_mr_yarn_hdfs_web.html

ElasticSearch: High Level Client Search Scrolling

This entry is part 4 of 4 in the series ElasticSearch High Level Rest Client

In this tutorial I will show you how to perform a search scroll using the high level client. If you have not already done so please follow the search tutorial.

The reason for following the search tutorial first is that it sets up the search; from there you only have to add a few more steps.

Imports:

import org.elasticsearch.action.search.SearchScrollRequest;
import org.elasticsearch.common.unit.TimeValue;

Modify the “SearchRequest” to enable scrolling. A recommended timeout is 60000 ms (1m).

request.scroll(new TimeValue(60000));

Once you perform the initial search you will get a “scrollId”. Use it to create a new “SearchScrollRequest”. One thing to note is the scroll request’s timeout value: set it, or scrolling may not work.

final SearchScrollRequest searchScrollRequest = new SearchScrollRequest(scrollId);
searchScrollRequest.scroll(new TimeValue(60000));

We can now reuse the searchResponse from the initial search to continue scrolling through the results.

searchResponse = client.searchScroll(searchScrollRequest);

We know there are no more results when the scrollId is null or when the getHits length is 0 (see the full scroll loop sketch below).

searchResponse.getHits().getHits().length > 0
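
Putting it all together, here is a minimal sketch of the full scroll loop. It assumes the "client" object from the connection tutorial, the "request" from the search tutorial, and the same imports as those tutorials:

request.scroll(new TimeValue(60000));

SearchResponse searchResponse = client.search(request);
String scrollId = searchResponse.getScrollId();

while (scrollId != null && searchResponse.getHits().getHits().length > 0) {
    for (final SearchHit hit : searchResponse.getHits()) {
        //Process each hit here
    }

    //Request the next page of results using the scrollId from the previous response
    final SearchScrollRequest searchScrollRequest = new SearchScrollRequest(scrollId);
    searchScrollRequest.scroll(new TimeValue(60000));

    searchResponse = client.searchScroll(searchScrollRequest);
    scrollId = searchResponse.getScrollId();
}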

ElasticSearch: High Level Client Search

This entry is part 3 of 4 in the series ElasticSearch High Level Rest Client

In this tutorial I will show you how to perform a search using the high level client. If you have not already done so please connect to ElasticSearch.

Imports

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.action.search.SearchType;

Now we can perform the search.

final SearchRequest request = new SearchRequest();
request.searchType(SearchType.QUERY_THEN_FETCH);

final String[] types = { "doc" };
final String[] indexes = { "index" };

//Specify the types that your search applies to.
//Note that this is not needed. If omitted it will search all.
request.types(types);

//Specify the indexes that your search applies to.
//Note that this is not needed. If omitted it will search all.
request.indices(indexes);

final SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
//You can add any type of query into this query. Adjust to what you need.
searchSourceBuilder.query(MyQuery);
request.source(searchSourceBuilder);

final SearchResponse searchResponse = client.search(request);

//This will let us know if the search was terminated early.
final Boolean terminatedEarly = searchResponse.isTerminatedEarly();
//This will let us know if it timed out.
final boolean timedOut = searchResponse.isTimedOut();

//Now to loop through our hits to do what we need to
final SearchHits searchHits = searchResponse.getHits();
for (final SearchHit hit : searchHits) {
  //Do work
}
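
As an illustration of what “Do work” could look like (a sketch of mine, not from the original), you can read each hit’s id and source map:

//Sketch: print the document id and the source of each hit
for (final SearchHit hit : searchHits) {
    final String id = hit.getId();
    final java.util.Map<String, Object> source = hit.getSourceAsMap();
    System.out.println(id + " -> " + source);
}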

ElasticSearch: High Level Client Post

This entry is part 2 of 4 in the series ElasticSearch High Level Rest Client

In this tutorial I will show you how to perform a POST request. If you have not connected yet, please do so before continuing.

Imports

import java.util.Collections;

import org.apache.http.HttpEntity;
import org.apache.http.nio.entity.NStringEntity;
import org.apache.http.entity.ContentType;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Response;

Now we can perform the POST to ElasticSearch.

final Integer id = 1;
final String document = "{\"key\": 1 }";
final HttpEntity httpEntity = new NStringEntity(document, ContentType.APPLICATION_JSON);

final Response response = restHighLevelClient.getLowLevelClient().performRequest("POST", "/indexName/indexType/" + id, Collections.<String, String>emptyMap(), httpEntity);

//Now you can print the response
System.out.println(EntityUtils.toString(response.getEntity()));
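
The low level Response also exposes the HTTP status line, which is handy for confirming the document was indexed (typically 201 Created for a new document). A small sketch:

//Check the HTTP status code of the POST
final int statusCode = response.getStatusLine().getStatusCode();
System.out.println("HTTP status: " + statusCode);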

ElasticSearch: Low Level Client Get

This entry is part 3 of 3 in the series ElasticSearch Low Level Rest Client

In this tutorial I will show you how to get (search) a JSON document from ElasticSearch using the low level client. If you have not first connected to ElasticSearch please do so before continuing.

POM.xml

<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.9.5</version>
</dependency>

Imports

import java.util.Collections;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import org.apache.http.HttpEntity;
import org.apache.http.nio.entity.NStringEntity;
import org.apache.http.entity.ContentType;
import org.elasticsearch.client.Response;
import org.apache.http.util.EntityUtils;

Now perform the GET request using the low level client.

final ObjectMapper objectMapper = new ObjectMapper();

//Build the search query as JSON
final JsonNode document = objectMapper.readTree("{" +
   " \"query\": {" +
   " \"match\" : {" +
   " \"key\" : 1 }}}");
final HttpEntity httpEntity = new NStringEntity(document.toString(), ContentType.APPLICATION_JSON);
final Response response = restClient.performRequest("GET", "/indexName/indexType/_search", Collections.<String, String>emptyMap(), httpEntity);
 
//Now you can print the response
System.out.println(EntityUtils.toString(response.getEntity()));

//OR get the content
final JsonNode content = objectMapper.readTree(response.getEntity().getContent());
System.out.println(content);
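
If you want to work with the individual hits rather than the whole response, here is a short sketch (assuming the standard _search response layout of hits.hits[]) using the parsed JsonNode:

//Loop over the returned hits and print the id and source of each
for (final JsonNode hit : content.path("hits").path("hits")) {
    System.out.println(hit.path("_id").asText() + " -> " + hit.path("_source"));
}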

ElasticSearch: Low Level Client Put

This entry is part 2 of 3 in the series ElasticSearch Low Level Rest Client

In this tutorial I will show you how to put a json document into ElasticSearch. If you have not first connected to ElasticSearch please do so before continuing.

Imports

import java.util.Collections;

import org.apache.http.HttpEntity;
import org.apache.http.nio.entity.NStringEntity;
import org.elasticsearch.client.Response;
import org.apache.http.entity.ContentType;
import org.apache.http.util.EntityUtils;

Now perform the PUT request using the low level client.

final String document = "{\"key\": 1 }";
final HttpEntity httpEntity = new NStringEntity(document, ContentType.APPLICATION_JSON);
final Integer id = 1;
final Response response = restClient.performRequest("PUT", "/indexName/indexType/" + id, Collections.<String, String>emptyMap(), httpEntity);

//Now you can print the response
System.out.println(EntityUtils.toString(response.getEntity()));

ElasticSearch: High Level Rest Client Connection

This entry is part 1 of 4 in the series ElasticSearch High Level Rest Client

In this tutorial I will show you how to use the ElasticSearch high level rest client.

First you will need to add the high level rest client to the pom.

<properties>
	<elasticSearch.version>6.2.4</elasticSearch.version>
</properties>

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>${elasticSearch.version}</version>
</dependency>

Next you will need to specify the imports.

import java.util.List;
import java.util.ArrayList;
import java.util.Arrays;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;

Now you can connect to ElasticSearch.

final List<String> hosts = new ArrayList<>(Arrays.asList("localhost"));
final Integer port = 9200;
final String scheme = "http";
		
final HttpHost[] httpHosts = hosts.stream().map(host -> new HttpHost(host, port, scheme)).toArray(HttpHost[]::new);

final RestClientBuilder restClientBuilder = RestClient.builder(httpHosts);
final RestHighLevelClient restHighLevelClient = new RestHighLevelClient(restClientBuilder);

Now you can do whatever you need to!
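
For example (a quick sketch of mine to verify the connection works), you can ping the cluster before issuing real requests:

//Returns true if the cluster responds
final boolean clusterUp = restHighLevelClient.ping();
System.out.println("Cluster reachable: " + clusterUp);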