Hive Kerberos Installation

We are going to install Hive on top of Hadoop and run a basic query. Before starting, make sure you have installed Kerberos and Hadoop with Kerberos enabled.

This guide assumes your hostname is “hadoop” and your Kerberos realm is “REALM.CA”.

Download Hive:

  1. wget http://apache.forsale.plus/hive/hive-2.3.3/apache-hive-2.3.3-bin.tar.gz
  2. tar -xzf apache-hive-2.3.3-bin.tar.gz
  3. sudo mv apache-hive-2.3.3-bin /usr/local/hive
  4. sudo chown -R root:hadoopuser /usr/local/hive/

Setup .bashrc:

  1. sudo nano ~/.bashrc

Add the following to the end of the file.

#HIVE VARIABLES START
export HIVE_HOME=/usr/local/hive
export HIVE_CONF_DIR=/usr/local/hive/conf
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/hadoop/lib/*:/usr/local/hive/lib/*
#HIVE VARIABLES STOP

  1. source ~/.bashrc
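To confirm the new variables resolve correctly, a quick sanity check can help (the exports are repeated here so the snippet stands on its own):

```shell
# Re-declare the Hive variables from .bashrc and confirm they expand
# to the expected paths before moving on.
export HIVE_HOME=/usr/local/hive
export HIVE_CONF_DIR=$HIVE_HOME/conf
export PATH=$PATH:$HIVE_HOME/bin
echo "$HIVE_CONF_DIR"   # should print /usr/local/hive/conf
```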

Create the warehouse on HDFS

  1. kinit -kt /etc/security/keytabs/myuser.keytab myuser/hadoop@REALM.CA
  2. hdfs dfs -mkdir -p /user/hive/warehouse
  3. hdfs dfs -mkdir /tmp
  4. hdfs dfs -chmod g+w /tmp
  5. hdfs dfs -chmod g+w /user/hive/warehouse

Create Kerberos Principals

  1. cd /etc/security/keytabs
  2. sudo kadmin.local
  3. addprinc -randkey hive/hadoop@REALM.CA
  4. addprinc -randkey hivemetastore/hadoop@REALM.CA
  5. addprinc -randkey hive-spnego/hadoop@REALM.CA
  6. xst -kt hive.service.keytab hive/hadoop@REALM.CA
  7. xst -kt hivemetastore.service.keytab hivemetastore/hadoop@REALM.CA
  8. xst -kt hive-spnego.service.keytab hive-spnego/hadoop@REALM.CA
  9. q
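The three principal/keytab pairs above follow one pattern, so (as a sketch, assuming the same host “hadoop” and realm REALM.CA) the kadmin commands can be generated and piped into kadmin.local instead of typed one by one:

```shell
# Build the addprinc/xst command pairs for each Hive service principal.
cmds=$(for svc in hive hivemetastore hive-spnego; do
  printf 'addprinc -randkey %s/hadoop@REALM.CA\n' "$svc"
  printf 'xst -kt %s.service.keytab %s/hadoop@REALM.CA\n' "$svc" "$svc"
done)
echo "$cmds"
# Run them in one shot (needs kadmin.local on this host):
# echo "$cmds" | sudo kadmin.local
```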

Set Keytab Permissions/Ownership

  1. sudo chown root:hadoopuser /etc/security/keytabs/*
  2. sudo chmod 750 /etc/security/keytabs/*

hive-env.sh

  1. cd $HIVE_HOME/conf
  2. sudo cp hive-env.sh.template hive-env.sh
  3.  
  4. sudo nano /usr/local/hive/conf/hive-env.sh
  5.  
  6. #locate "HADOOP_HOME" and change to be this
  7. export HADOOP_HOME=/usr/local/hadoop
  8.  
  9. #locate "HIVE_CONF_DIR" and change to be this
  10. export HIVE_CONF_DIR=/usr/local/hive/conf

hive-site.xml

Check out this link for the configuration properties.

  1. sudo cp /usr/local/hive/conf/hive-default.xml.template /usr/local/hive/conf/hive-site.xml
  2.  
  3. sudo nano /usr/local/hive/conf/hive-site.xml
  4.  
  5. #Modify the following properties
  6.  
  7. <property>
  8. <name>system:user.name</name>
  9. <value>${user.name}</value>
  10. </property>
  11. <property>
  12. <name>javax.jdo.option.ConnectionURL</name>
  13. <value>jdbc:postgresql://myhost:5432/metastore</value>
  14. </property>
  15. <property>
  16. <name>javax.jdo.option.ConnectionDriverName</name>
  17. <value>org.postgresql.Driver</value>
  18. </property>
  19. <property>
  20. <name>hive.metastore.warehouse.dir</name>
  21. <value>/user/hive/warehouse</value>
  22. </property>
  23. <property>
  24. <name>javax.jdo.option.ConnectionUserName</name>
  25. <value>hiveuser</value>
  26. </property>
  27. <property>
  28. <name>javax.jdo.option.ConnectionPassword</name>
  29. <value>PASSWORD</value>
  30. </property>
  31. <property>
  32. <name>hive.exec.local.scratchdir</name>
  33. <value>/tmp/${system:user.name}</value>
  34. <description>Local scratch space for Hive jobs</description>
  35. </property>
  36. <property>
  37. <name>hive.querylog.location</name>
  38. <value>/tmp/${system:user.name}</value>
  39. <description>Location of Hive run time structured log file</description>
  40. </property>
  41. <property>
  42. <name>hive.downloaded.resources.dir</name>
  43. <value>/tmp/${hive.session.id}_resources</value>
  44. <description>Temporary local directory for added resources in the remote file system.</description>
  45. </property>
  46. <property>
  47. <name>hive.server2.logging.operation.log.location</name>
  48. <value>/tmp/${system:user.name}/operation_logs</value>
  49. <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
  50. </property>
  51. <property>
  52. <name>hive.metastore.uris</name>
  53. <value>thrift://0.0.0.0:9083</value>
  54. <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
  55. </property>
  56. <property>
  57. <name>hive.server2.webui.host</name>
  58. <value>0.0.0.0</value>
  59. </property>
  60. <property>
  61. <name>hive.server2.webui.port</name>
  62. <value>10002</value>
  63. </property>
  64. <property>
  65. <name>hive.metastore.port</name>
  66. <value>9083</value>
  67. </property>
  68. <property>
  69. <name>hive.server2.transport.mode</name>
  70. <value>binary</value>
  71. </property>
  72. <property>
  73. <name>hive.server2.thrift.sasl.qop</name>
  74. <value>auth-int</value>
  75. </property>
  76. <property>
  77. <name>hive.server2.authentication</name>
  78. <value>KERBEROS</value>
  79. <description>authentication type</description>
  80. </property>
  81. <property>
  82. <name>hive.server2.authentication.kerberos.principal</name>
  83. <value>hive/_HOST@REALM.CA</value>
  84. <description>HiveServer2 principal. If _HOST is used as the FQDN portion, it will be replaced with the actual hostname of the running instance.</description>
  85. </property>
  86. <property>
  87. <name>hive.server2.authentication.kerberos.keytab</name>
  88. <value>/etc/security/keytabs/hive.service.keytab</value>
  89. <description>Keytab file for HiveServer2 principal</description>
  90. </property>
  91. <property>
  92. <name>hive.metastore.sasl.enabled</name>
  93. <value>true</value>
  94. <description>If true, the metastore thrift interface will be secured with SASL. Clients
  95. must authenticate with Kerberos.</description>
  96. </property>
  97. <property>
  98. <name>hive.metastore.kerberos.keytab.file</name>
  99. <value>/etc/security/keytabs/hivemetastore.service.keytab</value>
  100. <description>The path to the Kerberos Keytab file containing the metastore thrift
  101. server's service principal.</description>
  102. </property>
  103. <property>
  104. <name>hive.metastore.kerberos.principal</name>
  105. <value>hivemetastore/_HOST@REALM.CA</value>
  106. <description>The service principal for the metastore thrift server. The special string _HOST will be replaced automatically with the correct host name.</description>
  107. </property>
  108. <property>
  109. <name>hive.security.authorization.enabled</name>
  110. <value>true</value>
  111. <description>enable or disable the hive client authorization</description>
  112. </property>
  113. <property>
  114. <name>hive.metastore.pre.event.listeners</name>
  115. <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
  116. <description>List of comma separated listeners for metastore events.</description>
  117. </property>
  118. <property>
  119. <name>hive.security.metastore.authorization.manager</name>
  120. <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
  121. <description>
  122. Names of authorization manager classes (comma separated) to be used in the metastore
  123. for authorization. The user defined authorization class should implement interface
  124. org.apache.hadoop.hive.ql.security.authorization.HiveMetastoreAuthorizationProvider.
  125. All authorization manager classes have to successfully authorize the metastore API
  126. call for the command execution to be allowed.
  127. </description>
  128. </property>
  129. <property>
  130. <name>hive.security.metastore.authenticator.manager</name>
  131. <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
  132. <description>
  133. authenticator manager class name to be used in the metastore for authentication.
  134. The user defined authenticator should implement interface org.apache.hadoop.hive.ql.security.HiveAuthenticationProvider.
  135. </description>
  136. </property>
  137. <property>
  138. <name>hive.security.metastore.authorization.auth.reads</name>
  139. <value>true</value>
  140. <description>If this is true, metastore authorizer authorizes read actions on database, table</description>
  141. </property>
  142. <property>
  143. <name>datanucleus.autoCreateSchema</name>
  144. <value>false</value>
  145. </property>
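hive-site.xml is long, and a single stray tag will stop Hive from starting. A quick well-formedness check is possible with Python's standard-library XML parser (a sketch: it validates a small sample fragment here, but on a real box you would point it at /usr/local/hive/conf/hive-site.xml):

```shell
# Write a sample fragment, then parse it; a malformed file raises an error
# instead of printing "well-formed".
cat > /tmp/site-check.xml <<'EOF'
<configuration>
  <property>
    <name>hive.metastore.port</name>
    <value>9083</value>
  </property>
</configuration>
EOF
python3 -c "import xml.etree.ElementTree as ET; ET.parse('/tmp/site-check.xml'); print('well-formed')"
```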

Hadoop core-site.xml

Notice that it is the “hive” user that appears in the proxyuser property names when using storage-based authentication.

  1. sudo nano /usr/local/hadoop/etc/hadoop/core-site.xml
  2.  
  3. <property>
  4. <name>hadoop.proxyuser.hive.hosts</name>
  5. <value>*</value>
  6. </property>
  7. <property>
  8. <name>hadoop.proxyuser.hive.groups</name>
  9. <value>*</value>
  10. </property>

Install Postgres 9.6

Follow this guide to install PostgreSQL 9.6.

  1. sudo su - postgres
  2. psql
  3.  
  4. CREATE USER hiveuser WITH PASSWORD 'PASSWORD';
  5. CREATE DATABASE metastore;
  6. GRANT ALL PRIVILEGES ON DATABASE metastore TO hiveuser;
  7. \q
  8. exit

Initialize the Postgres Schema

  1. schematool -dbType postgres -initSchema

Start Metastore & HiveServer2

Note: the commands below write to /var/log/hive/, so create that directory first (the Auto Start section below shows how).

  1. nohup /usr/local/hive/bin/hive --service metastore --hiveconf hive.log.file=hivemetastore.log >/var/log/hive/hivemetastore.out 2>/var/log/hive/hivemetastoreerr.log &
  2.  
  3. nohup /usr/local/hive/bin/hiveserver2 --hiveconf hive.metastore.uris=" " --hiveconf hive.log.file=hiveserver2.log >/var/log/hive/hiveserver2.out 2> /var/log/hive/hiveserver2err.log &
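Both daemons take a while to come up. Rather than tailing the logs, a small helper (a sketch: it is bash-specific because of /dev/tcp; ports 9083 and 10000 are the metastore and HiveServer2 values configured above) can poll until a port answers:

```shell
# Poll a TCP port until it answers or the attempts run out.
wait_for_port() {
  local host=$1 port=$2 tries=${3:-30}
  local i
  for i in $(seq "$tries"); do
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      echo "up"
      return 0
    fi
    sleep 1
  done
  echo "down"
  return 1
}

# Usage on the Hive host:
# wait_for_port localhost 9083    # metastore
# wait_for_port localhost 10000   # HiveServer2
```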

Auto Start

  1. sudo mkdir /var/log/hive/
  2. sudo chown root:hadoopuser /var/log/hive
  3. sudo chmod 777 /var/log/hive
  4.  
  5. crontab -e
  6.  
  7. #Add the following
  8. @reboot nohup /usr/local/hive/bin/hive --service metastore --hiveconf hive.log.file=hivemetastore.log >/var/log/hive/hivemetastore.out 2>/var/log/hive/hivemetastoreerr.log &
  9. @reboot nohup /usr/local/hive/bin/hiveserver2 --hiveconf hive.metastore.uris=" " --hiveconf hive.log.file=hiveserver2.log >/var/log/hive/hiveserver2.out 2> /var/log/hive/hiveserver2err.log &

Now you can check the Hive version

  1. hive --version

Hive Web URL

http://hadoop:10002/

Beeline

  1. #We first need a ticket to access beeline, using the hive Kerberos user we set up earlier.
  2. kinit -kt /etc/security/keytabs/hive.service.keytab hive/hadoop@REALM.CA
  3.  
  4. #Now we can get into beeline using that principal
  5. beeline -u "jdbc:hive2://0.0.0.0:10000/default;principal=hive/hadoop@REALM.CA;"
  6.  
  7. #You can also just get into beeline then connect from there
  8. beeline
  9. beeline>!connect jdbc:hive2://0.0.0.0:10000/default;principal=hive/hadoop@REALM.CA
  10.  
  11. #Disconnect from beeline
  12. !q
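The introduction promised a basic query; a minimal smoke test might look like the following sketch (the table name smoke_test is arbitrary, and running it assumes the hive ticket from the kinit above is still valid):

```shell
# Keep the query batch in a variable so it can be inspected or reused.
SMOKE_SQL="
CREATE TABLE IF NOT EXISTS smoke_test (id INT, name STRING);
INSERT INTO smoke_test VALUES (1, 'hive works');
SELECT * FROM smoke_test;
DROP TABLE smoke_test;"
echo "$SMOKE_SQL"
# Run it (needs the live cluster and a valid Kerberos ticket):
# beeline -u "jdbc:hive2://hadoop:10000/default;principal=hive/hadoop@REALM.CA" -e "$SMOKE_SQL"
```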

References

http://www.bogotobogo.com/Hadoop/BigData_hadoop_Hive_Install_On_Ubuntu_16_04.php
https://maprdocs.mapr.com/home/Hive/Config-RemotePostgreSQLForHiveMetastore.html
https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool#HiveSchemaTool-TheHiveSchemaTool
https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-InstallationandConfiguration