This tutorial will show you some common usage for working with databases. If you have not installed Hive yet, please follow this tutorial.
Create Database
CREATE DATABASE test;
Use Database
use test;
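To confirm the database was created, you can list all databases:
SHOW DATABASES;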
This tutorial will show you how to use a struct. If you have not installed Hive yet, please follow this tutorial.
Create Table with Struct:
CREATE TABLE test_struct (
columnA STRING,
columnB VARCHAR(15),
columnC INT,
columnD TIMESTAMP,
columnE DATE,
columnF STRUCT<key:STRING, value:INT>
)
STORED AS ORC;
Insert Data:
INSERT INTO test_struct SELECT '1', '2', 1, '2019-02-07 20:58:27', '2019-02-07', NAMED_STRUCT('key', 'val', 'value', 1);
Select Data:
This will return the values of “key” and “value” from columnF.
SELECT columnF.key, columnF.value FROM test_struct;
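You can also select the whole struct; Hive renders columnF as a JSON-style object such as {"key":"val","value":1}.
SELECT columnF FROM test_struct;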
This tutorial will show you how to use a map. If you have not installed Hive yet, please follow this tutorial.
Create Table with Map:
CREATE TABLE test_map (
columnA STRING,
columnB VARCHAR(15),
columnC INT,
columnD TIMESTAMP,
columnE DATE,
columnF MAP<STRING, INT>
)
STORED AS ORC;
Insert Data:
INSERT INTO test_map SELECT '1', '2', 1, '2019-02-07 20:58:27', '2019-02-07', MAP('Val', 1);
Select Data:
This will return the value stored under the “Val” key in columnF.
SELECT columnF['Val'] FROM test_map;
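If you want all the keys or values rather than a single lookup, the built-in MAP_KEYS and MAP_VALUES functions return them as arrays.
SELECT MAP_KEYS(columnF), MAP_VALUES(columnF) FROM test_map;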
This tutorial will show you some miscellaneous usage for working with Hive. If you have not installed Hive yet, please follow this tutorial.
NVL:
This will check if the value is null and, if it is, substitute the second value.
SELECT NVL(columnA, 'was null') FROM test;
CAST:
This will convert one type to another; for example, if columnE is a date and you want it as a string.
SELECT CAST(columnE AS STRING) FROM test;
Concat:
This will concatenate the strings together, giving “Test the code!”
SELECT CONCAT('Test', ' the ', 'code!');
CONCAT_WS:
This will concatenate the strings starting at “Test”, using the first argument (' ') as the separator, giving “Test the code”.
SELECT CONCAT_WS(' ', 'Test', 'the', 'code');
MIN:
This will give the minimum value of columnC.
SELECT MIN(columnC) AS Min_ColumnC FROM test;
MAX:
This will give the maximum value of columnC.
SELECT MAX(columnC) AS Max_ColumnC FROM test;
DISTINCT:
This will select the distinct combinations of columnA and columnB.
SELECT DISTINCT columnA, columnB FROM test;
CASE:
This will return “Is Val” when columnA equals “val”, otherwise “Not Val”.
SELECT CASE WHEN columnA = 'val' THEN 'Is Val' ELSE 'Not Val' END FROM test;
COLLECT_SET:
This will collect the distinct values of columnC into an array and select the element at index 0.
SELECT COLLECT_SET(columnC)[0] FROM test;
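COLLECT_SET is most useful as an aggregate with GROUP BY; for example, collecting the distinct columnC values for each columnA.
SELECT columnA, COLLECT_SET(columnC) FROM test GROUP BY columnA;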
CURRENT_DATE:
This will give you the current date in the format yyyy-MM-dd.
SELECT CURRENT_DATE();
UNIX_TIMESTAMP:
This will give you the current Unix timestamp in seconds since the epoch, e.g. 1549591492.
SELECT UNIX_TIMESTAMP();
FROM_UNIXTIME:
This will take a Unix timestamp and display it in the format yyyy-MM-dd HH:mm:ss.
SELECT FROM_UNIXTIME(1549591492);
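FROM_UNIXTIME also accepts an optional format string if you only want part of the timestamp.
SELECT FROM_UNIXTIME(1549591492, 'yyyy-MM-dd');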
TO_DATE:
This will extract the date portion of a timestamp string, giving yyyy-MM-dd.
SELECT TO_DATE('2019-02-01 01:01:01');
YEAR:
SELECT YEAR(columnE) FROM test;
MONTH:
SELECT MONTH(columnE) FROM test;
DAY:
SELECT DAY(columnE) FROM test;
DATE_ADD:
This will add 4 days to the date, giving 2019-01-05.
SELECT DATE_ADD('2019-01-01', 4);
UPPER:
This will convert columnA to upper case.
SELECT UPPER(columnA) FROM test;
LOWER:
This will convert columnA to lower case.
SELECT LOWER(columnA) FROM test;
TRIM:
This will trim leading and trailing spaces in columnA.
SELECT TRIM(columnA) FROM test;
LITERALS:
If a column name contains a space, you will need to quote it with backticks (a literal identifier), both when aliasing with the AS keyword and when defining it in the CREATE TABLE command. Although I don’t recommend this, it is possible.
SELECT columnA AS `test column` FROM test;
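The same backtick quoting works at creation time; test_literal below is just an illustrative table name.
CREATE TABLE test_literal (`test column` STRING) STORED AS ORC;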
This tutorial will show you some common usage for working with Hive variables. If you have not installed Hive yet, please follow this tutorial.
Create HiveVar:
SET hivevar:MY_DATE='2019-02-01';
Show HiveVar:
This will show you the value of the variable.
SET hivevar:MY_DATE;
Use HiveVar:
SELECT ${MY_DATE};
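A more realistic use, assuming the test table from the other tutorials, is referencing the variable in a WHERE clause; note the quotes were already included when the variable was set.
SELECT * FROM test WHERE columnE = ${MY_DATE};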
This tutorial will show you some common usage for creating views. If you have not installed Hive yet, please follow this tutorial.
Create View:
CREATE VIEW IF NOT EXISTS test_view AS SELECT * FROM test;
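You can then query the view exactly like a table.
SELECT * FROM test_view;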
Drop View:
DROP VIEW IF EXISTS test_view;
This tutorial will show you some common usage for working with tables. If you have not installed Hive yet, please follow this tutorial.
Show Tables:
SHOW TABLES;
SHOW TABLES LIKE '*test*';
Table Creation:
CREATE TABLE test (
columnA STRING,
columnB VARCHAR(15),
columnC INT,
columnD TIMESTAMP,
columnE DATE
)
STORED AS ORC;
Table Creation with Partitioning:
CREATE TABLE test_partition (
columnA STRING,
columnB VARCHAR(15),
columnC INT,
columnD TIMESTAMP,
columnE DATE
)
PARTITIONED BY (columnF INT)
STORED AS ORC;
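To load this table you specify the partition on insert; a minimal sketch using a static partition value (columnF = 1 is just an example).
INSERT INTO test_partition PARTITION (columnF = 1)
SELECT '1', '2', 1, '2019-02-07 20:58:27', '2019-02-07';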
Inline Table Creation:
CREATE TABLE test_inline STORED AS ORC AS
SELECT *
FROM test;
Temporary Table Creation:
CREATE TEMPORARY TABLE temp (
columnA STRING,
columnB VARCHAR(15),
columnC INT,
columnD TIMESTAMP,
columnE DATE
)
STORED AS ORC;
DESC Table:
This will show you the basic definition of a table.
DESC test;
DESC EXTENDED Table:
This will show you the extended definition of a table.
DESC EXTENDED test;
Drop Table:
DROP TABLE IF EXISTS temp;
In this tutorial I will show you how to connect to a remote Kerberized Hive cluster using Java. If you haven’t installed Hive yet, follow this tutorial.
Follow this tutorial on “Installing unlimited strength encryption Java libraries”.
If you are on Windows, do the following:
#Import it
"C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -import -file hadoop.csr -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts" -alias "hadoop"

#Check it
"C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -list -v -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts"

#If you want to delete it
"C:\Program Files\Java\jdk1.8.0_171\bin\keytool" -delete -alias hadoop -keystore "C:\Program Files\Java\jdk1.8.0_171\jre\lib\security\cacerts"
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>2.3.3</version>
  <exclusions>
    <exclusion>
      <groupId>jdk.tools</groupId>
      <artifactId>jdk.tools</artifactId>
    </exclusion>
  </exclusions>
</dependency>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
// Setup the configuration object.
final Configuration config = new Configuration();
config.set("fs.defaultFS", "swebhdfs://hadoop:50470");
config.set("hadoop.security.authentication", "kerberos");
config.set("hadoop.rpc.protection", "integrity");

System.setProperty("https.protocols", "TLSv1,TLSv1.1,TLSv1.2");
System.setProperty("java.security.krb5.conf", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\krb5.conf");
System.setProperty("java.security.krb5.realm", "REALM.CA");
System.setProperty("java.security.krb5.kdc", "REALM.CA");
System.setProperty("sun.security.krb5.debug", "true");
System.setProperty("javax.net.debug", "all");
System.setProperty("javax.net.ssl.keyStorePassword", "changeit");
System.setProperty("javax.net.ssl.keyStore", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\cacerts");
System.setProperty("javax.net.ssl.trustStore", "C:\\Program Files\\Java\\jdk1.8.0_171\\jre\\lib\\security\\cacerts");
System.setProperty("javax.net.ssl.trustStorePassword", "changeit");
System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");

// Login with the hive service principal's keytab.
UserGroupInformation.setConfiguration(config);
UserGroupInformation.setLoginUser(UserGroupInformation.loginUserFromKeytabAndReturnUGI("hive/hadoop@REALM.CA", "c:\\data\\hive.service.keytab"));
System.out.println(UserGroupInformation.getLoginUser());
System.out.println(UserGroupInformation.getCurrentUser());

//Add the hive driver
Class.forName("org.apache.hive.jdbc.HiveDriver");

//Connect to hive jdbc
Connection connection = DriverManager.getConnection("jdbc:hive2://hadoop:10000/default;principal=hive/hadoop@REALM.CA");
Statement statement = connection.createStatement();

//Create a table
String createTableSql = "CREATE TABLE IF NOT EXISTS "
    + " employee ( eid int, name String, "
    + " salary String, designation String)"
    + " COMMENT 'Employee details'"
    + " ROW FORMAT DELIMITED"
    + " FIELDS TERMINATED BY '\t'"
    + " LINES TERMINATED BY '\n'"
    + " STORED AS TEXTFILE";
System.out.println("Creating Table: " + createTableSql);
statement.executeUpdate(createTableSql);

//Show all the tables to ensure we successfully added the table
String showTablesSql = "show tables";
System.out.println("Show All Tables: " + showTablesSql);
ResultSet res = statement.executeQuery(showTablesSql);

while (res.next()) {
    System.out.println(res.getString(1));
}

//Drop the table
String dropTablesSql = "DROP TABLE IF EXISTS employee";
System.out.println("Dropping Table: " + dropTablesSql);
statement.executeUpdate(dropTablesSql);

System.out.println("Finish!");
We are going to install Hive over Hadoop and perform a basic query. Ensure you have installed Kerberos and Hadoop with Kerberos enabled first.
This assumes your hostname is “hadoop”
wget http://apache.forsale.plus/hive/hive-2.3.3/apache-hive-2.3.3-bin.tar.gz
tar -xzf apache-hive-2.3.3-bin.tar.gz
sudo mv apache-hive-2.3.3-bin /usr/local/hive
sudo chown -R root:hadoopuser /usr/local/hive/
sudo nano ~/.bashrc
Add the following to the end of the file.
#HIVE VARIABLES START
export HIVE_HOME=/usr/local/hive
export HIVE_CONF_DIR=/usr/local/hive/conf
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/hadoop/lib/*:/usr/local/hive/lib/*
#HIVE VARIABLES STOP
source ~/.bashrc
kinit -kt /etc/security/keytabs/myuser.keytab myuser/hadoop@REALM.CA
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -mkdir /tmp
hdfs dfs -chmod g+w /tmp
hdfs dfs -chmod g+w /user/hive/warehouse
cd /etc/security/keytabs
sudo kadmin.local
addprinc -randkey hive/hadoop@REALM.CA
addprinc -randkey hivemetastore/hadoop@REALM.CA
addprinc -randkey hive-spnego/hadoop@REALM.CA
xst -kt hive.service.keytab hive/hadoop@REALM.CA
xst -kt hivemetastore.service.keytab hivemetastore/hadoop@REALM.CA
xst -kt hive-spnego.service.keytab hive-spnego/hadoop@REALM.CA
q
sudo chown root:hadoopuser /etc/security/keytabs/*
sudo chmod 750 /etc/security/keytabs/*
cd $HIVE_HOME/conf
sudo cp hive-env.sh.template hive-env.sh
sudo nano /usr/local/hive/conf/hive-env.sh

#locate "HADOOP_HOME" and change to be this
export HADOOP_HOME=/usr/local/hadoop

#locate "HIVE_CONF_DIR" and change to be this
export HIVE_CONF_DIR=/usr/local/hive/conf
Check out this link for the configuration properties.
sudo cp /usr/local/hive/conf/hive-default.xml.template /usr/local/hive/conf/hive-site.xml
sudo nano /usr/local/hive/conf/hive-site.xml

#Modify the following properties
<property>
  <name>system:user.name</name>
  <value>${user.name}</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:postgresql://myhost:5432/metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.postgresql.Driver</value>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>PASSWORD</value>
</property>
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/tmp/${system:user.name}</value>
  <description>Local scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>/tmp/${system:user.name}</value>
  <description>Location of Hive run time structured log file</description>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/tmp/${hive.session.id}_resources</value>
  <description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
  <name>hive.server2.logging.operation.log.location</name>
  <value>/tmp/${system:user.name}/operation_logs</value>
  <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://0.0.0.0:9083</value>
  <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
<property>
  <name>hive.server2.webui.host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>hive.server2.webui.port</name>
  <value>10002</value>
</property>
<property>
  <name>hive.metastore.port</name>
  <value>9083</value>
</property>
<property>
  <name>hive.server2.transport.mode</name>
  <value>binary</value>
</property>
<property>
  <name>hive.server2.thrift.sasl.qop</name>
  <value>auth-int</value>
</property>
<property>
  <name>hive.server2.authentication</name>
  <value>KERBEROS</value>
  <description>authentication type</description>
</property>
<property>
  <name>hive.server2.authentication.kerberos.principal</name>
  <value>hive/_HOST@REALM.CA</value>
  <description>HiveServer2 principal. If _HOST is used as the FQDN portion, it will be replaced with the actual hostname of the running instance.</description>
</property>
<property>
  <name>hive.server2.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/hive.service.keytab</value>
  <description>Keytab file for HiveServer2 principal</description>
</property>
<property>
  <name>hive.metastore.sasl.enabled</name>
  <value>true</value>
  <description>If true, the metastore thrift interface will be secured with SASL. Clients must authenticate with Kerberos.</description>
</property>
<property>
  <name>hive.metastore.kerberos.keytab.file</name>
  <value>/etc/security/keytabs/hivemetastore.service.keytab</value>
  <description>The path to the Kerberos Keytab file containing the metastore thrift server's service principal.</description>
</property>
<property>
  <name>hive.metastore.kerberos.principal</name>
  <value>hivemetastore/_HOST@REALM.CA</value>
  <description>The service principal for the metastore thrift server. The special string _HOST will be replaced automatically with the correct host name.</description>
</property>
<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
  <description>enable or disable the hive client authorization</description>
</property>
<property>
  <name>hive.metastore.pre.event.listeners</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
  <description>List of comma separated listeners for metastore events.</description>
</property>
<property>
  <name>hive.security.metastore.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
  <description>
    Names of authorization manager classes (comma separated) to be used in the metastore for authorization.
    The user defined authorization class should implement interface
    org.apache.hadoop.hive.ql.security.authorization.HiveMetastoreAuthorizationProvider.
    All authorization manager classes have to successfully authorize the metastore API call for the command execution to be allowed.
  </description>
</property>
<property>
  <name>hive.security.metastore.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
  <description>
    authenticator manager class name to be used in the metastore for authentication.
    The user defined authenticator should implement interface org.apache.hadoop.hive.ql.security.HiveAuthenticationProvider.
  </description>
</property>
<property>
  <name>hive.security.metastore.authorization.auth.reads</name>
  <value>true</value>
  <description>If this is true, metastore authorizer authorizes read actions on database, table</description>
</property>
<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>
</property>
Notice here how it’s the “hive” user that is used with the storage based authentication in the proxyuser settings below.
sudo nano /usr/local/hadoop/etc/hadoop/core-site.xml

<property>
  <name>hadoop.proxyuser.hive.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>*</value>
</property>
Follow this tutorial for installing PostgreSQL 9.6.
sudo su - postgres
psql
CREATE USER hiveuser WITH PASSWORD 'PASSWORD';
CREATE DATABASE metastore;
GRANT ALL PRIVILEGES ON DATABASE metastore TO hiveuser;
\q
exit
schematool -dbType postgres -initSchema
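To verify that the metastore schema was created, you can ask the schema tool for its info (an optional check).
schematool -dbType postgres -info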
nohup /usr/local/hive/bin/hive --service metastore --hiveconf hive.log.file=hivemetastore.log >/var/log/hive/hivemetastore.out 2>/var/log/hive/hivemetastoreerr.log &

nohup /usr/local/hive/bin/hiveserver2 --hiveconf hive.metastore.uris=" " --hiveconf hive.log.file=hiveserver2.log >/var/log/hive/hiveserver2.out 2>/var/log/hive/hiveserver2err.log &
sudo mkdir /var/log/hive/
sudo chown root:hduser /var/log/hive
sudo chmod 777 /var/log/hive

crontab -e

#Add the following
@reboot nohup /usr/local/hive/bin/hive --service metastore --hiveconf hive.log.file=hivemetastore.log >/var/log/hive/hivemetastore.out 2>/var/log/hive/hivemetastoreerr.log &
@reboot nohup /usr/local/hive/bin/hiveserver2 --hiveconf hive.metastore.uris=" " --hiveconf hive.log.file=hiveserver2.log >/var/log/hive/hiveserver2.out 2>/var/log/hive/hiveserver2err.log &
Now you can check the Hive version.
hive --version
#We first need to have a ticket to access beeline using the hive kerberos user we setup earlier.
kinit -kt /etc/security/keytabs/hive.service.keytab hive/hadoop@REALM.CA

#Now we can get into beeline using that principal
beeline -u "jdbc:hive2://0.0.0.0:10000/default;principal=hive/hadoop@REALM.CA;"

#You can also just get into beeline then connect from there
beeline
beeline>!connect jdbc:hive2://0.0.0.0:10000/default;principal=hive/hadoop@REALM.CA

#Disconnect from beeline
!q
http://www.bogotobogo.com/Hadoop/BigData_hadoop_Hive_Install_On_Ubuntu_16_04.php
https://maprdocs.mapr.com/home/Hive/Config-RemotePostgreSQLForHiveMetastore.html
https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool#HiveSchemaTool-TheHiveSchemaTool
https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-InstallationandConfiguration