ElasticSearch: Low Level Rest Client Connection

This entry is part 1 of 3 in the series ElasticSearch Low Level Rest Client

In this tutorial I will show you how to use the ElasticSearch low level rest client.

First you will need to add the low-level REST client dependency to the POM.

  <properties>
    <elasticSearch.version>6.2.4</elasticSearch.version>
  </properties>

  <dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-client</artifactId>
    <version>${elasticSearch.version}</version>
  </dependency>

Next you will need to specify the imports.

  import org.apache.http.HttpHost;
  import org.elasticsearch.client.Response;
  import org.elasticsearch.client.RestClient;
  import org.elasticsearch.client.RestClientBuilder;

Now you can connect to ElasticSearch.

  final RestClientBuilder restClientBuilder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
  final RestClient restClient = restClientBuilder.build();

Now you can do whatever you need to!
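
For example, a quick request against the node and a clean shutdown might look like this (a minimal sketch; EntityUtils comes from the Apache HTTP libraries the client already pulls in):

  import org.apache.http.util.EntityUtils;

  //Perform a simple GET against the cluster root endpoint.
  final Response response = restClient.performRequest("GET", "/");
  System.out.println(EntityUtils.toString(response.getEntity()));

  //Close the client when you are finished with it.
  restClient.close();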

NiFi: Rest API

NiFi has a number of REST APIs that you can use. They are located here.

They are very comprehensive. The only thing I would say is missing is how to get the root process group of NiFi; the documentation does not list that API call. Note that all API calls must be authenticated as well.

The API call to get the root process group (called “NiFi Flow”, which is the main process group) is:

  https://localhost/nifi-api/process-groups/root
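
As a rough illustration of calling it from Java (only a sketch; it assumes NiFi is listening on HTTPS at localhost, that the JVM already trusts NiFi's certificate, and that you have already obtained a token, here a hypothetical myAccessToken):

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import java.net.HttpURLConnection;
  import java.net.URL;

  final URL url = new URL("https://localhost/nifi-api/process-groups/root");
  final HttpURLConnection connection = (HttpURLConnection) url.openConnection();
  connection.setRequestMethod("GET");
  //myAccessToken is a placeholder for whatever credential your NiFi instance expects.
  connection.setRequestProperty("Authorization", "Bearer " + myAccessToken);

  try (BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()))) {
      reader.lines().forEach(System.out::println);
  }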

ElasticSearch Installation

Installing ElasticSearch is really straightforward. I will be using Ubuntu 16.04 for this installation.

Java 8

  java -version
  #if not installed run the following
  sudo apt-get install openjdk-8-jdk

Download

  wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.2.3.rpm

Directories

It is recommended to change the log and data directories from their defaults.

  #create log and data directory
  sudo mkdir /my/dir/log/elasticsearch
  sudo mkdir /my/dir/elasticsearch

  # Change owner
  sudo chown -R elasticsearch /my/dir/log/elasticsearch
  sudo chown -R elasticsearch /my/dir/elasticsearch

Install

  sudo rpm -ivh elasticsearch-6.2.3.rpm

Change Settings

  sudo vi /etc/elasticsearch/elasticsearch.yml

  #Change the following settings
  #----------SETTINGS-----------------
  cluster.name: logsearch
  node.name: ##THE_HOST_NAME##
  node.master: true #The node is master eligible
  node.data: true #Hold data and perform data related operations
  path.data: /my/dir/elasticsearch
  path.logs: /my/dir/log/elasticsearch
  network.host: ##THE_HOST_NAME##
  http.port: 9200
  discovery.zen.ping.unicast.hosts: ["##THE_HOST_NAME##"]
  #----------SETTINGS-----------------

Start/Stop/Status ElasticSearch

  sudo service elasticsearch start
  sudo service elasticsearch stop
  sudo service elasticsearch status

Rest API

http://localhost:9200

 

Avro & Java: Record Parsing

This tutorial will guide you through how to convert JSON to Avro and then back to JSON. I suggest you first read through the documentation on Avro to familiarize yourself with it. This tutorial assumes you have a Maven project already set up and a resources folder.

POM:

Add Avro Dependency
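
Something along these lines pulls in Avro (the version shown is an assumption; use whichever release you are building against).

  <dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.8.2</version>
  </dependency>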

Add Jackson Dependency
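
And for Jackson, something like the following (again, the version is an assumption).

  <dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.9.5</version>
  </dependency>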

Avro Schema File:

Next you need to create the avro schema file in your resources folder. Name the file “schema.avsc”. The extension avsc is the Avro schema extension.

  {
    "namespace": "test.avro",
    "type": "record",
    "name": "MY_NAME",
    "fields": [
      {"name": "name_1", "type": "int"},
      {"name": "name_2", "type": {"type": "array", "items": "float"}},
      {"name": "name_3", "type": "float"}
    ]
  }

Json Record to Validate:

Next you need to create a JSON file that conforms to the schema you just made. Name the file “record.json” and put it in your resources folder. The contents can be whatever you want as long as they conform to the schema above.

  { "name_1": 234, "name_2": [23.34,654.98], "name_3": 234.7}

It’s Avro Time:

Imports:

  import java.io.ByteArrayOutputStream;
  import java.io.DataInputStream;
  import java.io.File;
  import java.io.IOException;
  import java.io.InputStream;

  import org.apache.avro.Schema;
  import org.apache.avro.generic.GenericData;
  import org.apache.avro.generic.GenericDatumReader;
  import org.apache.avro.generic.GenericDatumWriter;
  import org.apache.avro.io.DatumReader;
  import org.apache.avro.io.Decoder;
  import org.apache.avro.io.DecoderFactory;
  import org.apache.avro.io.Encoder;
  import org.apache.avro.io.EncoderFactory;

  import com.fasterxml.jackson.databind.JsonNode;
  import com.fasterxml.jackson.databind.ObjectMapper;

Conversion to Avro and Back:

  private void run() throws IOException {
      //Get the schema and json record from resources
      final ClassLoader loader = getClass().getClassLoader();
      final File schemaFile = new File(loader.getResource("schema.avsc").getFile());
      final InputStream record = loader.getResourceAsStream("record.json");
      //Create avro schema
      final Schema schema = new Schema.Parser().parse(schemaFile);

      //Encode to avro
      final byte[] avro = encodeToAvro(schema, record);

      //Decode back to json
      final JsonNode node = decodeToJson(schema, avro);

      System.out.println(node);
      System.out.println("done");
  }

  /**
   * Encode json to avro
   *
   * @param schema the schema the avro pertains to
   * @param record the data to convert to avro
   * @return the avro bytes
   * @throws IOException if decoding fails
   */
  private byte[] encodeToAvro(Schema schema, InputStream record) throws IOException {
      final DatumReader<GenericData.Record> reader = new GenericDatumReader<>(schema);
      final DataInputStream din = new DataInputStream(record);
      final Decoder decoder = new DecoderFactory().jsonDecoder(schema, din);
      final Object datum = reader.read(null, decoder);
      final GenericDatumWriter<Object> writer = new GenericDatumWriter<>(schema);
      final ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
      final Encoder encoder = new EncoderFactory().binaryEncoder(outputStream, null);
      writer.write(datum, encoder);
      encoder.flush();

      return outputStream.toByteArray();
  }

  /**
   * Decode avro back to json.
   *
   * @param schema the schema the avro pertains to
   * @param avro the avro bytes
   * @return the json
   * @throws IOException if jackson fails
   */
  private JsonNode decodeToJson(Schema schema, byte[] avro) throws IOException {
      final ObjectMapper mapper = new ObjectMapper();
      final DatumReader<GenericData.Record> reader = new GenericDatumReader<>(schema);
      final Decoder decoder = new DecoderFactory().binaryDecoder(avro, null);
      final JsonNode node = mapper.readTree(reader.read(null, decoder).toString());

      return node;
  }

HBASE & Java: Scan Filters

This tutorial will guide you through how to use filtering when scanning an HBASE table using Java 8. Make sure you first follow this tutorial on connecting to HBASE and this tutorial on scanning HBASE.

Row Key Filter (PrefixFilter):

  final PrefixFilter prefixFilter = new PrefixFilter(Bytes.toBytes(myRowKey));
  scan.setFilter(prefixFilter);

Column Value Filter:

  final SingleColumnValueFilter columnValueFilter = new SingleColumnValueFilter(myColumnFamily, myColumnName, CompareOp.EQUAL, Bytes.toBytes(myValue));
  scan.setFilter(columnValueFilter);

Regex Filter:

  final RegexStringComparator regexStringComparator = new RegexStringComparator(".*");
  final SingleColumnValueFilter singleColumnValueFilter = new SingleColumnValueFilter(myColumnFamily, myColumnName, CompareOp.EQUAL, regexStringComparator);
  scan.setFilter(singleColumnValueFilter);
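
Note that setFilter replaces whatever filter was previously set on the scan, so if you want to apply several of the filters above at once, wrap them in a FilterList. A minimal sketch using the filters defined above:

  import org.apache.hadoop.hbase.filter.FilterList;

  final FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
  filterList.addFilter(prefixFilter);
  filterList.addFilter(columnValueFilter);
  scan.setFilter(filterList);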

 

HBASE & Java: Delete a Table

This tutorial will guide you through how to delete an HBASE table using Java 8. Make sure you first follow this tutorial on connecting to HBASE.

Import:

  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Admin;

Delete:

  //You must first disable the table
  conn.getAdmin().disableTable(TableName.valueOf("myTable"));

  //Now you can delete the table
  conn.getAdmin().deleteTable(TableName.valueOf("myTable"));
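
If you are not sure the table exists (or whether it is already disabled), you can guard the calls; here is a small sketch using the same connection object:

  final Admin admin = conn.getAdmin();
  final TableName tableName = TableName.valueOf("myTable");

  if (admin.tableExists(tableName)) {
      //disableTable fails if the table is already disabled, so check first.
      if (admin.isTableEnabled(tableName)) {
          admin.disableTable(tableName);
      }
      admin.deleteTable(tableName);
  }
  admin.close();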

NiFi: Custom Processor

The following tutorial shows you how to create a custom NiFi processor.

Create Project:

  1. Install Maven
  2. Create a folder called “nifi”
  3. Navigate into the “nifi” folder and run
    1. mvn archetype:generate -DarchetypeGroupId=org.apache.nifi -DarchetypeArtifactId=nifi-processor-bundle-archetype -DarchetypeVersion=1.0.0 -DnifiVersion=1.0.0
  4. Put in your “groupId” when it asks.
    1. I used “com.test”
  5. Put in your “artifactId” when it asks.
    1. I used “processor”
  6. You can accept the default “version”.
  7. Put in your “artifactBaseName” when it asks.
    1. I used “MyProcessor”
  8. Once it completes you can import the maven project into Eclipse.
  9. You will get two projects
    1. nar
    2. processor
  10. You should then have the two files shown below.

MyProcessor.java:

  package com.test.processors;

  import org.apache.nifi.components.PropertyDescriptor;
  import org.apache.nifi.flowfile.FlowFile;
  import org.apache.nifi.processor.*;
  import org.apache.nifi.annotation.behavior.ReadsAttribute;
  import org.apache.nifi.annotation.behavior.ReadsAttributes;
  import org.apache.nifi.annotation.behavior.WritesAttribute;
  import org.apache.nifi.annotation.behavior.WritesAttributes;
  import org.apache.nifi.annotation.lifecycle.OnScheduled;
  import org.apache.nifi.annotation.documentation.CapabilityDescription;
  import org.apache.nifi.annotation.documentation.SeeAlso;
  import org.apache.nifi.annotation.documentation.Tags;
  import org.apache.nifi.processor.exception.ProcessException;
  import org.apache.nifi.processor.util.StandardValidators;

  import java.util.*;

  @Tags({"example"})
  @CapabilityDescription("Provide a description")
  @SeeAlso({})
  @ReadsAttributes({@ReadsAttribute(attribute="", description="")})
  @WritesAttributes({@WritesAttribute(attribute="", description="")})
  public class MyProcessor extends AbstractProcessor {

      public static final PropertyDescriptor MY_PROPERTY = new PropertyDescriptor
              .Builder().name("My Property")
              .description("Example Property")
              .required(true)
              .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
              .build();

      public static final Relationship MY_RELATIONSHIP = new Relationship.Builder()
              .name("my_relationship")
              .description("Example relationship")
              .build();

      private List<PropertyDescriptor> descriptors;

      private Set<Relationship> relationships;

      @Override
      protected void init(final ProcessorInitializationContext context) {
          final List<PropertyDescriptor> descriptors = new ArrayList<>();
          descriptors.add(MY_PROPERTY);
          this.descriptors = Collections.unmodifiableList(descriptors);

          final Set<Relationship> relationships = new HashSet<>();
          relationships.add(MY_RELATIONSHIP);
          this.relationships = Collections.unmodifiableSet(relationships);
      }

      @Override
      public Set<Relationship> getRelationships() {
          return this.relationships;
      }

      @Override
      public final List<PropertyDescriptor> getSupportedPropertyDescriptors() {
          return descriptors;
      }

      @OnScheduled
      public void onScheduled(final ProcessContext context) {

      }

      @Override
      public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
          FlowFile flowFile = session.get();
          if (flowFile == null) {
              return;
          }
          // TODO implement
          session.transfer(flowFile, MY_RELATIONSHIP);
      }
  }

MyProcessorTest.java:

This is the unit test for the NiFi processor.

  package com.test.processors;

  import static org.junit.Assert.*;

  import java.io.ByteArrayInputStream;
  import java.io.InputStream;
  import java.util.List;

  import org.apache.nifi.util.MockFlowFile;
  import org.apache.nifi.util.TestRunner;
  import org.apache.nifi.util.TestRunners;
  import org.junit.Before;
  import org.junit.Test;

  public class MyProcessorTest {
      private TestRunner testRunner;

      @Before
      public void init() {
          testRunner = TestRunners.newTestRunner(MyProcessor.class);
      }

      @Test
      public void testProcessor() {
          final InputStream content = new ByteArrayInputStream(new byte[0]);
          testRunner.setProperty("My Property", "test");
          testRunner.enqueue(content);
          testRunner.run(1);
          testRunner.assertQueueEmpty();
          final List<MockFlowFile> results = testRunner.getFlowFilesForRelationship(MyProcessor.MY_RELATIONSHIP);
          assertTrue("1 match", results.size() == 1);
      }
  }

Optional:

Nar Directory:

You can create a custom NAR directory to deploy your custom NiFi processors to. You can either use the nifi/lib directory or specify your own. To specify your own, edit the “nifi.properties” file.

  cd /nifi/conf/
  nano nifi.properties

Look for “nifi.nar.library.directory.”.
Add the following: nifi.nar.library.directory.anyname=/your/directory/

 

HBASE Phoenix & Java: Unsecure Connection

In this tutorial I will show you how to do a basic connection to a remote unsecure HBase Phoenix Query Server using Java. Phoenix allows you to run SQL commands on top of HBASE. You can find the commands listed here.

POM.xml:

  <dependency>
    <groupId>org.apache.phoenix</groupId>
    <artifactId>phoenix-server-client</artifactId>
    <version>4.7.0-HBase-1.1</version>
  </dependency>

Imports:

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.SQLException;

Connect:

  Class.forName("org.apache.phoenix.queryserver.client.Driver");
  Connection conn = DriverManager.getConnection("jdbc:phoenix:thin:url=http://localhost:8765;serialization=PROTOBUF");
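
Once connected it is plain JDBC, so running one of those SQL commands is the usual statement/result-set pattern. A quick sketch, assuming a hypothetical table MY_TABLE with a numeric ID column:

  import java.sql.ResultSet;
  import java.sql.Statement;

  try (final Statement stmt = conn.createStatement();
       final ResultSet rs = stmt.executeQuery("SELECT ID FROM MY_TABLE LIMIT 10")) {
      while (rs.next()) {
          System.out.println(rs.getLong("ID"));
      }
  }
  conn.close();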

Hadoop & Java: Connect Remote Unsecured HDFS

In this tutorial I will show you how to connect to a remote unsecured HDFS cluster using Java. If you haven't installed HDFS yet, follow that tutorial first.

POM.xml:

  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.9.1</version>
  </dependency>

Imports:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import java.net.URI;

Connect:

  //Setup the configuration object.
  final Configuration config = new Configuration();

  //If you want you can add any properties you want here.

  //Setup the hdfs file system object.
  //Use the NameNode RPC port from fs.defaultFS (commonly 8020 or 9000), not the 50070 web UI port.
  final FileSystem fs = FileSystem.get(new URI("hdfs://localhost:8020"), config);

  //Do whatever you need to.
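
For example, once you have the FileSystem handle you can list a directory or write a file. A small sketch (the /tmp paths are just placeholders):

  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.Path;

  //List whatever is under /tmp.
  for (final FileStatus status : fs.listStatus(new Path("/tmp"))) {
      System.out.println(status.getPath() + " " + status.getLen());
  }

  //Write a small file, then release the handle.
  try (final FSDataOutputStream out = fs.create(new Path("/tmp/hello.txt"))) {
      out.writeUTF("hello hdfs");
  }
  fs.close();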

HBASE & Java: Search for Data

This tutorial will give you a quick overview of how to search for data using HBASE. If you have not done so yet, follow these two tutorials first: HBASE: Connecting and HBASE: Create a Table.

Search for Data:

Basically we have to scan the table for data, so we must first set up a Scan object and then search for the data.

  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.Cell;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.util.Bytes;

  //Lets setup our scan object.
  final Scan scan = new Scan();
  //Search a particular column
  scan.addColumn(Bytes.toBytes("columnFamily"), Bytes.toBytes("columnName"));
  //Check the row key prefix
  scan.setRowPrefixFilter(Bytes.toBytes("rowkey"));

  final TableName tableName = TableName.valueOf(yourTableName);

  //Get the table you want to work with, using the connection from the tutorial above.
  final Table table = conn.getTable(tableName);
  //Create our scanner based on the scan object above.
  final ResultScanner scanner = table.getScanner(scan);

  //Now we will loop through our results
  for (Result result = scanner.next(); result != null; result = scanner.next()) {
      //Lets get our row key
      final String rowIdentifier = Bytes.toString(result.getRow());

      //Now based on each record found we will loop through the available cells for that record.
      for (final Cell cell : result.listCells()) {
          //Now we can do whatever we need to with the data.
          log.info("column {} value {}", Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()), Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
      }
  }

HBASE & Java: Create a Table

This tutorial will guide you through how to create an HBASE table using Java 8. Make sure you first follow this tutorial on connecting to HBASE.

Table Exists:

This checks if the table already exists in HBASE.

  import org.apache.hadoop.hbase.TableName;

  final TableName table = TableName.valueOf(yourTableName);

  //Use the connection object to getAdmin from the connection tutorial above.
  conn.getAdmin().tableExists(table);

Create Table:

In the most basic example of creating a HBASE table you need to know the name and the column families. A column family is a group of columns whose data is related in some way and stored together on disk. Notice how we don't define columns in the table design; columns are added as we put data, which I will show in the example below.

  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.TableName;

  final TableName table = TableName.valueOf(yourTableName);

  final HTableDescriptor hTableBuilder = new HTableDescriptor(table);
  final HColumnDescriptor column = new HColumnDescriptor(family);
  hTableBuilder.addFamily(column);

  //Use the connection object to getAdmin from the connection tutorial above.
  conn.getAdmin().createTable(hTableBuilder);

Get a Table:

This will retrieve a table from HBASE so you can use it to put data, etc.

  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Table;

  final TableName tableName = TableName.valueOf(yourTableName);

  //Use the connection object from the connection tutorial above.
  final Table table = conn.getTable(tableName);

Put Data:

Now we will put data into the table we have a reference to above. Notice how the columns are referenced.

  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  final byte[] rowKey = Bytes.toBytes("some row identifier");
  final byte[] columnFamily = Bytes.toBytes("myFamily");
  final byte[] columnName = Bytes.toBytes("columnName");
  final byte[] data = Bytes.toBytes(myData);

  final Put put = new Put(rowKey);
  put.addColumn(columnFamily, columnName, data);

  //Insert the data.
  table.put(put);
  //Close the table.
  table.close();

HBASE: Connecting Unsecure

In this tutorial I will show you how to connect to an unsecure HBASE cluster using Java. It's rather straightforward. This tutorial assumes no security. There are many different options you can set; we will just use the bare minimum needed to connect.

POM:

  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.4.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase</artifactId>
    <version>1.4.1</version>
    <type>pom</type>
  </dependency>

Imports:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Admin;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;

Config:

We will use the basic configuration here. You should secure the cluster and use appropriate settings for that.

  final Configuration config = HBaseConfiguration.create();
  config.set("hbase.zookeeper.quorum", "myurl.com"); //Can be comma separated if you have more than 1
  config.set("hbase.zookeeper.property.clientPort", "2181");
  config.set("zookeeper.znode.parent", "/hbase-unsecure");

Connect:

Now we create the connection.

  Connection conn = ConnectionFactory.createConnection(config);

  //Later when we are done we will want to close the connection.
  conn.close();

Hbase Admin:

If you need it, you can retrieve an Admin implementation to administer the HBase cluster.

  Admin admin = conn.getAdmin();
  //Later when we are done we will want to close the admin as well.
  admin.close();

NiFi Installation (Basic)

In this tutorial I will guide you through installing NiFi on Ubuntu 16.04 and setting it up to run as a service. We will assume you have a user called “hduser”.

Install Java 8

  sudo apt-get install openjdk-8-jdk

Install NiFi

  wget http://mirror.dsrg.utoronto.ca/apache/nifi/1.8.0/nifi-1.8.0-bin.tar.gz
  tar -xzf nifi-1.8.0-bin.tar.gz
  sudo mv nifi-1.8.0/ /usr/local/nifi

Set Ownership:

  sudo chown -R hduser:hduser /usr/local/nifi

Setup .bashrc:

  sudo nano ~/.bashrc

Add the following to the end of the file.

#NIFI VARIABLES START
export NIFI_HOME=/usr/local/nifi
export NIFI_CONF_DIR=/usr/local/nifi/conf
export PATH=$PATH:$NIFI_HOME/bin
#NIFI VARIABLES STOP

  source ~/.bashrc

Install NiFi As Service

  cd /usr/local/nifi/bin
  sudo ./nifi.sh install
  reboot

Start/Stop/Status Service

  sudo service nifi start
  sudo service nifi stop
  sudo service nifi status

Your site is now available at http://localhost:8080/nifi.

Uninstall

  sudo rm /etc/rc2.d/S65nifi
  sudo rm /etc/init.d/nifi
  sudo rm /etc/rc2.d/K65nifi

  sudo rm -R /usr/local/nifi/

Kafka & Java: Consumer List Topics

In this tutorial I will show you how to list all topics in Kafka. Before you begin you will need Maven/Eclipse all set up and a project ready to go. You should also go through this tutorial to set up the consumer.

Imports

  import java.util.Map;
  import java.util.List;
  import org.apache.kafka.common.PartitionInfo;

Consumer List Topics

  Map<String, List<PartitionInfo>> listTopics = consumer.listTopics();
  System.out.println("list of topic size :" + listTopics.size());

  for (String topic : listTopics.keySet()) {
      System.out.println("topic name :" + topic);
  }
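
Since listTopics also returns the partition metadata, you can go one level deeper if you need it; a small sketch using the same map:

  for (final Map.Entry<String, List<PartitionInfo>> entry : listTopics.entrySet()) {
      for (final PartitionInfo partitionInfo : entry.getValue()) {
          //partition() is the partition id and leader() is the broker currently leading it.
          System.out.println(entry.getKey() + " partition " + partitionInfo.partition() + " leader " + partitionInfo.leader());
      }
  }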

Kafka & Java: Unsecure Consumer Read Record

In this tutorial I will show you how to read a record from Kafka. Before you begin you will need Maven/Eclipse all set up and a project ready to go. If you haven't installed Kafka yet please do so.

POM.xml

  <dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>1.1.0</version>
  </dependency>

Imports

  import org.apache.kafka.clients.consumer.*;
  import java.util.Properties;
  import java.io.InputStream;
  import java.util.Arrays;

Consumer Props File

You can go here to view all the options for consumer properties.

  # The url to kafka
  bootstrap.servers=localhost:9092

  #identify consumer group
  group.id=test

  #offset will be periodically committed in the background
  enable.auto.commit=true

  # The deserializer for the key
  key.deserializer=org.apache.kafka.common.serialization.StringDeserializer

  # The deserializer for the value
  value.deserializer=org.apache.kafka.common.serialization.StringDeserializer

  # heartbeat to detect worker failures
  session.timeout.ms=10000

  #Automatically reset offset to earliest offset
  auto.offset.reset=earliest

Consumer Connection/Read

The record we will read will just be a string for both key and value.

  Consumer<String, String> consumer = null;

  try {
      ClassLoader classLoader = getClass().getClassLoader();

      try (InputStream props = classLoader.getResourceAsStream("consumer.props")) {
          Properties properties = new Properties();
          properties.load(props);
          consumer = new KafkaConsumer<>(properties);
      }
      System.out.println("Consumer Created");

      // Subscribe to the topic.
      consumer.subscribe(Arrays.asList("testTopic"));

      while (true) {
          final ConsumerRecords<String, String> consumerRecords = consumer.poll(1000);
          if (consumerRecords.count() == 0) {
              //Stop once there are no more records to read
              break;
          }

          consumerRecords.forEach(record -> {
              System.out.printf("Consumer Record:(%s, %s, %d, %d)\n", record.key(), record.value(), record.partition(), record.offset());
          });

          //Commit offsets returned on the last poll() for all the subscribed topics and partitions
          consumer.commitAsync();
      }
  } finally {
      if (consumer != null) {
          consumer.close();
      }
  }
  System.out.println("Consumer Closed");

References

I used kafka-sample-programs as a guide for setting up props.

Kafka: Installation (Basic)

Installing Kafka is really straightforward. There is a quick start guide you can follow. The only thing I found was that it doesn't call out Java 8. I will be using Ubuntu 16.04 for this installation.

Install Java 8

  sudo apt-get install openjdk-8-jdk

Install Kafka

  wget http://apache.forsale.plus/kafka/1.1.0/kafka_2.11-1.1.0.tgz
  tar -xzf kafka_2.11-1.1.0.tgz
  sudo mv kafka_2.11-1.1.0/ /usr/local/kafka
  cd /usr/local/kafka/

Setup .bashrc:

  sudo nano ~/.bashrc

Add the following to the end of the file.

#KAFKA VARIABLES START
export KAFKA_HOME=/usr/local/kafka
export KAFKA_CONF_DIR=/usr/local/kafka/conf
export PATH=$PATH:$KAFKA_HOME/bin
#KAFKA VARIABLES STOP

  source ~/.bashrc

ZooKeeper

ZooKeeper comes bundled with Kafka, but you can run your own. For the purposes of this tutorial we just use the built-in ZooKeeper.

  bin/zookeeper-server-start.sh config/zookeeper.properties

Kafka Server

Now we can run the Kafka server and start receiving messages on topics.

  bin/kafka-server-start.sh config/server.properties

List Topics

  /usr/local/kafka/bin/kafka-topics.sh --list --zookeeper hadoop:2181

Create Topic

  /usr/local/kafka/bin/kafka-topics.sh --create --zookeeper hadoop:2181 --replication-factor 1 --partitions 1 --topic test

Auto Start

So if you want Kafka to run at startup then do the following.

  touch kafka_start.sh
  sudo chmod +x kafka_start.sh
  touch kafka_stop.sh
  sudo chmod +x kafka_stop.sh
  crontab -e

Add the following and save.

  @reboot /home/kafka/kafka_start.sh

kafka_start.sh

  #!/bin/bash

  /usr/local/kafka/bin/zookeeper-server-start.sh -daemon /usr/local/kafka/config/zookeeper.properties
  sleep 2
  /usr/local/kafka/bin/kafka-server-start.sh -daemon /usr/local/kafka/config/server.properties

kafka_stop.sh

  #!/bin/bash

  /usr/local/kafka/bin/zookeeper-server-stop.sh
  sleep 2
  /usr/local/kafka/bin/kafka-server-stop.sh

Avro & Python: How to Schema, Write, Read

I have been experimenting with Apache Avro and Python. Below is what I have learned thus far.

Pip Install

At the time of this writing I am using 1.8.2.

  pip install avro-python3

Schema

There are so many different ways to work with the schema definition. There are primitive and complex types. You can find way more documentation on the schema definition here.

  import json
  import avro.schema

  my_schema = avro.schema.Parse(json.dumps(
  {
      'namespace': 'test.avro',
      'type': 'record',
      'name': 'MY_NAME',
      'fields': [
          {'name': 'name_1', 'type': 'int'},
          {'name': 'name_2', 'type': {'type': 'array', 'items': 'float'}},
          {'name': 'name_3', 'type': 'float'},
      ]
  }))

Method 1

Write

  from avro.datafile import DataFileWriter
  from avro.io import DatumWriter
  import io

  #write binary
  file = open(filename, 'wb')

  datum_writer = DatumWriter()
  fwriter = DataFileWriter(file, datum_writer, my_schema)
  fwriter.append({'name_1': 645645, 'name_2': [5.6,34.7], 'name_3': 644.5645})
  fwriter.close()

Write Deflate

  from avro.datafile import DataFileWriter
  from avro.io import DatumWriter

  #write binary
  file = open(filename, 'wb')

  datum_writer = DatumWriter()
  fwriter = DataFileWriter(file, datum_writer, my_schema, codec = 'deflate')
  fwriter.append({'name_1': 645645, 'name_2': [5.6,34.7], 'name_3': 644.5645})
  fwriter.close()

Append

  from avro.datafile import DataFileWriter
  from avro.io import DatumWriter
  import io

  #append binary
  file = open(filename, 'a+b')

  datum_writer = DatumWriter()
  #Notice that the schema is not added to the DataFileWriter. This is because you are appending to an existing avro file
  fwriter = DataFileWriter(file, datum_writer)
  fwriter.append({'name_1': 645675, 'name_2': [5.6,34.9], 'name_3': 649.5645})
  fwriter.close()

Read Schema

  from avro.datafile import DataFileReader
  from avro.io import DatumReader

  file = open(filename, 'rb')
  datum_reader = DatumReader()
  file_reader = DataFileReader(file, datum_reader)

  print(file_reader.meta)

Read

  from avro.datafile import DataFileReader
  from avro.io import DatumReader

  #read binary
  fd = open(filename, 'rb')
  datum_reader = DatumReader()
  file_reader = DataFileReader(fd, datum_reader)

  for datum in file_reader:
      print(datum['name_1'])
      print(datum['name_2'])
      print(datum['name_3'])
  file_reader.close()

Method 2

Write/Append BinaryEncoder

  import io
  from avro.io import DatumWriter, BinaryEncoder

  #write binary (or use the append mode below instead)
  file = open(filename, 'wb')
  #append binary
  #file = open(filename, 'a+b')
  bytes_writer = io.BytesIO()
  encoder = BinaryEncoder(bytes_writer)
  writer_binary = DatumWriter(my_schema)
  writer_binary.write({'name_1': 645645, 'name_2': [5.6,34.7], 'name_3': 644.5645}, encoder)
  file.write(bytes_writer.getvalue())
  file.close()

Read BinaryDecoder

  import io
  from avro.io import DatumReader, BinaryDecoder

  file = open(filename, 'rb')
  bytes_reader = io.BytesIO(file.read())
  decoder = BinaryDecoder(bytes_reader)
  reader = DatumReader(my_schema)

  while True:
      try:
          rec = reader.read(decoder)
          print(rec['name_1'])
          print(rec['name_2'])
          print(rec['name_3'])
      except:
          break

HortonWorks: Kerberize Ambari Server

This entry is part 7 of 7 in the series HortonWorks

You may want to integrate Kerberos authentication into your Ambari Server implementation. If you do, follow the next few steps. It's that easy.

Step 1: Stop Ambari Server

  sudo ambari-server stop

Step 2: Create keytab file

  ktutil
  addent -password -p ##USER##@##DOMAIN##.COM -k 1 -e RC4-HMAC
  # Enter password
  wkt ##USER##.keytab
  q
  sudo mkdir /etc/security/keytabs
  sudo mv ##USER##.keytab /etc/security/keytabs

Step 3: Test the keytab. You should see the ticket when you run klist.

  kinit -kt /etc/security/keytabs/##USER##.keytab ##USER##@##DOMAIN##.COM
  klist

Step 4: Run Ambari Server Kerberos Setup

  sudo ambari-server setup-kerberos

Follow the prompts. Say true to enabling Kerberos. The keytab file will be /etc/security/keytabs/##USER##.keytab. You should be able to leave the rest as defaults. Save the settings and you are done.

Step 5: Remove the kinit ticket you created so that you can make sure your Kerberos authentication is working correctly.

  kdestroy

Step 6: Start Ambari Server

  sudo ambari-server start

Step 7: Validate Kerberos. You should see your ticket get created and you should now be able to log in with no issues.

  klist

HortonWorks: Install YARN/MR

This entry is part 6 of 7 in the series HortonWorks

This tutorial guides you through installing YARN/MapReduce on Hortonworks using a multi node cluster setup with Ubuntu OS.

Step 1: Go to “Stack and Version”. Then click “Add Service” on YARN. You will notice that “MapReduce2” comes with it.

Step 2: Assign Masters. I usually put the ResourceManager, History Server and App Timeline Server all on the secondary namenode, but it is totally up to you how you set up your environment.

Step 3: Assign Slaves and Clients. I put NodeManagers on all the datanodes and Clients on all servers. It's up to you though; this is what worked for me and my requirements.

Step 4: During Customize Services you may get a warning that the Ambari Metrics “hbase_master_heapsize” needs to be increased. I recommend making this change, but it's up to you and what makes sense in your environment.

Step 5: Follow the remaining steps and the installation should complete with no issues. Should an issue arise, review the error; if it was just a connection error while services were coming up, you may not have a real problem and may only need to stop and start all services again. Please note that Ambari Metrics may report errors, but they should clear in around 15 minutes.