Pig: Testing


Apache Pig is a platform for analyzing large data sets, and it offers several ways to process data. As I learn more about it, I will add use cases below.

JSON:

-- Register the Elephant Bird jars and json-simple, which provide the JSON loader used below
REGISTER 'hdfs:///elephant-bird-core-4.15.jar';
REGISTER 'hdfs:///elephant-bird-hadoop-compat-4.15.jar';
REGISTER 'hdfs:///elephant-bird-pig-4.15.jar';
REGISTER 'hdfs:///json-simple-1.1.1.jar';

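-- Load each JSON record as a map; '-nestedLoad' also loads nested objects as maps
loadedJson = LOAD '/hdfs_dir/MyFile.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]);
-- Look up one key in the map and cast its value to chararray
rec = FOREACH loadedJson GENERATE (chararray) json#'my_key' AS m;
-- Print the schema of rec, then its contents
DESCRIBE rec;
DUMP rec;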

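-- Store the results in an HDFS directory that Hive can then query.
-- STORE fails if the output directory already exists, so write to a new path rather than the input directory.
STORE rec INTO '/hdfs_dir/output' USING PigStorage();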
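
The same pattern extends to more than one field. The lines below are a minimal sketch, assuming the loadedJson relation from above. The keys 'parent' and 'child' and the output path are made-up placeholders, and the chained # lookup relies on '-nestedLoad' having loaded the nested object as a map. I have not checked it against every Pig version, so treat it as a starting point.

-- 'my_key' comes from the example above; 'parent' and 'child' are hypothetical keys
fields = FOREACH loadedJson GENERATE
    (chararray) json#'my_key' AS m,
    (chararray) json#'parent'#'child' AS child;
-- A lookup on a missing key yields null, so drop records without 'my_key'
nonNull = FILTER fields BY m IS NOT NULL;
-- Write comma-delimited text; the output path is a placeholder and must not already exist
STORE nonNull INTO '/hdfs_dir/output_csv' USING PigStorage(',');

Passing the delimiter to PigStorage explicitly (the default is a tab) makes it easier to match the field delimiter of whatever table definition reads the directory later.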