PIG: Testing

Apache Pig is a platform for analyzing large data sets, and it offers a variety of ways to process data. As I learn more about it, I will add use cases below.

JSON:

  -- Register the Elephant Bird jars and their json-simple dependency
  REGISTER 'hdfs:///elephant-bird-core-4.15.jar';
  REGISTER 'hdfs:///elephant-bird-hadoop-compat-4.15.jar';
  REGISTER 'hdfs:///elephant-bird-pig-4.15.jar';
  REGISTER 'hdfs:///json-simple-1.1.1.jar';

  -- Load each JSON record as a map; -nestedLoad also parses nested structures
  loadedJson = LOAD '/hdfs_dir/MyFile.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]);
  -- Pull a single key out of the map
  rec = FOREACH loadedJson GENERATE json#'my_key' AS (m:chararray);
  DESCRIBE rec;
  DUMP rec;

  -- Store the results in an HDFS directory. You can have Hive query that directory.
  -- The output directory must not already exist, so it cannot be the input directory itself.
  STORE rec INTO '/hdfs_dir/output' USING PigStorage();
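
As a variation on the script above, here is a minimal sketch that pulls more than one key out of each record and drops records where the first key is missing. It reuses the jar paths, input file and 'my_key' from the script above. The second key 'my_other_key' and the relation names recs and recsClean are hypothetical names for illustration only. It uses an explicit (chararray) cast on each map lookup, which is one common way to type map values, and it relies on Pig returning null when a key is not present in the map.

```pig
-- Same jars as in the script above
REGISTER 'hdfs:///elephant-bird-core-4.15.jar';
REGISTER 'hdfs:///elephant-bird-hadoop-compat-4.15.jar';
REGISTER 'hdfs:///elephant-bird-pig-4.15.jar';
REGISTER 'hdfs:///json-simple-1.1.1.jar';

loadedJson = LOAD '/hdfs_dir/MyFile.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]);

-- 'my_other_key' is a hypothetical second key, used only for illustration
recs = FOREACH loadedJson GENERATE
    (chararray) json#'my_key' AS m,
    (chararray) json#'my_other_key' AS n;

-- A lookup on a key that is absent from the map yields null, so filter those out
recsClean = FILTER recs BY m IS NOT NULL;

DESCRIBE recsClean;
DUMP recsClean;
```

DESCRIBE should report two chararray fields, m and n. Filtering before storing keeps empty rows out of whatever later reads the output.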