Apache Pig is a platform for analyzing large data sets on Hadoop. There are many ways to process data with it. As I learn more about it, I will add use cases below.
JSON:
-- Register the Elephant Bird jars and their json-simple dependency
REGISTER 'hdfs:///elephant-bird-core-4.15.jar';
REGISTER 'hdfs:///elephant-bird-hadoop-compat-4.15.jar';
REGISTER 'hdfs:///elephant-bird-pig-4.15.jar';
REGISTER 'hdfs:///json-simple-1.1.1.jar';

-- Load each JSON record as a Pig map
loadedJson = LOAD '/hdfs_dir/MyFile.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]);

-- Pull a single key out of the map
rec = FOREACH loadedJson GENERATE json#'my_key' as (m:chararray);
DESCRIBE rec;
DUMP rec;

-- Store the results in an HDFS dir. You can have Hive query that directory
STORE rec INTO '/hdfs_dir' USING PigStorage();
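As a possible follow-on to the script above, here is a minimal sketch that pulls two keys from each record, drops records where the first key is missing, and writes tab-delimited output. The key name 'other_key', the aliases fields and present, and the output path '/hdfs_dir/my_output' are all hypothetical. It assumes the same REGISTER statements have already run and that the output directory does not exist yet.

```pig
-- Assumes the elephant-bird and json-simple jars are already REGISTERed as above.
loadedJson = LOAD '/hdfs_dir/MyFile.json'
    USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
    AS (json:map[]);

-- 'other_key' is a hypothetical second key; a map lookup yields null when the key is absent.
fields = FOREACH loadedJson GENERATE
    (chararray) json#'my_key' AS my_key,
    (chararray) json#'other_key' AS other_key;

-- Keep only the records that actually carry 'my_key'.
present = FILTER fields BY my_key IS NOT NULL;

-- '/hdfs_dir/my_output' is a hypothetical path; it must not exist before the job runs.
STORE present INTO '/hdfs_dir/my_output' USING PigStorage('\t');
```

Writing to a separate directory with an explicit tab delimiter keeps the output apart from the input file and gives any table defined over it a known field separator.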