Python: MRJob


If you use Hadoop and want to run a MapReduce job written in Python, you can use MRJob.

Installation:

    pip install mrjob

Here is an example that runs only a mapper step on a file holding one JSON document per line. Each yield emits a key/value pair to the job's output.

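    import json

    from mrjob.job import MRJob
    from mrjob.step import MRStep


    class MRTest(MRJob):
        def steps(self):
            # A single step with only a mapper; no reducer runs.
            return [
                MRStep(mapper=self.mapper_test)
            ]

        def mapper_test(self, _, line):
            result = {}
            # Each input line is expected to hold one JSON document.
            doc = json.loads(line)

            # Placeholder: copy the fields you need from doc into result
            # and choose a key that suits your data.
            key = None

            yield key, result


    if __name__ == '__main__':
        MRTest.run()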
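
To see what a mapper-only step like the one above produces without installing anything or touching Hadoop, here is a rough plain-Python sketch of the same flow. It does not use mrjob at all. The field names user and count, the sample lines, and the run_mapper helper are all made up for illustration. The tab-separated JSON layout imitates what mrjob's default output protocol writes, as far as I know; treat that as an assumption if your job sets its own protocol.

```python
import json


def mapper_test(line):
    """Mimic the mapper: one JSON line in, one (key, value) pair out."""
    doc = json.loads(line)
    # "user" and "count" are hypothetical field names used only in this sketch.
    key = doc.get("user")
    result = {"count": doc.get("count", 0)}
    yield key, result


def run_mapper(lines):
    """Format each yielded pair as JSON key, a tab, then JSON value."""
    output = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines rather than failing in json.loads
        for key, value in mapper_test(line):
            output.append(json.dumps(key) + "\t" + json.dumps(value))
    return output


# Hypothetical input: one JSON document per line, as the mapper expects.
sample_lines = [
    '{"user": "alice", "count": 3}',
    '{"user": "bob", "count": 5}',
]

for row in run_mapper(sample_lines):
    print(row)
```

Because there is no reducer, every pair the mapper yields becomes one line of output, so the result has as many rows as the input has non-blank lines.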