If you use hadoop and you want to run a map reduce type job using Python you can use MRJob.
Installation:
- pip install mrjob
Here is an example if you run just the mapper code and you load a json file. yield writes the data out.
- from mrjob.job import MRJob, MRStep
- import json
- class MRTest(MRJob):
- def steps(self):
- return [
- MRStep(mapper=self.mapper_test)
- ]
- def mapper_test(self, _, line):
- result = {}
- doc = json.loads(line)
- yield key, result
- if __name__ == '__main__':
- MRTest.run()