If you use hadoop and you want to run a map reduce type job using Python you can use MRJob.
Installation:
pip install mrjob
Here is an example if you run just the mapper code and you load a json file. yield writes the data out.
from mrjob.job import MRJob, MRStep import json class MRTest(MRJob): def steps(self): return [ MRStep(mapper=self.mapper_test) ] def mapper_test(self, _, line): result = {} doc = json.loads(line) yield key, result if __name__ == '__main__': MRTest.run()
You must be logged in to post a comment.