Python: MRJob

(Last Updated On: )

If you use hadoop and you want to run a map reduce type job using Python you can use MRJob.

Installation:

pip install mrjob

Here is an example if you run just the mapper code and you load a json file. yield writes the data out.

from mrjob.job import MRJob, MRStep
import json
 
class MRTest(MRJob):
    def steps(self):
        return [
            MRStep(mapper=self.mapper_test)
        ]
 
    def mapper_test(self, _, line):
        result = {}
        doc = json.loads(line)
 
        yield key, result
 
if __name__ == '__main__':
    MRTest.run()

Related