Python: MRJob

If you use Hadoop and want to run a MapReduce job from Python, you can use the mrjob library.
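A MapReduce job has three phases. The map phase emits key/value pairs, a shuffle groups those pairs by key, and the reduce phase combines all values for each key. The plain-Python word count below sketches that model without Hadoop or mrjob. The names mapper, reducer and run_mapreduce are hypothetical, and the in-memory grouping only stands in for the shuffle that Hadoop performs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Map phase: emit (word, 1) for every word on the line
    for word in line.split():
        yield word.lower(), 1


def reducer(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    # Reduce phase: combine every value that was emitted for one key
    return word, sum(counts)


def run_mapreduce(lines: Iterable[str]) -> Dict[str, int]:
    # Stand-in for the shuffle: group all mapper output by key in memory
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    # A reducer receives its keys in sorted order, so iterate sorted here too
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    print(run_mapreduce(["the cat sat", "The dog sat"]))
    # → {'cat': 1, 'dog': 1, 'sat': 2, 'the': 2}
```

mrjob lets you write the mapper and reducer as methods on an MRJob subclass. Hadoop then handles the grouping across machines.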

Installation:

pip install mrjob

Here is an example of a job that runs only a mapper over a JSON file, where each line of the file holds one JSON document. Each (key, value) pair passed to yield is written out.

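from mrjob.job import MRJob
from mrjob.step import MRStep
import json

class MRTest(MRJob):
    def steps(self):
        # A single step that has a mapper but no reducer
        return [
            MRStep(mapper=self.mapper_test)
        ]

    def mapper_test(self, _, line):
        # The input key is unused; each line is expected to hold one JSON document
        doc = json.loads(line)

        # Build the output record from doc; here every field is copied unchanged
        result = dict(doc)

        # With no reducer, each yielded (key, value) pair is written to the output
        yield None, result

if __name__ == '__main__':
    MRTest.run()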
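The plain-Python sketch below does not use mrjob. It approximates what a map-only step does with a JSON Lines input: each line goes through the mapper, and because there is no shuffle or reducer, every yielded pair becomes one output line. It assumes output that resembles mrjob's default JSON output protocol, with the key and value each JSON-encoded and separated by a tab. The exact spacing mrjob writes may differ. The function run_map_only is hypothetical.

```python
import json
from typing import Any, Iterable, Iterator, List, Tuple


def mapper_test(line: str) -> Iterator[Tuple[Any, Any]]:
    # Same logic as the mapper above: one JSON document per input line
    doc = json.loads(line)
    result = dict(doc)
    yield None, result


def run_map_only(lines: Iterable[str]) -> List[str]:
    # Hypothetical helper: a map-only step has no shuffle and no reducer,
    # so each yielded pair is written straight out as "<JSON key>\t<JSON value>"
    output = []
    for line in lines:
        for key, value in mapper_test(line):
            output.append(json.dumps(key) + "\t" + json.dumps(value))
    return output


if __name__ == "__main__":
    sample = ['{"id": 1, "name": "a"}', '{"id": 2, "name": "b"}']
    for out_line in run_map_only(sample):
        print(out_line)
```

To run the real job, pass the input file on the command line, for example python mr_test.py input.json (both file names are hypothetical). Use the -r option to pick a runner, for example -r hadoop to run on a Hadoop cluster.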