PySpark: Eclipse Integration


This tutorial will guide you through configuring PySpark in Eclipse.

First, install Eclipse.

Next, add “pyspark.zip” and “py4j-0.10.7-src.zip” (both ship with Spark under $SPARK_HOME/python/lib; the py4j version in the file name may differ for your Spark release) to the “Libraries” section of the Python interpreter configuration.
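If you would rather wire this up in code than through the interpreter settings, a minimal sketch looks like the following. It assumes SPARK_HOME points at your Spark installation (the /opt/spark fallback and the py4j version are placeholders, not guaranteed paths):

```python
import os
import sys

# Placeholder fallback -- adjust to your own Spark installation.
spark_home = os.environ.get("SPARK_HOME", "/opt/spark")

# Spark ships both archives under $SPARK_HOME/python/lib;
# appending them to sys.path makes "import pyspark" resolvable.
sys.path.append(os.path.join(spark_home, "python", "lib", "pyspark.zip"))
sys.path.append(os.path.join(spark_home, "python", "lib", "py4j-0.10.7-src.zip"))
```

This mirrors what the “Libraries” entries do for the Eclipse interpreter, so you only need one of the two approaches.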

Next, configure the environment variables PySpark needs (such as SPARK_HOME) in your Eclipse run configuration.
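As an alternative to the Eclipse run-configuration dialog, the same variables can be set from Python itself before the Spark session is created. This is only a sketch; the /opt/spark path is a placeholder for your actual installation:

```python
import os
import sys

# Placeholder path -- replace with your real Spark installation directory.
os.environ.setdefault("SPARK_HOME", "/opt/spark")

# Point driver and workers at the same Python interpreter
# to avoid version-mismatch errors at runtime.
os.environ.setdefault("PYSPARK_PYTHON", sys.executable)
```

Setting these before the first SparkSession is built ensures PySpark picks them up.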

Finally, test that it works with a small script:

  from pyspark.sql import SparkSession

  def init_spark():
      # Reuse an existing session if one is already running.
      spark = SparkSession.builder.appName("HelloWorld").getOrCreate()
      sc = spark.sparkContext
      return spark, sc

  if __name__ == '__main__':
      spark, sc = init_spark()
      nums = sc.parallelize([1, 2, 3, 4])
      # Square each element on the cluster and collect the results.
      print(nums.map(lambda x: x * x).collect())