This tutorial will guide you through configuring PySpark on Eclipse.
First, install Eclipse along with the PyDev plugin, which adds Python interpreter support to Eclipse.
Then add “pyspark.zip” and “py4j-0.10.7-src.zip” (both ship with Spark under $SPARK_HOME/python/lib) to the “Libraries” list of the PyDev Python interpreter; a quick import check appears below.
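As a minimal sanity check, assuming the two Libraries entries above are in place, a short script run from Eclipse should be able to import both packages without errors:

# Sanity check: if the Libraries entries are configured correctly,
# both pyspark and py4j are importable from the Eclipse interpreter.
import py4j
import pyspark

print("pyspark", pyspark.__version__, "loaded from", pyspark.__file__)
print("py4j loaded from", py4j.__file__)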
Next, configure the environment variables PySpark relies on, most importantly SPARK_HOME, either in the Eclipse run configuration or directly in your script, as sketched below.
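The sketch below sets the variables from Python before pyspark is imported. The Spark directory shown is only a placeholder, so substitute wherever Spark is unpacked on your machine:

import os
import sys

# Placeholder path: point this at your own Spark installation.
spark_home = "/opt/spark-2.4.0-bin-hadoop2.7"

os.environ["SPARK_HOME"] = spark_home
# Mirror the Libraries entries so the interpreter can find pyspark and py4j.
sys.path.append(os.path.join(spark_home, "python"))
sys.path.append(os.path.join(spark_home, "python", "lib", "py4j-0.10.7-src.zip"))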
Test that it works!
from pyspark.sql import SparkSession

def init_spark():
    # Build (or reuse) a SparkSession and expose its SparkContext.
    spark = SparkSession.builder.appName("HelloWorld").getOrCreate()
    sc = spark.sparkContext
    return spark, sc

if __name__ == '__main__':
    spark, sc = init_spark()
    # Square a small RDD of numbers and print the result to the console.
    nums = sc.parallelize([1, 2, 3, 4])
    print(nums.map(lambda x: x * x).collect())
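If everything is configured correctly, running this script from Eclipse prints [1, 4, 9, 16].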