This tutorial will guide you through configuring PySpark on Eclipse.
First you need to install Eclipse.
Next, add “pyspark.zip” and “py4j-0.10.7-src.zip” to “Libraries” for the Python Interpreter. Both archives ship with Spark, in the python/lib directory of the Spark installation.
Next you need to configure the Environment variables for PySpark.
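If you would rather not depend on the Eclipse settings, roughly the same setup can be done from Python at the top of a script. The following is only a sketch under assumptions: the tutorial does not say which variables to set, so SPARK_HOME and PYSPARK_PYTHON are my guess at the usual pair, and the helper names and the /opt/spark path are hypothetical placeholders.

```python
import glob
import os
import sys


def pyspark_library_paths(spark_home):
    """Return the archives that would otherwise be added under "Libraries"."""
    lib_dir = os.path.join(spark_home, "python", "lib")
    archives = [os.path.join(lib_dir, "pyspark.zip")]
    # The py4j version differs between Spark releases, so match it by pattern.
    archives += sorted(glob.glob(os.path.join(lib_dir, "py4j-*-src.zip")))
    return archives


def configure_pyspark(spark_home, python_executable=sys.executable):
    """Set the assumed environment variables and extend sys.path."""
    os.environ["SPARK_HOME"] = spark_home
    # Assumption: the workers should use the same interpreter as the driver.
    os.environ["PYSPARK_PYTHON"] = python_executable
    for path in pyspark_library_paths(spark_home):
        if path not in sys.path:
            sys.path.insert(0, path)


if __name__ == "__main__":
    # Hypothetical install location; replace with your own Spark directory.
    configure_pyspark("/opt/spark")
```

Call configure_pyspark before the first import of pyspark, since the import only succeeds once the archives are on sys.path.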
Test that it works! Run the following script; it should print [1, 4, 9, 16].
    from pyspark import SparkConf, SparkContext
    from pyspark.sql import SparkSession

    def init_spark():
        spark = SparkSession.builder.appName("HelloWorld").getOrCreate()
        sc = spark.sparkContext
        return spark, sc

    if __name__ == '__main__':
        spark, sc = init_spark()
        nums = sc.parallelize([1, 2, 3, 4])
        print(nums.map(lambda x: x * x).collect())