This post shows how to create a DataFrame in PySpark.
First we need a Spark session; see PySpark: Create a Spark Session for details on that.
Next we need to import the row and type classes we'll use:

```python
from pyspark.sql import Row
from pyspark.sql.types import (
    StringType, DecimalType, TimestampType, FloatType,
    IntegerType, LongType, StructField, StructType,
)
```
Then you create the schema and some row data:

```python
schema = StructType([
    StructField('id', IntegerType()),
    # ... additional fields
])

data = [Row(id=1)]
```
Create the DataFrame:

```python
df = spark.createDataFrame(data, schema=schema)
```
If you want to use a JSON document to build your schema, do the following. Note that `json.loads` expects JSON text (shown here as a string literal, though it could come from a file), and the row data passed to `createDataFrame` is separate from the schema:

```python
import json
from pyspark.sql.types import StructType

schema_json = '''
{
    "type": "struct",
    "fields": [
        {
            "metadata": {},
            "name": "column_a",
            "nullable": false,
            "type": "string"
        }
    ]
}
'''

table_schema = StructType.fromJson(json.loads(schema_json))
df = spark.createDataFrame([('some value',)], schema=table_schema)
```