PySpark DataFrame Methods

(Last Updated On: )

This post shows different methods of a DataFrame.

Get the first value in a column

df = some_dataframe_definition

value = df.select("SOME_COLUMN_NAME").first()[0]

Convert Dataframe to JSON

df = some_dataframe_definition

result_json = df.toJSON()

Get a Row

df = some_dataframe_definition

row = df.collect()[0]      #You can switch out 0 for whatever row you want.

Count rows of Dataframe

df = some_dataframe_definition

num_rows = df.count()