PySpark: Read From ADLS to DataFrame


This how-to shows how to read data from Azure Data Lake Storage (ADLS) into a PySpark DataFrame.

First, we need a Spark session. See PySpark: Create a Spark Session for details on that.

Read a CSV from ADLS

    path = 'abfss://my_container@my_storage_account.dfs.core.windows.net/my_folder/'
    file_format = 'csv'

    # `schema` must be defined beforehand (a StructType or a DDL string).
    # The "header" option only applies to CSV; omit it for other formats.
    dataframe = spark.read.format(file_format) \
        .option('header', True) \
        .schema(schema) \
        .load(path)
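
The CSV listing passes a `schema` that is not defined in the snippet itself. As a minimal sketch of one way to supply it, the example below builds a DDL-formatted string, which `DataFrameReader.schema()` accepts in place of a `StructType`. The column names and types here are hypothetical placeholders, not taken from any real file.

```python
# Hypothetical columns and Spark SQL types; replace them with your file's layout.
columns = {
    "id": "INT",
    "name": "STRING",
    "created_at": "TIMESTAMP",
}

# Join the pairs into a DDL string such as "col0 INT, col1 STRING".
schema = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())

print(schema)
```

A DDL string keeps the example free of extra imports. If you prefer explicit types, a `StructType` built from `pyspark.sql.types` should work the same way.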

Read Parquet from ADLS

    path = 'abfss://my_container@my_storage_account.dfs.core.windows.net/my_folder/'
    file_format = 'parquet'

    dataframe = spark.read.format(file_format) \
        .load(path)
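
The paths in these examples follow the pattern `abfss://<container>@<storage_account>.dfs.core.windows.net/<folder>/`. If you build such paths in several places, a small helper may reduce typos. The function below is a hypothetical convenience, not part of PySpark or any Azure library, and it assumes the standard `dfs.core.windows.net` endpoint.

```python
def abfss_path(container, storage_account, folder=""):
    """Build an abfss:// URI for a folder in an ADLS Gen2 container.

    Hypothetical helper; assumes the public dfs.core.windows.net endpoint.
    """
    base = f"abfss://{container}@{storage_account}.dfs.core.windows.net/"
    # Normalise the folder so the result has exactly one trailing slash.
    folder = folder.strip("/")
    return f"{base}{folder}/" if folder else base


path = abfss_path("my_container", "my_storage_account", "my_folder")
print(path)
```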

Read Delta from ADLS

    path = 'abfss://my_container@my_storage_account.dfs.core.windows.net/my_folder/'
    file_format = 'delta'

    dataframe = spark.read.format(file_format) \
        .load(path)
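
The three reads differ only in the format name, the optional schema, and the reader options, so they could be folded into one function. The following is a rough sketch under that assumption. `read_from_adls` is a hypothetical name, and the function expects an existing `spark` session to be passed in rather than creating one.

```python
def read_from_adls(spark, path, file_format, schema=None, **options):
    """Read `path` into a DataFrame using the given format.

    Hypothetical wrapper around spark.read; `options` are passed to
    DataFrameReader.options(), e.g. header=True for CSV.
    """
    reader = spark.read.format(file_format)
    if options:
        reader = reader.options(**options)
    if schema is not None:
        reader = reader.schema(schema)
    return reader.load(path)
```

With a session available, usage might look like `read_from_adls(spark, path, 'csv', schema=schema, header=True)` for CSV, or `read_from_adls(spark, path, 'delta')` for Delta.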