This how-to shows how to read data from Azure Data Lake Storage (ADLS) into a PySpark DataFrame.
First we need a Spark session. See PySpark: Create a Spark Session for details on that.
Read a CSV from ADLS
- path = 'abfss://my_container@my_storage_account.dfs.core.windows.net/my_folder/'
- format = 'csv'
- # the 'header' option only applies to CSV; omit it for other formats
- # schema must be defined beforehand (a StructType or a DDL string)
- dataframe = spark.read.format(format) \
- .option('header', True) \
- .schema(schema) \
- .load(path)
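The path used above follows the pattern abfss://<container>@<storage account>.dfs.core.windows.net/<folder>/. As a minimal sketch, a small helper can assemble it from its parts so the pieces are not retyped for every read. The function name abfss_path and its parameters are hypothetical, not part of PySpark or any Azure library.

```python
def abfss_path(container, storage_account, folder=''):
    """Build an abfss:// URI for a folder in an ADLS container.

    Hypothetical helper for illustration; it only formats a string
    and does not check that the container or folder exists.
    """
    # Drop leading/trailing slashes so the result has exactly one of each.
    folder = folder.strip('/')
    base = f'abfss://{container}@{storage_account}.dfs.core.windows.net/'
    return f'{base}{folder}/' if folder else base


path = abfss_path('my_container', 'my_storage_account', 'my_folder')
print(path)
# prints abfss://my_container@my_storage_account.dfs.core.windows.net/my_folder/
```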
Read Parquet from ADLS
- path = 'abfss://my_container@my_storage_account.dfs.core.windows.net/my_folder/'
- format = 'parquet'
- dataframe = spark.read.format(format) \
- .load(path)
Read Delta from ADLS
- path = 'abfss://my_container@my_storage_account.dfs.core.windows.net/my_folder/'
- format = 'delta'
- dataframe = spark.read.format(format) \
- .load(path)
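The three reads differ only in the format string and in whether the header option and a schema are set, so they could be folded into one function. This is a rough sketch under that assumption: read_from_adls and its parameters are hypothetical names, and the function expects an existing Spark session to be passed in rather than creating one.

```python
def read_from_adls(spark, path, fmt, schema=None, header=None):
    """Read the data at `path` into a DataFrame using format `fmt`.

    Hypothetical wrapper for illustration. `spark` is an existing
    SparkSession; `schema` and `header` are applied only when given.
    """
    reader = spark.read.format(fmt)
    if header is not None:
        # Only meaningful for CSV; leave as None for Parquet and Delta.
        reader = reader.option('header', header)
    if schema is not None:
        reader = reader.schema(schema)
    return reader.load(path)
```

With this in place, the CSV read would look like read_from_adls(spark, path, 'csv', schema=schema, header=True), and the Parquet and Delta reads like read_from_adls(spark, path, 'parquet') and read_from_adls(spark, path, 'delta').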