Synapse: Get KeyVault Properties Using Token Library

This post is how to get the key vault properties using the token library.

Ensure you have a spark session created. Refer to PySpark: Create a Spark Session

    from pyspark.sql import SparkSession

    linked_service_name = '<KEYVAULT_LINKED_SERVICE_NAME>'
    spark = <GET_SPARK_SESSION>

    # The TokenLibrary is reached through the Spark session's JVM gateway.
    token_library = spark._jvm.com.microsoft.azure.synapse.tokenlibrary.TokenLibrary

    # getFullConnectionStringAsMap returns the linked service's properties as a map.
    key_vault_url = token_library.getFullConnectionStringAsMap(linked_service_name).get('url')

    print(key_vault_url)
    print(token_library.getFullConnectionStringAsMap(linked_service_name))

 

Synapse: SAS Token

This post is how to get the SAS token from a notebook.

Ensure you have a spark session created. Refer to PySpark: Create a Spark Session

    from notebookutils import mssparkutils

    linked_service_storage_account_name = '<LINKED_SERVICE_STORAGE_NAME>'
    blob_sas_token = mssparkutils.credentials.getConnectionStringOrCreds(linked_service_storage_account_name)

    # Register the SAS token so Spark can authenticate to the blob container.
    spark.conf.set('fs.azure.sas.<CONTAINER_NAME>.<ADLS_STORAGE_ACCOUNT_NAME>.blob.core.windows.net', blob_sas_token)
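
Once the SAS token is registered, Spark can read from that container over the wasbs scheme. A minimal sketch, assuming a CSV file exists at the placeholder path:

    # Read a file from the container authenticated with the SAS token.
    df = spark.read.csv('wasbs://<CONTAINER_NAME>@<ADLS_STORAGE_ACCOUNT_NAME>.blob.core.windows.net/<PATH_TO_FILE>.csv')
    df.show()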

 

Synapse: Environment Variables

This post is how to work with environment variables in Synapse.

Ensure you have a spark session created. Refer to PySpark: Create a Spark Session

Get Environment Variable

Note that "str" is just the variable's type annotation. You can change it to whatever type is required.

    var: str = spark.conf.get('spark.executorEnv.<ENV_NAME>')

Set Environment Variable

    spark.conf.set('spark.executorEnv.<ENV_NAME>', '<VALUE>')
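
If the variable might not be set, spark.conf.get also accepts a default value as its second argument, which avoids an error when the key is missing. A minimal sketch; '<DEFAULT_VALUE>' is a placeholder:

    # Fall back to a default when the environment variable is not set.
    var: str = spark.conf.get('spark.executorEnv.<ENV_NAME>', '<DEFAULT_VALUE>')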

 

Synapse: List Python Packages

This post is how to list the Python packages in various ways.

You can use %pip to list the Python packages that are installed.

    %pip freeze

However, doing it that way may not give you the exact versions that are installed. To get a comprehensive list, do the following.

    import pkg_resources

    # Iterate over every package in the active environment.
    for package in pkg_resources.working_set:
        print(package)
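
If you only need the version of one specific package, pkg_resources can look it up directly. A minimal sketch; 'pandas' is just an example package name:

    import pkg_resources

    # Look up the installed version of a single package.
    print(pkg_resources.get_distribution('pandas').version)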

 

Synapse: Help Command

This post is just how to use the help command from mssparkutils.

You can use help at various levels of Synapse.

Root

The following command will tell you what areas help can assist you in. It will respond with:

  • fs
  • notebook
  • credentials
  • env

    from notebookutils import mssparkutils

    mssparkutils.help()
FileSystem

If you leave the help command empty, it will return all the options that are available for help. If you pass a command in, it will explain that command in greater detail.

    from notebookutils import mssparkutils

    mssparkutils.fs.help()

    mssparkutils.fs.help('cp')
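
The same pattern applies to the other areas listed above. For example, for credentials:

    from notebookutils import mssparkutils

    mssparkutils.credentials.help()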

 

Synapse: Mounts

This post is how to work with mounts on Synapse.

I suggest mounting to an ADLS storage account. That is what I will assume in the examples below.

List Mounts
    from notebookutils import mssparkutils

    mssparkutils.fs.mounts()
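
You can also loop over the result to inspect each entry. A minimal sketch, assuming each entry exposes mountPoint and source fields as the Databricks equivalent does:

    from notebookutils import mssparkutils

    # Print where each mount points; print(mount) also works if the fields differ.
    for mount in mssparkutils.fs.mounts():
        print(mount.mountPoint, mount.source)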
Get Mount Path

The output of this command will be of the form '/synfs/<number>/mnt/<CONTAINER_NAME>'.

    from notebookutils import mssparkutils

    mount_name = "/mnt/<CONTAINER_NAME>"
    mount_path = mssparkutils.fs.getMountPath(mount_name)
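
The returned path can then be used with regular local file APIs. A minimal sketch, assuming a file named '<FILE_NAME>' exists in the container:

    # Read a mounted file through the local file system API.
    with open(mount_path + '/<FILE_NAME>') as f:
        print(f.read())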
Unmount
    from notebookutils import mssparkutils

    mount_name = "/mnt/<CONTAINER_NAME>"
    mssparkutils.fs.unmount(mount_name)
Mount Using a Linked Service

First you must have a linked service created to the storage account. This linked service must be hard-coded and not parameterized in any way.

    from notebookutils import mssparkutils

    container = '<CONTAINER_NAME>'
    storage_account = '<STORAGE_ACCOUNT_NAME>'
    sub_folder = '<SUB_FOLDER>'  # note: the sub folder is not required
    linked_service_name = '<LINKED_SERVICE_NAME>'

    mssparkutils.fs.mount(
        source='abfss://%s@%s.dfs.core.windows.net/%s/' % (container, storage_account, sub_folder),
        mountPoint='/mnt/%s' % container,
        extraConfigs={'linkedService': linked_service_name, 'fileCacheTimeout': 120, 'timeout': 120}
    )
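
Once mounted, you can read through the mount with the synfs scheme, which is keyed by the job ID (the '<number>' shown in Get Mount Path above). A minimal sketch, assuming a CSV file named '<FILE_NAME>.csv' exists under the mount:

    # The job ID forms part of the synfs path.
    job_id = mssparkutils.env.getJobId()
    df = spark.read.csv('synfs:/%s/mnt/%s/<FILE_NAME>.csv' % (job_id, container))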
Mount Using Configs

You will need to get the secret. Refer to Synapse: Get Secret

    from notebookutils import mssparkutils

    client_id = '<CLIENT_ID>'
    tenant_id = '<TENANT_ID>'
    container = '<CONTAINER_NAME>'
    storage_account = '<STORAGE_ACCOUNT_NAME>'
    sub_folder = '<SUB_FOLDER>'  # note: the sub folder is not required
    secret = '<SECRET>'  # get this from the key vault; see Synapse: Get Secret

    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": secret,
        "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/" + tenant_id + "/oauth2/token"
    }

    mssparkutils.fs.mount(
        source='abfss://%s@%s.dfs.core.windows.net/%s' % (container, storage_account, sub_folder),
        mountPoint='/mnt/%s' % container,
        extraConfigs=configs
    )

 

Synapse: Get Secret

This post is how to get a secret from a key vault in Synapse.

If you have Data Exfiltration enabled (which is recommended), then you need a Managed Private Endpoint set up to your Key Vault.

You also need to ensure your Synapse Managed Identity has access to your Key Vault.

You also need an unparameterized linked service created.

Then you can query your Key Vault to get the secret with the following command.

    from notebookutils import mssparkutils

    secret = mssparkutils.credentials.getSecret('<KEY_VAULT_NAME>', '<SECRET_KEY>', '<LINKED_SERVICE_KEYVAULT_NAME>')

 

Databricks: Set Spark Configs

This post is how to set the spark configs on Databricks or Synapse Notebooks.

First you will need a spark session. Refer to PySpark: Create a Spark Session for more details.

    secret = 'value'  # I highly suggest you get the secret from the key vault
    storage_account = ''
    application_id = ''
    tenant_id = ''

    spark.conf.set('fs.azure.account.auth.type.{}.dfs.core.windows.net'.format(storage_account), 'OAuth')

    spark.conf.set('fs.azure.account.oauth.provider.type.{}.dfs.core.windows.net'.format(storage_account), 'org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider')

    spark.conf.set('fs.azure.account.oauth2.client.id.{}.dfs.core.windows.net'.format(storage_account), application_id)

    spark.conf.set('fs.azure.account.oauth2.client.secret.{}.dfs.core.windows.net'.format(storage_account), secret)

    spark.conf.set('fs.azure.account.oauth2.client.endpoint.{}.dfs.core.windows.net'.format(storage_account), 'https://login.microsoftonline.com/{}/oauth2/token'.format(tenant_id))
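
With these configs set, reads against the storage account go over abfss. A minimal sketch; the container and path are placeholders:

    # Read a parquet file from the storage account configured above.
    df = spark.read.parquet('abfss://<CONTAINER_NAME>@{}.dfs.core.windows.net/<PATH>'.format(storage_account))
    df.show()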

If you are running in Databricks, you could add these to the cluster's Spark config at startup, although I recommend doing it in a notebook instead.

    spark.hadoop.fs.azure.account.auth.type.<STORAGE_ACCOUNT>.dfs.core.windows.net OAuth
    spark.hadoop.fs.azure.account.oauth.provider.type.<STORAGE_ACCOUNT>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
    spark.hadoop.fs.azure.account.oauth2.client.id.<STORAGE_ACCOUNT>.dfs.core.windows.net <CLIENT_ID>
    spark.hadoop.fs.azure.account.oauth2.client.secret.<STORAGE_ACCOUNT>.dfs.core.windows.net <SECRET>
    spark.hadoop.fs.azure.account.oauth2.client.endpoint.<STORAGE_ACCOUNT>.dfs.core.windows.net https://login.microsoftonline.com/<TENANT_ID>/oauth2/token
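
Rather than pasting the secret in plain text, Databricks also lets you reference a secret scope directly in the Spark config. A minimal sketch, assuming a secret scope and key already exist:

    spark.hadoop.fs.azure.account.oauth2.client.secret.<STORAGE_ACCOUNT>.dfs.core.windows.net {{secrets/<SCOPE_NAME>/<SECRET_KEY>}}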