Azure: Tags

This post covers how to work with tags for Azure resources.

Install Graph Extension

  az extension add --name resource-graph

Tag List

  az tag list --subscription <NAME>

Query for Specific Tag Value

  az graph query -q "project name, resourceGroup, type, tags | where tags.<TAGNAME>=~'<VALUE>'"

Query for Multiple Tags

  az graph query -q "project name, resourceGroup, type, tags | where tags.<TAGNAME_1>=~'<VALUE_1>' | where tags.<TAGNAME_2>=~'<VALUE_2>'"

Query for Resource Groups

  az graph query -q "ResourceContainers | project name, type, tags | where tags.<TAGNAME>=~'<VALUE>'"

Query for Multiple Resource Types

  az graph query -q "project name, resourceGroup, type, tags | where tags.<TAGNAME>=~'<VALUE>' | where type =~ 'microsoft.sql/servers/databases' or type =~ 'microsoft.storage/storageaccounts'"
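
For example, to find all storage accounts carrying a hypothetical environment tag set to production:

  az graph query -q "project name, resourceGroup, type, tags | where tags.environment=~'production' | where type =~ 'microsoft.storage/storageaccounts'"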

 

Synapse: Get KeyVault Properties Using Token Library

This post covers how to get Key Vault properties using the token library.

Ensure you have a Spark session created. Refer to PySpark: Create a Spark Session
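
If you only need the defaults, a minimal sketch for getting a session is:

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()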

  from pyspark.sql import SparkSession

  linked_service_name = '<KEYVAULT_LINKED_SERVICE_NAME>'
  spark = <GET_SPARK_SESSION>

  # Access the JVM TokenLibrary bundled with Synapse Spark pools
  token_library = spark._jvm.com.microsoft.azure.synapse.tokenlibrary.TokenLibrary
  key_vault_url = token_library.getFullConnectionStringAsMap(linked_service_name).get('url')

  print(key_vault_url)
  print(token_library.getFullConnectionStringAsMap(linked_service_name))

 

Synapse: SAS Token

This post covers how to get a SAS token from a notebook.

Ensure you have a Spark session created. Refer to PySpark: Create a Spark Session

  from notebookutils import mssparkutils

  linked_service_storage_account_name = '<LINKED_SERVICE_STORAGE_NAME>'
  blob_sas_token = mssparkutils.credentials.getConnectionStringOrCreds(linked_service_storage_account_name)

  # Register the SAS token so Spark can read through the blob endpoint
  spark.conf.set('fs.azure.sas.<CONTAINER_NAME>.<ADLS_STORAGE_ACCOUNT_NAME>.blob.core.windows.net', blob_sas_token)
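
With the token registered, Spark can read through the blob endpoint. A minimal sketch, assuming a hypothetical file path:

  # 'folder/example.csv' is a hypothetical path in the container
  df = spark.read.csv('wasbs://<CONTAINER_NAME>@<ADLS_STORAGE_ACCOUNT_NAME>.blob.core.windows.net/folder/example.csv')
  df.show()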

 

Synapse: Environment Variables

This post covers how to work with environment variables in Synapse.

Ensure you have a Spark session created. Refer to PySpark: Create a Spark Session

Get Environment Variable

Note that 'str' is the variable's type annotation; change it to whatever type is required.

  var: str = spark.conf.get('spark.executorEnv.<ENV_NAME>')

Set Environment Variable

  spark.conf.set('spark.executorEnv.<ENV_NAME>', '<VALUE>')
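
A quick round trip with a hypothetical variable name (note that executors only see spark.executorEnv.* values in os.environ if they are in place before the executors start):

  spark.conf.set('spark.executorEnv.PIPELINE_ENV', 'dev')  # hypothetical name and value
  env: str = spark.conf.get('spark.executorEnv.PIPELINE_ENV')
  print(env)  # dev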

 

Synapse: List Python Packages

This post covers how to list installed Python packages in various ways.

You can use %pip to list the Python packages that are installed.

  %pip freeze

However, doing it that way may not give you the exact versions that are installed. To get a comprehensive list, do the following.

  import pkg_resources

  for package in pkg_resources.working_set:
      print(package)
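
pkg_resources is deprecated in recent setuptools releases; the standard library's importlib.metadata gives an equivalent listing:

  from importlib.metadata import distributions

  # Enumerate every distribution visible to the interpreter
  for dist in distributions():
      print(dist.metadata['Name'], dist.version)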

 

Synapse: Help Command

This post covers how to use the help command from mssparkutils.

You can use help at various levels of Synapse.

Root

The following command lists the areas that help covers. It will respond with:

  • fs
  • notebook
  • credentials
  • env

  from notebookutils import mssparkutils

  mssparkutils.help()

FileSystem

If you leave the help command empty, it returns all of the options that are available. If you pass in a command, it explains that command in greater detail.

  from notebookutils import mssparkutils

  mssparkutils.fs.help()

  mssparkutils.fs.help('cp')

 

Synapse: Mounts

This post covers how to work with mounts on Synapse.

I suggest mounting to an ADLS storage account; the examples below assume that.

List Mounts

  from notebookutils import mssparkutils

  mssparkutils.fs.mounts()

Get Mount Path

This command will output a path of the form '/synfs/<number>/mnt/<CONTAINER_NAME>'.

  from notebookutils import mssparkutils

  mount_name = "/mnt/<CONTAINER_NAME>"
  mount_path = mssparkutils.fs.getMountPath(mount_name)
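
The returned path can then be used with ordinary file APIs. A sketch, assuming a hypothetical file name under the mount:

  # 'example.txt' is a hypothetical file in the mounted container
  with open(f"{mount_path}/example.txt") as f:
      print(f.read())
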
Unmount

  from notebookutils import mssparkutils

  mount_name = "/mnt/<CONTAINER_NAME>"
  mssparkutils.fs.unmount(mount_name)

Mount Using a Linked Service

First you must have a linked service created to the storage account. This linked service must be hard-coded and not parameterized in any way.

  from notebookutils import mssparkutils

  container = '<CONTAINER_NAME>'
  storage_account = '<STORAGE_ACCOUNT_NAME>'
  sub_folder = '<SUB_FOLDER>'  # optional; drop it from the source path if unused
  linked_service_name = '<LINKED_SERVICE_NAME>'

  mssparkutils.fs.mount(
      source='abfss://%s@%s.dfs.core.windows.net/%s/' % (container, storage_account, sub_folder),
      mountPoint='/mnt/%s' % (container),
      extraConfigs={'linkedService': linked_service_name, 'fileCacheTimeout': 120, 'timeout': 120}
  )

Mount Using Configs

You will need to get the secret. Refer to Synapse: Get Secret

  from notebookutils import mssparkutils

  client_id = '<CLIENT_ID>'
  tenant_id = '<TENANT_ID>'
  container = '<CONTAINER_NAME>'
  storage_account = '<STORAGE_ACCOUNT_NAME>'
  sub_folder = '<SUB_FOLDER>'  # optional; drop it from the source path if unused
  secret = '<GET_SECRET_SECURELY>'  # see Synapse: Get Secret

  configs = {
      "fs.azure.account.auth.type": "OAuth",
      "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
      "fs.azure.account.oauth2.client.id": client_id,
      "fs.azure.account.oauth2.client.secret": secret,
      "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/" + tenant_id + "/oauth2/token"
  }

  mssparkutils.fs.mount(
      source='abfss://%s@%s.dfs.core.windows.net/%s' % (container, storage_account, sub_folder),
      mountPoint='/mnt/%s' % (container),
      extraConfigs=configs
  )

 

Synapse: Get Secret

This post covers how to get a secret from a Key Vault in Synapse.

If you have Data Exfiltration protection enabled (which is recommended), you need a Managed Private Endpoint set up to your Key Vault.

You also need to ensure your Synapse Managed Identity has access to your Key Vault.

You also need an un-parameterized linked service created for the Key Vault.

Then you can query your Key Vault to get the secret with the following command.

  from notebookutils import mssparkutils

  secret = mssparkutils.credentials.getSecret('<KEY_VAULT_NAME>', '<SECRET_KEY>', '<LINKED_SERVICE_KEYVAULT_NAME>')

 

Databricks: Get Secret

This post covers how to get a secret from a Key Vault in Databricks.

First, you need to set up dbutils.

Next, make sure your Databricks workspace has a Key Vault-backed secret scope set up.

Then make sure Databricks is allowed to communicate with your Key Vault.

Then you can query your Key Vault to get the secret with the following command.

  secret = dbutils.secrets.get(scope='<SCOPE>', key='<SECRET_KEY>')
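
To confirm the scope is wired up, you can list the available scopes and the keys inside one:

  print(dbutils.secrets.listScopes())
  print(dbutils.secrets.list('<SCOPE>'))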

 

Azure: Python SDK

This post covers how to use the Azure Python SDK.

If you are using Databricks, you can get the secret by following Databricks: Get Secret.

If you are using Synapse, you can get the secret by following Synapse: Get Secret.

Package Installations

  pip install azure-identity
  pip install azure-mgmt-resource
  pip install azure-mgmt-storage
  pip install azure-storage-blob
  pip install azure-storage-file-datalake

Setup Credentials

Service Principal

  # Legacy credential type used by the older azure.common-based SDKs
  from azure.common.credentials import ServicePrincipalCredentials

  secret = "<GET_SECRET_SECURELY>"
  credential = ServicePrincipalCredentials("<SPN_CLIENT_ID>", secret, tenant="<TENANT_ID>")

Token Credential

  from azure.identity import ClientSecretCredential

  secret = "<GET_SECRET_SECURELY>"
  token_credential = ClientSecretCredential("<TENANT_ID>", "<SPN_CLIENT_ID>", secret)

Subscription Client

Client

  from azure.mgmt.resource import SubscriptionClient

  subscription_client = SubscriptionClient(credential)

Get List

  subscriptions = subscription_client.subscriptions.list()
  for subscription in subscriptions:
      print(subscription.display_name)

Storage Account

Client

  from azure.mgmt.storage import StorageManagementClient

  storage_client = StorageManagementClient(credential, "<SUBSCRIPTION_ID>")

Get List by Resource Group

  storage_accounts = storage_client.storage_accounts.list_by_resource_group("<RESOURCE_GROUP_NAME>")
  for sa in storage_accounts:
      print(sa.name)

List Containers in Storage Account

  containers = storage_client.blob_containers.list("<RESOURCE_GROUP_NAME>", sa.name)

Containers

Client

  from azure.storage.blob import ContainerClient

  account_url_blob = f"https://{sa.name}.blob.core.windows.net"
  container_client = ContainerClient.from_container_url(
      container_url=account_url_blob + "/" + container.name,
      credential=token_credential
  )

Get Container Properties

  container_client.get_container_properties()

List Blobs

  for b in container_client.list_blobs():
      print(b)

Data Lake Service

Client

  from azure.storage.filedatalake import DataLakeServiceClient

  storage_account_url_dfs = f"https://{sa.name}.dfs.core.windows.net"
  data_lake_service_client = DataLakeServiceClient(storage_account_url_dfs, credential=token_credential)

DataLake Directory

  from azure.storage.filedatalake import DataLakeDirectoryClient

  data_lake_directory_client = DataLakeDirectoryClient(account_url=storage_account_url_dfs, file_system_name="<CONTAINER_NAME>", directory_name="<DIRECTORY_NAME>", credential=token_credential)

FileSystem

Client

  file_system_client = data_lake_service_client.get_file_system_client(file_system="<CONTAINER_NAME>")

Get Directory Client

  directory_client = file_system_client.get_directory_client("<CONTAINER_SUB_FOLDER>")

Get Directory Access Control

  acl_props = directory_client.get_access_control()
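
The result is a dictionary; for example, the POSIX ACL and owner can be read like this (keys as returned by the current SDK):

  print(acl_props['acl'])    # e.g. user::rwx,group::r-x,other::---
  print(acl_props['owner'])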

Microsoft Graph Client

Package Installations

  pip install msgraph-sdk
  pip install msrestazure
  pip install azure-identity

Credential

  # msgraph-sdk is async, so it needs the aio credential
  from azure.identity.aio import ClientSecretCredential

  secret = "<GET_SECRET_SECURELY>"
  credential = ClientSecretCredential('<TENANT_ID>', '<CLIENT_ID>', secret)

Client

  from msgraph import GraphServiceClient

  def create_session(credential):
      scopes = ['https://graph.microsoft.com/.default']
      graph_client = GraphServiceClient(credential, scopes)
      return graph_client

  graph_client = create_session(credential)

Get Groups

  # This gets only the first page (up to 100 groups); follow odata_next_link for the rest
  groups = await graph_client.groups.get()
  print(len(groups.value))

  while groups is not None and groups.odata_next_link is not None:
      groups = await graph_client.groups.with_url(groups.odata_next_link).get()
      print(len(groups.value))

Get Group Members

  group_id = '<GROUP_ID>'
  group_members = await graph_client.groups.by_group_id(group_id).members.get()
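
The response is a collection; a sketch that prints each member (display_name is populated for user objects):

  for member in group_members.value:
      print(member.id, getattr(member, 'display_name', None))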

 

Databricks: Notebook SQL

This post covers how to work with Databricks SQL from a notebook.

Create a Temp View of a DataFrame.

  df = <SOMETHING>
  df.createOrReplaceTempView("<TABLE_NAME>")
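
The view can then be queried directly with SQL:

  %sql
  select * from <TABLE_NAME> limit 10;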

Drop a Table

  %sql
  drop table <SCHEMA>.<TABLE>;

Describe Table

  %sql
  desc table extended <SCHEMA>.<TABLE>;

Describe Detail

  %sql
  describe detail <SCHEMA>.<TABLE>;

Show Table Properties

  %sql
  SHOW TBLPROPERTIES <SCHEMA>.<TABLE>;

Describe History

  %sql
  describe history <SCHEMA>.<TABLE>;

Create Schema

  %sql
  CREATE SCHEMA IF NOT EXISTS <SCHEMA>;

Create Parquet Table

  %sql
  CREATE TABLE <SCHEMA>.<TABLE> USING PARQUET LOCATION 'abfss://<CONTAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<FOLDER>/'

Create Delta Table

  %sql
  CREATE TABLE <SCHEMA>.<TABLE> USING DELTA LOCATION 'abfss://<CONTAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<FOLDER>/'

Upsert

  %sql
  MERGE INTO schema.table t
  USING (
    SELECT columns
    FROM table
  ) AS source
  ON source.column = t.column
  WHEN MATCHED THEN
    UPDATE SET t.column = source.column
  WHEN NOT MATCHED THEN
    INSERT (column, column2)
    VALUES (source.column, source.column2);

 

Databricks: Mounts

This post covers how to mount storage on Databricks.

Notes

  • Security issue: mounts are shared across all clusters and users
  • Mounts should always be unmounted after use
    • Due to Service Principal password rotations
    • For reliability, especially in BCDR scenarios
  • Databricks recommends using Unity Catalog instead of mounts, as mounts are legacy
  • Naming conflicts can arise with other projects
  • Do not create mounts manually; always have your project mount at the start and unmount at the end

List Mounts

  dbutils.fs.mounts()

Unmount

  dbutils.fs.unmount("<MOUNT>")

Mount

  client_id = "<CLIENT_ID>"
  secret = dbutils.secrets.get(scope="<SCOPE_NAME>", key="<SECRET_NAME>")
  tenant_id = "<TENANT_ID>"
  storage_account_name = "<STORAGE_ACCOUNT_NAME>"
  container_name = "<CONTAINER_NAME>"

  configs = {
      "fs.azure.account.auth.type": "OAuth",
      "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
      "fs.azure.account.oauth2.client.id": client_id,
      "fs.azure.account.oauth2.client.secret": secret,
      "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/" + tenant_id + "/oauth2/token"
  }

  path = "abfss://%s@%s.dfs.core.windows.net/" % (container_name, storage_account_name)

  dbutils.fs.mount(
      source=path,
      mount_point="/mnt/<MOUNT_NAME>",
      extra_configs=configs
  )
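
Once mounted, you can verify the mount by listing its contents:

  display(dbutils.fs.ls("/mnt/<MOUNT_NAME>"))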

 

Databricks: Notebook Commands

This post is all about notebook commands.

List a Directory on DBFS using Shell

  %sh
  ls /dbfs

List a Directory on DBFS using FS

  %fs
  ls "<DIRECTORY>"

List Python Packages

  %pip list

Install from a Python requirements.txt

  %pip install --index-url <URL> -r requirements.txt

Install a Single Python Package

  %pip install --index-url <URL> <PACKAGE>==<VERSION>

 

Databricks: Bearer Token CLI

This post covers how to get a bearer token using the Azure CLI and set the related environment variables.

First, install the Azure CLI.

Databricks Resource ID = 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d

Get Access Token

  az account get-access-token --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d
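
To capture just the token, add a JMESPath query (shown for a Linux shell):

  export DATABRICKS_AAD_TOKEN=$(az account get-access-token --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d --query accessToken --output tsv)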

Set Access Token

Linux

  export DATABRICKS_AAD_TOKEN="<TOKEN>"

Windows

  set DATABRICKS_AAD_TOKEN=<TOKEN>

Set Config File

Linux

  export DATABRICKS_CONFIG_FILE="<LOCATION>"

Windows

  set DATABRICKS_CONFIG_FILE=<LOCATION>