Azure: Tags

This post shows how to work with tags for Azure resources.

Install Graph Extension

az extension add --name resource-graph

Tag List

az tag list --subscription <NAME>

Query for Specific Tag Value

az graph query -q "project name, resourceGroup, type, tags | where tags.<TAGNAME>=~'<VALUE>'"

Query for Multiple Tags

az graph query -q "project name, resourceGroup, type, tags | where tags.<TAGNAME>=~'<VALUE>' | where tags.<TAGNAME>=~'<VALUE>'"

Query for Resource Groups

az graph query -q "ResourceContainers | project name, type, tags | where tags.<TAGNAME>=~'<VALUE>'"

Query For Multiple Resource Types

az graph query -q "project name, resourceGroup, type, tags | where tags.<TAGNAME>=~'<VALUE>' | where type =~ 'microsoft.sql/servers/databases' or type =~ 'microsoft.storage/storageaccounts'"
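The queries above all follow the same shape, so the query string can be generated rather than typed by hand. The following is a minimal Python sketch, not part of the Azure CLI: `build_tag_query` is a hypothetical helper that assembles the string to pass to `az graph query -q`. It assumes simple tag names and values that contain no single quotes, and the `env`/`prod` tag is only an example.

```python
def build_tag_query(tags, resource_types=None):
    """Build a Resource Graph query string that filters on tags and, optionally, types.

    tags: dict mapping tag name -> expected value (compared case-insensitively).
    resource_types: optional list of types; a resource matches if it is any of them.
    """
    parts = ["project name, resourceGroup, type, tags"]
    # One 'where' clause per tag, so every tag must match.
    for tag_name, value in tags.items():
        parts.append("where tags.%s=~'%s'" % (tag_name, value))
    # A single 'where' clause joined with 'or', so any one type may match.
    if resource_types:
        type_filter = " or ".join("type =~ '%s'" % t for t in resource_types)
        parts.append("where " + type_filter)
    return " | ".join(parts)


query = build_tag_query(
    {"env": "prod"},
    ["microsoft.sql/servers/databases", "microsoft.storage/storageaccounts"],
)
print(query)
```

The printed string can be pasted between the double quotes of the `az graph query -q` command.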

 

Synapse: Get KeyVault Properties Using Token Library

This post shows how to get the Key Vault properties using the token library.

Ensure you have a Spark session created. Refer to PySpark: Create a Spark Session.

import sys
from pyspark.sql import SparkSession

linked_service_name = '<KEYVAULT_LINKED_SERVICE_NAME>'
spark = <GET_SPARK_SESSION>
token_library = spark._jvm.com.microsoft.azure.synapse.tokenlibrary.TokenLibrary
key_vault_url = token_library.getFullConnectionStringAsMap(linked_service_name).get('url')

print(key_vault_url)
print(token_library.getFullConnectionStringAsMap(linked_service_name))

 

Synapse: SAS Token

This post shows how to get the SAS token from a notebook.

Ensure you have a Spark session created. Refer to PySpark: Create a Spark Session.

from notebookutils import mssparkutils

linked_service_storage_account_name = '<LINKED_SERVICE_STORAGE_NAME>'
blob_sas_token = mssparkutils.credentials.getConnectionStringOrCreds(linked_service_storage_account_name)

spark.conf.set('fs.azure.sas.<CONTAINER_NAME>.<ADLS_STORAGE_ACCOUNT_NAME>.blob.core.windows.net', blob_sas_token)

 

Synapse: Environment Variables

This post shows how to work with environment variables in Synapse.

Ensure you have a Spark session created. Refer to PySpark: Create a Spark Session.

Get Environment Variable

Note that "str" is only the type annotation of the variable. You can change it to whatever type is required.

var: str = spark.conf.get('spark.executorEnv.<ENV_NAME>')

Set Environment Variable

spark.conf.set('spark.executorEnv.<ENV_NAME>', '<VALUE>')
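A type annotation does not convert anything, and Spark configuration values come back as strings. Below is a minimal sketch of reading a value with a default and an explicit conversion. `get_env_conf` is a hypothetical helper, and it assumes the object passed in behaves like `spark.conf`, with a `get(key, default)` method.

```python
def get_env_conf(conf, env_name, default=None, cast=str):
    """Read 'spark.executorEnv.<env_name>' from a spark.conf-like object.

    Returns `default` when the key is not set; otherwise applies `cast`
    to the stored string value (for example int or float).
    """
    key = "spark.executorEnv.%s" % env_name
    value = conf.get(key, None)
    if value is None:
        return default
    return cast(value)
```

In a notebook this might be called as `retries = get_env_conf(spark.conf, '<ENV_NAME>', default=3, cast=int)`.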

 

Synapse: List Python Packages

This post shows how to list the installed Python packages in various ways.

You can use %pip to list the Python packages that are installed.

%pip freeze

However, doing it that way may not give you the exact versions that are installed. To get a comprehensive list, do the following.

import pkg_resources

for package in pkg_resources.working_set:
    print(package)
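`pkg_resources` is marked as deprecated in recent setuptools releases. As an alternative, here is a minimal sketch that uses the standard library's `importlib.metadata` (Python 3.8 or later) to build a similar list. `installed_packages` is a hypothetical helper name.

```python
from importlib.metadata import distributions


def installed_packages():
    """Return a dict of installed distribution name -> version, sorted by name."""
    packages = {}
    for dist in distributions():
        meta = dist.metadata
        name = meta.get("Name")
        # Skip entries whose metadata has no name.
        if name:
            packages[name] = meta.get("Version")
    return dict(sorted(packages.items(), key=lambda item: item[0].lower()))


for name, version in installed_packages().items():
    print("%s==%s" % (name, version))
```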

 

Synapse: Help Command

This post shows how to use the help command from mssparkutils.

You can use help at various levels of Synapse.

Root

The following command tells you which areas help can assist you with. It responds with:

  • fs
  • notebook
  • credentials
  • env

from notebookutils import mssparkutils

mssparkutils.help()

FileSystem

If you leave the help command empty it will just return all options that are available for help. If you put a command in then it will explain that command in greater detail.

from notebookutils import mssparkutils

mssparkutils.fs.help()

mssparkutils.fs.help('cp')

 

Synapse: Mounts

This post shows how to work with mounts on Synapse.

I suggest mounting to an ADLS storage account. The examples below assume that.

List Mounts

from notebookutils import mssparkutils

mssparkutils.fs.mounts()

Get Mount Path

This command returns a path of the form '/synfs/<number>/mnt/<CONTAINER_NAME>'.

from notebookutils import mssparkutils

mount_name = "/mnt/<CONTAINER_NAME>"
mount_path = mssparkutils.fs.getMountPath(mount_name)

Unmount

from notebookutils import mssparkutils

mount_name = "/mnt/<CONTAINER_NAME>"
mssparkutils.fs.unmount(mount_name)

Mount Using a Linked Service

First you must have a linked service created to the storage account. This linked service must be hard-coded and not parameterized in any way.

from notebookutils import mssparkutils

container = '<CONTAINER_NAME>'
storage_account = '<STORAGE_ACCOUNT_NAME>'
sub_folder = '<SUB_FOLDER>' #it should be noted that this isn't required.
linked_service_name = '<LINKED_SERVICE_NAME>'

mssparkutils.fs.mount(
    source='abfss://%s@%s.dfs.core.windows.net/%s/' % (container, storage_account, sub_folder),
    mountPoint='/mnt/%s' % (container),
    extraConfigs={'linkedService': linked_service_name, 'fileCacheTimeout': 120, 'timeout': 120}
)
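Because the sub folder is optional, formatting the source with a fixed pattern can leave stray slashes when it is empty. The following is a minimal sketch of a helper that builds the `abfss://` source either way. `build_abfss_url` is a hypothetical name and not part of mssparkutils; the container and account names in the example are placeholders.

```python
def build_abfss_url(container, storage_account, sub_folder=""):
    """Build an abfss:// URL for a container, with an optional sub folder."""
    base = "abfss://%s@%s.dfs.core.windows.net" % (container, storage_account)
    # Strip slashes so callers may pass 'raw', '/raw' or 'raw/'.
    sub_folder = (sub_folder or "").strip("/")
    if sub_folder:
        return "%s/%s" % (base, sub_folder)
    return base


print(build_abfss_url("data", "mystorageaccount", "raw/2024"))
print(build_abfss_url("data", "mystorageaccount"))
```

The result can be passed as the `source` argument of the mount call.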

Mount Using Configs

You will need to get the secret. Refer to Synapse: Get Secret

from notebookutils import mssparkutils

client_id = '<CLIENT_ID>'
tenant_id = '<TENANT_ID>'
container = '<CONTAINER_NAME>'
storage_account = '<STORAGE_ACCOUNT_NAME>'
sub_folder = '<SUB_FOLDER>' #it should be noted that this isn't required.

configs = {
  "fs.azure.account.auth.type": "OAuth",
  "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id": client_id,
  "fs.azure.account.oauth2.client.secret": secret,  # retrieved as described in Synapse: Get Secret
  "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/" + tenant_id + "/oauth2/token"
}

mssparkutils.fs.mount(
  source='abfss://%s@%s.dfs.core.windows.net/%s' % (container, storage_account, sub_folder),
  mountPoint='/mnt/%s' % (container),
  extraConfigs=configs
)
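The OAuth settings are the same for every service principal mount, so they can be built by one function. This is a minimal sketch: `build_oauth_configs` is a hypothetical helper, and the secret is assumed to have been retrieved securely beforehand.

```python
def build_oauth_configs(client_id, secret, tenant_id):
    """Return the ABFS OAuth (client credentials) settings for a service principal."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": secret,
        "fs.azure.account.oauth2.client.endpoint":
            "https://login.microsoftonline.com/%s/oauth2/token" % tenant_id,
    }


# Placeholder values; substitute real ones at runtime.
configs = build_oauth_configs("<CLIENT_ID>", "<SECRET>", "<TENANT_ID>")
```

The returned dictionary can be passed as the extra configs of a mount call.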

 

Synapse: Get Secret

This post shows how to get a secret from a Key Vault in Synapse.

If you have Data Exfiltration protection enabled (which is recommended), then you need a Managed Private Endpoint set up to your Key Vault.

You also need to ensure your Synapse Managed Identity has access to your Key Vault.

You also need an un-parameterized Linked Service created.

Then you can query your Key Vault to get the secret with the following command.

from notebookutils import mssparkutils

secret = mssparkutils.credentials.getSecret('<KEY_VAULT_NAME>', '<SECRET_KEY>', '<LINKED_SERVICE_KEYVAULT_NAME>')

 

Databricks: Get Secret

This post shows how to get a secret from a Key Vault in Databricks.

First you need to set up dbutils.

Next you have to make sure your Databricks installation has a Key Vault-backed secret scope set up.

Then you need to make sure that Databricks is allowed to communicate with your Key Vault.

Then you can query your Key Vault to get the secret with the following command.

secret = dbutils.secrets.get(scope='<SCOPE>', key='<SECRET_KEY>')

 

Azure: Python SDK

This post shows how to use the Azure Python SDK.

If you are using Databricks, you can get the secret as described in Databricks: Get Secret.

If you are using Synapse, you can get the secret as described in Synapse: Get Secret.

Package Installations

pip install azure-identity
pip install azure-storage-file
pip install azure-storage-file-datalake
pip install azure-mgmt-resource
pip install azure-mgmt-storage

Setup Credentials

Service Principal

from azure.common.credentials import ServicePrincipalCredentials
secret = "<GET_SECRET_SECURELY>"
credential = ServicePrincipalCredentials("<SPN_CLIENT_ID>", secret, tenant="<TENANT_ID>")

Token Credential

from azure.identity import ClientSecretCredential
secret = "<GET_SECRET_SECURELY>"
token_credential = ClientSecretCredential("<TENANT_ID>", "<SPN_CLIENT_ID>", secret)

Subscription Client

Client

from azure.mgmt.resource import SubscriptionClient
subscription_client = SubscriptionClient(credential)

Get List

subscriptions = subscription_client.subscriptions.list()
for subscription in subscriptions:
    print(subscription.display_name)

Storage Account

Client

from azure.mgmt.storage import StorageManagementClient
storage_client = StorageManagementClient(credential, "<SUBSCRIPTION_ID>")

Get List by Resource Group

storage_accounts = storage_client.storage_accounts.list_by_resource_group("<RESOURCE_GROUP_NAME>")
for sa in storage_accounts:
    print(sa.name)

List Containers in Storage Account

containers = storage_client.blob_containers.list("<RESOURCE_GROUP_NAME>", sa.name)

Containers

Client

from azure.storage.blob import ContainerClient
account_url_blob = f"https://{sa.name}.blob.core.windows.net"
# "container" is one item from the "containers" listing above
container_client = ContainerClient.from_container_url(
    container_url=account_url_blob + "/" + container.name,
    credential=token_credential
)

Get Container Properties

container_client.get_container_properties()

List Blobs

for b in container_client.list_blobs():
    print(b)

Data Lake Service

Client

from azure.storage.filedatalake import DataLakeServiceClient
storage_account_url_dfs = f"https://{sa.name}.dfs.core.windows.net"
data_lake_service_client = DataLakeServiceClient(storage_account_url_dfs, token_credential)

DataLake Directory

from azure.storage.filedatalake import DataLakeDirectoryClient
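# The constructor also requires the file system (container) and directory names
data_lake_directory_client = DataLakeDirectoryClient(account_url=storage_account_url_dfs, file_system_name="<CONTAINER_NAME>", directory_name="<CONTAINER_SUB_FOLDER>", credential=token_credential)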

FileSystem

Client

file_system_client = data_lake_service_client.get_file_system_client(file_system="<CONTAINER_NAME>")

Get Directory Client

directory_client = file_system_client.get_directory_client("<CONTAINER_SUB_FOLDER>")

Get Directory Access Control

acl_props = directory_client.get_access_control()

Microsoft Graph Client

Package Installations

pip install msgraph-sdk
pip install msrestazure
pip install azure-identity

Credential

from azure.identity.aio import ClientSecretCredential

secret = "<GET_SECRET_SECURELY>"
credential = ClientSecretCredential('<TENANT_ID>', '<CLIENT_ID>', secret)

Client

from msgraph import GraphServiceClient

def create_session(credential):
  scopes = ['https://graph.microsoft.com/.default']
  graph_client = GraphServiceClient(credential, scopes)
  return graph_client

graph_client = create_session(credential)

Get Groups

# This only returns the first page of groups (up to 100). If you have more, follow the next link.
groups = await graph_client.groups.get()
print(len(groups.value))

while groups is not None and groups.odata_next_link is not None:
  groups = await graph_client.groups.with_url(groups.odata_next_link).get()
  print(len(groups.value))
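To collect every group rather than print page sizes, the paging loop can be wrapped in a coroutine. This is a minimal sketch: `get_all_groups` is a hypothetical helper, and it assumes each page exposes a `value` list and an `odata_next_link` attribute.

```python
async def get_all_groups(graph_client):
    """Follow the next link until every page of groups has been collected."""
    all_groups = []
    page = await graph_client.groups.get()
    while page is not None:
        # 'value' holds the groups on the current page; it may be empty or None.
        all_groups.extend(page.value or [])
        if not page.odata_next_link:
            break
        page = await graph_client.groups.with_url(page.odata_next_link).get()
    return all_groups
```

In a notebook, where top-level `await` is available, this might be called as `groups = await get_all_groups(graph_client)`.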

Get Group Members

group_id = '<GROUP_ID>'
group_members = await graph_client.groups.by_group_id(group_id).members.get()

 

Databricks: Notebook SQL

This post shows how to work with Databricks SQL through a Notebook.

Create a Temp View of a DataFrame.

df = <SOMETHING>
df.createOrReplaceTempView("<TABLE_NAME>")

Drop a Table

%sql
drop table <SCHEMA>.<TABLE>;

Describe Table

%sql
desc table extended <SCHEMA>.<TABLE>;

Describe Detail

%sql
describe detail <SCHEMA>.<TABLE>;

Show Table Properties

%sql
SHOW TBLPROPERTIES <SCHEMA>.<TABLE>;

Describe History

%sql
describe history <SCHEMA>.<TABLE>;

Create Schema

%sql
CREATE SCHEMA IF NOT EXISTS <SCHEMA>;

Create Parquet Table

%sql
CREATE TABLE <SCHEMA>.<TABLE> USING PARQUET LOCATION 'abfss://<CONTAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<FOLDER>/'

Create Delta Table

%sql
CREATE TABLE <SCHEMA>.<TABLE> USING DELTA LOCATION 'abfss://<CONTAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<FOLDER>/'
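When many tables are created from a notebook, the statement can be generated in Python and passed to `spark.sql`. This is a minimal sketch: `create_table_sql` is a hypothetical helper, the example names are placeholders, and the arguments are assumed to be trusted values since they are interpolated directly into the statement.

```python
def create_table_sql(schema, table, file_format, container, storage_account, folder):
    """Build a CREATE TABLE ... USING <format> LOCATION statement for an ADLS path."""
    file_format = file_format.upper()
    if file_format not in ("PARQUET", "DELTA"):
        raise ValueError("file_format must be PARQUET or DELTA")
    location = "abfss://%s@%s.dfs.core.windows.net/%s/" % (
        container, storage_account, folder.strip("/"))
    return "CREATE TABLE %s.%s USING %s LOCATION '%s'" % (
        schema, table, file_format, location)


statement = create_table_sql("sales", "orders", "delta", "data", "mystorageaccount", "orders")
print(statement)
# In a Databricks notebook: spark.sql(statement)
```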

Upsert

%sql
MERGE INTO schema.table t
USING (
  SELECT columns
  FROM table
) AS source ON (source.column = t.column)
WHEN MATCHED THEN
  UPDATE SET
    t.column = source.column
WHEN NOT MATCHED THEN
  INSERT (
    column, column2
  )
  VALUES (
    source.column, source.column2
  );

 

Databricks: Mounts

This post shows how to work with mounts on Databricks.

Notes

  • Security issue: mounts are shared across all clusters and users.
  • Mounts should always be unmounted after use:
    • Service Principal password rotations can break them.
    • Reliability, especially in BCDR.
  • Databricks recommends using Unity Catalog instead of mounts, as mounts are legacy.
  • Mount names can conflict with those of other projects.
  • Do not create mounts manually. Always have your project mount at the start and unmount at the end.

List Mounts

dbutils.fs.mounts()

Unmount

dbutils.fs.unmount("<MOUNT>")

Mount

client_id = "<CLIENTID>"
secret = dbutils.secrets.get(scope = "<SCOPE_NAME>", key = "<SECRET_NAME>")
tenant_id = "<TENANT_ID>"
storage_account_name = "<STORAGE_ACCOUNT_NAME>"
container_name = "<CONTAINER_NAME>"

configs = {
  "fs.azure.account.auth.type": "OAuth",
  "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id": client_id,
  "fs.azure.account.oauth2.client.secret": secret,
  "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/" + tenant_id + "/oauth2/token"
}

path = "abfss://%s@%s.dfs.core.windows.net/" % (container_name, storage_account_name)

dbutils.fs.mount(
    source = path,
    mount_point = "/mnt/<MOUNT_NAME>",
    extra_configs = configs
)
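Since mounts should be removed after use, the mount and unmount calls can be paired in a context manager so that the unmount runs even when the work in between fails. This is a minimal sketch: `mounted` is a hypothetical helper that takes `dbutils.fs` (or any object with the same `mount` and `unmount` methods) as its first argument.

```python
from contextlib import contextmanager


@contextmanager
def mounted(fs, source, mount_point, extra_configs):
    """Mount `source` at `mount_point` and always unmount on exit."""
    fs.mount(source=source, mount_point=mount_point, extra_configs=extra_configs)
    try:
        yield mount_point
    finally:
        # Runs on success and on error, so the mount does not outlive the block.
        fs.unmount(mount_point)
```

In a notebook this might be used as `with mounted(dbutils.fs, path, "/mnt/<MOUNT_NAME>", configs) as mount_point:` followed by the work that needs the mount.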

 


Databricks: Notebook Commands

This post is all about notebook commands.

List a directory on DBFS using Shell

%sh
ls /dbfs

List a Directory on DBFS using FS

%fs
ls "<DIRECTORY>"

List Python Packages

%pip list

Install a Python Requirements.txt

%pip install --index-url <URL> -r requirements.txt

Install a Single Python Package

%pip install --index-url <URL> <PACKAGE>==<VERSION>

 

Databricks: Bearer Token CLI

This post shows how to get the bearer token using the CLI and set the environment variables.

First install Azure CLI.

Databricks Resource ID = 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d

Get Access Token

az account get-access-token --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d

Set Access Token

Linux

export DATABRICKS_AAD_TOKEN="<TOKEN>"

Windows

set "DATABRICKS_AAD_TOKEN=<TOKEN>"
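Getting and setting the token can also be combined in Python. This is a minimal sketch, assuming the Azure CLI is installed and logged in and that `az account get-access-token` prints JSON containing an `accessToken` field. `extract_access_token`, `fetch_databricks_token` and `set_token_env` are hypothetical helper names.

```python
import json
import os
import subprocess

# Databricks resource ID, as listed above.
DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"


def extract_access_token(cli_output):
    """Pull the bearer token out of the CLI's JSON output."""
    return json.loads(cli_output)["accessToken"]


def fetch_databricks_token():
    """Run the Azure CLI and return the Databricks bearer token."""
    result = subprocess.run(
        ["az", "account", "get-access-token",
         "--resource", DATABRICKS_RESOURCE_ID, "--output", "json"],
        capture_output=True, text=True, check=True,
    )
    return extract_access_token(result.stdout)


def set_token_env(token):
    """Expose the token to this process and any child processes it starts."""
    os.environ["DATABRICKS_AAD_TOKEN"] = token
```

A call such as `set_token_env(fetch_databricks_token())` sets the variable for the current Python process only, not for the parent shell. On Windows the CLI is installed as `az.cmd`, so the command may need to be run through the shell.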

Set Config File

Linux

export DATABRICKS_CONFIG_FILE="<LOCATION>"

Windows

set "DATABRICKS_CONFIG_FILE=<LOCATION>"