Databricks: Python SDK

This post is how to use the Databricks Python SDK.

Install the Package

pip install databricks-sdk

Update Package

pip install databricks-sdk --upgrade

Check Package Version

pip show databricks-sdk | grep -oP '(?<=Version: )\S+'

Setup WorkspaceClient

from databricks.sdk import WorkspaceClient

secret = dbutils.secrets.get(scope = "<SCOPE>", key = "<KEY>")

w = WorkspaceClient(
  host = 'https://<URL>/'
  azure_workspace_resource_id = '<RESOURCE_ID_OF_DATABRICKS>',
  azure_tenant_id = '<TENANT_ID>',
  azure_client_id = '<CLIENT_ID>',
  azure_client_secret = secret
)

Setup AccountClient

You can get the account_id from the databricks account portal. By your id in the top right hand corner.

from databricks.sdk import AccountClient

secret = dbutils.secrets.get(scope = "<SCOPE>", key = "<KEY>")

a = AccountClient(
  host = 'https://accounts.azuredatabricks.net'
  account_id = '<ACCOUNT_ID>'
  azure_tenant_id = '<TENANT_ID>',
  azure_client_id = '<CLIENT_ID>',
  azure_client_secret = secret
)

List Workspace Groups

NOTE: You must also setup the workspaceclient to do this.

w.groups.list()

List Account Groups

NOTE: You must also setup the accountclient to do this. You must also be account admin.

a.groups.list()

Create Storage Credential

NOTE: Your SPN must be account admin to do this. You must also setup the workspaceclient to do this.

from databricks.sdk.service.catalog import AzureManagedIdentity

storage_credential_name = '<CREDENTIAL_NAME>'
comment = '<COMMENT>'
connector_id = '<DATABRICKS_ACCESS_CONNECTOR>'
az_mi = AzureManagedIdentity(access_connector_id = connector_id)

w.storage_credenditals.create(
  name = storage_credential_name,
  azure_managed_identity = az_mi
  comment = comment
)

 

Azure: Python SDK

This post is how to use the Azure Python SDK.

If you are using Databricks you can get the secret by using the following Databricks: Get Secret

If you are using Synapse you can get the secret by using the following Synapse: Get Secret

Package Installations

pip install azure-identity
pip install azure-storage-file
pip install azure-storage-file-datalake

Setup Credentials

Service Principal

from azure.common.credentials import ServicePrincipalCredentials
secret = "<GET_SECRET_SECURELY>"
credential = ServicePrincipalCredential("<SPN_CLIENT_ID>", secret, tenant="<TENANT_ID>")

Token Credential

from azure.identity import ClientSecretCredential
secret = "<GET_SECRET_SECURELY>"
token_credential = ClientSecretCredential("<TENANT_ID>", "<SPN_CLIENT_ID>", secret)

Subscription Client

Client

from azure.mgmt.resource import SubscriptionClient
subscription_client = SubscriptionClient(credential)

Get List

subscriptions = subscription_client.subscriptions.list()
for subscription in subscriptions:
    print(subscription.display_name)

Storage Account

Client

from azure.mgmt.storage import StorageManagementClient
storage_client = StorageManagementClient(credential, "<SUBSCRIPTION_ID>")

Get List by Resource Group

storage_accounts = storage_client.storage_accounts.list_by_resource_group("<RESOURCE_GROUP_NAME>")
for sa in storage_accounts:
    print(sa.name)

List Containers in Storage Account

containers = storage_client.blob_containers.list("<RESOURCE_GROUP_NAME>", sa.name)

Containers

Client

from azure.storage.blob import ContainerClient
account_url_blob = f"https://{sa.name}.blob.core.windows.net"
container_client = ContainerClient.from_container_url(
    container_url=account_url_blob + "/" + container.name,
    credential=token_credential
)

Get Container Properties

container_client.get_container_properties()

List Blobs

for b in container_client.list_blobs():
    print(b)

Data Lake Service

Client

from azure.storage.filedatalake import DataLakeServiceClient
storage_account_url_dfs = f"https://{sa.name}.df.core.windows.net"
data_lake_service_client = DataLakeServiceClient(storage_account_url_dfs, token_credential)

DataLake Directory

from azure.storage.filedatalake import DataLakeDirectoryClient
data_lake_directory_client = DataLakeDirectoryClient(account_url=account_url_dfs, credential=credential)

FileSystem

Client

file_system_client = data_lake_service_client.get_file_system_client(file_system="<CONTAINER_NAME>")

Get Directory Client

directory_client = file_system_client.get_directory_client("<CONTAINER_SUB_FOLDER>")

Get Directory Access Control

acl_props = directory_client.get_access_control()

Microsoft Graph Client

Package Installations

pip install msgraph-sdk
pip install msrestazure
pip install azure-identity

Credential

from azure.identity.aio import ClientSecretCredential

secret = "<GET_SECRET_SECURELY>"
credential = ClientSecretCredential('<TENANT_ID>', '<CLIENT_ID>', secret)

Client

from msgraph import GraphServiceClient

def create_session(credential):
  scopes = ['https://graph.microsoft.com/.default']
  graph_client = GraphServiceClient(credential, scopes)
  return graph_client

graph_client = create_session(credential)

Get Groups

#This will only get you the first 100 groups. If you have more then you need to check again
groups = await graph_client.groups.get()
print(len(groups))

while groups is not None and groups.odata_next_link is not None:
  groups = await graph_client.groups.with_url(groups.odata_next_link).get()
  print(len(groups))

Get Group Members

id = '<GROUP_ID>'
group_members = await graph_client.groups.by_group_id(id).members.get()