Synapse: Get Secret

This post is how to get a secret from a key vault in Synapse.

If you have Data Exfiltration Protection enabled (which is recommended), then you need to have a Managed Private Endpoint set up to your Key Vault.

You also need to ensure your Synapse Managed Identity has access to your Key Vault.

You also need an un-parameterized Linked Service created.

Then you can query your Key Vault to get the secret with the following command.

  1. from notebookutils import mssparkutils
  2.  
  3. secret = mssparkutils.credentials.getSecret('<KEY_VAULT_NAME>', '<SECRET_KEY>', '<LINKED_SERVICE_KEYVAULT_NAME>')

 

Databricks: Get Secret

This post is how to get a secret from a key vault in Databricks.

First you need to set up dbutils.

Next you have to make sure your Databricks workspace has a Key Vault-backed secret scope set up.

Then you need to make sure that Databricks is allowed to communicate with your Key Vault.

Then you can query your Key Vault to get the secret with the following command.

  1. secret = dbutils.secrets.get(scope='<SCOPE>', key='<SECRET_KEY>')
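
If you are not sure of the scope or secret key names, you can list them first. This is standard dbutils functionality; the scope name below is a placeholder.

  # list all secret scopes available in the workspace
  dbutils.secrets.listScopes()
  # list the secret keys within a scope (values are not shown)
  dbutils.secrets.list('<SCOPE>')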

 

Spark Connector Connect to SQL Server

This post is how to use the Spark Connector to connect to SQL Server.

Install Spark Connector

  1. spark-mssql-connector_2.12-1.2.0.jar

Install msal

  1. pip install msal

Connect using Azure SPN

  import msal

  global_token_cache = msal.TokenCache()

  secret = "<GET SECRET SECURELY>"

  global_spn_app = msal.ConfidentialClientApplication(
      "<CLIENT_ID>",
      authority='https://login.microsoftonline.com/<TENANT_ID>',
      client_credential=secret,
      token_cache=global_token_cache,
  )

  result = global_spn_app.acquire_token_for_client(scopes=['https://database.windows.net//.default'])

  jdbc_df = spark.read \
      .format("com.microsoft.sqlserver.jdbc.spark") \
      .option("url", 'jdbc:sqlserver://<SERVER_NAME>:<PORT>;database=<DATABASE>;') \
      .option("query", "SELECT * FROM SOMETHING") \
      .option("accessToken", result['access_token']) \
      .option("encrypt", "true") \
      .option("hostNameInCertificate", "*.database.windows.net") \
      .load()

Connect using Domain Auth

  secret = "<GET SECRET SECURELY>"

  jdbc_df = spark.read \
      .format("com.microsoft.sqlserver.jdbc.spark") \
      .option("url", 'jdbc:sqlserver://<SERVER_NAME>:<PORT>;database=<DATABASE>;') \
      .option("query", "SELECT * FROM SOMETHING") \
      .option("authentication", "ActiveDirectoryPassword") \
      .option("user", "<USER>@<DOMAIN>") \
      .option("password", secret) \
      .load()

Connect using SQL Auth

I do not recommend SQL Auth

  secret = "<GET SECRET SECURELY>"

  jdbc_df = spark.read \
      .format("com.microsoft.sqlserver.jdbc.spark") \
      .option("url", 'jdbc:sqlserver://<SERVER_NAME>:<PORT>;database=<DATABASE>;') \
      .option("query", "SELECT * FROM SOMETHING") \
      .option("user", "<USER>") \
      .option("password", secret) \
      .load()
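
The same connector can also write a DataFrame back to SQL Server. Below is a minimal sketch, assuming you already have a DataFrame df and the same placeholder server and credential values; the table name is just an example.

  df.write \
      .format("com.microsoft.sqlserver.jdbc.spark") \
      .mode("append") \
      .option("url", 'jdbc:sqlserver://<SERVER_NAME>:<PORT>;database=<DATABASE>;') \
      .option("dbtable", "<SCHEMA>.<TABLE>") \
      .option("user", "<USER>") \
      .option("password", secret) \
      .save()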

 

Python: pyodbc with SQL Server

This post is about connecting to SQL Server using pyodbc.

Install package

  1. pip install pyodbc

If you are running in Databricks then the current driver will be “{ODBC Driver 17 for SQL Server}”.

If you are running in Synapse then the current driver will be “{ODBC Driver 18 for SQL Server}”.

Check Available ODBC Drivers

  1. import pyodbc
  2. pyodbc.drivers()

Check Which ODBC Drivers are Installed in Databricks

  1. %sh
  2. cat /etc/odbcinst.ini

Install ODBC Driver 17 in Databricks

  1. curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
  2. curl https://packages.microsoft.com/config/ubuntu/20.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
  3. apt-get update
  4. ACCEPT_EULA=Y apt-get install msodbcsql17
  5. apt-get -y install unixodbc-dev

Connect using SQL Auth

I do not recommend SQL Auth

  import pyodbc

  secret = "<GET SECRET SECURELY>"

  connection = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};Server=tcp:<SERVER_NAME>;PORT=<PORT>;Database=<DATABASE>;Uid=<USER>;Pwd=' + secret + ';Encrypt=yes;TrustServerCertificate=no;Connection Timeout=<TIMEOUT>;')

Connect Using Domain Auth

  import pyodbc

  secret = "<GET SECRET SECURELY>"

  connection = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};Server=tcp:<SERVER_NAME>;PORT=<PORT>;Database=<DATABASE>;Uid=<USER>;Pwd=' + secret + ';Encrypt=yes;TrustServerCertificate=no;Connection Timeout=<TIMEOUT>;Authentication=ActiveDirectoryPassword')

Connect using Azure SPN

  pip install msal

  import struct
  import pyodbc
  import msal

  global_token_cache = msal.TokenCache()
  secret = "<GET SECRET SECURELY>"

  global_spn_app = msal.ConfidentialClientApplication(
      "<CLIENT_ID>",
      authority='https://login.microsoftonline.com/<TENANT_ID>',
      client_credential=secret,
      token_cache=global_token_cache,
  )

  result = global_spn_app.acquire_token_for_client(scopes=['https://database.windows.net//.default'])
  SQL_COPT_SS_ACCESS_TOKEN = 1256

  # expand the token to the format the ODBC driver expects
  token = bytes(result['access_token'], 'utf-8')
  exptoken = b""

  for i in token:
      exptoken += bytes([i])
      exptoken += bytes(1)

  token_struct = struct.pack("=i", len(exptoken)) + exptoken

  # note: Uid/Pwd must not be included when passing an access token
  connection = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};Server=tcp:<SERVER_NAME>;PORT=<PORT>;Database=<DATABASE>;Encrypt=yes;TrustServerCertificate=no;Connection Timeout=<TIMEOUT>;', attrs_before={SQL_COPT_SS_ACCESS_TOKEN: token_struct})

Once you have the connection you can set up the cursor.

  1. cursor = connection.cursor()

Then execute a command

  1. command = "<COMMAND>"
  2. params = ()
  3. cursor.execute(command, params)
  4. connection.commit()
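
If your command returns rows, you can also fetch them from the cursor. A small sketch, assuming a simple SELECT against a placeholder table:

  command = "SELECT * FROM <TABLE>"
  cursor.execute(command)

  # fetch all rows; each row behaves like a tuple
  rows = cursor.fetchall()
  for row in rows:
      print(row)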

After you are finished, close the cursor and connection.

  1. cursor.close()
  2. connection.close()

 

Python: Arguments

This post is how to use the argparse package.

First you must import the package.

  1. import argparse

Next you set up the argument parser.

  1. parser = argparse.ArgumentParser()

Then you create a list of arguments. See the argparse documentation for more options than the ones shown below.

  argument_list = [
      { "name": "<NAME>", "help": "<HELP_TEXT>", "type": "<TYPE>", "required": True}
  ]

Then we take the argument_list, create the arguments, and assign them to the parser.

  for arg in argument_list:
      parser.add_argument("--{}".format(arg["name"]), help=arg["help"], type=arg["type"], required=arg["required"])

Then we parse the args from “sys.argv”. Parsing args this way means that if anything passed in is unknown to your program, it won’t fail; instead the unknown arguments are returned in the “unknown” variable and your application continues.

  1. args, unknown = parser.parse_known_args()

You could also parse the args from “sys.argv” this way. However, that means all the args passed to sys.argv must be known, otherwise it will fail.

  1. args = parser.parse_args()

Then, as a final step, we convert the parsed args into a dictionary called config so the values can be looked up by key.

  1. config = vars(args)
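
Putting it together, here is a small usage sketch; the argument name and value are placeholders and the script name my_script.py is just an example.

  # invoked as: python my_script.py --<NAME> <VALUE>
  value = config["<NAME>"]
  print(value)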

 

Azure: Python SDK

This post is how to use the Azure Python SDK.

If you are using Databricks you can get the secret by following the post Databricks: Get Secret.

If you are using Synapse you can get the secret by following the post Synapse: Get Secret.

Package Installations

  pip install azure-identity
  pip install azure-storage-file
  pip install azure-storage-file-datalake
  pip install azure-storage-blob
  pip install azure-mgmt-resource
  pip install azure-mgmt-storage

Setup Credentials

Service Principal

  from azure.common.credentials import ServicePrincipalCredentials
  secret = "<GET_SECRET_SECURELY>"
  credential = ServicePrincipalCredentials("<SPN_CLIENT_ID>", secret, tenant="<TENANT_ID>")

Token Credential

  1. from azure.identity import ClientSecretCredential
  2. secret = "<GET_SECRET_SECURELY>"
  3. token_credential = ClientSecretCredential("<TENANT_ID>", "<SPN_CLIENT_ID>", secret)

Subscription Client

Client

  1. from azure.mgmt.resource import SubscriptionClient
  2. subscription_client = SubscriptionClient(credential)

Get List

  subscriptions = subscription_client.subscriptions.list()
  for subscription in subscriptions:
      print(subscription.display_name)

Storage Account

Client

  1. from azure.mgmt.storage import StorageManagementClient
  2. storage_client = StorageManagementClient(credential, "<SUBSCRIPTION_ID>")

Get List by Resource Group

  storage_accounts = storage_client.storage_accounts.list_by_resource_group("<RESOURCE_GROUP_NAME>")
  for sa in storage_accounts:
      print(sa.name)

List Containers in Storage Account

  1. containers = storage_client.blob_containers.list("<RESOURCE_GROUP_NAME>", sa.name)

Containers

Client

  from azure.storage.blob import ContainerClient
  account_url_blob = f"https://{sa.name}.blob.core.windows.net"
  container_client = ContainerClient.from_container_url(
      container_url=account_url_blob + "/" + container.name,
      credential=token_credential
  )

Get Container Properties

  1. container_client.get_container_properties()

List Blobs

  for b in container_client.list_blobs():
      print(b)

Data Lake Service

Client

  from azure.storage.filedatalake import DataLakeServiceClient
  storage_account_url_dfs = f"https://{sa.name}.dfs.core.windows.net"
  data_lake_service_client = DataLakeServiceClient(storage_account_url_dfs, token_credential)

DataLake Directory

  from azure.storage.filedatalake import DataLakeDirectoryClient
  data_lake_directory_client = DataLakeDirectoryClient(account_url=storage_account_url_dfs, file_system_name="<CONTAINER_NAME>", directory_name="<DIRECTORY>", credential=token_credential)

FileSystem

Client

  1. file_system_client = data_lake_service_client.get_file_system_client(file_system="<CONTAINER_NAME>")

Get Directory Client

  1. directory_client = file_system_client.get_directory_client("<CONTAINER_SUB_FOLDER>")

Get Directory Access Control

  1. acl_props = directory_client.get_access_control()
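
The returned object is a dictionary of access control properties; for example, you can read the owner and the ACL string from it (assuming the directory exists and your identity is allowed to read ACLs):

  print(acl_props['owner'])
  print(acl_props['acl'])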

Microsoft Graph Client

Package Installations

  1. pip install msgraph-sdk
  2. pip install msrestazure
  3. pip install azure-identity

Credential

  1. from azure.identity.aio import ClientSecretCredential
  2.  
  3. secret = "<GET_SECRET_SECURELY>"
  4. credential = ClientSecretCredential('<TENANT_ID>', '<CLIENT_ID>', secret)

Client

  from msgraph import GraphServiceClient

  def create_session(credential):
      scopes = ['https://graph.microsoft.com/.default']
      graph_client = GraphServiceClient(credential, scopes)
      return graph_client

  graph_client = create_session(credential)

Get Groups

  # This will only get you the first 100 groups. If you have more than that, you need to page through the results.
  groups = await graph_client.groups.get()
  print(len(groups.value))

  while groups is not None and groups.odata_next_link is not None:
      groups = await graph_client.groups.with_url(groups.odata_next_link).get()
      print(len(groups.value))

Get Group Members

  group_id = '<GROUP_ID>'
  group_members = await graph_client.groups.by_group_id(group_id).members.get()
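
The members call is paged the same way as the groups call. A hedged sketch of iterating the first page; display_name is only populated for member types that have one:

  for member in group_members.value:
      # member objects are DirectoryObject subtypes (User, Group, ServicePrincipal, ...)
      print(member.id, getattr(member, 'display_name', None))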

 

Databricks: Notebook SQL

This post is how to work with Databricks SQL through a Notebook.

Create a Temp View of a DataFrame.

  1. df = <SOMETHING>
  2. df.createOrReplaceTempView("<TABLE_NAME>")
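
Once the temp view exists you can query it from SQL or from Python. For example:

  df2 = spark.sql("SELECT * FROM <TABLE_NAME>")
  df2.show()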

Drop a Table

  1. %sql
  2. drop table <SCHEMA>.<TABLE>;

Describe Table

  1. %sql
  2. desc table extended <SCHEMA>.<TABLE>;

Describe Detail

  1. %sql
  2. describe detail <SCHEMA>.<TABLE>;

Show Table Properties

  1. %sql
  2. SHOW TBLPROPERTIES <SCHEMA>.<TABLE>;

Describe History

  1. %sql
  2. describe history <SCHEMA>.<TABLE>;

Create Schema

  1. %sql
  2. CREATE SCHEMA IF NOT EXISTS <SCHEMA>;

Create Parquet Table

  1. %sql
  2. CREATE TABLE <SCHEMA>.<TABLE> USING PARQUET LOCATION 'abfss://<CONTAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<FOLDER>/'

Create Delta Table

  1. %sql
  2. CREATE TABLE <SCHEMA>.<TABLE> USING DELTA LOCATION 'abfss://<CONTAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<FOLDER>/'

Upsert

  MERGE INTO schema.table t
  USING (
    SELECT columns
    FROM table
  ) AS source
  ON (source.column = t.column)
  WHEN MATCHED THEN
    UPDATE SET
      t.column = source.column
  WHEN NOT MATCHED THEN
    INSERT (
      column, column2
    )
    VALUES (
      source.column, source.column2
    )
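
If you prefer to do the upsert from PySpark instead of SQL, the Delta Lake Python API offers the same merge. A sketch, assuming a source DataFrame named source_df and a Delta table schema.table:

  from delta.tables import DeltaTable

  target = DeltaTable.forName(spark, "schema.table")

  (target.alias("t")
      .merge(source_df.alias("source"), "source.column = t.column")
      .whenMatchedUpdate(set={"column": "source.column"})
      .whenNotMatchedInsert(values={"column": "source.column", "column2": "source.column2"})
      .execute())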

 

Databricks: Mounts

This post is how to mount on Databricks.

Notes

  • Security Issue: They are shared across all clusters and users
  • Should always be unmounted after use
    • Due to Service Principal password rotations
    • Reliability concerns, especially in BCDR scenarios
  • Databricks recommends using Unity Catalog instead of mounts, as mounts are legacy.
  • There could be naming conflicts with other projects
  • Do not create mounts manually. Always have your project mount at the start and unmount at the end.

List Mounts

  1. dbutils.fs.mounts()

Unmount

  1. dbutils.fs.unmount("<MOUNT>")

Mount

  client_id = "<CLIENTID>"
  secret = dbutils.secrets.get(scope = "<SCOPE_NAME>", key = "<SECRET_NAME>")
  tenant_id = "<TENANT_ID>"
  storage_account_name = "<STORAGE_ACCOUNT_NAME>"
  container_name = "<CONTAINER_NAME>"

  configs = {
      "fs.azure.account.auth.type": "OAuth",
      "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
      "fs.azure.account.oauth2.client.id": client_id,
      "fs.azure.account.oauth2.client.secret": secret,
      "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/" + tenant_id + "/oauth2/token"
  }

  path = "abfss://%s@%s.dfs.core.windows.net/" % (container_name, storage_account_name)

  dbutils.fs.mount(
      source = path,
      mount_point = "/mnt/<MOUNT_NAME>",
      extra_configs = configs
  )
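
After mounting you can verify the mount and browse it like any other path. A quick check:

  # confirm the mount exists and list its contents
  display(dbutils.fs.mounts())
  dbutils.fs.ls("/mnt/<MOUNT_NAME>")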

 

Databricks: Notebook Commands

This post is all about notebook commands.

List a directory on DBFS using Shell

  1. %sh
  2. ls /dbfs

List a Directory on DBFS using FS

  1. %fs
  2. ls "<DIRECTORY>"

List Python Packages

  1. %pip list

Install a Python Requirements.txt

  1. %pip install --index-url <URL> -r requirements.txt

Install a Single Python Package

  1. %pip install --index-url <URL> <PACKAGE>==<VERSION>

 

Databricks: Bearer Token CLI

This post is how to get the bearer token using the CLI and set the env variable.

First install Azure CLI.

Databricks Resource ID = 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d

Get Access Token

  1. az account get-access-token --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d

Set Access Token

Linux

  1. export DATABRICKS_AAD_TOKEN="<TOKEN>"

Windows

  1. set DATABRICKS_AAD_TOKEN="<TOKEN>"

Set Config File

Linux

  1. export DATABRICKS_CONFIG_FILE="<LOCATION>"

Windows

  1. set DATABRICKS_CONFIG_FILE="<LOCATION>"

 

Databricks: Rest API

This post is how to communicate with Databricks using REST APIs.

Databricks Resource ID = 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d

Get Bearer Token for Service Principal

  1. curl -X POST https://login.microsoftonline.com/<TENANTID>/oauth2/token -H 'Content-Type: application/x-www-form-urlencoded' -d 'grant_type=client_credentials&client_id=<CLIENTID>&resource=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d&client_secret=<SECRET>'

Get Bearer Token for Service Principal Using management.core.windows.net

  1. curl -X POST https://login.microsoftonline.com/<TENANTID>/oauth2/token -H 'Content-Type: application/x-www-form-urlencoded' -d 'grant_type=client_credentials&client_id=<CLIENTID>&resource=https://management.core.windows.net/&client_secret=<SECRET>'

Start Cluster

  1. curl --location -g --request POST -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/clusters/start -d '{ "cluster_id": "<CLUSTER_ID>"}'

Stop Cluster

  1. curl --location -g --request POST -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/clusters/stop -d '{ "cluster_id": "<CLUSTER_ID>"}'

List Clusters

  1. curl --location -g --request GET -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/clusters/list

Job List

  1. curl --location -g --request GET -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/jobs/list

Job Python Run

  1. curl --location -g --request POST -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/jobs/run-now -d '{"job_id": <JOB_ID>, "python_params": [] }'

Job Get

  1. curl --location -g --request GET -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/jobs/runs/get?run_id=<JOB_RUN_ID>
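
If you prefer Python over curl, the same endpoints can be called with the requests package. A minimal sketch, assuming you already have a bearer token and your workspace URL; the job id is a placeholder:

  import requests

  token = "<TOKEN>"
  base_url = "https://<DATABRICKS_url>"
  headers = {"Authorization": f"Bearer {token}"}

  # trigger the job
  run = requests.post(f"{base_url}/api/2.0/jobs/run-now",
                      headers=headers,
                      json={"job_id": "<JOB_ID>", "python_params": []}).json()

  # check the run status
  status = requests.get(f"{base_url}/api/2.0/jobs/runs/get",
                        headers=headers,
                        params={"run_id": run["run_id"]}).json()
  print(status["state"])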

Create Job

Databricks Create Job

  1. curl --location -g --request POST -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/jobs/create -d '<PAYLOAD>'

Create Job Payload

  {
    "name": "<NAME>",
    "max_concurrent_runs": 1,
    "tasks": [
      {
        "task_key": "<TASK_KEY>",
        "run_if": "ALL_SUCCESS",
        "max_retries": 1,
        "timeout_seconds": <TIMEOUT_SECONDS>,
        "notebook_task": {
          "notebook_path": "<PATH>",
          "source": "WORKSPACE",
          "base_parameters": {
            "<KEY>": "<VALUE>",
            "<KEY2>": "<VALUE2>"
          }
        },
        "libraries": [
          {
            "pypi": {
              "package": "<PACKAGE_NAME==VERSION>",
              "coordinates": ""
            }
          },
          {
            "jar": "<LOCATION>"
          }
        ],
        "new_cluster": {
          "custom_tags": {
            "<TAG_NAME>": "<TAG_VALUE>"
          },
          "azure_attributes": {
            "first_on_demand": 1,
            "availability": "SPOT_AZURE",
            "spot_bid_max_price": 75
          },
          "instance_pool_id": "<WORKER_INSTANCE_POOL_ID>",
          "driver_instance_pool_id": "<DRIVER_INSTANCE_POOL_ID>",
          "data_security_mode": "SINGLE_USER",
          "spark_version": "<SPARK_VERSION>",
          "node_type_id": "<NODE_TYPE_ID>",
          "runtime_engine": "STANDARD",
          "policy_id": "<POLICY_ID>",
          "autoscale": {
            "min_workers": <MIN_WORKERS>,
            "max_workers": <MAX_WORKERS>
          },
          "spark_conf": {
            "<CONFIG_KEY>": "<CONFIG_VALUE>"
          },
          "cluster_log_conf": {
            "dbfs": {
              "destination": "<LOG_DESTINATION>"
            }
          },
          "spark_env_vars": {
            "<ENV_NAME>": "<ENV_VALUE>"
          },
          "init_scripts": [
            {
              "volumes": {
                "destination": "<INIT_SCRIPT_LOCATION>"
              }
            }
          ]
        }
      }
    ],
    "format": "SINGLE_TASK"
  }

Job Permission Patch

  1. curl --location -g --request PATCH -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/permissions/jobs/<JOB_ID> -d '{ "access_control_list": [{ "group_name": "<GROUP_NAME>", "permission_level": "<PERMISSION>"}]}'

Get Service Principal List

  1. curl -X GET -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/preview/scim/v2/ServicePrincipals

Delete Service Principal List From Databricks ONLY

  1. curl --location -g --request DELETE -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/preview/scim/v2/ServicePrincipals/<APPLICATION_ID>

Add Service Principal To Databricks

  1. curl --location --request POST 'https://<DATABRICKS_url>/api/2.0/preview/scim/v2/ServicePrincipals' --header 'Authorization: Bearer <TOKEN>' --header 'Content-Type: application/json' --data-raw '{ "schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"], "applicationId": "<CLIENTID>", "displayName": "<DISPLAYNAME>", "groups": [{"value": "<GROUP_ID>"}], "entitlements": [{ "value": "allow-cluster-create"}] }'

List Secret Scopes

  1. curl --location -g --request GET -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/secrets/scopes/list

Create KeyVault Secret Scope

  1. curl --location -g --request POST -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/secrets/scopes/create -d '{"scope": "<Keyvault_name>", "scope_backend_type": "AZURE_KEYVAULT", "backend_azure_keyvault": {"resource_id": "<RESOURCE_ID>", "dns_name": "<KEYVAULT_URL>"}, "initial_manage_principal": "users"}'

IP Access Lists

  1. curl -X GET -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/ip-access-lists

List Git Repos

  1. curl --location -g --request GET -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/repos

Update Git Repo

  1. curl --location -g --request POST -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/repos/<REPO_ID> -d '{ "branch": "<BRANCH_NAME>" }'

 

Python: lxml

This post focuses on the lxml package.

First you need to install the package (pip install lxml) and then import it.

  1. from lxml import etree

Create xml object by string

  1. xml_str = "<root><subitem attr='test'>rec</subitem></root>"
  2. root = etree.fromstring(xml_str)

Get text in node

  1. text_str = root.xpath('//root/subitem/text()')[0]

Get Attribute

  1. attr = root.xpath('//root/subitem')[0].attrib['attr']
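
You can also iterate over elements and serialize the tree back to a string. A small sketch using the same root object:

  # iterate over all subitem elements
  for el in root.iter('subitem'):
      print(el.tag, el.text, el.attrib)

  # serialize the tree back to a string
  print(etree.tostring(root, pretty_print=True).decode())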

 

Azure: Install/Configure CLI

This post will show you how to install the Azure CLI.

First you need to install the CLI.

Once it is installed you can set your config directory. This is useful for having multiple logins going at the same time.

  1. set AZURE_CONFIG_DIR=<YOUR_DIRECTORY>

You can then log in. There are different ways to do that:

Way 1: This will pop up a browser window where you enter your login credentials

  1. az login

Way 2: This will ask you for your password via the command line

  1. az login -u <YOUR_LOGIN>

Way 3: This passes your login and password on the command line

  1. az login -u <YOUR_LOGIN> -p <YOUR_PASSWORD>

Way 4: This logs in as a service principal

  1. az login --service-principal --username <SPN_ID> --password <SPN_KEY> --tenant <TENANTID>

Show your Account

  1. az account show

Set Account Subscription

  1. az account set -s <SUBSCRIPTION_ID>

List Tags For A Resource

  1. az tag list --subscription <SUBSCRIPTION_NAME>

Install Graph

  1. az extension add --name resource-graph

Query for Anything that Has a Tag

  1. az graph query -q "Resources | where tags.<TAG_NAME>=~'<VALUE>' | project resourceGroup, type, tags"

Query for More than One Tag

  1. az graph query -q "Resources | where tags.<TAG_NAME>=~'<VALUE>' and tags.<TAG_NAME>=='<VALUE>' | project resourceGroup, type, tags"

Query Type

  1. az graph query -q "Resources | where type =~ 'microsoft.sql/servers/databases' | project resourceGroup, type, tags"

 

Python: Create a Logger

This post is how to create a logger.

First we need to import

  import sys
  import logging
  from pathlib import Path
  from datetime import datetime
  from pytz import timezone

Then we create a class for Formatter

  class CustomFormatter(logging.Formatter):
      grey = "\x1b[38;20m"
      reset = "\x1b[0m"
      format = "%(asctime)s - %(name)s - %(levelname)s - %(message)s (%(filename)s:)"
      FORMATS = {
          logging.DEBUG: '\x1b[38;5;23m' + format + reset,
          logging.INFO: grey + format + reset,
          logging.WARNING: '\x1b[38;5;56m' + format + reset,
          logging.ERROR: '\x1b[38;5;197m' + format + reset,
          logging.CRITICAL: '\x1b[38;5;1m' + format + reset
      }

      def format(self, record):
          log_fmt = self.FORMATS.get(record.levelno)
          formatter = logging.Formatter(log_fmt)
          return formatter.format(record)

Then we create a function to set our logger up.

  # note: the timezone parameter is named tz_name so it does not shadow the pytz timezone import
  def set_logger(logging_level, name, log_dir, tz_name):
      LOGGING_LEVELS = ['WARNING','INFO','DEBUG','ERROR']
      if logging_level not in LOGGING_LEVELS:
          logging_level = 'INFO'

      level_lookup = {
          'WARNING': logging.WARNING,
          'INFO': logging.INFO,
          'DEBUG': logging.DEBUG,
          'ERROR': logging.ERROR,
      }
      logging.Formatter.converter = lambda *args: datetime.now(tz=timezone(tz_name)).timetuple()
      logging.basicConfig(level=level_lookup[logging_level], format="[%(levelname)s] %(asctime)s - %(message)s:%(lineno)d")
      stream_handler = logging.StreamHandler(sys.stdout)
      stream_handler.setFormatter(CustomFormatter())
      logger = logging.getLogger(name)
      logger.addHandler(stream_handler)
      logger.setLevel(logging_level)

      Path(log_dir).mkdir(parents=True, exist_ok=True)

      now = datetime.now(tz=timezone(tz_name))
      now = now.strftime("%H-%M-%S")

      log_file = '%slog_%s.log' % (log_dir, now)
      file_handler = logging.FileHandler(log_file, mode='a')
      file_handler.setFormatter(logging.Formatter("[%(levelname)s] %(asctime)s - %(message)s:%(lineno)d"))
      logger.addHandler(file_handler)

      return logger
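
Then you can create and use the logger. A usage sketch; the name, log directory, and timezone values are just examples:

  logger = set_logger('INFO', 'my_app', './logs/', 'America/Toronto')
  logger.info('Application started')
  logger.error('Something went wrong')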

References

https://alexandra-zaharia.github.io/posts/make-your-own-custom-color-formatter-with-python-logging/

Databricks: Set Spark Configs

This post is how to set the spark configs on Databricks or Synapse Notebooks.

First you will need a spark session. Refer to PySpark: Create a Spark Session for more details.

  secret = 'value' #I highly suggest you get the secret from the Key Vault
  storage_account = ''
  application_id = ''
  tenant_id = ''

  spark.conf.set('fs.azure.account.auth.type.{}.dfs.core.windows.net'.format(storage_account), 'OAuth')

  spark.conf.set('fs.azure.account.oauth.provider.type.{}.dfs.core.windows.net'.format(storage_account), 'org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider')

  spark.conf.set('fs.azure.account.oauth2.client.id.{}.dfs.core.windows.net'.format(storage_account), application_id)

  spark.conf.set('fs.azure.account.oauth2.client.secret.{}.dfs.core.windows.net'.format(storage_account), secret)

  spark.conf.set('fs.azure.account.oauth2.client.endpoint.{}.dfs.core.windows.net'.format(storage_account), 'https://login.microsoftonline.com/{}/oauth2/token'.format(tenant_id))
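
Once the configs are set, Spark can read directly from that storage account using an abfss path. For example (the container and folder names are placeholders):

  path = 'abfss://<CONTAINER>@{}.dfs.core.windows.net/<FOLDER>/'.format(storage_account)
  df = spark.read.format('parquet').load(path)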

If you are running in Databricks you could also add them to the cluster's Spark config at startup, although I recommend doing it in a notebook instead.

  1. spark.hadoop.fs.azure.account.auth.type.<STORAGE_ACCOUNT>.dfs.core.windows.net OAuth
  2. fs.azure.account.oauth.provider.type.<STORAGE_ACCOUNT>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
  3. fs.azure.account.oauth2.client.id.<STORAGE_ACCOUNT>.dfs.core.windows.net <CLIENT_ID>
  4. fs.azure.account.oauth2.client.secret.<STORAGE_ACCOUNT>.dfs.core.windows.net secret
  5. fs.azure.account.oauth2.client.endpoint.<STORAGE_ACCOUNT>.dfs.core.windows.net https://login.microsoftonline.com/<TENANT_ID>/oauth2/token

 

PySpark: Read From ADLS to DataFrame

This post is how to read from ADLS into a DataFrame.

First we need a spark session. See PySpark: Create a Spark Session for more details on that.

Read a CSV from ADLS

  path = 'abfss://my_container@my_storage_account.dfs.core.windows.net/my_folder/'
  format = 'csv'

  #you don't need "header" if it is not CSV

  dataframe = spark.read.format(format) \
      .option('header', True) \
      .schema(schema) \
      .load(path)
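
The CSV example above references a schema variable; here is a minimal sketch of defining one, with placeholder column names:

  from pyspark.sql.types import StructType, StructField, StringType, IntegerType

  schema = StructType([
      StructField('col_a', StringType(), True),
      StructField('col_b', IntegerType(), True)
  ])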

Read Parquet from ADLS

  path = 'abfss://my_container@my_storage_account.dfs.core.windows.net/my_folder/'
  format = 'parquet'

  dataframe = spark.read.format(format) \
      .load(path)

Read Delta from ADLS

  path = 'abfss://my_container@my_storage_account.dfs.core.windows.net/my_folder/'
  format = 'delta'

  dataframe = spark.read.format(format) \
      .load(path)
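
Delta also lets you read an older version of the table (time travel). A sketch using the same path, with version 0 as an example:

  dataframe_v0 = spark.read.format('delta') \
      .option('versionAsOf', 0) \
      .load(path)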