Oliver

Databricks: Check for Orphaned Workspace Directories

In this post I will show you how to check if your Databricks workspace has directories that are orphaned to users/service principals.

import requests
from requests.auth import HTTPBasicAuth
import json
 
DATABRICKS_INSTANCE = dbutils.widgets.get('url')
TOKEN = dbutils.widgets.get('token')
 
#Get list of directories
endpoint = f'{DATABRICKS_INSTANCE}/api/2.0/workspace/list'
params = {
    'path': '/Users/'
}
response = requests.get(endpoint, params=params, auth=HTTPBasicAuth('token', TOKEN))
response.raise_for_status()
json_response = response.json()
 
for i, result in json_response.items():
    for value in result:
        directory = value["path"]
        user = directory.replace("/Users/", "")
 
        if '@' in user:
            endpoint = f'{DATABRICKS_INSTANCE}/api/2.0/preview/scim/v2/Users'
            params = {
                'filter': 'userName eq "%s"' % (user)
            }
            response = requests.get(endpoint, params=params, auth=HTTPBasicAuth('token', TOKEN))
            account = response.json().get('Resources', [])
            if not account:
                print('Account %s doesn't exist' % (account))
        else:
            endpoint = f'{DATABRICKS_INSTANCE}/api/2.0/preview/scim/v2/ServicePrincipals'
            params = {
                'filter': 'applicationId eq "%s"' % (user)
            }
            response = requests.get(endpoint, params=params, auth=HTTPBasicAuth('token', TOKEN))
            sp = response.json().get('Resources', [])
            if not sp:
                print('SP %s doesn't exist' % (sp))

SourceTree Installation on Windows

This post is how to install/Configure SourceTree.

Step 1

Install SourceTree.

Step 2

Install Beyond Compare.

Step 3

Install Git

Step 4

Create or Import your SSH key by going to SourceTree and clicking Tools then click “Create or import SSH Keys”. Follow the instructions then click “Save public key” and “Save private key” next to “Save the generated key”. Save public key to C:\Users\<USER>\.ssh\id_rsa.pub. Save private key to C:\Users\<USER>\.ssh\id_rsa.

Step 5

Update to latest SourceTree build by opening SourceTree click Tools –> Options then click “Updates” tab and update if required.

Step 6

Configure any options you want by going to Tools –> Options.

General Tab

Choose your theme
Setup SSH Client Configuration
Setup Default user information
Set your remote check for updates
Reopen repository tabs at startup
Spell check commit messages

Diff Tab

Convert tabs to spaces with width 4
External Diff tool Beyond Compare

Git Tab

Set Global Ignore List
Prune tracking branches no longer present on remote(s)
Push all tags to remotes
Fetch all tags

Authentication Tab

Setup any auth you have.

Step 7

Setup C:\Users\<USER>\.gitconfig like the below.

[core]
	autocrlf = true
[user]
	name = GaudreaultPC\\Gaudreault
	email = oliver@gaudreault.ca
[difftool "sourcetree"]
	cmd = 'C:/Program Files/Beyond Compare 5/BCompare.exe' \"$LOCAL\" \"$REMOTE\"
[mergetool "sourcetree"]
	cmd = "'' "
	trustExitCode = true
[safe]
	bareRepository = explicit

Step 8

If you are trying to transfer your SourceTree from one PC to another. The below are the steps you will need to perform.

Copy your data from one PC to the other in the same exact folder structure. Include .git and .gitignore files and folders.
On original PC go to %USERPROFILE%\AppData\Local\Atlassian\SourceTree and copy the following files to the new PC.
1. bookmarks.xml
2. opentabs.xml
On the new PC close SourceTree and copy the two files from the previous step and copy to the same location as was in original PC.
Ensure the new laptop has the same .gitconfig from old PC.

Step 9

Sometimes you might have issue with the repository settings for each repo. If you do just go in and fix it. Remove the origin and add back. Ensure you select “Default Remote”.

Installing Jenkins Using Docker on Windows

This post is how to install Jenkins using Docker in Windows.

Install Docker

Go to Docker and download “Docker Desktop for Windows – x86_64”. Then run as administrator and install. If you don’t have an account you can create one.

Then open Docker Desktop and go to settings and enable the following “Enable Docker terminal”.

Once you are done you need to start the Docker Engine.

If you get the following error “Failed to initialize: unable to resolve docker endpoint: open C:\Users\<USER>\.docker\machine\machines\default\ca.pem: The system cannot find the path specified.” Then go to environment variables and modify DOCKER_CERT_PATH and set to “C:\Users\<USER>\.docker\machine\certs”.

Then open the docker terminal.

Install Jenkins

Create a network called “jenkins”. This will give you back the id.

docker network create jenkins

You can check that it created successfully by running the following command

docker network ls

Next we run a docker:dind Docker image. If you don’t have the docker image it will download it.

A dind is a Docker In Docker allows developers to run a Docker container within an already running Docker container to support CI/CD pipelines and create sandboxed container environments.

docker run --name jenkins-docker --rm --detach --privileged --network jenkins --network-alias docker --env DOCKER_TLS_CERTDIR=/certs --volume jenkins-docker-certs:/certs/client --volume jenkins-data:/var/jenkins_home --publish 2376:2376 docker:dind

Next we create a Dockerfile

FROM jenkins/jenkins:2.462.2-jdk17
USER root
RUN apt-get update && apt-get install -y lsb-release
RUN curl -fsSLo /usr/share/keyrings/docker-archive-keyring.asc \
  https://download.docker.com/linux/debian/gpg
RUN echo "deb [arch=$(dpkg --print-architecture) \
  signed-by=/usr/share/keyrings/docker-archive-keyring.asc] \
  https://download.docker.com/linux/debian \
  $(lsb_release -cs) stable" > /etc/apt/sources.list.d/docker.list
RUN apt-get update && apt-get install -y docker-ce-cli
USER jenkins
RUN jenkins-plugin-cli --plugins "blueocean docker-workflow"

Next we will build a new docker image with the Dockerfile from the previous step. Be sure to name it appropriately and include versions. If you don’t have the right image it will get it for you. Make sure you are in the directory where you saved your Dockerfile.

docker build -t jenkins-blueocean:2.462.2-1 .

Next we run the image as a container in Docker

docker run --name jenkins-blueocean --restart=on-failure --detach --network jenkins --env DOCKER_HOST=tcp://docker:2376 --env DOCKER_CERT_PATH=/certs/client --env DOCKER_TLS_VERIFY=1 --volume jenkins-data:/var/jenkins_home --volume jenkins-docker-certs:/certs/client:ro --publish 8080:8080 --publish 50000:50000 jenkins-blueocean:2.462.2-1

Docker Container Logs using the below command. This will show you the password.

docker logs jenkins-blueocean

Record the password located “/var/jenkins_home/secrets/initialAdminPassword”.

docker exec jenkins-blueocean cat /var/jenkins_home/secrets/initialAdminPassword
 
or
 
docker exec jenkins-blueocean bash
cat /var/jenkins_home/secrets/initialAdminPassword
exit

Then navigate to http://localhost:8080 and enter the password in the “Unlock Jenkins” screen.

Then select the packages you want to install and click install.

Once the packages install then you need to create the admin user.

Then select your url and click “Save and Finish”.

Oracle JDBC

This post is how to connect to oracle from JDBC.

There are four main connection examples:

TCP to Single Host

jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=<HOST>)(PORT=<PORT>))(CONNECT_DATA=(SERVICE_NAME=<SERVICE_NAME>)))

TCP to Multiple Hosts

jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=<HOST1>)(PORT=<PORT1>))(ADDRESS=(PROTOCOL=TCP)(HOST=<HOST2>)(PORT=<PORT2>))(CONNECT_DATA=(SERVICE_NAME=<SERVICE_NAME>)))

SSL to Single Host

jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCPS)(HOST=<HOST>)(PORT=<PORT>))(CONNECT_DATA=(SERVICE_NAME=<SERVICE_NAME>)))

SSL to Multiple Hosts

jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCPS)(HOST=<HOST1>)(PORT=<PORT1>))(ADDRESS=(PROTOCOL=TCPS)(HOST=<HOST2>)(PORT=<PORT2>))(CONNECT_DATA=(SERVICE_NAME=<SERVICE_NAME>)))

Connect Using PySpark With Databricks

You will need to get the secret from a keyvault. Refer to Databricks: Get Secret

secret = <Get_Secret>
 
df = spark.read \
    .format("jdbc") \
    .option("url", "<URL>") \
    .option("dbtable", "<SCHEMA>.<TABLE>") \
    .option("user", "<ACCOUNT>") \
    .option("password", secret) \
    .option("driver", "oracle.jdbc.driver.OracleDriver") \
    .laod()

Connect Using SQL With Databricks

It should be noted that this isn’t a really good way and I don’t believe is completely secure.

Add cluster configuration

spark.databricks.secureVariableSubstitute.enabled false
spark.<SOME_VARIABLE_NAME> {{secrets/<SCOPE>/<SECRET>}}

Code

%sql
 
CREATE SCHEMA IF NOT EXISTS <SCHEMA_NAME>;
 
DROP VIEW IF EXISTS <VIEW_NAME>;
 
CREATE TEMPORARY VIEW <SCHEMA_NAME>.<VIEW_NAME>
USING org.apache.spark.sql.jdbc
OPTIONS (
   url "<URL>",
   dbtable "<SCHEMA_NAME>.<TABLE_NAME>",
   user "<ACCOUNT>",
   password "${spark.<SOME_VARIABLE_NAME>}",
   driver "oracle.jdbc.driver.OracleDriver"
)

Azure: Tags

This post is how to work with tags for Azure resources.

Install Graph Extension

az extension add --name resource-graph

Tag List

az tag list --subscription <NAME>

Query for Specific Tag Value

az graph query -q "project name, resourceGroup, type, tags | where tags.<TAGNAME>=~'<VALUE>'"

Query for Multiple Tags

az graph query -q "project name, resourceGroup, type, tags | where tags.<TAGNAME>=~'<VALUE>' | where tags.<TAGNAME>=~'<VALUE>'"

Query for Resource Groups

az graph query -q "ResourceContainers | project name, type, tags | where tags.<TAGNAME>=~'<VALUE>'"

Query For Multiple Resource Types

az graph query -q "project name, resourceGroup, type, tags | where tags.<TAGNAME>=~'<VALUE>' | where type =~ 'microsoft.sql/servers/databases' or type =~ 'microsoft.storage/storageaccounts'"

Databricks: Python SDK

This post is how to use the Databricks Python SDK.

Install the Package

pip install databricks-sdk

Update Package

pip install databricks-sdk --upgrade

Check Package Version

pip show databricks-sdk | grep -oP '(?<=Version: )\S+'

Setup WorkspaceClient

from databricks.sdk import WorkspaceClient
 
secret = dbutils.secrets.get(scope = "<SCOPE>", key = "<KEY>")
 
w = WorkspaceClient(
  host = 'https://<URL>/'
  azure_workspace_resource_id = '<RESOURCE_ID_OF_DATABRICKS>',
  azure_tenant_id = '<TENANT_ID>',
  azure_client_id = '<CLIENT_ID>',
  azure_client_secret = secret
)

Setup AccountClient

You can get the account_id from the databricks account portal. By your id in the top right hand corner.

from databricks.sdk import AccountClient
 
secret = dbutils.secrets.get(scope = "<SCOPE>", key = "<KEY>")
 
a = AccountClient(
  host = 'https://accounts.azuredatabricks.net'
  account_id = '<ACCOUNT_ID>'
  azure_tenant_id = '<TENANT_ID>',
  azure_client_id = '<CLIENT_ID>',
  azure_client_secret = secret
)

List Workspace Groups

NOTE: You must also setup the workspaceclient to do this.

w.groups.list()

List Account Groups

NOTE: You must also setup the accountclient to do this. You must also be account admin.

a.groups.list()

Create Storage Credential

NOTE: Your SPN must be account admin to do this. You must also setup the workspaceclient to do this.

from databricks.sdk.service.catalog import AzureManagedIdentity
 
storage_credential_name = '<CREDENTIAL_NAME>'
comment = '<COMMENT>'
connector_id = '<DATABRICKS_ACCESS_CONNECTOR>'
az_mi = AzureManagedIdentity(access_connector_id = connector_id)
 
w.storage_credenditals.create(
  name = storage_credential_name,
  azure_managed_identity = az_mi
  comment = comment
)

Databricks: Get Current User

This post is how to find the current logged in user.

You can run the following command or look at the UI to see who is currently logged in.

SELECT CURRENT_USER();

PySpark DataFrame Methods

This post shows different methods of a DataFrame.

Get the first value in a column

df = some_dataframe_definition
 
value = df.select("SOME_COLUMN_NAME").first()[0]

Convert Dataframe to JSON

df = some_dataframe_definition
 
result_json = df.toJSON()

Get a Row

df = some_dataframe_definition
 
row = df.collect()[0]      #You can switch out 0 for whatever row you want.

Count rows of Dataframe

df = some_dataframe_definition
 
num_rows = df.count()

Databricks Unity Catalog SQL Commands

This post is basic commands you will need to know for working with Unity Catalog.

Display Current Metastore

SELECT CURRENT_METASTORE();

Display Current Catalog

SELECT CURRENT_CATALOG();

Create Catalog

CREATE CATALOG IF NOT EXISTS  <Catalog_Name> COMMENT 'A COMMENT';

Create Catalog With Location

CREATE CATALOG IF NOT EXISTS <Catalog_Name> MANAGED LOCATION 'abfss://<METASTORE_CONTAINER_NAME>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<CATALOG_NAME>' COMMENT 'A COMMENT';

Describe Catalog

DESCRIBE CATALOG <Catalog_Name>;

Create Schema

CREATE SCHEMA IF NOT EXISTS <SCHEMA_NAME> COMMENT '<COMMENT>';

Create Schema With Location

CREATE SCHEMA IF NOT EXISTS <SCHEMA_NAME> MANAGED LOCATION 'abfss://<METASTORE_CONTAINER_NAME>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<CATALOG_NAME>/<SCHEMA_NAME>' COMMENT '<COMMENT>';

Show All Storage Credentials

SHOW STORAGE CREDENTIALS;

Describe Credential

DESCRIBE STORAGE CREDENTIAL <CREDENTIAL_NAME>;

Create External Location

You will first need a storage credential.

You can reference down to the full table path or keep it at the container

CREATE EXTERNAL LOCATION IF NOT EXISTS <NAME>
URL 'abfss://<COTNAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/'
WITH (STORAGE CREDENTIAL <CREDENTIAL_NAME>)
COMMENT '<COMMENT>';

Create External Table

CREATE TABLE <CATALOG_NAME>.<SCHEMA_NAME>.<TABLE_NAME>
USING <FORMAT>
LOCATION 'abfss://<COTNAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/FOLDER/PATH;'

Grant Create Storage Credential on Metastore

GRANT CREATE STORAGE CREDENTIAL ON METASTORE TO `<USER>`;

Grant Permission to Create External Locations on Storage Credential

GRANT CREATE EXTERNAL LOCATION ON STORAGE CREDENTIAL <CREDENTIAL_NAME> TO `<USER>`;

Grant Permission to Create External Location On Metastored

GRANT CREATE EXTERNAL LOCATION ON METASTORE TO `<USER>`;

Grant Permission to Use Catalog

GRANT USE_CATALOG ON CATALOG <CATALOG_NAME> TO `<USER>`;

Show all Grants On Metastore

SHOW GRANTS `<USER>` ON METASTORE;

Grant Permission to Use Schema

GRANT USE_SCHEMA ON SCHEMA <CATALOG_NAME>.<SCHEMA_NAME> TO `<USER>`;

Grant Permission to Create Table

GRANT CREATE TABLE ON SCHEMA <CATALOG_NAME>.<SCHEMA_NAME> TO <USER>;

Databricks Unity Catalog Rest API’s

This post is how to work with Databricks Unity Catalog Rest API’s.

Set Catalog Isolation Mode to ISOLATED

curl --location --request PATCH 'https://<DATABRICK_URL>/api/2.1/unity-catalog/catalogs/<CATALOG_NAME>' \
--header 'Authorization: Bearer <TOKEN>' \
--header 'Content-Type: application/json' \
--data-raw '{
"isolation_mode": "ISOLATED"
}'

Bind Workspace to Catalog

curl --location --request PATCH 'https://<DATABRICK_URL>/api/2.1/unity-catalog/bindings/catalog/<CATALOG_NAME>' \
--header 'Authorization: Bearer <TOKEN>' \
--header 'Content-Type: application/json' \
--data-raw '{
"add": [{ "workspace_id": "<WORKSPACEE_ID>", "binding_type": "BINDING_TYPE_READ_WRITE" }]
"remove": []
}'

Unbind Workspace to Catalog

curl --location --request PATCH 'https://<DATABRICK_URL>/api/2.1/unity-catalog/bindings/catalog/<CATALOG_NAME>' \
--header 'Authorization: Bearer <TOKEN>' \
--header 'Content-Type: application/json' \
--data-raw '{
"unassign_workspaces": ["<WORKSPACE_ID>"]
}'

List Workspaces Assigned to Catalog

curl --location --request GET 'https://<DATABRICK_URL>/api/2.1/unity-catalog/bindings/catalog/<CATALOG_NAME>' \
--header 'Authorization: Bearer <TOKEN>' \
--header 'Content-Type: application/json'

Synapse: Get KeyVault Properties Using Token Library

This post is how to get the key vault properties using the token library.

Ensure you have a spark session created. Refer to PySpark: Create a Spark Session

import sys
from pyspark.sql import SparkSession
 
linked_service_name = '<KEYVAULT_LINKED_SERVICE_NAME>'
spark = <GET_SPARK_SESSION>
token_library = spark._jvm.com.microsoft.azure.synapse.tokenlibrary.TokenLibrary
key_vault_url = token_library.getFullConnectionStringAsMap(linked_service_name).get('url')
 
print(key_vault_url)
print(token_library.getFullConnectionStringAsMap(linked_service_name))

Synapse: SAS Token

This post is how to get the SAS token from a notebook.

Ensure you have a spark session created. Refer to PySpark: Create a Spark Session

from notebookutils import mssparkutils
 
linked_service_storage_account_name = '<LINKED_SERVICE_STORAGE_NAME>'
blob_sas_token = mssparkutils.credentials.getConnectionStringOrCreds(linked_service_storage_account_name)
 
spark.conf.set('fs.azure.sas.<CONTAINER_NAME>.<ADLS_STORAGE_ACCOUNT_NAME>.blob.core.windows.net', blob_sas_token

Synapse: Key Vault Linked Service

This post is how to get the key vault linked service properties.

from notebookutils import mssparkutils
import json
 
linked_service_name = '<SYNAPSE_KEYVAULT_LINKED_SERVICE_NAME>'
json.loads(mssparkutils.credentials.getPropertiesAll(linked_service_name))

Synapse: Environment Variables

This post is how to work with environment variables in Synapse.

Ensure you have a spark session created. Refer to PySpark: Create a Spark Session

Get Environment Variable

It should be noted that “str” is the type that variable is. You can change it to whatever is required.

var: str = spark.conf.get('spark.executorEnv.<ENV_NAME>')

Set Environment Variable

spark.conf.set('spark.executorEnv.<ENV_NAME>', '<VALUE>')

Synapse: List Python Packages

This post is how to list the python packages in various ways.

You can use %pip to list the python packages that are installed.

%pip freeze

However doing it that way may not give you the exact versions that are installed. To get a comprehensive list do the following.

import pkg_resources
 
for package in pkg_resources.working_set:
    print(package)

Synapse: Help Command

This post is just how to use the help command from mssparkutils.

You can use help at various levels of Synapse.

Root

The following command will tell you what areas help can assist you in. This will respond with

fs
notebook
credentials
env

from notebookutils import mssparkutils
 
mssparkutils.help()

FileSystem

If you leave the help command empty it will just return all options that are available for help. If you put a command in then it will explain that command in greater detail.

from notebookutils import mssparkutils
 
mssparkutils.fs.help()
 
mssparkutils.fs.help('cp')

Synapse: Mounts

This post is how to work with mounts on Synapse.

I suggest mounting to an ADLS storage account. That is what I will assume in the below examples.

List Mounts

from notebookutils import mssparkutils
 
mssparkutils.fs.mounts()

Get Mount Path

The output of this command will produce ‘/synfs/<number>/mnt/<CONTAINER_NAME>’

from notebookutils import mssparkutils
 
mount_name = "/mnt/<CONTAINER_NAME>"
mount_path = mssparkutils.fs.getMountPath(mount_name)

Unmount

from notebookutils import mssparkutils
 
mount_name = "/mnt/<CONTAINER_NAME>"
mssparkutils.fs.unmount(mount_name)

Mount Using a Linked Service

First you must have a linked service created to the storage account. This linked service must be hard-coded and not parameterized in any way.

from notebookutils import mssparkutils
 
container = '<CONTAINER_NAME>'
storage_account = '<STORAGE_ACCOUNT_NAME>'
sub_folder = '<SUB_FOLDER>' #it should be noted that this isn't required.
linked_service_name = '<LINKED_SERVICE_NAME>'
 
mssparkutils.fs.mount(
    source='abfss://%s@%s.dfs.core.windows.net/%s/' % (container, storage_account, sub_folder),
    mountPoint='/mnt/%s' % (container),
    {'linkedService':linked_service_name, 'fileCacheTimeout': 120, 'timeout': 120}
)

Mount Using Configs

You will need to get the secret. Refer to Synapse: Get Secret

from notebookutils import mssparkutils
 
client_id = '<CLIENT_ID>'
tenant_id = '<TENANT_ID>'
container = '<CONTAINER_NAME>'
storage_account = '<STORAGE_ACCOUNT_NAME>'
sub_folder = '<SUB_FOLDER>' #it should be noted that this isn't required.
 
configs = {
  "fs.azure.account.auth.type": "OAuth",
  "fs.azure.account.oauth.provider.type": "org.apache.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id": client_id,
  "fs.azure.account.oauth2.client.secret": secret,
  "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/" tenant_id + "/oauth2/token"
}
 
mssparkutils.fs.mount(
  source='abfss://%s@%s.dfs.core.windows.net/%s' % (container, storage_account, sub_folder),
  mountPoint='/mnt/%s' % (container),
  extraConfigs=configs
)

PySpark: Delta Lake

This post is how to use pyspark to work with Delta Tables.

For more information on Delta Lake you can refer here.

First you need to install the “delta-spark” package for whatever version you require.

pip install delta-spark==3.1.0

Setup a Spark Session.

To read delta tables you can refer to PySpark: Read From ADLS to DataFrame.

To write delta tables you can refer to PySpark: Save a DataFrame To ADLS.

Vacuum Delta Table

from delta.tables import DeltaTable
 
vacuum_hrs = 100
path = 'abfss://<CONTAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<FOLDER>/'
 
delta_table = DeltaTable.forPath(spark, path)
delta_table.vacuum(vacuum_hrs)

Compaction

Impoves reads by merging small files into larger ones.

from delta.tables import DeltaTable
 
path = 'abfss://<CONTAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<FOLDER>/' 
 
delta_table = DeltaTable.forPath(spark, path)
delta_table.optimize().executeCompaction()

Z-Order

from delta.tables import DeltaTable
 
path = 'abfss://<CONTAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<FOLDER>/' 
columns = ''
 
delta_table = DeltaTable.forPath(spark, path)
delta_table.optimize().executeZOrderBy(columns)

Delete

from delta.tables import DeltaTable
import pyspark.sql.functions as F
 
path = 'abfss://<CONTAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<FOLDER>/' 
 
delta_table = DeltaTable.forPath(spark, path)
delta_table.delete(F.col('<MY_COL>') == '<SOME_VAL>')
 
#You can also use sql
delta_table.delete("column == 'some_VALUE'")

Modify Properties

You can refer here for more properties.

dataSkippingNumIndexedCols

You would do this if you have over the max columns that the delta lake can collect statistics on. Default value is 32.

path = 'abfss://<CONTAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<FOLDER>/'
skip_cols = <SOME_VALUE>
 
spark.sql("ALTER TABLE delta.`%s` SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' == '%s')" % (path, skip_cols))

Databricks: Widgets Text

This post explains how to setup a text input parameter for your Databricks Notebooks.

First you need to setup dbutils.

dbutils.widgets.text("<NAME>", "<LABEL>", "<NAME OF CALLING PROCESS>")

Get Argument

variable = dbutils.widgets.getArgument("<NAME>")

Synapse: Notebook Exit

This post is about how to exit and notebook and return a variable to the calling process.

It should be noted that Synapse can return any variable.

var = <SOME_VALUE>
 
from notebookutils import mssparkutils
mssparkutils.notebook.exit(var)