Databricks: REST API

This post shows how to communicate with Databricks using the REST API.

Databricks Resource ID = 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d

Get Bearer Token for Service Principal

  1. curl -X POST https://login.microsoftonline.com/<TENANTID>/oauth2/token -H 'Content-Type: application/x-www-form-urlencoded' -d 'grant_type=client_credentials&client_id=<CLIENTID>&resource=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d&client_secret=<SECRET>'
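If you prefer to call this from Python, here is a minimal sketch of the same token request using the requests library (the placeholders are the same as in the curl command above; the variable names are just illustrative):

  import requests

  # Same Azure AD v1 token endpoint and form fields as the curl call above.
  tenant_id = '<TENANTID>'
  payload = {
      'grant_type': 'client_credentials',
      'client_id': '<CLIENTID>',
      'client_secret': '<SECRET>',
      'resource': '2ff814a6-3304-4ab8-85cb-cd0e6f879c1d',  # Databricks resource ID
  }

  response = requests.post(
      'https://login.microsoftonline.com/{}/oauth2/token'.format(tenant_id),
      headers={'Content-Type': 'application/x-www-form-urlencoded'},
      data=payload,
  )
  response.raise_for_status()
  token = response.json()['access_token']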

Get Bearer Token for Service Principal Using management.core.windows.net

  1. curl -X POST https://login.microsoftonline.com/<TENANTID>/oauth2/token -H 'Content-Type: application/x-www-form-urlencoded' -d 'grant_type=client_credentials&client_id=<CLIENTID>&resource=https://management.core.windows.net/&client_secret=<SECRET>'

Start Cluster

  1. curl --location -g --request POST -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/clusters/start -d '{ "cluster_id": "<CLUSTER_ID>"}'

Stop Cluster

  1. curl --location -g --request POST -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/clusters/stop -d '{ "cluster_id": "<CLUSTER_ID>"}'

List Clusters

  1. curl --location -g --request GET -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/clusters/list
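The same pattern works from Python. A rough sketch that lists clusters with the requests library, reusing the bearer token from above (the field handling assumes the usual Clusters API response shape with a "clusters" array):

  import requests

  token = '<TOKEN>'  # bearer token from the service principal login above
  base_url = 'https://<DATABRICKS_url>'

  # GET /api/2.0/clusters/list returns a JSON document with a 'clusters' array.
  response = requests.get(
      '{}/api/2.0/clusters/list'.format(base_url),
      headers={'Authorization': 'Bearer {}'.format(token)},
  )
  response.raise_for_status()

  for cluster in response.json().get('clusters', []):
      print(cluster['cluster_id'], cluster['cluster_name'], cluster['state'])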

Job List

  1. curl --location -g --request GET -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/jobs/list

Job Python Run

  1. curl --location -g --request POST -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/jobs/run-now -d '{"job_id": <JOB_ID>, "python_params": [] }'

Job Get

  1. curl --location -g --request GET -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/jobs/runs/get?run_id=<JOB_RUN_ID>
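Putting run-now and runs/get together, a minimal Python sketch that triggers a job and polls until the run finishes could look like this (the polling interval and terminal-state check are my assumptions based on the Jobs API run life cycle):

  import time
  import requests

  token = '<TOKEN>'
  base_url = 'https://<DATABRICKS_url>'
  headers = {'Authorization': 'Bearer {}'.format(token)}

  # Trigger the job and capture the run id.
  run = requests.post(
      '{}/api/2.0/jobs/run-now'.format(base_url),
      headers=headers,
      json={'job_id': <JOB_ID>, 'python_params': []},
  ).json()
  run_id = run['run_id']

  # Poll runs/get until the run reaches a terminal life cycle state.
  while True:
      state = requests.get(
          '{}/api/2.0/jobs/runs/get'.format(base_url),
          headers=headers,
          params={'run_id': run_id},
      ).json()['state']
      if state['life_cycle_state'] in ('TERMINATED', 'SKIPPED', 'INTERNAL_ERROR'):
          print(state.get('result_state'), state.get('state_message'))
          break
      time.sleep(30)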

Create Job

Databricks Create Job

  1. curl --location -g --request POST -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/jobs/create -d '<PAYLOAD>'

Create Job Payload

  {
    "name": "<NAME>",
    "max_concurrent_runs": 1,
    "tasks": [
      {
        "task_key": "<TASK_KEY>",
        "run_if": "ALL_SUCCESS",
        "max_retries": 1,
        "timeout_seconds": <TIMEOUT_SECONDS>,
        "notebook_task": {
          "notebook_path": "<PATH>",
          "source": "WORKSPACE",
          "base_parameters": {
            "<KEY>": "<VALUE>",
            "<KEY2>": "<VALUE2>"
          }
        },
        "libraries": [
          {
            "pypi": {
              "package": "<PACKAGE_NAME==VERSION>"
            }
          },
          {
            "jar": "<LOCATION>"
          }
        ],
        "new_cluster": {
          "custom_tags": {
            "<TAG_NAME>": "<TAG_VALUE>"
          },
          "azure_attributes": {
            "first_on_demand": 1,
            "availability": "SPOT_AZURE",
            "spot_bid_max_price": 75
          },
          "instance_pool_id": "<WORKER_INSTANCE_POOL_ID>",
          "driver_instance_pool_id": "<DRIVER_INSTANCE_POOL_ID>",
          "data_security_mode": "SINGLE_USER",
          "spark_version": "<SPARK_VERSION>",
          "node_type_id": "<NODE_TYPE_ID>",
          "runtime_engine": "STANDARD",
          "policy_id": "<POLICY_ID>",
          "autoscale": {
            "min_workers": <MIN_WORKERS>,
            "max_workers": <MAX_WORKERS>
          },
          "spark_conf": {
            "<CONFIG_KEY>": "<CONFIG_VALUE>"
          },
          "cluster_log_conf": {
            "dbfs": {
              "destination": "<LOG_DESTINATION>"
            }
          },
          "spark_env_vars": {
            "<ENV_NAME>": "<ENV_VALUE>"
          },
          "init_scripts": [
            {
              "volumes": {
                "destination": "<INIT_SCRIPT_LOCATION>"
              }
            }
          ]
        }
      }
    ],
    "format": "SINGLE_TASK"
  }
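As a rough Python sketch, the same create call can also be made with the requests library by building the payload as a dictionary; the version below is deliberately trimmed to a few of the fields from the payload above, with illustrative autoscale values:

  import requests

  token = '<TOKEN>'
  base_url = 'https://<DATABRICKS_url>'

  # Trimmed-down version of the payload above, built as a Python dict.
  payload = {
      'name': '<NAME>',
      'max_concurrent_runs': 1,
      'tasks': [
          {
              'task_key': '<TASK_KEY>',
              'notebook_task': {'notebook_path': '<PATH>', 'source': 'WORKSPACE'},
              'new_cluster': {
                  'spark_version': '<SPARK_VERSION>',
                  'node_type_id': '<NODE_TYPE_ID>',
                  'autoscale': {'min_workers': 1, 'max_workers': 2},
              },
          }
      ],
      'format': 'SINGLE_TASK',
  }

  response = requests.post(
      '{}/api/2.0/jobs/create'.format(base_url),
      headers={'Authorization': 'Bearer {}'.format(token)},
      json=payload,
  )
  response.raise_for_status()
  print(response.json()['job_id'])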

Job Permission Patch

  1. curl --location -g --request PATCH -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/permissions/jobs/<JOB_ID> -d '{ "access_control_list": [{ "group_name": "<GROUP_NAME>", "permission_level": "<PERMISSION>"}]}'
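For jobs, the permission levels you can use here are CAN_VIEW, CAN_MANAGE_RUN, IS_OWNER, and CAN_MANAGE.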

Get Service Principal List

  1. curl -X GET -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/preview/scim/v2/ServicePrincipals

Delete Service Principal From Databricks ONLY

  1. curl --location -g --request DELETE -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/preview/scim/v2/ServicePrincipals/<SCIM_ID>

Add Service Principal To Databricks

  1. curl --location --request POST 'https://<DATABRICKS_url>/api/2.0/preview/scim/v2/ServicePrincipals' --header 'Authorization: Bearer <TOKEN>' --header 'Content-Type: application/json' --data-raw '{ "schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"], "applicationId": "<CLIENTID>", "displayName": "<DISPLAYNAME>", "groups": [{"value": "<GROUP_ID>"}], "entitlements": [{ "value": "allow-cluster-create"}] }'
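The <GROUP_ID> here is the Databricks SCIM id of the group, which you can look up with a GET against /api/2.0/preview/scim/v2/Groups.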

List Secret Scopes

  1. curl --location -g --request GET -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/secrets/scopes/list

Create KeyVault Secret Scope

  1. curl --location -g --request POST -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/secrets/scopes/create -d '{"scope": "<Keyvault_name>", "scope_backend_type": "AZURE_KEYVAULT", "backend_azure_keyvault": {"resource_id": "<RESOURCE_ID>", "dns_name": "<KEYVAULT_URL>"}, "initial_manage_principal": "users"}'
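The <RESOURCE_ID> is the full Azure resource ID of the Key Vault, and <KEYVAULT_URL> is its DNS name / vault URI (for example https://<Keyvault_name>.vault.azure.net/).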

IP Access Lists

  1. curl -X GET -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/ip-access-lists

List Git Repos

  1. curl --location -g --request GET -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/repos

Update Git Repo

  1. curl --location -g --request PATCH -H 'Authorization: Bearer <TOKEN>' https://<DATABRICKS_url>/api/2.0/repos/<REPO_ID> -d '{ "branch": "<BRANCH_NAME>" }'

Azure: Install/Configure CLI

This post will show you how to install the Azure CLI.

First you need to install the CLI.

Once it is installed, you can set your config directory. This is useful for having multiple logins active at the same time.

  1. set AZURE_CONFIG_DIR=<YOUR_DIRECTORY>

You can then log in. There are a few different ways to do that:

Way 1: This will pop up a browser login window where you enter your credentials

  1. az login

Way 2: This will prompt you for your password on the command line

  1. az login -u <YOUR_LOGIN>

Way 3: This passes your username and password directly on the command line

  1. az login -u <YOUR_LOGIN> -p <YOUR_PASSWORD>

Way 4: This logs in as a service principal

  1. az login --service-principal --username <SPN_ID> --password <SPN_KEY> --tenant <TENANTID>

Show your Account

  1. az account show

Set Account Subscription

  1. az account set -s <SUBSCRIPTION_ID>

List Tags For A Subscription

  1. az tag list --subscription <SUBSCRIPTION_NAME>

Install the Resource Graph Extension

  1. az extension add --name resource-graph

Query for Anything that Has a Tag

  1. az graph query -q "Resources | project resourceGroup, type, tags | where tags.<TAG_NAME>=~'<VALUE>'"

Query for More than One Tag

  1. az graph query -q "Resources | project resourceGroup, type, tags | where tags.<TAG_NAME>=~'<VALUE>' and tags.<TAG_NAME2>=='<VALUE2>'"

Query by Type

  1. az graph query -q "Resources | project resourceGroup, type, tags | where type =~ 'microsoft.sql/servers/databases'"

Databricks: Set Spark Configs

This post shows how to set Spark configs in Databricks or Synapse notebooks.

First you will need a spark session. Refer to PySpark: Create a Spark Session for more details.

  secret = 'value' # I highly suggest you get the secret from the Key Vault instead of hard-coding it
  storage_account = ''
  application_id = ''
  tenant_id = ''

  spark.conf.set('fs.azure.account.auth.type.{}.dfs.core.windows.net'.format(storage_account), 'OAuth')
  spark.conf.set('fs.azure.account.oauth.provider.type.{}.dfs.core.windows.net'.format(storage_account), 'org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider')
  spark.conf.set('fs.azure.account.oauth2.client.id.{}.dfs.core.windows.net'.format(storage_account), application_id)
  spark.conf.set('fs.azure.account.oauth2.client.secret.{}.dfs.core.windows.net'.format(storage_account), secret)
  spark.conf.set('fs.azure.account.oauth2.client.endpoint.{}.dfs.core.windows.net'.format(storage_account), 'https://login.microsoftonline.com/{}/oauth2/token'.format(tenant_id))
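Once these configs are set, you can read from the storage account in the same session. A minimal sketch, where the container and path placeholders are mine:

  # <CONTAINER> and <PATH> are placeholders for your storage layout.
  df = spark.read.parquet('abfss://<CONTAINER>@{}.dfs.core.windows.net/<PATH>'.format(storage_account))
  df.show()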

If you are running in Databricks, you could instead add them to the cluster's Spark config so they are applied at cluster start, although I recommend doing it in a notebook.

  spark.hadoop.fs.azure.account.auth.type.<STORAGE_ACCOUNT>.dfs.core.windows.net OAuth
  spark.hadoop.fs.azure.account.oauth.provider.type.<STORAGE_ACCOUNT>.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
  spark.hadoop.fs.azure.account.oauth2.client.id.<STORAGE_ACCOUNT>.dfs.core.windows.net <CLIENT_ID>
  spark.hadoop.fs.azure.account.oauth2.client.secret.<STORAGE_ACCOUNT>.dfs.core.windows.net <SECRET>
  spark.hadoop.fs.azure.account.oauth2.client.endpoint.<STORAGE_ACCOUNT>.dfs.core.windows.net https://login.microsoftonline.com/<TENANT_ID>/oauth2/token

Azure: EventHub

In this tutorial I will show you how to connect to Event Hub from Python. Ensure you have first installed an IDE (such as Eclipse) and Python 3.7.

Python Package Installation

  1. pip3 install azure-eventhub

Create a Producer

This will publish events to Event Hub. The important part here is the “Endpoint”. You need to log in to the Azure Portal and get the connection string from the “Shared Access Policies” of the Event Hub namespace.

  from azure.eventhub import EventHubProducerClient, EventData, EventHubConsumerClient

  connection_str = 'Endpoint=sb://testeventhubnamespace.servicebus.windows.net/;SharedAccessKeyName=<<THE_ACCESS_KEY_NAME>>;SharedAccessKey=<<THE_ACCESS_KEY>>'
  eventhub_name = '<<THE_EVENT_HUB_NAME>>'
  producer = EventHubProducerClient.from_connection_string(connection_str, eventhub_name=eventhub_name)

  event_data_batch = producer.create_batch()

  event_data_batch.add(EventData('My Test Data'))

  with producer:
      producer.send_batch(event_data_batch)
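A batch can hold more than one event. As a rough sketch, you can fill the batch in a loop and send it whenever it gets full (add raises ValueError once the batch is at capacity); the message list here is just illustrative:

  # Re-create the producer, since the one above was closed by its with-block.
  producer = EventHubProducerClient.from_connection_string(connection_str, eventhub_name=eventhub_name)
  messages = ['event {}'.format(i) for i in range(100)]  # illustrative payloads

  with producer:
      batch = producer.create_batch()
      for msg in messages:
          try:
              batch.add(EventData(msg))
          except ValueError:
              # Batch is full: send it and start a new one.
              producer.send_batch(batch)
              batch = producer.create_batch()
              batch.add(EventData(msg))
      if len(batch) > 0:
          producer.send_batch(batch)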

Create a Consumer

This will monitor the Event Hub for new messages.

  from azure.eventhub import EventHubProducerClient, EventData, EventHubConsumerClient

  connection_str = 'Endpoint=sb://testeventhubnamespace.servicebus.windows.net/;SharedAccessKeyName=<<THE_ACCESS_KEY_NAME>>;SharedAccessKey=<<THE_ACCESS_KEY>>'
  eventhub_name = '<<THE_EVENT_HUB_NAME>>'
  consumer_group = '<<THE_EVENT_HUB_CONSUMER_GROUP>>'
  client = EventHubConsumerClient.from_connection_string(connection_str, consumer_group, eventhub_name=eventhub_name)

  def on_event(partition_context, event):
      print("Received event from partition {} - {}".format(partition_context.partition_id, event))
      partition_context.update_checkpoint(event)

  with client:
      #client.receive(
      #    on_event=on_event,
      #    starting_position="-1",  # "-1" is from the beginning of the partition.
      #)
      client.receive(
          on_event=on_event
      )