Databricks: Check for Orphaned Workspace Directories

In this post I will show you how to check if your Databricks workspace has directories that are orphaned to users/service principals.

import requests
from requests.auth import HTTPBasicAuth
import json

DATABRICKS_INSTANCE = dbutils.widgets.get('url')
TOKEN = dbutils.widgets.get('token')

#Get list of directories
endpoint = f'{DATABRICKS_INSTANCE}/api/2.0/workspace/list'
params = {
    'path': '/Users/'
}
response = requests.get(endpoint, params=params, auth=HTTPBasicAuth('token', TOKEN))
response.raise_for_status()
json_response = response.json()

for i, result in json_response.items():
    for value in result:
        directory = value["path"]
        user = directory.replace("/Users/", "")

        if '@' in user:
            endpoint = f'{DATABRICKS_INSTANCE}/api/2.0/preview/scim/v2/Users'
            params = {
                'filter': 'userName eq "%s"' % (user)
            }
            response = requests.get(endpoint, params=params, auth=HTTPBasicAuth('token', TOKEN))
            account = response.json().get('Resources', [])
            if not account:
                print('Account %s doesn't exist' % (account))
        else:
            endpoint = f'{DATABRICKS_INSTANCE}/api/2.0/preview/scim/v2/ServicePrincipals'
            params = {
                'filter': 'applicationId eq "%s"' % (user)
            }
            response = requests.get(endpoint, params=params, auth=HTTPBasicAuth('token', TOKEN))
            sp = response.json().get('Resources', [])
            if not sp:
                print('SP %s doesn't exist' % (sp))

SourceTree Installation on Windows

This post is how to install/Configure SourceTree.

Step 1

Install SourceTree.

Step 2

Install Beyond Compare.

Step 3

Install Git

Step 4

Create or Import your SSH key by going to SourceTree and clicking Tools then click “Create or import SSH Keys”. Follow the instructions then click “Save public key” and “Save private key” next to “Save the generated key”. Save public key to C:\Users\<USER>\.ssh\id_rsa.pub. Save private key to C:\Users\<USER>\.ssh\id_rsa.

Step 5

Update to latest SourceTree build by opening SourceTree click Tools –> Options then click “Updates” tab and update if required.

Step 6

Configure any options you want by going to Tools –> Options.

General Tab

  • Choose your theme
  • Setup SSH Client Configuration
  • Setup Default user information
  • Set your remote check for updates
  • Reopen repository tabs at startup
  • Spell check commit messages

Diff Tab

  • Convert tabs to spaces with width 4
  • External Diff tool Beyond Compare

Git Tab

  • Set Global Ignore List
  • Prune tracking branches no longer present on remote(s)
  • Push all tags to remotes
  • Fetch all tags

Authentication Tab

Setup any auth you have.

Step 7

Setup C:\Users\<USER>\.gitconfig like the below.

[core]
	autocrlf = true
[user]
	name = GaudreaultPC\\Gaudreault
	email = oliver@gaudreault.ca
[difftool "sourcetree"]
	cmd = 'C:/Program Files/Beyond Compare 5/BCompare.exe' \"$LOCAL\" \"$REMOTE\"
[mergetool "sourcetree"]
	cmd = "'' "
	trustExitCode = true
[safe]
	bareRepository = explicit

Step 8

If you are trying to transfer your SourceTree from one PC to another. The below are the steps you will need to perform.

  1. Copy your data from one PC to the other in the same exact folder structure. Include .git and .gitignore files and folders.
  2. On original PC go to %USERPROFILE%\AppData\Local\Atlassian\SourceTree and copy the following files to the new PC.
    1. bookmarks.xml
    2. opentabs.xml
  3. On the new PC close SourceTree and copy the two files from the previous step and copy to the same location as was in original PC.
  4. Ensure the new laptop has the same .gitconfig from old PC.

Step 9

Sometimes you might have issue with the repository settings for each repo. If you do just go in and fix it. Remove the origin and add back. Ensure you select “Default Remote”.

 

Installing Jenkins Using Docker on Windows

This post is how to install Jenkins using Docker in Windows.

Install Docker

Go to Docker and download “Docker Desktop for Windows – x86_64”. Then run as administrator and install. If you don’t have an account you can create one.

Then open Docker Desktop and go to settings and enable the following “Enable Docker terminal”.

Once you are done you need to start the Docker Engine.

 

 

 

If you get the following error “Failed to initialize: unable to resolve docker endpoint: open C:\Users\<USER>\.docker\machine\machines\default\ca.pem: The system cannot find the path specified.” Then go to environment variables and modify DOCKER_CERT_PATH and set to “C:\Users\<USER>\.docker\machine\certs”.

Then open the docker terminal.

Install Jenkins

Create a network called “jenkins”. This will give you back the id.

docker network create jenkins

You can check that it created successfully by running the following command

docker network ls

Next we run a docker:dind Docker image. If you don’t have the docker image it will download it.

A dind is a Docker In Docker allows developers to run a Docker container within an already running Docker container to support CI/CD pipelines and create sandboxed container environments.

docker run --name jenkins-docker --rm --detach --privileged --network jenkins --network-alias docker --env DOCKER_TLS_CERTDIR=/certs --volume jenkins-docker-certs:/certs/client --volume jenkins-data:/var/jenkins_home --publish 2376:2376 docker:dind

Next we create a Dockerfile

FROM jenkins/jenkins:2.462.2-jdk17
USER root
RUN apt-get update && apt-get install -y lsb-release
RUN curl -fsSLo /usr/share/keyrings/docker-archive-keyring.asc \
  https://download.docker.com/linux/debian/gpg
RUN echo "deb [arch=$(dpkg --print-architecture) \
  signed-by=/usr/share/keyrings/docker-archive-keyring.asc] \
  https://download.docker.com/linux/debian \
  $(lsb_release -cs) stable" > /etc/apt/sources.list.d/docker.list
RUN apt-get update && apt-get install -y docker-ce-cli
USER jenkins
RUN jenkins-plugin-cli --plugins "blueocean docker-workflow"

Next we will build a new docker image with the Dockerfile from the previous step. Be sure to name it appropriately and include versions. If you don’t have the right image it will get it for you. Make sure you are in the directory where you saved your Dockerfile.

docker build -t jenkins-blueocean:2.462.2-1 .

Next we run the image as a container in Docker

docker run --name jenkins-blueocean --restart=on-failure --detach --network jenkins --env DOCKER_HOST=tcp://docker:2376 --env DOCKER_CERT_PATH=/certs/client --env DOCKER_TLS_VERIFY=1 --volume jenkins-data:/var/jenkins_home --volume jenkins-docker-certs:/certs/client:ro --publish 8080:8080 --publish 50000:50000 jenkins-blueocean:2.462.2-1

Docker Container Logs using the below command. This will show you the password.

docker logs jenkins-blueocean

Record the password located “/var/jenkins_home/secrets/initialAdminPassword”.

docker exec jenkins-blueocean cat /var/jenkins_home/secrets/initialAdminPassword

or

docker exec jenkins-blueocean bash
cat /var/jenkins_home/secrets/initialAdminPassword
exit

Then navigate to http://localhost:8080 and enter the password in the “Unlock Jenkins” screen.

 

 

 

 

 

Then select the packages you want to install and click install.

Once the packages install then you need to create the admin user.

Then select your url and click “Save and Finish”.

 

 

 

 

 

Oracle JDBC

This post is how to connect to oracle from JDBC.

There are four main connection examples:

TCP to Single Host
jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=<HOST>)(PORT=<PORT>))(CONNECT_DATA=(SERVICE_NAME=<SERVICE_NAME>)))
TCP to Multiple Hosts
jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=<HOST1>)(PORT=<PORT1>))(ADDRESS=(PROTOCOL=TCP)(HOST=<HOST2>)(PORT=<PORT2>))(CONNECT_DATA=(SERVICE_NAME=<SERVICE_NAME>)))
SSL to Single Host
jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCPS)(HOST=<HOST>)(PORT=<PORT>))(CONNECT_DATA=(SERVICE_NAME=<SERVICE_NAME>)))
SSL to Multiple Hosts
jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCPS)(HOST=<HOST1>)(PORT=<PORT1>))(ADDRESS=(PROTOCOL=TCPS)(HOST=<HOST2>)(PORT=<PORT2>))(CONNECT_DATA=(SERVICE_NAME=<SERVICE_NAME>)))
Connect Using PySpark With Databricks

You will need to get the secret from a keyvault. Refer to Databricks: Get Secret

secret = <Get_Secret>

df = spark.read \
    .format("jdbc") \
    .option("url", "<URL>") \
    .option("dbtable", "<SCHEMA>.<TABLE>") \
    .option("user", "<ACCOUNT>") \
    .option("password", secret) \
    .option("driver", "oracle.jdbc.driver.OracleDriver") \
    .laod()
Connect Using SQL With Databricks

It should be noted that this isn’t a really good way and I don’t believe is completely secure.

Add cluster configuration

spark.databricks.secureVariableSubstitute.enabled false
spark.<SOME_VARIABLE_NAME> {{secrets/<SCOPE>/<SECRET>}}

Code

%sql

CREATE SCHEMA IF NOT EXISTS <SCHEMA_NAME>;

DROP VIEW IF EXISTS <VIEW_NAME>;

CREATE TEMPORARY VIEW <SCHEMA_NAME>.<VIEW_NAME>
USING org.apache.spark.sql.jdbc
OPTIONS (
   url "<URL>",
   dbtable "<SCHEMA_NAME>.<TABLE_NAME>",
   user "<ACCOUNT>",
   password "${spark.<SOME_VARIABLE_NAME>}",
   driver "oracle.jdbc.driver.OracleDriver"
)

 

Azure: Tags

This post is how to work with tags for Azure resources.

Install Graph Extension

az extension add --name resource-graph

Tag List

az tag list --subscription <NAME>

Query for Specific Tag Value

az graph query -q "project name, resourceGroup, type, tags | where tags.<TAGNAME>=~'<VALUE>'"

Query for Multiple Tags

az graph query -q "project name, resourceGroup, type, tags | where tags.<TAGNAME>=~'<VALUE>' | where tags.<TAGNAME>=~'<VALUE>'"

Query for Resource Groups

az graph query -q "ResourceContainers | project name, type, tags | where tags.<TAGNAME>=~'<VALUE>'"

Query For Multiple Resource Types

az graph query -q "project name, resourceGroup, type, tags | where tags.<TAGNAME>=~'<VALUE>' | where type =~ 'microsoft.sql/servers/databases' or type =~ 'microsoft.storage/storageaccounts'"

 

Databricks: Python SDK

This post is how to use the Databricks Python SDK.

Install the Package

pip install databricks-sdk

Update Package

pip install databricks-sdk --upgrade

Check Package Version

pip show databricks-sdk | grep -oP '(?<=Version: )\S+'

Setup WorkspaceClient

from databricks.sdk import WorkspaceClient

secret = dbutils.secrets.get(scope = "<SCOPE>", key = "<KEY>")

w = WorkspaceClient(
  host = 'https://<URL>/'
  azure_workspace_resource_id = '<RESOURCE_ID_OF_DATABRICKS>',
  azure_tenant_id = '<TENANT_ID>',
  azure_client_id = '<CLIENT_ID>',
  azure_client_secret = secret
)

Setup AccountClient

You can get the account_id from the databricks account portal. By your id in the top right hand corner.

from databricks.sdk import AccountClient

secret = dbutils.secrets.get(scope = "<SCOPE>", key = "<KEY>")

a = AccountClient(
  host = 'https://accounts.azuredatabricks.net'
  account_id = '<ACCOUNT_ID>'
  azure_tenant_id = '<TENANT_ID>',
  azure_client_id = '<CLIENT_ID>',
  azure_client_secret = secret
)

List Workspace Groups

NOTE: You must also setup the workspaceclient to do this.

w.groups.list()

List Account Groups

NOTE: You must also setup the accountclient to do this. You must also be account admin.

a.groups.list()

Create Storage Credential

NOTE: Your SPN must be account admin to do this. You must also setup the workspaceclient to do this.

from databricks.sdk.service.catalog import AzureManagedIdentity

storage_credential_name = '<CREDENTIAL_NAME>'
comment = '<COMMENT>'
connector_id = '<DATABRICKS_ACCESS_CONNECTOR>'
az_mi = AzureManagedIdentity(access_connector_id = connector_id)

w.storage_credenditals.create(
  name = storage_credential_name,
  azure_managed_identity = az_mi
  comment = comment
)

 

PySpark DataFrame Methods

This post shows different methods of a DataFrame.

Get the first value in a column

df = some_dataframe_definition

value = df.select("SOME_COLUMN_NAME").first()[0]

Convert Dataframe to JSON

df = some_dataframe_definition

result_json = df.toJSON()

Get a Row

df = some_dataframe_definition

row = df.collect()[0]      #You can switch out 0 for whatever row you want.

Count rows of Dataframe

df = some_dataframe_definition

num_rows = df.count()

 

 

Databricks Unity Catalog SQL Commands

This post is basic commands you will need to know for working with Unity Catalog.

Display Current Metastore
SELECT CURRENT_METASTORE();
Display Current Catalog
SELECT CURRENT_CATALOG();
Create Catalog
CREATE CATALOG IF NOT EXISTS  <Catalog_Name> COMMENT 'A COMMENT';
Create Catalog With Location
CREATE CATALOG IF NOT EXISTS <Catalog_Name> MANAGED LOCATION 'abfss://<METASTORE_CONTAINER_NAME>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<CATALOG_NAME>' COMMENT 'A COMMENT';
Describe Catalog
DESCRIBE CATALOG <Catalog_Name>;
Create Schema
CREATE SCHEMA IF NOT EXISTS <SCHEMA_NAME> COMMENT '<COMMENT>';
Create Schema With Location
CREATE SCHEMA IF NOT EXISTS <SCHEMA_NAME> MANAGED LOCATION 'abfss://<METASTORE_CONTAINER_NAME>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<CATALOG_NAME>/<SCHEMA_NAME>' COMMENT '<COMMENT>';
Show All Storage Credentials
SHOW STORAGE CREDENTIALS;
Describe Credential
DESCRIBE STORAGE CREDENTIAL <CREDENTIAL_NAME>;
Create External Location

You will first need a storage credential.

You can reference down to the full table path or keep it at the container

CREATE EXTERNAL LOCATION IF NOT EXISTS <NAME>
URL 'abfss://<COTNAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/'
WITH (STORAGE CREDENTIAL <CREDENTIAL_NAME>)
COMMENT '<COMMENT>';
Create External Table
CREATE TABLE <CATALOG_NAME>.<SCHEMA_NAME>.<TABLE_NAME>
USING <FORMAT>
LOCATION 'abfss://<COTNAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/FOLDER/PATH;'
Grant Create Storage Credential on Metastore
GRANT CREATE STORAGE CREDENTIAL ON METASTORE TO `<USER>`;
Grant Permission to Create External Locations on Storage Credential
GRANT CREATE EXTERNAL LOCATION ON STORAGE CREDENTIAL <CREDENTIAL_NAME> TO `<USER>`;
Grant Permission to Create External Location On Metastored
GRANT CREATE EXTERNAL LOCATION ON METASTORE TO `<USER>`;
Grant Permission to Use Catalog
GRANT USE_CATALOG ON CATALOG <CATALOG_NAME> TO `<USER>`;
Show all Grants On Metastore
SHOW GRANTS `<USER>` ON METASTORE;
Grant Permission to Use Schema
GRANT USE_SCHEMA ON SCHEMA <CATALOG_NAME>.<SCHEMA_NAME> TO `<USER>`;
Grant Permission to Create Table
GRANT CREATE TABLE ON SCHEMA <CATALOG_NAME>.<SCHEMA_NAME> TO <USER>;

 

Databricks Unity Catalog Rest API’s

This post is how to work with Databricks Unity Catalog Rest API’s.

Set Catalog Isolation Mode to ISOLATED

curl --location --request PATCH 'https://<DATABRICK_URL>/api/2.1/unity-catalog/catalogs/<CATALOG_NAME>' \
--header 'Authorization: Bearer <TOKEN>' \
--header 'Content-Type: application/json' \
--data-raw '{
"isolation_mode": "ISOLATED"
}'

Bind Workspace to Catalog

curl --location --request PATCH 'https://<DATABRICK_URL>/api/2.1/unity-catalog/bindings/catalog/<CATALOG_NAME>' \
--header 'Authorization: Bearer <TOKEN>' \
--header 'Content-Type: application/json' \
--data-raw '{
"add": [{ "workspace_id": "<WORKSPACEE_ID>", "binding_type": "BINDING_TYPE_READ_WRITE" }]
"remove": []
}'

Unbind Workspace to Catalog

curl --location --request PATCH 'https://<DATABRICK_URL>/api/2.1/unity-catalog/bindings/catalog/<CATALOG_NAME>' \
--header 'Authorization: Bearer <TOKEN>' \
--header 'Content-Type: application/json' \
--data-raw '{
"unassign_workspaces": ["<WORKSPACE_ID>"]
}'

List Workspaces Assigned to Catalog

curl --location --request GET 'https://<DATABRICK_URL>/api/2.1/unity-catalog/bindings/catalog/<CATALOG_NAME>' \
--header 'Authorization: Bearer <TOKEN>' \
--header 'Content-Type: application/json'

 

Synapse: Get KeyVault Properties Using Token Library

This post is how to get the key vault properties using the token library.

Ensure you have a spark session created. Refer to PySpark: Create a Spark Session

import sys
from pyspark.sql import SparkSession

linked_service_name = '<KEYVAULT_LINKED_SERVICE_NAME>'
spark = <GET_SPARK_SESSION>
token_library = spark._jvm.com.microsoft.azure.synapse.tokenlibrary.TokenLibrary
key_vault_url = token_library.getFullConnectionStringAsMap(linked_service_name).get('url')

print(key_vault_url)
print(token_library.getFullConnectionStringAsMap(linked_service_name))

 

Synapse: SAS Token

This post is how to get the SAS token from a notebook.

Ensure you have a spark session created. Refer to PySpark: Create a Spark Session

from notebookutils import mssparkutils

linked_service_storage_account_name = '<LINKED_SERVICE_STORAGE_NAME>'
blob_sas_token = mssparkutils.credentials.getConnectionStringOrCreds(linked_service_storage_account_name)

spark.conf.set('fs.azure.sas.<CONTAINER_NAME>.<ADLS_STORAGE_ACCOUNT_NAME>.blob.core.windows.net', blob_sas_token

 

Synapse: Environment Variables

This post is how to work with environment variables in Synapse.

Ensure you have a spark session created. Refer to PySpark: Create a Spark Session

Get Environment Variable

It should be noted that “str” is the type that variable is. You can change it to whatever is required.

var: str = spark.conf.get('spark.executorEnv.<ENV_NAME>')
Set Environment Variable
spark.conf.set('spark.executorEnv.<ENV_NAME>', '<VALUE>')

 

Synapse: List Python Packages

This post is how to list the python packages in various ways.

You can use %pip to list the python packages that are installed.

%pip freeze

However doing it that way may not give you the exact versions that are installed. To get a comprehensive list do the following.

import pkg_resources

for package in pkg_resources.working_set:
    print(package)

 

Synapse: Help Command

This post is just how to use the help command from mssparkutils.

You can use help at various levels of Synapse.

Root

The following command will tell you what areas help can assist you in. This will respond with

  • fs
  • notebook
  • credentials
  • env
from notebookutils import mssparkutils

mssparkutils.help()
FileSystem

If you leave the help command empty it will just return all options that are available for help. If you put a command in then it will explain that command in greater detail.

from notebookutils import mssparkutils

mssparkutils.fs.help()

mssparkutils.fs.help('cp')

 

Synapse: Mounts

This post is how to work with mounts on Synapse.

I suggest mounting to an ADLS storage account. That is what I will assume in the below examples.

List Mounts
from notebookutils import mssparkutils

mssparkutils.fs.mounts()
Get Mount Path

The output of this command will produce ‘/synfs/<number>/mnt/<CONTAINER_NAME>’

from notebookutils import mssparkutils

mount_name = "/mnt/<CONTAINER_NAME>"
mount_path = mssparkutils.fs.getMountPath(mount_name)
Unmount
from notebookutils import mssparkutils

mount_name = "/mnt/<CONTAINER_NAME>"
mssparkutils.fs.unmount(mount_name)
Mount Using a Linked Service

First you must have a linked service created to the storage account. This linked service must be hard-coded and not parameterized in any way.

from notebookutils import mssparkutils

container = '<CONTAINER_NAME>'
storage_account = '<STORAGE_ACCOUNT_NAME>'
sub_folder = '<SUB_FOLDER>' #it should be noted that this isn't required.
linked_service_name = '<LINKED_SERVICE_NAME>'

mssparkutils.fs.mount(
    source='abfss://%s@%s.dfs.core.windows.net/%s/' % (container, storage_account, sub_folder),
    mountPoint='/mnt/%s' % (container),
    {'linkedService':linked_service_name, 'fileCacheTimeout': 120, 'timeout': 120}
)
Mount Using Configs

You will need to get the secret. Refer to Synapse: Get Secret

from notebookutils import mssparkutils

client_id = '<CLIENT_ID>'
tenant_id = '<TENANT_ID>'
container = '<CONTAINER_NAME>'
storage_account = '<STORAGE_ACCOUNT_NAME>'
sub_folder = '<SUB_FOLDER>' #it should be noted that this isn't required.

configs = {
  "fs.azure.account.auth.type": "OAuth",
  "fs.azure.account.oauth.provider.type": "org.apache.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id": client_id,
  "fs.azure.account.oauth2.client.secret": secret,
  "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/" tenant_id + "/oauth2/token"
}

mssparkutils.fs.mount(
  source='abfss://%s@%s.dfs.core.windows.net/%s' % (container, storage_account, sub_folder),
  mountPoint='/mnt/%s' % (container),
  extraConfigs=configs
)

 

PySpark: Delta Lake

This post is how to use pyspark to work with Delta Tables.

For more information on Delta Lake you can refer here.

First you need to install the “delta-spark” package for whatever version you require.

pip install delta-spark==3.1.0

Setup a Spark Session.

To read delta tables you can refer to PySpark: Read From ADLS to DataFrame.

To write delta tables you can refer to PySpark: Save a DataFrame To ADLS.

Vacuum Delta Table

from delta.tables import DeltaTable

vacuum_hrs = 100
path = 'abfss://<CONTAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<FOLDER>/'

delta_table = DeltaTable.forPath(spark, path)
delta_table.vacuum(vacuum_hrs)

Compaction

Impoves reads by merging small files into larger ones.

from delta.tables import DeltaTable

path = 'abfss://<CONTAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<FOLDER>/' 

delta_table = DeltaTable.forPath(spark, path)
delta_table.optimize().executeCompaction()

Z-Order

from delta.tables import DeltaTable

path = 'abfss://<CONTAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<FOLDER>/' 
columns = ''

delta_table = DeltaTable.forPath(spark, path)
delta_table.optimize().executeZOrderBy(columns)

Delete

from delta.tables import DeltaTable
import pyspark.sql.functions as F

path = 'abfss://<CONTAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<FOLDER>/' 

delta_table = DeltaTable.forPath(spark, path)
delta_table.delete(F.col('<MY_COL>') == '<SOME_VAL>')

#You can also use sql
delta_table.delete("column == 'some_VALUE'")

Modify Properties

You can refer here for more properties.

dataSkippingNumIndexedCols

You would do this if you have over the max columns that the delta lake can collect statistics on. Default value is 32.

path = 'abfss://<CONTAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<FOLDER>/'
skip_cols = <SOME_VALUE>

spark.sql("ALTER TABLE delta.`%s` SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' == '%s')" % (path, skip_cols))