Databricks: Notebook SQL

This post covers how to work with Databricks SQL from a notebook.

Create a Temp View of a DataFrame

df = <SOMETHING>
df.createOrReplaceTempView("<TABLE_NAME>")
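
Once the view exists you can query it from a SQL cell, e.g. (using the same placeholder name):

%sql
SELECT * FROM <TABLE_NAME> LIMIT 10;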

Drop a Table

%sql
drop table <SCHEMA>.<TABLE>;

Describe Table

%sql
desc table extended <SCHEMA>.<TABLE>;

Describe Detail

%sql
describe detail <SCHEMA>.<TABLE>;

Show Table Properties

%sql
SHOW TBLPROPERTIES <SCHEMA>.<TABLE>;

Describe History

%sql
describe history <SCHEMA>.<TABLE>;

Create Schema

%sql
CREATE SCHEMA IF NOT EXISTS <SCHEMA>;

Create Parquet Table

%sql
CREATE TABLE <SCHEMA>.<TABLE> USING PARQUET LOCATION 'abfss://<CONTAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<FOLDER>/';

Create Delta Table

%sql
CREATE TABLE <SCHEMA>.<TABLE> USING DELTA LOCATION 'abfss://<CONTAINER>@<STORAGE_ACCOUNT>.dfs.core.windows.net/<FOLDER>/';

Upsert

%sql
MERGE INTO <SCHEMA>.<TABLE> AS t
USING (
  SELECT column, column2
  FROM <SOURCE_TABLE>
) AS source
ON source.column = t.column
WHEN NOT MATCHED THEN
  INSERT (
    column, column2
  )
  VALUES (
    source.column, source.column2
  )
WHEN MATCHED THEN
  UPDATE SET
    t.column = source.column;

 

Postgres: Misc

Here are some miscellaneous things you can do in Postgres 9.4.

Print to Console:

 raise notice 'VALUE: %', value;
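
raise notice only works inside PL/pgSQL, so to try it on its own you can wrap it in a DO block, for example:

 DO $do$
BEGIN
    raise notice 'VALUE: %', 42;
END $do$ LANGUAGE plpgsql;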

Json Build Object:

 json_build_object('name', value, 'name2', value2)
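
For example, with made-up values; the arguments alternate key, value:

 select json_build_object('id', 1, 'name', 'widget');
-- returns something like {"id" : 1, "name" : "widget"}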

Json Array:

 json_agg(data)

Extract Month:
You can also extract year, day, etc.

 extract(month from date_column)
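
For example, with a literal date:

 select extract(month from date '2017-08-17');
-- 8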

Json Querying:
You can extract a field from a json column. For example:

 rec->>'id'
rec#>'{field, sub_field,value}'
rec->'object'
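
A quick example against a json literal (made-up data) showing what each operator returns:

 select j->>'id'                       as id_text,     -- '7' as text
       j->'object'                    as object_json, -- the sub-object as json
       j#>'{object,field,sub_field}'  as nested_json  -- "x" as json, following a path of keys
from (select '{"id": 7, "object": {"field": {"sub_field": "x"}}}'::json as j) t;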

Row To Json:

 SELECT row_to_json(t)
FROM mytable t
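
For example, with an ad-hoc row:

 select row_to_json(t)
from (select 1 as id, 'example'::text as name) t;
-- returns something like {"id":1,"name":"example"}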

Update With FROM:

 UPDATE mytable l
SET old_val_id=sub.new_val_id
FROM (
     SELECT l2.id, s.id as new_val_id
     FROM mytable l2
          INNER JOIN mysupertable s ON s.id=l2.old_val_id
) sub
WHERE sub.id=l.id;

Inline Script:

 DO $do$
DECLARE id integer;
BEGIN
    -- statements go here, e.g.
    id := 1;
END $do$ LANGUAGE plpgsql;

If:

 IF CONDITION THEN
END IF;
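
A complete example you can run in a DO block (the condition is arbitrary):

 DO $do$
BEGIN
    -- dow: 0 = Sunday ... 6 = Saturday
    IF extract(dow from now()) = 5 THEN
        raise notice 'It is Friday';
    END IF;
END $do$ LANGUAGE plpgsql;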

Dynamic Json Key:
Let's say you have a json column, and inside it one field tells you which key to select for certain data (for example, a key name chosen by the user when entering the data). To select that key dynamically you can do the following.

jsonField->((to_json((jsonField->'key'->'sub_key'))#>>'{}')::text)
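
For example, with made-up data where the json stores which key holds the value we want:

 select j->((to_json((j->'key'->'sub_key'))#>>'{}')::text)
from (select '{"key": {"sub_key": "price_field"}, "price_field": 9.99}'::json as j) t;
-- should return 9.99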

Upsert:
Sometimes we want to perform an update if the record exists, or an insert if it doesn't. Most of the time we write a function that handles this record by record, but what if we want to do a batch update or insert? We perform an upsert. See below: the CTE runs the update and returns the affected rows, and in the remainder of the query we check whether the update returned anything; if it didn't, we perform the insert.

WITH upsert AS (UPDATE MyTable mt
            SET column_name = sub.new_value
            FROM (SELECT mot.id, mot.new_value 
                 FROM MyOtherTable mot
                 ) sub
            WHERE sub.id=mt.related_id 
RETURNING *) 
INSERT INTO MyTable (id, column_name, related_id) 
SELECT ?, ?, ? 
WHERE NOT EXISTS (SELECT * FROM upsert)
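
Side note: on Postgres 9.5 and later the same thing can be done natively with INSERT ... ON CONFLICT; a minimal sketch, assuming id has a unique constraint:

INSERT INTO MyTable (id, column_name, related_id)
VALUES (?, ?, ?)
ON CONFLICT (id) DO UPDATE
SET column_name = EXCLUDED.column_name;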

Regex Substring:
There are different ways of using regex. This is one way.

substring(column_name from '([0-9]{4}-[0-9]{2}-[0-9]{2})')
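
For example, pulling a date out of a file name:

select substring('backup_2017-08-17.tar.gz' from '([0-9]{4}-[0-9]{2}-[0-9]{2})');
-- 2017-08-17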

PGSQL Loop:
This is one way to loop using pgsql.

DO $do$
DECLARE rec RECORD;
BEGIN
	FOR rec IN SELECT * FROM MyTable
	LOOP
		-- We can then use rec like rec.column_name
	END LOOP;
END $do$ LANGUAGE plpgsql;

Milliseconds to timestamp:
to_timestamp expects the Unix epoch in seconds. This will return 2017-08-17 21:26:05+00 in UTC.

select to_timestamp(1503005165);
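
If the value is actually in milliseconds, divide by 1000.0 first (to_timestamp also accepts fractional seconds):

select to_timestamp(1503005165000 / 1000.0);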