databricks

Databricks Connect

Databricks Connect allows you to connect your favorite IDE (PyCharm, VSCode, etc.) and other custom applications to Databricks compute and run Spark (or non-Spark) code.

This post is not a comprehensive guide on Databricks Connect; rather, it consists of side notes from the Azure Databricks docs. Most of the notes also apply to Databricks on AWS and GCP.

Running asyncio task in Databricks

The standard way to run an asyncio task is simply asyncio.run(main()). In Databricks, however, it is not that simple: the same command fails with the following error:

import asyncio
async def main():
    await asyncio.sleep(1)
asyncio.run(main())

RuntimeError: asyncio.run() cannot be called from a running event loop

Indeed, in Databricks we are already inside a running event loop:

import asyncio
asyncio.get_running_loop()

<_UnixSelectorEventLoop running=True closed=False debug=False>
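Because a loop is already running, one option that usually works in notebook-style environments is to await the coroutine directly at the top level (await main()). When a plain synchronous call is needed, a possible workaround is to run the coroutine on a separate thread that gets its own event loop. The sketch below uses only the standard library; the helper name run_coroutine_sync is made up for this post and is not part of asyncio or Databricks.

```python
import asyncio
import threading


async def main():
    await asyncio.sleep(0.1)
    return "done"


def run_coroutine_sync(coro):
    """Run `coro` to completion, even if the calling thread already has a running loop.

    Hypothetical helper (not an asyncio or Databricks API).
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run() works as usual.
        return asyncio.run(coro)

    # A loop is already running here: execute the coroutine on a worker
    # thread, where asyncio.run() is free to create a fresh event loop.
    outcome = {}

    def worker():
        try:
            outcome["value"] = asyncio.run(coro)
        except BaseException as exc:  # re-raised in the calling thread below
            outcome["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()  # blocks the caller (and its loop) until the coroutine finishes

    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


print(run_coroutine_sync(main()))
```

Note that the calling thread is blocked while the worker runs, so this fits short tasks better than long-running ones. The coroutine must also not depend on objects bound to the original loop.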

Databricks job/task context

Suppose we're running the following job/task in an Azure Databricks workspace:

jobId: "1111"
jobRunId: "2222"
taskRunId: "3333"
jobName: "ths job name"
taskName: "first-task"
databricksWorkspaceUrl: https://adb-4444444444.123.azuredatabricks.net/

Run the command below in a Databricks job (more precisely, in a task):

dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()
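The command returns a JSON string, so it can be loaded with json.loads and inspected like any nested dictionary. As a rough sketch for locating where a known value (for example the jobId "1111") ends up in the context, the helper below walks a parsed JSON document and lists the path of every matching leaf. The function name find_paths and the sample payload are made up for illustration; the real context is larger and its keys may differ.

```python
import json


def find_paths(node, target, path=""):
    """Yield the dotted path of every leaf whose string form equals `target`."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_paths(value, target, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_paths(value, target, f"{path}[{index}]")
    elif str(node) == target:
        yield path


# Hypothetical payload, for illustration only. In a job task you would use:
#   raw = dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()
raw = '{"tags": {"jobId": "1111", "someOtherTag": "1111"}, "extra": {"ids": [2222]}}'
context = json.loads(raw)

print(list(find_paths(context, "1111")))
print(list(find_paths(context, "2222")))
```

Searching by value rather than hard-coding key names keeps the sketch independent of the exact layout of the context, which is not documented as a stable interface.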