Databricks Python pip authentication#
Before the release of Databricks Unity Catalog, we used init scripts stored in DBFS to generate the pip.conf file during cluster startup, giving each cluster its own auth token. But with init scripts no longer available in Unity Catalog's shared access mode, an alternative approach is required.
I have not tested all of the methods below: only Method 1 and Method 2. Method 3 and Method 4 are from here.
Unity Catalog requires Databricks Runtime 11.3 LTS or above.
Method 1: Preparing a pip.conf file in advance#
A workaround involves placing a prepared pip.conf file in the Databricks workspace and setting the PIP_CONFIG_FILE environment variable to point to it. This method, however, raises a security concern: the pip.conf file, which contains the auth token, is accessible to the entire workspace, potentially exposing the token to all users and clusters. See here for details of this workaround.
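As an illustration, Method 1 boils down to two steps: store a prepared pip.conf in the workspace, then point PIP_CONFIG_FILE at it. The sketch below uses placeholder paths, index URL, and token (none of them from the referenced workaround); /tmp stands in for the workspace path so the script runs anywhere.

```shell
# Sketch of Method 1; all paths, URLs, and credentials are placeholders.
# 1) Prepare a pip.conf ahead of time. On Databricks it would live in the
#    workspace, e.g. /Workspace/Shared/pip.conf; /tmp is used here for testing.
CONF_PATH="${CONF_PATH:-/tmp/pip_method1.conf}"
cat > "$CONF_PATH" <<'EOF'
[global]
index-url = https://build:SECRET_TOKEN@pkgs.example.com/simple/
EOF

# 2) In the cluster's "Advanced options > Environment variables", set:
#    PIP_CONFIG_FILE=/Workspace/Shared/pip.conf
# so that pip installs on that cluster pick up the file.
```

Note that this makes the security concern concrete: anyone who can read the workspace file can read the embedded token.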
In contrast, Unity Catalog's single user mode retains init script support. There, the pip auth token is stored securely in a vault and accessed via a Databricks secret scope: upon cluster startup, the init script fetches the token from the vault and generates the pip.conf file. This approach is considerably more secure than the shared-mode alternative.
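A minimal init-script sketch for single user mode might look like the following. The secret scope and key names, the index URL, and the file paths are all assumptions, not from this post; the secret is surfaced to the script through a cluster environment variable.

```shell
#!/bin/bash
# Hedged sketch of a single-user-mode init script; scope/key names and the
# index URL are placeholders. The token reaches the script via a cluster
# environment variable set in "Advanced options", e.g.:
#   PIP_AUTH_TOKEN={{secrets/my-scope/pip-token}}
PIP_AUTH_TOKEN="${PIP_AUTH_TOKEN:-dummy-token}"  # fallback for local testing only

# On a real cluster this would be /etc/pip.conf so pip finds it globally;
# /tmp is used here so the sketch runs without root.
PIP_CONF="${PIP_CONF:-/tmp/pip_init.conf}"
cat > "$PIP_CONF" <<EOF
[global]
index-url = https://build:${PIP_AUTH_TOKEN}@pkgs.example.com/simple/
EOF
```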
Method 2: Keep using init scripts, but with Azure ADLS Gen2 instead of DBFS#
Unity Catalog's shared mode does not allow init scripts stored in DBFS. However, the init script can be stored in Azure ADLS Gen2 and accessed via ABFSS. You also need to configure credentials for the cluster to connect to Azure ADLS Gen2.
Refer to this PDF file for details.
Databricks Runtime versions from 11.3 LTS up to, but not including, 13.3 LTS might not be supported; I have not tested them.
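For Method 2, the init script path and storage credentials both go into the cluster configuration. A sketch assuming service-principal (OAuth) authentication follows; the storage account, container, secret scope, and IDs are all placeholders, and the referenced PDF may use a different credential type:

```
# Init script path (cluster > Advanced options > Init scripts):
#   abfss://scripts@mystorageacct.dfs.core.windows.net/pip-auth.sh

# Cluster Spark config (placeholders throughout):
spark.hadoop.fs.azure.account.auth.type.mystorageacct.dfs.core.windows.net OAuth
spark.hadoop.fs.azure.account.oauth.provider.type.mystorageacct.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.id.mystorageacct.dfs.core.windows.net <application-id>
spark.hadoop.fs.azure.account.oauth2.client.secret.mystorageacct.dfs.core.windows.net {{secrets/my-scope/sp-secret}}
spark.hadoop.fs.azure.account.oauth2.client.endpoint.mystorageacct.dfs.core.windows.net https://login.microsoftonline.com/<tenant-id>/oauth2/token
```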
Method 3: Keep using init scripts in DBFS, with an allowlist#
Refer to this link, and be aware that this feature is only available in Databricks Runtime 13.3 LTS or above.
Method 4: Keep using init scripts, but with a UC volume instead of DBFS#
Refer to this PDF file for details. Be aware that this feature is only available in Databricks Runtime 13.3 LTS or above.
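For Method 4, the cluster simply references the init script by its volume path. A hypothetical Clusters API fragment is shown below; the catalog, schema, volume, and script names are placeholders:

```json
{
  "init_scripts": [
    {
      "volumes": {
        "destination": "/Volumes/main/default/init-scripts/pip-auth.sh"
      }
    }
  ]
}
```

The same path can be entered in the UI under cluster > Advanced options > Init scripts, with Volume selected as the source.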