Python: adding version info to docstrings
When checking PySpark's source code, I found a nice way it adds version information to docstrings: a `@since()` decorator. Here is an example:
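A simplified sketch of such a decorator (not PySpark's exact implementation; the version string and the decorated function are made up for illustration):

```python
def since(version):
    """Return a decorator that appends a versionadded note to the docstring."""
    def deco(f):
        # Append a Sphinx-style directive to whatever docstring already exists.
        f.__doc__ = (f.__doc__ or "").rstrip() + f"\n\n.. versionadded:: {version}"
        return f
    return deco

@since("3.4.0")
def collect(df):
    """Return all the records as a list of Row."""
    ...

print(collect.__doc__)
# Return all the records as a list of Row.
#
# .. versionadded:: 3.4.0
```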
A quick note from the official Python documentation about thread safety in Python:
It's important to understand that Python, due to its Global Interpreter Lock (GIL), can only switch between threads between bytecode instructions. The frequency of these switches can be adjusted using sys.setswitchinterval(). This ensures that within a single bytecode instruction, Python will not switch threads, making the operation atomic (thread-safe). For a deeper dive into this topic, you can read this discussion on atomic and thread-safe operations in Python.
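A small illustration of the switch interval, and of why a statement like `i += 1` is not atomic (the interval value chosen below is arbitrary):

```python
import dis
import sys

# The interpreter considers switching threads every 5 ms by default;
# the interval can be inspected and tuned at runtime.
print(sys.getswitchinterval())   # 0.005
sys.setswitchinterval(0.001)     # check for pending thread switches more often

# `i += 1` compiles to several bytecode instructions (load, add, store),
# so another thread can be scheduled in between -- it is not atomic.
dis.dis("i += 1")
```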
During local testing, we often need to set environment variables. One way to do this is to create a `.env` file in the root directory of the project. This file contains key-value pairs of environment variables. For example, a `.env` file might look like this:
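The variable names and values below are only placeholders:

```text
DATABASE_URL=postgresql://user:password@localhost:5432/mydb
AZURE_STORAGE_ACCOUNT=mystorageaccount
LOG_LEVEL=DEBUG
```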
Below is a quick bash script to generate a `.env` file from a list of Azure KeyVault secrets; the same logic can be applied to other secret managers.
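A minimal sketch of such a script, assuming the Azure CLI (`az`) is installed and logged in; the vault name and secret list are placeholders:

```bash
#!/usr/bin/env bash
set -euo pipefail

VAULT_NAME="my-keyvault"              # placeholder vault name
SECRETS=("DB-PASSWORD" "API-KEY")     # placeholder secret names

> .env
for secret in "${SECRETS[@]}"; do
  value=$(az keyvault secret show --vault-name "$VAULT_NAME" --name "$secret" --query value -o tsv)
  # KeyVault secret names use dashes; switch to underscores for env var names.
  echo "${secret//-/_}=${value}" >> .env
done
```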
This post tests some popular Python tools (sqlalchemy_data_model_visualizer, sqlalchemy_schemadisplay, and eralchemy2) for generating an ERD (Entity-Relationship Diagram) from SQLAlchemy models.
The test code can be found in this GitHub repo.
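As an illustration of the eralchemy2 variant (it also needs Graphviz installed), a sketch with two throwaway models:

```python
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base
from eralchemy2 import render_er

Base = declarative_base()

class Author(Base):
    __tablename__ = "author"
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Book(Base):
    __tablename__ = "book"
    id = Column(Integer, primary_key=True)
    title = Column(String)
    author_id = Column(Integer, ForeignKey("author.id"))

# Render the ERD of all models registered on Base into a PNG file.
render_er(Base, "erd.png")
```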
The `ForEach` activity in Azure Data Factory has some important limitations. One of them is that, when working in `batch` (parallel) mode, it is better to embed only Execute Pipeline activities inside it.
Name | Scope | Web framework middleware | VSCode extension |
---|---|---|---|
scalene | CPU, GPU, memory, duration | partially | yes |
cProfile (Python native, function level and CLI only) | duration | no | no |
VizTracer | duration | unknown | yes |
profyle (based on VizTracer) | duration | yes | no |
pyinstrument | duration | yes | no |
py-spy | duration | no | no |
yappi (CLI only) | duration | unknown | no |
austin | duration | unknown | yes |
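For reference, the stdlib cProfile from the table can be driven from Python code as well as from the CLI; a tiny example with a placeholder workload:

```python
import cProfile
import pstats

def busy_work():
    return sum(i * i for i in range(1_000_000))

# Profile one call, save the raw stats, then print the 10 most expensive entries.
cProfile.run("busy_work()", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)
```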
Interesting reading:
The standard way to run an asyncio task is as simple as `asyncio.run(main())`. But in Databricks, it is not that simple: with the same command, you will get the following error:
```python
import asyncio

async def main():
    await asyncio.sleep(1)

asyncio.run(main())
```

```text
RuntimeError: asyncio.run() cannot be called from a running event loop
```
Indeed, in Databricks, we are already inside a running event loop:

```python
import asyncio

asyncio.get_running_loop()
# <_UnixSelectorEventLoop running=True closed=False debug=False>
```
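Two workarounds seem to do the job when a loop is already running; treat this as a sketch rather than Databricks-specific guidance (`nest-asyncio` is a third-party package):

```python
import asyncio

async def main():
    await asyncio.sleep(1)
    return "done"

# Option 1: Databricks notebooks run on IPython, which supports top-level await,
# so the coroutine can usually be awaited directly in a cell:
#     result = await main()

# Option 2: patch the running loop so asyncio.run() can be nested
# (pip install nest-asyncio):
import nest_asyncio

nest_asyncio.apply()
result = asyncio.run(main())
print(result)  # done
```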
Load a local JSON file into a PyArrow table, then write it to a Parquet file in Azure Blob Storage, without using pandas.
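A minimal sketch of how this could look with `pyarrow.json` and the `adlfs` fsspec filesystem (the account, container, and file names are placeholders, and `pyarrow.json.read_json` expects newline-delimited JSON):

```python
import pyarrow.json as pj
import pyarrow.parquet as pq
from adlfs import AzureBlobFileSystem  # fsspec-compatible filesystem for Azure Blob

# Read the local JSON file straight into an Arrow table -- no pandas involved.
table = pj.read_json("data.json")

# Authenticate however suits your setup; an account key is used here as a placeholder.
fs = AzureBlobFileSystem(account_name="mystorageaccount", account_key="***")

# Write the table as Parquet directly into the blob container through the filesystem.
pq.write_table(table, "mycontainer/data.parquet", filesystem=fs)
```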
The most secure way to use secrets in a Dockerfile is to use the `--secret` flag of the `docker build` command. This way, the secret is not stored in the image, and it is not visible in the Dockerfile.
A common use case in the Python world is installing packages from a private PyPI repository in a Dockerfile. Suppose that during the CI/CD pipeline there is an environment variable called `PIP_INDEX_URL` that holds the private PyPI credentials.
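A sketch of what this could look like, assuming BuildKit and a secret id of `pip_index_url` (the id and the package name are placeholders):

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.12-slim

# The secret is mounted only for this RUN instruction and never ends up in an image layer.
RUN --mount=type=secret,id=pip_index_url \
    PIP_INDEX_URL=$(cat /run/secrets/pip_index_url) \
    pip install my-private-package
```

Then, in the CI/CD pipeline, the environment variable is passed as the secret (recent BuildKit versions accept `env=`; older ones need a file via `src=`):

```bash
docker build --secret id=pip_index_url,env=PIP_INDEX_URL -t my-image .
```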
Check the official Build secrets doc.
Flask is a little bit old-fashioned today (I know it's still widely used), as it's not async-native, among other things. When I prepared my fastapi-demo this weekend, I discovered a new framework called Quart, which is maintained by the Pallets project, the same community that maintains Flask. They say: "Quart is an asyncio re-implementation of the popular Flask micro framework API. This means that if you understand Flask you understand Quart." So I decided to give it a try.
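A hello-world sketch (module and route names are arbitrary) showing the Flask-like but async-native API:

```python
from quart import Quart

app = Quart(__name__)

# Routing looks exactly like Flask, but the handler is a native coroutine.
@app.route("/")
async def index():
    return "hello from Quart"

if __name__ == "__main__":
    app.run()
```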