SQLAlchemy eager loading
This post describes the differences between selectinload, joinedload, and subqueryload, three popular eager loading techniques in SQLAlchemy (and, by extension, SQLModel).
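Not from the post itself, but a minimal sketch of how the three options are applied, assuming SQLAlchemy 2.0-style models and a hypothetical Author/Book one-to-many relationship:

```python
from sqlalchemy import ForeignKey, create_engine, select
from sqlalchemy.orm import (
    DeclarativeBase, Mapped, Session, joinedload, mapped_column,
    relationship, selectinload, subqueryload,
)

class Base(DeclarativeBase):
    pass

class Author(Base):
    __tablename__ = "author"
    id: Mapped[int] = mapped_column(primary_key=True)
    books: Mapped[list["Book"]] = relationship(back_populates="author")

class Book(Base):
    __tablename__ = "book"
    id: Mapped[int] = mapped_column(primary_key=True)
    author_id: Mapped[int] = mapped_column(ForeignKey("author.id"))
    author: Mapped["Author"] = relationship(back_populates="books")

engine = create_engine("sqlite://", echo=True)  # echo=True prints the generated SQL
Base.metadata.create_all(engine)

with Session(engine) as session:
    # selectinload: parent rows first, then one extra
    # "SELECT ... WHERE book.author_id IN (...)" query for the collections.
    session.scalars(select(Author).options(selectinload(Author.books))).all()

    # joinedload: a single query with a LEFT OUTER JOIN; .unique() is needed
    # because the join duplicates parent rows.
    session.scalars(select(Author).options(joinedload(Author.books))).unique().all()

    # subqueryload: a second query that re-states the original query as a subquery.
    session.scalars(select(Author).options(subqueryload(Author.books))).all()
```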
MS Graph API's endpoint for retrieving users, GET /users, can return all users of the tenant. The default page size is 100 users and the maximum is 999. If there are more than 999 users, the response contains an @odata.nextLink field, a URL pointing to the next page of users. For a big company with a large number of users (50,000, 100,000, or even more), retrieving all of them can be time-consuming.
While MS Graph API provides generous throttling limits, we should find a way to parallelize the queries. This post explores sharding as a strategy to retrieve all users in a matter of seconds: the idea is to divide users based on the first character of the userPrincipalName field. For instance, shard 1 would cover users whose userPrincipalName starts with a, shard 2 would handle users starting with b, and so forth.
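A hedged sketch of the sharding idea, assuming an access token is already in hand and that lowercase a-z prefixes cover the tenant (a real tenant may also need digits and other characters); this is illustrative, not the post's exact code:

```python
import string
from concurrent.futures import ThreadPoolExecutor

import requests

GRAPH_USERS_URL = "https://graph.microsoft.com/v1.0/users"
ACCESS_TOKEN = "<access token>"  # placeholder, obtain via your usual OAuth flow

def fetch_shard(prefix: str) -> list[dict]:
    """Fetch every user whose userPrincipalName starts with `prefix`."""
    headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}
    params = {"$filter": f"startswith(userPrincipalName,'{prefix}')", "$top": "999"}
    users, url = [], GRAPH_USERS_URL
    while url:
        resp = requests.get(url, headers=headers, params=params, timeout=30)
        resp.raise_for_status()
        payload = resp.json()
        users.extend(payload.get("value", []))
        # @odata.nextLink already encodes the query parameters for the next page
        url, params = payload.get("@odata.nextLink"), None
    return users

# Each shard pages through its own slice of the tenant in parallel.
with ThreadPoolExecutor(max_workers=26) as pool:
    shards = pool.map(fetch_shard, string.ascii_lowercase)
all_users = [user for shard in shards for user in shard]
```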
This post provides a simple starter demonstration of how to use Flit to build and publish a Python package. However, for more complex builds, such as compiling C code, you still need the de facto standard, setuptools.
Some tweaks I made to bash-git-prompt: a dynamic Python virtualenv path, a new GIT_MESSAGE variable, and more.
Python local version identifiers are used to distinguish between different builds of the same version of a package. They are used to indicate that a package has been modified in some way from the original source code, but should still be considered the same version.
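A quick illustration using the packaging library; the version strings below are made up for demonstration:

```python
from packaging.version import Version

v = Version("1.2.3+mycompany.4")
print(v.public)  # "1.2.3"        -> the upstream (public) version
print(v.local)   # "mycompany.4"  -> the local build label after the "+"

# Per PEP 440, a local version sorts just after the corresponding public release:
print(Version("1.2.3+mycompany.4") > Version("1.2.3"))  # True
```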
Recently, I began a new project that requires migrating some processes from Azure Pipelines to GitHub Actions. One of the tasks involves retrieving secrets from Azure Key Vault.
In Azure Pipelines, there is an official task, AzureKeyVault@2, designed for this purpose. However, its counterpart in GitHub Actions, Azure/get-keyvault-secrets@v1, has been deprecated, and the recommended alternative is the Azure CLI. While the Azure CLI is a suitable option, it runs in a bash shell without multithreading, so fetching numerous secrets can be time-consuming.
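One way to parallelize the fetches, sketched here with the Azure SDK for Python and a thread pool; the vault URL and secret names are placeholders, and this is not necessarily the exact approach the post takes:

```python
from concurrent.futures import ThreadPoolExecutor

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

VAULT_URL = "https://<vault-name>.vault.azure.net"          # placeholder
SECRET_NAMES = ["db-password", "api-key", "storage-conn"]    # hypothetical names

client = SecretClient(vault_url=VAULT_URL, credential=DefaultAzureCredential())

def fetch(name: str) -> tuple[str, str]:
    # Each get_secret call is an independent HTTPS request, so they parallelize well.
    return name, client.get_secret(name).value

with ThreadPoolExecutor(max_workers=8) as pool:
    secrets = dict(pool.map(fetch, SECRET_NAMES))
```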
Before the release of Databricks Unity Catalog, we used init scripts stored in DBFS to generate the pip.conf file during cluster startup, allowing each cluster to have its own auth token. But with those init scripts no longer available in Unity Catalog's shared access mode, an alternative approach is required.
Note
This is not a Python asyncio tutorial, just some personal quick tips that may be updated from time to time.