Convert json to parquet and send to Azure Blob Storage
Load a local json file into a PyArrow table, then write it to a parquet file in Azure Blob Storage without using pandas.
The most secure way to use secrets in a Dockerfile is to use the --secret flag in the docker build command. This way, the secret is not stored in the image, and it is not visible in the Dockerfile.
A common use case in the Python world is installing packages from a private PyPI repository in a Dockerfile. Suppose the CI/CD pipeline exposes an environment variable called PIP_INDEX_URL that holds the credentials for this private PyPI repository.
Check the official Build secrets doc.
Flask feels a little old-fashioned today (I know it's still widely used): among other things, it's not async-native. While preparing my fastapi-demo this weekend, I discovered a new framework called Quart, which is maintained by the Pallets project, the same community that maintains Flask. They say: "Quart is an asyncio re-implementation of the popular Flask micro framework API. This means that if you understand Flask you understand Quart." So I decided to give it a try.
This post describes the differences between selectinload, joinedload, and subqueryload, three popular eager loading techniques in SQLAlchemy (and therefore in SQLModel).
This post is based on the official Azure documentation (Asynchronous messaging options, Compare Azure messaging services, Enterprise integration using message broker and events, Azure Well-Architected Framework) and summarizes the differences and use cases of the Azure messaging services: Service Bus, Event Grid, and Event Hubs. The official documentation is very good and comprehensive; this post is a quick reminder for my personal reference.
MS Graph API's endpoint for retrieving users, GET /users, can return all users of the tenant. The default page size is 100 users, and the maximum is 999 users per page. If there are more users than fit in one page, the response contains an @odata.nextLink field, which is a URL to the next page of users. For a big company with a large number of users (50,000, 100,000, or even more), retrieving all users this way can be time-consuming.
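To make the paging behaviour concrete, here is a minimal sketch of a loop that follows @odata.nextLink until it disappears. The fetch_page callable is a hypothetical stand-in for an authenticated GET that returns the decoded JSON body; it is not part of any SDK, and the FAKE_PAGES data is made up so the snippet runs without network access.

```python
from typing import Callable, Iterator, Optional

# $top=999 asks for the maximum page size mentioned above.
FIRST_PAGE_URL = "https://graph.microsoft.com/v1.0/users?$top=999"


def iter_users(
    fetch_page: Callable[[str], dict],
    first_url: str = FIRST_PAGE_URL,
) -> Iterator[dict]:
    """Yield users one by one, following @odata.nextLink page after page.

    fetch_page is an assumption: any function that takes a URL and returns
    the decoded JSON response (for example a thin wrapper around an
    authenticated HTTP GET).
    """
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        yield from page.get("value", [])
        # The field is absent on the last page, which ends the loop.
        url = page.get("@odata.nextLink")


# Made-up pages standing in for real responses.
FAKE_PAGES = {
    FIRST_PAGE_URL: {
        "value": [{"userPrincipalName": "alice@example.com"}],
        "@odata.nextLink": "page-2",
    },
    "page-2": {"value": [{"userPrincipalName": "bob@example.com"}]},
}

if __name__ == "__main__":
    for user in iter_users(FAKE_PAGES.__getitem__):
        print(user["userPrincipalName"])
```

Because every request needs the link returned by the previous one, this loop is sequential by nature, which is why a large tenant takes a long time to enumerate.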
Because the MS Graph API provides generous throttling limits, we can parallelize the queries. This post explores sharding as a strategy to retrieve all users in a matter of seconds. The idea is to divide users based on the first character of the userPrincipalName field. For instance, shard 1 would cover users whose userPrincipalName starts with a, shard 2 would handle users starting with b, and so forth.
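The sharding idea can be sketched roughly as follows. This is not the full implementation from the post, just an illustration under a few assumptions: the shard keys are the lowercase letters and digits, each shard is expressed as a startswith filter on userPrincipalName, and fetch_shard is a hypothetical callable that runs one filtered query (including its own paging) and returns the users. The fake directory below exists only so the snippet runs offline.

```python
import string
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

# Assumed shard keys: one shard per lowercase letter and digit.
SHARD_PREFIXES = string.ascii_lowercase + string.digits


def shard_filter(prefix: str) -> str:
    """Build the $filter expression selecting one shard."""
    return f"startswith(userPrincipalName,'{prefix}')"


def fetch_all_users(
    fetch_shard: Callable[[str], List[dict]],
    prefixes: Iterable[str] = SHARD_PREFIXES,
    max_workers: int = 8,
) -> List[dict]:
    """Query every shard in parallel and merge the results.

    fetch_shard is an assumption: it receives a $filter expression and
    returns all users matching it.
    """
    filters = [shard_filter(p) for p in prefixes]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        shard_results = pool.map(fetch_shard, filters)
        return [user for shard in shard_results for user in shard]


# Made-up directory standing in for the tenant.
FAKE_DIRECTORY = [
    {"userPrincipalName": "alice@example.com"},
    {"userPrincipalName": "adam@example.com"},
    {"userPrincipalName": "bob@example.com"},
    {"userPrincipalName": "9lives@example.com"},
]


def fake_fetch_shard(filter_expr: str) -> List[dict]:
    """Offline stand-in: apply the startswith filter to FAKE_DIRECTORY."""
    prefix = filter_expr.split("'")[1]
    return [
        u for u in FAKE_DIRECTORY
        if u["userPrincipalName"].startswith(prefix)
    ]


if __name__ == "__main__":
    for user in fetch_all_users(fake_fetch_shard):
        print(user["userPrincipalName"])
```

One caveat with this sketch: users whose userPrincipalName starts with a character outside the chosen prefixes would be missed, so the prefix list has to match what actually exists in the tenant.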
Outlook.com has an overwhelming number of emails, and deleting them using rules is challenging. Notably, the online filter of Outlook.com can only load a maximum of 1000 emails. Therefore, I have to use scripts (like VBA scripts) to delete them. Below is the script that selects all the unread emails before a given date and deletes them.
This post provides a simple starter demonstration of how to use Flit for building and publishing a Python package. However, for more complex builds, such as compiling C code, you still need the de facto standard, setuptools.