GitHub Actions: Cache
Life span
GitHub removes cache entries that have not been accessed in over 7 days, and the total size of all caches in a repository is limited to 10 GB.
Standard Cache
The cache key should be as specific as possible, i.e. it should change whenever anything that affects the cached content changes, so that after a cache hit the installation step can be reduced or skipped.
For Python `pip install`, we could use the following cache key:
```yaml
- name: Get pip cache dir
  run: |
    os_version=$(cat /etc/os-release | grep -i "version=" | cut -c9- | tr -d '"' | tr ' ' '_')
    github_workflow_full_path="${GITHUB_WORKFLOW_REF%@*}"
    python_full_version=$(python -c 'import platform; print(platform.python_version())')
    node_major_version=$(node --version | cut -d'.' -f1 | tr -d 'v')
    echo "os_version=$os_version" >> "$GITHUB_ENV"
    echo "github_workflow_full_path=$github_workflow_full_path" >> "$GITHUB_ENV"
    echo "python_full_version=$python_full_version" >> "$GITHUB_ENV"
    # node_major_version is used in the cache key below, so it must be exported too
    echo "node_major_version=$node_major_version" >> "$GITHUB_ENV"
    echo "PIP_CACHE_DIR=$(pip cache dir)" >> "$GITHUB_ENV"
- name: cache pip
  uses: actions/cache@v3
  with:
    # path: ${{ env.PIP_CACHE_DIR }}
    # env.pythonLocation is set by a preceding actions/setup-python step
    path: ${{ env.pythonLocation }}
    key: ${{ env.github_workflow_full_path}}-${{ env.os_version }}-${{ env.python_full_version }}-${{ env.node_major_version}}-${{ hashFiles('requirements/*.txt') }}
```
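To try the key-building logic of the step above outside a runner, it can be split into small bash functions that take their inputs explicitly. This is only a sketch: the function names `os_version_from_release`, `workflow_path_from_ref` and `build_cache_key` are hypothetical, and the `^` anchor on the grep is an addition so that only the `VERSION=` line of os-release can match.

```bash
#!/usr/bin/env bash
# Hypothetical helpers mirroring the "Get pip cache dir" step.

# Read VERSION= from an os-release file (default: /etc/os-release), e.g.
# VERSION="22.04.3 LTS (Jammy Jellyfish)" -> 22.04.3_LTS_(Jammy_Jellyfish)
os_version_from_release() {
  local release_file="${1:-/etc/os-release}"
  # '^version=' skips VERSION_ID= and VERSION_CODENAME= lines
  grep -i '^version=' "$release_file" | cut -c9- | tr -d '"' | tr ' ' '_'
}

# Drop the trailing "@<ref>" from a GITHUB_WORKFLOW_REF-style value.
workflow_path_from_ref() {
  echo "${1%@*}"
}

# Join all arguments with "-", in the order given.
build_cache_key() {
  local IFS='-'
  echo "$*"
}
```

On a runner, the parts could be combined with `build_cache_key "$(workflow_path_from_ref "$GITHUB_WORKFLOW_REF")" "$(os_version_from_release)" "$python_full_version"`; the `hashFiles()` part still has to come from the workflow expression.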
The `actions/cache` repository also provides some Python caching examples.
pip cache dir vs pip install dir
The `path` parameter of `actions/cache@v3` can be:

- `${{ env.PIP_CACHE_DIR }}` if you only want to cache the pip cache dir. This lets you skip downloading the Python packages, but you still need to install them.
- `${{ env.pythonLocation }}` if you want to cache the whole Python installation dir, including its `site-packages` dir, so that the `pip install` step can be reduced or skipped. This is also why `${{ env.os_version }}` and `${{ env.python_full_version }}` must be part of the cache key. In most cases, this is the best choice.
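One way to see why the Python version belongs in the key when caching `${{ env.pythonLocation }}` is that the installed packages live in a version-specific `site-packages` dir. The sketch below assumes the Linux layout `<pythonLocation>/lib/python<MAJOR>.<MINOR>/site-packages`; the function name `site_packages_dir` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper. Assumes the Linux layout under $pythonLocation:
#   <pythonLocation>/lib/python<MAJOR>.<MINOR>/site-packages
site_packages_dir() {
  local python_location="$1"  # e.g. the value of $pythonLocation
  local full_version="$2"     # e.g. 3.10.13
  local major_minor
  # Keep only MAJOR.MINOR: 3.10.13 -> 3.10
  major_minor=$(echo "$full_version" | cut -d'.' -f1,2)
  echo "${python_location}/lib/python${major_minor}/site-packages"
}
```

On a real runner, asking the interpreter is more reliable than assuming a layout: `python -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])'`.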
hashFiles
Azure Pipelines has something similar to the `hashFiles()` function: a key segment written as a glob pattern, like `requirements/*.txt`, is hashed from the matching files. The pattern must not be wrapped in double quotes, otherwise it is treated as a static string.
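The quoted-versus-unquoted rule can be illustrated with a toy model in bash. This is not how Azure Pipelines implements `Cache@2`: `resolve_segment` is a hypothetical helper that returns a double-quoted segment as a literal and hashes the files matched by an unquoted glob (SHA-256 per file, then SHA-256 over those hashes).

```bash
#!/usr/bin/env bash
# Toy model of a Cache@2 key segment (NOT Azure's implementation):
#   "text"  -> used literally
#   pattern -> files matching the glob are hashed
resolve_segment() {
  local segment="$1"
  if [[ "$segment" == \"*\" ]]; then
    # Literal: strip the surrounding double quotes.
    segment="${segment#\"}"
    echo "${segment%\"}"
    return 0
  fi
  # Glob: expand it (unquoted on purpose) and keep only regular files.
  local matches=() f
  for f in $segment; do
    if [[ -f "$f" ]]; then
      matches+=("$f")
    fi
  done
  if (( ${#matches[@]} == 0 )); then
    echo "no file matches: $segment" >&2
    return 1
  fi
  # SHA-256 of each file (glob results come back sorted),
  # then one SHA-256 over that list of hashes.
  sha256sum "${matches[@]}" | awk '{print $1}' | sha256sum | awk '{print $1}'
}
```

With this model, `resolve_segment 'requirements/*.txt'` changes whenever a matched file changes, while `resolve_segment '"requirements/*.txt"'` always returns the literal text, which is the mistake the quoting rule warns about.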
```yaml
# Azure Pipelines
- task: Cache@2
  inputs:
    key: 'python | "$(pythonFullVersion)" | "$(osVersion)" | "$(System.TeamProject)" | "$(Build.DefinitionName)" | "$(Agent.JobName)" | requirements/*.txt'
    path: ...
  displayName: ...
```
Alternatively, the same result can be achieved with plain bash commands:
```yaml
# suppose parameters.requirementsFilePathList is a list of file paths
- script: |
    echo REQUIREMENTS_FILE_PATH_LIST_STRING: $REQUIREMENTS_FILE_PATH_LIST_STRING
    all_files_in_one_line=$(echo $REQUIREMENTS_FILE_PATH_LIST_STRING | jq '. | join(" ")' -r)
    echo all_files_in_one_line: $all_files_in_one_line
    all_files_md5sum=$(cat $all_files_in_one_line | md5sum | awk '{print $1}')
    echo all_files_md5sum: $all_files_md5sum
    echo "##vso[task.setvariable variable=pythonRequirementsFilesHash;]$all_files_md5sum"
  displayName: Set pythonRequirementsFilesHash
  env:
    REQUIREMENTS_FILE_PATH_LIST_STRING: "${{ convertToJson(parameters.requirementsFilePathList) }}"
```
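The same hash can be computed without `jq` by passing the file paths as arguments. In this sketch, `requirements_files_hash` is a hypothetical name, and sorting the paths is an added design choice: the step above hashes the files in list order, so merely reordering `requirementsFilePathList` changes the key.

```bash
#!/usr/bin/env bash
# Hypothetical helper: md5 of the concatenated contents of the given files,
# like the Azure step above, but with the paths sorted first.
requirements_files_hash() {
  if (( $# == 0 )); then
    echo "usage: requirements_files_hash FILE..." >&2
    return 1
  fi
  local path
  for path in "$@"; do
    if [[ ! -f "$path" ]]; then
      echo "missing file: $path" >&2
      return 1
    fi
  done
  # Sort the paths so the hash does not depend on the order they were given in.
  printf '%s\n' "$@" | sort | while IFS= read -r path; do
    cat -- "$path"
  done | md5sum | awk '{print $1}'
}
```

Inside the Azure step, the JSON list could be turned into arguments with `mapfile -t files < <(echo "$REQUIREMENTS_FILE_PATH_LIST_STRING" | jq -r '.[]')` before calling `requirements_files_hash "${files[@]}"`.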
Cache with actions/setup-python
The action `actions/setup-python` has built-in functionality for caching and restoring dependencies via its `cache` input. This method only caches the pip cache dir (like `path: ${{ env.PIP_CACHE_DIR }}` in the example above), which saves the package download time, but the packages still need to be installed, and that is much slower than restoring the package installation location. At the time of writing, the cached source dir (the pip cache dir) is determined by the action itself and cannot be customized.
The cache key looks like `setup-python-Linux-22.04-Ubuntu-python-3.10.13-pip-308f89683977de8773e433ddf87c874b6bd931347b779ef0ab18f37ecc4fa914` (copied from a workflow run log), and is generated as per this answer.
```yaml
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-python@v4
    with:
      python-version: '3.10'
      cache: 'pip' # caching pip dependencies, could be pip, pipenv, or poetry
      cache-dependency-path: requirements/*.txt
  - run: pip install -r requirements.txt
```
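Reading the logged key field by field, its shape can be reproduced with a small bash function. This only mirrors that one log line: `setup_python_style_key` is hypothetical, and the dependency hash is passed in rather than computed the way `actions/setup-python` does it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rebuild a key with the same field order as the
# setup-python key seen in the workflow run log. The hash is an input;
# this does NOT reproduce how the action computes it.
setup_python_style_key() {
  local runner_os="$1" os_version="$2" os_name="$3"
  local python_version="$4" cache_type="$5" deps_hash="$6"
  echo "setup-python-${runner_os}-${os_version}-${os_name}-python-${python_version}-${cache_type}-${deps_hash}"
}
```

Compared with the manual key in the standard cache section, this shape also carries the OS version and the full Python version, but has no workflow-specific part, so different workflows with the same inputs can end up with the same key.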
If `cache-dependency-path` is not specified and the cache type is `pip`, the action tries to find all the `requirements.txt` files in the repo and hashes them to generate the cache key. I haven't tested the `pipenv` and `poetry` cache types.
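The default behaviour described above can be approximated in bash. This is a rough sketch, not the action's code: `default_requirements_hash` is hypothetical, it only looks for files named exactly `requirements.txt`, and it uses its own hashing scheme (SHA-256 per file, then SHA-256 over those hashes).

```bash
#!/usr/bin/env bash
# Rough approximation, not the code of actions/setup-python:
# hash every requirements.txt found under a root dir.
default_requirements_hash() {
  local root="${1:-.}"
  local files=()
  # Sorted so the result does not depend on find's traversal order.
  mapfile -t files < <(find "$root" -type f -name requirements.txt | sort)
  if (( ${#files[@]} == 0 )); then
    echo "no requirements.txt found under $root" >&2
    return 1
  fi
  # SHA-256 per file, then one SHA-256 over those hashes.
  sha256sum "${files[@]}" | awk '{print $1}' | sha256sum | awk '{print $1}'
}
```

Because every `requirements.txt` in the repo feeds the hash, an edit to any of them, even one unrelated to the current job, changes the key; setting `cache-dependency-path` narrows that down.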