Pip dependencies not getting cached even after successful restore

Hi folks.

I am new to GitHub Actions so apologies if I am asking a naive question.

I am trying to set up my workflow such that after the pip dependencies get installed for the first time they get cached and do not get installed next time when the workflow is triggered.

My workflow file looks like so:

name: Metric logging with wandb
on: [push]
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python 3.x
        uses: actions/setup-python@v2
        with:
          python-version: '3.7' 
      - uses: actions/cache@master
        id: cache
        with:
          path: .cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
          restore-keys: ${{ runner.os }}-test
      - name: Install Dependencies
        if: steps.cache.outputs.cache-hit != 'true'
        run: |
          echo "Installing dependencies and caching them."
          pip install -r requirements.txt
      - name: Train model and log wandb metric
        run: |
          # Your ML workflow goes here
          python train.py --key ${{ secrets.WANDB_API_KEY }}
          echo "Finished run."

The second time the workflow is getting trigger there is a cache hit and it is able to restore the cache as well:

But then again I get the following:

image

pandas is not a new requirement that was already installed in the previous run.

Am I missing out on something?

2 Likes

I modified my workflow file a bit like the following:

name: Metric logging with wandb

on: [push]

jobs:
  run:

    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2
      - name: Set up Python 3.x
        uses: actions/setup-python@v2
        with:
          python-version: '3.7' 
      - uses: actions/cache@master
        id: cache
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}
          restore-keys: |
            ${{ runner.os }}-pip-
            ${{ runner.os }}-
      - name: Train model and log wandb metric
        run: |
          # Your ML workflow goes here
          pip install -r requirements.txt
          python train.py --key ${{ secrets.WANDB_API_KEY }}
          echo "Finished run."

But still the dependencies install and do not get cached.

2 Likes

Hi @sayakpaul
Do you have a link to the public repository or a minimal example repository that shows this behaviour?
By the way, could you please add ls ~/.cache/pip/ before “Train model and log wandb metric” step to check if pandas has been installed in the previous run?

2 Likes

@niconbwthanks for your reply. Here’e the branch where I tested this: https://github.com/sayakpaul/wine/tree/wandb-integration./

2 Likes

Hi @sayakpaul
The reason for this problem should be that the specified cache path is incorrect. You could try to specified cache path, such as:

      - uses: actions/cache@master
        id: cache
        with:
          path: /opt/hostedtoolcache/Python/3.7.8/x64/lib/python3.7/site-packages/*
          key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}
          restore-keys: |
            ${{ runner.os }}-pip-
            ${{ runner.os }}-

More about locations for the tools cache in each GitHub-hosted runner.
By the way, there is an environment variable called RUNNER_TOOL_CACHE on each GitHub hosted runner that describes the location of this tools cache and there is where you will find Python and PyPy installed. In your scenario, pandas should has been installed in the following path:
$RUNNER_TOOL_CACHE/Python/3.7.8/x64/lib/python3.7/site-packages/pandas

2 Likes

Thank you for the pointers, @niconbw. Could you also elaborate as to why the message “Cache restored …” was there in my logs? Am I missing out on something?

2 Likes

Hi @sayakpaul
We can find that you are using the ~/.cache/pip path in the cache@master action, but pandas has not been installed under this path. Even if you save the cache under this path(~/.cache/pip), you still cannot find the module (such as pandas) when you restore the cache successfully.

2 Likes

Fair enough @niconbw. Thank you for your guidance.

2 Likes

@niconbw I tried your suggestions but I am still not sure if caching is indeed being reflected. Here are the logs: https://github.com/sayakpaul/wine/runs/975167639?check_suite_focus=true.

1 Like

Hi @sayakpaul
Please try again with the following yml:

jobs:
  run:
    runs-on: ubuntu-latest
    
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python 3.x
        uses: actions/setup-python@v2
        with:
          python-version: '3.7'    
       
      - uses: actions/cache@master
        id: cache
        with:
          path: ${{ env.pythonLocation}}/lib/python3.7/site-packages/*
          key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}
          restore-keys: |
            ${{ runner.os }}-pip-
            ${{ runner.os }}-
            
      - name: Install Dependencies
        if: steps.cache.outputs.cache-hit != 'true'
        run: |
          echo "Installing dependencies and caching them."
          pip install -r requirements.txt
     
      - name: ls show env.pythonLocation
        run: ls -R ${{ env.pythonLocation}}
     
      - name: show module
        run: pip show pandas
      
      - name: Train model and log wandb metric
        run:  echo "Finished run."
1 Like

@niconbw you probably forgot to include python train.py.

1 Like

Hi @sayakpaul
sorry. Please add python train.py back in your side.

1 Like

@niconbw I appended the necessary stuff in the YAML file and I got it to working:
https://github.com/sayakpaul/wine/runs/979690608?check_suite_focus=true.

Thanks much.

1 Like

Hi @sayakpaul
You’re welcome. It’s my pleasure to help you.

1 Like

jobs:
run:
runs-on: ubuntu-latest

steps:
  - uses: actions/checkout@v2
  - name: Set up Python 3.x
    uses: actions/setup-python@v2
    with:
      python-version: '3.7'    
   
  - uses: actions/cache@master
    id: cache
    with:
      path: ${{ env.pythonLocation}}/lib/python3.7/site-packages/*
      key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}
      restore-keys: |
        ${{ runner.os }}-pip-
        ${{ runner.os }}-
        
  - name: Install Dependencies
    if: steps.cache.outputs.cache-hit != 'true'
    run: |
      echo "Installing dependencies and caching them."
      pip install -r requirements.txt
 
  - name: ls show env.pythonLocation
    run: ls -R ${{ env.pythonLocation}}
 
  - name: show module
    run: pip show pandas
  
  - name: Train model and log wandb metric
    run:  echo "Finished run."