Artifacts vs cache for sharing node_modules between jobs

I’m trying to figure out the best way to deal with node_modules, which in my case is 900MB (175MB gzipped), with tens of thousands of files.

Requirements:

  1. Cache node_modules between workflow runs, to avoid initializing it from scratch every time (4 minutes) and so that any update to the package.json dependencies needs only an incremental npm install

  2. Allow multiple time-consuming jobs in a workflow to run in parallel, all of which need node_modules (some need devDependencies; for the others I would prefer not to install them, to save space)

Artifacts seem to be the suggested/official way to share a directory between jobs in the same workflow, but the current implementation is very slow when many files are involved. A workaround is to tar up the directory yourself before uploading it, but that’s exactly what actions/cache does, with many fewer steps!
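For reference, the tar-it-yourself artifact workaround looks something like this (step names and the archive filename are illustrative; a download step plus an untar would be needed in each consuming job):

```yaml
      - name: Pack node_modules
        run: tar -czf node_modules.tgz node_modules

      - name: Upload node_modules
        uses: actions/upload-artifact@v1
        with:
          name: node_modules
          path: node_modules.tgz
```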

Does it make sense to use actions/cache as the sharing mechanism, with a key that includes a hash of npm-shrinkwrap.json? The node_modules setup job will accept restore-keys for earlier cache versions, since it always updates via npm install anyway. Subsequent jobs will only accept the exact key.

Something like:

jobs:
  setup-node_modules:
    steps:
      ...
      - name: Cache node modules
        uses: actions/cache@v1
        with:
          path: node_modules
          key: node_modules-${{ hashFiles('npm-shrinkwrap.json') }}
          # loading an older version is fine here, since it will get an npm install
          restore-keys: |
            node_modules-

      - name: npm install
        run: npm install


  parallel-build1:
    needs: setup-node_modules
    steps:
      - name: Load node_modules
        uses: actions/cache@v1
        with:
          path: node_modules
          key: node_modules-${{ hashFiles('npm-shrinkwrap.json') }}
          # no restore-keys here, so we only accept this exact version

      ...run webpack, publish assets...


  parallel-build2:
    needs: setup-node_modules
    steps:
      - name: Load node_modules
        uses: actions/cache@v1
        with:
          path: node_modules
          key: node_modules-${{ hashFiles('npm-shrinkwrap.json') }}
          # no restore-keys here, so we only accept this exact version

      - name: prune
        run: npm prune --production

      ...build and publish docker container...

It seems to me this guarantees a cache hit for the subsequent jobs, with the exact contents of node_modules.

Possible complications include what happens when the compressed size exceeds 400MB (the current max), or when multiple workflows are running at the same time.
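A quick way to check whether a project is near that limit is to measure the compressed archive directly (a sketch; run from the repository root, and it assumes node_modules exists):

```shell
# Print the compressed size of node_modules in bytes -- roughly what
# actions/cache would store. Compare against the per-entry limit.
if [ -d node_modules ]; then
  tar -czf - node_modules | wc -c
fi
```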

Is there any other reason this isn’t a good idea? Using artifacts just seems very clumsy.


For this workflow I think the cache action is probably a better option than the artifacts route. Your primary complication today would be the current 400MB limit per cache entry. We are working to increase that to several GB in the near future.

Somewhat related to this: do you check out your project using the actions/checkout@v1 action in every job (setup-node_modules, parallel-build1, and parallel-build2)?


With actions/cache, is the cache shared between workflows?
And are there race conditions to worry about?

In particular, I have 3 .github/workflows/$something.yml files, and they all share the same node_modules cache, keyed on runner.os and ${{ hashFiles('**/yarn.lock') }}.
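A sketch of a cache step with that key shape, for reference (assuming yarn and a node_modules path):

```yaml
      - name: Cache node modules
        uses: actions/cache@v1
        with:
          path: node_modules
          key: ${{ runner.os }}-node_modules-${{ hashFiles('**/yarn.lock') }}
```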

Are there risks with this?