Is it possible to run a matrix strategy job async?

I have a template workflow YAML file with two jobs, each with the same matrix strategy. The second job depends on the first via needs. Is there a way to run each matrix value independently, so that the second job for a given value doesn’t have to wait for every matrix value in the first job to complete?

Example main.yml:

on: [push]

jobs:
  ssql:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        FRUIT: ["apple", "pear"]
    name: Get data
    steps:
      - name: Checkout cum-rev repo
        uses: actions/checkout@v2 # Defaults to current repo - check out current repo
      - name: Checkout ds-ssql-gh-action
        uses: actions/checkout@v2
        with:
          repository: ourorg/ds-ssql-gh-action
          token: ${{ secrets.WORKFLOW_TOKEN }}
          ref: main
          path: './ds-ssql-gh-action'
      - name: Run query and save data
        uses: ./ds-ssql-gh-action
      - name: Upload data as artifact
        uses: actions/upload-artifact@v2
        with:
          name: ${{ matrix.FRUIT }}-data
          path: output_data/data.csv

  Rtrain:
    runs-on: ubuntu-latest
    name: Train model
    needs: ssql
    strategy:
      matrix:
        FRUIT: ["apple", "pear"]
    steps:
      - name: checkout current repo
        uses: actions/checkout@v2 # Defaults to current repo - check out current repo
      - name: Retrieve data from ssql job
        uses: actions/download-artifact@v2
        with:
          name: ${{ matrix.FRUIT }}-data
      - name: Just check csv file is there
        run: ls -l
      - name: Fit model and save as RDS
        uses: ./Rtrain # uses an action in this directory
      - name: Check the model is there
        run: ls -l
      - name: Upload model as artifact
        uses: actions/upload-artifact@v2
        with:
          name: ${{ matrix.FRUIT }}-Model
          path: ${{ matrix.FRUIT }}-Model.RDS

Where I use needs in the second job, I really mean ‘needs the ssql job for the corresponding fruit, not all jobs in ssql to complete first’.

Suppose the ssql job for ‘apple’ takes 3 times as long as for ‘pear’. I would like the next job, Rtrain, to proceed once the ssql job for ‘pear’ is done without waiting for the ‘apple’ one to complete.

Is this possible?

For the needs list you need to provide a job ID. You could try to get it for each job of the matrix by adding this step to the first job (ssql):

      - name: Print current job ID
        run: echo '${{ github.job }}'

(see docs on the github context)

Then try using the ID of the specific job you are interested in within the needs list of the subsequent job.

I don’t know if it will work, but I think it’s worth a try.

Presumably I would need to pass ${{ github.job }} from the first job to the second, perhaps using an artifact. Was that the flow you had in mind?

No, you don’t need any artifacts. ${{ github.job }} is just a string. You add that step to print it, run your workflow, see the value in the logs, copy it, and then you can remove that step and use the found value in the next job’s needs list.
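For what it’s worth, as far as I know github.job holds the job_id of the current job, that is, the key under jobs: in the workflow file, and individual matrix values do not get IDs of their own. A minimal sketch of what that would mean for the example above (the step in Rtrain is only illustrative and not part of the original workflow):

```yaml
on: [push]

jobs:
  ssql:                      # this key is the job ID that `needs` refers to
    runs-on: ubuntu-latest
    strategy:
      matrix:
        FRUIT: ["apple", "pear"]
    steps:
      - name: Print current job ID
        # Assumption: this resolves to the key above for every matrix value
        run: echo '${{ github.job }}'

  Rtrain:
    runs-on: ubuntu-latest
    needs: ssql              # a single job ID, or a list of job IDs
    steps:
      # Illustrative placeholder step
      - run: echo "runs after the ssql job, including all its matrix values"
```

If that is right, the value found in the logs would be the same for ‘apple’ and ‘pear’, so it could not be used to depend on just one of them.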

Ah I see, so it’s expected the job id will always be the same, even on subsequent daily runs. OK, gonna try this, thank you!
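One workaround that avoids needs altogether is to merge the two jobs into a single matrix job, since each matrix value runs as its own independent job and its steps run in order. A rough sketch based on the workflow above; the job ID ssql-and-train and the copy step are assumptions on my part (the copy mimics where download-artifact would have placed data.csv), so adjust them to whatever the Rtrain action actually expects:

```yaml
on: [push]

jobs:
  ssql-and-train:            # hypothetical job ID for the merged job
    runs-on: ubuntu-latest
    name: Get data and train model
    strategy:
      matrix:
        FRUIT: ["apple", "pear"]
    steps:
      - name: Checkout current repo
        uses: actions/checkout@v2
      - name: Checkout ds-ssql-gh-action
        uses: actions/checkout@v2
        with:
          repository: ourorg/ds-ssql-gh-action
          token: ${{ secrets.WORKFLOW_TOKEN }}
          ref: main
          path: './ds-ssql-gh-action'
      - name: Run query and save data
        uses: ./ds-ssql-gh-action
      - name: Put data where the training step looks for it
        # Assumption: the Rtrain action reads data.csv from the workspace root
        run: cp output_data/data.csv data.csv
      - name: Fit model and save as RDS
        uses: ./Rtrain
      - name: Upload model as artifact
        uses: actions/upload-artifact@v2
        with:
          name: ${{ matrix.FRUIT }}-Model
          path: ${{ matrix.FRUIT }}-Model.RDS
```

With this layout the ‘pear’ run moves on to training as soon as its own query finishes, regardless of how long ‘apple’ takes. The trade-off is that the data step and the training step can no longer be re-run separately.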
