How can I have a job run after another job, even if that job has failed?

I have an initial job which builds a binary and posts it somewhere (specifically to an S3 bucket).

After that completes, I have another (Matrix) job that uses that built artifact in some way.

I then want a final job that runs after the Matrix job has finished (regardless of whether any of the jobs have passed or failed), that deletes said artifact from S3.

The problem is, if I set continue-on-error to true for the Matrix job (so that the binary-deletion job always runs), then the workflow will show as passed, which I don’t want. Is there a way around this, or a better approach? This workflow will run on every push to our default branch, btw.


It sounds like you’d want to use the always() or failure() status check functions to run something on (or regardless of) failure.

That said, you’re also running a matrix so I’m not quite sure how you’d want to set this up without being able to see your workflow. Can you post a link to it or copy and paste it in here?

Something like this:

name: Workflow
on:
  push: 
    branches:
    - 'develop'
jobs:
  Prebuild:
    runs-on: [self-hosted, linux, ubuntu1804]
    outputs:
      binary_url: ${{ steps.upload.outputs.binary_url }}
    steps:
    - name: Build binary
      run: build.sh
    - name: Upload binary to S3
      id: upload
      run: upload.sh && echo "binary_url=$BINARY_URL" >> "$GITHUB_OUTPUT"
  Build:
    strategy:
      fail-fast: false
      matrix:
        number: ["build1", "build2"]
    needs: [Prebuild]
    continue-on-error: true
    runs-on: [self-hosted, linux, ubuntu1804]
    env:
      BINARY_URL: ${{ needs.Prebuild.outputs.binary_url }}
    steps:
    - name: Do stuff
      run: curl -LO "${BINARY_URL}"
  Postbuild:
    needs: [Prebuild, Build]
    runs-on: [self-hosted, linux, ubuntu1804]
    env:
      BINARY_URL: ${{ needs.Prebuild.outputs.binary_url }}
    steps:
    - name: Remove binary from S3
      run: aws s3 rm "${BINARY_URL}"

However this would mean the workflow would show up as passed, even though one of the Build jobs may have failed.

But if I remove continue-on-error, the Postbuild job is never reached when one of the Build jobs fails.

I think what you are trying to do is illogical, in the context of the job properties used, as they are currently defined.

In the Postbuild job you are using the “needs” property, which by default means that the Postbuild job only runs after a successful run of the Build job, since it is implied that it depends on it. It would follow that you don’t want the Postbuild job to run if the Build job fails.

Is there any reason not to do the deletion in a step at the end of the Build job? In that case you could use if: always() on that step to make sure it runs even if any earlier step failed or was canceled.
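For illustration, a minimal sketch of that step-level approach, reusing the step names and the BINARY_URL variable from the workflow posted above (this is a fragment of the Build job's steps, not a complete workflow):

```yaml
    steps:
    - name: Do stuff
      run: curl -LO "${BINARY_URL}"
    - name: Remove binary from S3
      # always() makes this step run even if an earlier step in the
      # same job failed or the run was canceled
      if: always()
      run: aws s3 rm "${BINARY_URL}"
```

This fits a job that runs once. Under a matrix, every leg would run the cleanup step itself.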

I think I’ve managed to figure out what you’re looking to do here. If you want to jump straight to the code I’ve made a quick example in a repo here:

https://github.com/ThomasShaped/actions-tests/actions/runs/428373898

I think the reason you’re still seeing this workflow return as a success is that you’ve included continue-on-error: true, which tells Actions not to fail the workflow run when that job fails.

You’ve already got fail-fast: false so your workflow should still run all iterations of your matrix even if one of them fails.

The final thing you’ll need to do is to add an if: always() to your last job. That will guarantee that your final Postbuild job still runs, regardless of whether any of the runs in Build fail. This setup should also report your workflow run as a failure if one of the runs in your matrix fails.
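A rough sketch of how those changes might look, using the job names from the workflow posted earlier. This is a fragment: the Prebuild job is assumed to stay as it was. The extra needs.Prebuild.result check is an assumption added here, on the guess that there is nothing to delete when Prebuild itself failed; drop it if cleanup should run in that case too.

```yaml
jobs:
  # Prebuild job unchanged from the workflow above
  Build:
    needs: [Prebuild]
    strategy:
      fail-fast: false          # let every matrix leg run to completion
      matrix:
        number: ["build1", "build2"]
    # no continue-on-error here, so a failed leg fails the workflow run
    runs-on: [self-hosted, linux, ubuntu1804]
    env:
      BINARY_URL: ${{ needs.Prebuild.outputs.binary_url }}
    steps:
    - name: Do stuff
      run: curl -LO "${BINARY_URL}"
  Postbuild:
    # needs still orders this job after Build; always() lifts the
    # default requirement that the needed jobs succeeded
    needs: [Prebuild, Build]
    # assumption: skip cleanup when Prebuild did not succeed
    if: ${{ always() && needs.Prebuild.result == 'success' }}
    runs-on: [self-hosted, linux, ubuntu1804]
    env:
      BINARY_URL: ${{ needs.Prebuild.outputs.binary_url }}
    steps:
    - name: Remove binary from S3
      run: aws s3 rm "${BINARY_URL}"
```

With this shape, Postbuild waits for all Build legs to finish, runs whether they passed or failed, and the run as a whole is still marked failed if any leg failed.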


The Postbuild job has to specify needs: Build, or it will run in parallel with the Build job. That would result in cleanup happening in the middle of the build.

AFAIK “needs” is the only way to control the ordering of jobs, so we get this seemingly-illogical pattern of “needs: X (but still runs even if it fails)”. Maybe we need something like after: <Job(s)>, which would define an order without specifying that one job requires the other to succeed.

That’s a viable option in some cases, but here the Build job specifies a matrix. Prebuild runs once and uploads a single artifact shared by multiple Builds. Putting the deletion in the Build job would run that multiple times, and possibly one Build might delete the artifact before the others have finished using it.


I want the Postbuild job to wait for either a successful or a failed run of the Build job, but not to start before the Build job has finished.