How to disable "Re-run failed jobs"?

GitHub added a “Re-run failed jobs” option.

The problem with this option is that in systems that rely on a central component to orchestrate test execution (such as Cypress), there is no guarantee which partition a given test will be executed in. If you only retry the “failed jobs”, those jobs may, on the retry, be assigned tests that were already passing.

We need to disable “Re-run failed jobs”; otherwise builds with failing tests get deployed to production.

Check GITHUB_RUN_ATTEMPT and if the value is > 1, fail.
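
A minimal sketch of that check, assuming a Linux runner with the default bash shell. The step name is made up for illustration; GITHUB_RUN_ATTEMPT is the environment variable GitHub sets to 1 on the first attempt and increments on each re-run.

```yaml
# Hypothetical guard step; place it first in the job's `steps` list.
- name: Refuse re-runs
  run: |
    # GITHUB_RUN_ATTEMPT is 1 on the first attempt and grows with every re-run.
    if [ "$GITHUB_RUN_ATTEMPT" -gt 1 ]; then
      echo "::error::Re-runs are disabled for this workflow; trigger a new run instead."
      exit 1
    fi
```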

This will prevent retrying all jobs as well. We still want to allow retrying all jobs.

If there’s a step you need to always run, have it set an output that is the run_attempt, and check that the run_attempt the required step last ran on is the same as the current run_attempt.

That’s probably the easiest patch.
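
Roughly, that could look like the sketch below. It relies on the behaviour that when only failed jobs are re-run, the outputs of jobs that already succeeded are reused from the earlier attempt, so the recorded value goes stale. The job name `record_attempt`, the step names, the `ubuntu-latest` runner, and the placeholder test step are assumptions for illustration, not taken from any real workflow.

```yaml
on: push

jobs:
  # Hypothetical job that always succeeds and records the attempt it ran on.
  # "Re-run all jobs" runs it again; "Re-run failed jobs" does not.
  record_attempt:
    runs-on: ubuntu-latest
    outputs:
      run_attempt: ${{ steps.record.outputs.run_attempt }}
    steps:
      - id: record
        run: echo "run_attempt=$GITHUB_RUN_ATTEMPT" >> "$GITHUB_OUTPUT"

  test_integrations:
    needs: record_attempt
    runs-on: ubuntu-latest
    strategy:
      matrix:
        containers: [1, 2, 3]
    steps:
      - name: Reject partial re-runs
        env:
          RECORDED_ATTEMPT: ${{ needs.record_attempt.outputs.run_attempt }}
        run: |
          # A mismatch means record_attempt was not re-run in this attempt,
          # i.e. only a subset of jobs was retried.
          if [ "$RECORDED_ATTEMPT" != "$GITHUB_RUN_ATTEMPT" ]; then
            echo "::error::Only some jobs were re-run; use 'Re-run all jobs' instead."
            exit 1
          fi
      - name: Run integration tests
        run: echo "placeholder for the real test command"
```

With this in place, “Re-run all jobs” still works because `record_attempt` runs again and the two values match, while “Re-run failed jobs” fails fast in every retried matrix job.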

We have this job, which runs all integration tests as a matrix.

test_integrations:
  name: Integration Tests
  runs-on:
    - self-hosted
    - contra-general-purpose
  steps:
    - env:
        CWA_PROXY_API: "true"
        CWA_USE_LOCAL_HTTPS: "true"
        DOPPLER_TOKEN: ${{ secrets.DOPPLER_TOKEN }}
      name: Run integration tests
      run: doppler run -- yarn ts-node --transpile-only server/index.ts --run-integration-tests
  strategy:
    matrix:
      containers:
        - 1
        - 2
        - 3
        - 4
        - 5
        - 6
        - 7
        - 8
        - 9
        - 10
  timeout-minutes: 20

If one is retried, all of them have to be retried.

I stumbled upon this question because re-running a subset of jobs is problematic for my matrix strategy workflow as well. In my case, I’m using 5 containers for a job integrated with the Cypress Dashboard. As the author of this thread points out, orchestration happens at the matrix level, so re-running an individual failed job (container run) will not work properly because the necessary matrix-level orchestration never occurs.

It would be very helpful if you could disable the ability to re-run failed jobs at the repository level or workflow level. Re-running all jobs is not a problem.