"Error: The operation was canceled." in CI

Hi,

I am using a matrix of jobs in my repository (rb-anssi/libecc_local: Library for elliptic curves cryptography). The jobs run quite well for a few hours, then some of them get stuck, and after 6 hours I get the following error (without any apparent issue or timeout in the job's shell execution itself):

Error: The operation was canceled.

This is not very helpful for understanding what happened, and all the other jobs are then canceled because of this failure.
I am trying to stay under the maximum parallel jobs limit (20 parallel jobs, if I understand correctly) by using “max-parallel: 10” for my two independent matrix strategies. Or maybe I am hitting another limit without knowing it?

Any idea of what could be the reason behind this error?

Thanks in advance,

It’s in the list of usage limits:

  • Job execution time - Each job in a workflow can run for up to 6 hours of execution time. If a job reaches this limit, the job is terminated and fails to complete.

Depending on the number of jobs in your matrix, the next point may be relevant too, whichever limit is hit first:

  • Workflow run time - Each workflow run is limited to 72 hours. If a workflow run reaches this limit, the workflow run is cancelled.
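
If a job normally finishes well within the 6-hour limit, a shorter timeout can make a hang surface sooner instead of holding a runner until the limit is reached. A minimal sketch, assuming a job named build and a placeholder test command (both hypothetical, not taken from the repository above); timeout-minutes can be set on a job and on individual steps:

```yaml
# Hypothetical workflow fragment: job name, runner, and command are placeholders.
jobs:
  build:
    runs-on: ubuntu-latest
    # Stop the job after 90 minutes instead of waiting for the 6-hour limit.
    timeout-minutes: 90
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: make test   # placeholder command
        # Optional per-step limit, which helps narrow down which step hangs.
        timeout-minutes: 60
```

The values 90 and 60 are arbitrary; pick something comfortably above the job's usual duration.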

First of all, thanks for this answer!

Regarding the limits, I was indeed aware of these two. However, the curious thing here is that my jobs do not take 6 hours (almost the same jobs passed in a previous pipeline and took around 30 minutes to an hour). It seems that something gets stuck at some point (and this does not look like a hang of my job itself, as nothing is output on the console).
The 72 hours limit is ambiguous though: is this an absolute time limit, or a CPU time limit encompassing all the parallel work? (e.g. 72 jobs of 1h in parallel).

Thanks in advance,

Hi @rb-anssi,

Hopefully you have already resolved your issue, but I wanted to finish this thread in case others have the same question.

I have encountered this error before with multiple jobs running in parallel. By default, if one job in a matrix fails, all the other jobs in that matrix are automatically cancelled. You can prevent this behavior by adding:

fail-fast: false

to the strategy.
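
For context, here is where that setting sits in a workflow. This is only a sketch: the job name, matrix values, and echo step are hypothetical, and max-parallel: 10 is taken from the original question.

```yaml
# Hypothetical workflow fragment: job name and matrix values are placeholders.
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      # Let the remaining matrix jobs keep running when one of them fails.
      fail-fast: false
      # Cap on simultaneously running jobs for this matrix (from the question).
      max-parallel: 10
      matrix:
        compiler: [gcc, clang]
        wordsize: [32, 64]
    steps:
      - uses: actions/checkout@v4
      - name: Show matrix entry
        run: echo "compiler=${{ matrix.compiler }} wordsize=${{ matrix.wordsize }}"
```

With fail-fast disabled, a job that fails or is canceled is reported on its own, which makes it easier to see which matrix entry is the one that hangs.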

See this Stack Overflow question for more: "GitHub Actions Disable Auto Cancel When Job Fails".

The 72-hour limit is a wall-clock time limit, not CPU time summed over the parallel jobs.