In our project we have multiple workflows; one of them has 28 jobs (Linux, testing different compilers). Some of those 28 jobs fail randomly. For example: Cleanup · taocpp/PEGTL@663a9f4 · GitHub
As you can see from the example, the compiler itself crashes. Re-running might fix one job, but chances are that another one or two of the 28 jobs then fail with the same problem. I have re-run the same jobs multiple times, and each time one or two of them fail.
Since it is the compiler that crashes, it is not an instability in our code. And since those jobs use different compilers, it is not any single compiler's fault either.
My guess is that there is some kind of problem in the runners themselves, perhaps over-commitment causing them to run out of resources? But I have no way to debug that.
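To at least gather some evidence for the resource-exhaustion theory, one thing that could be tried is adding a diagnostic step at the start of each job that prints the runner's CPU, memory, and disk state. This is only a sketch (the step name and placement are illustrative, not part of our current workflow):

```shell
# Print runner resource stats at the start of a job, so that when a
# compiler crash happens we can check whether the runner was already
# low on memory or disk space.
echo "CPUs: $(nproc)"
free -h        # RAM and swap: total / used / free
df -h /        # free disk space on the root filesystem
uptime         # load averages hint at over-committed hosts
```

In a workflow this would go into a `run:` step before the build; comparing the output of failed and successful jobs might show whether the crashing runners are systematically short on memory.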
So, first of all: can anyone from GitHub have a look at why the runners are unstable? This is super annoying, and I don't see anything on my side that I can do to fix it. Or where should I report this?
Secondly: it would be immensely helpful if I could re-run just a single job. Why can I only re-run all of the jobs or none? That makes no sense.