macOS jobs sometimes get canceled without any reason

In TensorFlow Addons (public repository), our builds on the master branch and in pull requests sometimes fail because a macOS job gets canceled (not failed). The job is part of a build matrix, but the cancellation is not related to fail-fast because no other job fails. Here are a few examples:

Master branch:

https://github.com/tensorflow/addons/runs/561697941

https://github.com/tensorflow/addons/runs/560832506

https://github.com/tensorflow/addons/runs/560825167

https://github.com/tensorflow/addons/runs/559256581

Pull request:

https://github.com/tensorflow/addons/pull/1570/checks?check_run_id=561551029

Currently 30% of our builds in the master branch are failing because of this, as well as a significant number of builds in pull requests. Could the error message be improved so that we at least know what is causing this?

Thanks a lot.


Hi @gabrieldemarmiesse,

Thank you for reaching out! If the workflow logs do not provide enough detail to diagnose why a workflow, job, or step is not working as expected, you can enable additional debug logging. Please refer to the doc for the details.

Since the issue is intermittent and I could not reproduce it in a forked repository on my side, I would suggest splitting the matrix jobs into several separate jobs, to avoid any activity that might exceed the GitHub workflow usage limits.
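
As a rough illustration of what splitting could look like, here is a minimal workflow sketch. It assumes that "splitting" means moving the macOS build out of the shared matrix into its own job; the job names, Python versions, and the `./build.sh` script are hypothetical placeholders and are not taken from the TensorFlow Addons repository. Setting `fail-fast: false` on the remaining matrix is optional, but it rules out fail-fast as the source of a cancellation.

```yaml
# Hypothetical sketch: job names, versions, and build.sh are placeholders.
name: build
on: [push, pull_request]

jobs:
  # macOS build moved out of the matrix into a standalone job
  macos-build:
    runs-on: macos-latest
    steps:
      - uses: actions/checkout@v2
      - run: ./build.sh 3.7   # placeholder build command

  # remaining builds stay in a smaller matrix
  linux-build:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false   # a failing job does not cancel its siblings
      matrix:
        python: ['3.6', '3.7']
    steps:
      - uses: actions/checkout@v2
      - run: ./build.sh ${{ matrix.python }}   # placeholder build command
```

With this layout, a canceled macOS run shows up as its own job in the checks list, which may make it easier to tell whether the cancellation is tied to the matrix at all.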

Hope it helps!

Hi, I’m seeing similar issues while running only 3 concurrent jobs. I’m running this under the eclipse namespace, which has tons of projects, but from what I understand not many of them use GitHub Actions. It would be good to know what the cause could be.