We recently switched rmagick to use GitHub Actions, but one thing we’re running into is that checks will often hang indefinitely with no build output. This is especially a problem because we’re using a matrix that produces 33 checks, so we’ll often end up with 1-2 failing checks in a build. It’s compounded by the fact that we cannot re-run an individual check, so we end up stuck in a cycle of re-running the entire build over and over until we eventually get all 33 passing. We can add a timeout, but this doesn’t remedy the fact that the builds are stalling in the first place. We didn’t have this sort of problem when running the tests on Travis CI or CircleCI. How can we go about debugging this issue?
Hi @mockdeep,
There is a limitation in GitHub Actions: the maximum number of concurrent jobs in a workflow is 20 on GitHub Free. Please refer to the official doc here for more details.
Your workflow has 33 jobs, so only 20 of them start right away and the other 13 wait in the queue. The last 1~2 jobs to start run past the 10-minute timeout and are then canceled.
It’s recommended to use ‘jobs.<job_id>.needs’ to limit the number of concurrent jobs. You can add it to the ‘test-windows’ job, as in the code sample below:

    test-windows:
      needs: [test-linux]
      runs-on: windows-latest

I forked your repo and it works on my side.
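As a different way to cap concurrency than 'needs', a matrix job can also set 'strategy.max-parallel'. The following is only a rough sketch, not the actual rmagick workflow: the workflow name, the matrix keys ('ruby', 'imagemagick'), the version lists, and the cap of 10 are all assumptions chosen for illustration, and the setup steps are omitted.

```yaml
# Hypothetical workflow sketch; names and values are illustrative only.
name: ci-sketch
on: [push]

jobs:
  test-linux:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false    # keep the other matrix jobs running if one fails
      max-parallel: 10    # assumed cap: at most 10 matrix jobs run at once
      matrix:
        ruby: ['2.6']                         # assumed version list
        imagemagick: ['6.7', '6.8', '6.9']    # assumed version list
    steps:
      - uses: actions/checkout@v4
      # Ruby and ImageMagick setup steps would go here (omitted in this sketch).
      - run: bundle exec rake
```

Note that 'max-parallel' only limits the matrix of the job it is set on, so it would not by itself cap the total across several jobs.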
@weide-zhou I tried that, but it still failed frequently. I don’t think the problem has anything to do with queueing time. We were seeing timeouts before we set timeout-minutes to 10. Previously it would stall for 6 hours before finally getting shut down and failing the build.
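For reference, 'timeout-minutes' can be set on a whole job and also on an individual step, which may help narrow down which step is stalling. This is a minimal sketch under assumptions: the workflow name and the step-level value of 5 are hypothetical, and only the job-level value of 10 and the 'bundle exec rake' command come from this thread.

```yaml
# Hypothetical workflow sketch; not the actual rmagick configuration.
name: timeout-sketch
on: [push]

jobs:
  test-linux:
    runs-on: ubuntu-latest
    timeout-minutes: 10        # job-level limit, as described above
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: bundle exec rake
        timeout-minutes: 5     # assumed step-level limit, shorter than the job limit
```

With a step-level limit, a stall shows up as a failure of that specific step rather than of the whole job, but it still does not explain why the step hangs.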
Hi @mockdeep,
Checking on my side, the timeout occurs intermittently for the following 2 combinations: ruby 2.6 + IM6.7/6.8. It does not occur for the others, e.g. ruby + IM6.9. The failing command is “bundle exec rake”.
I assume there could be compatibility problems with those 2 combinations. Could you please check on a local server whether the behavior is the same?
If possible, you could remove the two combinations from your matrix, just as a test.
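Temporarily dropping the two combinations could look roughly like the sketch below, using the matrix 'exclude' key. The matrix key names ('ruby', 'imagemagick') and the extra Ruby version '2.7' are assumptions for illustration and may not match the real workflow.

```yaml
# Hypothetical workflow sketch; matrix keys and versions are illustrative.
name: exclude-sketch
on: [push]

jobs:
  test-linux:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        ruby: ['2.6', '2.7']                  # '2.7' is an assumed extra version
        imagemagick: ['6.7', '6.8', '6.9']
        exclude:
          # Skip the two combinations that time out intermittently.
          - ruby: '2.6'
            imagemagick: '6.7'
          - ruby: '2.6'
            imagemagick: '6.8'
    steps:
      - uses: actions/checkout@v4
      - run: bundle exec rake
```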
@weide-zhou Yeah, I see those same failures pretty frequently on GitHub Actions, but not locally, and they also pass consistently on CircleCI. I don’t really like the idea of excluding them from the matrix, as they’re commonly used versions.
Probably the same problem occurred in my playwright-ruby-client project.
I have no idea how to debug this…