GitHub workflow triggering check_suite.completed events prematurely

Hi,

I’ve set up GitHub Actions in a repo and created a CI workflow. The workflow has 5 jobs, triggered like this (a minimal sketch of the workflow follows the list):

  • job 1: immediately

  • jobs 2, 3, 4: as job 1 successfully finishes (needs: [job1])

  • job 5: as jobs 2, 3, 4 successfully finish (needs: [job2, job3, job4])
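For reference, a stripped-down workflow with this shape would look roughly like the sketch below (the trigger, runner, and steps are placeholders; only the needs: chains matter):

```yaml
name: CI

on: [push]

jobs:
  job1:
    runs-on: ubuntu-latest
    steps:
      - run: echo "job 1"

  job2:
    needs: [job1]
    runs-on: ubuntu-latest
    steps:
      - run: echo "job 2"

  job3:
    needs: [job1]
    runs-on: ubuntu-latest
    steps:
      - run: echo "job 3"

  job4:
    needs: [job1]
    runs-on: ubuntu-latest
    steps:
      - run: echo "job 4"

  job5:
    needs: [job2, job3, job4]
    runs-on: ubuntu-latest
    steps:
      - run: echo "job 5"
```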

I noticed that when job1 finishes successfully, thus triggering the jobs that depend on it, a check_suite.completed (status = “success”) event is emitted. Looking at the UI for the workflow run, this happens well before the workflow has finished executing all of its defined jobs.

Now, according to the docs:

GitHub Actions use the Checks API to output statuses, results, and logs for a workflow. GitHub creates a new check suite for each workflow triggered by the same event. The check suite contains a check run for each job in the workflow, and each job includes steps. GitHub Actions are run as a step in a workflow.

By this mapping of workflows to the Checks API, the check_suite.completed event should trigger only after all of the workflow’s jobs have run. That would make sense.

In my use case, a webhook listens for the check_suite.completed event to decide whether to approve a PR. Given that these events fire prematurely, approvals may well be issued on a check suite that ultimately fails.
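Roughly, the listener has the shape of the sketch below (a minimal illustration only; Flask and the approve_pr helper are placeholders, not my actual stack):

```python
from flask import Flask, request

app = Flask(__name__)

def approve_pr(number):
    """Hypothetical placeholder: submit an approving review via the GitHub API."""
    ...

@app.route("/webhook", methods=["POST"])
def webhook():
    event = request.headers.get("X-GitHub-Event")
    payload = request.get_json()

    if event == "check_suite" and payload.get("action") == "completed":
        suite = payload["check_suite"]
        # The premature event already arrives with conclusion == "success",
        # so checking the conclusion alone does not prevent approving too early.
        if suite.get("conclusion") == "success":
            for pr in suite.get("pull_requests", []):
                approve_pr(pr["number"])

    return "", 204

if __name__ == "__main__":
    app.run()
```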

Worth noting: removing the sequencing (the needs: [job*] statements) so that all jobs run in parallel resolves the issue, with the check_suite.completed event triggering at the end of the workflow (according to the UI).

Has anyone else run into this issue? This looks like a bug to me.

Cheers!


Thanks for the report. We’ll investigate.


@acazacu @ethomson This issue is currently happening to us with a custom app that we’ve created, and I can repro it without a problem. We have more or less the same job setup as you.

I tried a couple of tests but haven’t been able to pin down the exact timing. Basically, if there is a sufficiently long gap between a check_run concluding with success and the next check_run being created, the check_suite event with action == completed is triggered. The gap is on the order of seconds. To some extent I understand this, because GitHub doesn’t know what it doesn’t know, i.e. how many check_runs are left, or whether there are any left at all.

This is forcing us to disable our auto-merge feature for our enterprise, since we listen for check_suite events with action == completed and status == completed, and these are being triggered too early.

Two possible workarounds that I can think of for an app would be:

  1. Create a check_run ingestor/mapper that creates all the check_runs up front on pull request ingestion (a decently large refactor; see the sketch after this list).
  2. Figure out a way to ignore the first check_suite completion event (a hack). This won’t work: I tested it yesterday, thinking a second event might be triggered for the same check_suite.id after its state was set back to in_progress, but there is only one check_suite.completed event per check_suite.id.
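For the first workaround, the idea would be something along these lines (a rough sketch only, assuming a GitHub App installation token with checks: write and a fixed, known list of job names; the function name and requests usage are just for illustration):

```python
# Rough sketch of workaround 1: on pull_request ingestion, pre-create a
# queued check run for every job we expect to see, so the suite is not
# considered finished until each one has been concluded.
import requests

EXPECTED_JOBS = ["job1", "job2", "job3", "job4", "job5"]  # assumption: known up front

def precreate_check_runs(owner, repo, head_sha, token):
    url = f"https://api.github.com/repos/{owner}/{repo}/check-runs"
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
    for name in EXPECTED_JOBS:
        # POST /repos/{owner}/{repo}/check-runs creates a check run in the
        # "queued" state; it stays pending until it is later marked completed.
        resp = requests.post(
            url,
            headers=headers,
            json={"name": name, "head_sha": head_sha, "status": "queued"},
        )
        resp.raise_for_status()
```

The intent is that the suite then keeps pending runs until every expected job has actually reported, rather than being considered done whenever there happens to be a gap.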

It would be ideal to be able to tweak a value like { report_success_after: 60 seconds }, so that we could address the premature reporting at the source.