Workflow run has completed successfully, but one job displays as "queued" in the UI

I have a workflow that contains jobs that run on ubuntu-latest and one job that runs on a self-hosted runner.

The workflow has run correctly, but the web UI has not updated properly. I believe this happens only rarely. I also think re-running the workflow will not display the bug; it is a fluke that happens only rarely.

The primary reason why I’m reporting this is because there have been other problems with self-hosted runners lately, and this contributes to the confusion when identifying what exactly is going wrong with them. When this happened to me, I thought that the self-hosted runner had not picked up the job at all, and I was on the verge of cancelling the workflow run, rebooting machinery and re-running the workflow.

The setup is as follows:
At the time that the workflow run is triggered, the self-hosted runner is in the Offline state.

Job 1 runs on ubuntu-latest, and processes quickly.
Jobs 2 & 3 depend on job 1, and are enqueued at the same time.
Job 2 runs on ubuntu-latest; it starts a VM with the self-hosted runner. This is a quick curl trigger. This job ends after a few seconds.
A minute or so later, the self-hosted runner starts up, and job 3 begins processing. <=== the UI does not update to reflect that job 3 has begun processing - but it is evident from the runner’s local logs & activity that it has begun processing.

Screenshots at this time - ~12 minutes into the processing:


When job 3 completes, job 4 begins.
Job 4 runs on ubuntu-latest. It depends on job 3. It performs a curl call to shut down the VM. It completes after a few seconds.

Once job 4 has completed, the entire workflow run is marked as complete, even though job 3 still shows as queued. Screenshots:


Here is a rough story of the runner activity:

Message '1' received from session 'b62f102f-2575-40d1-b8b2-8329b08e99f3'.
Job request 2 for plan f81e04b6-3f2b-4705-b2a5-931efc11f01d job 0085fae8-b2ce-57a6-d3d6-b7fd7ddf8534 received.
Pull OrchestrationId f81e04b6-3f2b-4705-b2a5-931efc11f01d.build-win64.__default from JWT claims
WRITE LINE: 2020-07-24 19:38:24Z: Running job: Build for Win64
Start renew job request 2 for job 0085fae8-b2ce-57a6-d3d6-b7fd7ddf8534.
Successfully renew job request 2, job is valid till 7/24/2020 7:48:24 PM
...
Send job request message to worker for job 0085fae8-b2ce-57a6-d3d6-b7fd7ddf8534.
Entering JobStarted Notification
Entering StartMonitor
Successfully renew job request 2, job is valid till 7/24/2020 7:49:24 PM
Successfully renew job request 2, job is valid till 7/24/2020 7:50:24 PM
Successfully renew job request 2, job is valid till 7/24/2020 7:51:24 PM
...
Successfully renew job request 2, job is valid till 7/24/2020 8:11:29 PM
Successfully renew job request 2, job is valid till 7/24/2020 8:12:29 PM
STDOUT/STDERR stream read finished.
STDOUT/STDERR stream read finished.
Finished process 5196 with exit code 100, and elapsed time 00:24:32.5817623.
Worker finished for job 0085fae8-b2ce-57a6-d3d6-b7fd7ddf8534. Code: 100
finish job request for job 0085fae8-b2ce-57a6-d3d6-b7fd7ddf8534 with result: Succeeded
WRITE LINE: 2020-07-24 20:02:57Z: Job Build for Win64 completed with result: Succeeded
Stop renew job request for job 0085fae8-b2ce-57a6-d3d6-b7fd7ddf8534.
job renew has been canceled, stop renew job request 2.
Entering JobCompleted Notification
Entering EndMonitor

Here is the workflow run in question: https://github.com/falldamagestudio/UE4-GHA-Game/actions/runs/181595020

Thanks for your feedback.
I found that the GitHub engineering team is investigating this issue in this ticket . You could continue to track this issue in the link above. If there is any progress, we will update there. Thanks

1 Like