I have a workflow that contains jobs that run on ubuntu-latest and one job that runs on a self-hosted runner.
The workflow has run correctly, but the web UI has not updated properly. I believe this happens only rarely. I also think re-running the workflow will not display the bug; it is a fluke that happens only rarely.
The primary reason why I’m reporting this is because there have been other problems with self-hosted runners lately, and this contributes to the confusion when identifying what exactly is going wrong with them. When this happened to me, I thought that the self-hosted runner had not picked up the job at all, and I was on the verge of cancelling the workflow run, rebooting machinery and re-running the workflow.
The setup is as follows:
At the time that the workflow run is triggered, the self-hosted runner is in the Offline state.
Job 1 runs on ubuntu-latest, and processes quickly.
Jobs 2 & 3 depend on job 1, and are enqueued at the same time.
Job 2 runs on ubuntu-latest; it starts a VM with the self-hosted runner. This is a quick curl trigger. This job ends after a few seconds.
A minute or so later, the self-hosted runner starts up, and job 3 begins processing. <=== the UI does not update to reflect that job 3 has begun processing - but it is evident from the runner’s local logs & activity that it has begun processing.
Screenshots at this time - ~12 minutes into the processing:
When job 3 completes, job 4 begins.
Job 4 runs on ubuntu-latest. It depends on job 3. It performs a curl call to shut down the VM. It completes after a few seconds.
Once job 4 has completed, the entire workflow run is marked as complete, even though job 3 still shows as queued. Screenshots:
Here is a rough story of the runner activity:
Message '1' received from session 'b62f102f-2575-40d1-b8b2-8329b08e99f3'. Job request 2 for plan f81e04b6-3f2b-4705-b2a5-931efc11f01d job 0085fae8-b2ce-57a6-d3d6-b7fd7ddf8534 received. Pull OrchestrationId f81e04b6-3f2b-4705-b2a5-931efc11f01d.build-win64.__default from JWT claims WRITE LINE: 2020-07-24 19:38:24Z: Running job: Build for Win64 Start renew job request 2 for job 0085fae8-b2ce-57a6-d3d6-b7fd7ddf8534. Successfully renew job request 2, job is valid till 7/24/2020 7:48:24 PM ... Send job request message to worker for job 0085fae8-b2ce-57a6-d3d6-b7fd7ddf8534. Entering JobStarted Notification Entering StartMonitor Successfully renew job request 2, job is valid till 7/24/2020 7:49:24 PM Successfully renew job request 2, job is valid till 7/24/2020 7:50:24 PM Successfully renew job request 2, job is valid till 7/24/2020 7:51:24 PM ... Successfully renew job request 2, job is valid till 7/24/2020 8:11:29 PM Successfully renew job request 2, job is valid till 7/24/2020 8:12:29 PM STDOUT/STDERR stream read finished. STDOUT/STDERR stream read finished. Finished process 5196 with exit code 100, and elapsed time 00:24:32.5817623. Worker finished for job 0085fae8-b2ce-57a6-d3d6-b7fd7ddf8534. Code: 100 finish job request for job 0085fae8-b2ce-57a6-d3d6-b7fd7ddf8534 with result: Succeeded WRITE LINE: 2020-07-24 20:02:57Z: Job Build for Win64 completed with result: Succeeded Stop renew job request for job 0085fae8-b2ce-57a6-d3d6-b7fd7ddf8534. job renew has been canceled, stop renew job request 2. Entering JobCompleted Notification Entering EndMonitor
Here is the workflow run in question: https://github.com/falldamagestudio/UE4-GHA-Game/actions/runs/181595020