Self-hosted runners - when does a workflow check if they are available?

Hi

I’m interested in automating the provisioning of new self-hosted runners.

Workflows seem to fail if no self-hosted runners are available, even if they are only needed in a later job. Ideally I’d want to provision a runner in an initial job, then a second job would actually use that self-hosted runner.

Is there any documentation that states explicitly when a self-hosted runner is checked for or needs to be available?

@frostebite,

I’d want to provision a runner in an initial job,

What runner did you use to run this initial job? A self-hosted runner or a GitHub-hosted runner?

  • If you use a self-hosted runner, avoid installing the new self-hosted runner inside the working directory of the runner where the initial job runs.
    When the job completes, processes created during the job execution are generally cleaned up, and files generated during the job execution, which typically live in the runner’s working directory, may be cleaned up as well.

  • If you use a GitHub-hosted runner and install the new self-hosted runner on the GitHub-hosted runner’s own machine, the new self-hosted runner won’t remain available.
    The initial job always executes in a fresh virtual machine instance, and that virtual machine is discarded when the job completes.

when a self-hosted runner is checked for or needs to be available

Generally, a self-hosted runner is available when it is in the Idle status.
When a job is queued, GitHub follows the order below to route it to a self-hosted runner (a minimal runs-on example follows this list):

  1. GitHub attempts to locate a repository-level runner that matches the specified labels.

    • If there are matching repository-level runners but they are all in the Active status, the job stays in the queue until a matching runner becomes available or the job is cancelled after the time-out (the maximum job queue time is 24 hours).
    • If there are no matching repository-level runners, or the matching repository-level runners are in the Offline status, the job is sent to the organization-level runners.
  2. GitHub attempts to locate an organization-level runner that matches the specified labels.

    • If there are matching organization-level runners but they are all in the Active status, the job stays in the queue until a matching runner becomes available or the job is cancelled after the time-out (the maximum job queue time is 24 hours).
    • If there are no matching organization-level runners, or the matching organization-level runners are in the Offline status, the job fails with an error.
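
For reference, the labels a job asks for are set with the runs-on key. A minimal sketch (self-hosted and linux are applied automatically when a Linux runner registers; gpu is just an example of a custom label you would add yourself):

jobs:
  gpu-job:
    # The job is only routed to a runner that carries all of these labels.
    runs-on: [self-hosted, linux, gpu]
    steps:
      - run: echo "This step starts once a matching runner picks up the job"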

Associated docs:

@frostebite,

How are things going?
Were my explanations above helpful?
If you have any further questions about this topic, feel free to tell us, and we will help as much as we can.

Hi Brightran,

thanks for your answer.

I think the first two scenarios you outline do not cover my use case.

To be clear, I would not use a self-hosted runner to set up a new self-hosted runner, and I’m not saying it would be set up locally.

The basic workflow I’m thinking about would be something like:

  1. Job 1 (runs on a GitHub-hosted runner) connects to a cloud provider and creates a new self-hosted runner (e.g. on GCP or AWS). This cloud-hosted runner is configured to join the appropriate repository etc. (a rough sketch of what I imagine Job 1 looking like follows this list).
  2. Job 2 (runs on the cloud-based self-hosted runner) does some workload better supported by a self-hosted runner (e.g. large disk, large CPU workload, GPU, whatever).
  3. Job 3 (runs on a GitHub-hosted runner) cleans up the cloud-provisioned runner.
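
Roughly, I imagine Job 1 looking something like the sketch below. The ADMIN_TOKEN secret name and the provision-runner.sh script are placeholders for whatever token and provisioning tooling would actually be used; the registration-token call is the documented REST endpoint for creating a self-hosted runner registration token.

jobs:
  provision:
    runs-on: ubuntu-latest
    steps:
      # Ask GitHub for a registration token for this repository.
      # ADMIN_TOKEN is a placeholder for a token with admin access to the repo.
      - name: Get a runner registration token
        id: reg
        env:
          GH_TOKEN: ${{ secrets.ADMIN_TOKEN }}
        run: |
          TOKEN=$(gh api -X POST "repos/${{ github.repository }}/actions/runners/registration-token" --jq .token)
          echo "::add-mask::$TOKEN"
          echo "token=$TOKEN" >> "$GITHUB_OUTPUT"

      # Hypothetical script that creates the cloud VM (GCP, AWS, etc.) and hands it
      # the token so the VM's startup script can run config.sh against this repository.
      - name: Create the cloud runner
        run: ./provision-runner.sh "${{ steps.reg.outputs.token }}"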

What I observe when trying this is that the workflow seems to fail without even doing step 1; because the workflow requires a self-hosted runner, GitHub won’t even start it.

If I pre-provision one self-hosted runner and then destroy it without de-registering it, it shows as offline and the workflow does now run. But Job 2 won’t start, even once the newly provisioned self-hosted runner is online and fully available. It seems like GitHub doesn’t continually recheck for a runner; it just checks once?

What I’m looking for is any further clarification as to when GitHub checks whether a self-hosted runner is available and what will cause it to get stuck.

@frostebite,

What I observe when trying this is that the workflow seems to fail without even doing step 1; because the workflow requires a self-hosted runner, GitHub won’t even start it.

All the jobs in a workflow run in parallel by default. After the workflow is triggered, the jobs start up and enter the queue almost simultaneously. Jobs that run on GitHub-hosted runners are assigned the specified runners; jobs that run on self-hosted runners are routed to runners in the order I mentioned above.
In your workflow, if Job 1 and Job 2 run in parallel, Job 2 will not wait for Job 1 to finish installing the self-hosted runner, so Job 2 will fail almost immediately after the workflow starts.

You can try using the ‘needs’ key (jobs.<job_id>.needs) to set the execution order of these jobs:

jobs:
  job1:
    runs-on: ubuntu-latest   # e.g. a GitHub-hosted runner
    steps:
      - run: echo "--- install a cloud hosted runner ---"

  job2:
    needs: job1
    runs-on: self-hosted
    steps:
      - run: echo "--- run on the cloud hosted runner ---"

  job3:
    needs: [job1, job2]
    runs-on: ubuntu-latest
    steps:
      - run: echo "--- cleans the cloud runner ---"

Under this configuration, job2 will not start up and enter the queue until job1 completes successfully, and job3 will not start up until both job1 and job2 complete successfully.

The behaviour I described above occurs when I’m using needs on the second and third job.

@frostebite,

Maybe you can also try setting up two workflows:

  1. The first workflow executes the job that installs the cloud-hosted runner (job1), with a final step that calls the “Create a workflow dispatch event” API to trigger the second workflow once the cloud-hosted runner is installed successfully (see the sketch after this list).

  2. The second workflow runs on the workflow_dispatch event and executes the job that needs to run on the cloud-hosted runner (job2) and the job that cleans up the cloud runner (job3).
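
A minimal sketch of how that might look is below. The file name run-on-cloud-runner.yml and the DISPATCH_TOKEN secret are just example names; the token used to call the API needs permission to dispatch workflows in the repository.

# First workflow (e.g. provision.yml)
on: push

jobs:
  job1:
    runs-on: ubuntu-latest
    steps:
      - run: echo "--- install a cloud hosted runner ---"
      - name: Trigger the second workflow
        env:
          GH_TOKEN: ${{ secrets.DISPATCH_TOKEN }}   # placeholder secret name
        run: gh api -X POST "repos/${{ github.repository }}/actions/workflows/run-on-cloud-runner.yml/dispatches" -f ref="${{ github.ref_name }}"

# Second workflow (e.g. run-on-cloud-runner.yml)
on: workflow_dispatch

jobs:
  job2:
    runs-on: self-hosted
    steps:
      - run: echo "--- run on the cloud hosted runner ---"
  job3:
    needs: job2
    runs-on: ubuntu-latest
    steps:
      - run: echo "--- cleans the cloud runner ---"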