Load balancing of self-hosted runners

We plan to migrate our CI and are evaluating GitHub Actions as a candidate. We are already a GitHub customer, so we hope Actions will work for us, since we expect it to offer the best integration.

We require self-hosted runners because many tests depend on USB-connected hardware. We are a camera company, and much of our testing requires cameras to be connected.

Because of this, we have many weak PCs (think 4 cores) that will be added to the runner pool so that all the hardware can be connected. They are not great individually, but together they give us some compute power.

However, for building we have workstations with up to 128 logical cores, fast memory, and IO. These machines build our C++ code and run other heavy tasks up to 20 times faster than the weak “camera PCs”.

Because of this highly heterogeneous setup, it is critical for CI latency (and throughput) that each job is sent to the PC that will finish it the quickest at that moment. For example, the first 3 jobs would probably go to a 128-core WS, then two to a 64-core WS, then one more to the 128-core WS, then perhaps a 16-core WS is chosen, and so on. Only when all the performant WSs are taken should jobs be sent to the 4-core PCs, since they are so slow. However, we do not want to exclude them from compilation entirely: in peak periods they provide a lot of value in compute minutes, and they can take some load off the 128-core WSs and reduce queues.

Today we use a load balancing plugin in Jenkins to deal with this. It works like this: each PC gets an initial score (maintained manually and set when the runner is set up). A penalty is then subtracted for each active job on a runner. When a new job arrives, it is given to the runner with the highest current score.
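To make the scoring concrete, here is a minimal sketch of the idea in Python. It is not the plugin's actual code; the runner names, the initial scores, and the `PENALTY_PER_JOB` value are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Runner:
    name: str
    initial_score: int  # hand-maintained, set when the runner is added
    active_jobs: int = 0


# Hypothetical penalty subtracted from the score for each active job.
PENALTY_PER_JOB = 30


def current_score(runner: Runner) -> int:
    """Initial score minus one penalty per job currently running."""
    return runner.initial_score - PENALTY_PER_JOB * runner.active_jobs


def assign_job(runners: List[Runner]) -> Runner:
    """Give the next job to the runner with the highest current score."""
    best = max(runners, key=current_score)
    best.active_jobs += 1
    return best


# Illustrative pool: two workstations and one weak camera PC.
pool = [Runner("ws-128", 100), Runner("ws-64", 60), Runner("camera-pc", 10)]
order = [assign_job(pool).name for _ in range(5)]
print(order)
```

With these made-up numbers, the first two jobs land on the 128-core workstation, the third on the 64-core one, and the camera PC is only considered once the workstations have accumulated enough penalties.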

This system gets the job done, but it is a pain to manually update all the initial scores when the runner pool changes or when the nature of the jobs changes significantly. Also, since it relies on manual heuristics, the load balancing is far from perfect. But it works…

Ideally, the load balancer should learn the requirements (CPU, IO) of each job and the capabilities of each runner, update these dynamically, and schedule jobs on a best-effort basis. When runners or jobs are added, or jobs change their behaviour, the balancer should pick this up.
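One simple way to approximate such learning, sketched here purely as an illustration, is to keep a moving average of observed job durations per (job kind, runner) pair and prefer the idle runner with the lowest estimate. The class name, the `ALPHA` smoothing factor, and the job and runner names below are all hypothetical.

```python
# Hypothetical smoothing factor: weight given to the newest observation.
ALPHA = 0.3


class DurationEstimator:
    """Tracks an exponential moving average of job durations per runner."""

    def __init__(self):
        self.estimates = {}  # (job_kind, runner) -> estimated seconds

    def record(self, job_kind, runner, seconds):
        """Fold a newly observed duration into the running estimate."""
        key = (job_kind, runner)
        old = self.estimates.get(key)
        if old is None:
            self.estimates[key] = float(seconds)
        else:
            self.estimates[key] = (1 - ALPHA) * old + ALPHA * seconds

    def pick(self, job_kind, idle_runners):
        """Pick the idle runner expected to finish this job kind soonest.

        Runners with no history get an estimate of 0.0, so a newly added
        runner is tried at least once and then measured.
        """
        return min(idle_runners,
                   key=lambda r: self.estimates.get((job_kind, r), 0.0))


est = DurationEstimator()
est.record("cpp-build", "ws-128", 60)
est.record("cpp-build", "camera-pc", 1200)
choice = est.pick("cpp-build", ["camera-pc", "ws-128"])
print(choice)
```

This only learns durations, not CPU or IO profiles, and it ignores how many jobs a machine is already running, so it is a starting point rather than the full balancer described above.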

I hope I have given enough context; if not, just ask. My questions are:

  • How does Actions handle load balancing of self-hosted runners?
  • Does it cover our use case?

Hi @nedrebo,

Glad to see you in the GitHub Community Forum!

Thanks for your detailed description; I now understand where you are coming from. Honestly, GitHub self-hosted runners do not support this kind of load balancing. GitHub uses labels to decide which runners can pick up a CI/CD job. As the docs mention:

A self-hosted runner that matches all the assigned labels will then be eligible to run the job.

The routing precedence for self-hosted runners is described here:
https://docs.github.com/en/actions/hosting-your-own-runners/using-self-hosted-runners-in-a-workflow#routing-precedence-for-self-hosted-runners
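
As a rough illustration of the label rule quoted above, the matching can be thought of as a subset check. This is only a sketch of the logic, not GitHub's implementation, and the runner names and labels are made up.

```python
def eligible_runners(job_labels, runners):
    """Return the names of runners whose labels include every job label."""
    wanted = set(job_labels)
    return [name for name, labels in runners.items() if wanted <= set(labels)]


# Hypothetical pool: one build workstation and two camera PCs.
runners = {
    "ws-128": {"self-hosted", "linux", "build"},
    "camera-pc-1": {"self-hosted", "linux", "build", "camera"},
    "camera-pc-2": {"self-hosted", "linux", "camera"},
}

build_candidates = eligible_runners(["self-hosted", "build"], runners)
camera_candidates = eligible_runners(["self-hosted", "camera"], runners)
print(build_candidates)
print(camera_candidates)
```

Note that labels only filter which runners are eligible. Nothing in this rule ranks the eligible runners by speed, which is why it does not cover the load balancing you describe.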

Thanks

Thank you for the quick reply.

Is this feature on any roadmap? If not, can you comment on whether the feature has been discussed and decided to be a “Won’t Do”?

Thanks for the link. I have not tried it yet, but I think I can (ab)use the repository-level and organization-level routing to make a poor man’s balancer. I can separate the runners into two groups, slow (organization-level) and fast (repository-level), so that the powerful ones fill up first. Hopefully a repository-level runner can also be an organization-level runner at the same time; then this should work pretty well.
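
Here is a small simulation of what I have in mind, assuming that repository-level runners are tried before organization-level ones and that each runner takes one job at a time. The `route` function and all runner names are made up; this is not how GitHub implements routing.

```python
def route(job_labels, tiers):
    """Assign a job to the first idle, label-matching runner.

    `tiers` is a list of runner lists in precedence order, for example
    [repository_level, organization_level].
    """
    wanted = set(job_labels)
    for tier in tiers:
        for runner in tier:
            if not runner["busy"] and wanted <= runner["labels"]:
                runner["busy"] = True
                return runner["name"]
    return None  # no idle runner: the job would wait in the queue


def make_runner(name):
    return {"name": name, "labels": {"self-hosted", "linux"}, "busy": False}


# Fast workstations registered on the repository, slow PCs on the organization.
repository_level = [make_runner("ws-128"), make_runner("ws-64")]
organization_level = [make_runner("camera-pc-1"), make_runner("camera-pc-2")]

assigned = [route(["self-hosted"], [repository_level, organization_level])
            for _ in range(5)]
print(assigned)
```

Under those assumptions the workstations fill up first and the camera PCs only get jobs afterwards. It is much coarser than the Jenkins scoring, since there is no ordering within a tier beyond list order.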

Hi @nedrebo,

I could not find a related topic on the public roadmap. You can raise a feature request at the link below, where the GitHub product managers will review it.
https://support.github.com/contact/feedback?contact[category]=actions

Thanks