Scaling Self-Hosted Runners

Are there any recommendations for deploying scalable self-hosted runners? Right now, our 2 runners live on 2 EC2 instances and sit idle most of the time. Right before releases we tend to queue up 10–15 jobs, which creates a bottleneck for developers. We are using AWS under the hood, and I’d like to find a solution that auto-scales based on load. Some of the ideas I was thinking about exploring were:

  • setting up a Docker image that scales the number of active instances based on load (I’m not sure if this will work with the token setup)

  • running multiple processes on the same EC2 instance, so that we don’t need an instance per runner

  • exploring if there is a solution around serverless (AWS Lambda) architecture where we could spin up runners only when they are requested

Has anyone else attempted to solve this, and if so what worked/didn’t work?


I’ve been working toward a solution like this, and I think I’m pretty close.

https://github.com/terradatum/github-runner is a project designed around the upstream https://github.com/actions/virtual-environments, and it currently generates both AMIs and Docker images.

I’ve got the AMIs close to a place where I think they could be the basis for an Auto Scaling group, but they’re not quite there yet. Specifically, they need a secure method for registering and de-registering themselves with GitHub during scaling.

Until then, I’ve got them set up so they can run multiple runners via systemd.
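For anyone curious what "multiple runners via systemd" can look like, a templated unit is one way to do it. This is only a sketch: the installation path `/opt/actions-runner-%i` and the `runner` user are my assumptions, not details from the project above.

```ini
# /etc/systemd/system/github-runner@.service  (hypothetical paths and user)
[Unit]
Description=GitHub Actions runner %i
After=network-online.target
Wants=network-online.target

[Service]
User=runner
# Each instance runs from its own installation directory
WorkingDirectory=/opt/actions-runner-%i
ExecStart=/opt/actions-runner-%i/run.sh
Restart=always

[Install]
WantedBy=multi-user.target
```

With that in place, `systemctl enable --now github-runner@1 github-runner@2` starts two isolated runner services on the same host.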

Hi @seansean11 ,

GitHub’s REST API supports self-hosted runner management, e.g. creating a registration token for registering a runner, deleting a self-hosted runner from a repo, etc. So if you’d like to scale self-hosted runners, you can set up a new Docker image and use the REST API to register and remove runners as instances come and go.
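As a concrete sketch of driving that API from an instance startup script: `my-org`/`my-repo` and the `GITHUB_PAT` variable below are placeholders, not values from this thread.

```shell
# Build the registration-token endpoint URL for a repo (owner/repo are placeholders)
reg_token_url() {
  printf 'https://api.github.com/repos/%s/%s/actions/runners/registration-token' "$1" "$2"
}

# On instance boot (requires a PAT with access to the repo):
#   TOKEN=$(curl -s -X POST -H "Authorization: token $GITHUB_PAT" \
#     "$(reg_token_url my-org my-repo)" | jq -r .token)
#   ./config.sh --url https://github.com/my-org/my-repo --token "$TOKEN" --unattended
#
# On scale-in, fetch a removal token from the analogous .../runners/remove-token
# endpoint and run `./config.sh remove --token "$REMOVAL_TOKEN"` before terminating.
```

The registration token is short-lived, so each instance should fetch its own at boot rather than baking one into the image.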

It’s not recommended to run multiple processes on the same EC2 instance, since multiple jobs will execute on the same machine and may disturb each other, e.g. jobs that need the same environment variable set to different values.

Thanks.

re: “…multiple jobs will execute on same machine and may disturb each other, eg: same env but different value required.”

systemd process isolation prevents this. The two places where one runner could step on another are the installation directory (auto-update) and the working directory (actions setup). I’ve accounted for both: each service runs from its own installation of the runner and has its own work directory.
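To illustrate that layout: one installation plus a private work directory per runner service. The base directory, tarball name, and repo URL below are my own hypothetical choices.

```shell
# Hypothetical layout: one installation + private work dir per runner service.
# RUNNER_BASE and the tarball/repo names are placeholders, not from this thread.
base="${RUNNER_BASE:-$PWD/runners}"
for i in 1 2; do
  install_dir="$base/actions-runner-$i"
  mkdir -p "$install_dir/_work"
  # Unpack a separate copy of the runner into each directory, e.g.:
  # tar -xzf actions-runner-linux-x64.tar.gz -C "$install_dir"
  # Register each instance with its own work directory so jobs never collide:
  # "$install_dir/config.sh" --url https://github.com/my-org/my-repo \
  #   --token "$TOKEN" --work "$install_dir/_work" --unattended
done
```

Because `--work` points inside each installation, job checkouts and tool caches stay fully separated between the services.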

We, at Philips, are using auto-scaled self-hosted GitHub Actions runners on AWS.

On demand, we spin up spot instances and assign them to workflows.

We’ve open sourced it: https://github.com/philips-labs/terraform-aws-github-runner

One of its creators wrote a blog about it with a lot of information about the working: https://040code.github.io/2020/05/25/scaling-selfhosted-action-runners
