Scalable Continuous Integration (CI) Patterns

Continuous Integration is a real game-changer for many development teams. The ability to automatically test or compile code as it is written drastically shortens the time it takes to verify that what you’re developing does what it’s supposed to. For a small team building a simple project, CI is pretty easy to set up. As more developers and projects use CI, and as those projects grow in size and complexity, the performance impact of connecting your CI system to GitHub Enterprise can increase greatly, affecting everyone on your team.

It doesn’t have to be this way. Here we’ll show you one way in which you can lower the performance impact of setting up CI for your projects.

Webhooks vs Polling

What is Polling? Why do people use it and what are its downsides?

Polling is when a script or application requests a resource at a regular interval. For instance, a build job can query GitHub every minute for the latest commit SHA. A poller can fetch repository data over Git or call the GitHub REST API, then compare the remote information with its local data.
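To make the pattern concrete, here is a minimal sketch of a poller in Python. The host name, repository, and branch in `API_URL` are hypothetical, and the helper names are ours, not part of any GitHub tooling. It asks the REST API for the newest commit on a branch and compares its SHA with the one seen on the previous poll.

```python
import json
import time
import urllib.request
from typing import Callable, Optional, Tuple

# Assumption: a hypothetical GitHub Enterprise host, repository, and branch.
API_URL = "https://github.example.com/api/v3/repos/my-org/my-repo/commits/main"


def fetch_latest_sha(url: str = API_URL) -> str:
    """Ask the REST API for the newest commit on a branch and return its SHA."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)["sha"]


def check_once(fetch: Callable[[], str], last_seen: Optional[str]) -> Tuple[bool, str]:
    """Compare the remote SHA with the locally remembered one."""
    current = fetch()
    return current != last_seen, current


def poll_forever(
    fetch: Callable[[], str],
    on_change: Callable[[str], None],
    interval_seconds: float = 60.0,
) -> None:
    """Poll at a fixed interval; every iteration costs the server one request."""
    last_seen: Optional[str] = None
    while True:
        changed, last_seen = check_once(fetch, last_seen)
        if changed:
            # Note: the very first poll also counts as a change.
            on_change(last_seen)
        time.sleep(interval_seconds)
```

Calling `poll_forever(fetch_latest_sha, print)` would print each new SHA as it appears. The request is sent every minute whether or not anything was pushed, which is where the load on the server comes from.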

An advantage of polling is that it provides resilience against system or network outages. GitHub Enterprise doesn’t automatically retry undelivered webhooks, so an outage can result in missed events and, in turn, missed build or deployment jobs.

Polling is often used with continuous integration build environments such as TeamCity and Jenkins as it can be more straightforward to set up than configuring webhooks.

What are Webhooks and how do they work?

A webhook is a way to deliver real-time data to applications. Unlike traditional APIs, where you need to poll for data frequently in order to get quasi real-time information, webhooks are triggers that send data when specific events occur.

You can think about webhooks like push notifications on your mobile phone. Rather than burning up the battery on your phone fetching information (polling) from applications to get updates, push notifications - or webhooks - automatically send data based on event triggers. And, just like push notifications, webhooks are less resource-intensive.

For more information about webhooks, you may want to check out the GitHub Developer guide on webhooks.
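As a rough illustration of the receiving side, the sketch below is a small Python HTTP endpoint that accepts a `push` event and decides whether to start a build. It assumes the webhook is configured with the JSON content type and a shared secret, and that your GitHub version sends the `X-Hub-Signature-256` header (older releases may only send the SHA-1 based `X-Hub-Signature`). The secret value, port, and function names are hypothetical.

```python
import hashlib
import hmac
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple

# Assumption: the shared secret entered when the webhook was created.
WEBHOOK_SECRET = b"change-me"


def signature_is_valid(secret: bytes, body: bytes, header: str) -> bool:
    """Recompute the HMAC of the raw body and compare it in constant time."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), header.encode())


def build_request_for(event: str, body: bytes) -> Optional[Tuple[str, str]]:
    """Return (repository, commit SHA) for push events, None for anything else."""
    if event != "push":
        return None
    payload = json.loads(body)
    return payload["repository"]["full_name"], payload["after"]


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", "0"))
        body = self.rfile.read(length)
        signature = self.headers.get("X-Hub-Signature-256", "")
        if not signature_is_valid(WEBHOOK_SECRET, body, signature):
            self.send_response(401)  # reject payloads we cannot authenticate
            self.end_headers()
            return
        build = build_request_for(self.headers.get("X-GitHub-Event", ""), body)
        if build is not None:
            # Placeholder: a real receiver would enqueue a CI job here.
            print("would start a build for %s at %s" % build)
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the receiver until interrupted."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener. Nothing runs until GitHub delivers an event, so an idle repository generates no traffic at all.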

Why Use Webhooks Instead of Polling an API?

Webhooks are far more efficient than polling, from a resource and communication standpoint. Zapier did a study across 30 million poll requests made through their service, and found that 98.5% of polls are wasted and they spent 66x more resources on polling.

With polling, data is always old: it is only as fresh as the most recent poll. Webhooks, by contrast, are typically event-triggered, so they provide near real-time information. If you want information as close to real time as possible, you should use webhooks rather than polling.

Webhooks are superior to polling in terms of freshness of data, efficiency of communication, and infrastructure costs.

Getting started with Webhooks

You can configure webhooks directly in your repository or organization settings.

Once you’ve configured a hook, the new deliveries section helps you track, troubleshoot, and re-send a webhook event payload.

Other strategies and workarounds

When possible, using webhooks is the best option for preventing performance problems when implementing CI with GitHub Enterprise. However, that’s not always immediately possible if many users currently rely on polling, and sometimes switching to webhooks isn’t enough on its own. You can use the following strategies as needed to mitigate performance issues while you move to a webhooks-based CI infrastructure.

Polling Intervals and API Throttling

As of GitHub Enterprise version 2.10, GitHub Enterprise administrators can set up API rate limiting on their GitHub Enterprise appliance. If you are experiencing excessive traffic from a number of different sources, you may want to configure rate limiting to reduce the performance impact of these sources while you figure out which projects need to be reconfigured.

If you already know which projects are causing an issue, you can also have the application or script throttle its requests, or sleep between them, while you set up webhooks. This can provide some immediate relief if your system is overwhelmed with requests.
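One simple way to do this on the client side is to enforce a minimum gap between requests. The sketch below is a generic illustration, not something provided by GitHub. The `Throttle` class and its parameters are hypothetical names, and the clock and sleep functions are injectable only so the behavior is easy to test.

```python
import time
from typing import Callable


class Throttle:
    """Enforce a minimum gap between successive calls by sleeping."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = float("-inf")  # the first call never waits

    def wait(self) -> None:
        """Block until at least min_interval has passed since the last call."""
        delay = self._next_allowed - self._clock()
        if delay > 0:
            self._sleep(delay)
        self._next_allowed = self._clock() + self.min_interval
```

A script would create one instance, for example `throttle = Throttle(min_interval=5.0)`, and call `throttle.wait()` immediately before each API request.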

Application Optimization

Applications should cache non-critical API resources and use conditional requests. The downside is that, even with conditional requests, the server has to compute the entire response before it can tell that nothing has changed, which limits how much this helps performance.
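For reference, a conditional request usually means sending back the `ETag` from a previous response in an `If-None-Match` header and reusing the cached body when the server answers `304 Not Modified`. The following is a minimal sketch of that idea using only the Python standard library. `ConditionalCache` and the transport signature are hypothetical names chosen for this example.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional, Tuple

# A transport takes (url, headers) and returns (status, etag, body).
Transport = Callable[[str, Dict[str, str]], Tuple[int, Optional[str], bytes]]


def http_transport(url: str, headers: Dict[str, str]) -> Tuple[int, Optional[str], bytes]:
    """Perform a GET with urllib; urllib reports 304 as an HTTPError."""
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.headers.get("ETag"), response.read()
    except urllib.error.HTTPError as error:
        if error.code == 304:
            return 304, headers.get("If-None-Match"), b""
        raise


class ConditionalCache:
    """Remember the last ETag and body per URL and revalidate on each get."""

    def __init__(self, transport: Transport = http_transport) -> None:
        self._transport = transport
        self._entries: Dict[str, Tuple[str, bytes]] = {}

    def get(self, url: str) -> bytes:
        headers: Dict[str, str] = {}
        cached = self._entries.get(url)
        if cached is not None:
            headers["If-None-Match"] = cached[0]
        status, etag, body = self._transport(url, headers)
        if status == 304 and cached is not None:
            return cached[1]  # unchanged: reuse the stored body
        if etag:
            self._entries[url] = (etag, body)
        return body
```

This saves bandwidth and client-side parsing, but each call still reaches the server, so it complements a longer polling interval rather than replacing one.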

CI build optimization

Another way to improve the performance of your CI systems is to optimize the CI systems themselves. You should regularly audit build systems to eliminate redundant build jobs, copy artifacts between build jobs instead of rebuilding them, and spread activity over time to avoid spikes. For more information, see other techniques to speed up builds.
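As one illustration of spreading activity over time, scheduled jobs can each be given a stable offset inside a time window instead of all starting on the hour. The sketch below derives that offset from a hash of the job name. The function name and the one-hour default window are assumptions made for this example, and the idea is similar in spirit to the `H` token in Jenkins cron syntax.

```python
import hashlib


def start_offset_seconds(job_name: str, window_seconds: int = 3600) -> int:
    """Map a job name to a stable offset within the scheduling window."""
    digest = hashlib.sha256(job_name.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer; the same name always yields the same slot.
    return int.from_bytes(digest[:8], "big") % window_seconds
```

Because the offset depends only on the job name, each job keeps a predictable schedule while the set of jobs as a whole is scattered across the window.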

Continue the conversation

What CI patterns have worked for you in the past? What other challenges have you faced in implementing CI at scale? Let us know in the comments below!

Further Learning

Interested in learning more and trying some of what you learn in a guided, hands-on way? Check out the GitHub Learning Lab course Getting Started with GitHub Apps. In this course, you’ll learn how to: install Probot apps onto your repository, put GitHub webhooks to use, get started with GitHub’s APIs, and work with GitHub Apps.