Continuous Integration is a real game-changer for many development teams. The ability to automatically test or compile code as it is written drastically shortens the time it takes to verify that what you're developing is doing what it's supposed to. For a small team building a simple project, CI is pretty easy to set up. However, as more developers and projects use CI and those projects grow in complexity, the performance impact of connecting your CI system to GitHub Enterprise can increase greatly, affecting everyone on your team.
It doesn't have to be this way, though. Here we'll show you one way to lower the performance impact of setting up CI for your projects.
Polling is when a script or application makes a request for a resource at a regular interval. For instance, a build job can query GitHub every minute for the latest commit SHA. Polling can mean requesting repository data via Git or making GitHub REST API requests to compare remote information with local data.
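The polling pattern above can be sketched in a few lines. This is a minimal illustration, not production code: the repository name, branch, and one-minute interval are placeholders, and real build systems layer retries and authentication on top.

```python
import json
import time
import urllib.request

API = "https://api.github.com/repos/{owner}/{repo}/commits/{branch}"

def latest_sha(owner, repo, branch="main"):
    """Fetch the SHA of the newest commit on a branch via the REST API."""
    url = API.format(owner=owner, repo=repo, branch=branch)
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["sha"]

def should_trigger(last_seen, current):
    """A build should fire only once we had a previous SHA and it moved."""
    return last_seen is not None and current != last_seen

def poll(owner, repo, branch="main", interval=60, on_change=print):
    """Query GitHub every `interval` seconds; call `on_change` on a new SHA."""
    last_seen = None
    while True:
        sha = latest_sha(owner, repo, branch)
        if should_trigger(last_seen, sha):
            on_change(sha)  # e.g. kick off a build job
        last_seen = sha
        time.sleep(interval)
```

Note that every iteration costs a full API request whether or not anything changed, which is exactly the waste discussed below.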
An advantage of polling is that it can provide resilience to system or network outages. GitHub Enterprise doesn't support re-triggering undelivered webhooks, so an outage can result in missed events and, with them, skipped build or deployment jobs. A polling job, by contrast, simply picks up the change on its next cycle.
Polling is often used with continuous integration build environments such as TeamCity and Jenkins as it can be more straightforward to set up than configuring webhooks.
A webhook is a way to deliver real-time data to applications. Unlike traditional APIs, where you need to poll for data frequently to get quasi-real-time information, webhooks are triggers that send data when specific events occur.
You can think of webhooks like push notifications on your mobile phone. Rather than burning up your phone's battery by fetching information (polling) from applications to get updates, push notifications - or webhooks - automatically send data based on event triggers. And, just like push notifications, webhooks are less resource-intensive.
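On the receiving end, a webhook endpoint needs two pieces: signature verification and payload handling. The sketch below assumes a `push` event and HMAC-SHA256 signatures (sent in the `X-Hub-Signature-256` header on newer GitHub releases; older versions send a SHA-1 `X-Hub-Signature` instead). The secret and payload here are illustrative.

```python
import hashlib
import hmac
import json

def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Compare the X-Hub-Signature-256 header against an HMAC-SHA256
    of the raw request body, computed with the webhook secret."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)

def head_sha_from_push(body: bytes) -> str:
    """A `push` event payload carries the new head commit SHA in `after`."""
    return json.loads(body)["after"]
```

Your web framework of choice would call `verify_signature` on every delivery before trusting the payload, then hand the SHA straight to a build job; no polling loop is needed.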
For more information about webhooks, you may want to check out the GitHub Developer guide on webhooks.
Webhooks are far more efficient than polling from both a resource and a communication standpoint. Zapier studied 30 million poll requests made through their service and found that 98.5% of polls were wasted, and that they spent 66x more resources on polling than an event-driven approach would have required.
With polling, data is always at least slightly stale. Because webhooks are typically event-triggered, they provide near real-time information. If you want information as close to real-time as possible, you should elect to use webhooks over polling.
Webhooks are superior to polling in terms of freshness of data, efficiency of communication, and infrastructure costs.
You can configure webhooks directly in your repository or organization settings.
Once you've configured a hook, the deliveries section of its settings page helps you track, troubleshoot, and re-send a webhook event payload.
When possible, using webhooks is the best option for preventing performance problems when implementing CI with GitHub Enterprise. However, migrating isn't always immediately possible if many of your users currently rely on polling, and sometimes switching to webhooks alone isn't enough. You can use the following strategies as needed to mitigate performance issues while you move to a webhooks-based CI infrastructure.
As of GitHub Enterprise version 2.10, GitHub Enterprise administrators can set up API rate limiting on their GitHub Enterprise appliance. If you are experiencing excessive traffic from a number of different sources, you may want to configure rate limits to reduce the performance impact of those sources while you identify which projects need to be reconfigured.
If you already know which projects are causing an issue, you can also have the application or script throttle its requests, or sleep between them, while you set up webhooks instead. This can provide some immediate relief if your system is overwhelmed with requests.
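One simple client-side throttle is to read GitHub's `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers and sleep until the window resets once the quota is exhausted. This is a sketch under those assumptions; the header names match the GitHub REST API, but `polite_get` omits authentication and error handling.

```python
import time
import urllib.request

def backoff_seconds(remaining, reset_epoch, now):
    """Seconds to sleep before the next call, derived from GitHub's
    X-RateLimit-Remaining and X-RateLimit-Reset response headers."""
    if remaining > 0:
        return 0.0
    return max(0.0, reset_epoch - now)

def polite_get(url):
    """Make a request, then sleep until the rate-limit window resets
    if we've used up our quota."""
    with urllib.request.urlopen(url) as resp:
        remaining = int(resp.headers.get("X-RateLimit-Remaining", 1))
        reset_epoch = int(resp.headers.get("X-RateLimit-Reset", 0))
        body = resp.read()
    time.sleep(backoff_seconds(remaining, reset_epoch, time.time()))
    return body
```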
Applications should cache non-critical API resources and use conditional requests. The downside is that the server still has to compute the entire response before it knows nothing has changed, which limits how much conditional requests help server-side performance.
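A conditional request works by caching the `ETag` from a previous response and sending it back in `If-None-Match`; the server answers `304 Not Modified` when the resource hasn't changed, so no response body crosses the wire. The sketch below assumes a JSON GitHub API resource and a caller that keeps the last `(etag, data)` pair around.

```python
import json
import urllib.error
import urllib.request

def conditional_headers(etag=None):
    """Build request headers, adding If-None-Match when we hold a cached ETag."""
    headers = {"Accept": "application/vnd.github+json"}
    if etag:
        headers["If-None-Match"] = etag
    return headers

def conditional_get(url, etag=None, cached=None):
    """Fetch a resource; on 304 Not Modified, return the cached copy."""
    req = urllib.request.Request(url, headers=conditional_headers(etag))
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.headers.get("ETag"), json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return etag, cached  # nothing changed; reuse what we have
        raise
```

This mostly saves bandwidth and client work; as noted above, the server still does the work of computing the response before it can answer 304.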
Another way to improve the performance of your CI setup is to optimize the CI systems themselves. You should regularly audit build systems to eliminate redundant build jobs, copy artifacts between build jobs rather than rebuilding them, and spread activity over time to avoid spikes. For more information, see other techniques to speed up builds.
What CI patterns have worked for you in the past? What other challenges have you faced in implementing CI at scale? Let us know in the comments below!
Interested in learning more and trying some of what you learn in a guided, hands-on way? Check out the GitHub Learning Lab course Getting Started with GitHub Apps. In this course, you’ll learn how to: install Probot apps onto your repository, put GitHub webhooks to use, get started with GitHub's APIs, and work with GitHub Apps.