Every day, the GitHub Enterprise Support team assists GitHub Enterprise Server administrators in managing, securing, and supporting the most critical and central hub of their organization’s development lifecycle. Along the way, we’ve been keeping track of best practices for GitHub Enterprise Server administrators that you can apply in your own environment.
Overprovision hardware for growth
- Provision or expand your root volume to be larger than the default. To allow for the natural growth of your deployment, provisioning two to four times the 200GB default, or more, is recommended for busy systems.
- Pay close attention to disk IO performance and metrics such as burst bucket volumes and throughput maximums (one example is the published performance expectations of EBS volume types). Git operates on vast numbers of generally very small files, meaning disk latency, and by extension IOPS capability, is extremely important. Fast, low-latency, SSD-based storage is required for optimal performance, and a locally attached disk is highly recommended.
- More memory = more room for disk cache and additional GitHub worker and service scaling.
- Also provide sufficient CPU and resources along with a fast disk for your GitHub Enterprise Backup Utilities system to increase backup and restore speed.
Test / Staging environments
- Keep your GitHub Enterprise Server appliance up to date with the latest release to take advantage of security updates and bug fixes. Testing upgrades in a staging environment is important, particularly when upgrading to a new feature release.
- Make use of one or more GitHub Enterprise Server staging environments to test and familiarize yourself with procedures such as upgrading the appliance, configuring and failing over in High Availability scenarios, and restoring backups.
- Your GitHub Enterprise Server license can be used to set up any number of non-production environments, and their use is strongly encouraged when preparing for upgrades or other changes to the environment.
High Availability and Disaster Recovery (Backups)
- Configuring at least one High Availability replica is advised for redundancy and ease of recovery in the event of a primary appliance failure.
- GitHub Enterprise Backup Utilities should be configured and backups scheduled to run regularly against your GitHub Enterprise Servers. Testing restores to a staging or test environment is equally important.
- Placing replicas and GitHub Enterprise Backup Utilities hosts in different regions or datacenters from the primary appliance can help to ensure geographic redundancy.
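Scheduled backups are only useful if they keep running. As a rough sketch, a monitoring job on the backup host could check how old the newest snapshot is and alert when it goes stale. The function names, the 24-hour default, and the assumption that each snapshot lands as a subdirectory of a single data directory are illustrative and should be adapted to how your backup host is actually laid out.

```python
import time
from pathlib import Path
from typing import Optional


def newest_snapshot_age(data_dir: str, now: Optional[float] = None) -> Optional[float]:
    """Seconds since the most recently modified subdirectory of data_dir.

    Returns None when the directory holds no snapshot subdirectories.
    Assumption: one subdirectory per snapshot; symlinks are skipped.
    """
    now = time.time() if now is None else now
    mtimes = [
        entry.stat().st_mtime
        for entry in Path(data_dir).iterdir()
        if entry.is_dir() and not entry.is_symlink()
    ]
    if not mtimes:
        return None
    return now - max(mtimes)


def backup_is_fresh(data_dir: str, max_age_hours: float = 24.0,
                    now: Optional[float] = None) -> bool:
    """True when the newest snapshot is no older than max_age_hours."""
    age = newest_snapshot_age(data_dir, now)
    return age is not None and age <= max_age_hours * 3600
```

A check like this only confirms that a snapshot was written recently. It does not replace regularly restoring a backup to a staging environment.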
Implement policies for rate limiting and abuse
- API and Git rate limiting is only effective if you also set enforceable, policy-backed limits. Exceptions to your rate-limiting policy should be rare and used for specific integrations and service accounts that are outliers to your normal user API usage.
- Use log forwarding and `collectd` metrics forwarding to spot increases in request rates, bad actors, and patterns.
- CLI usage reporting tools such as `ghe-governor-summary` and `ghe-btop` should be regularly reviewed for outlying or abusive behaviors.
- Suspend bad actors, or block them at a network or firewall level.
Be a good CI or API citizen
- Use webhooks rather than cron-style polling.
- If cron-style polling is unavoidable, spread out and skew the fetches to reduce parallel requests.
- Utilize conditional requests with the GitHub API, and ensure that CI plugins are up to date, with caching features enabled as well.
- Build artifacts, logs, or even data backups should not be stored in Git (especially in `refs/heads`). Git is not designed to be a database. Use Git LFS when dealing with large files to keep Git performing great.
- Frequent automated pushes can be even more impactful on overall performance than frequent fetches. Limit usage of cron-style pushes, and batch multiple commits into a single push where possible if automating pushes on a schedule.
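Two of the habits above, skewing scheduled fetches and using conditional requests, are easy to sketch. The example below is a minimal illustration and not a GitHub-provided client: `skewed_offset` and `ConditionalCache` are hypothetical helpers, and the code leaves the actual HTTP call to whichever client you already use. It relies on the API returning an `ETag` header and answering `304 Not Modified` when that value is sent back in `If-None-Match`.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


def skewed_offset(job_name: str, period_seconds: int) -> int:
    """Stable per-job delay in [0, period_seconds).

    Hashing the job name gives each job its own offset, so jobs that
    share a schedule do not all hit the server at the same moment.
    """
    digest = hashlib.sha256(job_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % period_seconds


@dataclass
class ConditionalCache:
    """Remembers the last ETag and body seen for each URL."""

    _entries: Dict[str, Tuple[str, bytes]] = field(default_factory=dict)

    def request_headers(self, url: str) -> Dict[str, str]:
        """Headers to add to the next request for this URL."""
        entry = self._entries.get(url)
        return {"If-None-Match": entry[0]} if entry else {}

    def handle_response(self, url: str, status: int,
                        etag: Optional[str], body: bytes) -> Optional[bytes]:
        """Return the body to use, reusing the cached copy on a 304."""
        if status == 304:
            entry = self._entries.get(url)
            return entry[1] if entry else None
        if status == 200 and etag:
            self._entries[url] = (etag, body)
        return body
```

In use, a polling job would sleep for `skewed_offset(job_name, period)` seconds before its first fetch, send `cache.request_headers(url)` with each request, and pass the status code, `ETag` header, and body of the response to `cache.handle_response`.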
Join the conversation!
Now you have some best practices for managing your GitHub Enterprise Server environment, along with tips to make it more secure and reliable. At the end of the day, maintaining a healthy and reliable environment depends on how you and your team manage it. What are some of your go-to practices? Please feel free to share in the comments below!