Hello! (Or good evening?)
There are two parts to a pre-receive hook: the script itself, which will process every commit, and in some cases a custom environment that contains the libraries and utilities necessary for the hook to run (like the python package, for instance). You can learn more about pre-receive hook environments in our GitHub Enterprise documentation.
The credentials are still present in the git history on the client side, and the commits in question will have to be removed from the git history. Let's not forget that credential security is not about "if" but about "when". Attackers need to be lucky only once, while defenders need to be lucky every time. A detection and remediation framework is useful for more than just the developers' work ;)
If you really want to prevent credentials from being leaked, then pre-commit hooks are really the way to go. They can be rolled out quickly with your configuration management tool of choice.
We do have a hard timeout, which cannot be modified, to ensure that pages are displayed without errors and that internal workers are not tied up by a single push. For large pushes (raw SQL data, or minified JS for instance, but it can be anything), the execution time of a non-optimised script can be in the order of minutes.
Don't hesitate to reach out at https://support.enterprise.github.com/hc/en-us if you have further questions! Our team would be happy to assist.
We recommend fast storage for backups to ensure they complete quickly, so Glacier is out of the question unless you use it to store older snapshots. You can find more about our requirements in the GitHub Enterprise Backup Utilities repository.
If you have specific questions which are not answered there, you can also open a ticket with GitHub Enterprise support.
The story so far
We've explored pre-receive hooks as implemented in GitHub Enterprise, and specifically how and when to implement them. That said, pre-receive hooks are not a panacea, and there are other tools that are better suited when you want to add constraints when pushing your code.
One of the first use cases that comes to mind is security scanning. Catching private keys, security credentials, and other access codes before they are even made public and refusing pushes sounds like a perfect use for pre-receive hooks! However, the constraints inherent to pre-receive hooks make them a poor choice for this:
Server-side Git data does not have a concept of working copy, which means that anything that needs to access the contents of a modified file has to check out that file first.
Pre-receive hooks run after data has been recorded in the Git history, but before it has been copied back to the origin. This means that the commit history must be rewritten to remove the whole file or revision, which comes with its own challenges. In comparison, commit metadata can be modified with git commit --amend, for instance.
Larger pushes, which can contain a large number of files, a large number of commits, large files, or any combination of these, add to the pre-receive hook runtime and can hit the preset timeout. The timeout cannot be bypassed, as doing so would severely impact the stability and performance of your instance.
Two options are available:
Use pre-commit hooks instead. This ensures critical data never leaves the workstation in the first place. Changes cannot be saved until the issues are addressed, and execution time is not constrained by timeouts.
Use webhooks and an external service to scan for known critical data. When critical data is detected, the external service can be configured to automatically roll the credentials and notify the relevant stakeholders. This has several key benefits:
You can make the distinction between test data and actual live data by comparing them against your infrastructure, instead of rejecting based on a pattern.
Having a system that can automatically invalidate and renew keys, tokens, and other critical data prepares you to handle security incidents that happen outside of GitHub Enterprise. For instance, dealing with compromised workstations or storage devices.
You can run as many checks on as many patterns as you want, without exceeding time limits.
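Either option needs a way to recognise critical data. The following is a minimal sketch of the pre-commit variant, not a complete scanner: the two patterns (a PEM private key header and the AWS access key ID format) are illustrative only, and the function names contains_secret and scan_staged_changes are made up for this example.

```shell
#!/usr/bin/env bash
# Illustrative patterns only: a PEM private key header and the AWS access key ID format.
# A real deployment would maintain a much longer list.
SECRET_PATTERN='-----BEGIN [A-Z ]*PRIVATE KEY-----|AKIA[0-9A-Z]{16}'

# Reads text on STDIN; succeeds (exit 0) when at least one line matches a pattern.
contains_secret() {
  # -e is required because the pattern itself starts with a dash.
  grep -E -q -e "$SECRET_PATTERN"
}

# Hypothetical helper meant to be called from .git/hooks/pre-commit.
# It scans only the lines added by the staged changes.
scan_staged_changes() {
  if git diff --cached --unified=0 | grep '^+' | contains_secret; then
    echo "Possible credential in staged changes; commit aborted." >&2
    return 1
  fi
  return 0
}
```

A pre-commit hook built on this sketch would call scan_staged_changes and exit with its return code. The same contains_secret function could be fed the contents of a push by an external service triggered by a webhook.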
Let's say you want commits to match a specific coding style, or confirm the work done is tracked against the correct issue in your issue tracker. Again, using pre-receive hooks delivers immediate feedback...or does it?
As with security scanning, pre-receive hooks do not handle large amounts of data very well. They also make it difficult to get an accurate report of the style violations. Additionally, relying on external systems to block or allow pushes can have serious consequences when a third-party server is unavailable or experiencing degraded performance.
A better alternative would be to configure protected branches and required status checks:
Configure a webhook to trigger on every push that will forward the push payload to a build server of your choice.
Then configure your required checks, and use the Statuses API endpoint to set the status to one of error, failure, pending, or success.
Finally, configure your protected branches to prevent changes being committed directly to your default branch, and enforce required status checks using the status you've implemented.
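As a rough illustration of the second step, a build server could report its result with a small script along these lines. This is a sketch under assumptions: GHE_HOST and GHE_TOKEN are placeholder environment variables, the function names are invented for this example, and the payload builder does no JSON escaping, so values must not contain double quotes or backslashes.

```shell
#!/usr/bin/env bash
# Succeeds when the state is one of the four values accepted by the Statuses API.
is_valid_state() {
  case "$1" in
    error|failure|pending|success) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the JSON body for a status. No escaping is performed (sketch only).
status_payload() {
  local state="$1" context="$2" description="$3"
  is_valid_state "$state" || { echo "invalid state: ${state}" >&2; return 1; }
  printf '{"state":"%s","context":"%s","description":"%s"}\n' \
    "$state" "$context" "$description"
}

# Posts the status for one commit. GHE_HOST and GHE_TOKEN are placeholders
# that must be set in the environment of the build server.
post_status() {
  local owner="$1" repo="$2" sha="$3" state="$4" context="$5" description="$6"
  local body
  body="$(status_payload "$state" "$context" "$description")" || return 1
  curl --silent --fail \
    -H "Authorization: token ${GHE_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$body" \
    "https://${GHE_HOST}/api/v3/repos/${owner}/${repo}/statuses/${sha}"
}
```

For example, post_status my-org my-repo "$SHA" success ci/style "Style check passed" would mark the commit as passing a check named ci/style, which is an arbitrary context name chosen for this example. That context is what you would then select as a required status check on the protected branch.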
There are several advantages to this process:
Work is not blocked, but changes cannot be merged until issues are fixed. Further commits to fix issues can be pushed, instead of having to rewrite history.
You're not depending on a third-party system being online and responding quickly 100% of the time.
A select group of users can fast track urgent changes by dismissing required checks, for instance to implement an emergency security fix.
You can identify build errors ahead of time, and once your changes are confirmed to be working, you can go over and fix the syntax as well.
While pre-receive hooks are a powerful Git feature available to GitHub Enterprise administrators, they act as a complement to other features that assist your workflow rather than impede it. These are just a couple of scenarios where pre-receive hooks are not the best solution. Have you run into other situations that seemed like they would be perfect for using pre-receive hooks but then caused issues? Let us know in the comments below!
Pre-receive hooks at a glance
Git hooks are custom scripts that are fired during various Git operations. Some, like pre-commit hooks, are executed on your workstation before your changes are committed to the repository history. Some are executed server-side, like post-receive hooks that can be used to trigger updates to external services. Pre-receive hooks are executed on the server after the changes are received but before they are accepted. They are generally used to accept or reject Git commits based on specific criteria, which we will explore right now.
When to use Pre-Receive hooks
Generally, a pre-receive hook can be anything that runs server side and returns an exit code of 0 when the commit is valid, or a non-zero exit code to reject the commit. As the script will run for every single push, it needs to be lightweight and run quickly. This makes pre-receive hooks more suited for certain tasks than others.
For instance, pre-receive hooks are great to handle a commit's metadata, and as such work well for tasks such as:
Enforce specific commit message guidelines.
Check if the author is using their corporate email.
Block specific file extensions.
Or even restrict pushes by IP range!
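To give an idea of how lightweight such metadata checks can be, here is a minimal sketch of two of them. The domain example.com, the "ABC-123: " message convention, and both function names are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical corporate domain; replace with your own.
CORPORATE_DOMAIN="example.com"

# Succeeds (exit 0) when the address has a non-empty local part
# and ends with @CORPORATE_DOMAIN.
is_corporate_email() {
  local email="$1"
  [[ "$email" == ?*"@${CORPORATE_DOMAIN}" ]]
}

# Succeeds when the first line of the commit message starts with an
# issue key followed by a colon and a space, such as "ABC-123: ".
has_issue_reference() {
  local subject="${1%%$'\n'*}"   # keep only the first line
  local pattern='^[[:upper:]]+-[[:digit:]]+: '
  [[ "$subject" =~ $pattern ]]
}
```

Inside a hook, the author email and message of each pushed commit could be obtained with git log and passed to these functions, rejecting the push as soon as one of them fails.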
They do not work so well for uses that expect a "working copy" of the Git repository where files are accessible directly. This applies to tasks such as credentials scanning or coding style checks. For the former, pre-commit hooks are recommended, as they will make sure that compromised credentials never make it to the Git history. For the latter, status checks are preferred. This will allow a submitter to push additional commits to make their code compliant, without imposing additional constraints on intermediate commits or work in progress.
Similarly, as execution speed is key, any form of validation that relies on API requests or long computations would work better as a status check.
Implement a pre-receive hook
First you will need to write a script. Pre-receive hooks read the details of the Git push from STDIN, so the script will be contained in a while loop. For instance, using bash:
while read OLDREV NEWREV REFNAME ; do
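The loop header above can be fleshed out into a complete skeleton. The sketch below is one possible shape, not the only one: check_ref and process_push are invented names, and the rule inside check_ref (refusing to delete tags) is only a stand-in for your own checks. Git passes one line per updated ref, and a deleted ref is reported with a new revision made entirely of zeros.

```shell
#!/usr/bin/env bash
# Sketch of a pre-receive hook. Git feeds one line per updated ref on STDIN:
#   <old-revision> <new-revision> <ref-name>

# Hypothetical check: replace the body with your own rule.
# Returns non-zero to reject the update of this ref.
check_ref() {
  local oldrev="$1" newrev="$2" refname="$3"
  # A new revision made only of zeros means the ref is being deleted.
  if [[ "$newrev" =~ ^0+$ && "$refname" == refs/tags/* ]]; then
    echo "Deleting tags is not allowed: ${refname}" >&2
    return 1
  fi
  return 0
}

# Reads every updated ref from STDIN and fails if any check fails.
process_push() {
  local oldrev newrev refname status=0
  while read -r oldrev newrev refname; do
    check_ref "$oldrev" "$newrev" "$refname" || status=1
  done
  return "$status"
}

# In a real hook, the script would end with:
#   process_push
#   exit $?
```

Keeping the rule in its own function makes it easy to try locally by piping a sample line into process_push before uploading the hook.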
Once armed with your script, you need to make sure you can run it on your GitHub Enterprise instance. Pre-receive hook environments are self-contained and cannot access files or utilities on your instance. Unless you use third-party tools or utilities, the default pre-receive environment should be enough. It is included in every GitHub Enterprise instance and already has the following utilities: awk, bash, coreutils, curl, find, gnupg, grep, jq, sed.
If you require specific tools, you will need to create your own 64-bit Linux chroot environment, either using Docker to generate a chroot archive or by creating the chroot archive manually. Note that while Docker can be used to create compatible environments, it is not used server-side and your pre-receive hook is not running in a Docker image.
Once you've generated and uploaded your archive to your GitHub Enterprise instance, commit your pre-receive hook to a dedicated repository. Do not include it in an existing repository!
Finally, you need to define a scope for your pre-receive hook. It can be applied to every repository or a select few, and repository administrators can be allowed to enable or disable it. It can also be enforced, which means the exit code dictates whether the push is rejected, or left unenforced.
I hope this has helped clarify what pre-receive hooks are, their value, and how to use them. If you have any questions, please feel free to comment below. Also, stay tuned for a more in-depth look at when not to use pre-receive hooks, coming up in a new GitHub Enterprise Best Practices article soon!
High-Availability and GitHub Enterprise Backup Utilities at a glance
High-Availability for GitHub Enterprise
High-Availability for GitHub Enterprise takes the form of an Active-Passive setup, where a High-Availability replica is a warm standby instance that is kept in sync asynchronously with your main production GitHub Enterprise instance, or primary. This allows you to have a redundant GitHub Enterprise instance in a different datacenter or the Cloud, which is then available for planned or unplanned outages of your primary GitHub Enterprise instance. We strongly recommend planned failovers when possible, as any synchronization delay between the two instances will equate to data loss.
What High-Availability replication can do:
Increase availability by storing a redundant copy in a different location.
Help with datacenter cutovers or hardware migrations when configuration doesn't change.
What it doesn't provide:
Staging servers to test upgrades or configuration changes. Please refer to the GitHub Enterprise Backup Utilities instead.
Scaling out. Please refer to our Geo-Replication feature instead.
Automatic failover. While technically possible, the potential data loss when promoting an outdated replica negates any advantage of automated failovers.
Full instance backups. Please refer to the GitHub Enterprise Backup Utilities instead.
Zero downtime upgrades. To prevent data divergence between the instances, they both need to be upgraded during the same maintenance window, with no user access during the upgrade process.
GitHub Enterprise Backup Utilities
The GitHub Enterprise Backup Utilities are a suite of tools developed by GitHub and made available publicly to take application-aware and consistent snapshots of your GitHub Enterprise instance. The GitHub Enterprise Backup Utilities also include deduplication features, ensuring faster backup times and lower total space usage for later snapshots, while ensuring the integrity of each individual snapshot. This means that while the space on disk is lower, each snapshot contains a full backup of your instance and can be used to restore a copy of your data.
They can be used either to restore a copy of your GitHub Enterprise instance to a staging server, to test upgrades and features, or as a main part of your disaster recovery strategy to produce full copies, down to the configuration settings, of your production instance. They can also be used to supplement the recovery features of your GitHub Enterprise instance and protect against accidental or malicious data loss.
As stated earlier, the GitHub Enterprise Backup Utilities are fully application-aware, and are strongly recommended over hypervisor level snapshots. Hypervisor level snapshots can lead to data corruption, reduced performance on your instance, and in more critical cases, full outage of your GitHub Enterprise instance.
Configuring your High-Availability replica
The High-Availability replica is designed to act as a primary after a controlled failover, and as such, needs to be provisioned with the same amount of memory, CPU cores, and storage as the primary GitHub Enterprise instance. As all data written to the primary will be replicated to your High-Availability instance, you need to ensure a high-capacity, low latency (when possible) link is used between the two instances. This helps keep both instances synchronized and limits data loss when failing over. We strongly recommend storing the replica instance on a different storage and/or hypervisor subsystem, or even a different datacenter, to increase availability.
Not sure if your replica is up to date? You can monitor the replication status with ghe-repl-status. The command returns Nagios-compatible exit codes (0 for OK, 1 for WARNING, 2 for CRITICAL), so you can run it with check_by_ssh or a similar tool at regular intervals to keep an eye on your replica. We know that granting SSH access to your monitoring system can be scary, so you can restrict what its key is allowed to do by prefixing the public SSH key with the allowed command when you add it via the Management Console:
command="/usr/local/bin/ghe-repl-status" ssh-rsa AAAA....
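If you are not using Nagios, the same exit codes can be consumed by a small wrapper. The sketch below assumes administrative SSH access as the admin user on port 122 and takes the replica hostname as an argument; nagios_label and check_replica are names invented for this example.

```shell
#!/usr/bin/env bash
# Maps a Nagios-style exit code to its label.
nagios_label() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "CRITICAL" ;;
    *) echo "UNKNOWN" ;;
  esac
}

# Runs ghe-repl-status on the replica over SSH and prints a one-line summary.
# Assumption: the replica accepts administrative SSH as admin on port 122.
check_replica() {
  local host="$1" code=0
  ssh -p 122 "admin@${host}" ghe-repl-status >/dev/null 2>&1 || code=$?
  echo "replication on ${host}: $(nagios_label "$code")"
  return "$code"
}
```

Any exit code outside 0 to 2, including an SSH connection failure, is reported as UNKNOWN, which is worth alerting on as well.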
In the event of a catastrophic failure of the primary instance, a well maintained replica GitHub Enterprise instance will lessen the overall disruption to services, and reduce the recovery and restoration efforts required to restore users' access to GitHub Enterprise. Monitoring the replication status should be an integral part of your Disaster Recovery strategy, and will ensure minimum or no data loss when failing over to the replica.
Configuring the GitHub Enterprise Backup Utilities
The GitHub Enterprise Backup Utilities cannot be installed on your GitHub Enterprise instance. Doing so would defeat the purpose of taking backups, as well as potentially impact the performance of your GitHub Enterprise instance. A Linux server with a recent version of Git and rsync is required, as well as a filesystem that supports both symbolic and hard links.
This server will need a reliable and fast link to your production GitHub Enterprise instance, with access over port 122 allowed. No other ports are necessary. You will need to reserve at least 5 times the storage allocated to your GitHub Enterprise instance, and more if you would like to keep additional backup snapshots. We also strongly recommend fast storage to avoid bottlenecks during the data transfer phase.
Once your server is ready, you only need to clone the GitHub Enterprise Backup Utilities repository locally, then edit backup.config to include your instance details. Finally, generate a new SSH key, add it to your GitHub Enterprise instance, and you can start taking backups of your instance immediately.
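As an illustration, a minimal backup.config could look like the sketch below. The hostname and data directory are placeholders, and the snapshot count is only an example value; check the backup.config-example file shipped in the repository for the full list of settings.

```shell
# backup.config (sketch): all values below are placeholders.

# Hostname or IP address of the GitHub Enterprise instance to back up.
GHE_HOSTNAME="github.example.com"

# Directory on the backup host where snapshots are stored.
GHE_DATA_DIR="/data/ghe-backups"

# Number of snapshots to keep before older ones are pruned.
GHE_NUM_SNAPSHOTS=10
```

The file uses plain shell variable assignments, so there must be no spaces around the equals signs.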
When you start a backup task manually, you can follow its progress in the terminal. For scheduled backups, you can send the output to a log file instead, for instance:
ghe-backup -v 1>>/opt/backup-utils/backup.log 2>&1
Benchmarks for each backup run are also stored in the benchmarks folder for every snapshot. This can help you schedule your backups and identify possible bottlenecks.
As mentioned earlier, each individual snapshot is both an incremental copy and a full copy of your instance. In practice, this means you can take a copy of any successful snapshot and copy it to a remote location for purposes such as backup redundancy, data archiving, or overlapping backup schedules.
Finally, be sure to test your backup restores often! For instance, you can deploy a blank GitHub Enterprise instance and select Migrate. You can then restore the latest backup snapshot with ghe-restore -v [IP]. Is your staging instance already configured, and you want a copy of the latest settings? Act with caution, and use ghe-restore -v -c [IP] to restore the configuration of your instance alongside your data.
The GitHub Enterprise Backup Utilities and High-Availability features complement each other and will help you be ready for any disaster coming your way, planned or unplanned. Not sure how to implement them? Feel free to comment here, or reach out to us via the Support Portal for any issue you may have.