GitHub Enterprise Server has an internal monitor and concurrency controller for Git processes called Governor, which keeps count of Git operations. A command-line utility to query Governor data (ghe-governor) was made available with GitHub Enterprise Server 2.11. Governor data files, located under /data/user/gitmon/, hold one hour of data per file and are retained for two weeks. The files contain timestamps in their names, which you can use to confirm the time period they cover. Here is an example:
$ sudo sh -c 'ls /data/user/gitmon/gitmon.*.db' \
    | sed -n '1p;$p' | grep -P -o '\d+' | while read; do echo -n $REPLY '= '; date -d@$REPLY; done
1551186000 = Tue Feb 26 13:00:00 UTC 2019
1552392000 = Tue Mar 12 12:00:00 UTC 2019
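To check a single file rather than the oldest and newest, the same idea can be wrapped in a small helper. This is only a sketch: the function name gitmon_file_window is made up for illustration, and it assumes that files are named gitmon.<epoch>.db (as the listing above suggests), that the timestamp marks the start of the hour covered, and that GNU date is available.

```shell
# Hypothetical helper (not part of GitHub Enterprise Server): print the UTC
# hour that a Governor data file covers.
# Assumptions: files are named gitmon.<unix-epoch-seconds>.db and the
# timestamp marks the start of the one-hour window.
gitmon_file_window() {
  name=${1##*/}          # strip the directory part
  epoch=${name#gitmon.}  # strip the "gitmon." prefix
  epoch=${epoch%.db}     # strip the ".db" suffix
  start=$(date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S')
  end=$(date -u -d "@$((epoch + 3600))" '+%Y-%m-%d %H:%M:%S')
  echo "$start .. $end UTC"
}

gitmon_file_window /data/user/gitmon/gitmon.1551186000.db
# prints: 2019-02-26 13:00:00 .. 2019-02-26 14:00:00 UTC
```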
First, let's have a look at Governor's syntax. We will focus on common examples and queries later in this article.
Governor can find the top N records of Git queries for a given metric (column). The resulting table will be sorted by that column.
ghe-governor top <column> [options]
The column can be any of
Governor can find the top N groups of Git queries for a given grouping function and a given metric (column).
ghe-governor aggregate <grouping-function> <column> [options]
The grouping function can be any of
die_message_raw. When grouping by cloning, the resulting values for KEY have the following meaning:
2 = shallow fetch
3 = shallow clone
The column can be any of
Please see below for an explanation of some of the resulting table columns:
RT means response time, so AVG RT is the average time in seconds that Git invocations took, and MAX RT is the running time in seconds of the longest-running invocation, per host.
PL is parallelism, or how many Git invocations are running at any time. So MAXPL and AVGPL are the maximum and average, respectively.
CPU/SEC is how many seconds of CPU time are used by Git alone per second of wall-clock time. This is the number of CPUs dedicated to Git, averaged over the entire duration of the query. You can divide the value by the actual number of CPU cores to get the Git-specific CPU utilization (multiply by 100 for a percentage). Unlike Unix system load, this number cannot exceed the actual number of CPU cores.
UPL is data that GitHub Enterprise Server uploaded, i.e., client fetches and clones.
RCV is data that GitHub Enterprise Server received, i.e., client pushes.
RCV columns are all in GB, but the rate is in MB/s.
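As a worked example of the CPU/SEC calculation described above, the division can be done with awk. The helper name git_cpu_percent and the sample values (a CPU/SEC of 2.4 on an 8-core instance) are made up for illustration; they are not taken from real Governor output.

```shell
# Hypothetical helper: convert a CPU/SEC value into a percentage of all cores.
# Arguments: $1 = CPU/SEC value from a Governor table, $2 = number of CPU cores.
git_cpu_percent() {
  awk -v cpu_sec="$1" -v cores="$2" 'BEGIN { printf "%.1f\n", 100 * cpu_sec / cores }'
}

# Made-up example values: CPU/SEC of 2.4 on an 8-core instance.
git_cpu_percent 2.4 8
# prints: 30.0

# On the instance itself, nproc can supply the core count:
# git_cpu_percent 2.4 "$(nproc)"
```

In this example, Git alone would account for about 30% of the total CPU capacity over the queried period.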
Every query type can be limited in scope in the following ways:
-j = set output format to JSON instead of an ASCII table
-n<N> = limit the output size to N (default: 20)
-t <timespec> = only consider Git invocations since a given start time (default: 48 hours ago). You may want to use a tool such as https://www.epochconverter.com/ to convert UTC to Unix epoch time for fine-grained queries.
-t 1371614483 = Invocations since a given Unix timestamp (seconds since 1970)
-t 1371614483637 = Invocations since a given Java timestamp (milliseconds since 1970)
-t-1d = Invocations in the last day
-t-2h = Invocations in the last two hours
-t-20m = Invocations in the last twenty minutes
-u <timespec> = consider Git invocations up to a given end time (default: now)
-r <owner>/<repository> = consider only queries that match a given owner (user or organization) and repository. You can specify this option multiple times (logical OR).
-o <owner> = consider only queries that match a given owner (user or organization). You can specify this option multiple times (logical OR).
-V <protocol> = consider only queries arriving via a specific protocol (e.g. web branch create, pull request branch delete button, or pull request merge button). You can specify this option multiple times (logical OR).
-P <program> = consider only queries that ran a given Git subprogram (e.g. for-each-ref). You can specify this option multiple times (logical OR).
-I <address> = consider only queries from a specific IP address. -I "" means local operations and is equivalent to -V shell. You can specify this option multiple times (logical OR).
The following are long options for aggregate queries:
--count-only = only show the
--distinct-users = also show the
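For the -t and -u options described above, the UTC-to-epoch conversion can also be done locally instead of with a web tool. The following is a sketch that assumes GNU date; the helper names utc_to_epoch and utc_to_epoch_ms are made up for illustration. The final command is only printed, not executed, so the snippet can be tried on any machine.

```shell
# Hypothetical helpers: turn a UTC date into the two timestamp forms
# that -t and -u accept. Requires GNU date.
utc_to_epoch() {
  date -u -d "$1" +%s                          # seconds since 1970
}
utc_to_epoch_ms() {
  echo "$(( $(utc_to_epoch "$1") * 1000 ))"    # milliseconds since 1970
}

utc_to_epoch '2019-02-26 13:00:00'      # prints: 1551186000
utc_to_epoch_ms '2019-02-26 13:00:00'   # prints: 1551186000000

# Build (but do not run) a query limited to one hour:
start=$(utc_to_epoch '2019-02-26 13:00:00')
end=$(utc_to_epoch '2019-02-26 14:00:00')
echo "ghe-governor aggregate repo count -t $start -u $end"
```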
Now that we know Governor's syntax, let's have a look at typical usage scenarios and example queries.
The overall summary provides the total and average number of Git requests over a recorded period:
The following set of sample commands may help to identify Git traffic patterns or spikes in activity. They make use of the count metric, which is a good reference point for seeing what is requested the most.
ghe-governor aggregate repo count
ghe-governor aggregate repo count -P upload-pack
ghe-governor aggregate cloning count -P upload-pack -r <organization>/<repository>
ghe-governor aggregate repo count -P receive-pack
ghe-governor aggregate user_id count -o <organization> -P upload-pack --distinct-users
To dive a bit deeper, the following queries indicate the actual volume of Git traffic:
ghe-governor aggregate user_id avg_uploaded
ghe-governor aggregate ip avg_uploaded
ghe-governor aggregate repo avg_uploaded
ghe-governor aggregate user_id uploaded_kb
ghe-governor aggregate ip uploaded_kb
ghe-governor aggregate repo uploaded_kb
ghe-governor top uploaded
Furthermore, you might be interested in bursts of concurrent clones. A thundering herd of clones can cause a spike in resource usage. You can check for concurrent clones by aggregating on
max_parallelism (result table column
ghe-governor aggregate repo max_parallelism -P upload-pack
The metrics above go only so far in performance profiling. Governor also collects CPU timing data, which is helpful in diagnosing high CPU utilization caused by Git operations.
Top repositories by CPU time:
ghe-governor aggregate repo cpu
Top programs by CPU time for a single repository:
Using the -r flag, you can also see the CPU breakdown for an individual repository. This time we're interested in the program that used the most CPU time:
ghe-governor aggregate program cpu -r <organization>/<repository>
Top IP addresses by CPU time for a single repository:
Grouping by IP address and CPU time can help to identify continuous integration systems or users that are causing a performance hit:
ghe-governor aggregate ip cpu -r <organization>/<repository>
General Governor records with the most CPU time (not grouped):
ghe-governor top cpu
Sometimes, you want to find out which repository or program caused a specific disk write peak that you've seen. The following commands may be of help here.
ghe-governor aggregate repo disk_write_kb -t <timespec> -u <timespec>
ghe-governor aggregate program disk_write_kb -t <timespec> -u <timespec>
ghe-governor top disk_write
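When the peak time is known from a monitoring graph, the -t and -u values for the commands above can be derived from it. This is a sketch under assumptions: the helper name peak_window, the example peak time, and the 10-minute margin are all made up for illustration, and GNU date is required. The queries are printed rather than executed.

```shell
# Hypothetical helper: print "<start> <end>" Unix timestamps for a window of
# +/- N minutes around a peak time given in UTC. Requires GNU date.
# Arguments: $1 = peak time in UTC, $2 = margin in minutes.
peak_window() {
  peak=$(date -u -d "$1" +%s)
  echo "$((peak - $2 * 60)) $((peak + $2 * 60))"
}

# Example: a peak observed at 13:00 UTC, inspected with 10 minutes either side.
# set -- splits the two timestamps into $1 (start) and $2 (end).
set -- $(peak_window '2019-02-26 13:00:00' 10)
echo "ghe-governor aggregate repo disk_write_kb -t $1 -u $2"
echo "ghe-governor aggregate program disk_write_kb -t $1 -u $2"
```

A narrow window keeps unrelated activity out of the result, which makes it easier to tie a repository or program to the peak.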
Governor ships with GitHub Enterprise Server and provides insight into how your developers use Git and what implications their behavior may have for your GitHub Enterprise Server instance. In Enterprise Support, we regularly rely on Governor to help us answer all kinds of questions related to Git usage. Now you can do the same.
What are your experiences with Governor? Feel free to comment below!