Understanding your graphs part 4 - Application servers and background jobs


In part 3 of our ‘Understanding your graphs’ mini-series, we covered the GitHub Enterprise Authentication graphs. In part 4, we look at the GitHub Enterprise application server and background job graphs.

App servers

The application servers section provides insight into the activity of GitHub Enterprise services that serve data to users and integrations.



App sessions graph

  • Profile of active sessions connected to GitHub Enterprise backend services. This graph summarizes the volume and type of user activity.
  • Web unicorn sessions are often the largest portion of this graph, as users interact through the web UI and API.



App errors graph

  • High error rates may indicate a problem with a service or potential saturation due to request volume.
  • Please reach out to GitHub Business Support if you regularly see errors on this graph.


Active Workers

App active workers graph

  • Number of service workers currently serving a request.
  • Daily activity trends from users and integrations are clearly visible in this graph.
  • Extended plateaus in this graph indicate worker saturation and should be investigated for request queuing.
  • Worker counts scale automatically with system memory size at boot.
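The plateau check described above can be automated if you export the active-worker series. The sketch below is a hypothetical helper, assuming one sample per minute, a worker capacity you already know for your appliance, and illustrative defaults (95% of capacity for at least 15 minutes) that are not GitHub Enterprise settings.

```python
from typing import List, Optional, Tuple


def find_saturation_plateaus(
    active_workers: List[int],
    worker_capacity: int,
    threshold: float = 0.95,  # assumed: "saturated" means >= 95% of capacity
    min_minutes: int = 15,    # assumed: shortest run worth flagging
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indexes (end exclusive) of runs where the
    active worker count stayed at or above threshold * worker_capacity for at
    least min_minutes consecutive one-minute samples."""
    limit = threshold * worker_capacity
    plateaus: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, value in enumerate(active_workers):
        if value >= limit:
            if start is None:
                start = i  # a saturated run begins here
        else:
            if start is not None and i - start >= min_minutes:
                plateaus.append((start, i))
            start = None  # the run (if any) has ended
    # A run may still be open at the end of the series.
    if start is not None and len(active_workers) - start >= min_minutes:
        plateaus.append((start, len(active_workers)))
    return plateaus


# Hypothetical data: 16 workers, fully busy for 20 minutes in a row.
samples = [10] * 5 + [16] * 20 + [8] * 5
print(find_saturation_plateaus(samples, worker_capacity=16))  # [(5, 25)]
```

A short burst at capacity is normal; only runs that persist long enough to be flagged are worth correlating with the queued requests graph.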


Queued Requests

App queued requests graph

  • Values in this graph mean that requests had to wait for a worker process to become available before they could be processed and served.
  • If requests are constantly queuing, users will notice slower responses and encounter errors or timeouts more frequently.
  • Regularly queued requests are a major indicator that the appliance is undersized for its volume of incoming requests.
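As a rough way to put a number on "regularly", the hypothetical helper below computes the fraction of one-minute samples in which any request was queued. The sampling interval and the 5% cut-off are assumptions for illustration, not GitHub guidance.

```python
from typing import Sequence


def queued_minutes_ratio(queued_requests: Sequence[int]) -> float:
    """Fraction of one-minute samples in which at least one request was queued."""
    if not queued_requests:
        return 0.0
    minutes_with_queue = sum(1 for value in queued_requests if value > 0)
    return minutes_with_queue / len(queued_requests)


def looks_undersized(queued_requests: Sequence[int], max_ratio: float = 0.05) -> bool:
    """Hypothetical rule of thumb: flag the appliance when requests queued in
    more than max_ratio of the sampled minutes."""
    return queued_minutes_ratio(queued_requests) > max_ratio


# Hypothetical data: requests queued in 10 of 100 sampled minutes.
samples = [0] * 90 + [3] * 10
print(queued_minutes_ratio(samples))  # 0.1
print(looks_undersized(samples))      # True
```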


App request/response

The application request/response section shows the rate of requests, how quickly those requests are answered, and which status codes they return.



App throughput graph

  • Per-minute request counts, broken down by type.
  • API requests are typically the highest on systems with many integrations or with active CI and project management tools.
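To see which request type dominates over a window, you can sum the per-minute counts and compute each type's share. In this sketch, the type labels ("api", "web") and the list-of-dicts input format are assumptions; use whatever categories your export actually contains.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def request_type_shares(per_minute: Iterable[Mapping[str, int]]) -> Dict[str, float]:
    """Sum per-minute request counts by type and return each type's share of the total."""
    totals: Counter = Counter()
    for minute in per_minute:
        totals.update(minute)  # adds this minute's counts to the running totals
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {kind: count / grand_total for kind, count in totals.items()}


# Hypothetical two minutes of data from an integration-heavy appliance.
minutes = [{"api": 300, "web": 100}, {"api": 500, "web": 100}]
print(request_type_shares(minutes))  # {'api': 0.8, 'web': 0.2}
```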


Response time

App response time graph

  • The 90th percentile response time of web requests, in milliseconds.
  • Times of over a few seconds can indicate a poor user experience due to long browser load times, or slow API responses.
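The 90th percentile is the value that 90% of requests complete at or below, so a handful of very slow requests does not move it much. The exact percentile method GitHub Enterprise uses isn't stated here; this sketch uses the common nearest-rank definition on a list of response times in milliseconds.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples_ms: Sequence[float], pct: float = 90.0) -> float:
    """Nearest-rank percentile: the smallest sample such that at least pct percent
    of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical response times (ms): one slow outlier among ten requests.
times = [120, 80, 95, 4000, 110, 100, 90, 130, 105, 85]
print(percentile_nearest_rank(times))  # 130
```

Here the single 4,000 ms request does not show up at p90; a p90 of several seconds therefore means that a substantial share of requests are slow, not just one.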


CPU Time

App CPU time graph

  • Time spent in Ruby garbage collection within the GitHub Enterprise web application.
  • Extended plateaus in GC time may indicate a problem with the GitHub Enterprise application itself.


I/O Time

App IO Time graph

  • Time spent on disk I/O by the data services that GitHub Enterprise depends on.
  • Extended plateaus may indicate system resource saturation.


Response Code

Response code graph

  • The number of responses per HTTP status code.
  • 2xx (success) status codes will normally make up the largest share.
  • 401 Unauthorized codes will also appear in environments with API and Git over HTTP traffic, because initial requests from clients may not include authentication headers.
  • 500 statuses indicate a potential issue with the GitHub Enterprise application and should be investigated with support.
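When reviewing the response code graph, it can help to collapse individual codes into classes. The helper names and the dict input format below are hypothetical; the check for 500 responses simply mirrors the guidance in the list above.

```python
from collections import Counter
from typing import Dict, Mapping


def status_class_counts(by_code: Mapping[int, int]) -> Dict[str, int]:
    """Collapse per-status-code response counts into classes such as 2xx and 4xx."""
    classes: Counter = Counter()
    for code, count in by_code.items():
        classes[f"{code // 100}xx"] += count
    return dict(classes)


def needs_support_review(by_code: Mapping[int, int]) -> bool:
    """True when any 500 responses were recorded."""
    return by_code.get(500, 0) > 0


# Hypothetical counts for one window.
counts = {200: 9000, 304: 500, 401: 400, 404: 50, 500: 3}
print(status_class_counts(counts))   # {'2xx': 9000, '3xx': 500, '4xx': 450, '5xx': 3}
print(needs_support_review(counts))  # True
```

A steady share of 401s alongside Git over HTTP traffic is expected, so it is the 5xx class, not 4xx, that this sketch singles out.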



App errors graph

  • Represents the number of application exceptions generated per minute.
  • High rates of errors may indicate an issue impacting the GitHub Enterprise application.


Background jobs

This section shows the number of tasks queued for background processing on the GitHub Enterprise appliance.



Resque graph

  • Many user and application actions trigger jobs that run asynchronously on GitHub Enterprise; these jobs are queued for processing by resqued.
  • Workers that process the maint_git-serv queues are paused during GitHub Enterprise Backup Utilities snapshot runs. It is normal to see this queue grow while a snapshot is in progress; it should then drain quickly once the snapshot run is complete.
  • Because there is a finite number of resque worker processes, queues that never drain to 0 may indicate resource saturation or, in some cases, stuck jobs that require manual intervention to clear.
  • Many queues simultaneously having hundreds or thousands of jobs pending can indicate resource saturation. Queue length can also be inspected from the SSH admin console by running ghe-resque-info.
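The two resque warning signs above (queues that never reach 0, and many queues backlogged at once) can be checked against sampled queue lengths. This sketch does not parse ghe-resque-info output, whose format is not assumed here; it takes per-queue samples you have collected some other way. Apart from maint_git-serv, the queue names are hypothetical, as are the thresholds of 100 pending jobs across 3 queues. Remember to disregard maint_git-serv growth while a backup snapshot is running.

```python
from typing import List, Mapping, Sequence


def queues_never_draining(history: Mapping[str, Sequence[int]]) -> List[str]:
    """Queues whose pending-job count never reached 0 in any sample."""
    return sorted(
        name for name, samples in history.items() if samples and min(samples) > 0
    )


def backlogged_queues(latest: Mapping[str, int], threshold: int = 100) -> List[str]:
    """Queues currently holding at least `threshold` pending jobs."""
    return sorted(name for name, pending in latest.items() if pending >= threshold)


def looks_saturated(latest: Mapping[str, int], threshold: int = 100, min_queues: int = 3) -> bool:
    """Hypothetical rule: many queues backlogged at the same time suggests saturation."""
    return len(backlogged_queues(latest, threshold)) >= min_queues


# Hypothetical samples; only maint_git-serv is a real queue name from this post.
history = {
    "maint_git-serv": [0, 40, 0],   # grew during a backup snapshot, then drained
    "example_queue_a": [5, 7, 6],   # never reached 0
    "example_queue_b": [0, 0, 0],
}
print(queues_never_draining(history))  # ['example_queue_a']

latest = {"example_queue_a": 500, "example_queue_b": 1200, "example_queue_c": 300, "maint_git-serv": 2}
print(looks_saturated(latest))  # True
```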



Emails graph

When E-mail notifications are enabled, this graph displays the length of the onboard Postfix mail queues.

A high number of deferred E-mail messages may indicate a problem with the configured SMTP server, or delivery failures to specific user E-mail addresses.


Continue the conversation

There’s more to come in the “Understanding your graphs” mini-series. If you’d like to follow along, subscribe to the “Understanding your graphs” label. Please let us know in the comments if you have any questions.


@djdefi This document is very helpful for understanding the graphs.

It would be really helpful if you could add information about hookshot workers to the background jobs section and help us understand what they do and how important they are. How often do we need to check the number of hookshot jobs? At what number should we worry about jobs getting queued if the external endpoint is not responding in time? This would really help us. Thanks
