Understanding your graphs part 4 - Application servers and background jobs

GitHub Staff

 

In part 3 of our 'Understanding your graphs' mini-series, we talked about GitHub Enterprise Authentication graphs. In part 4, we're going to talk about GitHub Enterprise Application server and background job graphs.

 

App servers

 

The application servers section provides insight into the activity of the GitHub Enterprise services that provide data to users or integrations.

 

Sessions

App sessions graph

  • Profile of active sessions connected to GitHub Enterprise backend services. This graph summarizes the volume and type of user activity.
  • Web unicorn sessions are often the largest portion of this graph, as users interact via the Web UI and API.

 

Errors

App errors graph

  • High error rates may indicate a problem with a service, or potential saturation due to request volume.
  • Please reach out to GitHub Business Support if you regularly encounter errors on this graph.

 

Active Workers

App active workers graph

  • Number of service workers that are currently serving a request.
  • Daily activity trends for users and integrations are very visible in this graph.
  • Extended plateaus in this graph indicate worker saturation, and should be investigated for any request queueing.
  • Worker counts automatically scale with system memory size at boot.
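
To make "plateaus for extended periods" a little more concrete, here is a minimal sketch of how you might flag them in worker counts you have exported yourself. The sample data, the ceiling of 8 workers, and the function name `find_plateaus` are all assumptions for illustration; nothing here reads from the appliance.

```python
from typing import List, Tuple


def find_plateaus(samples: List[int], ceiling: int, min_length: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs where samples stay at or above
    `ceiling` for at least `min_length` consecutive samples."""
    plateaus = []
    start = None
    for i, value in enumerate(samples):
        if value >= ceiling:
            if start is None:
                start = i  # a possible plateau begins here
        else:
            if start is not None and i - start >= min_length:
                plateaus.append((start, i - 1))
            start = None
    # Handle a plateau that runs to the end of the series.
    if start is not None and len(samples) - start >= min_length:
        plateaus.append((start, len(samples) - 1))
    return plateaus


# Hypothetical per-minute active worker counts, assuming 8 workers in total.
active_workers = [3, 5, 8, 8, 8, 8, 6, 8, 4]
print(find_plateaus(active_workers, ceiling=8, min_length=3))
```

In this made-up series, only the run of four consecutive samples at the ceiling counts as a plateau; the single later spike is too short to be flagged.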

 

Queued Requests

App queued requests graph

  • Values in this graph indicate that requests had to wait for a worker process to become available before they could be processed and served.
  • If requests are constantly queueing, users will notice delays in responsiveness and will encounter errors or timeouts more frequently.
  • Regularly occurring queued requests are a major indicator that the appliance is undersized for the volume of incoming requests.
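
One rough way to judge whether queueing is "regular" is to look at the share of minutes in which any request had to wait. The sketch below assumes you have a list of per-minute queued request counts; the numbers and the helper name `queueing_share` are hypothetical.

```python
def queueing_share(queued_per_minute):
    """Fraction of minutes in which at least one request was queued."""
    if not queued_per_minute:
        return 0.0
    busy_minutes = sum(1 for count in queued_per_minute if count > 0)
    return busy_minutes / len(queued_per_minute)


# Hypothetical sample: ten minutes of queued request counts.
queued = [0, 0, 4, 7, 0, 2, 0, 0, 0, 0]
print(f"{queueing_share(queued):.0%} of minutes had queued requests")
```

What share counts as a problem is a judgment call for your environment; the point is to separate an occasional burst from constant queueing.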

 

App request/response

 

The App request/response section looks at the rate of requests, how quickly those requests are answered, and the status they return.

 

Throughput

App throughput graph

  • Per minute request counts, broken down by type.
  • API requests are typically the largest share on systems with many integrations, or with active CI and project management tools.

 

Response time

App response time graph

  • Reflects the response time of web requests at the 90th percentile, in milliseconds.
  • Response times of more than a few seconds can indicate a poor user experience due to long browser load times or slow API responses.
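
If the 90th percentile is unfamiliar, the sketch below shows one common way to compute it (the nearest-rank method) on made-up response times. This is only an illustration of the idea, and we are assuming nothing about how the graph itself calculates the value.

```python
import math


def percentile_nearest_rank(values, pct):
    """Smallest value such that at least `pct` percent of the samples
    are less than or equal to it (nearest-rank method)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical response times in milliseconds, including one slow outlier.
response_times_ms = [120, 95, 110, 3400, 105, 130, 98, 102, 115, 125]

p90 = percentile_nearest_rank(response_times_ms, 90)
mean = sum(response_times_ms) / len(response_times_ms)
print(f"p90 = {p90} ms, mean = {mean:.0f} ms")
```

With this sample, the single 3400 ms request pulls the mean far above what most users experienced, while the 90th percentile stays close to the typical response time. That is why a sustained rise in the percentile is worth attention: it means a meaningful share of requests are slow, not just one.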

 

CPU Time

App CPU time graph

  • Time spent in Ruby garbage collection within the GitHub Enterprise web application.
  • Plateaus for extended periods of GC time may indicate a problem with the GitHub Enterprise application itself.

 

I/O Time

App IO Time graph

  • Time spent on disk I/O by the data services that GitHub Enterprise depends on.
  • Extended plateaus may indicate system resource saturation.

 

Response Code

Response code graph

  • The number of responses per HTTP status code.
  • 2xx successful status codes will normally be the largest.
  • 401 Unauthorized codes will also appear in environments with API and Git over HTTP traffic, as initial requests from clients may not provide authentication headers.
  • 500 statuses indicate a potential issue with the GitHub Enterprise application, and should be investigated with support.
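
As a small illustration of reading this graph, the sketch below groups a list of status codes by class, the way you might when summarizing your own log extracts. The sample codes are invented, and grouping all 5xx codes together is our assumption for the example.

```python
from collections import Counter


def summarize_status_codes(codes):
    """Count responses per status class, e.g. 200 and 204 both count as '2xx'."""
    return dict(Counter(f"{code // 100}xx" for code in codes))


# Hypothetical sample of HTTP status codes.
sample_codes = [200, 200, 401, 200, 304, 500, 200, 401, 404]

summary = summarize_status_codes(sample_codes)
server_errors = summary.get("5xx", 0)
print(summary)
print(f"server errors: {server_errors} of {len(sample_codes)} responses")
```

In a healthy sample like this one, 2xx dominates, a few 401s show up from unauthenticated first requests, and server errors are rare.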

 

Errors

App errors graph

  • Represents the number of application exceptions generated per minute.
  • High rates of errors may indicate an issue impacting the GitHub Enterprise application.

 

Background jobs

  • Number of tasks queued for background processing on the GitHub Enterprise appliance.

 

Resque

Resque graph

  • Many user and application actions trigger jobs which run asynchronously on GitHub Enterprise, and are queued to be processed by resqued.
  • Workers which process the maint_git-serv queues are paused during GitHub Enterprise Backup Utilities snapshot runs. It is normal to see the number for this queue increase while a snapshot is in progress. The queue should then drain rather quickly once the snapshot run is complete.
  • Because there is a finite number of resque worker processes, queues which never drain to 0 may indicate resource saturation or, in some cases, stuck jobs that require manual intervention to clear.
  • Many queues simultaneously having hundreds or thousands of jobs pending can indicate resource saturation. Queue length can also be inspected from the SSH admin console by running ghe-resque-info.
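
To illustrate the "never drains to 0" check, here is a minimal sketch that works on queue depths you have recorded over time. The queue names and numbers are hypothetical, and the code does not parse ghe-resque-info output, since we are not assuming anything about its format.

```python
def queues_never_drained(history):
    """Return the names of queues whose recorded depth never reached 0.

    `history` maps a queue name to a list of depth samples over time.
    """
    return sorted(
        name for name, depths in history.items()
        if depths and min(depths) > 0
    )


# Hypothetical queue depth samples; the queue names are made up.
history = {
    "example_queue_a": [12, 40, 3, 0, 5],        # drains to 0 at one point
    "example_queue_b": [150, 220, 310, 280, 400],  # never drains
    "example_queue_c": [0, 0, 0, 0, 0],          # idle
}

print(queues_never_drained(history))
```

A queue that shows up in this list over a long window is a candidate for the saturation or stuck-job cases described above; a queue that spikes but returns to 0, such as one paused during a backup snapshot, is not.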

 

E-mail

Emails graph

  • When E-mail notifications are enabled, this graph displays the length of the onboard postfix mail queues.
  • High numbers of deferred E-mail messages may indicate a problem with the configured SMTP server, or failures in mail delivery to specific user E-mail addresses.

 

Continue the conversation

 

There's more to come in the "Understanding your graphs" mini-series. If you'd like to follow along, just subscribe to the "Understanding your graphs" label (link below). Please let us know if you have any questions in the comments.