Understanding your graphs part 5 - Network and Storage
In part 4 of our ‘Understanding your graphs’ mini-series, we talked about GitHub Enterprise Application servers and Background job graphs. In part 5, we’re going to talk about GitHub Enterprise Network and Storage graphs.
The network interface graphs can be useful for profiling user activity and the throughput of traffic in and out of the GitHub Enterprise appliance.
- Breaks down the number of clients per TCP port, which is useful for examining how users are interacting with GitHub Enterprise.
- Provides further detail on TCP connection states, which can sometimes be useful in troubleshooting network or load balancer issues.
- The amount of data transferred inbound and outbound from the GitHub Enterprise appliance.
- TX (outbound) traffic is most commonly higher than RX (inbound), especially when many systems are “polling” the API or Git repositories for changes.
- Plateaus in this graph can be an indication of link saturation or reaching the maximum possible link throughput.
- The presence of any errors may indicate a problem with the physical or virtual network card, or with the cables connected to the hypervisor host system.
- The amount of data sent to and received from replica instances over the internal OpenVPN interface.
Replication Interface Errors
- Errors may occur here due to saturation or MTU problems on the physical link; however, these are generally not critical errors.
GitHub Enterprise repository performance is very dependent on the underlying storage system. Low-latency, local SSD storage provides the highest performance. For more information on the GitHub Enterprise storage architecture, please see the System Overview guide on our documentation site.
Disk usage (Root Device)
- Disk space in bytes available for root volume storage.
- Growth on this volume is generally due to logging, which is on a 24-hour rotation schedule.
- The root volume reaching 100% usage can cause a system outage, or indicate a service issue which is causing extreme log growth.
Disk usage (Data Device dm-0)
- Disk space in bytes available for the user data volume.
- All user profile data, pull request and issue metadata, repositories, and release assets are stored on this device.
- The data volume reaching 85% usage will cause problems with the built-in search functionality of GitHub Enterprise. It is recommended to increase the storage capacity of the data volume before it reaches 85% usage.
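The 85% guideline above can be checked with a few lines of Python. This is only a minimal sketch: the function names and the `DATA_USAGE_THRESHOLD` constant are made up for illustration, and the path passed to `check_mount` is an assumption you would replace with wherever the data volume is mounted on your system.

```python
import shutil

# The 85% level mentioned above, expressed as a fraction.
DATA_USAGE_THRESHOLD = 0.85


def usage_fraction(total_bytes: int, used_bytes: int) -> float:
    """Return used space as a fraction of total space (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def needs_more_storage(total_bytes: int, used_bytes: int,
                       threshold: float = DATA_USAGE_THRESHOLD) -> bool:
    """True when usage has reached the threshold."""
    return usage_fraction(total_bytes, used_bytes) >= threshold


def check_mount(path: str) -> bool:
    """Check a mounted volume; `path` is a hypothetical mount point."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return needs_more_storage(usage.total, usage.used)


if __name__ == "__main__":
    # "/" is used here only so the sketch runs anywhere.
    print("expand storage soon:", check_mount("/"))
```

Running a check like this on a schedule gives you a warning before the graph crosses the line, rather than after search starts misbehaving.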
Disk latency (Root Device & Data Device dm-0)
- For best IO performance, average latency values below 10ms are recommended.
- Large spikes may be an indication of storage system saturation.
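To make the 10ms recommendation concrete, here is a rough sketch of how an average latency figure can be derived from two samples of cumulative disk counters (similar in spirit to the counters Linux exposes in `/proc/diskstats`). How the appliance computes its own graph is not covered here, so treat the `DiskSample` type and the function names as assumptions for illustration.

```python
from dataclasses import dataclass

# The recommendation above: average latency should stay below 10 ms.
RECOMMENDED_MAX_LATENCY_MS = 10.0


@dataclass(frozen=True)
class DiskSample:
    """Hypothetical snapshot of cumulative counters for one device."""
    ops_completed: int  # reads + writes completed since boot
    io_time_ms: int     # milliseconds spent servicing those operations


def average_latency_ms(before: DiskSample, after: DiskSample) -> float:
    """Average time per operation between two samples, in milliseconds."""
    ops = after.ops_completed - before.ops_completed
    if ops <= 0:
        return 0.0  # no IO completed in the interval
    return (after.io_time_ms - before.io_time_ms) / ops


def exceeds_recommendation(latency_ms: float,
                           limit: float = RECOMMENDED_MAX_LATENCY_MS) -> bool:
    """True when the average latency is not below the recommended limit."""
    return latency_ms >= limit


if __name__ == "__main__":
    earlier = DiskSample(ops_completed=1000, io_time_ms=4000)
    later = DiskSample(ops_completed=1500, io_time_ms=6500)
    latency = average_latency_ms(earlier, later)
    print(f"{latency:.1f} ms, over limit: {exceeds_recommendation(latency)}")
```

In this example 500 operations took 2500 ms in total, so the average is 5 ms and within the recommendation.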
Disk operations (Root Device)
- A sudden, abnormally large amount of time spent in root volume IO may indicate a logging issue or a general storage problem.
Disk operations (Data Device dm-0)
- A sudden, abnormally large amount of time spent in data volume IO may indicate a repository maintenance issue or a general storage problem.
- Read trends in this graph generally follow the pattern of Git fetch or clone traffic on the system.
Disk pending operations (Root Device)
- Pending disk operations on the root device may indicate storage system saturation for the root volume.
Disk pending operations (Data Device dm-0)
- Pending disk operations on the data device may indicate storage system saturation for the data volume.
Disk traffic (Root Device)
- Write traffic on the root volume is mostly due to logging and collectd graph data collection.
- Read traffic on the root volume is typically very low; however, support bundle generation may cause temporary spikes.
Disk traffic (Data Device dm-0)
- Read and write trends depend on user and integration activity.
- Plateaus in this graph may indicate storage system saturation.
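If you export traffic samples from your own monitoring, a plateau like the ones described above can be spotted with a simple heuristic: look for a sustained run of samples that all sit close to the peak of the series. The sketch below is one possible approach, not how the graphs themselves are produced; `find_plateaus`, `tolerance`, and `min_length` are names and defaults chosen for illustration.

```python
from typing import List, Sequence, Tuple


def find_plateaus(samples: Sequence[float],
                  tolerance: float = 0.05,
                  min_length: int = 5) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for runs of at least
    `min_length` consecutive samples within `tolerance` of the series peak."""
    if not samples:
        return []
    peak = max(samples)
    if peak <= 0:
        return []
    floor = peak * (1.0 - tolerance)

    plateaus: List[Tuple[int, int]] = []
    run_start = None
    for i, value in enumerate(samples):
        if value >= floor:
            if run_start is None:
                run_start = i  # a run near the peak begins here
        else:
            if run_start is not None and i - run_start >= min_length:
                plateaus.append((run_start, i))
            run_start = None
    # Close a run that extends to the end of the series.
    if run_start is not None and len(samples) - run_start >= min_length:
        plateaus.append((run_start, len(samples)))
    return plateaus


if __name__ == "__main__":
    traffic = [10, 40, 80, 100, 99, 98, 100, 97, 99, 30]
    print(find_plateaus(traffic))
```

A short spike that touches the peak once is ignored, while a flat stretch near the peak is reported. A reported run is only a hint that the link or storage system may be saturated, and it is worth confirming against the latency and pending operations graphs.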
Continue the conversation
There’s more to come in the “Understanding your graphs” mini-series. If you’d like to follow along, just subscribe to the “Understanding your graphs” label (link below). Please let us know if you have any questions in the comments.