Is rate limiting for docs.github.com documented anywhere?

Hello! We use GitHub Actions for link checking on our project’s Markdown documentation, and we see the occasional 403 error when checking valid links hosted on docs.github.com or the older help.github.com (which returns proper 3xx redirects). We have examples of this failing in our CI tests in a GitHub Actions workflow on GitHub-hosted runners; we don’t use self-hosted runners. I have found documentation about /api/v3/rate_limit for the REST and GraphQL GitHub services. That endpoint obviously does not apply to docs.github.com, and I cannot find clear documentation on whether GitHub Docs and other services (that are not REST-based or meant for programmatic use) are rate-limited, or how to manage that. Am I missing something? This job runs on our PRs, so I assume it fires fewer than 1,000 times a day, or even per week.
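
To illustrate what I’m comparing: the REST API advertises its limits in X-RateLimit-* response headers (and via the rate-limit endpoint I mentioned), but as far as I can tell the docs.github.com responses carry nothing comparable for a link checker to key off. Here’s a rough, unofficial Python probe; the URLs are just examples and the header filtering is my own guess at what would be relevant:

```python
# Rough probe, not an official GitHub tool: compare what the public REST API
# exposes about rate limits with what docs.github.com sends back.
import urllib.request

def rate_limit_headers(url: str) -> dict:
    """HEAD a URL and return any X-RateLimit-* / Retry-After headers it sends."""
    req = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-check-probe"}
    )
    with urllib.request.urlopen(req) as resp:
        return {
            k: v
            for k, v in resp.headers.items()
            if k.lower().startswith("x-ratelimit") or k.lower() == "retry-after"
        }

# The REST API documents its limits and sends X-RateLimit-* headers ...
print(rate_limit_headers("https://api.github.com/rate_limit"))
# ... while the docs site (example URL) sends no comparable headers, as far as we can see.
print(rate_limit_headers("https://docs.github.com/en"))
```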

Thanks for any help or pointers!

That content is backed by a repository. Instead of attacking it with a spider, you could clone github/docs (the open-source repo behind docs.github.com) and either check against the source or build the site yourself.
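
Very roughly, and purely as a sketch (the content/ layout, the language-prefix stripping, and the example URL are my assumptions about how that repo maps to docs.github.com paths; versioned URLs like /enterprise-server@3.x/ are ignored), checking a link against a local clone could look like:

```python
# Sketch: check a docs.github.com link against a local clone of github/docs
# instead of hitting the live site. Assumes articles live under content/ as
# Markdown files whose paths mirror the URL once the language prefix is dropped.
from pathlib import Path
from urllib.parse import urlparse

DOCS_CLONE = Path("docs")  # e.g. from `git clone https://github.com/github/docs`

def exists_in_clone(url: str) -> bool:
    parts = [p for p in urlparse(url).path.split("/") if p]
    if parts and len(parts[0]) == 2:       # crude language-prefix check ("en", "ja", ...)
        parts = parts[1:]
    if not parts:
        return True                        # site root
    candidate = DOCS_CLONE / "content" / Path(*parts)
    return candidate.with_suffix(".md").exists() or (candidate / "index.md").exists()

# Hypothetical example link from your docs:
print(exists_in_clone("https://docs.github.com/en/get-started/quickstart/hello-world"))
```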

There’s really no good reason to do what you’re doing. And what you’re doing is a trap for anyone who might contribute to your repository.

(Speaking as someone who has contributed to a bunch of repositories with bad link checkers as well as the docs repository.)

I appreciate your feedback here, but we feel there is a good reason to do what we are doing. I take it you don’t have direct experience with the rate limiting behind docs.github.com, or an answer to the immediate question, then? I understand your viewpoint, but it does not address my question.

We do not hand-pick which links to check; we maintain several inter-related projects with documentation for public consumption, and the links often go stale. This approach preserves developer bandwidth (there are only a few of us), and it actually catches stale links more often than not, so I am not sure our team lead is going to debate the merits of the choice with me. We see the merit for now. :slight_smile:

Only to the extent that I’ve probably tripped over it when making changes to the Kubernetes ingress repository, which ran a link checker.

You could file an issue in the github/docs repository asking about it.

They don’t have any public headers that talk about it.

Personally, I’d set up a cache of which URLs I’ve found or not found (and when) using actions/cache, and thereby limit the number of hits they see per time window. It would still catch things as long as entries expired, e.g. weekly.
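
Very roughly, and just as illustration (the file name, the TTL, and the assumption that your workflow restores/saves link-cache.json with actions/cache are all arbitrary choices):

```python
# Sketch of the cache idea: remember which URLs resolved and when, persist that
# JSON file across workflow runs with actions/cache, and only re-hit
# docs.github.com once an entry is older than a week.
import json
import time
import urllib.request
from pathlib import Path

CACHE_FILE = Path("link-cache.json")  # restored/saved by actions/cache in the workflow
TTL_SECONDS = 7 * 24 * 3600           # expire entries weekly

def check_link(url: str, cache: dict) -> bool:
    entry = cache.get(url)
    if entry and time.time() - entry["checked_at"] < TTL_SECONDS:
        return entry["ok"]            # fresh enough: no request sent to the docs site
    try:
        req = urllib.request.Request(
            url, method="HEAD", headers={"User-Agent": "docs-link-check"}
        )
        with urllib.request.urlopen(req, timeout=10) as resp:
            ok = resp.status < 400    # urlopen follows 3xx redirects for us
    except Exception:
        ok = False                    # includes the 403s you're seeing
    cache[url] = {"ok": ok, "checked_at": time.time()}
    return ok

cache = json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else {}
print(check_link("https://docs.github.com/en/actions", cache))
CACHE_FILE.write_text(json.dumps(cache))
```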

While the repository has a lot of commits, overall, pages don’t get renamed/deleted that often.


You make some good points here, thanks! I will look into the cache approach and talking to GH more formally through an issue on their repo. Have a good day! (And I will eventually check out your spell-checker action. :slight_smile: )
