Want to capture full spectrum of contributions but foiled by rate and return limits of API calls

We are trying to understand the full breadth of community contributions to a repo, both cumulative and by specific time period. We started writing a script, but quickly found that each call returns only 30 results. That sounds like a pagination issue, but it looks like 1000 is the limit on query results anyway. To understand the cumulative contributions, we’ll need to be able to retrieve *all* of the results for the history of the repo.

Any ideas for how to get around this?

It would be great if the GitHub API Statistics call could include comments and other contributions in addition to commits.

Hi @heidimg,

Thank you for being here! 

> but looks like 1000 is the limit on query results anyway. To be able to understand the cumulative contributions, we’ll need to be able to retrieve *all* of the results for the history of the repo.


Yes, that is correct. The limit is documented here: https://developer.github.com/v3/search/#search

You’ll also see the same limit in the web UI – you can’t fetch more than 1000 results via Search.
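On the 30-results-per-call point: that is the default page size, and the REST API accepts `per_page` values up to 100 together with a `page` parameter. Here is a minimal sketch of the request URLs for one query, assuming the `search/issues` endpoint. The `page_urls` helper and the `repo:owner/name` query are placeholders of mine, not something from your script. Even with paging, the last page still stops at the 1000-result limit.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.github.com/search/issues"
PER_PAGE = 100      # largest page size the REST API accepts (default is 30)
SEARCH_CAP = 1000   # Search returns no results past this many per query


def page_urls(query):
    """Build the request URL for every reachable page of one search query.

    Hypothetical helper: it only constructs URLs and sends no requests.
    """
    last_page = SEARCH_CAP // PER_PAGE  # 10 pages of 100 results each
    return [
        f"{API_ROOT}?{urlencode({'q': query, 'per_page': PER_PAGE, 'page': page})}"
        for page in range(1, last_page + 1)
    ]


# "repo:owner/name" is a placeholder; substitute the repository you care about.
urls = page_urls("repo:owner/name")
```

Each URL can then be fetched with whatever HTTP client you already use. Authenticated requests get a higher rate limit than anonymous ones.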

The filters which code search supports are listed here:

https://help.github.com/en/articles/searching-code

You can use those filters to slice your search query into several smaller queries, run each of them (ideally each returns fewer than 1000 results), and then combine the results so that you have them all.
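One way to do the slicing is by date, sketched roughly below. It assumes you are searching issues or pull requests, where a `created:` date-range qualifier is available (code search does not have one). `split_windows`, `qualifier`, and `count_fn` are hypothetical names: `count_fn` stands in for a call that returns the `total_count` the Search API reports for a given date range.

```python
from datetime import date, timedelta

SEARCH_CAP = 1000  # maximum results Search returns for a single query


def split_windows(start, end, count_fn, cap=SEARCH_CAP):
    """Return (start, end) date windows whose result counts fit under cap.

    count_fn(start, end) is a hypothetical callable returning how many
    results a query restricted to that inclusive date range would match.
    """
    total = count_fn(start, end)
    if total <= cap or start == end:
        # Small enough to fetch, or a single day that cannot be split by date.
        return [(start, end)]
    mid = start + timedelta(days=(end - start).days // 2)
    return (split_windows(start, mid, count_fn, cap)
            + split_windows(mid + timedelta(days=1), end, count_fn, cap))


def qualifier(start, end, field="created"):
    """Format a date-range qualifier, e.g. created:2019-01-01..2019-01-31."""
    return f"{field}:{start.isoformat()}..{end.isoformat()}"


def fake_count(start, end):
    """Stand-in for a real count request: pretends 10 results exist per day."""
    return 10 * ((end - start).days + 1)


windows = split_windows(date(2019, 1, 1), date(2019, 1, 4), fake_count, cap=25)
queries = [qualifier(s, e) for s, e in windows]
```

Appending each qualifier to your base query gives one smaller search per window. A single day with more than 1000 results would still be truncated, so that case needs a different filter to split on.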

If you’re unable to get search to give you the results you need, you might consider using an external project like https://cloud.google.com/blog/products/gcp/github-on-bigquery-analyze-all-the-open-source-code to get the data you need.

I hope this helps!
