GraphQL GitHub API: How to get more than 1000 Pull Requests?

Hi,

I’m unable to get more than 1000 PRs from the GraphQL API. I’m aware of the resource limitations, but 1000 nodes is far from the limit of 50000 stated in that document. I think I’m incorrectly using pagination, but can’t figure out where exactly I’m wrong here.

Can anybody help me to understand the reason?

Here is a simplified version of my code:

#!/usr/bin/env python

from __future__ import print_function

import os
import requests
import sys
import time

QUERY = """
{
  search(query: "is:pr repo:kubernetes/kubernetes created:>2016-01-01", type: ISSUE, first: 100, %s) {
    issueCount
    pageInfo {
        endCursor
        hasNextPage
    }
    edges {
      cursor
      node {
        ... on PullRequest {
          number
        }
      }
    }
  }
}
"""

def main(argv):
    if len(argv) > 1:
        token = argv[1]
    else:
        token = os.getenv('GITHUB_TOKEN')
    if not token:
        print("Usage: prs <github token>")
        return 1

    after = ''
    while True:
        request = requests.post('https://api.github.com/graphql',
                                json={'query': QUERY % after},
                                headers={"Authorization": "Bearer %s" % token})
        result = request.json()

        if 'errors' in result:
            print(result)
            return 1

        search = result['data']['search']

        print("Issue count:", search['issueCount'], "number of edges:", len(search['edges']))

        print("PageInfo:", search['pageInfo'], "cursor:", search['edges'][-1]['cursor'], "\n")

        if not search['pageInfo']['hasNextPage']:
            print("done")
            break

        after = 'after: "%s"' % search['edges'][-1]['cursor']
        time.sleep(1)

if __name__ == '__main__':
    sys.exit(main(sys.argv))

and its output:

$ ./prs-test
Issue count: 33754 number of edges: 100
PageInfo: {u'endCursor': u'Y3Vyc29yOjEwMA==', u'hasNextPage': True} cursor: Y3Vyc29yOjEwMA==

Issue count: 33754 number of edges: 100
PageInfo: {u'endCursor': u'Y3Vyc29yOjIwMA==', u'hasNextPage': True} cursor: Y3Vyc29yOjIwMA==

Issue count: 33754 number of edges: 100
PageInfo: {u'endCursor': u'Y3Vyc29yOjMwMA==', u'hasNextPage': True} cursor: Y3Vyc29yOjMwMA==

Issue count: 33754 number of edges: 100
PageInfo: {u'endCursor': u'Y3Vyc29yOjQwMA==', u'hasNextPage': True} cursor: Y3Vyc29yOjQwMA==

Issue count: 33754 number of edges: 100
PageInfo: {u'endCursor': u'Y3Vyc29yOjUwMA==', u'hasNextPage': True} cursor: Y3Vyc29yOjUwMA==

Issue count: 33754 number of edges: 100
PageInfo: {u'endCursor': u'Y3Vyc29yOjYwMA==', u'hasNextPage': True} cursor: Y3Vyc29yOjYwMA==

Issue count: 33754 number of edges: 100
PageInfo: {u'endCursor': u'Y3Vyc29yOjcwMA==', u'hasNextPage': True} cursor: Y3Vyc29yOjcwMA==

Issue count: 33754 number of edges: 100
PageInfo: {u'endCursor': u'Y3Vyc29yOjgwMA==', u'hasNextPage': True} cursor: Y3Vyc29yOjgwMA==

Issue count: 33754 number of edges: 100
PageInfo: {u'endCursor': u'Y3Vyc29yOjkwMA==', u'hasNextPage': True} cursor: Y3Vyc29yOjkwMA==

Issue count: 33754 number of edges: 100
PageInfo: {u'endCursor': u'Y3Vyc29yOjEwMDA=', u'hasNextPage': False} cursor: Y3Vyc29yOjEwMDA=

done

Thank you,

Ed


Hi Ed,

Apologies for the delayed response. Are you still having issues with pagination?

I am facing the same issue.

It seems the node in position 1000 just doesn't have a next cursor to go to.

There is very little documentation or support for this problem.

Please assist.

I’m facing this too.

This limitation should be documented.


+1 facing the same issue


Thank you, I was waiting for this information.


Same issue. I get nothing after the 1000th node. Did anyone find a workaround?


I am facing the same issue when querying repositories:

query($language: String!, $endCursor: String, $numberOfRepos: Int){
  search(query: $language, type: REPOSITORY, first: $numberOfRepos, after: $endCursor) {
    pageInfo {
      startCursor
      hasNextPage
      endCursor
    }
    edges {
      node {
        ... on Repository {
          nameWithOwner
          isArchived
          stargazers{
              totalCount
          }
        }
      }
    }
  }
}

and my input variables are

{
	"language": "stars:>=500 sort:stars",
	"endCursor": "Y3Vyc29yOjk5OQ==",
	"numberOfRepos": 1
}

and this is the response I got

{
    "data": {
        "search": {
            "pageInfo": {
                "startCursor": "Y3Vyc29yOjEwMDA=",
                "hasNextPage": false,
                "endCursor": "Y3Vyc29yOjEwMDA="
            },
            "edges": [
                {
                    "node": {
                        "nameWithOwner": "foreversd/forever",
                        "isArchived": false,
                        "stargazers": {
                            "totalCount": 12814
                        }
                    }
                }
            ]
        }
    }
}

I believe this is not correct, as I find it hard to believe that the next repo has fewer than 500 stars.
I have also tried querying repos with fewer than 5 stars, but the result is the same.

Any ideas? Thanks

PS: I think that was the 1000th node as well.

Using the same setup, the API v4 is still only returning the first 1000 results of a search query. Is this intended behaviour?


@andreagriffiths11 Could you guys please address this issue? Thanks


Hi @matteogioioso,

Thanks for the ping! The behavior you observed is expected – it’s not possible to fetch more than 1000 results for a specific search query. This is true for the web UI, REST API and GraphQL API. There’s no way around this restriction.

However, there is a different approach you might use: slice your query, e.g. based on the date the pull request was created.

For example: 

{
  search(query: "repo:atom/atom is:pr is:open created:>2020-04-01", type: ISSUE, last: 100) {
    edges {
      node {
        ... on PullRequest {
          url
          title
          createdAt
        }
      }
    }
  }
}

You’d then run multiple searches to collect the results for each slice (keeping the number of results per slice under 1000 by limiting the query to a small period), combine the results for all slices, and finally sort the whole list as you like. I know this is not an ideal solution, and I will pass the feedback on to the team to take a look.
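A minimal sketch of that slicing idea in Python (the repo, window size, and helper names here are illustrative assumptions, not part of the official answer):

```python
#!/usr/bin/env python3
"""Sketch: work around the 1000-result search cap by slicing on created date.

Each generated query covers one non-overlapping date window; run the paginated
GraphQL search for each query, then merge the per-slice results.
"""
import datetime


def date_windows(start, end, days=30):
    """Yield contiguous, non-overlapping (from, to) date pairs covering
    [start, end], both endpoints inclusive, in fixed-size slices."""
    step = datetime.timedelta(days=days)
    one = datetime.timedelta(days=1)
    cur = start
    while cur <= end:
        nxt = min(cur + step - one, end)
        yield cur, nxt
        cur = nxt + one


def build_query(window):
    """Build a search query limited to one creation-date window.

    `created:a..b` is inclusive on both ends, which is why date_windows
    produces non-overlapping windows (no duplicates across slices)."""
    frm, to = window
    return 'repo:kubernetes/kubernetes is:pr created:%s..%s' % (
        frm.isoformat(), to.isoformat())


if __name__ == '__main__':
    start = datetime.date(2016, 1, 1)
    end = datetime.date(2016, 3, 1)
    for w in date_windows(start, end):
        # For each query string, run the paginated GraphQL search from the
        # original script, then combine and sort the results of all slices.
        print(build_query(w))
```

If a single window still returns more than 1000 results, shrink `days` until every slice fits under the cap.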

Hope this helps.


@andreagriffiths11 Thank you very much for the response. Maybe I missed it, but is this mentioned somewhere in the documentation?

Thanks

Unfortunately, it is still not possible to paginate over more than 1000 results. Is there any way to implement some kind of “IF” condition in the query? For example, we could fetch all issues with titles longer than 10 characters, then those with 10 or fewer, which would double the number of returned issues.