Inconsistent error: "Something went wrong while executing your query. This may be the result of a timeout, or it could be a GitHub bug"

Hey folks,

We’re working on Hacktoberfest and are noticing a rather inconsistent error from the GraphQL API: "Something went wrong while executing your query. This may be the result of a timeout, or it could be a GitHub bug." (Each occurrence also includes an ID; I’ll include a few below.)

We’re authenticating users through OAuth, and we use each user’s own token to make an identical GraphQL request that fetches their 100 most recent PRs:

query ($nodeId: ID!) {
  node(id: $nodeId) {
    ... on User {
      pullRequests(last: 100) {
        nodes {
          additions
          authorAssociation
          changedFiles
          closed
          commits {
            totalCount
          }
          createdAt
          deletions
          id
          isDraft
          locked
          labels(first: 100) {
            edges {
              node {
                name
              }
            }
          }
          merged
          reviewDecision
          repository {
            nameWithOwner
            repositoryTopics(first: 100) {
              edges {
                node {
                  topic {
                    name
                  }
                }
              }
            }
          }
          title
          url
        }
      }
    }
  }
}
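
For context, each run is just a single POST to the GraphQL endpoint, authenticated with that user’s token. Here is a simplified sketch of roughly what our code does (function and variable names are illustrative, not our actual implementation):

// Rough sketch only: run the query above for one user, authenticated with their own OAuth token.
const fetch = require('node-fetch'); // assumption: node-fetch, but any HTTP client works the same way

const GITHUB_GRAPHQL_URL = 'https://api.github.com/graphql';

// PULL_REQUESTS_QUERY holds the GraphQL document shown above (omitted here for brevity)
async function fetchUserPullRequests(nodeId, userToken) {
  const res = await fetch(GITHUB_GRAPHQL_URL, {
    method: 'POST',
    headers: {
      Authorization: `bearer ${userToken}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ query: PULL_REQUESTS_QUERY, variables: { nodeId } }),
  });

  const { data, errors } = await res.json();
  if (errors) {
    // This is where we see "Something went wrong while executing your query..."
    throw new Error(errors.map((e) => e.message).join('; '));
  }
  return data.node.pullRequests.nodes;
}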

As mentioned above, we run this same query for every user who authenticates with us, using their own token, and for the vast majority the request comes back successfully. However, for ~530 users as of writing, the query has failed with the above error.

I have tested the query with my own account, which returns a full 100 PRs without error, so it doesn’t seem to be a request-size issue, and I’m stumped as to what’s going on. Any insight would be appreciated; we’re worried this will impact some participants if we can’t fetch their PRs accurately.

For anyone in GH Eng/Support, here’s a sample of a few of the error IDs:

  • 10FB:2ACC:2897F3:58C825:615374C3
  • 9E5C:237C:F966B:286292:615372D8
  • D302:1BDC:272042:4E10FE:61536C43
  • 501A:6064:264267:4CA97A:615367D2
  • A4F1:3311:228991:488C69:615364DA

Cheers, Matt.


Another nugget of information to help eng/support diagnose this:

We had one user who was consistently getting this error whenever we attempted to run this query against their account using their token.

The user unchecked the ‘Include private contributions on my profile’ setting under the ‘Contributions’ heading of their profile settings (github.com/settings/profile), and the query we make against the API to fetch their PRs is now working.


As an aside, since originally posting, we’ve now seen this error on over 2k different user accounts.

As a further update, we’ve since had other users that we’ve found to be consistently getting this error, and disabling ‘Include private contributions on my profile’ did not resolve it for them.

We’ve now seen over 21k instances of this error across roughly 4k different user accounts. The vast majority of these accounts only see the error intermittently, but we do have multiple confirmed users who are consistently getting it, and it is preventing us from tracking their pull requests so they can participate in Hacktoberfest.

Hi @mattipv4 :wave:

First off, I’m so sorry for the delay in response here, and thank you for continuing to update this thread with helpful information!

Next, I want to ask whether you’re experiencing this on an instance of GHES? If so, we should get this logged as a ticket with our Enterprise Support team.

Following that path will provide stricter SLA times for replies to your submissions. Here, we may miss it and end up leaving it to Community members themselves to help out.

If this is not occurring inside of GHES but with standard user accounts (your reproduction with your personal account suggests it isn’t GHES), then I’m wondering: does this happen from a GitHub App, or from a manual run of these queries?


That said, I took a look at your request IDs (thank you for sharing!) and it looks to be coming from a Slack bot. At least, that’s what I found for 10FB:2ACC:2897F3:58C825:615374C3.

I wasn’t able to find log entries for your other request IDs =(

Were they collected more than 7 days ago?

For the ID that does return data and suggests this is being performed by a Slack bot: Is that interacting with an instance of GHES? It wasn’t clear to me in the log entry, unfortunately.


However, GraphQL requests tend to be very heavy, and this is a fairly clear example of a heavy query that might do well with being processed in smaller chunks. I’m guessing (yes, guessing) that these are timeouts, and that the error message you’re receiving is less than helpful in identifying that it’s timing out.

Is it possible to break these into batches of requests, rather than running them all at once?

Or perhaps I misunderstand…


And to reiterate, if you’re running against an instance of GitHub Enterprise, it would be best to reach out to the Enterprise Support team. They’ll be able to look at more details and respond more directly than we would feel comfortable doing publicly.

Cheers!


Hey!

Thanks for responding :slight_smile:

This is happening with the GitHub.com GraphQL API, not a GHE instance.

The queries are running authenticated as each user, through the token we get when the user authenticates with the Hacktoberfest GitHub app via OAuth.

We have two different processes running these queries; both run the same query against the GitHub API in the same way, just with different triggers:

  1. We have a batch processing job that runs every 6 hours, iterating through users sequentially and fetching their PRs via this query (a rough sketch of this loop follows below)
  2. Users can also trigger a manual refresh of their PRs through a web request to our API, which then triggers this query (we rate-limit this to once per user every 15 minutes)

(Both are Node.js apps: one is a Restify server, the other a plain Node script running as a background job; neither is a Slack bot.)
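
For reference, the batch job is shaped roughly like this (a simplified, illustrative sketch rather than our actual code, reusing the fetchUserPullRequests sketch from my first post):

// Rough sketch of the 6-hourly batch job: iterate users sequentially,
// fetching each user's PRs with their own token.
async function runBatch(users) {
  for (const user of users) {
    try {
      const pullRequests = await fetchUserPullRequests(user.githubNodeId, user.oauthToken);
      await savePullRequests(user, pullRequests); // illustrative persistence helper
    } catch (err) {
      // The intermittent "Something went wrong while executing your query" errors surface here.
      console.error(`Failed to fetch PRs for user ${user.githubNodeId}:`, err.message);
    }
  }
}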

Neither feels like it should be generating significant enough traffic to be overwhelming GitHub’s API to the point of timeouts (and if that were the case would it not cause wider issues across the public API?) – or do you mean that each request in itself is too complex, and that we should batch the PR fetching for each user into smaller chunks rather than just one request for 100 PRs?

Here are a few fresh IDs from our logs to help in digging:

  • F6B6:3AE5:6E86E8:13EA4BE:615CC700
  • 94B4:764B:99B08A:12F7C93:615CC5CD
  • AA98:1F31:929838:1224776:615CC4D5
  • BF27:2C05:835988:15E735A:615CC317

Those are all from the last 30 minutes or so.

Cheers, Matt.

Hi @nethgato,

I’m one of those affected by the issue, i.e. my PRs do not show up on my Hacktoberfest profile. Do let me know whether you’d like me to do/test anything under my account.

Cheers,
rjc

FWIW, it is working now.

Hi. Not sure if this is related, but when I visit Hacktoberfest 2021, my profile never loads. It just says, “PROFILE Loading…”.

There are some errors in the console, but I wouldn’t know where to begin to try to find something useful from them.

Anyway, I know there are still a few weeks to go, but I’d love to get things working sooner rather than later. Any suggestions/advice/help…?

I should add that I was sent here after speaking with someone a few times over the last few days from the Hacktoberfest Discord server…so I may or may not be in the right place. :stuck_out_tongue_winking_eye:

Hi all. Sorry for the “intrusion”, but I just went back to the page, cleared all app data via dev tools, refreshed, went through the registration steps, and all looks well.

We’re definitely still seeing lots of occurrences of the error from GitHub across other accounts. @rjc, did you change anything on your account that might explain why it can now be processed?

Sweet, definitely unrelated to this thread :+1:

Hey @mattipv4 :wave:

Thanks so much for the updated request IDs! Already, there’s more detailed information in our logs, and I’m going to do a scrub shortly as I’m signing on for the day.

Though to your point:

Neither feels like it should be generating significant enough traffic to be overwhelming GitHub’s API to the point of timeouts (and if that were the case would it not cause wider issues across the public API?) – or do you mean that each request in itself is too complex, and that we should batch the PR fetching for each user into smaller chunks rather than just one request for 100 PRs?

The latter would likely be true. A single heavy GQL request could easily be hitting a timeout on our side. But again, I’ve only just now seen that there’s better logging with your current IDs and I’ll review them in more detail here in a minute.

Thanks again and I’ll get back to this thread soon! :bow:


Still going through the logs, but I do in fact see timeouts on our side when attempting to complete a POST action for the request ID you provided:

  • F6B6:3AE5:6E86E8:13EA4BE:615CC700

And to double-check, @mattipv4 – this (I assume) is still ongoing enough to be problematic for our Hacktoberfest friends?

Either way, I’m going to raise this with our engineering team to see if there’s any action we can take on our side.

Thanks again for reporting this :bow:

Hey!

Ah okay, good to know it is just a timeout and not a pure bug then.

Definitely still an issue: we’ve seen 5k occurrences of the error in the last 12 hours, and we think we have about 200 or so users who are consistently hitting it (though some of those may also be folks who’ve de-authed the app), plus a lot more who run into it at random (even my account had an occurrence earlier).
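
(For the accounts that only hit this intermittently, we could likely paper over it on our side with a simple retry and backoff; a rough, purely illustrative sketch:)

// Illustrative only: retry the per-user fetch a few times with a short backoff,
// since most accounts only hit the error intermittently.
async function fetchWithRetry(nodeId, userToken, attempts = 3) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fetchUserPullRequests(nodeId, userToken);
    } catch (err) {
      if (attempt === attempts) throw err; // give up after the final attempt
      // back off 5s, 10s, ... before trying again
      await new Promise((resolve) => setTimeout(resolve, attempt * 5000));
    }
  }
}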

Appreciate you looking into this and escalating it, hope to hear some good news from eng :slight_smile:

Matt.

Hey again @mattipv4 :wave:

Thanks again for your replies! Without revealing our “cards” too much, there are a couple of open issues on our side to assist with situations like this. However, yes, we are hitting timeouts, and it’s unfortunate that the message returned suggests that there might be a bug.

We have open requests to both our engineering and docs teams to adjust the language in the error returned when particular timeouts are seen with GQL specifically.

After reviewing your logs with teammates, we see that it’s likely due to the first 100 labels. We’ve seen this before with another customer, and we’d ask that you try lowering that to 50, see how it goes, then lower it to 25, etc.

There are certainly improvements we can make to both our error messaging and to our documentation and we’re following up internally.

I hope these quick adjustments help out! Especially with nearly the full month remaining, we’d love to have a smooth experience for your (and our) users.

Oh, interesting; I would not have expected the labels to be causing this. I’ll try lowering the count there, see how it goes, and report back on our findings.

Thanks again for digging into this!


Spoke too soon. I see the same thing again, albeit this time showing contributions up to yesterday and no more. No, I hadn’t changed anything.

:wave: Just as an update, I deployed a change yesterday to reduce our labels subrequest from 100 to 25.

Anecdotally, over the last few batch runs we’ve done, this change does appear to have reduced our error rate by roughly 10%, but we still saw a roughly 2% error rate across users when requesting PRs.

I intend to push out a further change shortly to reduce our labels request even further, down to just 20, and to also update the topics request to just 20 (which I realise is all that the UI will allow anyway).

I’ll report back once I have an update on how that change impacts things.


:wave: Further update from over the weekend, I rolled out a further reduction in the query so that we’re only requesting 20 labels and 20 topics for each of the 100 PRs.

Unfortunately, it seems this change had little impact: after inspecting the logs for the last few runs, there is still a ~2% error rate when requesting users’ PRs from GitHub.

A few fresh IDs for any further digging:

  • 59abdc943cfe4ea68464659c418f6203
  • e5ec440e6af646868287ef10a375e649
  • a8c2c3d307a64356aca98e03e59d28bb
  • 3af8fc44fe5045bb907d54321b5122c5

And for reference, this is the full query we’re running now (a few new props got added to allow us to better handle abuse):

query ($nodeId: ID!) {
  node(id: $nodeId) {
    ... on User {
      pullRequests(first: 100, orderBy: { field: CREATED_AT, direction: DESC }) {
        nodes {
          additions
          authorAssociation
          changedFiles
          closed
          commits {
            totalCount
          }
          createdAt
          deletions
          id
          isDraft
          locked
          labels(first: 20) {
            edges {
              node {
                name
              }
            }
          }
          merged
          reviewDecision
          repository {
            id
            isArchived 
            isDisabled
            isPrivate
            licenseInfo {
              spdxId
            }
            nameWithOwner
            primaryLanguage {
              name
            }
            repositoryTopics(first: 20) {
              edges {
                node {
                  topic {
                    name
                  }
                }
              }
            }
            visibility
          }
          title
          url
        }
      }
    }
  }
}

Is there anything else obvious on y’all’s end that we can do to reduce this error rate, outside of us just reducing the 100 PRs down to a smaller number and paginating (would prefer to avoid this, so that we’re sticking to making a single API call per user)?
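
(For completeness, if we did end up having to paginate, I assume it would look roughly like the sketch below, walking the connection with pageInfo/endCursor in smaller pages; the query is trimmed and the runGraphQL helper is illustrative, not our actual code.)

// Illustrative only: fetch a user's PRs in smaller pages (e.g. 25 at a time) instead of one call of 100.
const PAGED_QUERY = `
  query ($nodeId: ID!, $pageSize: Int!, $after: String) {
    node(id: $nodeId) {
      ... on User {
        pullRequests(first: $pageSize, after: $after, orderBy: { field: CREATED_AT, direction: DESC }) {
          pageInfo { hasNextPage endCursor }
          nodes { id title url createdAt } # plus the other fields from the full query above
        }
      }
    }
  }
`;

async function fetchAllPullRequestsPaged(nodeId, userToken, pageSize = 25, maxPRs = 100) {
  const all = [];
  let after = null;
  while (all.length < maxPRs) {
    // runGraphQL: illustrative helper that POSTs { query, variables } with the user's token,
    // like the earlier fetch sketch, and returns the `data` payload.
    const data = await runGraphQL(PAGED_QUERY, { nodeId, pageSize, after }, userToken);
    const { nodes, pageInfo } = data.node.pullRequests;
    all.push(...nodes);
    if (!pageInfo.hasNextPage) break;
    after = pageInfo.endCursor;
  }
  return all.slice(0, maxPRs);
}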

Cheers, Matt.

Hey Matt!

Thanks so much for reporting your updated experience. Also cool to see you in one of our internal Slack channels.

While we’re also trying to get other folks digging into this, I’ll be looking at these new request IDs shortly to see if there are any obvious patterns.
