I have a GitHub App with a few thousand users (https://github.com/apps/code-inspector). I am currently facing one major problem: after we get the access token, we sometimes cannot check out the repository. Sometimes we get the error "the repository XXX does not exist". But the repository does exist, since later attempts succeed.
The token also seems invalid: when I try to get the list of repositories, I get an authentication error.
In case it helps, I am using PyGitHub to get the token and interact with the API.
Any idea where this could come from?
Note that it does not seem related to the library I am using.
From time to time, when I try to clone the repository, I get the error `Invalid username or password.\nfatal: Authentication failed for ...`.
I am using a command like this to clone the repository: `git clone https://x-access-token:<token-generated-from-github-app>@github.com/<full_name>.git`
If I clone from another machine, it works. And the clone succeeds on this machine if I retry a few minutes later.
Is there a mechanism that prevents checking out too often?
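For reference, here is a minimal Python sketch of the same clone call, assuming `git` is on the PATH. `build_clone_url` and `clone_repo` are hypothetical helper names (they are not part of PyGitHub). Capturing stderr lets you log the exact failure message whenever a clone fails.

```python
import subprocess


def build_clone_url(token: str, full_name: str) -> str:
    """Build the HTTPS clone URL that authenticates with a GitHub App
    installation token via the x-access-token username."""
    return f"https://x-access-token:{token}@github.com/{full_name}.git"


def clone_repo(token: str, full_name: str, dest: str) -> subprocess.CompletedProcess:
    """Run `git clone` and capture its output.

    Does not raise on a non-zero exit status; callers should inspect
    `result.returncode` and `result.stderr` instead.
    """
    return subprocess.run(
        ["git", "clone", "--quiet", build_clone_url(token, full_name), dest],
        capture_output=True,
        text=True,
        check=False,
    )
```

Note that the token ends up in the process arguments and in the cloned repo's `.git/config` remote URL, so avoid logging the full command line.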
I'm having the exact same issue in my GitHub App https://github.com/apps/skeema-io , which is written in Golang. For the past few weeks, a portion of my git clone calls (using x-access-token exactly as you do) have been randomly failing with `remote: Invalid username or password.\nAuthentication failed for ...`
According to my logs, the problem started on the night of April 27 and has become more frequent over time, especially this past week.
This is a guess, but so far I believe the root cause is an internal technical issue on GitHub's side, specifically either database replication lag or cache inconsistency. My suspicion is that if you create a new access token and then immediately use it to clone a repo, the token is sometimes being checked against a db/cache that is lagging -- i.e. the INSERT corresponding to the access token's row has not yet replicated to the db/cache that is being queried to perform the auth check.
Today I added the following work-arounds to my application, and this seems to have solved the problem so far:
The idea is to just give the new access token time to replicate.
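As a rough illustration of that idea (not necessarily the exact work-arounds described above), here is a Python sketch that waits briefly after a token is minted and then retries clones that fail with the authentication errors reported in this thread. `clone_with_retry`, `is_transient_auth_error`, and all delay values are hypothetical choices, not numbers documented by GitHub. `attempt_clone` is assumed to be any callable returning `(returncode, stderr)`, e.g. a thin wrapper around `git clone`.

```python
import random
import time
from typing import Callable, Tuple

# Substrings taken from the failure messages reported in this thread.
TRANSIENT_AUTH_ERRORS = (
    "Invalid username or password",
    "Authentication failed",
)


def is_transient_auth_error(stderr: str) -> bool:
    """Treat the auth failures seen here as possibly transient."""
    return any(marker in stderr for marker in TRANSIENT_AUTH_ERRORS)


def clone_with_retry(
    attempt_clone: Callable[[], Tuple[int, str]],
    initial_delay: float = 2.0,
    max_attempts: int = 5,
    base_backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Wait after minting a token, then retry auth failures with backoff.

    All delays are assumed defaults; tune them for your own workload.
    """
    sleep(initial_delay)  # give the freshly minted token time to propagate
    returncode, stderr = attempt_clone()
    for attempt in range(1, max_attempts):
        if returncode == 0 or not is_transient_auth_error(stderr):
            break  # success, or an error that retrying will not fix
        # Exponential backoff (~2s, 4s, 8s, ...) with random jitter.
        sleep(base_backoff * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5))
        returncode, stderr = attempt_clone()
    return returncode, stderr
```

Errors that don't match the auth markers (e.g. a bad destination path) are returned immediately rather than retried.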
One additional mitigation measure that I'd suggest:
The idea with this one is to reduce the number of new access tokens you need, reducing the frequency of the entire situation.
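One way to reduce the number of new tokens is to cache one installation token per installation and reuse it until shortly before it expires (GitHub App installation tokens are valid for one hour). Below is a minimal Python sketch; `InstallationTokenCache` and the `mint_token` callable are hypothetical. `mint_token` is assumed to wrap whatever your library uses to create a token and to return `(token, expires_at)` as a timezone-aware `datetime`.

```python
import threading
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, Tuple


class InstallationTokenCache:
    """Reuse one token per installation until `refresh_margin` before expiry."""

    def __init__(
        self,
        mint_token: Callable[[int], Tuple[str, datetime]],
        refresh_margin: timedelta = timedelta(minutes=10),
        now: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
    ):
        self._mint_token = mint_token
        self._margin = refresh_margin  # avoid a token expiring mid-clone
        self._now = now
        self._tokens: Dict[int, Tuple[str, datetime]] = {}
        self._lock = threading.Lock()

    def get(self, installation_id: int) -> str:
        with self._lock:
            cached = self._tokens.get(installation_id)
            if cached is not None:
                token, expires_at = cached
                if self._now() < expires_at - self._margin:
                    return token  # still comfortably valid: reuse it
            # No token yet, or it is about to expire: mint a new one.
            token, expires_at = self._mint_token(installation_id)
            self._tokens[installation_id] = (token, expires_at)
            return token
```

This does not remove the "fresh token" window entirely; each refresh still creates a new token, but it happens far less often than minting one per clone.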
Hope this helps! I'm surprised more people aren't hitting this!
Thanks @evanelias for the details! I implemented a similar strategy (with caching), and it still runs into problems. Sometimes the same token is reused, then stops working for a few minutes, then works again. I also implemented a threshold mechanism where I do not check out the same repository more than once per minute.
However, this is becoming problematic. Can a GitHub staff member provide some insight here?
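For anyone wanting to try the once-per-minute threshold mentioned above, here is a minimal in-memory Python sketch. `RepoThrottle` and `try_acquire` are hypothetical names. Because the state lives in one process, several workers would need a shared store (e.g. a database or cache) instead.

```python
import threading
import time
from typing import Callable, Dict


class RepoThrottle:
    """Allow at most one checkout per repository per `interval` seconds."""

    def __init__(self, interval: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self._interval = interval
        self._clock = clock
        self._last: Dict[str, float] = {}  # full_name -> time of last allowed checkout
        self._lock = threading.Lock()

    def try_acquire(self, full_name: str) -> bool:
        """Return True (and record the time) if a checkout is allowed now."""
        with self._lock:
            now = self._clock()
            last = self._last.get(full_name)
            if last is not None and now - last < self._interval:
                return False  # checked out too recently; caller should skip or defer
            self._last[full_name] = now
            return True
```

Using `time.monotonic` keeps the throttle unaffected by wall-clock adjustments.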
> Sometimes, the same token is used and then, does not work for few minutes and works again
Still sounds like DB replication lag to me :) If there are multiple replicas in a region, and one is lagging more than the others, it may just be random luck which replica is queried for any given auth check.
For context, the GitHub engineering blog had a recent postmortem indicating that they've been actively moving queries off of an overloaded master db onto replicas. And there was another outage in late April, just a couple of days before this error started coming up. This may be unrelated; I'm just speculating here. But I could certainly understand if GitHub staff can't comment on it yet, if this is something they're still actively working on, e.g. the ongoing sharding efforts mentioned in the postmortem.