Since around 9:30 pm UTC today, the GitHub Actions workflows in our organisation have not been running; they stay in the queued state.
Strangely enough, Actions works on my own personal account, and I haven't seen many posts on here about this kind of issue. Is anyone else seeing this at the moment? We have already raised it with GitHub support to find out what's happening.
We checked our billing and everything seems to be okay. In any case, I'm pretty sure jobs usually fail and tell you that you are out of credits rather than staying pending.
Our public repositories have the issue too, so here is an example if anyone wants to see:
I also forked our repo just to test, and the workflow runs in the fork.
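For anyone who wants to check whether their own runs are stuck in the same way, here is a minimal sketch in Python. It assumes run records shaped like the entries in the `workflow_runs` array returned by the REST endpoint `GET /repos/{owner}/{repo}/actions/runs`, using only the `id`, `status` and `created_at` fields. The function name `stuck_queued_runs`, the 15-minute threshold and the sample data are my own illustrative choices, not anything GitHub provides.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(ts):
    """Parse a UTC timestamp such as '2020-02-13T21:30:00Z'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def stuck_queued_runs(runs, now, threshold=timedelta(minutes=15)):
    """Return the runs that are still 'queued' and older than `threshold`.

    `runs` is a list of dicts with 'status' and 'created_at' keys
    (assumed shape, modelled on the workflow runs API response).
    """
    return [
        run
        for run in runs
        if run.get("status") == "queued"
        and now - parse_ts(run["created_at"]) > threshold
    ]


# Made-up sample data for illustration only.
sample_runs = [
    {"id": 1, "status": "queued", "created_at": "2020-02-13T21:30:00Z"},
    {"id": 2, "status": "completed", "created_at": "2020-02-13T21:00:00Z"},
    {"id": 3, "status": "queued", "created_at": "2020-02-13T22:25:00Z"},
]
now = datetime(2020, 2, 13, 22, 30, tzinfo=timezone.utc)

stuck_ids = [run["id"] for run in stuck_queued_runs(sample_runs, now)]
print(stuck_ids)
```

With real data you would fetch the run list from the API first and pass the current UTC time as `now`. A run that has only just been queued is not flagged, so the threshold needs tuning to whatever wait is normal for your runners.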
Recently, some other users also reported the same issue:
I have reported this issue to the appropriate engineering team, who will evaluate and investigate it. If they have any updates, I will let you know.
@cfarrend Sorry for the late reply. I'm glad to hear that the problem has gone away on your side.
Yes, over the past few days the engineers have taken some measures to mitigate the problem.
The cause of the problem is still under investigation. If the engineers release an explanation, I will let you know.
Apologies again for the inconvenience this has caused you.
I got the following reply from the appropriate engineers:
We had a hiccup on our backend infrastructure that impacted a small percentage (< 0.5%) of orgs and repos using Actions. This resulted in jobs not starting for those repos.
I just want to check in and confirm that you received my previous reply, which provided the explanation for this issue.
Do you have any other questions about this issue? Feel free to tell us if you need any more help with this ticket.
Thanks for your response and help with this issue.
I got an official reply from GitHub support (Feb 28, 11:34PM UTC - 20 days after the incident):
Due to an issue with our backend systems, less than 0.1% of organizations using Actions saw a delay in workflows starting between 02-13-2020 and 02-14-2020. Your organization was identified as one of the potentially impacted ones due to this issue.
The impact was within our SLO (Service Level Objective) and we did not update the public status of Actions. We know how important the reliability of our service is for our customers, and we take this responsibility very seriously and apologize for these disruptions to run workflows.
Our engineering team deployed fixes to mitigate on 02-13-2020, a full fix at 02-14-2020 11:00 UTC after which the backlog of jobs started processing. We have taken steps to ensure we identify these issues and communicate early to the impacted customers.
Our hope is that in future incidents like this we can get an update from GitHub. It was a very unfortunate situation for us, but we are hopeful that GitHub can improve from this.