Serializing/queueing deployment workflows (AWS re:invent)

So I went to a great session at AWS re:Invent on using GitHub Actions to deploy to AWS. While I’ve already moved a ton of our workflows to Actions, deploys are the final piece, so it was a very relevant talk/demo (and great presenters).

One question I had was: if you set up a workflow on a push to master which triggers a deploy, do other pushes queue up behind it (because clearly 2 deploys running at the same time would result in :boom:)? The presenters didn’t know, so I just tested it… and… they don’t :frowning:.

Merging a PR to master triggers a deploy, and then merging a separate PR to master while the first action is still running means 2 actions running at the same time, which gives me a bad day.

Is anyone else solving this currently? I’m thinking of some kind of lock while a deploy is running, but that would mean a ton of minutes are used by the ‘waiting’ deploy while it’s trying to get the lock. Some of our deploys run for quite a long time, and we regularly have multiple PRs in “merge flight”, so it’s a real problem.

Anyway, interested to see if anyone else is doing this and/or has solved it. Or if there’s something on the GH Actions near-term roadmap?

Cheers

Dave

ps: and if the chaps who presented at DOP322 frequent this board, thanks again for the great talk!

2 Likes

Thanks @dnorth98 . There is a plan to linearize deployment workflows; it has been approved and is on the roadmap. We will announce once it’s completed. :slight_smile:

Some info for you:

Add a ‘queue’ column to workflow_builds records, plus new deferred and superseded states for those records. This additional state can be used to implement many concurrency strategies.

2 Likes

If you deploy via SSH, then one workaround is to lock servers for deployment. You can implement something similar to Laravel’s deploy:lock/deploy:unlock:

  1. Check for ./deploy.lock file in deployment script.
  2. If ./deploy.lock is present, then stop current deployment with some help message.
  3. If ./deploy.lock is not present, then create it. Maybe you’d want to write some helpful text to lock file to be used in subsequent deployments.
  4. Do your stuff.
  5. Remove ./deploy.lock.
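The steps above can be sketched as a short wrapper script (file paths and messages are just illustrative):

```shell
#!/bin/sh
# deploy.sh -- refuse to run if another deployment holds the lock.
set -e

LOCK=./deploy.lock

# Steps 1-2: bail out with a help message if a lock file already exists.
if [ -f "$LOCK" ]; then
    echo "Deployment already in progress:" >&2
    cat "$LOCK" >&2
    exit 1
fi

# Step 3: take the lock, recording who/when for later debugging.
echo "locked by $(whoami) at $(date -u)" > "$LOCK"

# Step 5 (via trap): always release the lock, even if the deploy fails.
trap 'rm -f "$LOCK"' EXIT

# Step 4: the actual deployment goes here.
echo "deploying..."
```

Note that the check-then-create above is not atomic; if two deploys can start in the same instant, `mkdir ./deploy.lock` (which atomically fails if the directory exists) is a safer way to take the lock.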

There’s a cap deploy:lock plugin in case you use Ruby.

I’ve used a similar approach for five years in a large-scale web app with many deployments per day, so I can confirm it would solve your problem.

We also kept a history file of those deployment locks to help identify problematic deployments, because there was no environment between staging and production, no canary releases or rolling deployments, and we ran E2E tests in production.

So, keep a deploy lock file to indicate whether there’s an active deployment.

The subject is also discussed here:

Github actions: Cancel redundant builds 

@ilyazub wrote:

So, have a deploy lock file to identify if there’s active deployment.

While this approach is interesting, how would it work if you are deploying via kubectl apply? There is no place that can hold that lock file, is there? Also, when implementing locks, you always need some mechanism to prevent deadlocks, and a mechanism to unlock in case it “got stuck”, no?
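The closest thing I can imagine (purely a hypothetical sketch, not something anyone in this thread has confirmed) is to use the cluster itself as the lock store, since `kubectl create` of a named object is atomic and fails if the object already exists — though the stuck-lock problem remains:

```shell
# Serialize deploys by using the cluster itself as the lock store:
# 'kubectl create' of a named object is atomic and fails if it exists.
deploy_with_lock() {
    # Take the lock; only one caller can create "deploy-lock" at a time.
    if ! kubectl create configmap deploy-lock \
            --from-literal="holder=$(whoami)"; then
        echo "another deployment holds the lock" >&2
        return 1
    fi

    # Run the actual deployment while holding the lock.
    kubectl apply -f manifests/

    # Release the lock. To recover a stuck lock, a human (or a cron job
    # that checks the lock's age) runs the same delete by hand.
    kubectl delete configmap deploy-lock
}
```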

1 Like

Yeah, this doesn’t quite work for our case (or, I suspect, a lot of cases where you’re handing off to an orchestrator to handle the deploy). In our case we’re using AWS Elastic Beanstalk: you hand your code off to Beanstalk and it orchestrates the deploys to instances. You cannot submit another deploy while one is in progress.

Our existing CI/CD system (buddy.works) just queues deploys on the same branch, so this is a non-issue there. But with Actions, multiple merges of PRs to master would keep trouncing each other. And you want to make sure the previous deploy is finished (you cannot just kill it) before kicking off the next.

Ideally, we’d be able to queue/serialize executions of a specific workflow on a specific branch. Of course this can be implemented with some custom code in the workflow, but then, as I noted in my original post, you are paying for build minutes while the subsequent execution spins on some kind of lock.
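That custom code would look roughly like this (an untested sketch: the runs-listing endpoint and `GITHUB_*` variables are GitHub’s documented ones, but it assumes a single deploy workflow per repo and that `curl`/`jq` are on the runner — and every pass through the loop is billed):

```shell
# Block until all older in-progress runs of this workflow have finished.
# Assumes GITHUB_TOKEN, GITHUB_REPOSITORY and GITHUB_RUN_NUMBER are set
# (they are inside an Actions job).
wait_for_turn() {
    while true; do
        # Count in-progress runs that started before this one.
        older=$(curl -s \
            -H "Authorization: Bearer $GITHUB_TOKEN" \
            "https://api.github.com/repos/$GITHUB_REPOSITORY/actions/runs?status=in_progress" \
          | jq --argjson n "$GITHUB_RUN_NUMBER" \
               '[.workflow_runs[] | select(.run_number < $n)] | length')
        [ "$older" -eq 0 ] && break
        echo "waiting for $older earlier run(s) to finish..."
        sleep 30   # every iteration here is billed runner time
    done
}
```

Comparing `run_number` (rather than just counting other runs) matters: it ensures two waiting runs can’t each wait on the other forever.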

2 Likes

I just want to reiterate, since we pay for minutes, any solution which involves an idle loop in the workflow is a non-starter.

Yep, that’s a good question. I’ve always solved this kind of deployment problem manually.

@dannyben wrote:

Also, when implementing locks, you always need some mechanism to prevent deadlocks, and a mecahnisme to unlock in case it “got stuck”, no?

Don’t know how to solve this with kubectl, since I haven’t used it.

@dannyben wrote:

While this approach is interesting, how would it work if you are deploying via kubectl apply? There is no place that can hold that lock file, is there?

Thanks! There’s also an acceptable solution in that thread: https://github.community/t5/GitHub-Actions/Github-actions-Cancel-redundant-builds/m-p/44840/highlight/true#M5942. If the permalink doesn’t work, here’s the link to the proposed GitHub Action: https://github.com/marketplace/actions/workflow-run-cleanup-action

@dannyben wrote:

The subject is also discussed here:

Github actions: Cancel redundant builds

1 Like

In the case of deployment, cancelling is potentially as bad as running multiple concurrent deploys.

I think Semaphore CI has the best solution to this: Pipeline queues

Their queues are configurable at the workflow/repo/organisation level and define whether they allow parallel execution. This makes it possible to have things like:

  • A deploy workflow running on the main branch having only a single instance running at once.
  • A deploy workflow in the backend repository queueing up behind workflows running in the infrastructure repository.

These are some critical features for a reliable continuous deployment environment.