Step by Step Towards Zero Downtime Deployment - Rolling Updates are Here
by Puja Abbassi on Jun 9, 2015
Update: With swarm CLI 0.18.0 we introduced different update strategies that you can choose from. The one described in this blog post is the default one-by-one strategy. There’s another strategy called hot-swap, which gives you zero downtime no matter how many instances your component has. We are continuously improving our update strategies.
Zero Downtime Deployment is one of the promises that microservices architectures and container technologies make. However, both only help you get part of the way there; actually doing a rolling update without your application going down at all is still not trivial. Besides a lot of prerequisites, there’s often quite some manual work involved. On Giant Swarm it’s now possible with only a single command.
Until recently, we had two ways to do rolling updates:
The first option, which we used for our documentation deployment, was to have two or more components with basically the same image deployed, e.g. “content-master” and “content-slave”. In front of these two components you would put a proxy or load balancer to serve their content. A rolling update would then look like the following (staying with the above-mentioned example):
1. Push a new image for your container.
2. Run swarm update content-master and wait for it to be up again.
3. Run swarm update content-slave and wait for it to be up again.
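The steps above can be sketched as a small shell script. This is only an illustration, not part of our CLI: how you detect that a component is back up depends on your setup, so wait_until_up is left as a placeholder, and the script defaults to a dry run that just prints the commands.

```shell
# Minimal sketch of the two-step update above. The component names match
# the example; replace "echo swarm" with the real "swarm" binary to
# actually execute the commands.
wait_until_up() {
  # placeholder: e.g. poll a health-check URL behind your proxy here
  :
}

rolling_update() {
  "$@" update content-master
  wait_until_up
  "$@" update content-slave
  wait_until_up
}

# Dry run: prints the two update commands instead of executing them.
rolling_update echo swarm
```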
Now this is already very easy, but it only works in a few cases, like our documentation, which is based on Hugo. This option was and still is available to all our users.
The second option, which would work for many more use cases, was not available publicly and involved getting your hands dirty with fleet. This is the option we previously used for updating our website with zero downtime. Our website deployment involves a flask app, which is scaled to at least three instances. Doing a rolling update would involve the following steps:
1. SSH into the cluster that it runs on.
2. Find the units corresponding to the single instances of the component.
3. Stop one unit and wait for it to go down.
4. Start the unit again and wait for it to come up.
5. Repeat steps 3 and 4 for all units.
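For reference, steps 2 to 5 can be sketched as a loop over the units with fleetctl. The flask-app component name and the “component@n.service” unit naming (fleet’s templated-unit convention) are assumptions for illustration; check fleetctl list-units for your real unit names. The sketch defaults to a dry run.

```shell
# Sketch of the manual fleet procedure as a loop. Dry run by default:
# FLEETCTL="echo fleetctl" only prints the commands; set FLEETCTL=fleetctl
# on the cluster to actually run them.
FLEETCTL="${FLEETCTL:-echo fleetctl}"

roll_units() {
  component="$1"
  instances="$2"
  i=1
  while [ "$i" -le "$instances" ]; do
    unit="${component}@${i}.service"   # assumed naming scheme
    $FLEETCTL stop "$unit"             # stop one unit...
    # ...wait here until it is really down...
    $FLEETCTL start "$unit"            # ...then start it again
    # ...and wait until it is back up before touching the next one
    i=$((i + 1))
  done
}

roll_units flask-app 3
```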
One problem with this approach was that stopping, waiting, starting, and waiting again for each instance took quite some time and manual work, and this only increased with more instances of a component. Another problem was that you needed access to the underlying cluster, which is risky, as you suddenly have access to a lot of things that can break your infrastructure, and also not something we want to bother our users with, as our goal is to abstract away said infrastructure (remember?).
Thus, we are happy to announce that we have now automated the whole process and that rolling updates are available to all users of Giant Swarm, without them even having to update their client. It’s a single command that’s already there in our CLI:
$ swarm update mycomponent
As simple as that! Run swarm update on any component that is scaled to more than one instance and you automagically get a rolling update. Ideally this deploys your updated image to the component with zero downtime. However, this is just a first step towards a more sophisticated implementation, so there might be short downtimes in cases with only a few (2 or 3) instances and/or with images that take a long time to start.
“What if I have a component that has only one instance?”, you say? In that case, swarm update will just do a regular update of that component to an updated image. However, if your component is stateless, you can do
$ swarm scaleup mycomponent 3
$ swarm update mycomponent
$ swarm scaledown mycomponent 3
and again you get a zero downtime deployment (be sure to increase the number of instances further if you fear your image will take more time starting up). Try it out!
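If you do this regularly, the scale-update-scale sequence lends itself to a small wrapper. This is just a convenience sketch around the three commands shown above, using the placeholder component name from the example; it defaults to a dry run that prints the commands.

```shell
# Convenience wrapper around the scaleup/update/scaledown sequence.
# "extra" is the number of additional instances to add before the update
# (use a higher number for slow-starting images). Dry run by default:
# set SWARM=swarm to actually execute the commands.
SWARM="${SWARM:-echo swarm}"

zero_downtime_update() {
  component="$1"
  extra="${2:-3}"
  $SWARM scaleup "$component" "$extra"
  $SWARM update "$component"
  $SWARM scaledown "$component" "$extra"
}

zero_downtime_update mycomponent 3
```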
To be clear, this is an early implementation of the feature and there are still some use cases that it doesn’t support, but it’s our first step towards making zero downtime deployments easier for you. If you have special use cases or scenarios that you think we should consider in the future, please tell us about them in our Gitter group, in the IRC channel #giantswarm on freenode, or via email.