Deploying to production in Kubernetes can be quite stressful. Even after meaningful and reliable automated tests have passed successfully, there is still room for things to go wrong and lead to a nasty incident when you press the final button.

Thankfully, Kubernetes is designed to be resilient to this kind of scenario, and rolling back is a no-brainer. Still, rolling back means that, at least for some time, all of your users were negatively impacted by the faulty change…

What if we could smoke test our change in production before it actually hits real users? What if we could roll out a change incrementally to some users instead of all of them at once? What if we could detect a faulty deployment and roll it back automatically?
Well, that, my friend, is what Canary Deployment is all about!

Minimizing the impact on real users while deploying a risky change to production.

🎬 Hi there, I’m Jean!

In this three-part series, we’re going to explore several ways to do Canary Deployment in Kubernetes, and the first one is…
🥁
… using Ingress NGINX! 🎊

Requirements


Before we start, make sure you have the following tools installed:

  • Kind
  • kubectl
  • Helm
  • k6

Note: macOS users, and Linux users with Homebrew, can simply run:
brew install kind kubectl helm k6

All set? Let’s go! 🏁

Creating Kind Cluster


Kind is a tool for running local Kubernetes clusters using Docker container “nodes”. It was primarily designed for testing Kubernetes itself, but may be used for local development or CI.

I don’t expect you to have a demo project at hand, so I built one for you.

git clone https://github.com/jhandguy/canary-deployment.git
cd canary-deployment
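
Quick note before we create the cluster: since we’ll later curl the cluster on localhost, kind/cluster.yaml presumably maps the node’s HTTP port to the host. Here’s a minimal sketch of what such a config could look like (the repository’s actual file may differ):

apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
nodes:
  - role: control-plane
    extraPortMappings:
      - containerPort: 80 # lets us reach the ingress controller via localhost:80
        hostPort: 80
        protocol: TCP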

Alright, let’s spin up our Kind cluster! 🚀

➜ kind create cluster --image kindest/node:v1.27.3 --config=kind/cluster.yaml
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.27.3) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community 🙂

Installing NGINX Ingress Controller


NGINX Ingress Controller is one of the many available Kubernetes Ingress Controllers; it acts as a load balancer, satisfying the routing rules specified in Ingress resources, using NGINX as a reverse proxy.

NGINX Ingress Controller can be installed via its Helm chart.

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install ingress-nginx/ingress-nginx --name-template ingress-nginx --create-namespace -n ingress-nginx --values kind/ingress-nginx-values.yaml --version 4.8.3 --wait
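
As for kind/ingress-nginx-values.yaml, it tailors the chart to Kind. A plausible sketch (an assumption, not the repository’s actual values) binds the controller to the node’s ports, so the port mapping from earlier can reach it:

controller:
  hostPort:
    enabled: true # bind ports 80/443 directly on the Kind node
  service:
    type: NodePort # Kind has no cloud LoadBalancer to provision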

Now, if everything goes according to plan, you should be able to see the ingress-nginx-controller Deployment running.

➜ kubectl get deploy -n ingress-nginx
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
ingress-nginx-controller   1/1     1            1           4m35s

Configuring Ingress Canary Annotations


Now that our NGINX Ingress Controller is up and running, let’s get into the thick of it and install the sample app’s Helm chart, shall we? 🧐

helm install sample-app/helm-charts/ingress-nginx --name-template sample-app --create-namespace -n sample-app --wait

If everything goes fine, you should eventually see two Deployments in the READY state.

➜ kubectl get deploy -n sample-app
NAME                READY   UP-TO-DATE   AVAILABLE   AGE
sample-app          1/1     1            1           100s
sample-app-canary   1/1     1            1           100s

Alright, let’s take a look at what’s under all this!

➜ ls -1 sample-app/helm-charts/ingress-nginx/templates
canary
deployment.yaml
ingress.yaml
service.yaml
serviceaccount.yaml

➜ ls -1 sample-app/helm-charts/ingress-nginx/templates/canary
deployment.yaml
ingress.yaml
service.yaml

As you can see, most of the resources have been duplicated, except for the serviceaccount.yaml:

  • The deployment.yaml and the canary/deployment.yaml are identical, apart from the name and the image tag.
  • The service.yaml and the canary/service.yaml are also nearly identical, except for the name and the label selector (see the sketch below).

This is for a simple reason: in the end, the stable and the canary Deployments are meant to be nearly identical, apart from the container image they run, exactly as in a blue/green deployment.
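
To make the Service difference concrete, here is a sketch of what canary/service.yaml could look like; the label names and ports are illustrative assumptions, not copied from the chart:

---
apiVersion: v1
kind: Service
metadata:
  name: sample-app-canary
spec:
  ports:
    - port: 80
      targetPort: http
  selector:
    app: sample-app-canary # the only meaningful difference: it selects the canary pods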

Now, the ingress.yaml and the canary/ingress.yaml are similar but not identical: you’ll notice some extra annotations in the canary/ingress.yaml.

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  ...
  annotations:
    ...
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "{{ .Values.canary.weight }}"
    nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
...

This is where the magic happens! 🪄

Let’s dive into each of these annotations real quick:

  • nginx.ingress.kubernetes.io/canary
    When set to "true", this lets the Ingress Controller know that this is the Ingress routing traffic to the canary Deployment.
  • nginx.ingress.kubernetes.io/canary-weight
    When given a value, this tells the Ingress Controller how to split traffic between the stable and the canary Deployments. For instance, a weight of 50 results in a 50/50 split.
  • nginx.ingress.kubernetes.io/canary-by-header
    When given a header name, this allows us to force a request into hitting either the canary or the stable Deployment, regardless of the canary-weight. With the header value always, the request will always land in the canary Deployment; with the value never, it will always land in the stable one.
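
For instance, pinning a request to the stable Deployment, no matter the weight, would look like this:

curl localhost/success -H "Host: sample.app" -H "X-Canary: never"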

Right now, the canary-weight is 0 and the canary-by-header is X-Canary. This means that normal requests will never land in the canary Deployment; only requests carrying the header X-Canary: always will.

➜ kubectl describe ingress sample-app-canary -n sample-app
...
Annotations:  ...
              nginx.ingress.kubernetes.io/canary: true
              nginx.ingress.kubernetes.io/canary-by-header: X-Canary
              nginx.ingress.kubernetes.io/canary-weight: 0
...

Let’s give it a try!

curl localhost/success -H "Host: sample.app"

As you can see, no matter how many times you run this command, the pod will always be the same. In fact, this is the stable pod.

➜ curl localhost/success -H "Host: sample.app"
{"node":"kind-control-plane","namespace":"sample-app","pod":"sample-app-6bd9dc6d5d-jstn2","deployment":"stable"}

➜ curl localhost/success -H "Host: sample.app"
{"node":"kind-control-plane","namespace":"sample-app","pod":"sample-app-6bd9dc6d5d-jstn2","deployment":"stable"}

➜ curl localhost/success -H "Host: sample.app"
{"node":"kind-control-plane","namespace":"sample-app","pod":"sample-app-6bd9dc6d5d-jstn2","deployment":"stable"}

This is because the current canary-weight is 0, so 100% of the traffic normally goes to the stable Deployment.

Now let’s try with the X-Canary: always header!

curl localhost/success -H "Host: sample.app" -H "X-Canary: always"

Aha! We are now consistently hitting the canary pod.

➜ curl localhost/success -H "Host: sample.app" -H "X-Canary: always"
{"node":"kind-control-plane","namespace":"sample-app","pod":"sample-app-canary-b559b9d75-gfvxp","deployment":"canary"}

➜ curl localhost/success -H "Host: sample.app" -H "X-Canary: always"
{"node":"kind-control-plane","namespace":"sample-app","pod":"sample-app-canary-b559b9d75-gfvxp","deployment":"canary"}

➜ curl localhost/success -H "Host: sample.app" -H "X-Canary: always"
{"node":"kind-control-plane","namespace":"sample-app","pod":"sample-app-canary-b559b9d75-gfvxp","deployment":"canary"}

Did you pay attention to what we just did? 😉
This has a name: Smoke Testing! 🔥

That’s right, we just smoke-tested a change, in production, in complete isolation from the rest of the users: thanks to the canary-weight of 0, normal requests are routed to the stable Deployment, while the canary-by-header let us bypass that weight and force our requests into the canary Deployment.
Isn’t that awesome?! No more panic attacks when smoke tests fail on the canary: your users aren’t affected. 😌

Rolling Out Canary Deployments Incrementally


So we have solved one problem, yet one remains: how do we expose a change to our real users incrementally, to minimize the impact if something goes south?

Well, you’ve probably guessed it, canary-weight to the rescue! 🚀

We can simply increase the canary-weight step by step, letting some users be routed to the canary Deployment. Let’s give it a try!

helm upgrade sample-app sample-app/helm-charts/ingress-nginx -n sample-app --reuse-values --set canary.weight=50 --wait

Let’s verify that our canary Ingress now has a canary-weight of 50!

➜ kubectl describe ingress sample-app-canary -n sample-app
...
Annotations:  ...
              nginx.ingress.kubernetes.io/canary: true
              nginx.ingress.kubernetes.io/canary-by-header: X-Canary
              nginx.ingress.kubernetes.io/canary-weight: 50
...

All good, let’s test it out!

curl localhost/success -H "Host: sample.app"

As you can see, traffic is now split roughly 50/50 between the stable and canary Deployments.

➜ curl localhost/success -H "Host: sample.app"
{"node":"kind-control-plane","namespace":"sample-app","pod":"sample-app-6bd9dc6d5d-jstn2","deployment":"stable"}

➜ curl localhost/success -H "Host: sample.app"
{"node":"kind-control-plane","namespace":"sample-app","pod":"sample-app-canary-5754f4bbc7-lvthj","deployment":"canary"}

➜ curl localhost/success -H "Host: sample.app"
{"node":"kind-control-plane","namespace":"sample-app","pod":"sample-app-6bd9dc6d5d-jstn2","deployment":"stable"}

➜ curl localhost/success -H "Host: sample.app"
{"node":"kind-control-plane","namespace":"sample-app","pod":"sample-app-canary-5754f4bbc7-lvthj","deployment":"canary"}

Load Testing with k6


Now that we’ve made a few requests, let’s see how it behaves under load, and whether it indeed delivers the expected 50/50 traffic split!

For Load Testing, I really recommend k6 from the Grafana Labs team. It is a dead-simple yet super powerful tool with very extensive documentation.
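
Curious what the script does? Here’s a plausible sketch of k6/script.js, reconstructed from the checks and metrics you’ll see in the output below; the executor settings and metric wiring are assumptions, so check the repository for the actual script.

import http from 'k6/http';
import { check } from 'k6';
import { Rate } from 'k6/metrics';

// Custom metric tracking the share of requests served by each deployment
// (assumption: this is how the http_req_rate metric below is produced).
const reqRate = new Rate('http_req_rate');

export const options = {
  scenarios: {
    load: {
      executor: 'ramping-arrival-rate', // open model: up to 20 iterations/s
      startRate: 0,
      timeUnit: '1s',
      preAllocatedVUs: 20,
      maxVUs: 20,
      stages: [
        { target: 20, duration: '30s' }, // ramp up to 20 iterations/s
        { target: 20, duration: '30s' }, // hold for the remaining 30s
      ],
    },
  },
};

export default function () {
  const res = http.get('http://localhost/success', {
    headers: { Host: 'sample.app' },
  });
  const data = JSON.parse(res.body);

  check(res, {
    'status code is 200': (r) => r.status === 200,
  });
  check(data, {
    'node is kind-control-plane': (d) => d.node === 'kind-control-plane',
    'namespace is sample-app': (d) => d.namespace === 'sample-app',
    'pod is sample-app-*': (d) => d.pod.startsWith('sample-app'),
    'deployment is stable or canary': (d) => ['stable', 'canary'].includes(d.deployment),
  });

  // Record, for each deployment, whether this request landed on it.
  reqRate.add(data.deployment === 'canary', { deployment: 'canary' });
  reqRate.add(data.deployment === 'stable', { deployment: 'stable' });
}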

See for yourself!

k6 run k6/script.js

After about 1 minute, k6 should be done executing the load test and show you the results.

          /\      |‾‾| /‾‾/   /‾‾/
     /\  /  \     |  |/  /   /  /
    /  \/    \    |     (   /   ‾‾\
   /          \   |  |\  \ |  (‾)  |
  / __________ \  |__| \__\ \_____/ .io

  execution: local
     script: k6/script.js
     output: -

  scenarios: (100.00%) 1 scenario, 20 max VUs, 1m30s max duration (incl. graceful stop):
           * load: Up to 20.00 iterations/s for 1m0s over 2 stages (maxVUs: 20, gracefulStop: 30s)


     ✓ status code is 200
     ✓ node is kind-control-plane
     ✓ namespace is sample-app
     ✓ pod is sample-app-*
     ✓ deployment is stable or canary

   ✓ checks.........................: 100.00% ✓ 3095      ✗ 0
     data_received..................: 160 kB  2.7 kB/s
     data_sent......................: 71 kB   1.2 kB/s
     http_req_blocked...............: avg=48.21µs  min=3µs    med=18µs   max=2.38ms  p(90)=23µs   p(95)=30.09µs
     http_req_connecting............: avg=24.56µs  min=0s     med=0s     max=1.48ms  p(90)=0s     p(95)=0s
   ✓ http_req_duration..............: avg=4.87ms   min=800µs  med=4.24ms max=34.52ms p(90)=7.85ms p(95)=9.88ms
       { expected_response:true }...: avg=4.87ms   min=800µs  med=4.24ms max=34.52ms p(90)=7.85ms p(95)=9.88ms
     http_req_failed................: 0.00%   ✓ 0         ✗ 619
     http_req_rate..................: 50.00%  ✓ 619       ✗ 619
     ✓ { deployment:canary }........: 47.65%  ✓ 295       ✗ 324
     ✓ { deployment:stable }........: 52.34%  ✓ 324       ✗ 295
     http_req_receiving.............: avg=132.17µs min=24µs   med=136µs  max=610µs   p(90)=179µs  p(95)=213.09µs
     http_req_sending...............: avg=70.69µs  min=15µs   med=72µs   max=736µs   p(90)=94.2µs p(95)=101µs
     http_req_tls_handshaking.......: avg=0s       min=0s     med=0s     max=0s      p(90)=0s     p(95)=0s
     http_req_waiting...............: avg=4.67ms   min=719µs  med=4.02ms max=34.2ms  p(90)=7.64ms p(95)=9.65ms
     http_reqs......................: 619     10.316511/s
     iteration_duration.............: avg=5.59ms   min=1.08ms med=5.02ms max=41.22ms p(90)=8.68ms p(95)=10.52ms
     iterations.....................: 619     10.316511/s
     vus............................: 0       min=0       max=0
     vus_max........................: 20      min=20      max=20


running (1m00.0s), 00/20 VUs, 619 complete and 0 interrupted iterations
load ✓ [======================================] 00/20 VUs  1m0s  00.71 iters/s

That sounds about right!
Out of 619 requests, 295 (48%) were served by the canary Deployment while 324 (52%) were served by the stable one. Pretty good!

Wrapping up


That’s it! You can now delete your Kind cluster.

kind delete cluster

To summarize, using Ingress NGINX we were able to:

  • Smoke test our change in complete isolation from the rest of the users;
  • Incrementally expose our change to some users to minimize risk.

Was it worth it? Did that help you understand how to implement Canary Deployment in Kubernetes using Ingress NGINX?

If so, follow me on Twitter, I’ll be happy to answer any of your questions and you’ll be the first one to know when a new article comes out! 👌

See you next week for Part 2 of my series, Canary Deployment in Kubernetes!

Bye-bye! 👋