Vertical Pod Autoscaler in Kubernetes
In Kubernetes, we usually think about the Horizontal Pod Autoscaler (HPA) when referring to autoscaling. In most cases, it will be the preferred way of scaling services, based on CPU usage, memory usage, or custom metrics.
If you haven’t already, go read Horizontal Pod Autoscaler in Kubernetes (Part 1) - Simple Autoscaling using Metrics Server and learn how to implement a Horizontal Pod Autoscaler using Metrics Server!
However, while the HPA can scale replicas up and down based on the current load, it is not capable of optimizing resource usage over the long term: this is where the Vertical Pod Autoscaler (VPA) comes in.
The VPA can be leveraged to optimize resource usage over time, based on mid- to long-term observations.
Please note that to avoid a race condition, the VPA should only be used together with HPAs that are based on custom metrics. In addition, the VPA should not be used with JVM-based services due to limited visibility into the actual memory usage of the workload (learn more about its limitations here).
If you haven’t already, go read Horizontal Pod Autoscaler in Kubernetes (Part 2) - Advanced Autoscaling using Prometheus Adapter and learn how to implement a Horizontal Pod Autoscaler using Prometheus Adapter!
🎬 Hi there, I’m Jean!
In this article, we’re going to learn how to use Vertical Pod Autoscaler (VPA) to vertically scale services in Kubernetes automatically based on resource metrics! 💪
Before we start, make sure you have the following tools installed: Kind, kubectl, Helm and k6.
Note: for macOS users or Linux users using Homebrew, simply run:
brew install kind kubectl helm k6
All set? Let’s go! 🏁
Kind is a tool for running local Kubernetes clusters using Docker container “nodes”. It was primarily designed for testing Kubernetes itself, but may be used for local development or CI.
I don’t expect you to have a demo project handy, so I built one for you.
git clone https://github.com/jhandguy/vertical-pod-autoscaler.git
cd vertical-pod-autoscaler
Alright, let’s spin up our Kind cluster! 🚀
➜ kind create cluster --image kindest/node:v1.27.3 --config=kind/cluster.yaml
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.27.3) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:
kubectl cluster-info --context kind-kind
Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community 🙂
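For context, a Kind configuration for this kind of setup typically labels the node and maps ports 80/443 to the host so the NGINX Ingress Controller can receive traffic. Here is a minimal sketch of what kind/cluster.yaml might contain (an assumption on my part; the actual file in the repository may differ):

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    # label the node so the ingress controller can be scheduled on it
    kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "ingress-ready=true"
    # expose HTTP/HTTPS on the host so Ingress traffic can reach the cluster
    extraPortMappings:
      - containerPort: 80
        hostPort: 80
        protocol: TCP
      - containerPort: 443
        hostPort: 443
        protocol: TCP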
cert-manager is a Kubernetes addon that automates the management and issuance of TLS certificates from various issuing sources. It periodically ensures certificates are valid and up to date, and attempts to renew them at an appropriate time before expiry.
cert-manager can be installed via its Helm chart.
helm repo add jetstack https://charts.jetstack.io
helm install jetstack/cert-manager --name-template cert-manager --create-namespace -n cert-manager --values kind/cert-manager-values.yaml --version 1.13.2 --wait
If everything went fine, you should be able to see three newly spawned Deployments with the READY state!
➜ kubectl get deploy -n cert-manager
NAME READY UP-TO-DATE AVAILABLE AGE
cert-manager 1/1 1 1 6m27s
cert-manager-cainjector 1/1 1 1 6m27s
cert-manager-webhook 1/1 1 1 6m27s
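The values file mainly needs to make sure the cert-manager CustomResourceDefinitions are installed alongside the chart. A minimal sketch of what kind/cert-manager-values.yaml could contain (an assumption; the actual file may set more than this):

# install the cert-manager CustomResourceDefinitions together with the chart
installCRDs: true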
NGINX Ingress Controller is one of the many available Kubernetes Ingress Controllers, which acts as a load balancer and satisfies routing rules specified in Ingress resources, using the NGINX reverse proxy.
NGINX Ingress Controller can be installed via its Helm chart.
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install ingress-nginx/ingress-nginx --name-template ingress-nginx --create-namespace -n ingress-nginx --values kind/ingress-nginx-values.yaml --version 4.8.3 --wait
Now, if everything goes according to plan, you should be able to see the ingress-nginx-controller Deployment running.
➜ kubectl get deploy -n ingress-nginx
NAME READY UP-TO-DATE AVAILABLE AGE
ingress-nginx-controller 1/1 1 1 4m35s
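On Kind, the controller is usually pinned to the node labelled ingress-ready=true and bound to the host ports mapped in the cluster configuration. A minimal sketch of what kind/ingress-nginx-values.yaml might look like (an assumption; the actual file may differ):

controller:
  # schedule the controller on the Kind node that maps ports 80/443
  nodeSelector:
    ingress-ready: "true"
  # bind directly to the host ports exposed by Kind
  hostPort:
    enabled: true
  service:
    type: NodePort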
Metrics Server is a source of container resource metrics: it collects them from the Kubelets and exposes them through the Metrics API in the Kubernetes API server, for use by the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler.
Metrics Server can be installed via its Helm chart.
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server
helm install metrics-server/metrics-server --name-template metrics-server --create-namespace -n metrics-server --values kind/metrics-server-values.yaml --version 3.11.0 --wait
Now, if everything goes according to plan, you should be able to see the metrics-server Deployment running.
➜ kubectl get deploy -n metrics-server
NAME READY UP-TO-DATE AVAILABLE AGE
metrics-server 1/1 1 1 38s
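On a local Kind cluster, Metrics Server usually needs to skip kubelet certificate verification, since the kubelets serve self-signed certificates. A minimal sketch of what kind/metrics-server-values.yaml might contain (an assumption; the actual file may differ):

# the Kind kubelets serve self-signed certificates,
# so skip TLS verification when scraping them
args:
  - --kubelet-insecure-tls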
Vertical Pod Autoscaler (VPA) is a component of the Kubernetes Autoscaler that frees users from the necessity of setting up-to-date resource limits and requests for the containers in their pods.
When configured, it will set the requests automatically based on usage and thus allow proper scheduling onto nodes so that the appropriate resource amount is available for each pod. It will also maintain ratios between limits and requests that were specified in the initial container configuration.
It can both scale down pods that are over-requesting resources and scale up pods that are under-requesting them, based on their usage over time.
Note: VPA is still in its beta phase; use it in production at your own risk.
As of this writing, Kubernetes does not provide an official Helm chart, so I went ahead and built one!
helm install helm-chart --name-template vertical-pod-autoscaler --create-namespace -n vertical-pod-autoscaler --wait
If everything goes fine, you should eventually see three Deployments with the READY state!
➜ kubectl get deploy -n vertical-pod-autoscaler
NAME READY UP-TO-DATE AVAILABLE AGE
vert...scaler-admission-controller 1/1 1 1 2m32s
vert...scaler-recommender 1/1 1 1 2m32s
vert...scaler-updater 1/1 1 1 2m32s
As you can observe, the VPA is split into 3 separate components: the Admission Controller, the Recommender, and the Updater.
Now that the Vertical Pod Autoscaler is up and running, let’s get to it, shall we? 🧐
helm install sample-app/helm-chart --name-template sample-app --create-namespace -n sample-app --wait
If everything goes fine, you should eventually see one Deployment with the READY state.
➜ kubectl get deploy -n sample-app
NAME READY UP-TO-DATE AVAILABLE AGE
sample-app 3/3 3 3 58s
Alright, now let’s have a look at the VPA!
➜ kubectl describe vpa -n sample-app
...
Spec:
Resource Policy:
Container Policies:
Container Name: sample-app
Controlled Resources:
cpu
memory
Max Allowed:
Cpu: 100m
Memory: 200Mi
Min Allowed:
Cpu: 10m
Memory: 20Mi
Target Ref:
API Version: apps/v1
Kind: Deployment
Name: sample-app
Update Policy:
Update Mode: Auto
As you can see, this VPA is configured to scale the service based on its CPU and memory resources. Its spec states that the minimum allowed CPU/memory is 10m/20Mi and the maximum is 100m/200Mi.
Finally, its update policy is in “Auto” mode, meaning that VPA assigns resource requests on pod creation as well as updates them on existing pods using the preferred update mechanism.
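For reference, here is roughly what the underlying manifest of this VPA looks like, reconstructed from the describe output above (the resource name and exact layout are assumptions on my part; the actual template in the sample-app Helm chart may differ slightly):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: sample-app      # assumed name, matching the Deployment
  namespace: sample-app
spec:
  # the workload whose pods the VPA manages
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-app
  # assign resources on pod creation and update existing pods via eviction
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
      - containerName: sample-app
        controlledResources:
          - cpu
          - memory
        minAllowed:
          cpu: 10m
          memory: 20Mi
        maxAllowed:
          cpu: 100m
          memory: 200Mi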
Currently, both the resource requests and limits match the VPA’s minimum allowance.
➜ kubectl get pods -n sample-app -o yaml | grep -A 6 'resources:'
resources:
limits:
cpu: 10m
memory: 20Mi
requests:
cpu: 10m
memory: 20Mi
--
resources:
limits:
cpu: 10m
memory: 20Mi
requests:
cpu: 10m
memory: 20Mi
--
resources:
limits:
cpu: 10m
memory: 20Mi
requests:
cpu: 10m
memory: 20Mi
Now, let’s give some load to our service and see what happens!
For Load Testing, I really recommend k6 from the Grafana Labs team. It is a dead-simple yet super powerful tool with very extensive documentation.
See for yourself!
k6 run k6/script.js
While k6 is gradually increasing strain on the pods’ CPU usage, let’s watch out for any EvictedByVPA events in a second tab: eventually, you should see all 3 pods get evicted simultaneously!
➜ kubectl get events -n sample-app -w | grep EvictedByVPA
... Pod was evicted by VPA Updater to apply resource recommendation.
... Pod was evicted by VPA Updater to apply resource recommendation.
... Pod was evicted by VPA Updater to apply resource recommendation.
As soon as this happens, have a look at the updated pods’ resource requests and limits: the CPU requests/limits should have doubled in value (from 10m to 20m), while the memory stays at its minimum of 20Mi (displayed in bytes as 20971520).
➜ kubectl get pods -n sample-app -o yaml | grep -A 6 'resources:'
resources:
limits:
cpu: 20m
memory: "20971520"
requests:
cpu: 20m
memory: "20971520"
--
resources:
limits:
cpu: 20m
memory: "20971520"
requests:
cpu: 20m
memory: "20971520"
--
resources:
limits:
cpu: 20m
memory: "20971520"
requests:
cpu: 20m
memory: "20971520"
This means Vertical Pod Autoscaler successfully evicted the pods in order to increase their resource requests and limits! 🎉
Once k6 is done, have a look at the Load Test summary, and the result of the status code counter metric in particular.
execution: local
script: k6/script.js
output: -
scenarios: (100.00%) 1 scenario, 200 max VUs, 10m30s max duration (incl. graceful stop):
* load: Up to 40.00 iterations/s for 10m0s over 3 stages (maxVUs: 200, gracefulStop: 30s)
✗ status code is 200
↳ 97% — ✓ 17724 / ✗ 365
✗ node is kind-control-plane
↳ 97% — ✓ 17724 / ✗ 365
✗ namespace is sample-app
↳ 97% — ✓ 17724 / ✗ 365
✗ pod is sample-app-*
↳ 97% — ✓ 17724 / ✗ 365
✓ checks.........................: 97.98% ✓ 70896 ✗ 1460
data_received..................: 4.2 MB 7.1 kB/s
data_sent......................: 2.1 MB 3.5 kB/s
http_req_blocked...............: avg=18.39µs min=2µs med=8µs max=2.93ms p(90)=17µs p(95)=20µs
http_req_connecting............: avg=5.71µs min=0s med=0s max=2.74ms p(90)=0s p(95)=0s
✓ http_req_duration..............: avg=188.13ms min=491µs med=4.57ms max=59.99s p(90)=269.43ms p(95)=646.78ms
{ expected_response:true }...: avg=129.55ms min=491µs med=4.62ms max=7.3s p(90)=261.24ms p(95)=602.03ms
http_req_failed................: 2.01% ✓ 365 ✗ 17724
http_req_receiving.............: avg=98.96µs min=0s med=75µs max=4.27ms p(90)=159µs p(95)=209µs
http_req_sending...............: avg=49.92µs min=7µs med=34µs max=14.39ms p(90)=72µs p(95)=93µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=187.99ms min=451µs med=4.38ms max=59.99s p(90)=269.31ms p(95)=646.69ms
http_reqs......................: 18089 30.147771/s
iteration_duration.............: avg=188.59ms min=666µs med=5.11ms max=1m0s p(90)=270.13ms p(95)=647.49ms
iterations.....................: 18089 30.147771/s
vus............................: 0 min=0 max=145
vus_max........................: 200 min=200 max=200
running (10m00.0s), 000/200 VUs, 18089 complete and 0 interrupted iterations
load ✓ [======================================] 000/200 VUs 10m0s 00.41 iters/s
Uh-oh… It looks like we had some downtime! 😱
Thankfully, our service was able to restart relatively fast, and only 365 out of 18089 requests failed. But for a service with a slower startup time, this could have led to an incident! 🚨
This is due to the fact that vertical scaling, in essence, cannot happen without a restart: a pod’s CPU and/or memory cannot be increased in place. Instead, the pod must be terminated and a new one created with increased resources.
So how do we ensure the availability of our service during vertical autoscaling then?
This is where the Pod Disruption Budget (PDB) comes in!
We’ll get to that in a minute; let’s uninstall our Helm release first (we won’t be needing this one anymore)!
helm uninstall sample-app -n sample-app
To prevent downtime during pod disruptions such as the one we previously experienced, a Pod Disruption Budget (PDB) can be configured.
A PDB limits the number of pods that can be down simultaneously due to voluntary disruptions. It can be configured to maintain either a minimum number of available pods (minAvailable) or a maximum number of unavailable pods (maxUnavailable).
Let’s see what happens if we try to vertically autoscale the same application, but this time with a Pod Disruption Budget configured with maxUnavailable: 1.
helm install sample-app/helm-chart --name-template sample-app --set podDisruptionBudget.enabled=true --create-namespace -n sample-app --wait
Once again, you should eventually see one Deployment with the READY state.
➜ kubectl get deploy -n sample-app
NAME READY UP-TO-DATE AVAILABLE AGE
sample-app 3/3 3 3 32s
Alright, now let’s have a look at the PDB!
➜ kubectl describe pdb -n sample-app
Name: sample-app
Namespace: sample-app
Max unavailable: 1
Selector: app=sample-app
Status:
Allowed disruptions: 1
Current: 3
Desired: 2
Total: 3
As you can see, this PDB is configured to prevent more than 1 pod from being unavailable during a voluntary pod disruption (such as a pod eviction by the VPA).
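For reference, a PDB like this one can be declared with a manifest along these lines, reconstructed from the describe output above (the actual template enabled by podDisruptionBudget.enabled in the Helm chart may differ):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: sample-app
  namespace: sample-app
spec:
  # at most 1 pod may be down during a voluntary disruption
  maxUnavailable: 1
  # select the pods of the sample-app Deployment
  selector:
    matchLabels:
      app: sample-app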
Now, let’s see how our service is going to behave under load with a PDB!
k6 run k6/script.js
Once again, while k6 is gradually increasing strain on the pods’ CPU usage, let’s watch out for any EvictedByVPA events in a second tab: eventually, you should see all 3 pods get evicted, but this time only one at a time!
➜ kubectl get events -n sample-app -w | grep EvictedByVPA
... Pod was evicted by VPA Updater to apply resource recommendation.
... Pod was evicted by VPA Updater to apply resource recommendation.
... Pod was evicted by VPA Updater to apply resource recommendation.
Once k6 is done, have a look at the Load Test summary, and the result of the status code counter metric in particular.
execution: local
script: k6/script.js
output: -
scenarios: (100.00%) 1 scenario, 200 max VUs, 10m30s max duration (incl. graceful stop):
* load: Up to 40.00 iterations/s for 10m0s over 3 stages (maxVUs: 200, gracefulStop: 30s)
✓ status code is 200
✓ node is kind-control-plane
✓ namespace is sample-app
✓ pod is sample-app-*
✓ checks.........................: 100.00% ✓ 72356 ✗ 0
data_received..................: 4.2 MB 7.0 kB/s
data_sent......................: 2.1 MB 3.5 kB/s
http_req_blocked...............: avg=19.7µs min=2µs med=7µs max=2.95ms p(90)=16µs p(95)=21µs
http_req_connecting............: avg=6.86µs min=0s med=0s max=2.24ms p(90)=0s p(95)=0s
✓ http_req_duration..............: avg=103.43ms min=452µs med=6.4ms max=5.29s p(90)=259.88ms p(95)=484.22ms
{ expected_response:true }...: avg=103.43ms min=452µs med=6.4ms max=5.29s p(90)=259.88ms p(95)=484.22ms
http_req_failed................: 0.00% ✓ 0 ✗ 18089
http_req_receiving.............: avg=99.34µs min=8µs med=77µs max=5.42ms p(90)=166µs p(95)=212µs
http_req_sending...............: avg=51.86µs min=9µs med=33µs max=18.76ms p(90)=71µs p(95)=96.59µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=103.28ms min=418µs med=6.2ms max=5.29s p(90)=259.8ms p(95)=483.99ms
http_reqs......................: 18089 30.148343/s
iteration_duration.............: avg=103.9ms min=625.62µs med=6.98ms max=5.29s p(90)=260.24ms p(95)=485.04ms
iterations.....................: 18089 30.148343/s
vus............................: 0 min=0 max=69
vus_max........................: 200 min=200 max=200
running (10m00.0s), 000/200 VUs, 18089 complete and 0 interrupted iterations
load ✓ [======================================] 000/200 VUs 10m0s 00.41 iters/s
Yay! 🎉
This time, our service handled pod disruption beautifully and not a single request failed!
Thanks to the Pod Disruption Budget, a pod can only be evicted if all other pods are up, ensuring that at least 2 pods are available to handle the traffic.
This is what we call: High availability! 🚀
That’s it! You can now stop and delete your Kind cluster.
kind delete cluster
To summarize, using the Vertical Pod Autoscaler (VPA), we were able to automatically adjust a service’s resource requests and limits based on its actual usage over time, and, combined with a Pod Disruption Budget, achieve this without any downtime.
Was it worth it? Did that help you understand how to implement Vertical Pod Autoscaler in Kubernetes?
If so, follow me on Twitter, I’ll be happy to answer any of your questions and you’ll be the first one to know when a new article comes out! 👌
Bye-bye! 👋