Vertical Pod Autoscaler in Kubernetes
In Kubernetes, we usually think about the Horizontal Pod Autoscaler (HPA) when referring to autoscaling. In most cases, it will be the preferred way of scaling services, based on CPU usage, memory usage, or custom metrics.
If you haven’t already, go read Horizontal Pod Autoscaler in Kubernetes (Part 1) - Simple Autoscaling using Metrics Server and learn how to implement a Horizontal Pod Autoscaler using Metrics Server!
However, while the HPA can scale replicas up and down based on the current load, it is not capable of optimizing resource usage over the long term: this is where the Vertical Pod Autoscaler (VPA) comes in.
The VPA can be leveraged to optimize resource usage over time, based on mid- to long-term observations.
Please note that to avoid a race condition, the VPA should only be used together with HPAs that are based on custom metrics. In addition, the VPA should not be used with JVM-based services due to limited visibility into the actual memory usage of the workload (learn more about its limitations here).
If you haven’t already, go read Horizontal Pod Autoscaler in Kubernetes (Part 2) - Advanced Autoscaling using Prometheus Adapter and learn how to implement a Horizontal Pod Autoscaler using Prometheus Adapter!
🎬 Hi there, I’m Jean!
In this article, we’re going to learn how to use Vertical Pod Autoscaler (VPA) to vertically scale services in Kubernetes automatically based on resource metrics! 💪
Before we start, make sure you have the following tools installed: Kind, kubectl, Helm and k6.
Note: for macOS users or Linux users using Homebrew, simply run:
brew install kind kubectl helm k6
All set? Let’s go! 🏁
Kind is a tool for running local Kubernetes clusters using Docker container “nodes”. It was primarily designed for testing Kubernetes itself, but may be used for local development or CI.
I don’t expect you to have a demo project handy, so I built one for you.
git clone https://github.com/jhandguy/vertical-pod-autoscaler.git
cd vertical-pod-autoscaler
Alright, let’s spin up our Kind cluster! 🚀
➜ kind create cluster --image kindest/node:v1.27.3 --config=kind/cluster.yaml
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.27.3) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:
kubectl cluster-info --context kind-kind
Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community 🙂
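For context, a Kind configuration for this kind of setup typically labels the node and maps ports 80/443 to the host so the NGINX Ingress Controller can receive traffic. Here is a minimal sketch of what kind/cluster.yaml might contain (an assumption on my part; the actual file in the repository may differ):

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    # label the node so the ingress controller can be scheduled on it
    kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "ingress-ready=true"
    # expose HTTP/HTTPS on the host so Ingress traffic can reach the cluster
    extraPortMappings:
      - containerPort: 80
        hostPort: 80
        protocol: TCP
      - containerPort: 443
        hostPort: 443
        protocol: TCP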
cert-manager is a Kubernetes addon that automates the management and issuance of TLS certificates from various issuing sources. It periodically ensures certificates are valid and up to date, and attempts to renew them at an appropriate time before expiry.
cert-manager can be installed via its Helm chart.
helm repo add jetstack https://charts.jetstack.io
helm install jetstack/cert-manager --name-template cert-manager --create-namespace -n cert-manager --values kind/cert-manager-values.yaml --version 1.13.2 --wait
If everything went fine, you should be able to see three newly spawned Deployments with the READY state!
➜ kubectl get deploy -n cert-manager
NAME READY UP-TO-DATE AVAILABLE AGE
cert-manager 1/1 1 1 6m27s
cert-manager-cainjector 1/1 1 1 6m27s
cert-manager-webhook 1/1 1 1 6m27s
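The values file mainly needs to make sure the cert-manager CustomResourceDefinitions are installed alongside the chart. A minimal sketch of what kind/cert-manager-values.yaml could contain (an assumption; the actual file may set more than this):

# install the cert-manager CustomResourceDefinitions together with the chart
installCRDs: true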
NGINX Ingress Controller is one of the many available Kubernetes Ingress Controllers, which acts as a load balancer and satisfies routing rules specified in Ingress resources, using the NGINX reverse proxy.
NGINX Ingress Controller can be installed via its Helm chart.
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install ingress-nginx/ingress-nginx --name-template ingress-nginx --create-namespace -n ingress-nginx --values kind/ingress-nginx-values.yaml --version 4.8.3 --wait
Now, if everything goes according to plan, you should be able to see the ingress-nginx-controller Deployment running.
➜ kubectl get deploy -n ingress-nginx
NAME READY UP-TO-DATE AVAILABLE AGE
ingress-nginx-controller 1/1 1 1 4m35s
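On Kind, the controller is usually pinned to the node labelled ingress-ready=true and bound to the host ports mapped in the cluster configuration. A minimal sketch of what kind/ingress-nginx-values.yaml might look like (an assumption; the actual file may differ):

controller:
  # schedule the controller on the Kind node that maps ports 80/443
  nodeSelector:
    ingress-ready: "true"
  # bind directly to the host ports exposed by Kind
  hostPort:
    enabled: true
  service:
    type: NodePort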
Metrics Server is a source of container resource metrics: it collects them from the Kubelets and exposes them through the Metrics API in the Kubernetes API server, for use by the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler.
Metrics Server can be installed via its Helm chart.
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server
helm install metrics-server/metrics-server --name-template metrics-server --create-namespace -n metrics-server --values kind/metrics-server-values.yaml --version 3.11.0 --wait
Now, if everything goes according to plan, you should be able to see the metrics-server Deployment running.
➜ kubectl get deploy -n metrics-server
NAME READY UP-TO-DATE AVAILABLE AGE
metrics-server 1/1 1 1 38s
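On a local Kind cluster, Metrics Server usually needs to skip kubelet certificate verification, since the kubelets serve self-signed certificates. A minimal sketch of what kind/metrics-server-values.yaml might contain (an assumption; the actual file may differ):

# the Kind kubelets serve self-signed certificates,
# so skip TLS verification when scraping them
args:
  - --kubelet-insecure-tls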
Vertical Pod Autoscaler (VPA) is a component of the Kubernetes Autoscaler that frees users from the necessity of setting up-to-date resource limits and requests for the containers in their pods.
When configured, it will set the requests automatically based on usage and thus allow proper scheduling onto nodes so that the appropriate resource amount is available for each pod. It will also maintain ratios between limits and requests that were specified in the initial container configuration.
It can both scale down pods that are over-requesting resources and scale up pods that are under-requesting them, based on their usage over time.
Note: VPA is still in its beta phase; use it in production at your own risk.
As of this writing, Kubernetes does not provide an official Helm chart, so I went ahead and built one!
helm install helm-chart --name-template vertical-pod-autoscaler --create-namespace -n vertical-pod-autoscaler --wait
If everything goes fine, you should eventually see three Deployments with the READY state!
➜ kubectl get deploy -n vertical-pod-autoscaler
NAME READY UP-TO-DATE AVAILABLE AGE
vert...scaler-admission-controller 1/1 1 1 2m32s
vert...scaler-recommender 1/1 1 1 2m32s
vert...scaler-updater 1/1 1 1 2m32s
As you can observe, the VPA is split into 3 separate components: the Admission Controller, the Recommender, and the Updater.
Now that the Vertical Pod Autoscaler is up and running, let’s get to it, shall we? 🧐
helm install sample-app/helm-chart --name-template sample-app --create-namespace -n sample-app --wait
If everything goes fine, you should eventually see one Deployment with the READY state.
➜ kubectl get deploy -n sample-app
NAME READY UP-TO-DATE AVAILABLE AGE
sample-app 3/3 3 3 58s
Alright, now let’s have a look at the VPA!
➜ kubectl describe vpa -n sample-app
...
Spec:
Resource Policy:
Container Policies:
Container Name: sample-app
Controlled Resources:
cpu
memory
Max Allowed:
Cpu: 100m
Memory: 200Mi
Min Allowed:
Cpu: 10m
Memory: 20Mi
Target Ref:
API Version: apps/v1
Kind: Deployment
Name: sample-app
Update Policy:
Update Mode: Auto
As you can see, this VPA is configured to scale the service based on its CPU and memory resources. Its spec states that the minimum allowed CPU/memory is 10m/20Mi and the maximum is 100m/200Mi.
Finally, its update policy is in “Auto” mode, meaning that VPA assigns resource requests on pod creation as well as updates them on existing pods using the preferred update mechanism.
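For reference, here is roughly what the underlying manifest of this VPA looks like, reconstructed from the describe output above (the resource name and exact layout are assumptions on my part; the actual template in the sample-app Helm chart may differ slightly):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: sample-app      # assumed name, matching the Deployment
  namespace: sample-app
spec:
  # the workload whose pods the VPA manages
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-app
  # assign resources on pod creation and update existing pods via eviction
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
      - containerName: sample-app
        controlledResources:
          - cpu
          - memory
        minAllowed:
          cpu: 10m
          memory: 20Mi
        maxAllowed:
          cpu: 100m
          memory: 200Mi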
Currently, both the resource requests and limits match the VPA’s minimum allowance.
➜ kubectl get pods -n sample-app -o yaml | grep -A 6 'resources:'
resources:
limits:
cpu: 10m
memory: 20Mi
requests:
cpu: 10m
memory: 20Mi
--
resources:
limits:
cpu: 10m
memory: 20Mi
requests:
cpu: 10m
memory: 20Mi
--
resources:
limits:
cpu: 10m
memory: 20Mi
requests:
cpu: 10m
memory: 20Mi
Now, let’s give some load to our service and see what happens!
For Load Testing, I really recommend k6 from the Grafana Labs team. It is a dead-simple yet super powerful tool with very extensive documentation.
See for yourself!
k6 run k6/script.js
While k6 is gradually increasing strain on the pods’ CPU usage, let’s watch out for any EvictedByVPA events in a second tab: eventually, you should see all 3 pods get evicted simultaneously!
➜ kubectl get events -n sample-app -w | grep EvictedByVPA
... Pod was evicted by VPA Updater to apply resource recommendation.
... Pod was evicted by VPA Updater to apply resource recommendation.
... Pod was evicted by VPA Updater to apply resource recommendation.
As soon as this happens, have a look at the updated pods’ resource requests and limits: the CPU requests/limits should have doubled in value (from 10m to 20m), while the memory stays at its minimum of 20Mi (displayed in bytes as 20971520).
➜ kubectl get pods -n sample-app -o yaml | grep -A 6 'resources:'
resources:
limits:
cpu: 20m
memory: "20971520"
requests:
cpu: 20m
memory: "20971520"
--
resources:
limits:
cpu: 20m
memory: "20971520"
requests:
cpu: 20m
memory: "20971520"
--
resources:
limits:
cpu: 20m
memory: "20971520"
requests:
cpu: 20m
memory: "20971520"
This means Vertical Pod Autoscaler successfully evicted the pods in order to increase their resource requests and limits! 🎉
Once k6 is done, have a look at the Load Test summary, and the result of the status code counter metric in particular.
execution: local
script: k6/script.js
output: -
scenarios: (100.00%) 1 scenario, 200 max VUs, 10m30s max duration (incl. graceful stop):
* load: Up to 40.00 iterations/s for 10m0s over 3 stages (maxVUs: 200, gracefulStop: 30s)
✗ status code is 200
↳ 97% — ✓ 17724 / ✗ 365
✗ node is kind-control-plane
↳ 97% — ✓ 17724 / ✗ 365
✗ namespace is sample-app
↳ 97% — ✓ 17724 / ✗ 365
✗ pod is sample-app-*
↳ 97% — ✓ 17724 / ✗ 365
✓ checks.........................: 97.98% ✓ 70896 ✗ 1460
data_received..................: 4.2 MB 7.1 kB/s
data_sent......................: 2.1 MB 3.5 kB/s
http_req_blocked...............: avg=18.39µs min=2µs med=8µs max=2.93ms p(90)=17µs p(95)=20µs
http_req_connecting............: avg=5.71µs min=0s med=0s max=2.74ms p(90)=0s p(95)=0s
✓ http_req_duration..............: avg=188.13ms min=491µs med=4.57ms max=59.99s p(90)=269.43ms p(95)=646.78ms
{ expected_response:true }...: avg=129.55ms min=491µs med=4.62ms max=7.3s p(90)=261.24ms p(95)=602.03ms
http_req_failed................: 2.01% ✓ 365 ✗ 17724
http_req_receiving.............: avg=98.96µs min=0s med=75µs max=4.27ms p(90)=159µs p(95)=209µs
http_req_sending...............: avg=49.92µs min=7µs med=34µs max=14.39ms p(90)=72µs p(95)=93µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=187.99ms min=451µs med=4.38ms max=59.99s p(90)=269.31ms p(95)=646.69ms
http_reqs......................: 18089 30.147771/s
iteration_duration.............: avg=188.59ms min=666µs med=5.11ms max=1m0s p(90)=270.13ms p(95)=647.49ms
iterations.....................: 18089 30.147771/s
vus............................: 0 min=0 max=145
vus_max........................: 200 min=200 max=200
running (10m00.0s), 000/200 VUs, 18089 complete and 0 interrupted iterations
load ✓ [======================================] 000/200 VUs 10m0s 00.41 iters/s
Uh-oh… It looks like we had some downtime! 😱
Thankfully, our service was able to restart relatively fast, and only 365 out of 18089 requests failed. But for a service with a slower startup time, this could have led to an incident! 🚨
This is due to the fact that vertical scaling, in essence, cannot happen without a restart: a pod’s CPU and/or memory cannot be increased in place. Instead, the pod must be terminated and a new one created with increased resources.
So how do we ensure the availability of our service during vertical autoscaling then?
This is where the Pod Disruption Budget (PDB) comes in!
We’ll get to that in a minute; let’s uninstall our Helm release first (we won’t be needing this one anymore)!
helm uninstall sample-app -n sample-app
To prevent downtime during pod disruptions such as the one we previously experienced, a Pod Disruption Budget (PDB) can be configured.
A PDB limits the number of pods that can be down simultaneously due to voluntary disruptions. It can be configured to maintain either a minimum number of available pods (minAvailable) or a maximum number of unavailable pods (maxUnavailable).
Let’s see what happens if we try to vertically autoscale the same application, but this time with a Pod Disruption Budget configured with maxUnavailable: 1.
helm install sample-app/helm-chart --name-template sample-app --set podDisruptionBudget.enabled=true --create-namespace -n sample-app --wait
Once again, you should eventually see one Deployment with the READY state.
➜ kubectl get deploy -n sample-app
NAME READY UP-TO-DATE AVAILABLE AGE
sample-app 3/3 3 3 32s
Alright, now let’s have a look at the PDB!
➜ kubectl describe pdb -n sample-app
Name: sample-app
Namespace: sample-app
Max unavailable: 1
Selector: app=sample-app
Status:
Allowed disruptions: 1
Current: 3
Desired: 2
Total: 3
As you can see, this PDB is configured to prevent more than 1 pod from being unavailable during a voluntary pod disruption (such as a pod eviction by the VPA).
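For reference, a PDB like this one can be declared with a manifest along these lines, reconstructed from the describe output above (the actual template enabled by podDisruptionBudget.enabled in the Helm chart may differ):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: sample-app
  namespace: sample-app
spec:
  # at most 1 pod may be down during a voluntary disruption
  maxUnavailable: 1
  # select the pods of the sample-app Deployment
  selector:
    matchLabels:
      app: sample-app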
Now, let’s see how our service is going to behave under load with a PDB!
k6 run k6/script.js
Once again, while k6 is gradually increasing strain on the pods’ CPU usage, let’s watch out for any EvictedByVPA events in a second tab: eventually, you should see all 3 pods get evicted, but this time only one at a time!
➜ kubectl get events -n sample-app -w | grep EvictedByVPA
... Pod was evicted by VPA Updater to apply resource recommendation.
... Pod was evicted by VPA Updater to apply resource recommendation.
... Pod was evicted by VPA Updater to apply resource recommendation.
Once k6 is done, have a look at the Load Test summary, and the result of the status code counter metric in particular.
execution: local
script: k6/script.js
output: -
scenarios: (100.00%) 1 scenario, 200 max VUs, 10m30s max duration (incl. graceful stop):
* load: Up to 40.00 iterations/s for 10m0s over 3 stages (maxVUs: 200, gracefulStop: 30s)
✓ status code is 200
✓ node is kind-control-plane
✓ namespace is sample-app
✓ pod is sample-app-*
✓ checks.........................: 100.00% ✓ 72356 ✗ 0
data_received..................: 4.2 MB 7.0 kB/s
data_sent......................: 2.1 MB 3.5 kB/s
http_req_blocked...............: avg=19.7µs min=2µs med=7µs max=2.95ms p(90)=16µs p(95)=21µs
http_req_connecting............: avg=6.86µs min=0s med=0s max=2.24ms p(90)=0s p(95)=0s
✓ http_req_duration..............: avg=103.43ms min=452µs med=6.4ms max=5.29s p(90)=259.88ms p(95)=484.22ms
{ expected_response:true }...: avg=103.43ms min=452µs med=6.4ms max=5.29s p(90)=259.88ms p(95)=484.22ms
http_req_failed................: 0.00% ✓ 0 ✗ 18089
http_req_receiving.............: avg=99.34µs min=8µs med=77µs max=5.42ms p(90)=166µs p(95)=212µs
http_req_sending...............: avg=51.86µs min=9µs med=33µs max=18.76ms p(90)=71µs p(95)=96.59µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=103.28ms min=418µs med=6.2ms max=5.29s p(90)=259.8ms p(95)=483.99ms
http_reqs......................: 18089 30.148343/s
iteration_duration.............: avg=103.9ms min=625.62µs med=6.98ms max=5.29s p(90)=260.24ms p(95)=485.04ms
iterations.....................: 18089 30.148343/s
vus............................: 0 min=0 max=69
vus_max........................: 200 min=200 max=200
running (10m00.0s), 000/200 VUs, 18089 complete and 0 interrupted iterations
load ✓ [======================================] 000/200 VUs 10m0s 00.41 iters/s
Yay! 🎉
This time, our service handled pod disruption beautifully and not a single request failed!
Thanks to the Pod Disruption Budget, a pod can only be evicted if all other pods are up, ensuring that at least 2 pods are available to handle the traffic.
This is what we call: High availability! 🚀
That’s it! You can now stop and delete your Kind cluster.
kind delete cluster
To summarize, using the Vertical Pod Autoscaler (VPA), we were able to automatically adjust a service’s resource requests and limits based on its actual usage over time, and, combined with a Pod Disruption Budget, achieve this without any downtime.
Was it worth it? Did that help you understand how to implement Vertical Pod Autoscaler in Kubernetes?
If so, follow me on Twitter, I’ll be happy to answer any of your questions and you’ll be the first one to know when a new article comes out! 👌
Bye-bye! 👋