Horizontal Pod Autoscaler in Kubernetes (Part 2) — Advanced Autoscaling using Prometheus Adapter
The Horizontal Pod Autoscaler (HPA) is a fundamental feature of Kubernetes. It enables automatic scale-up and scale-down of containerized applications based on CPU usage, memory usage, or custom metrics.
Traditionally, when scaling software, we first think of vertical scaling: the CPU and the RAM are increased so the application consuming them can perform better. While this seems like a flawless mechanism on paper, it actually comes with many drawbacks.
First, upgrading the CPU or RAM of a physical machine (or VM) requires downtime, and unless a Pod Disruption Budget (PDB) is used to handle disruptions gracefully, all Pods will be evicted at once and recreated on the resized node.
Node resource usage is also harder to optimize: scaling vertically requires enough free resources on a single node, while horizontal scaling can spread the same amount of resources across multiple nodes.
Additionally, vertical scaling is not as resilient as horizontal scaling, as fewer replicas mean a higher risk of disruption in case of node failure.
Finally, past a certain threshold, scaling only vertically becomes very expensive and, most importantly, it isn't limitless: there is only so much CPU and RAM a single physical machine (or VM) can handle.
This is where horizontal scaling comes into play!
At some point, it becomes more efficient to duplicate an instance than to keep increasing its resources.
🎬 Hi there, I’m Jean!
In this two-part series, we're going to explore several ways to scale services horizontally in Kubernetes, and the second one is…
🥁
… using Prometheus Adapter! 🎊
Before we start, make sure you have the following tools installed: kind, kubectl, helm, and k6.
Note: for macOS users, or Linux users using Homebrew, simply run:
brew install kind kubectl helm k6
All set? Let’s go! 🏁
Kind is a tool for running local Kubernetes clusters using Docker container “nodes”. It was primarily designed for testing Kubernetes itself, but may be used for local development or CI.
I don't expect you to have a demo project at hand, so I built one for you.
git clone https://github.com/jhandguy/horizontal-pod-autoscaler.git
cd horizontal-pod-autoscaler
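The cluster is created from the kind/cluster.yaml config shipped with the repository. I won't reproduce the exact file here, but a minimal sketch of such a config, assuming a single control-plane node with the HTTP/HTTPS ports mapped to the host so an ingress controller can later be reached from your machine, looks like this:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "ingress-ready=true"   # label used to pin the ingress controller to this node
    extraPortMappings:
      - containerPort: 80    # HTTP traffic forwarded from the host
        hostPort: 80
        protocol: TCP
      - containerPort: 443   # HTTPS traffic forwarded from the host
        hostPort: 443
        protocol: TCP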
Alright, let’s spin up our Kind cluster! 🚀
➜ kind create cluster --image kindest/node:v1.27.3 --config=kind/cluster.yaml
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.27.3) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:
kubectl cluster-info --context kind-kind
Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community 🙂
NGINX Ingress Controller is one of the many available Kubernetes Ingress Controllers, which acts as a load balancer and satisfies routing rules specified in Ingress resources, using the NGINX reverse proxy.
NGINX Ingress Controller can be installed via its Helm chart.
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install ingress-nginx/ingress-nginx --name-template ingress-nginx --create-namespace -n ingress-nginx --values kind/ingress-nginx-values.yaml --version 4.8.3 --wait
Now, if everything goes according to plan, you should be able to see the ingress-nginx-controller Deployment running.
➜ kubectl get deploy -n ingress-nginx
NAME READY UP-TO-DATE AVAILABLE AGE
ingress-nginx-controller 1/1 1 1 4m35s
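The kind/ingress-nginx-values.yaml file passed to Helm isn't shown here; as an assumption, a Kind-friendly setup typically binds the controller to the node's host ports and pins it to the node labeled ingress-ready=true, along these lines:

controller:
  hostPort:
    enabled: true            # listen directly on the node's ports 80/443
  nodeSelector:
    ingress-ready: "true"    # schedule on the Kind node exposing those ports
  service:
    type: NodePort           # no cloud LoadBalancer is available in Kind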
Prometheus can be installed via its community Helm chart, which also provides Grafana out of the box.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-community/kube-prometheus-stack --name-template prometheus --create-namespace -n prometheus --version 54.2.2 --wait
If everything went fine, you should be able to see three newly spawned deployments with the READY state!
➜ kubectl get deploy -n prometheus
NAME READY UP-TO-DATE AVAILABLE
prometheus-grafana 1/1 1 1
prometheus-kube-prometheus-operator 1/1 1 1
prometheus-kube-state-metrics 1/1 1 1
Prometheus Adapter is a source of custom metrics: it collects them from Prometheus and exposes them to the Kubernetes API server through the Metrics APIs, for use by the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler.
Unlike Metrics Server, which is limited to resource metrics (CPU and memory), Prometheus Adapter can expose any metric measurable from within a Pod's container: memory usage and GC duration, but also request throughput, latency, and so on.
If you haven’t already, go read Horizontal Pod Autoscaler in Kubernetes (Part 1) - Simple Autoscaling using Metrics Server and learn how to implement a Horizontal Pod Autoscaler using Metrics Server!
Prometheus Adapter can be installed via its Helm chart.
helm install prometheus-community/prometheus-adapter --name-template prometheus-adapter --create-namespace -n prometheus-adapter --values kind/prometheus-adapter-values.yaml --version 4.9.0 --wait
Now, if all goes well, you should see the prometheus-adapter Deployment running with the READY state.
➜ kubectl get deploy -n prometheus-adapter
NAME READY UP-TO-DATE AVAILABLE AGE
prometheus-adapter 1/1 1 1 52s
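The interesting part lives in kind/prometheus-adapter-values.yaml: it points the adapter at Prometheus and declares which series to expose through the custom metrics API, and under which name. Here is a minimal sketch of such a rule, assuming the sample app exports a counter named golang_sample_app_requests_total (the actual series name and query in the repository may differ):

prometheus:
  url: http://prometheus-kube-prometheus-prometheus.prometheus.svc   # Prometheus installed by kube-prometheus-stack
  port: 9090
rules:
  custom:
    - seriesQuery: 'golang_sample_app_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}   # map the namespace label to the Namespace resource
          pod: {resource: "pod"}               # map the pod label to the Pod resource
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"                  # exposed as golang_sample_app_requests_per_second
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)'

Once the adapter is running, you can check that the custom metrics API is served by the aggregation layer with kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1".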
Now that Prometheus Adapter is up and running, let’s get to it, shall we? 🧐
helm install golang-sample-app/helm-chart --name-template sample-app --create-namespace -n sample-app --set prometheus.enabled=true --wait
If everything goes fine, you should eventually see one Deployment with the READY state.
➜ kubectl get deploy -n sample-app
NAME READY UP-TO-DATE AVAILABLE AGE
sample-app 2/2 2 2 28s
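The prometheus.enabled flag presumably enables a ServiceMonitor in the sample app's chart, so that the operator installed by kube-prometheus-stack starts scraping the app's /metrics endpoint. A sketch of what such a resource generally looks like (label and port names here are hypothetical and must match the app's Service):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: sample-app
  namespace: sample-app
  labels:
    release: prometheus      # default selector of kube-prometheus-stack when installed as release "prometheus"
spec:
  selector:
    matchLabels:
      app: sample-app        # hypothetical label on the app's Service
  endpoints:
    - port: http             # hypothetical port name exposing the metrics
      path: /metrics
      interval: 15s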
Alright, now let’s have a look at the HPA!
➜ kubectl describe hpa -n sample-app
...
Metrics: ( current / target )
"golang_sample_app_requests_per_second" on pods: <unknown> / 10
Min replicas: 2
Max replicas: 8
As you can see, this HPA is configured to scale the service based on requests per second (rps), with a target average of 10 per Pod.
Note: as you have probably noticed, the current value of the request throughput is <unknown>. This is expected, as the service hasn't served any requests yet, so no time series for this metric exist in Prometheus.
This means that as soon as the average request throughput per Pod breaches the 10 rps threshold, the HPA will trigger an upscale.
Under minimal load, the HPA will retain a replica count of 2, while the maximum number of Pods it is allowed to spin up under high load is 8.
Note: in a production environment, it is recommended to have a minimum replica count of at least 3, to maintain availability in case of Pod disruption.
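For reference, a standalone manifest equivalent to the HPA deployed by the sample app's chart would look roughly like this, written against the autoscaling/v2 API (the chart's actual template may be shaped differently, but the fields match the describe output above):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sample-app
  namespace: sample-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-app
  minReplicas: 2
  maxReplicas: 8
  metrics:
    - type: Pods
      pods:
        metric:
          name: golang_sample_app_requests_per_second   # served by Prometheus Adapter
        target:
          type: AverageValue
          averageValue: "10"                            # scale out above 10 rps per Pod on average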
Now, this is the moment you’ve certainly expected… It’s Load Testing time! 😎
For Load Testing, I really recommend k6 from the Grafana Labs team. It is a dead-simple yet super powerful tool with very extensive documentation.
See for yourself!
k6 run k6/script.js
While the load test is running, I suggest watching the HPA in a separate tab.
kubectl get hpa -n sample-app -w
As the load test progresses and the 2 initial Pods start handling more than 10 rps each, you should see the current value of the Prometheus metric increasing (displayed in Kubernetes quantity notation, where for instance 12471m means roughly 12.47 rps per Pod) and, ultimately, the replica count reaching its maximum!
Deployment/sample-app <unknown>/10 2 8 2
Deployment/sample-app 4360m/10 2 8 2
Deployment/sample-app 6236m/10 2 8 2
Deployment/sample-app 12471m/10 2 8 2
Deployment/sample-app 15577m/10 2 8 3
Deployment/sample-app 21231m/10 2 8 3
Deployment/sample-app 21231m/10 2 8 5
Deployment/sample-app 16752m/10 2 8 5
Deployment/sample-app 18362m/10 2 8 5
Deployment/sample-app 15921m/10 2 8 6
Deployment/sample-app 15543m/10 2 8 6
Deployment/sample-app 16530m/10 2 8 8
Deployment/sample-app 15397m/10 2 8 8
Deployment/sample-app 14428m/10 2 8 8
Deployment/sample-app 11897m/10 2 8 8
Deployment/sample-app 12370m/10 2 8 8
Deployment/sample-app 12423m/10 2 8 8
Deployment/sample-app 12414m/10 2 8 8
Deployment/sample-app 12423m/10 2 8 8
Deployment/sample-app 10962m/10 2 8 8
Now, let's quickly have a look at the Load Test summary, and the result of the http_req_duration metric in particular!
execution: local
script: k6/script.js
output: -
scenarios: (100.00%) 1 scenario, 100 max VUs, 5m30s max duration (incl. graceful stop):
* load: Up to 100.00 iterations/s for 5m0s over 2 stages (maxVUs: 100, gracefulStop: 30s)
✓ status code is 200
✓ node is kind-control-plane
✓ namespace is sample-app
✓ pod is sample-app-*
✓ checks.........................: 100.00% ✓ 60356 ✗ 0
data_received..................: 3.5 MB 12 kB/s
data_sent......................: 1.7 MB 5.8 kB/s
http_req_blocked...............: avg=18.43µs min=1µs med=8µs max=3.9ms p(90)=17µs p(95)=20µs
http_req_connecting............: avg=5.41µs min=0s med=0s max=3.7ms p(90)=0s p(95)=0s
✓ http_req_duration..............: avg=16.74ms min=498µs med=2.77ms max=1.48s p(90)=14.52ms p(95)=64.78ms
{ expected_response:true }...: avg=16.74ms min=498µs med=2.77ms max=1.48s p(90)=14.52ms p(95)=64.78ms
http_req_failed................: 0.00% ✓ 0 ✗ 15089
http_req_receiving.............: avg=97.15µs min=9µs med=74µs max=3.33ms p(90)=151µs p(95)=197µs
http_req_sending...............: avg=48.11µs min=6µs med=34µs max=3.12ms p(90)=69µs p(95)=86µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=16.59ms min=452µs med=2.61ms max=1.48s p(90)=14.32ms p(95)=64.69ms
http_reqs......................: 15089 50.297179/s
iteration_duration.............: avg=17.18ms min=610.5µs med=3.21ms max=1.48s p(90)=15.08ms p(95)=65.33ms
iterations.....................: 15089 50.297179/s
vus............................: 0 min=0 max=18
vus_max........................: 100 min=100 max=100
running (5m00.0s), 000/100 VUs, 15089 complete and 0 interrupted iterations
load ✓ [======================================] 000/100 VUs 5m0s 000.65 iters/s
As you can observe, our service performed very well under heavy load, with a success rate of 100%, a median latency of ~3ms, and a 95th percentile latency of ~65ms!
We have the HPA to thank for that, as it scaled the Deployment from 2 to 8 Pods swiftly and automatically, based on request throughput!
We definitely would not have had the same results without an HPA… Actually, why don’t you try it yourself? 😉
Just delete the HPA (kubectl delete hpa sample-app -n sample-app), run the load test again (k6 run k6/script.js), and see what happens! (spoiler alert: it's not pretty 😬)
That’s it! You can now stop and delete your Kind cluster.
kind delete cluster
To summarize, using Prometheus Adapter we were able to expose a custom request-throughput metric from Prometheus through the Kubernetes custom metrics API, configure a Horizontal Pod Autoscaler to scale a Deployment based on that metric, and watch the Deployment scale from 2 to 8 Pods under load while keeping latency low.
Was it worth it? Did that help you understand how to implement Horizontal Pod Autoscaler in Kubernetes using Prometheus Adapter?
If so, follow me on Twitter, I’ll be happy to answer any of your questions and you’ll be the first one to know when a new article comes out! 👌
Bye-bye! 👋