Kubernetes Horizontal Pod Autoscaling with metrics API

This is the first article in a series dedicated to installing and configuring Kubernetes Horizontal Pod Autoscaling:

  1. Horizontal Pod Autoscaling with metrics API
  2. Horizontal Pod Autoscaling with custom metrics API

In this article, we will explore how to set up and configure Horizontal Pod Autoscaling with the metrics API. This will enable you to scale your application based on current CPU or memory usage.

Source code for the demo app and Kubernetes objects can be found here.

Horizontal Pod Autoscaling overview

Horizontal Pod Autoscaling (HPA) is one type of autoscaling available in Kubernetes, along with node autoscaling and Vertical Pod Autoscaling. The primary goal of HPA is to adjust the number of running application replicas (pods) to match changes in application load. HPA scales your application horizontally by increasing or decreasing the number of pods.

It is important to note that HPA is not the same as Vertical Pod Autoscaling, which focuses on automatically updating the resources (requested/limit) of a deployment. If you are looking for this type of scaling, please refer to resources on Vertical Pod Autoscaling.

HPA relies on querying data from the Kubernetes Metrics Server. The Metrics Server exposes the metrics.k8s.io API in your cluster; the related aggregated APIs, custom.metrics.k8s.io and external.metrics.k8s.io, are provided by separate metrics adapters. In this article, we will be working with the first API type.

 ┌────────────┐ 1   ┌──────────────────┐
 │    HPA     ├────►│  Metrics Server  │
 └─────┬──────┘     └──────────────────┘
       │ 3   ┌────────────┐
       └────►│ Deployment │
             │      │     │
    ┌───────┬┘  ┌───┴───┐ └─┬───────┐
    │ Pod 1 │   │ Pod 2 │   │ Pod N │
    └───────┘   └───────┘   └───────┘
  1. HPA queries the Metrics Server for resource data.
  2. Based on the data obtained in step 1, HPA calculates the desired number of replicas.
  3. If the desired number of replicas is different from the current number, HPA updates the replica count.
  4. The process repeats, starting at step 1.

The desired replica count is evaluated every 15 seconds (the default sync period):

  • Scale up is triggered immediately if the result of the scaling rule suggests it.
  • Scale down is triggered only after the scaling rule has suggested it for 5 minutes (the default stabilization window). Scale downs occur gradually, smoothing out the impact of fluctuating metric values.
  • If multiple metrics are configured, the HPA evaluates each metric in turn and then chooses the one yielding the highest replica count.
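The rules above boil down to a small calculation. Below is a simplified Python model of it (the function name is ours, and it ignores details such as the HPA's built-in tolerance band):

```python
import math

def desired_replicas(current_replicas: int, metrics: list[tuple[float, float]]) -> int:
    """Simplified model of the HPA scaling rule.

    metrics is a list of (current_value, target_value) pairs, one per
    configured metric; the HPA evaluates each metric and takes the
    highest resulting replica count.
    """
    candidates = [
        math.ceil(current_replicas * current / target)
        for current, target in metrics
    ]
    return max(candidates)

# One metric: CPU at 1.5x the target with 2 replicas -> scale to 3.
print(desired_replicas(2, [(1.5, 1.0)]))  # 3

# Two metrics: the higher of the two results wins.
print(desired_replicas(2, [(1.5, 1.0), (3.0, 1.0)]))  # 6
```

Note how the formula scales proportionally: if current usage is double the target, the replica count doubles (rounded up).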

Configuring Horizontal Pod Autoscaling


To follow along you will need:

  • A Kubernetes cluster
  • kubectl
  • Metrics Server

Setting up Kubernetes with minikube

  1. Install kubectl following instructions on the official documentation page
  2. Install minikube following instructions on the official documentation page
  3. Start minikube (please note that we have to enable the HPAContainerMetrics feature gate for HPA to work with the ContainerResource metric type we use later)
$ minikube start --feature-gates=HPAContainerMetrics=true
$ kubectl get po -A
NAMESPACE     NAME                               READY   STATUS    RESTARTS      AGE
kube-system   coredns-565d847f94-gchc9           1/1     Running   0             42s
kube-system   etcd-minikube                      1/1     Running   0             54s

Great, now we have a local Kubernetes cluster (v1.25.3 at the time of writing).

Installing metrics API

The official installation K8S object definition can be found here, but we are going to use an updated config with the extra flag --kubelet-insecure-tls, which allows it to run on our minikube cluster (minikube's kubelet serves a self-signed certificate):

$ kubectl apply -f https://raw.githubusercontent.com/ilyamochalov/source-code-mics/main/k8s/HPA/metrics-server.yaml
$ kubectl get po -A | grep metrics 
kube-system   metrics-server-55dd79d7bf-9bgrv    1/1     Running   0             2m40s

Check if metrics API is available now

$ kubectl get --raw "/apis/" | jq .
...
    {
      "name": "metrics.k8s.io",
      "versions": [
        {
          "groupVersion": "metrics.k8s.io/v1beta1",
          "version": "v1beta1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "metrics.k8s.io/v1beta1",
        "version": "v1beta1"
      }
    },
...

Creating a demo app

I have created a demo Python app with an endpoint that triggers a CPU-intensive task. The full source code can be found here.

from flask import Flask

app = Flask(__name__)

@app.route("/")
def root():
    return "UP"

@app.route("/cpu")  # route path assumed here; see the linked repo for the exact endpoint
def cpu_intensive_task():
    result = 1
    for i in range(1, 10000):
        result = result * i

    return str(result)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)  # port 5000 matches the container port below

This app is containerized and can be pulled from ilyamochalov/k8s-autoscaling-cpu-demo:latest

Deploy demo app on kubernetes

Let’s create a namespace, deployment, and service (source file) with kubectl apply -f https://raw.githubusercontent.com/ilyamochalov/source-code-mics/main/k8s/HPA/cpu-demo-app/k8s-deploy.yaml

apiVersion: v1
kind: Namespace
metadata:
  name: hpa-demo
  labels:
    name: hpa-demo
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-demo
  namespace: hpa-demo
  labels:
    app: cpu-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cpu-demo
  template:
    metadata:
      labels:
        app: cpu-demo
    spec:
      containers:
        - name: cpu-demo
          image: ilyamochalov/k8s-autoscaling-cpu-demo:latest
          imagePullPolicy: Always
          resources:
            requests:
              cpu: 10m
              memory: 20Mi
            limits:
              cpu: 1
              memory: 500Mi
          ports:
            - name: http
              protocol: TCP
              containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: cpu-demo
  namespace: hpa-demo
spec:
  selector:
    app: cpu-demo
  ports:
    - name: http
      protocol: TCP
      port: 5000

We can call the app endpoint via Kubernetes port forwarding:

  • in one terminal run
$ kubectl port-forward -n hpa-demo svc/cpu-demo 5000:5000
Forwarding from -> 5000
Forwarding from [::1]:5000 -> 5000
Handling connection for 5000
  • in another terminal run
$ curl http://localhost:5000

Checking data from metrics.k8s.io for our application

We can check our application's metrics in the API with a request like the one below:

$ kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/hpa-demo/pods/cpu-demo-bd4cf554b-m4qmc" | jq .
{
  "kind": "PodMetrics",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "name": "cpu-demo-bd4cf554b-m4qmc",
    "namespace": "hpa-demo",
    "creationTimestamp": "2023-01-16T06:19:38Z",
    "labels": {
      "app": "cpu-demo",
      "pod-template-hash": "bd4cf554b"
    }
  },
  "timestamp": "2023-01-16T06:19:31Z",
  "window": "15.027s",
  "containers": [
    {
      "name": "cpu-demo",
      "usage": {
        "cpu": "217674n",
        "memory": "20164Ki"
      }
    }
  ]
}

Configuring HPA

Add HPA Configuration

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cpu-demo
  namespace: hpa-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cpu-demo
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: ContainerResource
      containerResource:
        name: cpu
        container: cpu-demo
        target:
          type: AverageValue
          averageValue: 0.75

HPA in action

In one terminal, let’s describe the HPA object, wrapping the command with watch, which executes it periodically:

$ watch 'kubectl -n hpa-demo describe hpa/cpu-demo'

In another terminal, let’s periodically check the raw output from the metrics API:

$ watch 'kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/hpa-demo/pods/cpu-demo-bd4cf554b-m4qmc" | jq .'

You should see output like the screenshot below:

HPA metrics API and HPA object outputs

You may notice that HPA reports the current average CPU value rounded up to 1m.

Next, let’s send some requests with ab (Apache Bench); make sure that port forwarding from the k8s service is still active:

$ ab -n 10000000 -c 10
HPA metrics API and HPA object outputs after load was applied

After some time, the current CPU value grows above the configured HPA target value, and HPA instructs the deployment to scale up. It’s important to note that after a scale-up event we will have 2 pods, and the current CPU value shown in the HPA object is averaged over the number of pods currently running.
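This averaging effect can be illustrated with a quick calculation (the usage numbers are hypothetical, and the helper is ours):

```python
import math

def desired_after_scale(per_pod_usage: list[float], target: float) -> int:
    """Desired replicas given per-pod CPU usage (in cores) and an averageValue target."""
    average = sum(per_pod_usage) / len(per_pod_usage)
    return math.ceil(len(per_pod_usage) * average / target)

# One pod burning 1.6 cores against our 0.75 averageValue target:
print(desired_after_scale([1.6], 0.75))  # 3

# After scale up, the same 1.6 cores of total load spread over 3 pods
# keeps the average well under the target, so 3 replicas remain desired:
print(desired_after_scale([0.55, 0.52, 0.53], 0.75))  # 3
```

Because the average is multiplied back by the pod count, what effectively drives the replica count with an AverageValue target is the total usage, not any single pod's usage.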

HPA configurations

Please read the HPA Kubernetes API reference to see the full list of available options. Notes:

  • If you use averageUtilization, be aware that its value is expressed as a percentage of the requested value of the resource for the pods
  • For common web applications, it’s not recommended to scale on memory, since many runtimes do not release memory back quickly enough for it to be a useful scaling signal
  • Scale up / scale down behavior [can be configured](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#configurable-scaling-behavior)
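The first note can be made concrete with a quick sketch (our own helper, using hypothetical usage numbers and a hypothetical averageUtilization target of 80):

```python
import math

def utilization_percent(usage_millicores: float, request_millicores: float) -> float:
    """CPU utilization expressed as a percentage of the container's resource request."""
    return usage_millicores / request_millicores * 100

# Our demo container requests 10m of CPU, so 25m of actual usage is 250%:
u = utilization_percent(25, 10)
print(u)  # 250.0

# Against an averageUtilization: 80 target with 1 replica, that drives a scale up:
print(math.ceil(1 * u / 80))  # 4
```

In other words, a pod with a tiny request can show very high utilization percentages even at modest absolute usage, so requests must be set realistically for utilization-based scaling to behave well.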


  • Horizontal Pod Autoscaling with the metrics API helps you scale based on pod CPU and memory usage

In the next article, we will dive into Horizontal Pod Autoscaling with custom metrics API. Stay tuned!

Ilya Mochalov
DevOps and Platform Engineering

IT professional living in Shanghai, China