Kubernetes Tutorial Series: Pod Autoscaling and Cluster Autoscaling

This is the fifth tutorial of the Kubernetes Tutorial Series. In this article we will learn how Pod Autoscaling and Cluster Autoscaling work in Kubernetes.


Autoscaling in Kubernetes happens at two levels: at the Pod level and at the cluster (node) level.

Pod level: When CPU utilization, memory utilization, or any other custom metric like the number of requests per second rises above a threshold, the number of Pods serving the requests must be increased to accommodate the traffic. This is done using the Horizontal Pod Autoscaler, which we will see in a moment.

Cluster level: When the worker nodes in the cluster cannot accommodate any new Pods because of a lack of resources, we need to provision new nodes to join the cluster. This is done using the Cluster Autoscaler, which we will also take a look at in a moment.

Horizontal Pod Autoscaling

As the name suggests, it scales the number of Pods up or down horizontally. Whenever we want to scale the number of Pods dynamically based on metrics like CPU, memory, or a custom metric like the number of requests per second, we use the Horizontal Pod Autoscaler (HPA).

For the HPA to work we need to have metrics-server installed, because it is responsible for exposing the resource metrics of the Pods that the HPA consumes. metrics-server collects resource metrics from the Kubelet on each node and serves them through the Kubernetes Metrics API.

Run the below command to install metrics-server.

git clone https://github.com/kubernetes-incubator/metrics-server.git
cd metrics-server/
kubectl create -f deploy/1.8+/

Once metrics-server is installed and the metrics-server Pod is up and serving metrics, we are ready to move on. To test whether metrics-server is working, run the below command:

kubectl top nodes

Let’s see how the HPA works in action. Copy and paste the below manifest and name it hpaapp.yaml.

kind: Service
apiVersion: v1
metadata:
  name: train-schedule-service
spec:
  type: NodePort
  selector:
    app: train-schedule
  ports:
  - protocol: TCP
    port: 8080
    nodePort: 30010


apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: train-schedule-deployment
  labels:
    app: train-schedule
spec:
  replicas: 2
  selector:
    matchLabels:
      app: train-schedule
  template:
    metadata:
      labels:
        app: train-schedule
    spec:
      containers:
      - name: train-schedule
        image: linuxacademycontent/train-schedule:autoscaling
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /
            port: 8080
          initialDelaySeconds: 15
          timeoutSeconds: 1
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m

The above file creates a service and a deployment. The deployment specifies the replicas as 2 and requests 100m of CPU for each Pod. Let us now create a horizontal pod autoscaler. Copy and paste the below code and name it hpa.yaml.

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: train-schedule
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: train-schedule-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 50

In the above file we can see that:

  • Its kind is HorizontalPodAutoscaler and the apiVersion is autoscaling/v2beta1.

  • In the spec section, scaleTargetRef refers to the Deployment created above.

  • The minimum number of Pods is set to 1.

  • The maximum number of Pods is set to 10.

  • In the metrics section, the resource checked is CPU and the threshold is 50 percent of the requested CPU.
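Note that the autoscaling/v2beta1 API used above has since been deprecated; on newer clusters the same autoscaler can be written against the stable autoscaling/v2 API, where the utilization target moves under a target block. A sketch of the equivalent manifest (not part of the original tutorial):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: train-schedule
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: train-schedule-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization        # v2 replaces targetAverageUtilization
        averageUtilization: 50   # with a target block like this
```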

Create the above resources by running the below commands.

kubectl create -f hpaapp.yaml
kubectl create -f hpa.yaml

Now when you run the below command you should see the current CPU utilization of the pods.

kubectl get hpa

Now let’s increase the load on the website by running the below commands, so that we can see Pod autoscaling in action (please run them in a different terminal):

$ kubectl run -i --tty load-generator --image=busybox /bin/sh

Hit enter for command prompt

$ while true; do wget -q -O- http://train-schedule-service.default.svc.cluster.local; done

And now when you run kubectl get hpa you will see the CPU utilization increasing and the number of Pod replicas growing from 2 up to 10. This may take up to 5 minutes.
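The replica count the HPA converges on can be reasoned about with the formula from the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). A quick sketch of the arithmetic (the utilization numbers are illustrative, not taken from an actual run of this tutorial):

```shell
# HPA scaling formula: desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)
# Example: 2 replicas averaging 200% CPU utilization against the 50% target from hpa.yaml.
current_replicas=2
current_util=200   # percent of the 100m CPU request (illustrative value)
target_util=50     # targetAverageUtilization from hpa.yaml
desired=$(( (current_replicas * current_util + target_util - 1) / target_util ))  # integer ceiling
echo "$desired"    # prints 8
```

So a sustained fourfold overshoot of the target roughly quadruples the replica count, capped by maxReplicas.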

Now some of the Pods may go into the Pending state. Why? If your nodes have enough CPU to run all 10 replicas of the Pod, you will not see any Pod in the Pending state. But if your nodes cannot fit all 10 replicas, the extras will go into the Pending state, because the scheduler cannot find any node with the resources the Pods request. This is where the Cluster Autoscaler kicks in.

To learn more about resource allocation, follow Kubernetes Tutorial Series: Resource Allocation.

Cluster Autoscaler

Cluster Autoscaler is a component that automatically adjusts the size of a Kubernetes cluster so that all Pods have a place to run and there are no unneeded nodes. It is always good to pair the Cluster Autoscaler with the Horizontal Pod Autoscaler.

In short, HPA and CA work with each other to provide autoscaling in Kubernetes: when CPU utilization on the Pods goes above the threshold, HPA spins up new Pods, and when the existing nodes cannot accommodate more Pods, CA kicks in, spins up a new node, and the Pending Pods are scheduled on the newly provisioned node. Once the load decreases, HPA terminates Pods, and CA then looks for nodes that can be removed and removes them after waiting about 10 minutes.
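The 10-minute wait corresponds to Cluster Autoscaler's scale-down defaults, which are tunable via flags on the cluster-autoscaler container. A fragment of the container args with the documented default values (the surrounding Deployment spec is elided):

```yaml
args:
- --scale-down-unneeded-time=10m            # how long a node must be unneeded before removal
- --scale-down-utilization-threshold=0.5    # node utilization below which it becomes a removal candidate
```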

Setting up Cluster Autoscaler is very simple. Follow this link to configure Cluster Autoscaler.

Once CA is configured, any Pending Pods which cannot find appropriate resources on the existing nodes will be scheduled on the newly provisioned node by CA.

Let’s now stop the load on the application that you created above in the Horizontal Pod Autoscaling section to see scale down of Pods and nodes. In the terminal where we created the container with busybox image, terminate the load generation by typing <Ctrl> + C.

And when you again run kubectl get hpa you will see the CPU utilization coming down and the Pod count scaling back down as well. And after some time the extra node is also terminated because it is no longer needed.


With this you have learnt all about autoscaling in Kubernetes. Next up is how to grant access to users in a Kubernetes Cluster.

Kubernetes Tutorial Series: RBAC in Kubernetes

Please let me know if you have any queries in the comments section below.


© 2020 by Samrat Priyadarshi