Auto-scaling in Kubernetes covers two concepts:

  • Auto-scaling the number of replicas (pods) in a deployment, which is driven by a metric such as CPU, memory, or a custom metric.
  • Auto-scaling the cluster, which is driven by pod scheduling. If pods can’t be scheduled, the cluster auto-scaler adds nodes to the node pool, and it removes them when the cluster is over-provisioned.

How scaling works

The first component of auto-scaling is pod auto-scaling. The pod auto-scaler monitors a metric from a resource and calculates the desired scale. As long as the metric is within a range, the scale doesn’t change.

First step

When the metric goes beyond a given threshold, the pod auto-scaler adjusts the scale. There are multiple strategies for adjusting it; a common one is to increase the current scale by one, so the deployment’s desired replica count is increased to 3:

Second step

The total resources required by the pods (1.5 CPUs) now exceed the available resources (1 CPU), so the new pod can’t be scheduled on an existing node. The cluster auto-scaler notices this and adds a node to the node pool, on which the pod gets scheduled:

Third step

If the metric drops below a certain target, the pod auto-scaler might decide to scale down the number of desired replicas in the deployment. One of the pods gets deleted:

Fourth step

After some time, the cluster auto-scaler notices that one of the nodes is empty, and removes it from the pool:

Fifth step
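The pod side of the walk-through above can be sketched in a few lines of Python. This is only an illustration of the "step by one" strategy, not the code of any real auto-scaler: the function names, the `low`/`high` thresholds and the replica bounds are all assumptions made up for the example.

```python
def next_scale(current, metric, low, high, min_replicas=1, max_replicas=10):
    """Return the desired replica count after one evaluation of the metric.

    Hypothetical step strategy: add one replica above `high`, remove one
    below `low`, and leave the scale alone while the metric is in range.
    """
    if metric > high:
        desired = current + 1
    elif metric < low:
        desired = current - 1
    else:
        desired = current
    # Never leave the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


def simulate(samples, start, low, high):
    """Apply next_scale to a series of metric samples and record each scale."""
    history = []
    replicas = start
    for metric in samples:
        replicas = next_scale(replicas, metric, low, high)
        history.append(replicas)
    return history
```

Starting from 2 replicas with thresholds of 20 and 80, `simulate([50, 90, 60, 10, 50], start=2, low=20, high=80)` returns `[2, 3, 3, 2, 2]`: the scale stays put while the metric is in range, goes up to 3 on the spike, and comes back down once the metric drops.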

Scaling deployments with the Horizontal Pod Auto-scaler

The standard way to scale pods is through the Horizontal Pod Autoscaler. The auto-scaler targets a deployment and any metric of the Kubernetes metrics API, then scales deployments up and down based on configured rules.
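As a rough illustration of those rules, the Kubernetes documentation describes the core calculation as the current replica count multiplied by the ratio of the current metric value to the target value, rounded up, with no change when that ratio is close enough to 1 (a tolerance of 0.1 by default). The sketch below reproduces only that formula; the function name is mine, and the real controller also applies min/max replica bounds and other safeguards that are left out here.

```python
import math


def hpa_desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Sketch of the documented HPA formula (bounds and stabilization omitted)."""
    ratio = current_value / target_value
    # Within the tolerance band the scale is left untouched.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)
```

For example, 2 replicas running at 90% CPU against a 50% target give `hpa_desired_replicas(2, 90, 50)`, which is 4, while 3 replicas at 52% against the same target stay at 3 because the ratio is within the tolerance.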

Out of the box, the Horizontal Pod Autoscaler can mainly target CPU and memory utilization, but the metrics API can be extended with custom metrics adapters. For example, there is one for Azure that can use any metric in Azure Monitor. There is one for Prometheus as well, and there seem to be some for the other clouds.

Unfortunately, the metrics adapter for Azure can’t really be used for Azure Storage Queues at the moment [1]. If what is needed is a metric that isn’t offered by one of these adapters, it’s pretty easy to extend the metrics API: kidnap Brendan Burns and get him to build it.

The alternative is to use the Kubernetes API to handle scale.

Scaling deployments with the Kubernetes API

The Kubernetes API is accessible from a pod, so it can be used to handle scaling. The strategy consists of building a monitor for the metric, which then reconfigures the deployment to the required scale.

Kubernetes officially supports client SDKs for Go and Python, but there is one for .NET as well (although it is in beta).

I built an extensible autoscaler that does just that. It comes with a metrics adapter for Azure Storage Queues, and will scale deployments up or down based on configurable target values.
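To make the idea concrete, here is a minimal sketch of such a monitor using the Python client. It is not the code of the autoscaler mentioned above: the function names, the "messages per replica" target and the default bounds are assumptions chosen for the example. The two scale calls are the ones exposed by `AppsV1Api` in the `kubernetes` package, and the pod’s service account would need permission to read and patch the deployment’s scale.

```python
import math


def desired_replicas(queue_length, target_per_replica, min_replicas, max_replicas):
    """Hypothetical rule: one replica per `target_per_replica` queued messages."""
    wanted = math.ceil(queue_length / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


def reconcile(apps_api, name, namespace, queue_length,
              target_per_replica=10, min_replicas=1, max_replicas=50):
    """Read the deployment's scale and patch it if the target has changed."""
    scale = apps_api.read_namespaced_deployment_scale(name, namespace)
    wanted = desired_replicas(queue_length, target_per_replica,
                              min_replicas, max_replicas)
    if wanted != scale.spec.replicas:
        apps_api.patch_namespaced_deployment_scale(
            name, namespace, {"spec": {"replicas": wanted}})
    return wanted


def in_cluster_apps_api():
    """Build an AppsV1Api client from the pod's service account."""
    # Imported lazily so the logic above can be tried without the package.
    from kubernetes import client, config
    config.load_incluster_config()
    return client.AppsV1Api()
```

Inside a pod, a loop would call `reconcile(in_cluster_apps_api(), "consumer", "default", queue_length)` every few seconds, where the deployment name, namespace and the way `queue_length` is obtained are placeholders. Passing the API object in as a parameter keeps the scaling rule easy to test with a stand-in.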

Cluster auto-scaler

The Kubernetes cluster auto-scaler increases or decreases the number of nodes in a node pool based on pod scheduling. For that purpose, it is important that pods specify the compute resources they need (their resource requests).
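To see why those resource requests matter, here is a deliberately simplified sketch of the question the cluster auto-scaler has to answer: given the CPU requested by each pod, how many nodes are needed? It assumes identical nodes, looks at CPU only, treats the whole node as available to pods, and uses a naive first-fit packing. The real scheduler and auto-scaler take many more constraints into account, and the function names are made up for the example.

```python
def parse_cpu_millis(quantity):
    """Convert a Kubernetes CPU quantity ("500m", "1", "1.5") to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def nodes_needed(pod_cpu_requests, node_cpu):
    """Count the nodes used by a first-fit packing of the pods' CPU requests."""
    capacity = parse_cpu_millis(node_cpu)
    free = []  # remaining millicores on each node already in use
    for request in map(parse_cpu_millis, pod_cpu_requests):
        if request > capacity:
            raise ValueError("a pod requests more CPU than a node provides")
        for i, remaining in enumerate(free):
            if request <= remaining:
                free[i] = remaining - request
                break
        else:
            # No existing node has room: this is where a node would be added.
            free.append(capacity - request)
    return len(free)
```

With the numbers from the walk-through, `nodes_needed(["500m", "500m"], "1")` is 1, and adding a third pod, `nodes_needed(["500m", "500m", "500m"], "1")`, gives 2. Without requests on the pods there would be nothing to add up, which is why the auto-scaler needs them.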

Enabling it on Azure can be done by following this documentation. At the time of writing, it is still a preview feature.

Running the demo

Autoscaler comes with a demo application that simulates processing messages coming from a storage queue. It consists of a generator, which sends messages to a storage queue, and consumers, which dequeue those messages and simulate work by pausing the thread for a configured amount of time.
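The shape of that demo can be mimicked locally with a few lines of Python. This is not the demo application itself: an in-memory `queue.Queue` stands in for the Azure storage queue, threads stand in for the consumer pods, and every name and default value below is an assumption for the sake of illustration.

```python
import queue
import threading
import time


def generator(q, count):
    """Send `count` messages to the queue."""
    for i in range(count):
        q.put(f"message-{i}")


def consumer(q, work_seconds, processed, stop):
    """Dequeue messages and simulate work by pausing the thread."""
    while not stop.is_set():
        try:
            message = q.get(timeout=0.05)
        except queue.Empty:
            continue
        time.sleep(work_seconds)
        processed.append(message)
        q.task_done()


def run_demo(message_count=20, consumers=4, work_seconds=0.01):
    """Run the generator against a pool of consumers; return what was processed."""
    q = queue.Queue()
    processed = []
    stop = threading.Event()
    workers = [
        threading.Thread(target=consumer, args=(q, work_seconds, processed, stop))
        for _ in range(consumers)
    ]
    for worker in workers:
        worker.start()
    generator(q, message_count)
    q.join()    # wait until every message has been handled
    stop.set()  # then let the consumers exit
    for worker in workers:
        worker.join()
    return processed
```

Calling `run_demo(message_count=100, consumers=1)` and then again with more consumers shows the effect the auto-scaler is after: the longer the queue, the more consumers are needed to drain it in a reasonable time.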

  1. Get a Kubernetes cluster with support for cluster auto-scaler, and an Azure storage account.

    One node when starting

  2. Clone the repository for the demo app and follow the instructions to get it deployed.
  3. Clone the repository for autoscaler and follow the instructions to get it deployed. Set a relatively high limit (e.g. 50 or 100).

    Deployments are ready and idle

  4. Start the message generator.

    Generator is running

  5. Observe that the number of pods and nodes starts to increase:

    Pods and nodes are scaling up

  6. Stop the message generator.

    5 nodes at full scale

  7. At some point the message count will go down and the number of pods will start to decrease:

    Scaling pods down

  8. Similarly, the number of nodes will start to decrease as well. This takes some time; the cluster auto-scaler status can be checked with kubectl -n kube-system describe configmap cluster-autoscaler-status:

    Scaling nodes down

  9. This continues until only one pod and one node are left:

    Back to normal


  1. This is because the Azure metrics adapter relies on Azure Monitor, and storage queues report their capacity only once a day. There’s an issue logged on GitHub.