How I let my Kubernetes cluster on AWS automatically scale on demand in 3 easy steps

Sharon Raphael
4 min read · May 27, 2021


They asked me to add an image to make the article interesting, and I added this.

Hi there, I know you are here since we are all still trying to make the most out of our Kubernetes clusters, learning about the tools and new features that can be integrated into Kubernetes to make it more intelligent and self-aware.

Yes, you just heard the word self-aware. Why does it matter for a system to be self-aware? You guessed it right: because we don’t have to worry about it ourselves anymore.

We have a K8s cluster deployed on AWS on top of EC2 instances using kOps. As the team size grew in my organisation, being a DevOps engineer, I was asked to design an automation system which automatically provisions new testing environments for each developer on demand. Once they are done with their tests, the infrastructure needs to be scaled down automatically.

If you are here, I know you have certainly heard of features like Horizontal Pod Autoscaling (HPA) and Vertical Pod Autoscaling on Kubernetes. But have you ever wondered how to automatically provision infrastructure for those new replicas created by HPA? If you have, the Cluster Autoscaler is the answer.
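
To set the context, here is a minimal HPA sketch; the names and thresholds are purely illustrative and assume a Deployment called activemq (the same one used in the test later). HPA adds pod replicas, and the Cluster Autoscaler then adds nodes when those replicas no longer fit on the existing ones.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: activemq
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: activemq          # hypothetical target deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # illustrative threshold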

Kubernetes Cluster Autoscaler is a tool which monitors the resource requests coming to your cluster, automatically alters the size of the cluster to make room for the new requests, and also scales down any additional resources once the demand expires. Now, let’s see how we can set this up.

Step 1

Give the cluster access to perform the autoscaling activities on AWS. You can do this in two ways, and I would recommend the second approach.

  1. Directly modify the IAM role attached to your K8s nodes to have the autoscaling policies (see the CLI sketch after the spec below).
    OR
  2. Run kops edit cluster. Then add the additional policies under node:
kind: Cluster
...
spec:
  additionalPolicies:
    node: |
      [
        {
          "Effect": "Allow",
          "Action": [
            "autoscaling:DescribeAutoScalingGroups",
            "autoscaling:DescribeAutoScalingInstances",
            "autoscaling:DescribeLaunchConfigurations",
            "autoscaling:SetDesiredCapacity",
            "autoscaling:TerminateInstanceInAutoScalingGroup",
            "autoscaling:DescribeTags"
          ],
          "Resource": "*"
        }
      ]
...
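
If you prefer the first approach instead, a rough sketch with the AWS CLI could look like the following. The role name nodes.<your-cluster-name> is an assumption — it is what kops typically creates — so verify the actual role name in the IAM console first.

# Save the same policy JSON shown above to a local file, e.g. autoscaler-policy.json,
# then attach it as an inline policy to the node role (role name is an assumption).
aws iam put-role-policy \
  --role-name nodes.<your-cluster-name> \
  --policy-name cluster-autoscaler \
  --policy-document file://autoscaler-policy.json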

Step 2

Edit the instance groups which need to be automatically scaled to add two additional tags:

Open up the instance group editor using kops edit ig <instance-group-name>

kind: InstanceGroup
...
spec:
  cloudLabels:
    service: k8s_node
    k8s.io/cluster-autoscaler/enabled: ""
    k8s.io/cluster-autoscaler/<YOUR CLUSTER NAME>: ""
...

The maxSize and minSize attributes of the instance groups should be sensibly chosen based on your requirements. Those are the upper and lower limits of the autoscaling.
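
For reference, those attributes live in the same InstanceGroup spec; the numbers and machine type below are just placeholders to adapt to your own workload.

kind: InstanceGroup
...
spec:
  machineType: m5.large   # matches the test scenario later; use your own type
  minSize: 1
  maxSize: 5
...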

Now we can apply the changes to the cloud using :

kops update cluster --yes
kops rolling-update cluster --yes

(It’s always better to review the changes before applying them by running the same commands without the --yes flag.)

Step 3

Go to https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml .

Switch to the release that corresponds to your Kubernetes cluster version. Make a local copy of this yaml workload. Then, replace the placeholder <YOUR CLUSTER NAME> with the name of your cluster in this yaml and apply the file onto the cluster.
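
Roughly, that boils down to something like the following. The raw URL points at master here, so swap in the branch or tag matching your cluster version, and the cluster name my-cluster.example.com is just a placeholder.

# Download the example manifest (pick the branch/tag for your cluster version instead of master).
curl -o cluster-autoscaler-autodiscover.yaml \
  https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml

# Substitute your cluster name for the placeholder (GNU sed shown; edit by hand if you prefer).
sed -i 's/<YOUR CLUSTER NAME>/my-cluster.example.com/' cluster-autoscaler-autodiscover.yaml

kubectl apply -f cluster-autoscaler-autodiscover.yaml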

Boom.. That’s all !!!!

Test

Now let’s test whether it’s working. Define some resource requests on your deployment definition. Scale up the number of replicas so that the total resource request exceeds the capacity of the minimum number of instances defined on the instance group.

Let’s say I have an activemq deployment with a memory request of 2GB, and I’ve scheduled it onto my instance group where autoscaling is enabled, with a minSize of 1 and instance type m5.large. (This can be done using the nodeSelector field available on Kubernetes Deployments.)
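
As a sketch, the relevant parts of such a deployment could look like this. The image and node label are assumptions — kops typically labels nodes with kops.k8s.io/instancegroup, but verify the labels on your own nodes with kubectl get nodes --show-labels.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: activemq
spec:
  replicas: 1
  selector:
    matchLabels:
      app: activemq
  template:
    metadata:
      labels:
        app: activemq
    spec:
      nodeSelector:
        # Hypothetical node label for the autoscaled instance group.
        kops.k8s.io/instancegroup: autoscaled-nodes
      containers:
        - name: activemq
          image: rmohr/activemq   # illustrative image; substitute your own
          resources:
            requests:
              memory: "2Gi"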

As we know, an m5.large has 8GB of memory, so the autoscaler should kick in if the requests go beyond that limit. So let’s do the following:

kubectl scale deployment activemq --replicas=5

If we list the pods immediately, we should see a few of them in the Pending state. If we quickly go to the EC2 dashboard or list the nodes using kubectl get nodes, we can see new nodes coming up.
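
A few handy commands to watch this happen (the deployment name cluster-autoscaler in kube-system is what the autodiscover example manifest installs, so adjust if yours differs):

# Pods stuck in Pending are what trigger a scale-up.
kubectl get pods --field-selector=status.phase=Pending

# Watch new nodes join as the Auto Scaling group grows.
kubectl get nodes -w

# The autoscaler's own reasoning shows up in its logs.
kubectl -n kube-system logs -f deployment/cluster-autoscaler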

Once all the pods are running, we can scale the replicas down to 2 and wait for about ten minutes to see the newly added EC2 instances getting automatically shut down.
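
The scale-down test is just the reverse of the command above; the ten-minute wait comes from the autoscaler’s default scale-down delay mentioned below.

kubectl scale deployment activemq --replicas=2

# Keep watching the nodes; the extra ones should be drained and terminated
# after the default scale-down delay (~10 minutes).
kubectl get nodes -w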

Things to keep in mind

It’s worth going through all the items listed in their README: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#common-notes-and-gotchas

Mainly, the following ones:

  • By default, cluster autoscaler will wait 10 minutes between scale down operations; you can adjust this using the --scale-down-delay-after-add, --scale-down-delay-after-delete, and --scale-down-delay-after-failure flags. E.g. --scale-down-delay-after-add=5m to decrease the scale down delay to 5 minutes after a node has been added.
  • By default, cluster autoscaler will not terminate nodes running pods in the kube-system namespace. You can override this default behaviour by passing in the --skip-nodes-with-system-pods=false flag.
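
If you want to tweak those flags, they go into the cluster-autoscaler container’s command in the deployment applied in Step 3. Here is a hedged fragment of what that section can look like; the image tag and flag values are illustrative, so pick the tag that matches your cluster version.

      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.2   # match your cluster version
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<YOUR CLUSTER NAME>
            - --scale-down-delay-after-add=5m
            - --skip-nodes-with-system-pods=false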

Apart from these, it’s always important to sensibly choose the size limits of the instance groups. Otherwise, faulty configurations, automation systems or HPA settings can result in disappointing AWS bills.

Having said that, realistic resource requests play the major role in making the most out of this Cluster Autoscaler tool.
