Azure Kubernetes Service (AKS): How to over-provision node pools

Sometimes you have an application that needs to scale super-fast, so fast that you can’t wait for a new Kubernetes node to spin up before your pods can be scheduled. Azure Kubernetes Service has a cluster autoscaler, which can be enabled when you build the cluster or added afterwards. This is awesome: it automatically adds a new node when pods cannot be scheduled because the existing nodes do not have enough free CPU or memory. The catch is that new nodes take time to spin up, sometimes up to 15 minutes, which is no good when you have customers waiting.

In this article I am going to show you how to always have a node ready for when you need to scale your application. In fact, as soon as your app starts to use any resources on this spare node, a new one will start to spin up, so you always have one ready. This is called over-provisioning.

Pod Priority

In case you did not know, you can set a priority for each pod. Couple this with preemption, another Kubernetes feature, and you can class some pods as lower priority. This means they will be evicted to make space for the higher priority pods waiting to be scheduled, aka your application. The evicted pods go back to pending, which causes the cluster autoscaler to create a new node and schedule the lower priority pods on it, giving you a new node ready for your application to scale onto.

To make sure we are only reserving capacity on the nodes and not actually putting load on them, the lower priority pods run the Kubernetes pause container. This image ships with Kubernetes, where it has a different purpose. The container basically just sleeps until it receives a signal, which is perfect for this.

So basically, we run enough of the lower priority pods to give us an extra node in our node pool. When your application needs to scale, Kubernetes removes lower priority pods from the node and schedules your application pods in their place. The lower priority pods then go into pending, and the cluster autoscaler goes ahead and adds a new node.
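To make the preemption relationship concrete, here is a minimal sketch of the application side. The name my-app, the nginx image, and the request size are all hypothetical and only there for illustration. The pod template sets no priorityClassName, so, assuming no PriorityClass in your cluster is marked as the global default, its pods get priority 0, which is higher than the -1 used for the placeholder pods in the manifest further down.

```yaml
# Hypothetical application Deployment, for illustration only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      # No priorityClassName is set, so the pod priority defaults to 0
      # (assuming no PriorityClass has globalDefault: true).
      # 0 is higher than -1, so these pods can preempt the placeholder pods.
      containers:
      - name: my-app
        image: nginx          # placeholder image
        resources:
          requests:
            cpu: "500m"       # the scheduler needs requests to know what space to free up
```

When a new replica of a deployment like this cannot fit on any node, the scheduler can evict enough placeholder pods to make room for it.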

The manifest

Below you will find the manifest needed to set this up. All you must do is configure the replica count, the CPU requests, and the node pool name (nodepoolname) to match your needs. It really is that simple.

If you want it for the full cluster, you can remove the affinity section from the overprovisioning deployment.
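As a rough sketch of that cluster-wide variant, this is the same overprovisioning deployment from the manifest with the affinity block taken out. Nothing else is changed, and the replica count and CPU request are still values you would tune yourself.

```yaml
# Cluster-wide variant: the overprovisioning Deployment without the affinity block.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
  namespace: overprovisioning
spec:
  replicas: 8
  selector:
    matchLabels:
      run: overprovisioning
  template:
    metadata:
      labels:
        run: overprovisioning
    spec:
      # No affinity section here, so the scheduler is free to place
      # these pods on nodes from any node pool.
      priorityClassName: overprovisioning
      containers:
      - name: reserve-resources
        image: k8s.gcr.io/pause
        resources:
          requests:
            cpu: "200m"
```

Keep in mind the pods can still only land on nodes whose taints they tolerate, so a tainted node pool would not get placeholder pods from this variant.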

Let’s explain a little first though as you will be deploying a few resources.

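First, we are creating a PriorityClass called overprovisioning and setting its value to -1. Super low priority. Pods without a priority class default to 0 (unless you have set a global default), so anything you deploy normally outranks it.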

Next, we are creating a new ServiceAccount, ClusterRole, and ClusterRoleBinding. You will see that the new service account has limited access on the cluster. This is good for security: if someone execs into the autoscaler pod, they will not have full access to the cluster.

Then we create the lower priority deployment. This is where you set the number of replicas and the CPU requests needed to ensure you always have one node free. You will see the container image is pause.

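And finally, we create another deployment. This is the cluster-proportional-autoscaler, which scales the overprovisioning deployment (its --target) in proportion to the size of the cluster. The last line is where we match the autoscaler to the new service account.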
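If you want to tune that scaling later, the autoscaler reads its settings from the ConfigMap named in its --configmap flag. As far as I understand the cluster-proportional-autoscaler documentation, it creates this ConfigMap from --default-params when it does not exist yet, and in linear mode the replica count is the larger of ceil(cores / coresPerReplica) and ceil(nodes / nodesPerReplica), clamped to min and max when those are set. The sketch below shows what such a ConfigMap could look like. The min and max values are my own illustrative assumptions and are not part of the manifest in this article.

```yaml
# Sketch of the ConfigMap read by the cluster-proportional-autoscaler.
# The name and namespace match --configmap and --namespace in the manifest.
apiVersion: v1
kind: ConfigMap
metadata:
  name: overprovisioning-autoscaler
  namespace: overprovisioning
data:
  # "linear" selects linear mode; the value is a JSON document.
  linear: |-
    {
      "coresPerReplica": 1,
      "min": 1,
      "max": 20
    }
```

With coresPerReplica set to 1 you get one placeholder pod per CPU core in the cluster, so a larger value means fewer placeholder pods and less reserved capacity.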

So go ahead and apply the YAML to your cluster. You should then see some pods being created, followed by a new node spinning up. As mentioned above, you may need to tweak the replica count and CPU requests, but once you have that you are good to go.

To run this on multiple node pools, deploy a copy of the overprovisioning deployment for each pool and give each copy its own name. Just add the node pool name to the beginning or end. An example for a Windows node pool can be found at the bottom of this article. You will also notice that example has an image tag; that image works with Windows.

Before you deploy the manifest, you will need a new namespace to keep things tidy. Use the following command to create it.

kubectl create namespace overprovisioning

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1
globalDefault: false
description: "Priority class used by overprovisioning."

---

kind: ServiceAccount
apiVersion: v1
metadata:
  name: cluster-proportional-autoscaler-overprovision
  namespace: overprovisioning

---

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cluster-proportional-autoscaler-overprovision
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["list", "watch"]
  - apiGroups: [""]
    resources: ["replicationcontrollers/scale"]
    verbs: ["get", "update"]
  - apiGroups: ["extensions","apps"]
    resources: ["deployments/scale", "replicasets/scale"]
    verbs: ["get", "update"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "create"]

---

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cluster-proportional-autoscaler-overprovision
subjects:
  - kind: ServiceAccount
    name: cluster-proportional-autoscaler-overprovision
    namespace: overprovisioning
roleRef:
  kind: ClusterRole
  name: cluster-proportional-autoscaler-overprovision
  apiGroup: rbac.authorization.k8s.io

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
  namespace: overprovisioning
spec:
  replicas: 8
  selector:
    matchLabels:
      run: overprovisioning
  template:
    metadata:
      labels:
        run: overprovisioning
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: agentpool
                operator: In
                values:
                - nodepoolname  # replace with the name of your node pool
      priorityClassName: overprovisioning
      containers:
      - name: reserve-resources
        image: k8s.gcr.io/pause
        resources:
          requests:
            cpu: "200m"

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning-autoscaler
  namespace: overprovisioning
  labels:
    app: overprovisioning-autoscaler
spec:
  selector:
    matchLabels:
      app: overprovisioning-autoscaler
  replicas: 1
  template:
    metadata:
      labels:
        app: overprovisioning-autoscaler
    spec:
      containers:
        - image: k8s.gcr.io/cluster-proportional-autoscaler-amd64:1.1.2
          name: autoscaler
          command:
            - ./cluster-proportional-autoscaler
            - --namespace=overprovisioning
            - --configmap=overprovisioning-autoscaler
            - --default-params={"linear":{"coresPerReplica":1}}
            - --target=deployment/overprovisioning
            - --logtostderr=true
            - --v=2
      serviceAccountName: cluster-proportional-autoscaler-overprovision

Windows node pool manifest

Just change the node pool name and set your replica count and CPU requests to your needs.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning-winnp1
  namespace: overprovisioning
spec:
  replicas: 8
  selector:
    matchLabels:
      run: overprovisioning-winnp1
  template:
    metadata:
      labels:
        run: overprovisioning-winnp1
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: agentpool
                operator: In
                values:
                - windowsnodepoolname  # replace with the name of your Windows node pool
      priorityClassName: overprovisioning
      containers:
      - name: reserve-resources
        image: k8s.gcr.io/pause:3.4.1
        resources:
          requests:
            cpu: "200m"

Thanks for reading and I hope you found this helpful. If you have any comments or suggestions on how to improve this let me know.


Published by Pixel Robots. on March 12, 2021
