Exploring Azure Kubernetes Service's Node Autoprovision: A Deep Dive into the Latest Public Preview Feature

Reading Time: 8 minutes

After finally finding the time to delve into Azure Kubernetes Service’s (AKS) latest public preview feature, Node Autoprovision (NAP), I’m excited to share my insights with you. This feature, drawing from the open-source AWS tool Karpenter, promises to significantly streamline and optimize node management in AKS.

Understanding Node Autoprovision in AKS

Node Autoprovision (NAP) in AKS is a game-changer for managing node pools. As your workloads expand and diversify in complexity, needing various CPU, memory, and capability configurations, managing your VM configurations can become quite daunting. This is where NAP steps in.

NAP dynamically decides the optimal VM configuration for your pending pod resource requirements, ensuring that your workloads run efficiently and cost-effectively. This feature is rooted in the open-source Karpenter project, and its implementation in AKS is also open-source.

Getting Started with NAP

Before diving into NAP, there are a few prerequisites:

Azure Subscription: Make sure you have an active Azure subscription.
Azure CLI
aks-preview Azure CLI Extension 0.5.170 or newer: Install this extension, or update it if it is already installed, to access the NAP preview features:

az extension add --name aks-preview
az extension update --name aks-preview

Register NodeAutoProvisioningPreview Feature Flag:

# Register the feature flag
az feature register --namespace "Microsoft.ContainerService" --name "NodeAutoProvisioningPreview"

# Verify registration status
az feature show --namespace "Microsoft.ContainerService" --name "NodeAutoProvisioningPreview"

# Refresh the registration of the resource provider once the above command shows Registered
az provider register --namespace Microsoft.ContainerService
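
Registration can take a few minutes, so you may prefer not to re-run the show command by hand. Here is a minimal sketch of a polling loop for that. The function name wait_for_feature and the 15-second default interval are my own choices, and the properties.state query path assumes the usual output shape of az feature show, so treat it as a starting point rather than an official procedure.

```shell
# Hypothetical helper: poll a preview feature flag until it reports Registered,
# then refresh the resource provider registration.
wait_for_feature() {
  local namespace="$1" feature="$2" interval="${3:-15}" state
  while true; do
    # Assumption: the registration state is exposed under properties.state.
    state="$(az feature show --namespace "$namespace" --name "$feature" \
      --query "properties.state" --output tsv)"
    echo "Feature $feature is currently: $state"
    if [ "$state" = "Registered" ]; then
      break
    fi
    sleep "$interval"
  done
  # Propagate the registration to the resource provider.
  az provider register --namespace "$namespace"
}
```

You would call it as wait_for_feature "Microsoft.ContainerService" "NodeAutoProvisioningPreview". Note that the loop has no timeout, so interrupt it if the state never changes.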

Current Limitations

Currently, Windows and Azure Linux node pools are not supported, and kubelet configuration via node pool configuration is not supported. NAP is only available for new clusters.

Enabling Node Autoprovision

Ready to get started with Node Autoprovision in AKS? It’s simpler than you might think. Here’s a quick guide to help you set up a new cluster with Node Autoprovision enabled. Remember, this feature is all about making your life easier when it comes to managing node pools!

Open Your Command Line Interface: First things first, fire up your CLI. This is where all the magic happens.
Create a New AKS Cluster: You’re going to use the az aks create command. This is the heart of the operation. Make sure to include the --node-provisioning-mode parameter and set it to "Auto". This little switch is what tells AKS to use Node Autoprovisioning for your cluster. Node Autoprovisioning also needs a specific networking configuration: Azure CNI in overlay mode with the Cilium dataplane. Here are the commands you’ll need:

az group create --name rg-aks-nap-uks --location uksouth

az aks create \
  --name aks-nap-uks \
  --resource-group rg-aks-nap-uks \
  --node-provisioning-mode Auto \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --network-dataplane cilium \
  --nodepool-taints CriticalAddonsOnly=true:NoSchedule

You may notice I have tainted the system node pool to ensure only critical workloads can run on it. This is a best practice, but also makes our testing easier.

Execute and Await the Magic: Hit enter, and let Azure do its thing. Creating a cluster might take a few minutes, so grab a cup of coffee and watch as your new, Node Autoprovisioning-enabled AKS cluster comes to life.

And there you have it! With these simple steps, you’ve successfully enabled Node Autoprovision in your AKS cluster. This setup is going to streamline how your cluster handles workloads, automatically scaling and managing nodes based on the needs of your applications. No more manual tweaking of node pools, just efficient, automated scaling that makes managing AKS clusters a breeze. But wait, there is more. There always is.

Let’s test it out

For this, we are just going to use the Azure Vote application.

kubectl apply -f <(cat <<EOF
apiVersion: v1
kind: List
items:
- apiVersion: v1
  kind: Namespace
  metadata:
    name: azure-vote
  spec:
    finalizers:
      - kubernetes
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: azure-vote-back
    namespace: azure-vote
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: azure-vote-back
    template:
      metadata:
        labels:
          app: azure-vote-back
      spec:
        nodeSelector:
          kubernetes.io/os: linux
        containers:
          - name: azure-vote-back
            image: mcr.microsoft.com/oss/bitnami/redis:6.0.8
            env:
              - name: ALLOW_EMPTY_PASSWORD
                value: 'yes'
            resources:
              requests:
                cpu: 100m
                memory: 128Mi
              limits:
                cpu: 250m
                memory: 256Mi
            ports:
              - containerPort: 6379
                name: redis
- apiVersion: v1
  kind: Service
  metadata:
    name: azure-vote-back
    namespace: azure-vote
  spec:
    ports:
      - port: 6379
    selector:
      app: azure-vote-back
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: azure-vote-front
    namespace: azure-vote
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: azure-vote-front
    template:
      metadata:
        labels:
          app: azure-vote-front
      spec:
        nodeSelector:
          kubernetes.io/os: linux
        containers:
          - name: azure-vote-front
            image: mcr.microsoft.com/azuredocs/azure-vote-front:v1
            resources:
              requests:
                cpu: 100m
                memory: 128Mi
              limits:
                cpu: 250m
                memory: 256Mi
            ports:
              - containerPort: 80
            env:
              - name: REDIS
                value: azure-vote-back
- apiVersion: v1
  kind: Service
  metadata:
    name: azure-vote-front
    namespace: azure-vote
  spec:
    type: LoadBalancer
    ports:
      - port: 80
    selector:
      app: azure-vote-front
EOF
)

After a short while, a new node will be spun up for the workload to run on. This new node is managed by Karpenter. If you run kubectl get nodes -o wide, you will notice the new node, and it may even have a different kernel version. If you describe the node, you will see a few labels relating to Karpenter.

Another way to see the new Karpenter nodes being created is to watch the Kubernetes events. You can use the following command for that:

kubectl get events -A --field-selector source=karpenter -w

Taking a deeper look

When diving into the realm of Node Autoprovision in AKS, one of the coolest features you’ll encounter is the new Kubernetes resource kind, NodePool. This is where the customization and optimization of your Kubernetes cluster really shine. Let’s break down how you can leverage this to tailor your cluster to your specific needs.

Starting with VM SKUs: Think of VM SKUs as your building blocks. Node Autoprovision cleverly uses these as a starting point to figure out the best possible match for your workloads, especially those that are waiting in the wings (a.k.a., in a pending state). The goal here is to ensure that your workloads are not just up and running, but doing so on the most suited VM type – balancing performance, cost, and resource availability.
Customizing Your Initial Pool: This is where you get to play architect. You have the power to dictate what VM SKUs make up your initial node pool. Whether it’s specific SKU families or particular VM types, this is your opportunity to mold the pool to fit your exact requirements. And let’s not forget about setting the maximum resources – crucial for keeping things efficient and within your desired scope.
Reserved Instances? No Problem: Got some reserved VM SKUs in your arsenal? Great! You can choose to initiate your node pool exclusively with these reserved instances. This is particularly handy for optimizing costs and making the most out of your existing investments.
Multiple Node Pool Definitions: Variety is the spice of life, and it’s no different in AKS. You can define multiple node pools within your cluster, each tailored to different needs or workloads. And while AKS rolls out a default node pool for you, remember that this isn’t set in stone. You have the flexibility to tweak and modify this default to better suit your requirements.

Let’s take a look at this default NodePool definition:

kubectl get NodePool default -o yaml

In the output, under requirements, you will notice some keys and values. This is where you set the configuration you want for your node pool. Below is a list of the possible keys.

VM SKU Family (karpenter.azure.com/sku-family): Selects the VM SKU family, like D, F, L, etc., setting a broad category for the VM.
Explicit SKU Name (karpenter.azure.com/sku-name): Specifies a precise VM SKU, such as “Standard_A1_v2.”
SKU Version (karpenter.azure.com/sku-version): Determines the SKU version, noted without the “v”, like 1 or 2.
Capacity Type (karpenter.sh/capacity-type): Chooses between Spot and On-Demand VMs for cost-effectiveness and availability.
CPU Specifications (karpenter.azure.com/sku-cpu): Defines the number of CPUs in the VM, for instance, 16.
Memory Specifications (karpenter.azure.com/sku-memory): Sets the memory size for the VM in MiB, like 131072.
GPU Name (karpenter.azure.com/sku-gpu-name): Specifies the GPU name, such as A100, for GPU-intensive tasks.
GPU Manufacturer (karpenter.azure.com/sku-gpu-manufacturer): Selects the GPU manufacturer, like Nvidia, providing a specific brand for GPU requirements.
GPU Count (karpenter.azure.com/sku-gpu-count): Determines the number of GPUs per VM, for example, 2.
Networking Acceleration (karpenter.azure.com/sku-networking-accelerated): Indicates whether the VM has accelerated networking capabilities, options being true or false.
Premium IO Storage Support (karpenter.azure.com/sku-storage-premium-capable): Specifies if the VM supports Premium IO storage, choices being true or false.
Ephemeral OS Disk Size (karpenter.azure.com/sku-storage-ephemeralos-maxsize): Sets a size limit for the Ephemeral OS disk in GB, like 70.
Availability Zone (topology.kubernetes.io/zone): Selects the Availability Zone(s) for the VM, such as [uksouth-1, uksouth-2, uksouth-3].
Operating System (kubernetes.io/os): Chooses the operating system, with Linux being the option during the preview phase.
CPU Architecture (kubernetes.io/arch): Specifies the CPU architecture, options being [amd64, arm64], important for compatibility with specific workloads.
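
To make the list above a bit more concrete, here is a rough sketch of an additional NodePool that combines a few of these keys. The pool name d-family-ondemand and the file name are made up for illustration, and the karpenter.sh/v1beta1 API version is an assumption based on the preview, so compare it with the output of kubectl get NodePool default -o yaml on your own cluster before using it.

```shell
# Write an illustrative NodePool manifest to a local file.
# Assumption: the API version below matches what your cluster reports
# for the default NodePool.
cat > nodepool-d-family.yaml <<'EOF'
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: d-family-ondemand   # hypothetical name
spec:
  template:
    spec:
      nodeClassRef:
        name: default       # the node class AKS deploys alongside the default pool
      requirements:
        - key: karpenter.azure.com/sku-family
          operator: In
          values: ["D"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
EOF
```

Once you are happy with the file, kubectl apply -f nodepool-d-family.yaml would create the pool. Each requirement narrows the set of VM SKUs NAP may pick from, so start broad and tighten only where you need to.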

Azure Quota and Node Pool limits

NAP is designed to intelligently align your workload scheduling with the available Azure resources. But there’s more to it – you have the power to set specific limits to ensure optimal resource usage. This can be done by adding the following to the node pool manifest we just looked at.

  # Resource limits constrain the total size of the cluster.
  # Limits prevent Karpenter from creating new instances once the limit is exceeded.
  limits:
    cpu: "1000"
    memory: 1000Gi

Multiple Node Pools and Weighting

Because you can have multiple node pools configured with different settings, including some that use reserved instances, it’s useful to be able to set which node pool gets used first. You can do this by adding the following to the node pool manifest.

  # Priority given to the node pool when the scheduler considers which to select. Higher weights indicate higher priority when comparing node pools.
  # Specifying no weight is equivalent to specifying a weight of 0.
  weight: 10
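
Here is how limits and weight might sit together in one manifest. This is only a sketch: the pool name reserved-first, the limit values, and the Standard_D8s_v5 SKU are placeholders for whatever you actually hold reservations for, and the API version is assumed to match the default NodePool on a preview cluster.

```shell
# Write an illustrative high-priority NodePool with resource limits.
# All names and values are placeholders; adjust them to your reservations.
cat > nodepool-reserved.yaml <<'EOF'
apiVersion: karpenter.sh/v1beta1   # assumed; check your default NodePool
kind: NodePool
metadata:
  name: reserved-first             # hypothetical name
spec:
  # Higher weight means this pool is considered before lower-weight pools.
  weight: 10
  # Stop provisioning from this pool once these totals are reached.
  limits:
    cpu: "64"
    memory: 256Gi
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: karpenter.azure.com/sku-name
          operator: In
          values: ["Standard_D8s_v5"]   # stand-in for a reserved SKU
EOF
```

The idea is that the limits are sized to match the reservation, so that once the reserved capacity is used up NAP can fall back to a lower-weight pool such as default.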

AKS and Node Image Updates

As you know, you have to keep your cluster updated, and that is still important with NAP. But it is a bit easier, as NAP manages the Kubernetes version and VM OS disk updates (node image) for you.

If, for example, you update your AKS control plane from version 1.27 to 1.28, the NAP nodes will automatically get updated to the new version, 1.28.

NAP nodes are also automatically updated to the newest node image version. You normally get a new node image once a week. If you want to, you can pin the nodes to a certain node image version. To do this, edit the aksnodeclass Kubernetes object. By default, AKS deploys one called default. In this manifest you need to add the imageVersion key and value. You can edit the default manifest with the following command:

kubectl edit aksnodeclass default

To find the latest image version number, check the AKS release notes. Because only Ubuntu 22.04 is supported, the imageVersion is just the date portion of the node image name; for example, “AKSUbuntu-2204-202311.07.0” would be “202311.07.0”.
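
If you are scripting this, the date portion can be pulled out with plain shell parameter expansion. The helper name image_version_from_name is hypothetical, and the sketch assumes the node image name always ends with a hyphen followed by the version, as in the example above.

```shell
# Hypothetical helper: derive the imageVersion value from a node image name.
image_version_from_name() {
  local node_image="$1"
  # Strip everything up to and including the last hyphen.
  echo "${node_image##*-}"
}

image_version_from_name "AKSUbuntu-2204-202311.07.0"   # prints 202311.07.0
```

The printed value is what you would put against the imageVersion key in the aksnodeclass manifest.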

If you remove the imageVersion line, the node pool will be updated to the latest node image version, so be careful.

Smart Scaling Down with Node Disruption in AKS

Unlike the normal behaviour of Kubernetes, which will not actively reschedule your workloads to scale down the cluster, NAP can and probably will. This can be a good thing, as it reduces costs and ensures you are using the right VM size for your workload. If that does not suit you, you can configure this behaviour in your NodePool manifest:

  disruption:
    # Describes which types of Nodes NAP should consider for consolidation. Valid values:
    # 'WhenUnderutilized': NAP will consider all nodes for consolidation and attempt to remove or replace Nodes when it discovers that the Node is underutilized and could be changed to reduce cost
    # 'WhenEmpty': NAP will only consider nodes for consolidation that contain no workload pods
    consolidationPolicy: WhenEmpty
    # The amount of time NAP should wait after discovering a consolidation decision
    # This value can currently only be set when the consolidationPolicy is 'WhenEmpty'
    # You can choose to disable consolidation entirely by setting the string value 'Never'
    consolidateAfter: 30s

Final Thoughts

Azure’s Node Autoprovision feature, based on Karpenter, brings a new level of efficiency and automation to AKS. Its ability to dynamically provision nodes based on workload requirements is a significant step towards more cost-effective and efficient Kubernetes operations.

Published by Pixel Robots. on December 19, 2023 January 11, 2024
