After finally finding the time to dig into Azure Kubernetes Service’s (AKS) latest public preview feature, Node Autoprovisioning (NAP), I want to share what I found. The feature builds on Karpenter, the open-source node provisioner originally created by AWS, and promises to streamline and optimize node management in AKS.

Understanding Node Autoprovision in AKS

Node Autoprovisioning (NAP) in AKS changes how you manage node pools. As your workloads grow and diversify, needing different CPU, memory, and capability configurations, keeping track of the matching VM configurations can become quite daunting. This is where NAP steps in.

NAP dynamically decides the optimal VM configuration for your pending pod resource requirements, ensuring that your workloads run efficiently and cost-effectively. This feature is rooted in the open-source Karpenter project, and its implementation in AKS is also open-source.

Getting Started with NAP

Before diving into NAP, there are a few prerequisites:

  1. Azure Subscription: Make sure you have an active Azure subscription.
  2. Azure CLI
  3. aks-preview Azure CLI Extension 0.5.170 or newer: Install and update this extension to get access to NAP.
  4. NodeAutoProvisioningPreview Feature Flag: Register this preview flag on your subscription.
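
The last two prerequisites can be sketched as a small shell function. The az commands below follow the usual Azure CLI preview workflow rather than anything preserved in this post, so treat them as an assumption and check them against the current AKS documentation. The function name is my own.

```shell
# Sketch only: nothing runs until you call the function.
# Requires a logged-in Azure CLI (az login).
register_nap_preview() {
  # Install the aks-preview extension, or upgrade it if already present.
  az extension add --name aks-preview --upgrade

  # Register the preview feature flag on the current subscription.
  az feature register \
    --namespace "Microsoft.ContainerService" \
    --name "NodeAutoProvisioningPreview"

  # Registration can take a few minutes; this prints the current state.
  az feature show \
    --namespace "Microsoft.ContainerService" \
    --name "NodeAutoProvisioningPreview" \
    --query "properties.state" --output tsv

  # Once the state reads "Registered", refresh the resource provider.
  az provider register --namespace "Microsoft.ContainerService"
}

# register_nap_preview
```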

Current Limitations

Currently, Windows and Azure Linux node pools are not supported, and kubelet configuration via node pool configuration is not supported. NAP is only available for new clusters.

Enabling Node Autoprovision

Ready to get started with Node Autoprovision in AKS? It’s simpler than you might think. Here’s a quick guide to help you set up a new cluster with Node Autoprovision enabled. Remember, this feature is all about making your life easier when it comes to managing node pools!

  1. Open Your Command Line Interface: First things first, fire up your CLI. This is where all the magic happens.
  2. Create a New AKS Cluster: You’re going to use the az aks create command. Include the --node-provisioning-mode parameter and set it to Auto; this switch is what tells AKS to use Node Autoprovisioning for your cluster. NAP also needs a specific networking configuration: Azure CNI overlay networking with the Cilium network dataplane. Here’s the command you’ll need:
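
A minimal sketch of such a command, assuming hypothetical resource names and the region uksouth. The CriticalAddonsOnly taint is my assumption about how the system node pool is kept free for critical workloads; it is the taint commonly used for this in AKS.

```shell
# Hypothetical names; change them to suit your environment.
RESOURCE_GROUP="rg-nap-demo"
CLUSTER_NAME="aks-nap-demo"
LOCATION="uksouth"

create_nap_cluster() {
  az group create --name "$RESOURCE_GROUP" --location "$LOCATION"

  # --node-provisioning-mode Auto switches on Node Autoprovisioning.
  # The three --network-* flags select Azure CNI overlay with Cilium.
  # --nodepool-taints (assumed value) keeps application pods off the
  # system node pool.
  az aks create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$CLUSTER_NAME" \
    --node-provisioning-mode Auto \
    --network-plugin azure \
    --network-plugin-mode overlay \
    --network-dataplane cilium \
    --nodepool-taints CriticalAddonsOnly=true:NoSchedule \
    --generate-ssh-keys

  # Fetch credentials so kubectl talks to the new cluster.
  az aks get-credentials --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME"
}

# create_nap_cluster
```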

You may notice I have tainted the system node pool to ensure only critical workloads can run on it. This is a best practice, but also makes our testing easier.

  3. Execute and Await the Magic: Hit enter, and let Azure do its thing. Creating a cluster might take a few minutes, so grab a cup of coffee and watch as your new, Node Autoprovisioning-enabled AKS cluster comes to life.

And there you have it! With these simple steps, you’ve enabled Node Autoprovisioning in your AKS cluster. This setup streamlines how your cluster handles workloads, automatically scaling and managing nodes based on the needs of your applications. No more manual tweaking of node pools, just automated scaling that makes managing AKS clusters a breeze. But wait, there is more; there always is.

Let’s test it out

For this, we are just going to use the Azure Vote application.
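
A sketch of the deployment, assuming the all-in-one manifest from the Azure-Samples voting app repository on GitHub is still available at the URL below. If it has moved, any small deployment with CPU and memory requests will trigger NAP in the same way.

```shell
# Assumed location of the Azure Vote sample manifest.
VOTE_MANIFEST_URL="https://raw.githubusercontent.com/Azure-Samples/azure-voting-app-redis/master/azure-vote-all-in-one-redis.yaml"

deploy_vote_app() {
  kubectl apply -f "$VOTE_MANIFEST_URL"

  # With a tainted system pool, the pods stay Pending until NAP has
  # provisioned a node for them.
  kubectl get pods --watch
}

# deploy_vote_app
```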

After a short while, a new node will be spun up for the workload to run on. This new node is managed by Karpenter. If you run kubectl get nodes -o wide you will notice the new node, and it may even have a different kernel version. If you describe the node, you will see a few labels relating to Karpenter.

Another way to see that new Karpenter nodes are being created is to look at the Kubernetes events. You can use the following command for that:
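
Both checks can be sketched as two helper functions. The function names are mine, and the karpenter.sh/nodepool label key and the source=karpenter field selector are assumptions based on upstream Karpenter, so confirm them with kubectl describe node on your own cluster.

```shell
inspect_nap_nodes() {
  # New nodes show up next to the system pool; -o wide adds the kernel version.
  kubectl get nodes -o wide

  # List only nodes carrying the (assumed) Karpenter node pool label.
  kubectl get nodes -l karpenter.sh/nodepool --show-labels
}

watch_karpenter_events() {
  # Stream events from all namespaces whose source is Karpenter.
  kubectl get events -A --field-selector source=karpenter -w
}

# inspect_nap_nodes
# watch_karpenter_events
```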

Taking a deeper look

When you dig into Node Autoprovisioning in AKS, one of the coolest things you’ll encounter is the new Kubernetes resource kind NodePool, a custom resource that comes from Karpenter. This is where the customization and optimization of your cluster really shine. Let’s break down how you can use it to tailor your cluster to your specific needs.

  1. Starting with VM SKUs: Think of VM SKUs as your building blocks. Node Autoprovision cleverly uses these as a starting point to figure out the best possible match for your workloads, especially those that are waiting in the wings (a.k.a., in a pending state). The goal here is to ensure that your workloads are not just up and running, but doing so on the most suited VM type – balancing performance, cost, and resource availability.
  2. Customizing Your Initial Pool: This is where you get to play architect. You have the power to dictate what VM SKUs make up your initial node pool. Whether it’s specific SKU families or particular VM types, this is your opportunity to mold the pool to fit your exact requirements. And let’s not forget about setting the maximum resources – crucial for keeping things efficient and within your desired scope.
  3. Reserved Instances? No Problem: Got some reserved VM SKUs in your arsenal? Great! You can choose to initiate your node pool exclusively with these reserved instances. This is particularly handy for optimizing costs and making the most out of your existing investments.
  4. Multiple Node Pool Definitions: Variety is the spice of life, and it’s no different in AKS. You can define multiple node pools within your cluster, each tailored to different needs or workloads. And while AKS rolls out a default node pool for you, remember that this isn’t set in stone. You have the flexibility to tweak and modify this default to better suit your requirements.

Let’s take a look at this default NodePool definition:
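
The live definition can be printed with kubectl get nodepool default -o yaml. For reference, the sketch below writes out what the preview default is assumed to look like; the apiVersion and the exact values are assumptions and may differ on your cluster.

```shell
# Print the live definition from the cluster:
#   kubectl get nodepool default -o yaml
#
# Local reference copy of the assumed preview default.
cat > default-nodepool.yaml <<'EOF'
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: Never
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.azure.com/sku-family
          operator: In
          values: ["D"]
EOF
```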

You will notice that under requirements there are some keys and values. This is where you set the configuration you want for your node pool. Below is a list of the possible entries.

  1. VM SKU Family (karpenter.azure.com/sku-family): Selects the VM SKU family, like D, F, L, etc., setting a broad category for the VM.
  2. Explicit SKU Name (karpenter.azure.com/sku-name): Specifies a precise VM SKU, such as “Standard_A1_v2.”
  3. SKU Version (karpenter.azure.com/sku-version): Determines the SKU version, noted without the “v”, like 1 or 2.
  4. Capacity Type (karpenter.sh/capacity-type): Chooses between Spot and On-Demand VMs (values spot and on-demand), trading cost against availability.
  5. CPU Specifications (karpenter.azure.com/sku-cpu): Defines the number of CPUs in the VM, for instance, 16.
  6. Memory Specifications (karpenter.azure.com/sku-memory): Sets the memory size for the VM in MiB, like 131072.
  7. GPU Name (karpenter.azure.com/sku-gpu-name): Specifies the GPU name, such as A100, for GPU-intensive tasks.
  8. GPU Manufacturer (karpenter.azure.com/sku-gpu-manufacturer): Selects the GPU manufacturer, like Nvidia, providing a specific brand for GPU requirements.
  9. GPU Count (karpenter.azure.com/sku-gpu-count): Determines the number of GPUs per VM, for example, 2.
  10. Networking Acceleration (karpenter.azure.com/sku-networking-accelerated): Indicates whether the VM has accelerated networking capabilities, options being true or false.
  11. Premium IO Storage Support (karpenter.azure.com/sku-storage-premium-capable): Specifies if the VM supports Premium IO storage, choices being true or false.
  12. Ephemeral OS Disk Size (karpenter.azure.com/sku-storage-ephemeralos-maxsize): Sets a size limit for the Ephemeral OS disk in GB, like 70.
  13. Availability Zone (topology.kubernetes.io/zone): Selects the Availability Zone(s) for the VM, such as [uksouth-1, uksouth-2, uksouth-3].
  14. Operating System (kubernetes.io/os): Chooses the operating system, with Linux being the option during the preview phase.
  15. CPU Architecture (kubernetes.io/arch): Specifies the CPU architecture, options being [amd64, arm64], important for compatibility with specific workloads.
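
To show how these selectors combine, here is a sketch of an additional NodePool that only allows Spot VMs from the D and F families with fewer than 17 vCPUs in three uksouth zones. The pool name, the apiVersion, and the chosen values are all assumptions for illustration.

```shell
cat > nodepool-spot.yaml <<'EOF'
apiVersion: karpenter.sh/v1beta1   # assumed preview API version
kind: NodePool
metadata:
  name: spot-small                 # hypothetical name
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: karpenter.azure.com/sku-family
          operator: In
          values: ["D", "F"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.azure.com/sku-cpu
          operator: Lt
          values: ["17"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["uksouth-1", "uksouth-2", "uksouth-3"]
EOF

# kubectl apply -f nodepool-spot.yaml
```

Each requirement narrows the set of VM SKUs NAP may pick from, so the more entries you add, the fewer candidates remain.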

Azure Quota and Node Pool limits

NAP tries to schedule your workloads within the Azure quota and resources available to you. On top of that, you can cap the total resources a node pool may consume so that it never grows beyond what you intend. This can be done by adding a limits section to the node pool manifest we just looked at.
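
A sketch of such a limit, written as a merge patch against the default NodePool. The numbers are placeholders, and I am assuming the limits sit under spec.limits as they do in upstream Karpenter.

```shell
cat > nodepool-limits-patch.yaml <<'EOF'
spec:
  limits:
    cpu: "1000"       # total vCPUs across all nodes in this pool
    memory: 1000Gi    # total memory across all nodes in this pool
EOF

# kubectl patch nodepool default --type merge --patch-file nodepool-limits-patch.yaml
```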

Multiple Node Pools and Weighting

Because you can have multiple node pools configured with different settings, including some that use reserved instances, it’s good to be able to control which node pool gets used first. This can be done by adding a weight to the node pool manifest; pools with a higher weight are tried first.
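
A sketch of setting a weight, assuming a hypothetical NodePool named reserved that should win over the default pool. The field name spec.weight follows upstream Karpenter.

```shell
cat > nodepool-weight-patch.yaml <<'EOF'
spec:
  weight: 10   # higher weight means this pool is considered first
EOF

# "reserved" is a hypothetical NodePool name.
# kubectl patch nodepool reserved --type merge --patch-file nodepool-weight-patch.yaml
```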

AKS and Node Image Updates

As you know, you have to keep your cluster updated, and that is still important with NAP. But it is a bit easier, as NAP manages the Kubernetes version and VM OS disk updates (node image) for you.

If, for example, you update your AKS control plane from version 1.27 to 1.28, the NAP nodes will automatically get updated to the new version, 1.28.

NAP nodes are also automatically updated to the newest node image version. You normally get a new node image once a week. If you want to, you can pin the nodes to a certain node image version. For this, you have to edit the AKSNodeClass Kubernetes object. By default, AKS deploys one called default. In this manifest, you need to add the imageVersion key and value. You can edit the default manifest with kubectl edit aksnodeclass default.

To find out the latest image version number, you can check the AKS release notes. As only Ubuntu 22.04 is supported, the imageVersion is the date portion of the node image name; for example, “AKSUbuntu-2204-202311.07.0” would be “202311.07.0”.
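
As a non-interactive alternative to editing the object by hand, pinning can be sketched as a merge patch. I am assuming imageVersion sits directly under spec of the AKSNodeClass; the version used is the example from above.

```shell
cat > aksnodeclass-pin-patch.yaml <<'EOF'
spec:
  imageVersion: "202311.07.0"
EOF

# kubectl patch aksnodeclass default --type merge --patch-file aksnodeclass-pin-patch.yaml
```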

If you remove the imageVersion line, the node pool will be updated to the latest node image version, so be careful.

Smart Scaling Down with Node Disruption in AKS

Unlike the default behaviour of Kubernetes, which will not actively reschedule your workloads to scale down the cluster, NAP can and probably will. This is a good thing for reducing costs and making sure you are using the right VM size for your workload. If you need to, you can tune this behaviour in your NodePool manifest:
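
A sketch of a more conservative disruption setting, again as a merge patch. The field names follow the upstream Karpenter v1beta1 NodePool, which I am assuming AKS uses here; in that API version consolidateAfter is only valid together with WhenEmpty.

```shell
cat > nodepool-disruption-patch.yaml <<'EOF'
spec:
  disruption:
    consolidationPolicy: WhenEmpty   # only remove nodes with no workload pods
    consolidateAfter: 30s            # wait this long once a node is empty
    expireAfter: Never               # never recycle nodes purely because of age
EOF

# kubectl patch nodepool default --type merge --patch-file nodepool-disruption-patch.yaml
```

Switching consolidationPolicy back to WhenUnderutilized lets NAP also move workloads off partially used nodes, which saves more money at the cost of more pod restarts.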

Final Thoughts

Azure’s Node Autoprovision feature, based on Karpenter, brings a new level of efficiency and automation to AKS. Its ability to dynamically provision nodes based on workload requirements is a significant step towards more cost-effective and efficient Kubernetes operations.

Pixel Robots.

I’m Richard Hooper aka Pixel Robots. I started this blog in 2016 for a couple reasons. The first reason was basically just a place for me to store my step by step guides, troubleshooting guides and just plain ideas about being a sysadmin. The second reason was to share what I have learned and found out with other people like me. Hopefully, you can find something useful on the site.
