Intro
I recently ran into capacity issues in Azure and saw firsthand how AKS VMSS node pools behave when you use deallocated nodes to speed up start times. I’ll walk through the three node-pool zone models documented by Microsoft, then share what really happened in my setup and how I worked around it. In this post:
- The three AKS VMSS node-pool deployment methods
- How deallocated nodes restart under each method
- A practical, under-the-radar pattern to improve reliability
Zone-spanning node pools (single VMSS across all zones)
Azure lets you spread a single node pool across multiple availability zones by specifying all your desired zones with `--zones`. AKS automatically balances the number of nodes in each zone.
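A minimal sketch of this setup (`myResourceGroup` and `myAKSCluster` are placeholder names, not from my environment):

```bash
# Create one VMSS-backed pool that AKS balances across zones 1-3
# (myResourceGroup / myAKSCluster are placeholder names)
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name spanpool \
  --node-count 3 \
  --zones 1 2 3
```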
Information
Nodes are deployed and balanced across every zone you list in the `--zones` parameter.
Warning
If a zonal outage occurs, nodes within the affected zone might be impacted even though nodes in other zones stay healthy. And when you use Deallocate mode, deallocated nodes restart only in their original zone, so they can stay offline for as long as that zone is short on capacity.
Real story
I ran a zone-spanning pool with `--scale-down-mode Deallocate`. When Azure was capacity-constrained in one zone, those deallocated nodes never came back up, and AKS kept retrying allocation in that same exhausted zone. My jobs queued until capacity finally returned.
Zone-aligned node pools (VMSS pinned to specific zone[s])
You can add separate node pools, each pinned to a single zone, by creating one pool per zone and passing `--zones <zone-number>` for each.
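A sketch of that pattern, again with placeholder resource and cluster names:

```bash
# One zone-aligned pool per availability zone (placeholder names)
for zone in 1 2 3; do
  az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name "zonepool${zone}" \
    --node-count 1 \
    --zones "${zone}"
done
```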
Information
Each node pool handles only its assigned zone, giving you precise control over placement and latency.
Warning
Deallocated nodes still restart only in their pinned zone. If that zone hits capacity or suffers an outage, your pool can’t recover until the zone heals.
Real story
We switched to three zone-aligned pools, thinking AKS would pick a healthy zone to spin up deallocated nodes. It didn’t. Each pool stayed in its own zone, and scaling failed when any one zone ran out of capacity.
Regional node pools (no availability zones)
When you omit the `--zones` parameter (or set it to `null` or an empty list), AKS creates a regional VMSS. Instances show up with a zone label of `0`.
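A sketch of a regional pool, with placeholder names; omitting `--zones` is all it takes:

```bash
# No --zones flag, so the backing VMSS is regional (placeholder names)
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name regpool \
  --node-count 2

# Regional nodes should report "0" in the zone label column
kubectl get nodes -L topology.kubernetes.io/zone
```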
Information
Instances are regional and can be implicitly placed in any zone within the region, though there’s no guarantee of even spread.
Warning
In a full zonal outage, any or all instances might be affected because they aren’t tied to a specific zone.
Real story
My jobs were stateless, single-replica workloads. I removed zone assignments so the pool became regional. When deallocated nodes restarted, Azure placed them in whichever zone had capacity. Job reliability immediately improved.
Summary
| Model | Zone resilience | Deallocated node restart behavior | Good for |
|---|---|---|---|
| Zone-spanning | Yes (auto-spread) | Restarts in the same zone; can stall if that zone is full | Stateless multi-zone workloads |
| Zone-aligned | Yes (fixed zone pins) | Restarts only in the pinned zone; brittle if that zone is busy | Strict zone-isolation needs |
| Regional | Regional (no pinning) | Restarts anywhere in the region (best odds) | Stateless jobs and burst workloads |
Why deallocate mode matters
When you set `--scale-down-mode Deallocate`, nodes are stopped but not deleted. That preserves cached disks, avoids repeated image pulls, and gives much faster boots. For VMSS, existing VMs restart instead of being rebuilt, cutting cold-start times dramatically.
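The mode can be set when creating a pool or switched later with an update; a sketch with the same placeholder names as above:

```bash
# Switch an existing pool's scale-down behavior to Deallocate
# (placeholder resource/cluster/pool names)
az aks nodepool update \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name regpool \
  --scale-down-mode Deallocate
```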
The catch is zone capacity. If Azure can’t allocate in a node’s home zone, the deallocated node sits offline until that zone frees up. That’s what tripped me up until I switched to a regional pool.
Final take-away
- Zone-spanning + Deallocate = risky when any zone hits capacity limits
- Zone-aligned = predictable but brittle if your chosen zone is busy
- Regional + Deallocate = unofficial but highly reliable for stateless job workloads