Intro
I recently ran into capacity issues in Azure and saw firsthand how AKS VMSS node pools behave when you use deallocated nodes to speed up start times. I’ll walk through the three node-pool zone models documented by Microsoft, then share what really happened in my setup and how I worked around it. In this post:
- The three AKS VMSS node-pool deployment methods
- How deallocated nodes restart under each method
- A practical, under-the-radar pattern to improve reliability
Zone-spanning node pools (single VMSS across all zones)
Azure lets you spread a single node pool across multiple availability zones by specifying all your desired zones with `--zones`. AKS automatically balances the number of nodes in each zone.
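A minimal sketch of this setup (`myResourceGroup` and `myAKSCluster` are placeholder names, not from my environment):

```bash
# Create one VMSS-backed pool that AKS balances across zones 1-3
# (myResourceGroup / myAKSCluster are placeholder names)
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name spanpool \
  --node-count 3 \
  --zones 1 2 3
```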
Information
Nodes are deployed and balanced across every zone you list in the `--zones` parameter.
Warning
If a zonal outage occurs, nodes within the affected zone might be impacted even though nodes in other zones stay healthy. And when you use Deallocate mode, deallocated nodes restart only in their original zone, so they can stay offline for as long as that zone is short on capacity.
Real story
I ran a zone-spanning pool with `--scale-down-mode Deallocate`. When Azure was capacity-constrained in one zone, those deallocated nodes never came back up, and AKS kept retrying allocation in that same exhausted zone. My jobs queued until capacity finally returned.
Zone-aligned node pools (VMSS pinned to specific zone[s])
You can add separate node pools, each pinned to a single zone, by creating one pool per zone and passing `--zones <zone-number>` for each.
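A sketch of that pattern, again with placeholder resource and cluster names:

```bash
# One zone-aligned pool per availability zone (placeholder names)
for zone in 1 2 3; do
  az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name "zonepool${zone}" \
    --node-count 1 \
    --zones "${zone}"
done
```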
Information
Each node pool handles only its assigned zone, giving you precise control over placement and latency.
Warning
Deallocated nodes still restart only in their pinned zone. If that zone hits capacity or suffers an outage, your pool can’t recover until the zone heals.
Real story
We switched to three zone-aligned pools, thinking AKS would pick a healthy zone to spin up deallocated nodes. It didn’t. Each pool stayed in its own zone, and scaling failed when any one zone ran out of capacity.
Regional node pools (no availability zones)
When you omit the `--zones` parameter (or set it to `null` or an empty list), AKS creates a regional VMSS. Instances show up with a zone label of `0`.
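A sketch of a regional pool, with placeholder names; omitting `--zones` is all it takes:

```bash
# No --zones flag, so the backing VMSS is regional (placeholder names)
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name regpool \
  --node-count 2

# Regional nodes should report "0" in the zone label column
kubectl get nodes -L topology.kubernetes.io/zone
```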
Information
Instances are regional and can be implicitly placed in any zone within the region, though there’s no guarantee of even spread.
Warning
In a full zonal outage, any or all instances might be affected because they aren’t tied to a specific zone.
Real story
My jobs were stateless, single-replica workloads. I removed zone assignments so the pool became regional. When deallocated nodes restarted, Azure placed them in whichever zone had capacity. Job reliability immediately improved.
Summary
| Model | Zone resilience | Deallocated node restart behavior | Good for |
|---|---|---|---|
| Zone-spanning | Yes (auto-spread) | Restarts in the same zone; can stall if that zone is full | Stateless multi-zone workloads |
| Zone-aligned | Yes (fixed zone pins) | Restarts only in the pinned zone; brittle if that zone is busy | Strict zone-isolation needs |
| Regional | Regional (no pinning) | Restarts anywhere in the region (best odds) | Stateless jobs and burst workloads |
Why deallocate mode matters
When you set `--scale-down-mode Deallocate`, nodes are stopped but not deleted. That preserves cached disks, avoids repeated image pulls, and gives much faster boots. For VMSS, existing VMs restart instead of being rebuilt, cutting cold-start times dramatically.
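The mode can be set when creating a pool or switched later with an update; a sketch with the same placeholder names as above:

```bash
# Switch an existing pool's scale-down behavior to Deallocate
# (placeholder resource/cluster/pool names)
az aks nodepool update \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name regpool \
  --scale-down-mode Deallocate
```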
The catch is zone capacity. If Azure can’t allocate in a node’s home zone, the deallocated node sits offline until that zone frees up. That’s what tripped me up until I switched to a regional pool.
Final take-away
- Zone-spanning + Deallocate = risky when any zone hits capacity limits
- Zone-aligned = predictable but brittle if your chosen zone is busy
- Regional + Deallocate = unofficial but highly reliable for stateless job workloads