
We’ve all been there. You upgrade a node pool, something breaks in production, and you’re left deciding whether to push forward with a fix or try to undo what you just did. Until recently, “undoing” a Kubernetes version upgrade in AKS wasn’t a first-class operation. You’d be looking at manual workarounds or blue-green setups to recover.

The new node pool rollback feature landing in AKS preview is useful because it gives you a proper recovery path without any of that. A merged PR in the Azure CLI extensions repo (#9314) adds two new commands to aks-preview: az aks nodepool rollback and az aks nodepool get-rollback-versions. Here’s what they do and when you’d want to use them.

What Is Node Pool Rollback?

The feature lets you revert a node pool to the Kubernetes version and node image it was running before an upgrade. Both the orchestrator version and the node image (VHD) get rolled back together, so you’re not left in a mixed state.

It’s worth being clear about what this is and isn’t. It’s a version rollback, not a full state restore. Workload changes, config changes, and anything else outside of the node pool version are not affected. Think of it as an escape hatch for when an upgrade itself goes sideways.

Key things to know upfront:

  • Only available for seven days after an upgrade completes
  • Only goes back one step – no chaining rollbacks to skip multiple versions
  • No concurrent cluster operations during rollback
  • You must disable cluster autoupgrade first, or it’ll just re-upgrade after you roll back
  • Can’t roll back to a version that’s out of AKS support
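The autoupgrade point in the list above can be sketched with the Azure CLI. This is a minimal example rather than an official procedure: disable_auto_upgrade is a hypothetical helper name, and the resource group and cluster name are placeholders you pass in. It prints the current channel first so you can restore it after re-upgrading.

```shell
# Print the current auto-upgrade channel, then switch auto-upgrade off.
# Arguments: <resource-group> <cluster-name> (placeholders, supply your own).
disable_auto_upgrade() {
  # Record what the channel was so it can be restored later.
  az aks show --resource-group "$1" --name "$2" \
    --query "autoUpgradeProfile.upgradeChannel" --output tsv
  # Turn cluster auto-upgrade off before rolling back.
  az aks update --resource-group "$1" --name "$2" \
    --auto-upgrade-channel none
}
```

Call it as disable_auto_upgrade my-resource-group my-aks-cluster, then note the printed channel somewhere you’ll find it again.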

Prerequisites

You’ll need the aks-preview CLI extension at the latest version and Azure CLI 2.64.0 or higher.
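A minimal sketch of checking both prerequisites, assuming a shell where sort -V is available (GNU coreutils or a recent BSD sort). The three function names are hypothetical helpers, not part of the Azure CLI.

```shell
# Install aks-preview if it is missing, otherwise update it to the latest release.
ensure_aks_preview() {
  if az extension show --name aks-preview >/dev/null 2>&1; then
    az extension update --name aks-preview
  else
    az extension add --name aks-preview
  fi
}

# Print the installed Azure CLI version, e.g. to compare against 2.64.0.
show_cli_version() {
  az version --query '"azure-cli"' --output tsv
}

# Succeed when <actual> is at least <required>.
# Usage: version_at_least <required> <actual>
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}
```

For example, version_at_least 2.64.0 "$(show_cli_version)" || echo "Azure CLI is too old" gives a quick pass/fail before you go any further.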

The feature requires API version 2025-08-02-preview or later under the hood, but as long as you’re running a current aks-preview extension you’ll be covered.

The New Commands

Checking What You Can Roll Back To

Before pulling the trigger, you can use az aks nodepool get-rollback-versions to see what rollback versions are available for a node pool.

The command queries the upgrade profile and surfaces the previously used versions. If the node pool hasn’t been upgraded at least once, there’s nothing to roll back to and the command will tell you so rather than returning a silent empty result.
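As a rough sketch, the query might be wrapped like this. The command name comes from the PR. The flags are an assumption based on the convention other az aks nodepool subcommands follow, so confirm them with az aks nodepool get-rollback-versions --help. The function name and the arguments are placeholders.

```shell
# List the versions a node pool can be rolled back to.
# Arguments: <resource-group> <cluster-name> <nodepool-name>
# Flag names are assumed from other "az aks nodepool" commands; verify with --help.
list_rollback_versions() {
  az aks nodepool get-rollback-versions \
    --resource-group "$1" \
    --cluster-name "$2" \
    --name "$3" \
    --output table
}
```

Run it as list_rollback_versions my-resource-group my-aks-cluster nodepool1.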

Performing the Rollback

Once you’ve confirmed there’s a version to go back to, run az aks nodepool rollback against the node pool.

That’s it. The rollback process is manual to trigger but fully automatic once started. AKS handles rolling all the nodes back to the previous version state. It’s all-or-nothing: if any node fails to roll back, the whole operation fails. This keeps the cluster in a clearly defined state rather than leaving you with a partially rolled-back pool.
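A hedged sketch of triggering the rollback. The flags are again assumed from the usual az aks nodepool conventions rather than taken from the PR, so check az aks nodepool rollback --help first. The wrapper name is hypothetical, and it refuses to run unless all three arguments are supplied.

```shell
# Roll a node pool back to its previous Kubernetes version and node image.
# Arguments: <resource-group> <cluster-name> <nodepool-name>
# Flag names are assumed from other "az aks nodepool" commands; verify with --help.
rollback_nodepool() {
  if [ "$#" -ne 3 ]; then
    echo "usage: rollback_nodepool <resource-group> <cluster-name> <nodepool-name>" >&2
    return 2
  fi
  az aks nodepool rollback \
    --resource-group "$1" \
    --cluster-name "$2" \
    --name "$3"
}
```

The argument check is only there to stop a half-typed command from reaching a production cluster.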

Monitoring the Rollback

One useful place to watch progress is the Activity Log on the cluster in the Azure portal. The rollback shows up as a standard AKS operation, so your existing alerting should pick it up.
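If you’d rather watch from a terminal, something like the following may help. It’s a sketch that assumes kubectl is already pointed at the same cluster. The function names are hypothetical, and the query simply picks a few fields from the standard node pool output.

```shell
# Show provisioning state plus the current Kubernetes and node image versions.
# Arguments: <resource-group> <cluster-name> <nodepool-name>
nodepool_status() {
  az aks nodepool show \
    --resource-group "$1" --cluster-name "$2" --name "$3" \
    --query "{state:provisioningState, k8s:currentOrchestratorVersion, image:nodeImageVersion}" \
    --output table
}

# Per-node kubelet versions and OS images as the nodes are replaced.
node_versions() {
  kubectl get nodes -o wide
}
```

Re-running nodepool_status until the state settles, and checking node_versions afterwards, gives a quick confirmation that every node landed on the previous version.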

When Should You Actually Use This?

The sweet spot for this feature is production incidents where you’ve upgraded a node pool and something breaks that you can’t quickly fix: application compatibility issues, unexpected performance regressions, or infrastructure weirdness that wasn’t caught in pre-prod.

What it’s not good for is staying on an older version indefinitely. Rollback is a temporary recovery tool. Security patches and updates come with newer versions, and rolling back exposes you to whatever vulnerabilities were addressed in the upgrade. Treat it as breathing room while you fix the underlying issue, then re-upgrade. The guidance suggests re-upgrading within days for critical security issues, within weeks for app compatibility problems, and no more than 30 days in any case.
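The 30-day ceiling is easy to turn into a calendar date you can put in a ticket. A small sketch, assuming GNU date (the -d option behaves differently on macOS and BSD); reupgrade_deadline is a hypothetical helper name.

```shell
# Print the latest re-upgrade date: 30 days after the rollback.
# Argument: rollback date as YYYY-MM-DD. Requires GNU date.
reupgrade_deadline() {
  date -u -d "$1 +30 days" +%Y-%m-%d
}
```

Calling reupgrade_deadline with the day you rolled back prints the date by which the node pool should be back on the newer version.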

One gotcha worth flagging: if your cluster is part of an Azure Kubernetes Fleet Manager autoupgrade profile, you need to remove it from the update group before rolling back. Otherwise Fleet Manager will just upgrade it again right after.

What About Node Image-Only Rollback?

If you ran a node image update (not a full Kubernetes version upgrade) within the last seven days, rollback will restore the previous VHD image while keeping the same Kubernetes version. So this isn’t limited to full version upgrades. Handy if a bad node image slips through.

Wrapping Up

The node pool rollback capability fills a real gap in the AKS upgrade story. It won’t replace solid upgrade strategies like blue-green node pools or proper pre-prod testing, but it gives you a genuine recovery path when production upgrades don’t go to plan.

It’s currently in preview and requires the aks-preview extension, so treat it accordingly in production environments. Worth getting familiar with before you need it though.

For more detail on the capabilities and limitations, the official AKS rollback docs are worth a read.


Pixel Robots.

I’m Richard Hooper aka Pixel Robots. I started this blog in 2016 for a couple reasons. The first reason was basically just a place for me to store my step by step guides, troubleshooting guides and just plain ideas about being a sysadmin. The second reason was to share what I have learned and found out with other people like me. Hopefully, you can find something useful on the site.
