
After writing about Ingress NGINX retiring in March 2026, I recommended a two-phase migration approach: start with the Application Routing add-on for quick stability, then migrate to Envoy Gateway as the long-term strategic choice. Several readers asked for more detail on the Envoy Gateway piece, specifically around installation and the different configuration options for AKS.

I spent the last few weeks deploying Envoy Gateway across different test clusters, trying out the various deployment patterns, and documenting the gotchas I hit. This is that guide.

Why Envoy Gateway?

When I evaluated options after the NGINX retirement announcement, Envoy Gateway stood out for a few reasons. It’s the CNCF’s reference implementation of Gateway API, which means it’s going to stay aligned with where Kubernetes ingress is headed. The architecture is different from traditional ingress controllers in a way that actually makes sense for AKS.

Instead of one big controller with a LoadBalancer service, Envoy Gateway splits things into a control plane and data plane. The control plane (what you install with Helm) just manages configuration. When you create a Gateway resource, it spins up its own data plane with Envoy proxy pods and a dedicated LoadBalancer. This makes it easier to isolate different teams or environments within the same cluster. Your development team can have their own Gateway with a private LoadBalancer while production uses a public one with different scaling characteristics.

Envoy itself is battle-tested. Companies like Lyft, Pinterest, and Microsoft run it at massive scale. The Gateway API layer on top gives you the Kubernetes-native interface without having to write Envoy config directly.

Installing the Control Plane

First, install Envoy Gateway using Helm. This only installs the control plane controller, not any load balancers yet (those come when you create Gateway resources).
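A sketch of what the install might look like using the published OCI chart — the release name and version pin here are illustrative, so substitute the latest release:

```shell
# Install the Envoy Gateway control plane from the OCI Helm registry.
# This creates only the controller deployment; no LoadBalancers yet.
helm install eg oci://docker.io/envoyproxy/gateway-helm \
  --version v1.3.0 \
  --namespace envoy-gateway-system \
  --create-namespace
```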

Wait for it to become available:
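Something like this works, assuming the chart's default deployment name of envoy-gateway:

```shell
# Block until the control plane deployment reports Available
kubectl wait --timeout=5m -n envoy-gateway-system \
  deployment/envoy-gateway --condition=Available
```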

I run two replicas and set a PodDisruptionBudget because I learned the hard way that having a single control plane pod during a node upgrade can cause delays in configuration updates. Not critical, but annoying when you’re trying to roll out changes.
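For reference, the PodDisruptionBudget might look like the sketch below. The selector label is an assumption — check the labels on your control plane pods — and many chart versions expose a deployment.replicas value you can set to 2 at install time; verify both against your installed chart:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: envoy-gateway
  namespace: envoy-gateway-system
spec:
  # Keep at least one control plane pod up during voluntary disruptions
  # (node drains, upgrades)
  minAvailable: 1
  selector:
    matchLabels:
      control-plane: envoy-gateway  # assumed label; confirm with kubectl get pods --show-labels
```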

A note on upgrades: 

Helm doesn’t update CRDs automatically. Before upgrading Envoy Gateway to a new version, you need to pull the new Helm chart and manually apply the CRD YAML files first. Otherwise, the new controller version might fail to reconcile your existing resources. Pull the chart with helm pull oci://docker.io/envoyproxy/gateway-helm --version <version> --untar, then apply the CRDs from the gateway-helm/crds/ directory before running helm upgrade.
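Putting those steps together, an upgrade run might look like this (keep the same `<version>` placeholder for whichever release you are targeting):

```shell
# 1. Pull and unpack the new chart version
helm pull oci://docker.io/envoyproxy/gateway-helm --version <version> --untar

# 2. Apply the new CRDs first, so the controller can reconcile them
#    (add -R if your chart version nests the CRD files in subdirectories)
kubectl apply --server-side -f gateway-helm/crds/

# 3. Then upgrade the control plane itself
helm upgrade eg oci://docker.io/envoyproxy/gateway-helm \
  --version <version> --namespace envoy-gateway-system
```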

Three Configuration Options

Here’s where it gets interesting. You can configure Envoy Gateway for different scenarios by creating GatewayClass resources. I typically use three different configurations depending on the workload.

Before diving into the examples, let me explain the three components you’ll see and how they work together:

EnvoyProxy – This is an Envoy Gateway-specific resource that customizes how the data plane gets deployed. Think of it as the configuration template that defines LoadBalancer annotations (public vs private, Private Link Service), pod settings (termination grace period), and shutdown behavior (drain timeouts). Platform teams create these to define infrastructure patterns.

GatewayClass – This is a cluster-scoped resource from the Gateway API spec. It acts as a template that application teams can reference. GatewayClasses point to an EnvoyProxy resource (via parametersRef) to define what kind of infrastructure gets created. Platform teams create one GatewayClass per deployment pattern (public, private, etc.), and application teams just reference the class name.

Gateway – This is the actual instance that provisions infrastructure. When you create a Gateway resource and reference a GatewayClass, Envoy Gateway automatically creates a Deployment with Envoy proxy pods and a LoadBalancer service in the envoy-gateway-system namespace. Application teams create Gateways in their own namespaces, but the infrastructure lives in the platform namespace.

The layering makes sense once you see it: EnvoyProxy defines “how to deploy”, GatewayClass packages it as a reusable template, and Gateway creates the actual running infrastructure. This separation means platform teams control infrastructure patterns while application teams self-service their own Gateways.

Option 1: Public Internet-Facing

This is the simplest setup for workloads that need to be accessible from the internet. First, create an EnvoyProxy resource with graceful shutdown settings to prevent connection drops during upgrades:
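A minimal sketch of that EnvoyProxy resource, assuming it is named eg-public-config. The shutdown block is part of the EnvoyProxy spec; how the pod's terminationGracePeriodSeconds (300 seconds in my setup) gets set varies by Envoy Gateway version, so check the CRD for your release:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: eg-public-config
  namespace: envoy-gateway-system
spec:
  # Graceful shutdown: stop accepting new connections, then drain
  # existing ones before the Envoy pod terminates.
  shutdown:
    drainTimeout: 120s
    minDrainDuration: 5s
```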

Create a GatewayClass that uses this configuration:
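Assuming the EnvoyProxy above is named eg-public-config, the GatewayClass would look roughly like this — the controllerName is fixed by Envoy Gateway, while the class name is your choice:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg-public
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  # Point at the EnvoyProxy resource that defines the deployment pattern
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: eg-public-config
    namespace: envoy-gateway-system
```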

Then create a Gateway using that class:
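For example (the Gateway name and namespace are illustrative — application teams would create this in their own namespace):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg-public
  namespace: default
spec:
  gatewayClassName: eg-public
  listeners:
  - name: http
    protocol: HTTP
    port: 80
```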

Check the external IP:
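Envoy Gateway labels the generated service with the owning Gateway's name, so you can filter on that:

```shell
# The data plane service lives in envoy-gateway-system, not default
kubectl get svc -n envoy-gateway-system \
  -l gateway.envoyproxy.io/owning-gateway-name=eg-public
```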

Notice the service is created in envoy-gateway-system, not in the default namespace where the Gateway resource lives. This confused me at first, but it makes sense for multi-tenancy. The platform team controls the infrastructure namespace, and application teams just create Gateway resources in their own namespaces.

Option 2: Internal Private LoadBalancer

For workloads that should only be accessible within your Azure VNet (like internal APIs or admin interfaces), you need a private LoadBalancer. This EnvoyProxy resource includes both the private LoadBalancer annotation and graceful shutdown settings:
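A sketch of that resource, assuming the name eg-private-config — the internal LoadBalancer annotation is the standard Azure cloud provider one:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: eg-private-config
  namespace: envoy-gateway-system
spec:
  shutdown:
    drainTimeout: 120s
    minDrainDuration: 5s
  provider:
    type: Kubernetes
    kubernetes:
      envoyService:
        annotations:
          # Ask the Azure cloud provider for an internal LoadBalancer
          service.beta.kubernetes.io/azure-load-balancer-internal: "true"
```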

Create the GatewayClass that references this configuration:
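Same shape as the public class, just pointing at the private EnvoyProxy config (names assumed from the pattern above):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg-private
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: eg-private-config
    namespace: envoy-gateway-system
```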

Now create a Gateway using the private class:
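For example:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg-private
  namespace: default
spec:
  gatewayClassName: eg-private
  listeners:
  - name: http
    protocol: HTTP
    port: 80
```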

Get the private IP:
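Filtering on the owning-gateway label again:

```shell
# EXTERNAL-IP will show a private address from your AKS subnet
kubectl get svc -n envoy-gateway-system \
  -l gateway.envoyproxy.io/owning-gateway-name=eg-private
```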

This gives you a private IP address within your AKS subnet. You’ll need to test from a VM in the same VNet or through a VPN/bastion connection.

Option 3: Private LoadBalancer with Private Link Service

If you’re using Azure Front Door Premium and want to connect via Private Link, you need to enable Private Link Service on the LoadBalancer. This is my preferred setup for production workloads because it keeps traffic off the public internet entirely. This configuration includes the Private Link Service annotations and graceful shutdown settings:
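A sketch, assuming the name eg-private-pls-config. The azure-pls-* annotations are the standard Azure cloud provider ones, and the PLS name is illustrative; Private Link Service attaches to an internal LoadBalancer frontend, so the internal annotation stays:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: eg-private-pls-config
  namespace: envoy-gateway-system
spec:
  shutdown:
    drainTimeout: 120s
    minDrainDuration: 5s
  provider:
    type: Kubernetes
    kubernetes:
      envoyService:
        annotations:
          service.beta.kubernetes.io/azure-load-balancer-internal: "true"
          # Have the cloud provider create a Private Link Service
          # in front of the internal LoadBalancer
          service.beta.kubernetes.io/azure-pls-create: "true"
          service.beta.kubernetes.io/azure-pls-name: "pls-envoy-gateway"  # illustrative name
```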

Create the GatewayClass:
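Following the same pattern:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg-private-pls
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: eg-private-pls-config
    namespace: envoy-gateway-system
```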

Create the Gateway:
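For example:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg-private-pls
  namespace: default
spec:
  gatewayClassName: eg-private-pls
  listeners:
  - name: http
    protocol: HTTP
    port: 80
```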

Verify the Private Link Service was created:
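The cloud provider creates the PLS in the cluster's node resource group (the MC_* one), so something like this should list it — the resource group placeholder is yours to fill in:

```shell
# Look for a PLS named after the azure-pls-name annotation
az network private-link-service list \
  --resource-group <node-resource-group> \
  --output table
```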

Important:

Azure creates the Private Link Service automatically, but you need to manually approve the Private Endpoint connection from Front Door before traffic will flow. I wasted 20 minutes troubleshooting connectivity before realizing I needed to approve it in the Azure Portal under the Private Link Service resource. You can also approve it via CLI:
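A sketch of the CLI approval — you can grab the connection ID from the Private Link Service resource, and the description text is free-form:

```shell
# Approve the pending connection from Front Door on the PLS
az network private-endpoint-connection approve \
  --id <private-endpoint-connection-id> \
  --description "Approved for Front Door"
```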

Testing with a Real Application

Let’s deploy a simple application to test the setup. I’ll use the AKS hello-world sample app because it shows useful information about the pod handling the request:
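A sketch of the Deployment and Service using the public AKS sample image (names and replica count are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aks-helloworld
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: aks-helloworld
  template:
    metadata:
      labels:
        app: aks-helloworld
    spec:
      containers:
      - name: aks-helloworld
        image: mcr.microsoft.com/azuredocs/aks-helloworld:v1
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: aks-helloworld
  namespace: default
spec:
  selector:
    app: aks-helloworld
  ports:
  - port: 80
    targetPort: 80
```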

Wait for the pods to be ready:
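Assuming the app label from the manifest above:

```shell
kubectl wait --for=condition=Ready pod \
  -l app=aks-helloworld --timeout=120s
```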

Routing Traffic with HTTPRoute

Now we need to connect the Gateway to your application. This is where HTTPRoute comes in.

HTTPRoute is the Gateway API resource that defines routing rules. While the Gateway handles the infrastructure (LoadBalancer, listeners, ports), HTTPRoute handles the application-level routing. It defines which hostnames to accept, which paths to match, and which backend services to route traffic to. Think of it like the Ingress resource you’re used to, but more expressive and flexible.

The key difference from traditional Ingress is the separation of concerns. Application teams create HTTPRoutes in their own namespaces and reference a Gateway via parentRefs. They don’t need to worry about LoadBalancer configuration or TLS certificates at the infrastructure level. They just define routing logic: “when a request comes in for this hostname and path, send it to this service.”

This is part of what makes Gateway API powerful for multi-tenancy. One Gateway can serve multiple HTTPRoutes from different namespaces and teams. Platform teams control the Gateway (the infrastructure), and application teams control the HTTPRoutes (the routing).

Now create an HTTPRoute to connect your Gateway to the application. Change the parentRefs.name to match whichever Gateway you created (eg-public, eg-private, or eg-private-pls):
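A sketch of that HTTPRoute, assuming the service name from the sample app above:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: aks-helloworld
  namespace: default
spec:
  parentRefs:
  - name: eg-public   # or eg-private / eg-private-pls
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: aks-helloworld
      port: 80
```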

Get the Gateway IP and test:
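The assigned address is published on the Gateway's status, so something like this works:

```shell
# Read the address from the Gateway status and send a test request
GATEWAY_IP=$(kubectl get gateway eg-public \
  -o jsonpath='{.status.addresses[0].value}')
curl "http://${GATEWAY_IP}/"
```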

You should see the AKS hello-world web page in the response. If you’re using a private gateway, remember you’ll need to test from within the VNet.

Understanding the Graceful Shutdown Settings

You might have noticed the graceful shutdown configuration I included in all three EnvoyProxy examples above. I learned the importance of this the hard way when I saw 502 errors during a rolling update in my test cluster. Without these settings, Envoy pods get terminated mid-request, which causes connection drops.

Here’s what those settings actually do when a pod needs to shut down:

  1. Kubernetes sends a SIGTERM signal to the Envoy pod
  2. Envoy stops accepting new connections immediately
  3. Envoy waits up to drainTimeout (120 seconds) for existing requests to finish
  4. After minDrainDuration (5 seconds minimum), Envoy can shut down if all connections have closed
  5. Kubernetes waits up to terminationGracePeriodSeconds (300 seconds) before force-killing the pod

The 5-minute termination grace period might seem excessive, but long-lived connections (WebSockets, streaming APIs, file uploads) need time to close cleanly. The 2-minute drain timeout handles most HTTP requests while still being reasonable. The 5-second minimum prevents race conditions where connections close too quickly and clients retry into a shutting-down pod.

You can adjust these values based on your application’s connection patterns. If you only have short-lived HTTP requests, you might reduce the drain timeout. If you have long-running gRPC streams or WebSocket connections, you might increase it. But for most workloads, the values I’ve shown are a solid starting point.

Wrapping Up

That’s Envoy Gateway on AKS. You’ve got three production-ready configuration patterns (public, private, private + PLS), graceful shutdown baked in to prevent connection drops during upgrades, and a working HTTPRoute to see traffic flowing.

The obvious next step is adding HTTPS. Envoy Gateway works well with cert-manager for automatic TLS certificate management via Let’s Encrypt. You add an HTTPS listener to your Gateway, create a Certificate resource, and cert-manager handles the renewal automatically. I’ll probably write a follow-up post on that setup since the integration with Gateway API is cleaner than what we had with Ingress. The cert-manager team published their roadmap for XListenerSet support (experimental in v1.20, released Feb 2026), which will eventually restore the self-service TLS workflow that multi-tenant Ingress users are familiar with. Worth watching if you need per-team TLS management on a shared Gateway.

For more advanced features like rate limiting, authentication policies, or WAF integration with Coraza, the Envoy Gateway documentation has solid examples. The SecurityPolicy and BackendTLSPolicy resources give you fine-grained control over things that required custom annotations or ConfigMap hacks with NGINX.

If you’re coming from my NGINX retirement post, this is the Phase 2 migration I recommended. You can run the Application Routing add-on and Envoy Gateway side-by-side, migrate routes incrementally, and validate everything works before fully switching over. For teams migrating from Ingress, the ingress2gateway tool can convert existing Ingress resources to Gateway API HTTPRoute format as a starting point.

The learning curve is real, especially if you’re used to Ingress. But the architecture makes more sense for multi-team environments, and the Gateway API feels like where Kubernetes ingress should have been from the start. I’ve been running this setup in test clusters for a few weeks now, and it’s solid.


Pixel Robots.

I’m Richard Hooper, aka Pixel Robots. I started this blog in 2016 for a couple of reasons. The first was simply to have a place to store my step-by-step guides, troubleshooting notes, and ideas about being a sysadmin. The second was to share what I’ve learned with other people like me. Hopefully you can find something useful on the site.
