After writing about Ingress NGINX retiring in March 2026, I recommended a two-phase migration approach. Start with Application Routing add-on for quick stability, then migrate to Envoy Gateway as the long-term strategic choice. Several readers asked for more detail on the Envoy Gateway piece, specifically around installation and the different configuration options for AKS.
I spent the last few weeks deploying Envoy Gateway across different test clusters, trying out the various deployment patterns, and documenting the gotchas I hit. This is that guide.
Why Envoy Gateway?
When I evaluated options after the NGINX retirement announcement, Envoy Gateway stood out for a few reasons. It’s the CNCF’s reference implementation of Gateway API, which means it’s going to stay aligned with where Kubernetes ingress is headed. The architecture is different from traditional ingress controllers in a way that actually makes sense for AKS.
Instead of one big controller with a LoadBalancer service, Envoy Gateway splits things into a control plane and data plane. The control plane (what you install with Helm) just manages configuration. When you create a Gateway resource, it spins up its own data plane with Envoy proxy pods and a dedicated LoadBalancer. This makes it easier to isolate different teams or environments within the same cluster. Your development team can have their own Gateway with a private LoadBalancer while production uses a public one with different scaling characteristics.
Envoy itself is battle-tested. Companies like Lyft, Pinterest, and Microsoft run it at massive scale. The Gateway API layer on top gives you the Kubernetes-native interface without having to write Envoy config directly.
Installing the Control Plane
First, install Envoy Gateway using Helm. This only installs the control plane controller, not any load balancers yet (those come when you create Gateway resources).
```shell
EG_VERSION="v1.7.0"

helm install eg oci://docker.io/envoyproxy/gateway-helm \
  --version "$EG_VERSION" \
  -n envoy-gateway-system \
  --create-namespace \
  --set deployment.replicas=2 \
  --set podDisruptionBudget.minAvailable=1 \
  --set deployment.envoyGateway.resources.requests.cpu=250m \
  --set deployment.envoyGateway.resources.requests.memory=256Mi \
  --set deployment.envoyGateway.resources.limits.cpu=1 \
  --set deployment.envoyGateway.resources.limits.memory=1024Mi
```
Wait for it to become available:
```shell
kubectl wait deployment envoy-gateway \
  -n envoy-gateway-system \
  --for=condition=Available=True \
  --timeout=5m
```
I run two replicas and set a PodDisruptionBudget because I learned the hard way that having a single control plane pod during a node upgrade can cause delays in configuration updates. Not critical, but annoying when you’re trying to roll out changes.
A note on upgrades:
Helm doesn’t update CRDs automatically. Before upgrading Envoy Gateway to a new version, you need to pull the new Helm chart and manually apply the CRD YAML files first. Otherwise, the new controller version might fail to reconcile your existing resources. Pull the chart with helm pull oci://docker.io/envoyproxy/gateway-helm --version <version> --untar, then apply the CRDs from the gateway-helm/crds/ directory before running helm upgrade.
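Put together, the upgrade sequence looks roughly like this (the `v1.8.0` version here is a hypothetical placeholder, not a specific release recommendation):

```shell
# Pull the target chart version locally and unpack it
helm pull oci://docker.io/envoyproxy/gateway-helm --version v1.8.0 --untar

# Apply the updated CRDs before touching the controller
# (server-side apply avoids annotation-size issues with large CRDs)
kubectl apply --server-side -f gateway-helm/crds/

# Only then upgrade the release itself
helm upgrade eg oci://docker.io/envoyproxy/gateway-helm \
  --version v1.8.0 \
  -n envoy-gateway-system
```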
Three Configuration Options
Here’s where it gets interesting. You can configure Envoy Gateway for different scenarios by creating GatewayClass resources. I typically use three different configurations depending on the workload.
Before diving into the examples, let me explain the three components you’ll see and how they work together:
EnvoyProxy – This is an Envoy Gateway-specific resource that customizes how the data plane gets deployed. Think of it as the configuration template that defines LoadBalancer annotations (public vs private, Private Link Service), pod settings (termination grace period), and shutdown behavior (drain timeouts). Platform teams create these to define infrastructure patterns.
GatewayClass – This is a cluster-scoped resource from the Gateway API spec. It acts as a template that application teams can reference. GatewayClasses point to an EnvoyProxy resource (via parametersRef) to define what kind of infrastructure gets created. Platform teams create one GatewayClass per deployment pattern (public, private, etc.), and application teams just reference the class name.
Gateway – This is the actual instance that provisions infrastructure. When you create a Gateway resource and reference a GatewayClass, Envoy Gateway automatically creates a Deployment with Envoy proxy pods and a LoadBalancer service in the envoy-gateway-system namespace. Application teams create Gateways in their own namespaces, but the infrastructure lives in the platform namespace.
The layering makes sense once you see it: EnvoyProxy defines “how to deploy”, GatewayClass packages it as a reusable template, and Gateway creates the actual running infrastructure. This separation means platform teams control infrastructure patterns while application teams self-service their own Gateways.
Option 1: Public Internet-Facing
This is the simplest setup for workloads that need to be accessible from the internet. First, create an EnvoyProxy resource with graceful shutdown settings to prevent connection drops during upgrades:
```shell
cat <<EOF | kubectl apply -f -
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: public-proxy
  namespace: envoy-gateway-system
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        pod:
          terminationGracePeriodSeconds: 300
  shutdown:
    drainTimeout: 120s
    minDrainDuration: 5s
EOF
```
Create a GatewayClass that uses this configuration:
```shell
cat <<EOF | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg-public
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: public-proxy
    namespace: envoy-gateway-system
EOF
```
Then create a Gateway using that class:
```shell
cat <<EOF | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg-public
  namespace: default
spec:
  gatewayClassName: eg-public
  listeners:
    - name: http
      protocol: HTTP
      port: 80
EOF
```
Check the external IP:
```shell
kubectl get svc -n envoy-gateway-system \
  -l gateway.envoyproxy.io/owning-gateway-name=eg-public
```
Notice the service is created in envoy-gateway-system, not in the default namespace where the Gateway resource lives. This confused me at first, but it makes sense for multi-tenancy. The platform team controls the infrastructure namespace, and application teams just create Gateway resources in their own namespaces.
Option 2: Internal Private LoadBalancer
For workloads that should only be accessible within your Azure VNet (like internal APIs or admin interfaces), you need a private LoadBalancer. This EnvoyProxy resource includes both the private LoadBalancer annotation and graceful shutdown settings:
```shell
cat <<EOF | kubectl apply -f -
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: internal-proxy
  namespace: envoy-gateway-system
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyService:
        type: LoadBalancer
        annotations:
          service.beta.kubernetes.io/azure-load-balancer-internal: "true"
      envoyDeployment:
        pod:
          terminationGracePeriodSeconds: 300
  shutdown:
    drainTimeout: 120s
    minDrainDuration: 5s
EOF
```
Create the GatewayClass that references this configuration:
```shell
cat <<EOF | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg-private
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: internal-proxy
    namespace: envoy-gateway-system
EOF
```
Now create a Gateway using the private class:
```shell
cat <<EOF | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg-private
  namespace: default
spec:
  gatewayClassName: eg-private
  listeners:
    - name: http
      protocol: HTTP
      port: 80
EOF
```
Get the private IP:
```shell
kubectl get svc -n envoy-gateway-system \
  -l gateway.envoyproxy.io/owning-gateway-name=eg-private
```
This gives you a private IP address within your AKS subnet. You’ll need to test from a VM in the same VNet or through a VPN/bastion connection.
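If you don't have a jump box handy, a quick sanity check is possible from a throwaway pod inside the cluster (the pod name and curl image here are just examples). Note that kube-proxy short-circuits LoadBalancer IPs for in-cluster traffic, so this verifies routing through Envoy but not the full Azure network path:

```shell
# Capture the internal LoadBalancer's private IP
PRIVATE_IP=$(kubectl get svc -n envoy-gateway-system \
  -l gateway.envoyproxy.io/owning-gateway-name=eg-private \
  -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}')

# Run a one-off curl pod that is deleted when the command exits
kubectl run curl-test --rm -it --restart=Never \
  --image=curlimages/curl -- curl -s "http://$PRIVATE_IP/"
```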
Option 3: Private + Private Link Service (for Azure Front Door)
If you’re using Azure Front Door Premium and want to connect via Private Link, you need to enable Private Link Service on the LoadBalancer. This is my preferred setup for production workloads because it keeps traffic off the public internet entirely. This configuration includes Private Link Service annotations and graceful shutdown:
```shell
cat <<EOF | kubectl apply -f -
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: internal-pls-proxy
  namespace: envoy-gateway-system
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyService:
        type: LoadBalancer
        annotations:
          service.beta.kubernetes.io/azure-load-balancer-internal: "true"
          service.beta.kubernetes.io/azure-pls-create: "true"
          service.beta.kubernetes.io/azure-pls-name: "envoy-gateway-pls"
          service.beta.kubernetes.io/azure-pls-proxy-protocol: "false"
      envoyDeployment:
        pod:
          terminationGracePeriodSeconds: 300
  shutdown:
    drainTimeout: 120s
    minDrainDuration: 5s
EOF
```
Create the GatewayClass:
```shell
cat <<EOF | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: eg-private-pls
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: internal-pls-proxy
    namespace: envoy-gateway-system
EOF
```
Create the Gateway:
```shell
cat <<EOF | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg-private-pls
  namespace: default
spec:
  gatewayClassName: eg-private-pls
  listeners:
    - name: http
      protocol: HTTP
      port: 80
EOF
```
Verify the Private Link Service was created:
```shell
NODE_RG=$(az aks show -g <your-resource-group> -n <your-aks-cluster> \
  --query nodeResourceGroup -o tsv)

az network private-link-service list -g $NODE_RG -o table
```
Important:
Azure creates the Private Link Service automatically, but you need to manually approve the Private Endpoint connection from Front Door before traffic will flow. I wasted 20 minutes troubleshooting connectivity before realizing I needed to approve it in the Azure Portal under the Private Link Service resource. You can also approve it via CLI:
```shell
az network private-endpoint-connection approve \
  --resource-group $NODE_RG \
  --service-name envoy-gateway-pls \
  --name <connection-name> \
  --description "Approved for Front Door"
```
Testing with a Real Application
Let’s deploy a simple application to test the setup. I’ll use the AKS hello-world sample app because it shows useful information about the pod handling the request:
```shell
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aks-helloworld
  namespace: default
---
apiVersion: v1
kind: Service
metadata:
  name: aks-helloworld
  namespace: default
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: 80
  selector:
    app: aks-helloworld
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aks-helloworld
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: aks-helloworld
  template:
    metadata:
      labels:
        app: aks-helloworld
    spec:
      serviceAccountName: aks-helloworld
      containers:
        - name: aks-helloworld
          image: mcr.microsoft.com/azuredocs/aks-helloworld:v1
          ports:
            - containerPort: 80
          env:
            - name: TITLE
              value: "Envoy Gateway on AKS"
EOF
```
Wait for the pods to be ready:
```shell
kubectl wait --timeout=60s --for=condition=Available deployment/aks-helloworld
```
Routing Traffic with HTTPRoute
Now we need to connect the Gateway to your application. This is where HTTPRoute comes in.
HTTPRoute is the Gateway API resource that defines routing rules. While the Gateway handles the infrastructure (LoadBalancer, listeners, ports), HTTPRoute handles the application-level routing. It defines which hostnames to accept, which paths to match, and which backend services to route traffic to. Think of it like the Ingress resource you’re used to, but more expressive and flexible.
The key difference from traditional Ingress is the separation of concerns. Application teams create HTTPRoutes in their own namespaces and reference a Gateway via parentRefs. They don’t need to worry about LoadBalancer configuration or TLS certificates at the infrastructure level. They just define routing logic: “when a request comes in for this hostname and path, send it to this service.”
This is part of what makes Gateway API powerful for multi-tenancy. One Gateway can serve multiple HTTPRoutes from different namespaces and teams. Platform teams control the Gateway (the infrastructure), and application teams control the HTTPRoutes (the routing).
Now create an HTTPRoute to connect your Gateway to the application. Change the parentRefs.name to match whichever Gateway you created (eg-public, eg-private, or eg-private-pls):
```shell
cat <<EOF | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: aks-helloworld
  namespace: default
spec:
  parentRefs:
    - name: eg-public   # Change to eg-private or eg-private-pls if needed
  rules:
    - backendRefs:
        - name: aks-helloworld
          port: 80
      matches:
        - path:
            type: PathPrefix
            value: /
EOF
```
Get the Gateway IP and test:
```shell
GATEWAY_HOST=$(kubectl get svc -n envoy-gateway-system \
  -l gateway.envoyproxy.io/owning-gateway-name=eg-public \
  -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}')

curl http://$GATEWAY_HOST/
```
You should see the AKS hello-world web page in the response. If you’re using a private gateway, remember you’ll need to test from within the VNet.
Understanding the Graceful Shutdown Settings
You might have noticed the graceful shutdown configuration I included in all three EnvoyProxy examples above. I learned the importance of this the hard way when I saw 502 errors during a rolling update in my test cluster. Without these settings, Envoy pods get terminated mid-request, which causes connection drops.
Here’s what those settings actually do when a pod needs to shut down:
- Kubernetes sends a SIGTERM signal to the Envoy pod
- Envoy stops accepting new connections immediately
- Envoy waits up to drainTimeout (120 seconds) for existing requests to finish
- After minDrainDuration (5 seconds minimum), Envoy can shut down if all connections have closed
- Kubernetes waits up to terminationGracePeriodSeconds (300 seconds) before force-killing the pod
The 5-minute termination grace period might seem excessive, but long-lived connections (WebSockets, streaming APIs, file uploads) need time to close cleanly. The 2-minute drain timeout handles most HTTP requests while still being reasonable. The 5-second minimum prevents race conditions where connections close too quickly and clients retry into a shutting-down pod.
You can adjust these values based on your application’s connection patterns. If you only have short-lived HTTP requests, you might reduce the drain timeout. If you have long-running gRPC streams or WebSocket connections, you might increase it. But for most workloads, the values I’ve shown are a solid starting point.
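As an illustration, a variant of the EnvoyProxy resource tuned for short-lived HTTP traffic might look like this. The resource name and the specific values are examples to adapt, not recommendations:

```shell
cat <<EOF | kubectl apply -f -
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: short-lived-http-proxy   # hypothetical name
  namespace: envoy-gateway-system
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyDeployment:
        pod:
          terminationGracePeriodSeconds: 60   # shorter grace period for quick requests
  shutdown:
    drainTimeout: 30s      # most short HTTP requests finish well within this
    minDrainDuration: 5s   # keep the minimum to avoid retry races
EOF
```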
Wrapping Up
That’s Envoy Gateway on AKS. You’ve got three production-ready configuration patterns (public, private, private + PLS), graceful shutdown baked in to prevent connection drops during upgrades, and a working HTTPRoute to see traffic flowing.
The obvious next step is adding HTTPS. Envoy Gateway works well with cert-manager for automatic TLS certificate management via Let’s Encrypt. You add an HTTPS listener to your Gateway, create a Certificate resource, and cert-manager handles the renewal automatically. I’ll probably write a follow-up post on that setup since the integration with Gateway API is cleaner than what we had with Ingress. The cert-manager team published their roadmap for XListenerSet support (experimental in v1.20, released Feb 2026), which will eventually restore the self-service TLS workflow that multi-tenant Ingress users are familiar with. Worth watching if you need per-team TLS management on a shared Gateway.
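As a rough preview of that setup, an HTTPS listener on the public Gateway would look something like this, assuming cert-manager has already written a TLS certificate into a Secret (the `aks-helloworld-tls` name is hypothetical):

```shell
cat <<EOF | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg-public
  namespace: default
spec:
  gatewayClassName: eg-public
  listeners:
    - name: http
      protocol: HTTP
      port: 80
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: aks-helloworld-tls   # hypothetical secret managed by cert-manager
EOF
```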
For more advanced features like rate limiting, authentication policies, or WAF integration with Coraza, the Envoy Gateway documentation has solid examples. The SecurityPolicy and BackendTLSPolicy resources give you fine-grained control over things that required custom annotations or ConfigMap hacks with NGINX.
If you’re coming from my NGINX retirement post, this is the Phase 2 migration I recommended. You can run Application Routing add-on and Envoy Gateway side-by-side, migrate routes incrementally, and validate everything works before fully switching over. For teams migrating from Ingress, the ingress2eg tool can convert existing Ingress resources to Envoy Gateway HTTPRoute format as a starting point.
The learning curve is real, especially if you’re used to Ingress. But the architecture makes more sense for multi-team environments, and the Gateway API feels like where Kubernetes ingress should have been from the start. I’ve been running this setup in test clusters for a few weeks now, and it’s solid.