Introduction
Hello, fellow Kubernetes enthusiasts! Welcome back to Pixel Robots! Today, we’re diving into a powerful new feature that’s currently in preview for Azure Kubernetes Service (AKS) – Advanced Network Observability. If you’ve ever wished for superpowers – or just Cilium Hubble – to monitor and diagnose your container network, this is the feature for you. It’s part of the Advanced Container Networking Services suite and provides you with the insights you need to keep your containerized workloads running smoothly. Let’s explore what Advanced Network Observability is all about, why it’s a game-changer, and how you can start using it in your AKS clusters.
What is Advanced Network Observability?
Advanced Network Observability is like having a magnifying glass for your container network. It integrates effortlessly with both Cilium and non-Cilium data planes, giving you flexibility and control. Whether you’re tracking traffic volume, dropped packets, or DNS issues, this feature delivers detailed metrics and logs to help you troubleshoot and optimize your network performance.
Let’s look at some key features and benefits of the new offering.
Key Features
Node-Level Metrics
Monitor the health of your container network at the node level with metrics stored in Prometheus format. Visualize these metrics in Grafana to gain insights into traffic volume, dropped packets, and more.
Hubble Metrics (DNS and Pod-Level Metrics)
Hubble metrics provide granular details on source and destination pods, traffic volume, and more. For non-Cilium data planes, you also get DNS metrics covering DNS errors and requests without responses.
Hubble Flow Logs
Flow logs are your go-to for deep visibility into network activity. They log all communications to and from pods, helping you troubleshoot connectivity issues effectively.
- Hubble CLI: Fetches flow logs across the cluster with customizable filtering and formatting.
- Hubble UI: A user-friendly browser interface to explore network activity, displaying service-connection graphs and flow logs.
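To make that concrete, here is a hedged sketch of Hubble CLI usage. The demo namespace is hypothetical, and it assumes hubble-relay is reachable (installing the CLI and setting up its mTLS certificates are both covered later in this post):

```shell
# Hypothetical filters - adjust the namespace to your own workloads.
# Assumes the Hubble CLI is installed and hubble-relay is reachable, e.g.:
#   kubectl -n kube-system port-forward svc/hubble-relay 4245:443 &
HUBBLE_FILTER="--namespace demo --verdict DROPPED --last 20"

if command -v hubble >/dev/null 2>&1; then
  # Last 20 dropped flows in the (hypothetical) demo namespace
  hubble observe $HUBBLE_FILTER || echo "Could not reach hubble-relay - is the port-forward running?"
else
  echo "hubble CLI not installed yet; the command would be: hubble observe $HUBBLE_FILTER"
fi
```

For scripting, `hubble observe -o json` emits one JSON object per flow that you can pipe into jq.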
Benefits
- CNI-Agnostic: Works with all Azure CNI variants, including kubenet.
- Uniform Experience: Consistent performance across Cilium and non-Cilium data planes.
- eBPF-Based Observability: Utilizes eBPF for efficient and scalable network monitoring.
- Detailed Network Visibility: Comprehensive network flow logs help you understand application communication.
- Flexible Metrics Storage: Use Azure Managed Prometheus and Grafana or bring your own instances.
Metrics
Node-Level Metrics
Advanced Network Observability aggregates several node-level metrics to help you keep tabs on your cluster’s health. These metrics are labeled by cluster and instance (Node name).
Non-Cilium
For non-Cilium data planes, you get metrics for both Linux and Windows. Here’s a rundown of what you’ll be monitoring:
| Metric Name | Description | Extra Labels | Linux | Windows |
|---|---|---|---|---|
| networkobservability_forward_count | Total forwarded packet count | direction | ✅ | ✅ |
| networkobservability_forward_bytes | Total forwarded byte count | direction | ✅ | ✅ |
| networkobservability_drop_count | Total dropped packet count | direction, reason | ✅ | ✅ |
| networkobservability_drop_bytes | Total dropped byte count | direction, reason | ✅ | ✅ |
| networkobservability_tcp_state | TCP currently active socket count by TCP state | state | ✅ | ✅ |
| networkobservability_tcp_connection_remote | TCP currently active socket count by remote IP/port | address (IP), port | ✅ | ❌ |
| networkobservability_tcp_connection_stats | TCP connection statistics (e.g., Delayed ACKs, TCPKeepAlive, TCPSackFailures) | statistic | ✅ | ✅ |
| networkobservability_tcp_flag_counters | TCP packets count by flag | flag | ❌ | ✅ |
| networkobservability_ip_connection_stats | IP connection statistics | statistic | ✅ | ❌ |
| networkobservability_udp_connection_stats | UDP connection statistics | statistic | ✅ | ❌ |
| networkobservability_udp_active_sockets | UDP currently active socket count | | ✅ | ❌ |
| networkobservability_interface_stats | Interface statistics | InterfaceName, statistic | ✅ | ✅ |
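Once these metrics land in Prometheus, you can chart them with PromQL in Grafana. The queries below are illustrative sketches only; the 5m rate window and the aggregation labels are my own choices, not from the docs:

```promql
# Dropped packets per second, broken down by drop reason (node-level, non-Cilium)
sum by (reason) (rate(networkobservability_drop_count[5m]))

# Forwarded bytes per second, per node
sum by (instance) (rate(networkobservability_forward_bytes[5m]))
```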
Cilium
For clusters using the Cilium data plane, Advanced Network Observability supports only Linux (sorry, Windows folks). Here are the key metrics:
| Metric Name | Description | Extra Labels | Linux | Windows |
|---|---|---|---|---|
| cilium_forward_count_total | Total forwarded packet count | direction | ✅ | ❌ |
| cilium_forward_bytes_total | Total forwarded byte count | direction | ✅ | ❌ |
| cilium_drop_count_total | Total dropped packet count | direction, reason | ✅ | ❌ |
| cilium_drop_bytes_total | Total dropped byte count | direction, reason | ✅ | ❌ |
Pod-Level Metrics (Hubble Metrics)
Pod-level metrics provide a deep dive into traffic data for individual pods, labeled by cluster, instance (Node name), and either source or destination. Here’s what you’ll be tracking:
| Metric Name | Description | Extra Labels | Linux | Windows |
|---|---|---|---|---|
| hubble_dns_queries_total | Total DNS requests by query | source or destination, query, qtypes (query type) | ✅ | ❌ |
| hubble_dns_responses_total | Total DNS responses by query/response | source or destination, query, qtypes (query type), rcode (return code), ips_returned (number of IPs) | ✅ | ❌ |
| hubble_drop_total | Total dropped packet count | source or destination, protocol, reason | ✅ | ❌ |
| hubble_tcp_flags_total | Total TCP packets count by flag | source or destination, flag | ✅ | ❌ |
| hubble_flows_processed_total | Total network flows processed (L4/L7 traffic) | source or destination, protocol, verdict, type, subtype | ✅ | ❌ |
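As a sketch of how you might use these in Grafana, here are two illustrative PromQL queries. The rcode value is an assumption about Hubble’s label values, so check your own data before relying on it:

```promql
# DNS queries per second, broken down by query name
sum by (query) (rate(hubble_dns_queries_total[5m]))

# Share of DNS responses with a non-success return code (rcode value is illustrative)
sum(rate(hubble_dns_responses_total{rcode!="No Error"}[5m]))
  / sum(rate(hubble_dns_responses_total[5m]))
```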
Limitations
- Pod-level metrics are available only on Linux.
- Cilium data plane support starts with Kubernetes version 1.29.
- Metric labels might have slight differences between Cilium and non-Cilium clusters.
- Cilium data plane does not currently support DNS metrics.
Getting Started with Advanced Network Observability
Prerequisites
- Azure CLI version 2.56.0 or newer.
Install the AKS-Preview CLI Extension
First things first, let’s get the necessary CLI extension installed and updated.
To install:

```shell
az extension add --name aks-preview
```
To update (making sure you’re on the latest version):

```shell
az extension update --name aks-preview
```
Register the Feature Flag
Next up, you’ll need to register the AdvancedNetworkingPreview feature flag. This might take a few minutes to reflect as ‘Registered’.
To register:

```shell
az feature register --namespace "Microsoft.ContainerService" --name "AdvancedNetworkingPreview"
```
Check the registration status:

```shell
az feature show --namespace "Microsoft.ContainerService" --name "AdvancedNetworkingPreview"
```
Once registered, refresh your resource provider registration with:
1 |
az provider register --namespace Microsoft.ContainerService |
Create a Resource Group
```shell
export RESOURCE_GROUP="your-resource-group"
export LOCATION="your-location"

az group create --name $RESOURCE_GROUP --location $LOCATION
```
Create an AKS Cluster with Advanced Network Observability
For non-Cilium data planes:
```shell
export CLUSTER_NAME="your-cluster-name"

az aks create \
    --name $CLUSTER_NAME \
    --resource-group $RESOURCE_GROUP \
    --generate-ssh-keys \
    --network-plugin azure \
    --network-plugin-mode overlay \
    --pod-cidr 192.168.0.0/16 \
    --enable-advanced-network-observability
```
For Cilium data planes (Kubernetes version 1.29+):
```shell
export CLUSTER_NAME="your-cluster-name"

az aks create \
    --name $CLUSTER_NAME \
    --resource-group $RESOURCE_GROUP \
    --generate-ssh-keys \
    --max-pods 250 \
    --network-plugin azure \
    --network-plugin-mode overlay \
    --network-dataplane cilium \
    --node-count 2 \
    --pod-cidr 192.168.0.0/16 \
    --kubernetes-version 1.29 \
    --enable-advanced-network-observability
```
Enable Advanced Network Observability on an Existing Cluster
```shell
az aks update \
    --resource-group $RESOURCE_GROUP \
    --name $CLUSTER_NAME \
    --enable-advanced-network-observability
```
Get Cluster Credentials
```shell
az aks get-credentials --name $CLUSTER_NAME --resource-group $RESOURCE_GROUP
```
Set Up Azure Managed Prometheus and Grafana
Create Azure Monitor Resource
```shell
export AZURE_MONITOR_NAME="your-azure-monitor-name"

az resource create \
    --resource-group $RESOURCE_GROUP \
    --namespace microsoft.monitor \
    --resource-type accounts \
    --name $AZURE_MONITOR_NAME \
    --location uksouth \
    --properties '{}'
```
Create Grafana Instance
```shell
export GRAFANA_NAME="your-grafana-name"

az grafana create \
    --name $GRAFANA_NAME \
    --resource-group $RESOURCE_GROUP
```
Link Resources to AKS Cluster
```shell
grafanaId=$(az grafana show --name $GRAFANA_NAME --resource-group $RESOURCE_GROUP --query id --output tsv)
azuremonitorId=$(az resource show --resource-group $RESOURCE_GROUP --name $AZURE_MONITOR_NAME --resource-type "Microsoft.Monitor/accounts" --query id --output tsv)

az aks update \
    --name $CLUSTER_NAME \
    --resource-group $RESOURCE_GROUP \
    --enable-azure-monitor-metrics \
    --azure-monitor-workspace-resource-id $azuremonitorId \
    --grafana-resource-id $grafanaId
```
Visualization Using Grafana
Make sure the Azure Monitor pods are running using the kubectl get pods command.
```shell
kubectl get pods -o wide -n kube-system | grep ama-
```
Your output should look similar to the following example output:
```
ama-metrics-5bc6c6d948-zkgc9         2/2     Running   0 (21h ago)   26h
ama-metrics-ksm-556d86b5dc-2ndkv     1/1     Running   0 (26h ago)   26h
ama-metrics-node-lbwcj               2/2     Running   0 (21h ago)   26h
ama-metrics-node-rzkzn               2/2     Running   0 (21h ago)   26h
ama-metrics-win-node-gqnkw           2/2     Running   0 (26h ago)   26h
ama-metrics-win-node-tkrm8           2/2     Running   0 (26h ago)   26h
```
Microsoft has created sample dashboards. They can be found in Grafana under the Dashboards > Azure Managed Prometheus folder, with names prefixed “Kubernetes / Networking /”. The suite of dashboards includes:
- Clusters: Shows Node-level metrics for your clusters.
- DNS (Cluster): Shows DNS metrics on a cluster or selection of Nodes.
- DNS (Workload): Shows DNS metrics for the specified workload (e.g. Pods of a DaemonSet or Deployment such as CoreDNS).
- Drops (Workload): Shows drops to/from the specified workload (e.g. Pods of a Deployment or DaemonSet).
- Pod Flows (Namespace): Shows L4/L7 packet flows to/from the specified namespace (i.e. Pods in the Namespace).
- Pod Flows (Workload): Shows L4/L7 packet flows to/from the specified workload (e.g. Pods of a Deployment or DaemonSet).
Visualize Using Hubble UI
First off, we need to install the Hubble CLI:
```shell
HUBBLE_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/hubble/master/stable.txt)
HUBBLE_ARCH=amd64
if [ "$(uname -m)" = "aarch64" ]; then HUBBLE_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/hubble/releases/download/$HUBBLE_VERSION/hubble-linux-${HUBBLE_ARCH}.tar.gz{,.sha256sum}
sha256sum --check hubble-linux-${HUBBLE_ARCH}.tar.gz.sha256sum
sudo tar xzvfC hubble-linux-${HUBBLE_ARCH}.tar.gz /bin
rm hubble-linux-${HUBBLE_ARCH}.tar.gz{,.sha256sum}
```
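As an optional sanity check, you can confirm the binary landed on your PATH; nothing cluster-side is needed for this (the guard around `command -v` is just my own defensive habit):

```shell
# Optional sanity check: confirm the hubble binary is now on the PATH.
if command -v hubble >/dev/null 2>&1; then
  hubble version
  HUBBLE_READY=yes
else
  echo "hubble not found on PATH - re-check the install step above"
  HUBBLE_READY=no
fi
```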
Now let’s check that the Hubble pods are running using the kubectl get pods command.
```shell
kubectl get pods -o wide -n kube-system -l k8s-app=hubble-relay
```
Mutual TLS (mTLS) secures the Hubble Relay server. To allow the Hubble client to retrieve flows, you must obtain the appropriate certificates and configure the client with them. Apply the certificates using the following commands:
```shell
#!/usr/bin/env bash
set -euo pipefail
set -x

# Directory where certificates will be stored
CERT_DIR="$(pwd)/.certs"
mkdir -p "$CERT_DIR"

declare -A CERT_FILES=(
  ["tls.crt"]="tls-client-cert-file"
  ["tls.key"]="tls-client-key-file"
  ["ca.crt"]="tls-ca-cert-files"
)

for FILE in "${!CERT_FILES[@]}"; do
  KEY="${CERT_FILES[$FILE]}"
  JSONPATH="{.data['${FILE//./\\.}']}"

  # Retrieve the secret and decode it
  kubectl get secret hubble-relay-client-certs -n kube-system -o jsonpath="${JSONPATH}" | base64 -d > "$CERT_DIR/$FILE"

  # Set the appropriate hubble CLI config
  hubble config set "$KEY" "$CERT_DIR/$FILE"
done

hubble config set tls true
hubble config set tls-server-name instance.hubble-relay.cilium.io
```
Press Enter after the script finishes to get back to your shell prompt.
Now let’s verify the secrets were generated using the following kubectl get secrets command:
```shell
kubectl get secrets -n kube-system | grep hubble-
```
Awesome, it’s time to set up the Hubble UI.
Set Up Hubble UI
Save the following YAML to a file named hubble-ui.yaml and apply it, or apply the copy I have on GitHub:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: hubble-ui
  namespace: kube-system
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: hubble-ui
  labels:
    app.kubernetes.io/part-of: retina
rules:
  - apiGroups:
      - networking.k8s.io
    resources:
      - networkpolicies
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - componentstatuses
      - endpoints
      - namespaces
      - nodes
      - pods
      - services
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - apiextensions.k8s.io
    resources:
      - customresourcedefinitions
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - cilium.io
    resources:
      - "*"
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: hubble-ui
  labels:
    app.kubernetes.io/part-of: retina
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: hubble-ui
subjects:
  - kind: ServiceAccount
    name: hubble-ui
    namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: hubble-ui-nginx
  namespace: kube-system
data:
  nginx.conf: |
    server {
        listen       8081;
        server_name  localhost;
        root /app;
        index index.html;
        client_max_body_size 1G;

        location / {
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;

            # CORS
            add_header Access-Control-Allow-Methods "GET, POST, PUT, HEAD, DELETE, OPTIONS";
            add_header Access-Control-Allow-Origin *;
            add_header Access-Control-Max-Age 1728000;
            add_header Access-Control-Expose-Headers content-length,grpc-status,grpc-message;
            add_header Access-Control-Allow-Headers range,keep-alive,user-agent,cache-control,content-type,content-transfer-encoding,x-accept-content-transfer-encoding,x-accept-response-streaming,x-user-agent,x-grpc-web,grpc-timeout;
            if ($request_method = OPTIONS) {
                return 204;
            }
            # /CORS

            location /api {
                proxy_http_version 1.1;
                proxy_pass_request_headers on;
                proxy_hide_header Access-Control-Allow-Origin;
                proxy_pass http://127.0.0.1:8090;
            }
            location / {
                try_files $uri $uri/ /index.html /index.html;
            }

            # Liveness probe
            location /healthz {
                access_log off;
                add_header Content-Type text/plain;
                return 200 'ok';
            }
        }
    }
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: hubble-ui
  namespace: kube-system
  labels:
    k8s-app: hubble-ui
    app.kubernetes.io/name: hubble-ui
    app.kubernetes.io/part-of: retina
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: hubble-ui
  template:
    metadata:
      labels:
        k8s-app: hubble-ui
        app.kubernetes.io/name: hubble-ui
        app.kubernetes.io/part-of: retina
    spec:
      serviceAccount: hubble-ui
      serviceAccountName: hubble-ui
      automountServiceAccountToken: true
      containers:
        - name: frontend
          image: mcr.microsoft.com/oss/cilium/hubble-ui:v0.12.2
          imagePullPolicy: Always
          ports:
            - name: http
              containerPort: 8081
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8081
          readinessProbe:
            httpGet:
              path: /
              port: 8081
          resources: {}
          volumeMounts:
            - name: hubble-ui-nginx-conf
              mountPath: /etc/nginx/conf.d/default.conf
              subPath: nginx.conf
            - name: tmp-dir
              mountPath: /tmp
          terminationMessagePolicy: FallbackToLogsOnError
          securityContext: {}
        - name: backend
          image: mcr.microsoft.com/oss/cilium/hubble-ui-backend:v0.12.2
          imagePullPolicy: Always
          env:
            - name: EVENTS_SERVER_PORT
              value: "8090"
            - name: FLOWS_API_ADDR
              value: "hubble-relay:443"
            - name: TLS_TO_RELAY_ENABLED
              value: "true"
            - name: TLS_RELAY_SERVER_NAME
              value: ui.hubble-relay.cilium.io
            - name: TLS_RELAY_CA_CERT_FILES
              value: /var/lib/hubble-ui/certs/hubble-relay-ca.crt
            - name: TLS_RELAY_CLIENT_CERT_FILE
              value: /var/lib/hubble-ui/certs/client.crt
            - name: TLS_RELAY_CLIENT_KEY_FILE
              value: /var/lib/hubble-ui/certs/client.key
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8090
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8090
          ports:
            - name: grpc
              containerPort: 8090
          resources: {}
          volumeMounts:
            - name: hubble-ui-client-certs
              mountPath: /var/lib/hubble-ui/certs
              readOnly: true
          terminationMessagePolicy: FallbackToLogsOnError
          securityContext: {}
      nodeSelector:
        kubernetes.io/os: linux
      volumes:
        - configMap:
            defaultMode: 420
            name: hubble-ui-nginx
          name: hubble-ui-nginx-conf
        - emptyDir: {}
          name: tmp-dir
        - name: hubble-ui-client-certs
          projected:
            defaultMode: 0400
            sources:
              - secret:
                  name: hubble-relay-client-certs
                  items:
                    - key: tls.crt
                      path: client.crt
                    - key: tls.key
                      path: client.key
                    - key: ca.crt
                      path: hubble-relay-ca.crt
---
kind: Service
apiVersion: v1
metadata:
  name: hubble-ui
  namespace: kube-system
  labels:
    k8s-app: hubble-ui
    app.kubernetes.io/name: hubble-ui
    app.kubernetes.io/part-of: retina
spec:
  type: ClusterIP
  selector:
    k8s-app: hubble-ui
  ports:
    - name: http
      port: 80
      targetPort: 8081
```
```shell
kubectl apply -f https://gist.githubusercontent.com/PixelRobots/db9234c2269dc1d4f9a1acb1f62adc50/raw/efcd19c8822e5345dae78ae3c797e1c7599aa5c5/hubble-ui.yaml
```
Now for a bit of port forward magic so we can view the Hubble UI:
```shell
kubectl -n kube-system port-forward svc/hubble-ui 12000:80
```
You can now access Hubble UI at http://localhost:12000/.
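If you prefer to script your checks, here is a small hedged sketch that probes the forwarded port; the curl flags and timeout are my own choices, nothing AKS-specific:

```shell
# Optional scripted check (assumes the port-forward above is still running).
UI_URL="http://localhost:12000/"
if curl -fsS --max-time 3 "$UI_URL" -o /dev/null 2>/dev/null; then
  echo "Hubble UI is reachable at $UI_URL"
else
  echo "Could not reach $UI_URL - is the port-forward still running?"
fi
```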
Conclusion
Advanced Network Observability in AKS is a game-changer for monitoring and diagnosing your network. With detailed metrics, flow logs, and flexible visualization options, it’s easier than ever to keep your applications running smoothly. Setting it up is straightforward, so why not enhance your AKS deployments today?