Introduction

Hello, fellow Kubernetes enthusiasts! Welcome back to Pixel Robots! Today, we’re diving into a powerful new feature that’s currently in preview for Azure Kubernetes Service (AKS): Advanced Network Observability. If you’ve ever wished for superpowers, or just for Cilium Hubble, to monitor and diagnose your container network, this is the feature for you. It’s part of the Advanced Container Networking Services suite and provides the insights you need to keep your containerized workloads running smoothly. Let’s explore what Advanced Network Observability is all about, why it’s a game-changer, and how you can start using it in your AKS clusters.

What is Advanced Network Observability?

Advanced Network Observability is like having a magnifying glass for your container network. It integrates effortlessly with both Cilium and non-Cilium data planes, giving you flexibility and control. Whether you’re tracking traffic volume, dropped packets, or DNS issues, this feature delivers detailed metrics and logs to help you troubleshoot and optimize your network performance.

Information

For Cilium data planes, Advanced Network Observability is available from Kubernetes version 1.29.
For non-Cilium data planes, it works on all Linux distributions, including Azure Linux, from version 2.0.

Image from: https://learn.microsoft.com/en-gb/azure/aks/advanced-network-observability-concepts?tabs=non-cilium

Let’s look at some key features and benefits of the new offering.

Key Features

Node-Level Metrics

Monitor the health of your container network at the node level with metrics stored in Prometheus format. Visualize these metrics in Grafana to gain insights into traffic volume, dropped packets, and more.

Hubble Metrics (DNS and Pod-Level Metrics)

Hubble metrics provide granular details on source and destination pods, traffic volume, and more. For non-Cilium data planes, you also get DNS metrics covering DNS errors and requests without responses.

Hubble Flow Logs

Flow logs are your go-to for deep visibility into network activity. They log all communications to and from pods, helping you troubleshoot connectivity issues effectively.

  • Hubble CLI: Fetches flow logs across the cluster with customizable filtering and formatting.
  • Hubble UI: A user-friendly browser interface to explore network activity, displaying service-connection graphs and flow logs.
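To give a feel for the CLI filtering mentioned above, here is a minimal sketch. It assumes the Hubble CLI is already installed and configured with the Relay certificates (both covered later in this post), that the Relay service is named hubble-relay in kube-system and listens on port 443, and it uses a placeholder namespace. The filters shown are standard hubble observe options.

```shell
# Placeholder namespace to inspect; swap in one of your own.
TARGET_NAMESPACE="default"

# Forward the Hubble Relay port to localhost in the background.
# Assumption: the service is called hubble-relay and exposes port 443.
kubectl port-forward -n kube-system svc/hubble-relay --address 127.0.0.1 4245:443 &
sleep 2

# Show the 20 most recent dropped flows for that namespace as JSON.
hubble observe --namespace "$TARGET_NAMESPACE" --verdict DROPPED --last 20 -o json
```

Swapping `--verdict DROPPED` for `--follow` streams flows live instead of returning a fixed window.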

Benefits

  • CNI-Agnostic: Works with all Azure CNI variants, including kubenet.
  • Uniform Experience: Consistent performance across Cilium and non-Cilium data planes.
  • eBPF-Based Observability: Utilizes eBPF for efficient and scalable network monitoring.
  • Detailed Network Visibility: Comprehensive network flow logs help you understand application communication.
  • Flexible Metrics Storage: Use Azure Managed Prometheus and Grafana or bring your own instances.

Metrics

Node-Level Metrics

Advanced Network Observability aggregates several node-level metrics to help you keep tabs on your cluster’s health. These metrics are labeled by cluster and instance (Node name).

Non-Cilium

For non-Cilium data planes, you get metrics for both Linux and Windows. Here’s a rundown of what you’ll be monitoring:

| Metric Name | Description | Extra Labels |
|---|---|---|
| networkobservability_forward_count | Total forwarded packet count | direction |
| networkobservability_forward_bytes | Total forwarded byte count | direction |
| networkobservability_drop_count | Total dropped packet count | direction, reason |
| networkobservability_drop_bytes | Total dropped byte count | direction, reason |
| networkobservability_tcp_state | TCP currently active socket count by TCP state | state |
| networkobservability_tcp_connection_remote | TCP currently active socket count by remote IP/port | address (IP), port |
| networkobservability_tcp_connection_stats | TCP connection statistics (e.g., Delayed ACKs, TCPKeepAlive, TCPSackFailures) | statistic |
| networkobservability_tcp_flag_counters | TCP packets count by flag | flag |
| networkobservability_ip_connection_stats | IP connection statistics | statistic |
| networkobservability_udp_connection_stats | UDP connection statistics | statistic |
| networkobservability_udp_active_sockets | UDP currently active socket count | |
| networkobservability_interface_stats | Interface statistics | InterfaceName, statistic |

Cilium

For clusters using the Cilium data plane, Advanced Network Observability supports only Linux (sorry, Windows folks). Here are the key metrics:

| Metric Name | Description | Extra Labels |
|---|---|---|
| cilium_forward_count_total | Total forwarded packet count | direction |
| cilium_forward_bytes_total | Total forwarded byte count | direction |
| cilium_drop_count_total | Total dropped packet count | direction, reason |
| cilium_drop_bytes_total | Total dropped byte count | direction, reason |

Pod-Level Metrics (Hubble Metrics)

Pod-level metrics provide a deep dive into traffic data for individual pods, labeled by cluster, instance (Node name), and either source or destination. Here’s what you’ll be tracking:

| Metric Name | Description | Extra Labels |
|---|---|---|
| hubble_dns_queries_total | Total DNS requests by query | source or destination, query, qtypes (query type) |
| hubble_dns_responses_total | Total DNS responses by query/response | source or destination, query, qtypes (query type), rcode (return code), ips_returned (number of IPs) |
| hubble_drop_total | Total dropped packet count | source or destination, protocol, reason |
| hubble_tcp_flags_total | Total TCP packets count by flag | source or destination, flag |
| hubble_flows_processed_total | Total network flows processed (L4/L7 traffic) | source or destination, protocol, verdict, type, subtype |

Limitations

  • Pod-level metrics are available only on Linux.
  • Cilium data plane support starts with Kubernetes version 1.29.
  • Metric labels might have slight differences between Cilium and non-Cilium clusters.
  • Cilium data plane does not currently support DNS metrics.

Getting Started with Advanced Network Observability

Prerequisites

  • Azure CLI version 2.56.0 or newer.

Install the AKS-Preview CLI Extension

First things first, let’s get the necessary CLI extension installed and updated.

To install:
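A minimal sketch of the install step, assuming the extension goes by its usual name, aks-preview:

```shell
# Name of the Azure CLI extension that carries AKS preview features.
EXTENSION_NAME="aks-preview"

# Install the extension.
az extension add --name "$EXTENSION_NAME"
```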

To update (making sure you’re on the latest version):
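If the extension is already installed, the update looks much the same. Again, this assumes the extension name aks-preview:

```shell
# Name of the Azure CLI extension that carries AKS preview features.
EXTENSION_NAME="aks-preview"

# Pull the latest published version of the extension.
az extension update --name "$EXTENSION_NAME"
```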

Register the Feature Flag

Next up, you’ll need to register the AdvancedNetworkingPreview feature flag. It can take a few minutes for the status to show as ‘Registered’.

To register:
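A sketch of the registration, assuming the flag for this preview is AdvancedNetworkingPreview under the Microsoft.ContainerService namespace. Double-check the flag name against the current Microsoft Learn page before running it.

```shell
# Assumption: the preview feature flag is named AdvancedNetworkingPreview.
FEATURE_NAME="AdvancedNetworkingPreview"

# Register the feature flag on the current subscription.
az feature register --namespace "Microsoft.ContainerService" --name "$FEATURE_NAME"
```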

Check the registration status:
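Something along these lines shows the current state. The flag name is the same assumption as in the registration step, and the `--query` filter simply trims the output down to the state field.

```shell
# Assumption: the preview feature flag is named AdvancedNetworkingPreview.
FEATURE_NAME="AdvancedNetworkingPreview"

# Print only the registration state (for example Registering or Registered).
az feature show \
  --namespace "Microsoft.ContainerService" \
  --name "$FEATURE_NAME" \
  --query "properties.state" \
  --output tsv
```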

Once registered, refresh your resource provider registration with:
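The refresh is a single command against the Microsoft.ContainerService provider:

```shell
# Resource provider that owns AKS.
PROVIDER_NAMESPACE="Microsoft.ContainerService"

# Re-register the provider so the new feature flag is picked up.
az provider register --namespace "$PROVIDER_NAMESPACE"
```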

Create a Resource Group
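A minimal sketch of this step. The resource group name and region below are placeholders I have picked for illustration, so swap in your own.

```shell
# Placeholder values; export your own before running to override them.
RESOURCE_GROUP="${RESOURCE_GROUP:-pixel-aks-rg}"
LOCATION="${LOCATION:-uksouth}"

# Create the resource group that will hold the cluster and monitoring resources.
az group create --name "$RESOURCE_GROUP" --location "$LOCATION"
```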

Create an AKS Cluster with Advanced Network Observability

For non-Cilium data planes:
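A sketch of the create command for a non-Cilium cluster on Azure CNI Overlay. The names are placeholders, the pod CIDR is just an example range, and I am assuming `--enable-advanced-network-observability` is the preview flag exposed by the aks-preview extension. Check `az aks create --help` if it has since been renamed.

```shell
# Placeholder names; export your own before running to override them.
RESOURCE_GROUP="${RESOURCE_GROUP:-pixel-aks-rg}"
CLUSTER_NAME="${CLUSTER_NAME:-pixel-aks}"

# Create an Azure CNI Overlay cluster with Advanced Network Observability.
# Assumption: --enable-advanced-network-observability is the preview flag name.
az aks create \
  --name "$CLUSTER_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --generate-ssh-keys \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --pod-cidr 192.168.0.0/16 \
  --enable-advanced-network-observability
```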

For Cilium data planes (Kubernetes version 1.29+):
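For Cilium, the sketch is the same with the data plane and Kubernetes version pinned. As before, the names are placeholders and `--enable-advanced-network-observability` is assumed to be the preview flag in the aks-preview extension.

```shell
# Placeholder names; export your own before running to override them.
RESOURCE_GROUP="${RESOURCE_GROUP:-pixel-aks-rg}"
CLUSTER_NAME="${CLUSTER_NAME:-pixel-aks}"

# Create an Azure CNI Overlay cluster powered by Cilium on Kubernetes 1.29.
# Assumption: --enable-advanced-network-observability is the preview flag name.
az aks create \
  --name "$CLUSTER_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --generate-ssh-keys \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --network-dataplane cilium \
  --pod-cidr 192.168.0.0/16 \
  --kubernetes-version 1.29 \
  --enable-advanced-network-observability
```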

Enable Advanced Network Observability on an Existing Cluster
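If you already have a cluster, an update along these lines should switch the feature on. The names are placeholders and the flag is the same assumed preview flag from the aks-preview extension.

```shell
# Placeholder names; export your own before running to override them.
RESOURCE_GROUP="${RESOURCE_GROUP:-pixel-aks-rg}"
CLUSTER_NAME="${CLUSTER_NAME:-pixel-aks}"

# Enable Advanced Network Observability on an existing cluster.
# Assumption: --enable-advanced-network-observability is the preview flag name.
az aks update \
  --resource-group "$RESOURCE_GROUP" \
  --name "$CLUSTER_NAME" \
  --enable-advanced-network-observability
```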

Get Cluster Credentials
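With the cluster in place, grab its kubeconfig. The names below are placeholders.

```shell
# Placeholder names; export your own before running to override them.
RESOURCE_GROUP="${RESOURCE_GROUP:-pixel-aks-rg}"
CLUSTER_NAME="${CLUSTER_NAME:-pixel-aks}"

# Merge the cluster's credentials into your local kubeconfig.
az aks get-credentials --name "$CLUSTER_NAME" --resource-group "$RESOURCE_GROUP"
```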

Set Up Azure Managed Prometheus and Grafana

Create Azure Monitor Resource
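A sketch of creating the Azure Monitor workspace with the generic `az resource create` command. The workspace name, resource group, and region are placeholders.

```shell
# Placeholder values; export your own before running to override them.
RESOURCE_GROUP="${RESOURCE_GROUP:-pixel-aks-rg}"
LOCATION="${LOCATION:-uksouth}"
AZURE_MONITOR_NAME="${AZURE_MONITOR_NAME:-pixel-aks-monitor}"

# Create an Azure Monitor workspace (resource type Microsoft.Monitor/accounts).
az resource create \
  --resource-group "$RESOURCE_GROUP" \
  --namespace microsoft.monitor \
  --resource-type accounts \
  --name "$AZURE_MONITOR_NAME" \
  --location "$LOCATION" \
  --properties '{}'
```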

Create Grafana Instance
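A minimal sketch for the Grafana instance. The names are placeholders, and `az grafana` comes from the amg CLI extension, which the Azure CLI normally offers to install on first use.

```shell
# Placeholder values; export your own before running to override them.
RESOURCE_GROUP="${RESOURCE_GROUP:-pixel-aks-rg}"
GRAFANA_NAME="${GRAFANA_NAME:-pixel-grafana}"

# Create an Azure Managed Grafana instance.
az grafana create --name "$GRAFANA_NAME" --resource-group "$RESOURCE_GROUP"
```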

Link Resources to AKS Cluster
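Linking comes down to looking up the two resource IDs and handing them to `az aks update`. This is a sketch using placeholder names, and it assumes both resources live in the same resource group as the cluster.

```shell
# Placeholder values; export your own before running to override them.
RESOURCE_GROUP="${RESOURCE_GROUP:-pixel-aks-rg}"
CLUSTER_NAME="${CLUSTER_NAME:-pixel-aks}"
AZURE_MONITOR_NAME="${AZURE_MONITOR_NAME:-pixel-aks-monitor}"
GRAFANA_NAME="${GRAFANA_NAME:-pixel-grafana}"

# Look up the resource ID of the Grafana instance.
GRAFANA_ID=$(az grafana show \
  --name "$GRAFANA_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --query id --output tsv)

# Look up the resource ID of the Azure Monitor workspace.
AZURE_MONITOR_ID=$(az resource show \
  --resource-group "$RESOURCE_GROUP" \
  --name "$AZURE_MONITOR_NAME" \
  --resource-type "Microsoft.Monitor/accounts" \
  --query id --output tsv)

# Enable managed Prometheus metrics on the cluster and link both resources.
az aks update \
  --name "$CLUSTER_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --enable-azure-monitor-metrics \
  --azure-monitor-workspace-resource-id "$AZURE_MONITOR_ID" \
  --grafana-resource-id "$GRAFANA_ID"
```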

Visualization Using Grafana

Information

The hubble_flows_processed_total metric isn’t scraped by default due to high metric cardinality in large-scale clusters. Because of this, the Pods Flows dashboards have panels with missing data. To change this, you can modify the ama metrics settings to include hubble_flows_processed_total in the metric keep list. To learn how to do this, see the Minimal Ingestion Documentation.

Make sure the Azure Monitor pods are running using the kubectl get pods command.
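One way to run that check, assuming the Azure Monitor agent pods live in kube-system and carry the usual ama- name prefix:

```shell
# Namespace where the Azure Monitor agent pods are deployed.
MONITOR_NAMESPACE="kube-system"

# List the pods and keep only the Azure Monitor (ama-) ones.
kubectl get pods -o wide -n "$MONITOR_NAMESPACE" | grep ama-
```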

You should see the ama- pods listed with a status of Running.

Microsoft has created sample dashboards. They can be found under the Dashboards > Azure Managed Prometheus folder and have names like “Kubernetes / Networking / <name>”. The suite of dashboards includes:

  • Clusters: Shows Node-level metrics for your clusters.
  • DNS (Cluster): Shows DNS metrics on a cluster or selection of Nodes.
  • DNS (Workload): Shows DNS metrics for the specified workload (e.g. Pods of a DaemonSet or Deployment such as CoreDNS).
  • Drops (Workload): Shows drops to/from the specified workload (e.g. Pods of a Deployment or DaemonSet).
  • Pod Flows (Namespace): Shows L4/L7 packet flows to/from the specified namespace (i.e. Pods in the Namespace).
  • Pod Flows (Workload): Shows L4/L7 packet flows to/from the specified workload (e.g. Pods of a Deployment or DaemonSet).

Information

The Cilium data plane does not currently support DNS metrics/dashboards.

Visualize Using Hubble UI

First off, we need to install the Hubble CLI:
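Here is a sketch of the install for Linux, following the download pattern from the upstream Hubble releases on GitHub. The pinned version is an assumption on my part, so pick whichever release suits your cluster. The steps are wrapped in a function (a hypothetical helper name) so nothing is downloaded until you call it.

```shell
# Assumption: v0.11.0 is a suitable Hubble CLI release; change as needed.
HUBBLE_VERSION="v0.11.0"
HUBBLE_ARCH="amd64"
if [ "$(uname -m)" = "aarch64" ]; then HUBBLE_ARCH="arm64"; fi

HUBBLE_TARBALL="hubble-linux-${HUBBLE_ARCH}.tar.gz"
HUBBLE_URL="https://github.com/cilium/hubble/releases/download/${HUBBLE_VERSION}/${HUBBLE_TARBALL}"

# Download the tarball and its checksum, verify it, unpack the binary
# into /usr/local/bin, then clean up the downloaded files.
install_hubble_cli() {
  curl -L --fail --remote-name-all "$HUBBLE_URL" "${HUBBLE_URL}.sha256sum" &&
  sha256sum --check "${HUBBLE_TARBALL}.sha256sum" &&
  sudo tar xzvfC "$HUBBLE_TARBALL" /usr/local/bin &&
  rm "$HUBBLE_TARBALL" "${HUBBLE_TARBALL}.sha256sum"
}
```

Run `install_hubble_cli` to perform the install, then `hubble version` to confirm the binary is on your path.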

Now let’s check that the Hubble pods are running using the kubectl get pods command.
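A sketch of that check, assuming the Relay pods sit in kube-system and carry the label k8s-app=hubble-relay:

```shell
# Assumption: Hubble Relay pods are labelled k8s-app=hubble-relay.
HUBBLE_LABEL="k8s-app=hubble-relay"

# List the Hubble Relay pods along with the nodes they run on.
kubectl get pods -o wide -n kube-system -l "$HUBBLE_LABEL"
```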

Mutual TLS (mTLS) secures the Hubble Relay server. To allow the Hubble client to retrieve flows, you must obtain the appropriate certificates and configure the client with them. Apply the certificates using the following commands:
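The idea can be sketched as follows: pull each certificate out of the client secret, write it to disk, and point the Hubble CLI at it. This assumes the secret is named hubble-relay-client-certs in kube-system and that the Relay's TLS server name is instance.hubble-relay.cilium.io. The .certs directory is just a location I have picked.

```shell
# Directory where the certificates will be written (a choice, not a requirement).
CERT_DIR="$(pwd)/.certs"
mkdir -p "$CERT_DIR"

# Each pair is "<key in the secret>:<hubble config option>".
for PAIR in "tls.crt:tls-client-cert-file" "tls.key:tls-client-key-file" "ca.crt:tls-ca-cert-files"; do
  FILE="${PAIR%%:*}"
  KEY="${PAIR##*:}"

  # Escape the dot so jsonpath treats e.g. tls.crt as a single key.
  ESCAPED="$(printf '%s' "$FILE" | sed 's/\./\\./g')"

  # Assumption: the client certs live in the hubble-relay-client-certs secret.
  kubectl get secret hubble-relay-client-certs -n kube-system \
    -o jsonpath="{.data.${ESCAPED}}" | base64 -d > "$CERT_DIR/$FILE"

  # Tell the Hubble CLI where to find this file.
  hubble config set "$KEY" "$CERT_DIR/$FILE"
done

# Turn on TLS and set the server name expected by the Relay certificate.
hubble config set tls true
hubble config set tls-server-name instance.hubble-relay.cilium.io
```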

Press Enter after the script to restart the bash window.

Now let’s verify the secrets were generated using the following kubectl get secrets command:
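A quick way to do that, assuming the Hubble secrets live in kube-system and have hubble- in their names:

```shell
# Namespace where the Hubble secrets are expected to live.
HUBBLE_NAMESPACE="kube-system"

# List secrets and keep only the Hubble ones.
kubectl get secrets -n "$HUBBLE_NAMESPACE" | grep hubble-
```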

Awesome, it’s time to set up the Hubble UI.

Set Up Hubble UI

Save the following YAML to a file named hubble-ui.yaml and apply it, or apply the copy I have in GitHub:

Hubble Manifest (hidden by default because it is long)
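Once the manifest is saved locally, applying it could look like this. The file name matches the one mentioned above and is assumed to be in the current directory.

```shell
# Path to the Hubble UI manifest saved in the previous step.
HUBBLE_UI_MANIFEST="hubble-ui.yaml"

# Create the Hubble UI resources on the cluster.
kubectl apply -f "$HUBBLE_UI_MANIFEST"
```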

Now for a bit of port forward magic so we can view the Hubble UI:
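A sketch of the port forward, assuming the manifest created a service called hubble-ui in kube-system that listens on port 80. The local port matches the URL used in this post.

```shell
# Local port to expose the UI on.
LOCAL_PORT=12000

# Forward the local port to the hubble-ui service.
# Assumption: the service is named hubble-ui and serves on port 80.
kubectl -n kube-system port-forward svc/hubble-ui "${LOCAL_PORT}:80"
```

The command stays in the foreground until you press Ctrl+C, so keep that terminal open while you browse.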

You can now access Hubble UI at http://localhost:12000/.

Conclusion

Advanced Network Observability in AKS is a game-changer for monitoring and diagnosing your network. With detailed metrics, flow logs, and flexible visualization options, it’s easier than ever to keep your applications running smoothly. Setting it up is straightforward, so why not enhance your AKS deployments today?

Pixel Robots.

I’m Richard Hooper aka Pixel Robots. I started this blog in 2016 for a couple reasons. The first reason was basically just a place for me to store my step by step guides, troubleshooting guides and just plain ideas about being a sysadmin. The second reason was to share what I have learned and found out with other people like me. Hopefully, you can find something useful on the site.
