Recently, I was tasked with implementing distributed tracing for a microservices platform running on Azure Kubernetes Service (AKS). The requirements were clear: use Grafana Tempo for trace storage, Azure Blob Storage as the backend for cost-effectiveness, and ensure all connectivity remained private for security compliance. What started as a straightforward Helm deployment turned into a deep dive into Azure Private Link Service, Managed Private Endpoints, and workload identity federation.
The Challenge: Private Connectivity for Observability
Most Tempo tutorials show you how to deploy it with local storage or public endpoints. But in enterprise environments, you often need:
- Private connectivity between Grafana and Tempo (no internet traffic)
- Azure Blob Storage backend for scalable, cost-effective trace storage
- Workload Identity for secure authentication without storing secrets
The tricky part? Making Azure Managed Grafana communicate privately with Tempo running inside AKS, while Tempo itself authenticates to Azure Storage using managed identities.
Architecture Overview
Here’s what we’re building:
```text
Internet ←→ Azure Managed Grafana (Public Access)
        ↓ (Azure Private Link Service)
Managed Private Endpoint
        ↓ (Private IP Communication)
Private Link Service (Auto-created by AKS)
        ↓ (Internal Load Balancer)
Tempo Pods (AKS)
        ↓ (Managed Identity Authentication)
Azure Blob Storage (traces container)
```
The key insight: Azure Managed Grafana can create Managed Private Endpoints to connect to Private Link Services, and AKS can automatically create Private Link Services for internal services.
Setting Up Azure Infrastructure
Storage Account and Managed Identity
First, we need a storage account and managed identity for Tempo:
```bash
# Create resource group
az group create --name rg-observability-dev --location westeurope

# Create storage account for traces
az storage account create \
  --name saobservabilitydev \
  --resource-group rg-observability-dev \
  --location westeurope \
  --sku Standard_RAGZRS \
  --kind StorageV2 \
  --access-tier Hot \
  --https-only true \
  --min-tls-version TLS1_2 \
  --allow-blob-public-access false

# Create container for tempo traces
az storage container create \
  --name tempo-traces \
  --account-name saobservabilitydev \
  --auth-mode login

# Create managed identity
az identity create --name tempo-identity --resource-group rg-observability-dev
```
Workload Identity Federation
The managed identity needs access to the storage account and federation with our Kubernetes service account:
```bash
# Get subscription and AKS OIDC issuer
SUBSCRIPTION_ID=$(az account show --query id -o tsv)
OIDC_ISSUER=$(az aks show --name pixelrobots-test --resource-group rg-aks-pixelrobots --query "oidcIssuerProfile.issuerUrl" -o tsv)

# Assign storage permissions
az role assignment create \
  --assignee $(az identity show --name tempo-identity --resource-group rg-observability-dev --query principalId -o tsv) \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/rg-observability-dev/providers/Microsoft.Storage/storageAccounts/saobservabilitydev"

# Create federated identity credential
az identity federated-credential create \
  --name tempo-federated-credential \
  --identity-name tempo-identity \
  --resource-group rg-observability-dev \
  --issuer $OIDC_ISSUER \
  --subject system:serviceaccount:tempo:tempo \
  --audience api://AzureADTokenExchange
```
Tempo Configuration with Private Link Service
Here’s where it gets interesting. We need to configure Tempo’s Kubernetes service to automatically create a Private Link Service:
```yaml
# tempo-values.yaml
serviceAccount:
  create: true
  name: tempo
  # Client ID will be set dynamically via --set

podLabels:
  azure.workload.identity/use: "true"

# Service configuration for Private Link Service
service:
  type: LoadBalancer
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
    service.beta.kubernetes.io/azure-load-balancer-internal-subnet: "clusteringservices" # Change to your subnet name
    service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path: "/ready"
    # Enable Private Link Service
    service.beta.kubernetes.io/azure-pls-create: "true"
    service.beta.kubernetes.io/azure-pls-name: "tempo-private-link-service"
    service.beta.kubernetes.io/azure-pls-ip-configuration-subnet: "clusteringservices" # Change to your subnet name
    service.beta.kubernetes.io/azure-pls-proxy-protocol: "false"
    service.beta.kubernetes.io/azure-pls-visibility: "*"

tempo:
  storage:
    trace:
      backend: azure
      azure:
        # These storage settings will be set dynamically via --set parameters
        use_federated_token: true

  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 512Mi
```
The key annotations here are the `azure-pls-*` ones. These tell AKS to automatically create a Private Link Service when the LoadBalancer service is provisioned.
Dynamic Helm Deployment
Instead of hardcoding values, I used Helm’s `--set` parameters to inject environment-specific configuration:
```bash
# Get the managed identity client ID
CLIENT_ID=$(az identity show --name tempo-identity --resource-group rg-observability-dev --query clientId -o tsv)

# Add Grafana Helm repository
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Install Tempo with dynamic values
helm upgrade --install tempo grafana/tempo \
  --namespace tempo \
  --create-namespace \
  -f tempo-values.yaml \
  --set "serviceAccount.annotations.azure\.workload\.identity/client-id=$CLIENT_ID" \
  --set "tempo.storage.trace.azure.storage_account_name=saobservabilitydev" \
  --set "tempo.storage.trace.azure.container_name=tempo-traces"
```
This approach keeps the values file clean and environment-agnostic while injecting the right managed identity and storage configuration at deployment time.
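Before moving on to Grafana, a quick sanity check confirms both halves landed. A minimal sketch, assuming the chart’s standard `app.kubernetes.io/name=tempo` label and the release name `tempo` used above:

```bash
# The internal load balancer should have assigned a private IP to the service
kubectl -n tempo get svc tempo -o jsonpath='{.status.loadBalancer.ingress[0].ip}'

# The workload identity webhook should have injected AZURE_CLIENT_ID and
# AZURE_FEDERATED_TOKEN_FILE into the Tempo container's environment
kubectl -n tempo get pod -l app.kubernetes.io/name=tempo \
  -o jsonpath='{.items[0].spec.containers[0].env}'
```

If the first command returns nothing, the internal load balancer (and therefore the PLS) isn’t ready yet.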
Connecting Grafana via Managed Private Endpoint
Once Tempo is running, we need to connect Azure Managed Grafana to it privately. First, we discover the Private Link Service that AKS created:
```bash
# Get the AKS node resource group
NODE_RG=$(az aks show -g rg-aks-pixelrobots -n pixelrobots-test --query nodeResourceGroup -o tsv)

# Find the Private Link Service
az network private-link-service list -g $NODE_RG --query "[?contains(name, 'tempo')]" -o table
```
When I first tried this, I was surprised to find that the PLS wasn’t immediately available. It takes a few minutes for AKS to provision the Private Link Service after the LoadBalancer service is created.
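If you’re scripting the rollout, a small wait loop beats guessing. A rough sketch that polls until the PLS reports a successful provisioning state (names match the annotations in the values file above):

```bash
# Poll until AKS finishes provisioning the Private Link Service
until [ "$(az network private-link-service show \
    -g "$NODE_RG" -n tempo-private-link-service \
    --query provisioningState -o tsv 2>/dev/null)" = "Succeeded" ]; do
  echo "Waiting for Private Link Service..."
  sleep 30
done
```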
Creating the Managed Private Endpoint
With the PLS resource ID, we can create a Managed Private Endpoint in Grafana:
```bash
# Get the PLS resource ID (replace with your actual PLS ID)
PLS_ID="/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$NODE_RG/providers/Microsoft.Network/privateLinkServices/tempo-private-link-service"

# Create Managed Private Endpoint in Grafana
az grafana mpe create \
  -g rg-monitoring-dev \
  --workspace-name grafana-pixelrobots-dev \
  -n mpe-grafana-tempo \
  --private-link-resource-id $PLS_ID \
  --private-link-resource-region westeurope \
  --group-ids tempo
```
Approving the Connection
The MPE creation triggers a pending connection on the Private Link Service that needs approval:
```bash
# List pending connections
az network private-endpoint-connection list \
  --resource-group $NODE_RG \
  --resource-name tempo-private-link-service \
  --type Microsoft.Network/privateLinkServices

# Approve the connection (replace with actual connection name)
az network private-endpoint-connection approve \
  --resource-group $NODE_RG \
  --resource-name tempo-private-link-service \
  --type Microsoft.Network/privateLinkServices \
  --name <connection-name> \
  --description "Approved for Grafana connectivity"
```
Creating the Data Source
After approval, refresh the MPE state and get the private IP:
```bash
# Refresh MPE state in Grafana
az grafana mpe refresh -g rg-monitoring-dev --workspace-name grafana-pixelrobots-dev

# Get the private IP
PRIVATE_IP=$(az grafana mpe show -g rg-monitoring-dev --workspace-name grafana-pixelrobots-dev -n mpe-grafana-tempo --query privateLinkServicePrivateIP -o tsv)

# Create Tempo data source
az grafana data-source create \
  -g rg-monitoring-dev \
  -n grafana-pixelrobots-dev \
  --definition '{
    "name": "tempo-private-ip",
    "type": "tempo",
    "url": "http://'$PRIVATE_IP':3200",
    "access": "proxy"
  }'
```
Lessons Learned and Gotchas
1. Private Link Service Takes Time
Don’t expect the PLS to be available immediately after deploying the Tempo service. I found it typically takes 2-5 minutes for AKS to provision it.
2. Service Health Probes Matter
The `azure-load-balancer-health-probe-request-path` annotation is crucial. Without it, the load balancer health checks fail and the PLS doesn’t work properly.
3. Workload Identity Setup Order
Create the federated identity credential before deploying Tempo. The federation is set up using the expected service account path (`system:serviceaccount:tempo:tempo`) – the actual Kubernetes service account gets created later during Helm deployment.
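A quick way to catch ordering or naming mistakes is to list the federated credentials and compare the subject against the namespace and service account name the chart will create:

```bash
# The subject must read system:serviceaccount:<namespace>:<serviceaccount>
az identity federated-credential list \
  --identity-name tempo-identity \
  --resource-group rg-observability-dev \
  --query "[].{name:name, subject:subject, issuer:issuer}" -o table
```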
4. MPE Refresh is Required
After approving the private endpoint connection, you must run `az grafana mpe refresh` for Grafana to pick up the approved state and assign the private IP.
5. Values File Cleanliness
Using `--set` parameters instead of environment-specific values files makes the solution much more maintainable. The values file becomes a template, and the dynamic parts are injected at deployment time.
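Taken to its conclusion, the deploy step collapses into a small wrapper. A hypothetical `deploy-tempo.sh` sketch, assuming per-environment resource groups and storage accounts follow the naming convention used above:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical convention: rg-observability-<env> / saobservability<env>
ENV="${1:-dev}"
CLIENT_ID=$(az identity show --name tempo-identity \
  --resource-group "rg-observability-$ENV" --query clientId -o tsv)

helm upgrade --install tempo grafana/tempo \
  --namespace tempo --create-namespace \
  -f tempo-values.yaml \
  --set "serviceAccount.annotations.azure\.workload\.identity/client-id=$CLIENT_ID" \
  --set "tempo.storage.trace.azure.storage_account_name=saobservability$ENV" \
  --set "tempo.storage.trace.azure.container_name=tempo-traces"
```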
OpenTelemetry Collector Integration
To complete the setup, configure your OpenTelemetry Collector to send traces to Tempo:
```yaml
# otel-values.yaml
config:
  exporters:
    otlp/traces:
      endpoint: "tempo.tempo.svc.cluster.local:4317"
  service:
    pipelines:
      traces:
        exporters:
          - otlp/traces
        processors:
          - memory_limiter
          - batch
        receivers:
          - otlp
          - jaeger
          - zipkin

service:
  enabled: true
  internalTrafficPolicy: "Cluster"
```
One gotcha I encountered: make sure to set `internalTrafficPolicy: "Cluster"` in your OpenTelemetry Collector service configuration. Without this, newer versions of the OpenTelemetry Helm chart fail with a template error.
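To prove the pipeline end to end, you can fire a few synthetic traces at the collector and then search for them in Grafana through the private data source. A sketch using the telemetrygen tool from opentelemetry-collector-contrib (the collector service name below is an assumption; adjust it to match your Helm release and namespace):

```bash
# Emit 10 test traces to the collector's OTLP gRPC endpoint
kubectl run telemetrygen --rm -it --restart=Never \
  --image=ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:latest \
  -- traces --otlp-insecure \
  --otlp-endpoint opentelemetry-collector.default.svc.cluster.local:4317 \
  --traces 10
```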
Wrapping Up
This setup gives you a production-ready distributed tracing solution with:
- Secure private connectivity between Grafana and Tempo
- Scalable Azure Blob Storage backend for trace data
- No secrets in Kubernetes thanks to workload identity
- Environment-agnostic configuration via parameterized deployments
The combination of Azure Private Link Service, Managed Private Endpoints, and workload identity might seem complex initially, but it provides the security and scalability needed for enterprise observability platforms.
Next steps you might consider: setting up trace sampling policies, configuring retention policies for cost optimization, and adding alerting based on trace error rates. The Azure CLI scripts can easily be parameterized and automated as part of your infrastructure as code pipeline.
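As a taste of the first two, a rough sketch: to my understanding, retention in the monolithic grafana/tempo chart is a single value that maps to the compactor’s block retention, and tail-based sampling can be layered into the collector config (this assumes the contrib collector image, which ships the tail_sampling processor):

```yaml
# tempo-values.yaml: keep trace blocks for 14 days
tempo:
  retention: 336h

# otel-values.yaml: keep every error trace, sample 10% of the rest
config:
  processors:
    tail_sampling:
      decision_wait: 10s
      policies:
        - name: keep-errors
          type: status_code
          status_code:
            status_codes: [ERROR]
        - name: sample-the-rest
          type: probabilistic
          probabilistic:
            sampling_percentage: 10
```

Remember to add `tail_sampling` to the traces pipeline’s processor list as well.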
The full automation of this setup, from storage account creation to Grafana data source configuration, takes what used to be a multi-hour manual process and reduces it to a few minutes of script execution. That’s the kind of developer experience improvement that makes platform engineering worthwhile.