Azure Monitor with OpenTelemetry Part 5: Distributed Tracing Across Microservices

Distributed tracing solves one of the most challenging problems in microservices architectures: understanding how requests flow through multiple services. When a single user action triggers calls across dozens of services, databases, and message queues, pinpointing performance bottlenecks or failures becomes nearly impossible without proper tracing infrastructure. OpenTelemetry’s distributed tracing capabilities with Azure Monitor provide end-to-end visibility into these complex request flows through automatic context propagation and correlation.

This guide explores how OpenTelemetry enables distributed tracing across microservices, demonstrating W3C TraceContext propagation, handling asynchronous messaging scenarios, implementing cross-service correlation, and troubleshooting common distributed tracing challenges in production environments.

Understanding Distributed Tracing

In a monolithic application, tracing a request’s execution path involves following a single thread through a codebase. Distributed systems fragment this simplicity across network boundaries, where a request might touch authentication services, product catalogs, inventory systems, payment processors, and notification services before completing. Each service operates independently, potentially in different languages, on different infrastructure, with different logging systems.

Distributed tracing reconstructs the complete request journey by linking individual operations (spans) into a coherent trace. Each span represents work done in a specific service, with timing information, metadata, and relationships to parent and child spans.
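To make those span relationships concrete, here is a minimal, stdlib-only Python sketch that models spans as plain records and rebuilds the trace tree from parent IDs. The `Span` dataclass, its fields, and the shortened span IDs are illustrative assumptions, not the OpenTelemetry data model; real spans also carry timestamps, attributes, and status.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    # Hypothetical minimal span record for illustration only.
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    name: str


def build_tree(spans):
    """Split spans into roots and a parent_id -> children mapping."""
    roots, children = [], {}
    for span in spans:
        if span.parent_id is None:
            roots.append(span)
        else:
            children.setdefault(span.parent_id, []).append(span)
    return roots, children


def render(spans):
    """Return an indented outline of the trace, one line per span."""
    roots, children = build_tree(spans)
    lines = []

    def walk(span, depth):
        lines.append("  " * depth + span.name)
        for child in children.get(span.span_id, []):
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


trace_id = "0af7651916cd43dd8448eb211c80319c"
spans = [
    Span(trace_id, "b1", None, "API Gateway"),   # IDs shortened for readability
    Span(trace_id, "c1", "b1", "Auth Check"),
    Span(trace_id, "d1", "c1", "DB Query"),
    Span(trace_id, "h1", "b1", "Create Order"),
    Span(trace_id, "k1", "h1", "Process Payment"),
]

for line in render(spans):
    print(line)
```

Every span shares one trace ID; only the parent pointers encode the hierarchy, which is why a single lost parent ID is enough to split a trace into disconnected pieces.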

graph TB
    subgraph User Request
        A[Client Request]
    end
    
    subgraph API Gateway
        B[API Gateway Span]
    end
    
    subgraph Auth Service
        C[Auth Check Span]
        D[DB Query Span]
    end
    
    subgraph Product Service
        E[Get Product Span]
        F[Cache Lookup Span]
        G[DB Query Span]
    end
    
    subgraph Order Service
        H[Create Order Span]
        I[Validate Stock Span]
        J[Reserve Inventory Span]
    end
    
    subgraph Payment Service
        K[Process Payment Span]
        L[External API Span]
    end
    
    A -->|Trace ID: abc123| B
    B -->|Parent: B| C
    C -->|Parent: C| D
    B -->|Parent: B| E
    E -->|Parent: E| F
    E -->|Parent: E| G
    B -->|Parent: B| H
    H -->|Parent: H| I
    H -->|Parent: H| J
    H -->|Parent: H| K
    K -->|Parent: K| L
    
    style A fill:#68217a
    style B fill:#0078d4
    style C fill:#2185d0
    style E fill:#2185d0
    style H fill:#2185d0
    style K fill:#2185d0

W3C TraceContext Standard

The W3C TraceContext specification standardizes how trace information propagates across service boundaries. OpenTelemetry implements this specification by default, ensuring interoperability across different platforms, languages, and observability tools.

TraceContext Headers

W3C TraceContext uses two HTTP headers for propagation:

traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
tracestate: vendor1=value1,vendor2=value2

The traceparent header contains four fields separated by dashes:

00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
│  │                                │                └─ Trace flags (01 = sampled)
│  │                                └────────────────── Parent ID, the caller's span ID (16 hex)
│  └─────────────────────────────────────────────────── Trace ID (32 hex)
└────────────────────────────────────────────────────── Version (00)
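A small parser makes the field layout explicit. This is a strict, version-00-only sketch written for this article (the name `parse_traceparent` is hypothetical); it rejects the all-zero IDs and the `ff` version that the W3C specification forbids, but deliberately ignores the spec's forward-compatibility rules for future versions.

```python
import re
from typing import NamedTuple, Optional

# Version 00 layout: 2-hex version, 32-hex trace ID, 16-hex parent ID, 2-hex flags.
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    flags: int

    @property
    def sampled(self) -> bool:
        # Bit 0 of trace-flags is the "sampled" flag.
        return bool(self.flags & 0x01)


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return a TraceParent, or None if the header is malformed."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero trace or parent IDs are invalid per the spec.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceParent(version, trace_id, parent_id, int(flags, 16))


tp = parse_traceparent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
print(tp.trace_id, tp.parent_id, tp.sampled)
```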

The tracestate header carries vendor-specific key-value pairs alongside traceparent. It does not change the trace ID or parent ID, but it lets observability tools pass additional, system-specific context from hop to hop.
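Because tracestate is an ordered list, the W3C specification has a vendor that updates its own entry move that entry to the front. The following stdlib sketch shows parsing, updating, and re-serializing under that rule; the function names are hypothetical, and the spec's key-syntax and size limits are not validated here.

```python
def parse_tracestate(header: str):
    """Parse a tracestate header into an ordered list of (key, value) pairs."""
    members = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue  # empty list members are tolerated
        key, sep, value = item.partition("=")
        if not sep or not key:
            return []  # sketch-level choice: discard a malformed header entirely
        members.append((key, value))
    return members


def update_tracestate(members, key, value):
    """Return a new list with key set to value and moved to the front."""
    rest = [(k, v) for k, v in members if k != key]
    return [(key, value)] + rest


def format_tracestate(members):
    return ",".join(f"{k}={v}" for k, v in members)


state = parse_tracestate("vendor1=value1,vendor2=value2")
state = update_tracestate(state, "vendor2", "updated")
print(format_tracestate(state))  # vendor2=updated,vendor1=value1
```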

Automatic Context Propagation

OpenTelemetry instrumentation libraries automatically handle context propagation for common protocols like HTTP and gRPC. When an instrumented service makes an outgoing request, the SDK automatically injects trace context into headers. When receiving a request, the SDK extracts trace context and uses it to create child spans.
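Conceptually, the extract-then-inject cycle that HTTP instrumentation performs looks like the stdlib sketch below. It is not OpenTelemetry SDK code: `extract_or_start` and `inject` are hypothetical names, header lookup is case-sensitive for brevity, tracestate and baggage are ignored, and the server span is injected directly instead of creating a separate client span for the outgoing call as real instrumentation does.

```python
import re
import secrets
from typing import Optional, Tuple

# Strict version-00 traceparent: trace ID, parent ID, flags (lowercase hex).
_TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

# (trace_id, span_id, parent_id, flags); parent_id is None for a new root span.
SpanContext = Tuple[str, str, Optional[str], str]


def _random_hex(n_bytes: int) -> str:
    """Random lowercase hex ID; all-zero IDs are invalid, so retry on that."""
    while True:
        value = secrets.token_hex(n_bytes)
        if int(value, 16) != 0:
            return value


def extract_or_start(headers: dict, sampled: bool = True) -> SpanContext:
    """Server side: continue the caller's trace, or start a new one."""
    match = _TRACEPARENT_RE.match(headers.get("traceparent", "").strip())
    if match and match.group(1) != "0" * 32 and match.group(2) != "0" * 16:
        trace_id, parent_id, flags = match.groups()
        return trace_id, _random_hex(8), parent_id, flags
    # No usable header: this request becomes the root of a brand-new trace.
    return _random_hex(16), _random_hex(8), None, "01" if sampled else "00"


def inject(span: SpanContext, headers: dict) -> dict:
    """Client side: advertise this span as the parent of the next hop."""
    trace_id, span_id, _parent_id, flags = span
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"
    return headers


incoming = {"traceparent": "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"}
server_span = extract_or_start(incoming)
outgoing = inject(server_span, {})
print(outgoing["traceparent"])  # same trace ID, new 16-hex parent ID, flags 01
```

The fallback branch is exactly what a broken propagation chain looks like from the outside: the downstream service silently starts its own trace.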

.NET Automatic Propagation

using System.Text.Json;
using Azure.Monitor.OpenTelemetry.AspNetCore;

var builder = WebApplication.CreateBuilder(args);

// Configure OpenTelemetry with Azure Monitor
builder.Services.AddOpenTelemetry().UseAzureMonitor();

// Register IHttpClientFactory and a default HttpClient; outgoing calls are instrumented automatically
builder.Services.AddHttpClient();

var app = builder.Build();

app.MapGet("/api/order/{id}", async (int id, HttpClient httpClient) =>
{
    // Outgoing HTTP call automatically propagates trace context
    var product = await httpClient.GetFromJsonAsync<JsonElement>(
        $"http://product-service/api/products/{id}"
    );
    
    var inventory = await httpClient.GetFromJsonAsync<JsonElement>(
        $"http://inventory-service/api/check/{id}"
    );
    
    return new { Product = product, Inventory = inventory };
});

app.Run();

The HttpClient instrumentation automatically injects traceparent headers into outgoing requests and creates dependency spans that correlate with the parent request span.

Node.js Automatic Propagation

// Configure OpenTelemetry first so its instrumentations can patch modules as they load
const { useAzureMonitor } = require("@azure/monitor-opentelemetry");
useAzureMonitor();

const express = require("express");
const axios = require("axios");

const app = express();

app.get("/api/checkout/:id", async (req, res) => {
    const orderId = req.params.id;
    
    // axios uses Node's http module, which the HTTP instrumentation patches to inject traceparent
    const orderData = await axios.get(`http://order-service/api/orders/${orderId}`);
    const payment = await axios.post("http://payment-service/api/process", {
        orderId: orderId,
        amount: orderData.data.total
    });
    
    res.json({ order: orderData.data, payment: payment.data });
});

app.listen(3000);

Python Automatic Propagation

import os
import requests
from flask import Flask, jsonify
from azure.monitor.opentelemetry import configure_azure_monitor

configure_azure_monitor(
    connection_string=os.environ.get("APPLICATIONINSIGHTS_CONNECTION_STRING")
)

app = Flask(__name__)

@app.route("/api/user/<user_id>/orders")
def get_user_orders(user_id):
    # The requests instrumentation bundled with the distro propagates context automatically
    user_response = requests.get(f"http://user-service/api/users/{user_id}", timeout=5)
    user_data = user_response.json()
    
    orders_response = requests.get(f"http://order-service/api/orders/user/{user_id}", timeout=5)
    orders_data = orders_response.json()
    
    return jsonify({
        "user": user_data,
        "orders": orders_data
    })

if __name__ == "__main__":
    app.run(port=5000)

Manual Context Propagation

Custom protocols or unsupported libraries require manual context injection and extraction. OpenTelemetry provides propagator APIs for these scenarios.
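Propagators stay transport-agnostic through a setter/getter pattern: the propagator only deals in string keys and values, and the caller supplies how to write to and read from a particular carrier. The stdlib sketch below mirrors that design for a Kafka-style header list of `(str, bytes)` tuples. It is not the OpenTelemetry API; the function names are hypothetical, and the "context" is a plain dict of header strings rather than a real Context object.

```python
from typing import Callable, List, Optional, Tuple

# Kafka-style carrier: a list of (key, bytes) tuples, as used for record headers.
KafkaHeaders = List[Tuple[str, bytes]]

TRACE_HEADERS = ("traceparent", "tracestate")


def inject(context: dict, carrier, setter: Callable[[object, str, str], None]) -> None:
    """Copy trace fields from context into any carrier via a setter callback."""
    for key in TRACE_HEADERS:
        if key in context:
            setter(carrier, key, context[key])


def extract(carrier, getter: Callable[[object, str], Optional[str]]) -> dict:
    """Read trace fields back out of a carrier via a getter callback."""
    context = {}
    for key in TRACE_HEADERS:
        value = getter(carrier, key)
        if value is not None:
            context[key] = value
    return context


def kafka_setter(carrier: KafkaHeaders, key: str, value: str) -> None:
    carrier.append((key, value.encode("utf-8")))


def kafka_getter(carrier: KafkaHeaders, key: str) -> Optional[str]:
    for k, v in carrier:
        if k == key:
            return v.decode("utf-8")
    return None


ctx = {"traceparent": "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"}
headers: KafkaHeaders = []
inject(ctx, headers, kafka_setter)
print(extract(headers, kafka_getter) == ctx)  # True
```

Supporting a new transport then means writing one setter and one getter, not a new propagator.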

.NET Manual Propagation

using System.Collections.Generic;
using System.Diagnostics;
using OpenTelemetry;
using OpenTelemetry.Context.Propagation;

// Message envelope: body plus string headers that carry the trace context
public record QueueMessage(string Body, Dictionary<string, string> Headers);

public class MessageQueueService
{
    private static readonly ActivitySource ActivitySource = new("MessageQueue.Service");
    private static readonly TextMapPropagator Propagator = Propagators.DefaultTextMapPropagator;
    
    public void SendMessage(string message)
    {
        using var activity = ActivitySource.StartActivity("SendMessage", ActivityKind.Producer);
        
        // Headers dictionary that will receive the injected trace context
        var headers = new Dictionary<string, string>();
        
        // Inject the producer span's context; fall back to the ambient one if no span was created
        var contextToInject = activity?.Context ?? Activity.Current?.Context ?? default;
        Propagator.Inject(
            new PropagationContext(contextToInject, Baggage.Current),
            headers,
            (carrier, key, value) => carrier[key] = value
        );
        
        // Send message with propagated context
        PublishToQueue(new QueueMessage(message, headers));
    }
    
    public void ProcessMessage(QueueMessage message)
    {
        // Extract trace context from message headers
        var parentContext = Propagator.Extract(
            default,
            message.Headers,
            (carrier, key) => carrier.TryGetValue(key, out var value)
                ? new[] { value }
                : Array.Empty<string>()
        );
        Baggage.Current = parentContext.Baggage;
        
        // Create a consumer span whose parent is the producer's span
        using var activity = ActivitySource.StartActivity(
            "ProcessMessage",
            ActivityKind.Consumer,
            parentContext.ActivityContext
        );
        
        HandleMessage(message.Body);
    }
    
    // Transport-specific placeholders: replace with your queue client and message handler
    private void PublishToQueue(QueueMessage message) { }
    private void HandleMessage(string body) { }
}

Python Manual Propagation with Kafka

import json
import time

from kafka import KafkaProducer
from opentelemetry import trace
from opentelemetry.propagate import extract, inject

tracer = trace.get_tracer(__name__)

class KafkaTracing:
    def __init__(self):
        self.producer = KafkaProducer(
            bootstrap_servers=['localhost:9092'],
            value_serializer=lambda v: json.dumps(v).encode('utf-8')
        )
    
    def send_message(self, topic, message_body):
        with tracer.start_as_current_span(
            "kafka_publish", kind=trace.SpanKind.PRODUCER
        ) as span:
            # Inject the active (producer) span's context into a plain dict
            headers = {}
            inject(headers)
            
            # kafka-python expects headers as a list of (str, bytes) tuples
            kafka_headers = [(k, v.encode('utf-8')) for k, v in headers.items()]
            
            message = {
                "body": message_body,
                "timestamp": time.time()
            }
            
            self.producer.send(topic, value=message, headers=kafka_headers)
            
            span.set_attribute("messaging.system", "kafka")
            span.set_attribute("messaging.destination.name", topic)
    
    def process_message(self, kafka_message):
        # Decode Kafka headers back into the str -> str dict the propagator expects
        headers = {
            k: v.decode('utf-8')
            for k, v in (kafka_message.headers or [])
            if v is not None
        }
        
        # Extract the producer's context and use it as the parent
        ctx = extract(headers)
        
        with tracer.start_as_current_span(
            "kafka_consume", context=ctx, kind=trace.SpanKind.CONSUMER
        ) as span:
            # Assumes the consumer was created without a value_deserializer (value is raw bytes)
            message_data = json.loads(kafka_message.value)
            
            span.set_attribute("messaging.system", "kafka")
            span.set_attribute("messaging.operation", "process")
            
            self.handle_message(message_data)
    
    def handle_message(self, message_data):
        # Placeholder for application-specific processing
        pass

Cross-Service Tracing Example

Let’s implement a complete microservices scenario with distributed tracing across multiple services.

Service 1: API Gateway (Node.js)

// Configure OpenTelemetry first so its instrumentations can patch modules as they load
const { useAzureMonitor } = require("@azure/monitor-opentelemetry");
useAzureMonitor();

const express = require("express");
const axios = require("axios");

const app = express();
app.use(express.json());

app.post("/api/purchase", async (req, res) => {
    const { userId, productId, quantity } = req.body;
    
    try {
        // Call auth service
        const authResponse = await axios.post("http://auth-service:3001/validate", {
            userId: userId
        });
        
        if (!authResponse.data.authorized) {
            return res.status(401).json({ error: "Unauthorized" });
        }
        
        // Call product service
        const productResponse = await axios.get(
            `http://product-service:3002/api/products/${productId}`
        );
        
        // Call order service
        const orderResponse = await axios.post("http://order-service:3003/api/orders", {
            userId,
            productId,
            quantity,
            price: productResponse.data.price
        });
        
        res.json({
            success: true,
            orderId: orderResponse.data.orderId
        });
    } catch (error) {
        res.status(500).json({ error: error.message });
    }
});

app.listen(3000, () => console.log("API Gateway running on port 3000"));

Service 2: Product Service (Python)

import json
import os
from flask import Flask, jsonify
from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry import trace
import redis

configure_azure_monitor(
    connection_string=os.environ.get("APPLICATIONINSIGHTS_CONNECTION_STRING")
)

app = Flask(__name__)
tracer = trace.get_tracer(__name__)
cache = redis.Redis(host='redis', port=6379)

@app.route("/api/products/<product_id>")
def get_product(product_id):
    with tracer.start_as_current_span("check_cache") as span:
        cached = cache.get(f"product:{product_id}")
        span.set_attribute("cache.hit", cached is not None)
        if cached:
            product = json.loads(cached)
            product["fromCache"] = True
            return jsonify(product)
    
    with tracer.start_as_current_span("fetch_from_db") as span:
        # Simulated database fetch
        product = {
            "productId": product_id,
            "name": f"Product {product_id}",
            "price": 99.99,
            "stock": 50
        }
        
        # Store JSON (not str(dict)) so cache hits can be decoded
        cache.setex(f"product:{product_id}", 300, json.dumps(product))
        
        return jsonify(product)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=3002)

Service 3: Order Service (.NET)

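using Azure.Monitor.OpenTelemetry.AspNetCore;
using System.Diagnostics;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenTelemetry().UseAzureMonitor();

// Spans from a custom ActivitySource are only collected once it is registered with the tracer provider
builder.Services.ConfigureOpenTelemetryTracerProvider((sp, tracing) =>
    tracing.AddSource("OrderService"));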

var app = builder.Build();
var activitySource = new ActivitySource("OrderService");

app.MapPost("/api/orders", async (OrderRequest request) =>
{
    using var activity = activitySource.StartActivity("CreateOrder");
    
    activity?.SetTag("user.id", request.UserId);
    activity?.SetTag("product.id", request.ProductId);
    activity?.SetTag("order.quantity", request.Quantity);
    
    // Validate inventory
    using (var validateActivity = activitySource.StartActivity("ValidateInventory"))
    {
        await Task.Delay(50); // Simulated inventory check
        validateActivity?.SetTag("inventory.available", true);
    }
    
    // Calculate total
    var total = request.Price * request.Quantity;
    activity?.SetTag("order.total", total);
    
    // Create order
    var orderId = Guid.NewGuid().ToString();
    
    return new OrderResponse
    {
        OrderId = orderId,
        UserId = request.UserId,
        Total = total,
        Status = "confirmed"
    };
});

app.Run();

record OrderRequest(int UserId, int ProductId, int Quantity, decimal Price);
record OrderResponse
{
    public string OrderId { get; init; }
    public int UserId { get; init; }
    public decimal Total { get; init; }
    public string Status { get; init; }
}

Troubleshooting Distributed Traces

Distributed tracing challenges often manifest as broken traces, missing spans, or incorrect parent-child relationships.

Common Issues and Solutions

Missing Trace Context: Services create new traces instead of continuing existing ones. Verify that instrumentation libraries are installed and that context propagation is enabled. Check that HTTP headers are not stripped by proxies or load balancers.

Incorrect Parent-Child Relationships: Spans appear disconnected in Application Insights. Ensure all services use the same propagation format (W3C TraceContext). Verify that async operations properly maintain context.

Performance Impact: High-cardinality attributes and excessive span creation degrade performance. Implement a sampling strategy, avoid creating spans for trivial operations that finish in under 1 ms, and use batch exporters to reduce network overhead.
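The "async operations properly maintain context" point is easiest to see with Python's `contextvars`, which OpenTelemetry Python uses for its runtime context by default. This stdlib sketch uses a plain `ContextVar` as a stand-in for the active span (the variable name and span value are hypothetical): asyncio tasks copy the caller's context automatically, while work submitted to a thread pool does not unless the context is copied explicitly.

```python
import asyncio
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the "active span" slot.
current_span = contextvars.ContextVar("current_span", default=None)


def read_span():
    return current_span.get()


async def child_task():
    return read_span()


async def main():
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the single worker thread before any span is active.
        pool.submit(lambda: None).result()

        current_span.set("checkout-span")

        # asyncio tasks copy the caller's context at creation: the span is visible.
        in_task = await asyncio.create_task(child_task())

        # A plain submission runs in the worker thread's own context: the span is lost.
        lost = pool.submit(read_span).result()

        # Running the callable inside a copied context carries the span across.
        kept = pool.submit(contextvars.copy_context().run, read_span).result()

    return in_task, lost, kept


print(asyncio.run(main()))  # ('checkout-span', None, 'checkout-span')
```

In asyncio code, `asyncio.to_thread` copies the context for you; raw executors and hand-rolled threads are where spans typically go missing.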

Debugging Context Propagation

// .NET: Log trace context
using var activity = activitySource.StartActivity("Operation");
Console.WriteLine($"TraceId: {activity?.TraceId}");
Console.WriteLine($"SpanId: {activity?.SpanId}");
Console.WriteLine($"ParentId: {activity?.ParentSpanId}");

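# Python: log the headers a propagator would send, from inside an active span
from opentelemetry import trace
from opentelemetry.propagate import inject

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("debug_propagation"):
    headers = {}
    inject(headers)  # stays empty if no span is active or no SDK is configured
    print(f"Propagated headers: {headers}")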
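Once both sides log their IDs this way, comparing a single hop can be automated. The helper below is a hypothetical sketch: it assumes you have the traceparent the caller logged for its outgoing request and the TraceId and ParentId the callee logged for its server span, all as lowercase hex, and it maps each mismatch to the most likely cause from the list above.

```python
import re
from typing import Optional

_TP = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def diagnose_hop(sent_traceparent: Optional[str], callee_trace_id: str, callee_parent_id: str) -> str:
    """Compare what the caller sent with what the callee recorded for its span."""
    match = _TP.match(sent_traceparent or "")
    if match is None:
        return "caller sent no valid traceparent: check client instrumentation"
    trace_id, span_id, _flags = match.groups()
    if callee_trace_id != trace_id:
        return "callee started a new trace: header stripped in transit or not extracted"
    if callee_parent_id != span_id:
        return "same trace, unexpected parent: an intermediate hop may have created its own span"
    return "ok"


sent = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
print(diagnose_hop(sent, "0af7651916cd43dd8448eb211c80319c", "b7ad6b7169203331"))  # ok
```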

Viewing Distributed Traces in Azure Monitor

Azure Monitor Application Insights provides several views for analyzing distributed traces. The Application Map visualizes service dependencies and request flows, showing average duration and failure rates for each service. Transaction Search allows querying specific traces with complete span hierarchies. The Performance blade identifies slow operations across service boundaries, and the End-to-end Transaction Details view shows the complete request timeline with all spans.

Next in the Series

This guide covered distributed tracing fundamentals with OpenTelemetry and Azure Monitor. The next article explores custom metrics and advanced telemetry patterns, demonstrating how to capture business metrics, implement custom instrumentation, and build dashboards for operational insights.
