Distributed Tracing for LLM Applications with OpenTelemetry → Explore with me!

In Part 1 of this series, we established why LLM failures are silent and semantic — invisible to traditional monitoring. The fix starts here: giving every step in your LLM pipeline a traceable identity so you can reconstruct exactly what happened when something goes wrong.

OpenTelemetry is the foundation. It is the vendor-neutral, CNCF-standard observability framework that gives you a single instrumentation layer compatible with Langfuse, LangSmith, Arize, Datadog, Jaeger, and dozens of other backends. You instrument once, and you can route telemetry to any destination without touching your application code again.

This post walks through instrumenting a complete LLM pipeline from scratch — covering Node.js, Python, and C# — with working code you can drop into a real project today.

What OpenTelemetry Gives You for LLM Applications

OpenTelemetry provides three signals: traces, metrics, and logs. For LLM applications, traces are the most important signal to get right first. A trace is a complete record of a single request as it flows through your system. It is composed of spans, where each span represents one unit of work — a database query, an HTTP call, a vector search, or a model inference call.

For an LLM pipeline, a single user request might produce a trace with spans covering the incoming API request, prompt template rendering, embedding generation, vector store retrieval, the LLM API call itself, any tool executions triggered by the model, and the final response post-processing. Each span captures its start time, duration, status, and a set of attributes specific to what it represents.

The OpenTelemetry community has defined GenAI semantic conventions specifically for LLM telemetry. These standardize attribute names across providers so your dashboards and alerts work the same way whether you are calling Azure OpenAI, Anthropic, or a self-hosted model.

Key GenAI span attributes include:

gen_ai.system — the LLM provider (e.g., openai, anthropic)
gen_ai.request.model — the model name requested
gen_ai.response.model — the model that actually responded
gen_ai.usage.input_tokens — prompt token count
gen_ai.usage.output_tokens — completion token count
gen_ai.operation.name — the operation type (chat, embeddings, etc.)

Trace Architecture for a RAG Pipeline

Before writing code, it helps to see what a complete trace looks like across a RAG pipeline. The diagram below shows the span hierarchy for a single user request that triggers retrieval and an LLM call.

flowchart TD
    A[Trace: user-request-abc123] --> B[Span: http.server\nPOST /api/chat\n120ms]
    B --> C[Span: prompt.render\nTemplate assembly\n2ms]
    B --> D[Span: genai.embeddings\ntext-embedding-3-small\n35ms]
    D --> E[Span: vectordb.query\nAzure AI Search\n28ms]
    B --> F[Span: genai.chat\ngpt-4o\n980ms]
    F --> G[Span: tool.execute\nget_product_info\n45ms]
    G --> H[Span: http.client\nInternal API call\n40ms]
    F --> I[Span: genai.chat\ngpt-4o second call\n720ms]
    B --> J[Span: output.validate\nGuardrails check\n8ms]

    style A fill:#1e3a5f,color:#ffffff
    style F fill:#2d4a1e,color:#ffffff
    style I fill:#2d4a1e,color:#ffffff

Each span is a child of its parent, forming a tree that shows the complete execution path. When the LLM call at 980ms is slow, you can see immediately that the tool execution inside it consumed 45ms and the second LLM call added another 720ms. Without this trace, all you see is a slow request with no breakdown.

Node.js Implementation

Install the required packages:

npm install @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-http \
  @opentelemetry/exporter-metrics-otlp-http \
  @opentelemetry/sdk-metrics \
  @opentelemetry/resources \
  @opentelemetry/semantic-conventions \
  openai

Create a tracing.js file that initializes the SDK before your application starts:

// tracing.js
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { OTLPMetricExporter } = require('@opentelemetry/exporter-metrics-otlp-http');
const { PeriodicExportingMetricReader } = require('@opentelemetry/sdk-metrics');
const { Resource } = require('@opentelemetry/resources');
const { SEMRESATTRS_SERVICE_NAME, SEMRESATTRS_SERVICE_VERSION } = require('@opentelemetry/semantic-conventions');

const resource = new Resource({
  [SEMRESATTRS_SERVICE_NAME]: process.env.OTEL_SERVICE_NAME || 'llm-api-service',
  [SEMRESATTRS_SERVICE_VERSION]: '1.0.0',
  'deployment.environment': process.env.NODE_ENV || 'development',
});

const traceExporter = new OTLPTraceExporter({
  url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4318/v1/traces',
  headers: {
    Authorization: `Basic ${process.env.LANGFUSE_AUTH_TOKEN}`,
  },
});

const metricExporter = new OTLPMetricExporter({
  url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4318/v1/metrics',
});

const sdk = new NodeSDK({
  resource,
  traceExporter,
  metricReader: new PeriodicExportingMetricReader({
    exporter: metricExporter,
    exportIntervalMillis: 30000,
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();

process.on('SIGTERM', () => {
  sdk.shutdown().then(() => process.exit(0));
});

module.exports = sdk;

Now instrument your LLM calls with manual spans that capture GenAI-specific attributes:

// llm-service.js
require('./tracing'); // must be first import

const { trace, context, SpanStatusCode } = require('@opentelemetry/api');
const OpenAI = require('openai');

const tracer = trace.getTracer('llm-service', '1.0.0');
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function chatWithTrace(userMessage, retrievedDocs = []) {
  // Root span for the entire chat operation
  return tracer.startActiveSpan('llm.chat.pipeline', async (pipelineSpan) => {
    try {
      pipelineSpan.setAttributes({
        'user.message.length': userMessage.length,
        'rag.docs.retrieved': retrievedDocs.length,
      });

      // Span for prompt assembly
      const systemPrompt = await tracer.startActiveSpan('prompt.render', async (promptSpan) => {
        const prompt = buildSystemPrompt(retrievedDocs);
        promptSpan.setAttributes({
          'prompt.template.version': 'v2.1',
          'prompt.context.tokens.estimate': Math.ceil(prompt.length / 4),
        });
        promptSpan.end();
        return prompt;
      });

      // Span for the LLM call with full GenAI semantic conventions
      const response = await tracer.startActiveSpan('gen_ai.chat', async (llmSpan) => {
        llmSpan.setAttributes({
          'gen_ai.system': 'openai',
          'gen_ai.operation.name': 'chat',
          'gen_ai.request.model': 'gpt-4o',
          'gen_ai.request.temperature': 0.2,
          'gen_ai.request.max_tokens': 1024,
        });

        const startTime = Date.now();

        try {
          const completion = await openai.chat.completions.create({
            model: 'gpt-4o',
            temperature: 0.2,
            max_tokens: 1024,
            messages: [
              { role: 'system', content: systemPrompt },
              { role: 'user', content: userMessage },
            ],
          });

          const latencyMs = Date.now() - startTime;

          // Attach response attributes
          llmSpan.setAttributes({
            'gen_ai.response.model': completion.model,
            'gen_ai.usage.input_tokens': completion.usage.prompt_tokens,
            'gen_ai.usage.output_tokens': completion.usage.completion_tokens,
            'gen_ai.response.finish_reason': completion.choices[0].finish_reason,
            'llm.latency.ms': latencyMs,
            'llm.time_to_first_token.ms': latencyMs, // approximate for non-streaming
          });

          llmSpan.setStatus({ code: SpanStatusCode.OK });
          llmSpan.end();
          return completion;

        } catch (err) {
          llmSpan.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
          llmSpan.recordException(err);
          llmSpan.end();
          throw err;
        }
      });

      pipelineSpan.setStatus({ code: SpanStatusCode.OK });
      pipelineSpan.end();
      return response.choices[0].message.content;

    } catch (err) {
      pipelineSpan.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
      pipelineSpan.recordException(err);
      pipelineSpan.end();
      throw err;
    }
  });
}

function buildSystemPrompt(docs) {
  const context = docs.map(d => d.content).join('\n\n');
  return `You are a helpful assistant. Use the following context to answer questions:\n\n${context}`;
}

module.exports = { chatWithTrace };

Python Implementation

Install dependencies:

pip install opentelemetry-api \
  opentelemetry-sdk \
  opentelemetry-exporter-otlp \
  opentelemetry-semantic-conventions \
  openai \
  langfuse

Configure the tracer provider and instrument LLM calls:

# tracing.py
import os
import time
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.trace import SpanStatusCode
from openai import OpenAI

# Configure resource with service identity
resource = Resource.create({
    "service.name": os.getenv("OTEL_SERVICE_NAME", "llm-api-service"),
    "service.version": "1.0.0",
    "deployment.environment": os.getenv("ENVIRONMENT", "development"),
})

# Configure OTLP exporter -- works with Langfuse, Arize, or any OTLP-compatible backend
otlp_exporter = OTLPSpanExporter(
    endpoint=os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318/v1/traces"),
    headers={
        "Authorization": f"Basic {os.getenv('LANGFUSE_AUTH_TOKEN', '')}",
    }
)

provider = TracerProvider(resource=resource)
provider.add_span_processor(BatchSpanProcessor(otlp_exporter))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("llm-service", "1.0.0")
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))


def chat_with_trace(user_message: str, retrieved_docs: list = None) -> str:
    retrieved_docs = retrieved_docs or []

    with tracer.start_as_current_span("llm.chat.pipeline") as pipeline_span:
        pipeline_span.set_attribute("user.message.length", len(user_message))
        pipeline_span.set_attribute("rag.docs.retrieved", len(retrieved_docs))

        try:
            # Span for prompt rendering
            with tracer.start_as_current_span("prompt.render") as prompt_span:
                system_prompt = build_system_prompt(retrieved_docs)
                prompt_span.set_attribute("prompt.template.version", "v2.1")
                prompt_span.set_attribute(
                    "prompt.context.tokens.estimate", len(system_prompt) // 4
                )

            # Span for LLM call with GenAI semantic conventions
            with tracer.start_as_current_span("gen_ai.chat") as llm_span:
                llm_span.set_attribute("gen_ai.system", "openai")
                llm_span.set_attribute("gen_ai.operation.name", "chat")
                llm_span.set_attribute("gen_ai.request.model", "gpt-4o")
                llm_span.set_attribute("gen_ai.request.temperature", 0.2)
                llm_span.set_attribute("gen_ai.request.max_tokens", 1024)

                start_time = time.time()

                completion = client.chat.completions.create(
                    model="gpt-4o",
                    temperature=0.2,
                    max_tokens=1024,
                    messages=[
                        {"role": "system", "content": system_prompt},
                        {"role": "user", "content": user_message},
                    ],
                )

                latency_ms = (time.time() - start_time) * 1000

                llm_span.set_attribute("gen_ai.response.model", completion.model)
                llm_span.set_attribute(
                    "gen_ai.usage.input_tokens", completion.usage.prompt_tokens
                )
                llm_span.set_attribute(
                    "gen_ai.usage.output_tokens", completion.usage.completion_tokens
                )
                llm_span.set_attribute(
                    "gen_ai.response.finish_reason",
                    completion.choices[0].finish_reason
                )
                llm_span.set_attribute("llm.latency.ms", latency_ms)
                llm_span.set_status(SpanStatusCode.OK)

            pipeline_span.set_status(SpanStatusCode.OK)
            return completion.choices[0].message.content

        except Exception as e:
            pipeline_span.set_status(SpanStatusCode.ERROR, str(e))
            pipeline_span.record_exception(e)
            raise


def build_system_prompt(docs: list) -> str:
    context = "\n\n".join(d["content"] for d in docs)
    return f"You are a helpful assistant. Use the following context:\n\n{context}"

C# Implementation

Install packages via NuGet:

dotnet add package OpenTelemetry
dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package OpenTelemetry.Exporter.OpenTelemetryProtocol
dotnet add package OpenTelemetry.Instrumentation.Http
dotnet add package Azure.AI.OpenAI

Configure OpenTelemetry in your Program.cs:

// Program.cs
using OpenTelemetry;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource
        .AddService(
            serviceName: "llm-api-service",
            serviceVersion: "1.0.0"
        )
        .AddAttributes(new Dictionary<string, object>
        {
            ["deployment.environment"] = builder.Environment.EnvironmentName
        }))
    .WithTracing(tracing => tracing
        .AddHttpClientInstrumentation()
        .AddAspNetCoreInstrumentation()
        .AddSource("LlmService") // register our custom ActivitySource
        .AddOtlpExporter(options =>
        {
            options.Endpoint = new Uri(
                builder.Configuration["Otel:Endpoint"] ?? "http://localhost:4317"
            );
            options.Headers = $"Authorization=Basic {builder.Configuration["Langfuse:AuthToken"]}";
        }));

builder.Services.AddSingleton<LlmService>();
var app = builder.Build();
app.MapControllers();
app.Run();

Create the LlmService.cs with full span instrumentation:

// LlmService.cs
using System.Diagnostics;
using Azure.AI.OpenAI;

public class LlmService
{
    private static readonly ActivitySource ActivitySource = new("LlmService", "1.0.0");
    private readonly AzureOpenAIClient _openAiClient;

    public LlmService(IConfiguration config)
    {
        _openAiClient = new AzureOpenAIClient(
            new Uri(config["AzureOpenAI:Endpoint"]!),
            new System.ClientModel.ApiKeyCredential(config["AzureOpenAI:ApiKey"]!)
        );
    }

    public async Task<string> ChatWithTraceAsync(
        string userMessage,
        List<RetrievedDoc> retrievedDocs)
    {
        using var pipelineActivity = ActivitySource.StartActivity("llm.chat.pipeline");
        pipelineActivity?.SetTag("user.message.length", userMessage.Length);
        pipelineActivity?.SetTag("rag.docs.retrieved", retrievedDocs.Count);

        try
        {
            // Span for prompt rendering
            string systemPrompt;
            using (var promptActivity = ActivitySource.StartActivity("prompt.render"))
            {
                systemPrompt = BuildSystemPrompt(retrievedDocs);
                promptActivity?.SetTag("prompt.template.version", "v2.1");
                promptActivity?.SetTag("prompt.context.tokens.estimate", systemPrompt.Length / 4);
            }

            // Span for LLM call
            using var llmActivity = ActivitySource.StartActivity("gen_ai.chat");
            llmActivity?.SetTag("gen_ai.system", "azure.openai");
            llmActivity?.SetTag("gen_ai.operation.name", "chat");
            llmActivity?.SetTag("gen_ai.request.model", "gpt-4o");
            llmActivity?.SetTag("gen_ai.request.temperature", 0.2f);
            llmActivity?.SetTag("gen_ai.request.max_tokens", 1024);

            var stopwatch = Stopwatch.StartNew();

            var chatClient = _openAiClient.GetChatClient("gpt-4o");
            var completion = await chatClient.CompleteChatAsync(
                new[]
                {
                    new SystemChatMessage(systemPrompt),
                    new UserChatMessage(userMessage),
                },
                new ChatCompletionOptions
                {
                    Temperature = 0.2f,
                    MaxOutputTokenCount = 1024,
                }
            );

            stopwatch.Stop();

            llmActivity?.SetTag("gen_ai.response.model", "gpt-4o");
            llmActivity?.SetTag("gen_ai.usage.input_tokens", completion.Value.Usage.InputTokenCount);
            llmActivity?.SetTag("gen_ai.usage.output_tokens", completion.Value.Usage.OutputTokenCount);
            llmActivity?.SetTag("gen_ai.response.finish_reason", completion.Value.FinishReason.ToString());
            llmActivity?.SetTag("llm.latency.ms", stopwatch.ElapsedMilliseconds);
            llmActivity?.SetStatus(ActivityStatusCode.Ok);

            pipelineActivity?.SetStatus(ActivityStatusCode.Ok);
            return completion.Value.Content[0].Text;
        }
        catch (Exception ex)
        {
            pipelineActivity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            pipelineActivity?.RecordException(ex);
            throw;
        }
    }

    private static string BuildSystemPrompt(List<RetrievedDoc> docs)
    {
        var context = string.Join("\n\n", docs.Select(d => d.Content));
        return $"You are a helpful assistant. Use the following context:\n\n{context}";
    }
}

public record RetrievedDoc(string Id, string Content, float Score);

Connecting to an Observability Backend

All three implementations export traces via OTLP, which means you can point them at any compatible backend by changing a single environment variable. Here is how to connect to the most commonly used options:

Langfuse (recommended for LLMOps)

# Encode your Langfuse keys as a Basic auth token
import base64
auth = base64.b64encode(b"PUBLIC_KEY:SECRET_KEY").decode()

# Set these environment variables
OTEL_EXPORTER_OTLP_ENDPOINT=https://cloud.langfuse.com/api/public/otel
LANGFUSE_AUTH_TOKEN=<your base64 encoded public:secret key>

Local development with Jaeger

# Run Jaeger locally with Docker
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest

# Point your exporter at local Jaeger
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318/v1/traces

# Open Jaeger UI
open http://localhost:16686

Auto-instrumentation with OpenLLMetry

If you want automatic instrumentation of OpenAI, Anthropic, and vector DB calls without writing manual spans, OpenLLMetry wraps the OpenTelemetry SDK with LLM-specific auto-instrumentation:

# Python - auto-instrument all OpenAI and LangChain calls
pip install traceloop-sdk

from traceloop.sdk import Traceloop

Traceloop.init(
    app_name="llm-api-service",
    api_endpoint=os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT"),
)
# All subsequent OpenAI and LangChain calls are automatically traced

// Node.js - auto-instrument with OpenLLMetry
npm install @traceloop/node-server-sdk

import * as traceloop from "@traceloop/node-server-sdk";

traceloop.initialize({
  appName: "llm-api-service",
  apiEndpoint: process.env.OTEL_EXPORTER_OTLP_ENDPOINT,
});
// All OpenAI, Anthropic, and LangChain calls are now traced automatically

What Your Traces Should Capture

A well-instrumented LLM pipeline trace gives you the ability to answer the following questions from a single trace view:

Which step in the pipeline consumed the most latency?
How many tokens did this request consume, and what did it cost?
What documents were retrieved, and were they relevant to the query?
Did the model call any tools, and did they succeed?
What was the finish reason — did the model stop naturally or hit a token limit?
Was there an exception at any point, and where exactly did it occur?

If your traces cannot answer all six of these questions, there are gaps in your instrumentation worth closing before you go to production.

What Comes Next

Traces tell you what happened and where time was spent. In Part 3, we go deeper into the metrics layer — the specific LLM measurements that matter for production operations: time-to-first-token, hallucination rate indicators, cost per request, and how to detect output quality drift before users report it.

Key Takeaways

OpenTelemetry is the vendor-neutral standard for LLM tracing — instrument once, export to any backend
GenAI semantic conventions give you standardized attribute names across all LLM providers
Every LLM pipeline step should be its own span: prompt rendering, retrieval, model call, tool execution, output validation
Always capture token counts, finish reason, and latency on every LLM span — these are your core production signals
OpenLLMetry provides auto-instrumentation for OpenAI, Anthropic, and vector DBs with zero manual span code
Local development with Jaeger lets you validate your instrumentation before connecting to a cloud backend

Distributed Tracing for LLM Applications with OpenTelemetry

What OpenTelemetry Gives You for LLM Applications

Trace Architecture for a RAG Pipeline

Node.js Implementation

Python Implementation

C# Implementation

Connecting to an Observability Backend

Langfuse (recommended for LLMOps)

Local development with Jaeger

Auto-instrumentation with OpenLLMetry

What Your Traces Should Capture

What Comes Next

Key Takeaways

References

Like this:

You may like

Written by:

Chandan 597 Posts

You May Have Missed

The Complete Picture: Balancing Professional and Personal Support Systems

For Parents, Partners, and Friends: A Guide to Supporting Your Loved One in Tech

The HR Conversation: When and How to Involve HR in Your Mental Health Journey

Finding Your Tech Tribe: The Power of Peer Support Groups

What OpenTelemetry Gives You for LLM Applications

Trace Architecture for a RAG Pipeline

Node.js Implementation

Python Implementation

C# Implementation

Connecting to an Observability Backend

Langfuse (recommended for LLMOps)

Local development with Jaeger

Auto-instrumentation with OpenLLMetry

What Your Traces Should Capture

What Comes Next

Key Takeaways

References

Like this:

You may like

Written by:

Chandan 597 Posts

Related Posts

Evaluating LLM Output Quality in Production: LLM-as-Judge and Human Feedback Loops

LLM Metrics That Actually Matter: Latency, Cost, Hallucination Rate, and Drift

Why LLMOps Is Not MLOps: The New Operational Reality for AI Teams

You May Have Missed

The Complete Picture: Balancing Professional and Personal Support Systems

For Parents, Partners, and Friends: A Guide to Supporting Your Loved One in Tech

The HR Conversation: When and How to Involve HR in Your Mental Health Journey

Finding Your Tech Tribe: The Power of Peer Support Groups