Tomáš Kubica

Deep dive into AI agent observability with Microsoft Agent Framework - tracing with open-source tools Aspire Dashboard and Langfuse

Tracing AI agents via Aspire Dashboard and Langfuse: a quick developer view, anonymization through the collector, and specialized AI observability.

In the first part we covered the basic architecture and the reasons I like to use the OpenTelemetry collector. Last time we focused on metrics and their use in Azure Monitor managed service for Prometheus and Azure Managed Grafana. Today we dive into tracing, the most important discipline for AI agent observability, and we start with open-source tools: Aspire Dashboard for a quick developer view and Langfuse for specialized AI observability.

Open-source tracing for AI agents

OpenTelemetry is eating the observability world and I am a huge fan. After many years of combining proprietary tools and approaches with a zoo of alternative open-source SDKs, OpenTelemetry arrived with the ambition to unify metrics, logs, and traces under a single protocol, SDK, and API. Numerous backends provide visualization and data persistence, but for a quick developer check they tend to be cumbersome, slow, expensive, or resource-hungry. That is why the .NET team developed Aspire Dashboard, a simple solution that runs in a single lightweight container and lets you visualize OpenTelemetry data quickly and nearly in real time.

Here we can see tracing from a multi-tenant Magentic-style solution in Microsoft Agent Framework.

Trace overview in Aspire Dashboard

In my case I also enabled logging of individual messages and responses.

Texts and responses in an Aspire trace

I also collect custom attributes, such as session ID, logged-in user, user roles, department, and so on.

Custom attributes in Aspire Dashboard spans
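To make the idea of custom attributes concrete, here is a minimal sketch of how such values might be flattened into span attributes. The `RequestContext` class, the `span_attributes` helper, and the attribute keys are assumptions chosen for illustration; they are not taken from the application shown in the screenshots.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RequestContext:
    """Hypothetical per-request context; field names are illustrative."""
    session_id: str
    user_id: str
    roles: List[str] = field(default_factory=list)
    department: Optional[str] = None


def span_attributes(ctx: RequestContext) -> Dict[str, object]:
    """Flatten the context into values suitable for span attributes.

    OpenTelemetry attribute values are primitives or homogeneous lists of
    primitives, so roles stay a list of strings and a missing department
    is skipped instead of being sent as None.
    """
    attrs: Dict[str, object] = {
        "session.id": ctx.session_id,      # assumed key
        "user.id": ctx.user_id,            # assumed key
        "user.roles": sorted(ctx.roles),   # assumed key
    }
    if ctx.department is not None:
        attrs["app.department"] = ctx.department  # assumed key
    return attrs


ctx = RequestContext("s-123", "alice@example.com", ["reader", "admin"], "finance")
attrs = span_attributes(ctx)
```

With the OpenTelemetry Python API installed, a dictionary like this would typically be attached to the active span with `span.set_attributes(attrs)`.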

As you can see in the first part of this series, my OpenTelemetry collector is configured to filter out certain fields and hash others, which masks the original information. I have created a second Aspire instance to which the OTEL collector sends the filtered and anonymized data. Notice that it contains no conversation content at all and the user ID is masked, yet I still have the necessary technical information such as timings and token consumption.

Anonymized Aspire view without conversation content
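The collector does this filtering and hashing declaratively through its configuration. The Python sketch below only mimics the effect on a single span's attributes so the result is easier to picture. The attribute keys in `DROP_KEYS` and `HASH_KEYS`, the `sanitize` function, and the sample values are assumptions, not the configuration from part one.

```python
import hashlib

# Assumed attribute names; the real ones depend on the instrumentation in use.
DROP_KEYS = {"gen_ai.input.messages", "gen_ai.output.messages"}
HASH_KEYS = {"user.id"}


def sanitize(attributes: dict, salt: str = "") -> dict:
    """Return a copy with content attributes dropped and identifiers hashed."""
    clean = {}
    for key, value in attributes.items():
        if key in DROP_KEYS:
            continue  # conversation content is removed entirely
        if key in HASH_KEYS:
            # SHA-256 keeps the value stable per user but hides the original.
            data = (salt + str(value)).encode("utf-8")
            clean[key] = hashlib.sha256(data).hexdigest()
        else:
            clean[key] = value  # timings, token counts, and so on pass through
    return clean


span = {
    "user.id": "alice@example.com",
    "gen_ai.input.messages": "What is our Q3 revenue?",
    "gen_ai.usage.input_tokens": 412,
    "gen_ai.usage.output_tokens": 96,
}
sanitized = sanitize(span)
```

Because the hash is deterministic, traces from the same user can still be grouped in the anonymized view even though the identity itself is no longer readable.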

This is what a tool call looks like.

Tool call detail in an Aspire trace

For quick snapshots of monitoring data, Aspire Dashboard is excellent: simple, tiny, and fast.

Looking for an open-source tracing tool that specializes in AI scenarios? Langfuse is very popular. Unfortunately it is not fully open in terms of project governance (it is not under the CNCF, the Apache Software Foundation, or the Linux Foundation) and follows more of an open-core model with an MIT-licensed core. Nevertheless, it is a very good solution, so let's take a look.

Right from the home screen you can see that Langfuse is not only about observability but also touches on the area of evaluation, which we will cover later in this series.

Langfuse project overview
Langfuse navigation to tracing and evaluations
Langfuse observability and datasets view

This is what a specific trace looks like - the same as what we saw in Aspire. Graphically it is different, but in principle the basic information is the same.

Concrete trace in Langfuse

However, some values are not present in the data directly but are derived from it. The conversion of token consumption into monetary cost is excellent.

Cost and token calculation in Langfuse
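The derivation itself is simple arithmetic: token counts multiplied by a per-model price. The sketch below shows the principle under stated assumptions. The model name, the prices in `PRICES`, and the `trace_cost` function are hypothetical, and this is not a description of how Langfuse computes cost internally.

```python
from decimal import Decimal

# Hypothetical prices in USD per 1 million tokens; not real list prices.
PRICES = {
    "example-model": {"input": Decimal("2.50"), "output": Decimal("10.00")},
}

TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)


def trace_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Derive the cost of one trace from its token counts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / TOKENS_PER_PRICE_UNIT


# 412 input and 96 output tokens: (412 * 2.50 + 96 * 10.00) / 1,000,000
cost = trace_cost("example-model", 412, 96)
```

`Decimal` is used instead of floats so that very small per-trace amounts add up without rounding drift when summed over many traces.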

The conversation itself is again clearly visible.

Conversation in a Langfuse trace
Message detail in a Langfuse trace

Langfuse directly parses some well-known attributes, such as the user ID. This allows it to immediately provide an overview of users and their token consumption.

User overview and token consumption
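Conceptually, such a per-user overview is a group-by over traces keyed by user ID. The following sketch illustrates the aggregation with plain dictionaries. The record fields (`user_id`, `input_tokens`, `output_tokens`), the `tokens_per_user` function, and the sample data are assumptions for illustration and do not reflect the Langfuse data model.

```python
from collections import defaultdict


def tokens_per_user(traces):
    """Sum trace counts and token usage per user ID."""
    totals = defaultdict(lambda: {"traces": 0, "input": 0, "output": 0})
    for trace in traces:
        row = totals[trace["user_id"]]
        row["traces"] += 1
        row["input"] += trace["input_tokens"]
        row["output"] += trace["output_tokens"]
    return dict(totals)


# Hypothetical sample traces
traces = [
    {"user_id": "u1", "input_tokens": 400, "output_tokens": 100},
    {"user_id": "u2", "input_tokens": 250, "output_tokens": 50},
    {"user_id": "u1", "input_tokens": 600, "output_tokens": 200},
]
overview = tokens_per_user(traces)
```

This only works when every trace carries a consistent user identifier, which is one reason to set that attribute deliberately in the instrumentation.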

From there you can drill down and see individual traces, sessions, and so on.

User detail and sessions in Langfuse
Individual traces in a session

Langfuse also covers evaluations, but we will discuss that later. You can take a captured conversation, add it to a dataset, annotate it, try it in a simulator, and so on.

Langfuse datasets and evaluation

For me, Langfuse is the best AI-focused tracing tool in the open-source category. Even though it is not fully open in terms of governance, it is my first choice when I have to stay in the self-managed world. We will discuss its evaluation capabilities later. It has its place there too, although projects like DeepEval are strong competition with a somewhat different focus.

Today we dove into open-source options: Aspire for quick developer snapshots and Langfuse as a specialized open-source AI solution. Further alternatives are non-specialized developer tools and hosted solutions focused on AI scenarios. We will look at those next time, with Azure Monitor and Azure AI Foundry.

Aspire Dashboard

Small, fast, and excellent for a developer view nearly in real time.

Anonymization

The same telemetry stream can be sent via the OTEL collector once in full and once sanitized.

Langfuse

A specialized AI observability tool with tracing, token economics, and a link to evaluations.

Tool choice

Aspire for quick troubleshooting, Langfuse as a self-managed AI tracing option.

The next step is to look at service-based variants - Azure Monitor and Azure AI Foundry.