Note
Important
Items marked (preview) in this article are currently in public preview. This preview is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
Microsoft Foundry provides an observability platform for monitoring and tracing AI agents. It captures key details during an agent run, such as inputs, outputs, tool usage, retries, latencies, and costs. Understanding the reasoning behind your agent's executions is important for troubleshooting and debugging. However, understanding complex agents presents challenges for several reasons:
- A single response can involve many steps, making them hard to keep track of.
- The sequence of steps might vary based on user input.
- The inputs/outputs at each stage might be long and deserve more detailed inspection.
- Each step of an agent's runtime might also involve nesting. For example, an agent might invoke a tool, which uses another process, which then invokes another tool. If you notice strange or incorrect output from a top-level agent run, it might be difficult to determine exactly where in the execution the issue was introduced.
Trace results address this by showing the inputs and outputs of each primitive involved in a particular agent run, in the order they were invoked. This makes it easier to understand and debug your AI agent's behavior.
Prerequisites
To use tracing end-to-end, you need:
- A Foundry project with tracing enabled. To set it up, see How to set up tracing in Microsoft Foundry.
- Access to the Azure Application Insights resource connected to your project. For background, see Azure Application Insights.
Note
Tracing stores telemetry data in Azure Application Insights, which may incur costs based on data volume and retention settings. For pricing details, see Application Insights pricing.
OpenTelemetry in Foundry
OpenTelemetry (OTel) provides standardized protocols for collecting and routing telemetry data. Foundry uses OpenTelemetry semantic conventions so traces are consistent across supported tools and integrations.
Trace key concepts
Here's a brief overview of key concepts before getting started:
| Key concepts | Description |
|---|---|
| Traces | Traces capture the journey of a request or workflow through your application by recording events and state changes (function calls, values, system events). See OpenTelemetry Traces. |
| Spans | Spans are the building blocks of traces, representing single operations within a trace. Each span captures start and end times, attributes, and can be nested to show hierarchical relationships, allowing you to see the full call stack and sequence of operations. |
| Attributes | Attributes are key-value pairs attached to traces and spans, providing contextual metadata such as function parameters, return values, or custom annotations. These enrich trace data, making it more informative and useful for analysis. |
| Semantic conventions | OpenTelemetry defines semantic conventions to standardize names and formats for trace data attributes, making it easier to interpret and analyze across tools and platforms. To learn more, see OpenTelemetry's Semantic Conventions. |
| Trace exporters | Trace exporters send trace data to backend systems for storage and analysis. In Foundry, traces are stored in Azure Monitor Application Insights. To learn how to enable and view traces, see How to set up tracing in Microsoft Foundry. |
How tracing works in Foundry
Tracing helps you answer questions like "Where did this response come from?" and "Which step introduced an error or latency spike?"
At a high level, tracing captures:
- User inputs and agent outputs.
- Tool usage, including tool calls and results.
- Timing signals such as latency.
Once tracing is enabled for your project, you can inspect traces in the Foundry portal and in Azure Monitor Application Insights. For the step-by-step setup and viewing options, see How to set up tracing in Microsoft Foundry.
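As an illustration of the signals listed above, the following stdlib-only sketch (not Foundry's actual trace schema) captures the input, output, and latency of a single agent step, the same shape of data a trace span records:

```python
import time
from dataclasses import dataclass


@dataclass
class StepRecord:
    """Minimal stand-in for one traced step: input, output, and latency."""
    name: str
    input: str
    output: str = ""
    latency_ms: float = 0.0


def traced_step(name: str, fn, payload: str) -> StepRecord:
    """Run one step and capture the signals a trace span would record."""
    start = time.perf_counter()
    result = fn(payload)
    elapsed = (time.perf_counter() - start) * 1000
    return StepRecord(name=name, input=payload, output=result, latency_ms=elapsed)


# A trivial function stands in for a tool call or model invocation.
record = traced_step("execute_tool", lambda q: q.upper(), "weather in seattle")
```

A real tracer adds what this sketch omits: span IDs for correlating steps, nesting for the call hierarchy, and an exporter that ships the records to a backend.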
Extending OpenTelemetry with multi-agent observability
Microsoft, in collaboration with Cisco Outshift, has introduced new semantic conventions for multi-agent systems, built on OpenTelemetry and W3C Trace Context. These conventions standardize telemetry for multi-agent workflows, enabling consistent logging of metrics for quality, performance, safety, and cost, including tool invocations and collaboration.
These enhancements are integrated into:
- Foundry
- Microsoft Agent Framework
- Semantic Kernel
- LangChain
- LangGraph
- OpenAI Agents SDK
To learn more, see tracing integrations.
The following table describes the semantic conventions for multi-agent observability. Spans capture discrete operations, child spans show nested operations within a parent span, attributes provide metadata, and events mark significant occurrences during execution.
| Type | Context/Parent Span | Name/Attribute/Event | Purpose |
|---|---|---|---|
| Span | — | execute_task | Captures task planning and event propagation, providing insights into how tasks are decomposed and distributed. |
| Child Span | invoke_agent | agent_to_agent_interaction | Traces communication between agents. |
| Child Span | invoke_agent | agent.state.management | Captures the agent's context and short-term or long-term memory management. |
| Child Span | invoke_agent | agent_planning | Logs the agent's internal planning steps. |
| Child Span | invoke_agent | agent orchestration | Captures agent-to-agent orchestration. |
| Attribute | invoke_agent | tool_definitions | Describes the tool's purpose or configuration. |
| Attribute | invoke_agent | llm_spans | Records model call spans. |
| Attribute | execute_tool | tool.call.arguments | Logs the arguments passed during tool invocation. |
| Attribute | execute_tool | tool.call.results | Records the results returned by the tool. |
| Event | — | Evaluation (name, error.type, label) | Enables structured evaluation of agent performance and decision-making. |
Best practices
- Use consistent span attributes: Apply the same attribute names and formats across all agents and tools to simplify querying and analysis.
- Correlate evaluation run IDs: Link trace data with evaluation runs to analyze both quality and performance in a unified view.
- Redact sensitive content: Remove or mask personal data, secrets, and credentials from prompts, tool arguments, and span attributes before they reach telemetry.
Security and privacy
Tracing can capture sensitive information (for example, user inputs, model outputs, and tool arguments and results). Use these practices to reduce risk:
- Don't store secrets, credentials, or tokens in prompts, tool arguments, or span attributes.
- Redact or minimize personal data and other sensitive content before it appears in telemetry.
- Treat trace data as production telemetry and apply the same access controls and retention policies you use for logs and metrics.
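One way to apply the redaction guidance above is to scrub attribute values before they're recorded. This stdlib-only sketch masks values that look like secrets or email addresses; the patterns and the attribute names are illustrative and should be extended for your own secret and PII formats.

```python
import re

# Illustrative patterns; extend for your own secret and PII formats.
SENSITIVE_PATTERNS = [
    re.compile(r"(?i)\b(api[_-]?key|token|password)\s*[:=]\s*\S+"),
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email addresses
]


def redact(value: str) -> str:
    """Replace sensitive substrings with a placeholder before telemetry."""
    for pattern in SENSITIVE_PATTERNS:
        value = pattern.sub("[REDACTED]", value)
    return value


def redact_attributes(attributes: dict) -> dict:
    """Scrub every string attribute on a span before it is exported."""
    return {k: redact(v) if isinstance(v, str) else v for k, v in attributes.items()}


clean = redact_attributes({
    "gen_ai.prompt": "email bob@example.com, api_key=sk-12345",
    "retry.count": 2,
})
```

In an OpenTelemetry pipeline, logic like this would typically run in a custom span processor so that redaction happens before any exporter sees the data.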
Troubleshooting
If traces aren't appearing in the Foundry portal or Application Insights:
- Verify that your Foundry project is connected to an Application Insights resource.
- Check that your account has the required permissions to query telemetry.
- Ensure your agent code includes the necessary instrumentation. For framework-specific setup, see Tracing integrations.
Tip
Tracing is available in all regions where Foundry is supported. Trace data retention and sampling follow your Application Insights configuration. For details, see Data retention and archive in Azure Monitor Logs.