Agent Observability

Full-stack monitoring, tracing, and debugging for production AI agents

Standard APM tools miss most of what matters in AI systems: prompt content, token budgets, retrieval quality, and reasoning chains. Our observability stack is built specifically for agents. It captures these signals alongside conventional latency and error metrics and surfaces them in traces, dashboards, and alerts.

Distributed Tracing

What Gets Traced Automatically

Every instrumented agent records:

Span Type	Captured Data
LLM request	Model, prompt tokens, completion tokens, latency, cost, finish reason
Tool call	Tool name, input arguments, output, duration, error
Memory read/write	Query, results, namespace, latency
Agent handoff	From/to agent, context passed, reason
Retrieval	Query, top-k results, scores, source documents
Workflow step	Step name, status, retry count, checkpoint state
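The following TypeScript sketch makes the table concrete. It models three of these span types as a discriminated union. The field names (`costUsd`, `finishReason`, and so on) and the `summarizeSpan` helper are hypothetical illustrations, not the SDK's actual schema.

```typescript
// Hypothetical span shapes for three rows of the table; not the SDK schema.
interface BaseSpan {
  traceId: string;
  spanId: string;
  startMs: number; // epoch milliseconds
  durationMs: number;
}

interface LlmSpan extends BaseSpan {
  kind: "llm";
  model: string;
  promptTokens: number;
  completionTokens: number;
  costUsd: number;
  finishReason: string;
}

interface ToolSpan extends BaseSpan {
  kind: "tool";
  toolName: string;
  input: unknown;
  output?: unknown;
  error?: string;
}

interface RetrievalSpan extends BaseSpan {
  kind: "retrieval";
  query: string;
  results: { sourceId: string; score: number }[];
}

type AgentSpan = LlmSpan | ToolSpan | RetrievalSpan;

// One-line human-readable summary per span, switching on the discriminant.
function summarizeSpan(span: AgentSpan): string {
  switch (span.kind) {
    case "llm":
      return `${span.model}: ${span.promptTokens}+${span.completionTokens} tokens, $${span.costUsd.toFixed(4)}`;
    case "tool":
      return `${span.toolName}: ${span.error ? "error" : "ok"} in ${span.durationMs} ms`;
    case "retrieval":
      return `retrieval: ${span.results.length} result(s) for "${span.query}"`;
    default: {
      // Compile-time exhaustiveness check: fails if a new kind is unhandled.
      const unreachable: never = span;
      return unreachable;
    }
  }
}
```

The `never` assignment in the `default` branch makes the compiler report an error if a span type is added to the union but not handled.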

Trace Visualization

Waterfall timeline showing parallelism and bottlenecks
Cost and token breakdown per span
Input/output diff viewer for LLM calls
Side-by-side comparison of two trace runs
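At its core, a waterfall view places each span relative to the start of the trace and marks spans that overlap in time. This sketch makes two simplifying assumptions: spans form a flat list of siblings (no parent/child nesting), and the single longest span counts as the bottleneck. `TimedSpan`, `toWaterfall`, and `bottleneck` are hypothetical names, not part of the SDK.

```typescript
// Hypothetical, simplified span: a flat list of siblings, no nesting.
interface TimedSpan {
  name: string;
  startMs: number;
  endMs: number;
}

interface WaterfallRow {
  name: string;
  offsetMs: number; // start relative to the earliest span in the trace
  durationMs: number;
  parallel: boolean; // true if this span overlaps any other span in time
}

function toWaterfall(spans: TimedSpan[]): WaterfallRow[] {
  if (spans.length === 0) return [];
  // Order rows by start time, as a timeline would display them.
  const sorted = [...spans].sort((a, b) => a.startMs - b.startMs);
  const traceStart = sorted[0].startMs;
  return sorted.map((s, i) => ({
    name: s.name,
    offsetMs: s.startMs - traceStart,
    durationMs: s.endMs - s.startMs,
    // Two intervals overlap when each starts before the other ends.
    parallel: sorted.some(
      (o, j) => j !== i && o.startMs < s.endMs && s.startMs < o.endMs,
    ),
  }));
}

// Simple heuristic: the longest span is treated as the bottleneck.
function bottleneck(rows: WaterfallRow[]): WaterfallRow | undefined {
  return rows.reduce<WaterfallRow | undefined>(
    (best, r) => (best === undefined || r.durationMs > best.durationMs ? r : best),
    undefined,
  );
}
```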

Sampling

Always sample errors, slow traces, and high-cost runs
Probabilistic sampling for baseline traffic (configurable rate)
Tail-based sampling, which decides whether to keep a trace after it completes, so interesting traces are retained whether they succeed or fail
Sampling strategy can be changed without code changes
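The sketch below shows how these rules might combine into a single tail-based keep/drop decision, made once a trace has finished. The `CompletedTrace` and `SamplingPolicy` shapes and any threshold values are assumptions. Because the policy is plain data, the strategy can change without touching application code.

```typescript
// Hypothetical shapes; thresholds come from configuration, not code.
interface CompletedTrace {
  traceId: string;
  hasError: boolean;
  durationMs: number;
  costUsd: number;
}

interface SamplingPolicy {
  slowMs: number; // traces at or above this duration are always kept
  highCostUsd: number; // traces at or above this cost are always kept
  baselineRate: number; // fraction (0..1) of remaining traces to keep
}

// Runs after the trace completes, so the full outcome is known (tail-based).
function shouldKeep(
  trace: CompletedTrace,
  policy: SamplingPolicy,
  random: () => number = Math.random,
): boolean {
  // Always-sample rules: errors, slow traces, and high-cost runs.
  if (trace.hasError) return true;
  if (trace.durationMs >= policy.slowMs) return true;
  if (trace.costUsd >= policy.highCostUsd) return true;
  // Everything else is kept with probability baselineRate.
  return random() < policy.baselineRate;
}
```

Injecting `random` keeps the probabilistic branch testable. In normal use, the default `Math.random` applies.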

Cost Analytics

Track LLM spend at the same per-request granularity that software observability tools give you for latency and errors.

Dashboards include:

Daily and weekly spend by model, agent, and workflow
Cost per successful task completion
Token efficiency ratio (output tokens per dollar)
Spend forecasting based on current trajectory
Per-user and per-team cost attribution
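Two of these dashboard metrics are simple ratios. The sketch below uses a hypothetical `RunRecord` per agent run. It also assumes that cost per successful completion includes the spend of failed runs, so failures make each success more expensive.

```typescript
// Hypothetical per-run record; the product computes these metrics for you.
interface RunRecord {
  agent: string;
  succeeded: boolean;
  costUsd: number;
  outputTokens: number;
}

// All spend (including failed runs) divided by the number of successful runs.
function costPerSuccess(runs: RunRecord[]): number {
  const total = runs.reduce((sum, r) => sum + r.costUsd, 0);
  const successes = runs.filter((r) => r.succeeded).length;
  return successes === 0 ? Infinity : total / successes;
}

// Token efficiency ratio: output tokens per dollar spent (0 if nothing spent).
function tokenEfficiency(runs: RunRecord[]): number {
  const total = runs.reduce((sum, r) => sum + r.costUsd, 0);
  const tokens = runs.reduce((sum, r) => sum + r.outputTokens, 0);
  return total === 0 ? 0 : tokens / total;
}

// Spend grouped by agent, the kind of rollup behind "spend by agent".
function spendByAgent(runs: RunRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of runs) {
    totals.set(r.agent, (totals.get(r.agent) ?? 0) + r.costUsd);
  }
  return totals;
}
```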

Alerts:

Budget threshold warnings (50%, 80%, 100%)
Cost spike detection (>2× day-over-day)
High-cost trace flagging for manual review
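The budget and spike alerts reduce to threshold comparisons. This sketch takes the 50/80/100% levels and the 2× factor from the list above. The function names are hypothetical.

```typescript
// Budget warning levels from the list above, as fractions of the budget.
const BUDGET_LEVELS = [0.5, 0.8, 1.0] as const;

// Returns every budget level the current spend has reached.
function budgetWarnings(spendUsd: number, budgetUsd: number): number[] {
  return BUDGET_LEVELS.filter((level) => spendUsd >= level * budgetUsd);
}

// Flags a spike when today's spend is more than `factor` times yesterday's.
function isCostSpike(todayUsd: number, yesterdayUsd: number, factor = 2): boolean {
  return todayUsd > factor * yesterdayUsd;
}
```

Under this rule, if yesterday's spend was zero, any nonzero spend today counts as a spike.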

Quality Signals

Observability for AI goes beyond latency and error rates.

Hallucination Detection

Confidence scoring on factual claims using a lightweight verifier model
Source attribution checks (did the agent cite something it wasn't given?)
Flagging answers that contradict retrieved context
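A source attribution check is essentially a set difference between what the answer cites and what the agent was given. This sketch assumes citations are represented as source document IDs, which is an assumption about the answer format. `unsupportedCitations` is a hypothetical helper.

```typescript
// Returns cited source IDs that were never provided to the agent.
function unsupportedCitations(
  citedIds: string[],
  providedIds: string[],
): string[] {
  const provided = new Set(providedIds);
  // Deduplicate citations, then keep only those with no matching source.
  return Array.from(new Set(citedIds)).filter((id) => !provided.has(id));
}
```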

Retrieval Quality

Precision@k tracking for RAG pipelines
Context relevance scoring per retrieved chunk
Unused context detection (retrieved context that adds cost but has no impact on the output)
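Precision@k is the fraction of the top k retrieved chunks that are relevant. The sketch below assumes relevance labels for each query are available as a set of chunk IDs, for example from human review or a judge model. The function names are hypothetical.

```typescript
// Precision@k for one query.
function precisionAtK(
  retrievedIds: string[], // ranked, best first
  relevantIds: Set<string>,
  k: number,
): number {
  if (k <= 0) throw new RangeError("k must be positive");
  const topK = retrievedIds.slice(0, k);
  const hits = topK.filter((id) => relevantIds.has(id)).length;
  return hits / k;
}

// Mean precision@k across queries, the figure you would track over time.
function meanPrecisionAtK(
  queries: { retrievedIds: string[]; relevantIds: Set<string> }[],
  k: number,
): number {
  if (queries.length === 0) return 0;
  const total = queries.reduce(
    (sum, q) => sum + precisionAtK(q.retrievedIds, q.relevantIds, k),
    0,
  );
  return total / queries.length;
}
```

This follows the standard definition and divides by `k`, not by the number of results returned. As a result, a query that returns fewer than `k` chunks is penalized.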

Response Quality

Structured output validation failures tracked per schema
Refusal and safety filter activation rates
Response length distribution and truncation events
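Per-schema failure tracking and truncation detection can be sketched as a small counter. `length` and `max_tokens` are the finish reasons the OpenAI and Anthropic APIs report when output stops at the token limit. `ResponseQualityTracker` itself is hypothetical.

```typescript
// Finish reasons that indicate output was cut off at the token limit.
const TRUNCATION_REASONS = new Set(["length", "max_tokens"]);

class ResponseQualityTracker {
  // Validation counts keyed by schema name.
  private counts = new Map<string, { total: number; failed: number }>();
  // Number of responses that stopped because of the token limit.
  truncations = 0;

  record(schema: string, valid: boolean, finishReason: string): void {
    const c = this.counts.get(schema) ?? { total: 0, failed: 0 };
    c.total += 1;
    if (!valid) c.failed += 1;
    this.counts.set(schema, c);
    if (TRUNCATION_REASONS.has(finishReason)) this.truncations += 1;
  }

  // Fraction of responses for this schema that failed validation.
  failureRate(schema: string): number {
    const c = this.counts.get(schema);
    return c && c.total > 0 ? c.failed / c.total : 0;
  }
}
```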

Debugging Tools

Trace Replay

Re-execute any historical trace with modified inputs, prompts, or model parameters, then compare the outputs side by side, without reconstructing the full context manually.

Session Inspection

Full conversation view with:

User messages and agent responses
Internal reasoning steps (chain-of-thought)
All tool calls and their results
Memory reads at each turn
Token count per message

Prompt Versioning

Track prompt changes across deployments, and A/B test prompt versions on production traffic, comparing them by cost, latency, and quality metrics.
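A prompt A/B comparison starts with per-version aggregates of the three metrics. This sketch assumes a hypothetical `PromptRun` record per production request, with a quality score from some evaluator. It computes means only.

```typescript
// Hypothetical record for one production request.
interface PromptRun {
  promptVersion: string;
  costUsd: number;
  latencyMs: number;
  quality: number; // e.g. an evaluator score in [0, 1]
}

interface VersionSummary {
  runs: number;
  meanCostUsd: number;
  meanLatencyMs: number;
  meanQuality: number;
}

function compareVersions(runs: PromptRun[]): Record<string, VersionSummary> {
  // Accumulate sums per prompt version.
  const sums: Record<string, { n: number; cost: number; latency: number; quality: number }> = {};
  for (const r of runs) {
    const s = (sums[r.promptVersion] ??= { n: 0, cost: 0, latency: 0, quality: 0 });
    s.n += 1;
    s.cost += r.costUsd;
    s.latency += r.latencyMs;
    s.quality += r.quality;
  }
  // Turn sums into means.
  const out: Record<string, VersionSummary> = {};
  for (const version of Object.keys(sums)) {
    const s = sums[version];
    out[version] = {
      runs: s.n,
      meanCostUsd: s.cost / s.n,
      meanLatencyMs: s.latency / s.n,
      meanQuality: s.quality / s.n,
    };
  }
  return out;
}
```

A difference in means does not by itself show that one version is better. That conclusion would also require a significance test or confidence intervals, which this sketch leaves out.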

Alerting

Alert Type	Default Threshold	Configurable
Error rate spike	>5% over 5 min	Yes
p95 latency degradation	>2× baseline	Yes
LLM cost overrun	>150% of budget	Yes
Tool failure rate	>10% over 1 hour	Yes
Hallucination score	>0.3 average	Yes
Agent stuck / timeout	>configured timeout	Yes
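The first two default rules can be sketched as follows, with p95 computed by the nearest-rank method. The `RequestSample` shape and the source of the baseline p95 are assumptions. The table gives no window for the latency rule, so applying the same 5-minute window to it is a choice made here.

```typescript
// Hypothetical per-request sample.
interface RequestSample {
  timestampMs: number;
  latencyMs: number;
  isError: boolean;
}

// Nearest-rank percentile: the smallest value with at least p% of values <= it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank - 1, 0)];
}

function evaluateAlerts(
  samples: RequestSample[],
  nowMs: number,
  baselineP95Ms: number,
): { errorRateSpike: boolean; latencyDegraded: boolean } {
  const windowMs = 5 * 60 * 1000; // 5-minute window
  const recent = samples.filter(
    (s) => s.timestampMs > nowMs - windowMs && s.timestampMs <= nowMs,
  );
  if (recent.length === 0) return { errorRateSpike: false, latencyDegraded: false };
  const errorRate = recent.filter((s) => s.isError).length / recent.length;
  const p95 = percentile(recent.map((s) => s.latencyMs), 95);
  return {
    errorRateSpike: errorRate > 0.05, // default: >5% over 5 min
    latencyDegraded: p95 > 2 * baselineP95Ms, // default: >2x baseline
  };
}
```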

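Alerts can be delivered to Slack, PagerDuty, OpsGenie, email, and custom webhooks.

Integration

Zero-Code Instrumentation

Add the SDK and call instrument() once when your service starts. From then on, every LLM call made through a supported client library is traced automatically: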

import { instrument } from '@assistance/observe'

// Initialize tracing for this service; the API key is read from the environment
instrument({
  serviceName: 'my-agent',
  endpoint: 'https://ingest.observe.assistance.bg',
  apiKey: process.env.OBSERVE_API_KEY,
})

// From here on, OpenAI, Anthropic, and LangChain calls are traced automatically

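Manual Spans

Wrap custom operations, such as a search against your own vector database, in spans of their own:

import { tracer } from '@assistance/observe'

// Create a span around a custom retrieval step
const span = tracer.startSpan('custom-retrieval')
try {
  const results = await myVectorDb.search(query)
  // Record how many results the search returned
  span.setAttributes({ resultCount: results.length })
} finally {
  // End the span even if the search throws, so it is never left open
  span.end()
}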
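Putting start, try/finally, and end around every custom operation gets repetitive, so you could wrap that bookkeeping in a small helper. This sketch types the tracer structurally from the calls used in the manual span example (`startSpan`, `setAttributes`, `end`). The attribute value types, the `error`/`errorMessage` attribute names, and `withSpan` itself are assumptions, not documented SDK behavior.

```typescript
// Assumed attribute value types; the SDK may accept more.
type AttributeValue = string | number | boolean;

// Structural types covering only the calls used in the manual span example.
interface SpanLike {
  setAttributes(attrs: Record<string, AttributeValue>): void;
  end(): void;
}

interface TracerLike {
  startSpan(name: string): SpanLike;
}

// Runs `fn` inside a span, records failures, and always ends the span.
async function withSpan<T>(
  tracer: TracerLike,
  name: string,
  fn: (span: SpanLike) => Promise<T>,
): Promise<T> {
  const span = tracer.startSpan(name);
  try {
    return await fn(span);
  } catch (err) {
    // Hypothetical attribute names for marking a failed operation.
    span.setAttributes({ error: true, errorMessage: String(err) });
    throw err;
  } finally {
    span.end();
  }
}
```

With the SDK's `tracer`, the manual span example becomes `withSpan(tracer, 'custom-retrieval', async (span) => { ... })`, which resolves to whatever the callback returns.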

Framework Support

Works out of the box with LangChain, LangGraph, CrewAI, AutoGen, custom agent loops, and any other framework that makes its model calls through a supported LLM client library, such as the OpenAI or Anthropic SDKs.

Data Retention

Data type	Retention	Resolution
Full traces	30 days	Raw
Aggregated metrics	13 months	1-minute
Cost data	24 months	Per-request
Anomaly events	24 months	Raw

Getting Started

Install the SDK in under 10 minutes. We'll walk you through instrumenting your first agent and setting up your dashboards.

Set up agent observability →
