Vision Agents provides built-in observability through OpenTelemetry. Collect metrics and traces across all components to monitor performance, latency, and errors in your agents.
Quick Start
To enable metrics collection, configure OpenTelemetry:
```python
# 1. Configure OpenTelemetry first
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from prometheus_client import start_http_server

start_http_server(9464)
reader = PrometheusMetricReader()
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

# 2. Now import and create your agent
from vision_agents.core import Agent

agent = Agent(llm=..., stt=..., tts=...)
```
Metrics are now available at http://localhost:9464/metrics.
MetricsCollector
The MetricsCollector class subscribes to events from all agent components and records OpenTelemetry metrics automatically. Each Agent automatically creates a MetricsCollector internally, so metrics collection is enabled by default.
If no OpenTelemetry providers are configured, metrics are no-ops and have no performance impact.
The collector listens to events from:
- LLM — Response latency, token usage, tool calls
- STT — Transcription latency, audio duration
- TTS — Synthesis latency, audio duration, characters
- Turn Detection — Turn duration, trailing silence
- Realtime LLM — Session metrics, audio I/O, transcriptions
- VLM — Inference latency, token usage
- Video Processors — Frame processing, detections
Metric Attributes
All metrics include contextual attributes:
| Attribute | Description |
|---|---|
| provider | The plugin name (e.g., openai, deepgram) |
| model | Model identifier when available |
| error_type | Exception class name for error metrics |
| error_code | Error code when available |
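Because every metric carries these attributes, you can slice by them in your backend. As an illustration only (assuming the default dot-to-underscore name mapping of the Prometheus exporter), a per-provider average latency query might look like:

```promql
sum by (provider) (rate(llm_latency_ms_sum[5m]))
  / sum by (provider) (rate(llm_latency_ms_count[5m]))
```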
Metrics Reference
All metrics use the vision_agents.core meter namespace.
STT Metrics
| Metric | Type | Unit | Description |
|---|---|---|---|
| stt.latency.ms | Histogram | ms | Processing latency for speech-to-text |
| stt.audio_duration.ms | Histogram | ms | Duration of audio processed |
| stt.errors | Counter | — | Total STT errors |
TTS Metrics
| Metric | Type | Unit | Description |
|---|---|---|---|
| tts.latency.ms | Histogram | ms | Synthesis latency |
| tts.audio_duration.ms | Histogram | ms | Duration of synthesized audio |
| tts.characters | Counter | — | Characters synthesized |
| tts.errors | Counter | — | Total TTS errors |
LLM Metrics
| Metric | Type | Unit | Description |
|---|---|---|---|
| llm.latency.ms | Histogram | ms | Response latency (request to complete) |
| llm.time_to_first_token.ms | Histogram | ms | Time to first token (streaming) |
| llm.tokens.input | Counter | — | Input/prompt tokens consumed |
| llm.tokens.output | Counter | — | Output/completion tokens generated |
| llm.tool_calls | Counter | — | Tool/function calls executed |
| llm.tool_latency.ms | Histogram | ms | Tool execution latency |
| llm.errors | Counter | — | Total LLM errors |
Turn Detection Metrics
| Metric | Type | Unit | Description |
|---|---|---|---|
| turn.duration.ms | Histogram | ms | Duration of detected speech turns |
| turn.trailing_silence.ms | Histogram | ms | Silence duration before turn end |
Realtime LLM Metrics
For speech-to-speech models like OpenAI Realtime:
| Metric | Type | Unit | Description |
|---|---|---|---|
| realtime.sessions | Counter | — | Sessions started |
| realtime.session_duration.ms | Histogram | ms | Session duration |
| realtime.audio.input.bytes | Counter | bytes | Audio bytes sent to LLM |
| realtime.audio.output.bytes | Counter | bytes | Audio bytes received from LLM |
| realtime.audio.input.duration.ms | Counter | ms | Audio duration sent |
| realtime.audio.output.duration.ms | Counter | ms | Audio duration received |
| realtime.responses | Counter | — | Complete responses received |
| realtime.transcriptions.user | Counter | — | User speech transcriptions |
| realtime.transcriptions.agent | Counter | — | Agent speech transcriptions |
| realtime.errors | Counter | — | Realtime errors |
VLM / Vision Metrics
| Metric | Type | Unit | Description |
|---|---|---|---|
| vlm.inference.latency.ms | Histogram | ms | VLM inference latency |
| vlm.inferences | Counter | — | Inference requests |
| vlm.tokens.input | Counter | — | Input tokens (text + image) |
| vlm.tokens.output | Counter | — | Output tokens |
| vlm.errors | Counter | — | VLM errors |
Video Processor Metrics
| Metric | Type | Unit | Description |
|---|---|---|---|
| video.frames.processed | Counter | — | Frames processed |
| video.processing.latency.ms | Histogram | ms | Frame processing latency |
| video.detections | Counter | — | Objects/items detected |
AgentMetrics
For in-process metrics without external infrastructure, access aggregated metrics directly from the agent:
```python
# After running your agent
metrics = agent.metrics

# STT
print(f"Average STT latency: {metrics.stt_latency_ms__avg.value()} ms")
print(f"Total audio processed: {metrics.stt_audio_duration_ms__total.value()} ms")

# TTS
print(f"Average TTS latency: {metrics.tts_latency_ms__avg.value()} ms")
print(f"Characters synthesized: {metrics.tts_characters__total.value()}")

# LLM
print(f"Average LLM latency: {metrics.llm_latency_ms__avg.value()} ms")
print(f"Input tokens: {metrics.llm_input_tokens__total.value()}")
print(f"Output tokens: {metrics.llm_output_tokens__total.value()}")
print(f"Tool calls: {metrics.llm_tool_calls__total.value()}")
```
Available AgentMetrics
| Metric | Type | Description |
|---|---|---|
| stt_latency_ms__avg | Average | Average STT processing latency |
| stt_audio_duration_ms__total | Counter | Total audio duration processed |
| tts_latency_ms__avg | Average | Average TTS synthesis latency |
| tts_audio_duration_ms__total | Counter | Total synthesized audio duration |
| tts_characters__total | Counter | Total characters synthesized |
| llm_latency_ms__avg | Average | Average LLM response latency |
| llm_time_to_first_token_ms__avg | Average | Average time to first token |
| llm_input_tokens__total | Counter | Total input tokens |
| llm_output_tokens__total | Counter | Total output tokens |
| llm_tool_calls__total | Counter | Total tool calls |
| llm_tool_latency_ms__avg | Average | Average tool execution latency |
| turn_duration_ms__avg | Average | Average turn duration |
| turn_trailing_silence_ms__avg | Average | Average trailing silence |
| realtime_audio_input_bytes__total | Counter | Total audio bytes sent |
| realtime_audio_output_bytes__total | Counter | Total audio bytes received |
| realtime_audio_input_duration_ms__total | Counter | Total input audio duration |
| realtime_audio_output_duration_ms__total | Counter | Total output audio duration |
| realtime_user_transcriptions__total | Counter | Total user transcriptions |
| realtime_agent_transcriptions__total | Counter | Total agent transcriptions |
| vlm_inference_latency_ms__avg | Average | Average VLM inference latency |
| vlm_inferences__total | Counter | Total VLM inferences |
| vlm_input_tokens__total | Counter | Total VLM input tokens |
| vlm_output_tokens__total | Counter | Total VLM output tokens |
| video_frames_processed__total | Counter | Total frames processed |
| video_processing_latency_ms__avg | Average | Average frame processing latency |
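The `__avg` / `__total` suffixes reflect two simple in-process aggregates: a running mean over observed samples and a running sum. A minimal stdlib sketch of that behavior (the actual AgentMetrics implementation may differ):

```python
class Average:
    """Running mean over observed samples (e.g. latency __avg metrics)."""

    def __init__(self) -> None:
        self._sum = 0.0
        self._count = 0

    def record(self, sample: float) -> None:
        self._sum += sample
        self._count += 1

    def value(self) -> float:
        # Return 0.0 before any samples to avoid division by zero
        return self._sum / self._count if self._count else 0.0


class Total:
    """Running sum over observed samples (e.g. token __total metrics)."""

    def __init__(self) -> None:
        self._total = 0

    def record(self, amount: int) -> None:
        self._total += amount

    def value(self) -> int:
        return self._total


latency = Average()
latency.record(120.0)
latency.record(180.0)

tokens = Total()
tokens.record(350)
tokens.record(150)

print(latency.value())  # 150.0
print(tokens.value())   # 500
```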
Prometheus Setup
Export metrics to Prometheus for monitoring dashboards and alerting.
Step 1 — Install the exporter
```bash
uv add opentelemetry-exporter-prometheus prometheus-client
```
Step 2 — Configure OpenTelemetry
```python
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from prometheus_client import start_http_server

# Start HTTP server for Prometheus scraping
start_http_server(port=9464)

# Configure OpenTelemetry
reader = PrometheusMetricReader()
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
```
Step 3 — Create and run your agent
```python
from vision_agents.core import Agent, AgentLauncher, Runner

agent = Agent(...)
# MetricsCollector is automatically attached

# Run with CLI
Runner(AgentLauncher(create_agent=..., join_call=...)).cli()
```
View metrics at http://localhost:9464/metrics.
Tracing with Jaeger
Trace requests across components for debugging latency issues.
Step 1 — Install the exporter
```bash
uv add opentelemetry-sdk opentelemetry-exporter-otlp
```
Step 2 — Configure tracing
```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

resource = Resource.create({"service.name": "my-agent"})
provider = TracerProvider(resource=resource)
exporter = OTLPSpanExporter(endpoint="localhost:4317", insecure=True)
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
```
Step 3 — Run Jaeger
```bash
docker run --rm -it \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 -p 4317:4317 -p 4318:4318 \
  jaegertracing/all-in-one:1.51
```
View traces at http://localhost:16686.
Complete Example
"""Prometheus metrics example with Vision Agents."""
# Configure OpenTelemetry
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from prometheus_client import start_http_server
start_http_server(9464)
reader = PrometheusMetricReader()
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
# Now import agents
from vision_agents.core import Agent, User, AgentLauncher, Runner
from vision_agents.plugins import deepgram, getstream, gemini, elevenlabs
async def create_agent(**kwargs) -> Agent:
return Agent(
edge=getstream.Edge(),
agent_user=User(name="Metrics Agent", id="agent"),
instructions="You're a helpful voice assistant.",
llm=gemini.LLM("gemini-3.1-flash-lite-preview"),
tts=elevenlabs.TTS(),
stt=deepgram.STT(),
)
async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
# MetricsCollector is automatically attached to the agent
call = await agent.create_call(call_type, call_id)
async with agent.join(call):
await agent.simple_response("Hello! Metrics are being collected.")
await agent.finish()
# Print summary after call
m = agent.metrics
print(f"LLM latency: {m.llm_latency_ms__avg.value():.1f} ms")
print(f"Tokens: {m.llm_input_tokens__total.value()} in / {m.llm_output_tokens__total.value()} out")
if __name__ == "__main__":
Runner(AgentLauncher(create_agent=create_agent, join_call=join_call)).cli()
Run with:
```bash
uv run agent.py run --call-type default --call-id test
```
Metrics available at http://localhost:9464/metrics.
Example Prometheus Queries
OpenTelemetry metric names use dots (e.g., llm.latency.ms). Prometheus converts these to underscores when scraping (e.g., llm_latency_ms).
Average LLM latency over time:
```promql
rate(llm_latency_ms_sum[5m]) / rate(llm_latency_ms_count[5m])
```
Total tokens used:
```promql
sum(llm_tokens_input_total) + sum(llm_tokens_output_total)
```
Error rate:
```promql
rate(llm_errors_total[5m])
```
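The rate-ratio query for average latency reduces to Δsum / Δcount between two scrapes, since the shared time window cancels. A stdlib sketch with hypothetical scrape values:

```python
# Two scrapes of the llm_latency_ms histogram, 5 minutes apart (hypothetical values)
scrape_t0 = {"llm_latency_ms_sum": 12_000.0, "llm_latency_ms_count": 10}
scrape_t1 = {"llm_latency_ms_sum": 30_000.0, "llm_latency_ms_count": 22}

# rate(sum) / rate(count) over the same window reduces to delta_sum / delta_count
delta_sum = scrape_t1["llm_latency_ms_sum"] - scrape_t0["llm_latency_ms_sum"]
delta_count = scrape_t1["llm_latency_ms_count"] - scrape_t0["llm_latency_ms_count"]

avg_latency_ms = delta_sum / delta_count
print(avg_latency_ms)  # 1500.0
```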
Best Practices
Configure OpenTelemetry - Set up providers before creating your agent to enable metric collection; without configured providers, all metrics are no-ops with no performance impact.
MetricsCollector is automatic - Each Agent creates a MetricsCollector internally, so no extra wiring is needed to start collecting.
Use AgentMetrics for simple logging - Access agent.metrics directly for in-process metrics without external infrastructure.
Add resource attributes - Include service name and environment in your metrics:
```python
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.resources import Resource

resource = Resource.create({
    "service.name": "my-agent",
    "service.version": "1.0.0",
    "deployment.environment": "production",
})
# reader is the PrometheusMetricReader from the setup above
provider = MeterProvider(resource=resource, metric_readers=[reader])
```
Set up alerting on:
- LLM latency > 2000ms (p95)
- Error rate > 1%
- Token usage anomalies
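For the p95 latency alert, a Prometheus expression along these lines could work (llm_latency_ms_bucket assumes the default dot-to-underscore name conversion; adjust to your exporter's actual naming):

```promql
histogram_quantile(0.95, sum by (le) (rate(llm_latency_ms_bucket[5m]))) > 2000
```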
Next Steps