# Telemetry & Metrics

Vision Agents provides built-in observability through [OpenTelemetry](https://opentelemetry.io/). Collect metrics and traces across all components to monitor performance, latency, and errors in your agents.

## Quick Start

To enable metrics collection, configure an OpenTelemetry meter provider before you import and create your agent:

```python theme={null}
# 1. Configure OpenTelemetry
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from prometheus_client import start_http_server

start_http_server(9464)
reader = PrometheusMetricReader()
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

# 2. Now import and create your agent
from vision_agents.core import Agent

agent = Agent(llm=..., stt=..., tts=...)
```

Metrics are now available at `http://localhost:9464/metrics`.

## MetricsCollector

The `MetricsCollector` class subscribes to events from all agent components and records them as OpenTelemetry metrics. Each `Agent` creates a `MetricsCollector` internally, so metrics collection is enabled by default.

If no OpenTelemetry providers are configured, metrics are no-ops and have no performance impact.

For new integrations, prefer the collector's normalized `on_*` metric hooks and `agent.metrics` over provider-specific event classes.

The collector listens to events from:

* **LLM** — Response latency, token usage, tool calls
* **STT** — Transcription latency, audio duration
* **TTS** — Synthesis latency, audio duration, characters
* **Turn Detection** — Turn duration, trailing silence
* **Realtime LLM** — Session metrics, audio I/O, transcriptions
* **VLM** — Inference latency, token usage
* **Video Processors** — Frame processing, detections

### Metric Attributes

All metrics include contextual attributes:

| Attribute    | Description                                  |
| ------------ | -------------------------------------------- |
| `provider`   | The plugin name (e.g., `openai`, `deepgram`) |
| `model`      | Model identifier when available              |
| `error_type` | Exception class name for error metrics       |
| `error_code` | Error code when available                    |

## Metrics Reference

All metrics use the `vision_agents.core` meter namespace.

### STT Metrics

| Metric                  | Type      | Unit | Description                           |
| ----------------------- | --------- | ---- | ------------------------------------- |
| `stt.latency.ms`        | Histogram | ms   | Processing latency for speech-to-text |
| `stt.audio_duration.ms` | Histogram | ms   | Duration of audio processed           |
| `stt.errors`            | Counter   | —    | Total STT errors                      |

### TTS Metrics

| Metric                  | Type      | Unit | Description                   |
| ----------------------- | --------- | ---- | ----------------------------- |
| `tts.latency.ms`        | Histogram | ms   | Synthesis latency             |
| `tts.audio_duration.ms` | Histogram | ms   | Duration of synthesized audio |
| `tts.characters`        | Counter   | —    | Characters synthesized        |
| `tts.errors`            | Counter   | —    | Total TTS errors              |

### LLM Metrics

| Metric                       | Type      | Unit | Description                            |
| ---------------------------- | --------- | ---- | -------------------------------------- |
| `llm.latency.ms`             | Histogram | ms   | Response latency (request to complete) |
| `llm.time_to_first_token.ms` | Histogram | ms   | Time to first token (streaming)        |
| `llm.tokens.input`           | Counter   | —    | Input/prompt tokens consumed           |
| `llm.tokens.output`          | Counter   | —    | Output/completion tokens generated     |
| `llm.tool_calls`             | Counter   | —    | Tool/function calls executed           |
| `llm.tool_latency.ms`        | Histogram | ms   | Tool execution latency                 |
| `llm.errors`                 | Counter   | —    | Total LLM errors                       |

### Turn Detection Metrics

| Metric                     | Type      | Unit | Description                       |
| -------------------------- | --------- | ---- | --------------------------------- |
| `turn.duration.ms`         | Histogram | ms   | Duration of detected speech turns |
| `turn.trailing_silence.ms` | Histogram | ms   | Silence duration before turn end  |

### Realtime LLM Metrics

For speech-to-speech models like OpenAI Realtime:

| Metric                              | Type      | Unit  | Description                   |
| ----------------------------------- | --------- | ----- | ----------------------------- |
| `realtime.sessions`                 | Counter   | —     | Sessions started              |
| `realtime.session_duration.ms`      | Histogram | ms    | Session duration              |
| `realtime.audio.input.bytes`        | Counter   | bytes | Audio bytes sent to LLM       |
| `realtime.audio.output.bytes`       | Counter   | bytes | Audio bytes received from LLM |
| `realtime.audio.input.duration.ms`  | Counter   | ms    | Audio duration sent           |
| `realtime.audio.output.duration.ms` | Counter   | ms    | Audio duration received       |
| `realtime.responses`                | Counter   | —     | Complete responses received   |
| `realtime.transcriptions.user`      | Counter   | —     | User speech transcriptions    |
| `realtime.transcriptions.agent`     | Counter   | —     | Agent speech transcriptions   |
| `realtime.errors`                   | Counter   | —     | Realtime errors               |

### VLM / Vision Metrics

| Metric                     | Type      | Unit | Description                 |
| -------------------------- | --------- | ---- | --------------------------- |
| `vlm.inference.latency.ms` | Histogram | ms   | VLM inference latency       |
| `vlm.inferences`           | Counter   | —    | Inference requests          |
| `vlm.tokens.input`         | Counter   | —    | Input tokens (text + image) |
| `vlm.tokens.output`        | Counter   | —    | Output tokens               |
| `vlm.errors`               | Counter   | —    | VLM errors                  |

### Video Processor Metrics

| Metric                        | Type      | Unit | Description              |
| ----------------------------- | --------- | ---- | ------------------------ |
| `video.frames.processed`      | Counter   | —    | Frames processed         |
| `video.processing.latency.ms` | Histogram | ms   | Frame processing latency |
| `video.detections`            | Counter   | —    | Objects/items detected   |

## AgentMetrics

For in-process metrics without external infrastructure, access aggregated metrics directly from the agent:

```python theme={null}
# After running your agent
metrics = agent.metrics

# STT
print(f"Average STT latency: {metrics.stt_latency_ms__avg.value()} ms")
print(f"Total audio processed: {metrics.stt_audio_duration_ms__total.value()} ms")

# TTS
print(f"Average TTS latency: {metrics.tts_latency_ms__avg.value()} ms")
print(f"Characters synthesized: {metrics.tts_characters__total.value()}")

# LLM
print(f"Average LLM latency: {metrics.llm_latency_ms__avg.value()} ms")
print(f"Input tokens: {metrics.llm_input_tokens__total.value()}")
print(f"Output tokens: {metrics.llm_output_tokens__total.value()}")
print(f"Tool calls: {metrics.llm_tool_calls__total.value()}")
```

### Available AgentMetrics

| Metric                                     | Type    | Description                      |
| ------------------------------------------ | ------- | -------------------------------- |
| `stt_latency_ms__avg`                      | Average | Average STT processing latency   |
| `stt_audio_duration_ms__total`             | Counter | Total audio duration processed   |
| `tts_latency_ms__avg`                      | Average | Average TTS synthesis latency    |
| `tts_audio_duration_ms__total`             | Counter | Total synthesized audio duration |
| `tts_characters__total`                    | Counter | Total characters synthesized     |
| `llm_latency_ms__avg`                      | Average | Average LLM response latency     |
| `llm_time_to_first_token_ms__avg`          | Average | Average time to first token      |
| `llm_input_tokens__total`                  | Counter | Total input tokens               |
| `llm_output_tokens__total`                 | Counter | Total output tokens              |
| `llm_tool_calls__total`                    | Counter | Total tool calls                 |
| `llm_tool_latency_ms__avg`                 | Average | Average tool execution latency   |
| `turn_duration_ms__avg`                    | Average | Average turn duration            |
| `turn_trailing_silence_ms__avg`            | Average | Average trailing silence         |
| `realtime_audio_input_bytes__total`        | Counter | Total audio bytes sent           |
| `realtime_audio_output_bytes__total`       | Counter | Total audio bytes received       |
| `realtime_audio_input_duration_ms__total`  | Counter | Total input audio duration       |
| `realtime_audio_output_duration_ms__total` | Counter | Total output audio duration      |
| `realtime_user_transcriptions__total`      | Counter | Total user transcriptions        |
| `realtime_agent_transcriptions__total`     | Counter | Total agent transcriptions       |
| `vlm_inference_latency_ms__avg`            | Average | Average VLM inference latency    |
| `vlm_inferences__total`                    | Counter | Total VLM inferences             |
| `vlm_input_tokens__total`                  | Counter | Total VLM input tokens           |
| `vlm_output_tokens__total`                 | Counter | Total VLM output tokens          |
| `video_frames_processed__total`            | Counter | Total frames processed           |
| `video_processing_latency_ms__avg`         | Average | Average frame processing latency |

## Prometheus Setup

Export metrics to Prometheus for monitoring dashboards and alerting.

**Step 1 — Install the exporter**

```bash theme={null}
uv add opentelemetry-exporter-prometheus prometheus-client
```

**Step 2 — Configure OpenTelemetry**

```python theme={null}
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from prometheus_client import start_http_server

# Start HTTP server for Prometheus scraping
start_http_server(port=9464)

# Configure OpenTelemetry
reader = PrometheusMetricReader()
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
```

**Step 3 — Create and run your agent**

```python theme={null}
from vision_agents.core import Agent, AgentLauncher, Runner

agent = Agent(...)
# MetricsCollector is automatically attached

# Run with CLI
Runner(AgentLauncher(create_agent=..., join_call=...)).cli()
```

View metrics at `http://localhost:9464/metrics`.

## Tracing with Jaeger

Trace requests across components for debugging latency issues.

**Step 1 — Install the exporter**

```bash theme={null}
uv add opentelemetry-sdk opentelemetry-exporter-otlp
```

**Step 2 — Configure tracing**

```python theme={null}
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

resource = Resource.create({"service.name": "my-agent"})
provider = TracerProvider(resource=resource)
exporter = OTLPSpanExporter(endpoint="localhost:4317", insecure=True)
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
```

**Step 3 — Run Jaeger**

```bash theme={null}
docker run --rm -it \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 16686:16686 -p 4317:4317 -p 4318:4318 \
  jaegertracing/all-in-one:1.51
```

View traces at `http://localhost:16686`.

## Complete Example

```python theme={null}
"""Prometheus metrics example with Vision Agents."""

# Configure OpenTelemetry
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from prometheus_client import start_http_server

start_http_server(9464)
reader = PrometheusMetricReader()
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

# Now import agents
from vision_agents.core import Agent, User, AgentLauncher, Runner
from vision_agents.plugins import deepgram, getstream, gemini, elevenlabs


async def create_agent(**kwargs) -> Agent:
    return Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Metrics Agent", id="agent"),
        instructions="You're a helpful voice assistant.",
        llm=gemini.LLM("gemini-flash-lite-latest"),
        tts=elevenlabs.TTS(),
        stt=deepgram.STT(),
    )


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    # MetricsCollector is automatically attached to the agent
    call = await agent.create_call(call_type, call_id)
    async with agent.join(call):
        await agent.simple_response("Hello! Metrics are being collected.")
        await agent.finish()

    # Print summary after call
    m = agent.metrics
    print(f"LLM latency: {m.llm_latency_ms__avg.value():.1f} ms")
    print(f"Tokens: {m.llm_input_tokens__total.value()} in / {m.llm_output_tokens__total.value()} out")


if __name__ == "__main__":
    Runner(AgentLauncher(create_agent=create_agent, join_call=join_call)).cli()
```

Run with:

```bash theme={null}
uv run agent.py run --call-type default --call-id test
```

Metrics are available at `http://localhost:9464/metrics`.

## Example Prometheus Queries

<Note>
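  OpenTelemetry metric names use dots (e.g., `llm.latency.ms`). The Prometheus exporter converts dots to underscores (e.g., `llm_latency_ms`), and counters are exposed with a `_total` suffix (e.g., `llm_errors_total`). Check the `/metrics` endpoint for the exact names your exporter version emits.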
</Note>
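
The name mapping can be approximated with a small helper when you build queries programmatically. This is a rough sketch, not the exporter's actual implementation: `to_prometheus_name` is a hypothetical function, and the `_total` suffix for counters is an assumption based on the `llm_errors_total` query in this section. Some exporter versions also append unit suffixes, so confirm the result against the `/metrics` output.

```python
def to_prometheus_name(otel_name: str, is_counter: bool = False) -> str:
    """Approximate the Prometheus name for an OpenTelemetry metric name."""
    # Dots are not valid in classic Prometheus metric names.
    name = otel_name.replace(".", "_")
    # Assumption: counters are exposed with a `_total` suffix.
    if is_counter and not name.endswith("_total"):
        name += "_total"
    return name


print(to_prometheus_name("llm.latency.ms"))
print(to_prometheus_name("llm.errors", is_counter=True))
```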

**Average LLM latency over time:**

```promql theme={null}
rate(llm_latency_ms_sum[5m]) / rate(llm_latency_ms_count[5m])
```
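
This query works because a histogram exports a running `_sum` and `_count`; dividing their increases over the same window gives the mean observation in that window. The arithmetic can be sketched in Python as follows. The function name and the sample values are made up for illustration, and the sketch ignores counter resets and the extrapolation that `rate()` applies.

```python
from typing import Optional


def average_latency_ms(
    sum_start: float, count_start: float, sum_end: float, count_end: float
) -> Optional[float]:
    """Mean latency over a window, from histogram sum/count at its two ends."""
    delta_count = count_end - count_start
    if delta_count <= 0:
        # No observations in the window (or a counter reset): no average to report.
        return None
    return (sum_end - sum_start) / delta_count


# Hypothetical scrape values five minutes apart.
print(average_latency_ms(10_000, 20, 55_000, 120))
```

Here 100 requests added 45,000 ms of total latency, so the window average is 450 ms.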

**Total tokens used:**

```promql theme={null}
sum(llm_tokens_input_total) + sum(llm_tokens_output_total)
```

**LLM errors per second:**

```promql theme={null}
rate(llm_errors_total[5m])
```

## Best Practices

**Configure OpenTelemetry first** - Set up a meter provider to export metrics. Without one, metrics are no-ops with no performance impact.

**MetricsCollector is automatic** - Each `Agent` creates a `MetricsCollector` internally, so no extra wiring is needed.

**Use AgentMetrics for simple logging** - Access `agent.metrics` directly for in-process metrics without external infrastructure.

**Add resource attributes** - Include service name and environment in your metrics:

```python theme={null}
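from opentelemetry import metrics
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.resources import Resource

# Attributes attached to every metric exported through this provider
resource = Resource.create({
    "service.name": "my-agent",
    "service.version": "1.0.0",
    "deployment.environment": "production",
})
reader = PrometheusMetricReader()
provider = MeterProvider(resource=resource, metric_readers=[reader])
metrics.set_meter_provider(provider)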
```

**Set up alerting on:**

* p95 LLM latency > 2000 ms
* Error rate > 1%
* Token usage anomalies
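
The first two thresholds can be made concrete with a short sketch of the arithmetic. In practice these checks would usually live in your monitoring system's alerting rules; the `p95` and `alert_reasons` helpers and the sample data below are hypothetical and only illustrate what the thresholds mean.

```python
def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    if not samples:
        raise ValueError("p95 requires at least one sample")
    ordered = sorted(samples)
    # Integer arithmetic for ceil(0.95 * n), avoiding float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def alert_reasons(
    latencies_ms: list[float],
    errors: int,
    requests: int,
    latency_limit_ms: float = 2000.0,
    error_limit: float = 0.01,
) -> list[str]:
    """Return which of the two thresholds are exceeded."""
    reasons = []
    if latencies_ms and p95(latencies_ms) > latency_limit_ms:
        reasons.append("latency")
    if requests > 0 and errors / requests > error_limit:
        reasons.append("errors")
    return reasons


# Hypothetical window: 2 of 20 requests are slow, 3 of 200 requests failed.
slow_window = [300.0] * 18 + [2500.0, 2600.0]
print(alert_reasons(slow_window, errors=3, requests=200))
print(alert_reasons([300.0] * 20, errors=1, requests=200))
```

The first window trips both checks (p95 is 2500 ms and the error ratio is 1.5%), while the second trips neither.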

## Next Steps

* [Kubernetes Deployment](/guides/kubernetes-deployment) - Helm chart with Prometheus and Grafana out of the box
* [Built-in HTTP Server](/guides/http-server) - Console mode and HTTP server for session management