Vision Agents can run as a single process on your laptop or as a scaled-out service in Kubernetes. This guide maps the path between the two.

The path to production

Run locally

Start your agent as an HTTP server with session management. This is the foundation for everything that follows.
uv run agent.py serve
The server handles session creation, health checks, authentication, and metrics out of the box.
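Once the server is running, a quick way to confirm it is reachable is to poll its health check from a script. The sketch below uses only the Python standard library. The host, port, and `/health` path are assumptions for illustration, not documented values; take the real endpoint from the Built-in HTTP Server page.

```python
import time
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx HTTP error all count as unhealthy.
        return False


def wait_until_healthy(url: str, attempts: int = 10, delay: float = 0.5) -> bool:
    """Poll `url` until it reports healthy or the attempts run out."""
    for _ in range(attempts):
        if is_healthy(url):
            return True
        time.sleep(delay)
    return False


# Example call. ASSUMPTION: the address and path below are placeholders.
# wait_until_healthy("http://localhost:8000/health")
```

The same check works unchanged against a container or a Kubernetes service once the URL points at it.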

Built-in HTTP Server

API endpoints, session limits, and authentication

Containerize

Package your agent into a Docker image for deployment to any environment.

Docker Deployment

Dockerfiles for CPU and GPU, environment configuration

Scale out

Running multiple replicas? Add a Redis-backed session registry so any node can manage any session.
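To make the idea concrete, here is a minimal sketch of a session registry with heartbeat-based expiry. It is not the Vision Agents API: the `SessionRegistry` class, its method names, and the TTL value are all hypothetical, and a plain dict stands in for Redis so the example runs on its own. In a real deployment the dict would be replaced by a shared store that every replica can reach.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class SessionRegistry:
    """Hypothetical registry: session_id -> (node_id, last_heartbeat).

    A dict stands in for the shared store (for example Redis).
    """

    def __init__(self, ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl          # seconds a session survives without a heartbeat
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._sessions: Dict[str, Tuple[str, float]] = {}

    def register(self, session_id: str, node_id: str) -> None:
        """Record which node owns the session and start its heartbeat timer."""
        self._sessions[session_id] = (node_id, self._clock())

    def owner(self, session_id: str) -> Optional[str]:
        """Return the owning node, or None if the session is unknown or expired."""
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        node_id, last_seen = entry
        if self._clock() - last_seen > self._ttl:
            del self._sessions[session_id]  # owner stopped heartbeating
            return None
        return node_id

    def heartbeat(self, session_id: str) -> bool:
        """Refresh a live session; return False if it is unknown or already expired."""
        node_id = self.owner(session_id)
        if node_id is None:
            return False
        self._sessions[session_id] = (node_id, self._clock())
        return True
```

Because any replica can call `owner()` against the shared store, a request that lands on the wrong node can still be routed to the node that holds the session, and sessions whose node has died disappear once their heartbeat lapses.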

Horizontal Scaling

Redis session store, custom backends, heartbeat mechanism

Orchestrate

The complete setup: Helm chart, health probes, Redis, Prometheus scraping, and a Grafana dashboard.

Kubernetes Deployment

Step-by-step guide with monitoring included

Observe

Track latency, token usage, and errors across all components with OpenTelemetry. Works at any stage, not just Kubernetes.

Telemetry & Metrics

Metrics reference, Prometheus queries, Jaeger tracing

Pick your starting point

Not every project needs every step.
| Goal | Start here |
| --- | --- |
| Local development and testing | HTTP Server |
| Deploy a single container | Docker Deployment |
| Run multiple replicas | Horizontal Scaling |
| Full production setup | Kubernetes Deployment |
| Add metrics to any setup | Telemetry & Metrics |