Deploy Vision Agents to production using Docker. For a complete Kubernetes setup with Helm charts, monitoring, and Grafana dashboards, see the Kubernetes Deployment guide.

Key Considerations

| Factor | Recommendation |
| --- | --- |
| Region | US East for lowest latency (most AI providers default here) |
| CPU vs GPU | CPU for most voice agents; GPU only if running local models |
| Scaling | Use the HTTP server for multi-session deployments |

Docker

Two Dockerfiles are provided.

CPU (Dockerfile) - small image, fast to build (~150MB):
FROM python:3.13-slim
WORKDIR /app
RUN pip install uv
# Copy only the files needed to resolve dependencies and run the agent
COPY pyproject.toml uv.lock agent.py ./
EXPOSE 8080
ENV UV_LINK_MODE=copy
# Dependencies are synced at container start, keeping the image small
CMD ["sh", "-c", "uv sync --frozen && uv run agent.py serve --host 0.0.0.0 --port 8080"]
GPU (Dockerfile.gpu) - for local model inference (~8GB):
# Same steps as the CPU image; only the base image differs
FROM pytorch/pytorch:2.9.1-cuda12.8-cudnn9-runtime
WORKDIR /app
RUN pip install uv
COPY pyproject.toml uv.lock agent.py ./
EXPOSE 8080
ENV UV_LINK_MODE=copy
CMD ["sh", "-c", "uv sync --frozen && uv run agent.py serve --host 0.0.0.0 --port 8080"]
Build for Linux (required for cloud deployment):
docker buildx build --platform linux/amd64 -t vision-agent .
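Once built, the image can be run locally for a smoke test. A sketch, assuming the vision-agent tag from the build command above and a .env file as described in the Environment Variables section:

```shell
# Run the agent locally, passing API keys from .env and exposing port 8080
docker run --rm -p 8080:8080 --env-file .env vision-agent
```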
Only use the GPU Dockerfile if running local models (Roboflow, local VLMs). Most voice agents use cloud APIs and don’t need GPUs. Make sure the host has NVIDIA drivers installed and that the base image’s CUDA version matches them.
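The choice between the two Dockerfiles can be scripted. A minimal sketch (a hypothetical helper, not part of the project) that picks a Dockerfile based on whether an NVIDIA GPU is visible:

```shell
# Select a Dockerfile based on GPU availability: nvidia-smi is present and
# working only when NVIDIA drivers are installed on the host.
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
  dockerfile="Dockerfile.gpu"
else
  dockerfile="Dockerfile"
fi
echo "Building with $dockerfile"
```

This could be wired into a build script, e.g. `docker buildx build -f "$dockerfile" ...`.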

Environment Variables

Create a .env file with your API keys:
STREAM_API_KEY=your_key
STREAM_API_SECRET=your_secret
DEEPGRAM_API_KEY=your_key
ELEVENLABS_API_KEY=your_key
GOOGLE_API_KEY=your_key
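A missing key usually surfaces only at runtime, so it can help to validate the .env file before building. A sketch of a hypothetical preflight check (it writes a sample.env here so the example is self-contained; point it at your real .env in practice):

```shell
# Write a sample env file matching the keys listed above
cat > sample.env <<'EOF'
STREAM_API_KEY=your_key
STREAM_API_SECRET=your_secret
DEEPGRAM_API_KEY=your_key
ELEVENLABS_API_KEY=your_key
GOOGLE_API_KEY=your_key
EOF

# Verify every required key is defined, reporting any that are missing
required="STREAM_API_KEY STREAM_API_SECRET DEEPGRAM_API_KEY ELEVENLABS_API_KEY GOOGLE_API_KEY"
missing=0
for key in $required; do
  grep -q "^${key}=" sample.env || { echo "Missing: $key"; missing=1; }
done
if [ "$missing" -eq 0 ]; then
  echo "All required keys present"
fi
```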
For Kubernetes, create a secret:
kubectl create secret generic vision-agent-env --from-env-file=.env

Next Steps

Built-in HTTP Server - API endpoints, session limits, and authentication

Horizontal Scaling - scale across multiple servers with Redis

Kubernetes Deployment - Helm chart, Prometheus, and Grafana

Telemetry & Metrics - OpenTelemetry, Prometheus, and Jaeger setup