Deploy Vision Agents to production using Docker. For a complete Kubernetes setup with Helm charts, monitoring, and Grafana dashboards, see the Kubernetes Deployment guide.
Key Considerations
| Factor | Recommendation |
| --- | --- |
| Region | US East for lowest latency (most AI providers default here) |
| CPU vs GPU | CPU for most voice agents; GPU only if running local models |
| Scaling | Use the HTTP server for multi-session deployments |
Docker
Two Dockerfiles are provided:
CPU (Dockerfile) - Small, fast to build (~150MB)
```dockerfile
FROM python:3.13-slim
WORKDIR /app
RUN pip install uv
COPY pyproject.toml uv.lock agent.py ./
EXPOSE 8080
ENV UV_LINK_MODE=copy
CMD ["sh", "-c", "uv sync --frozen && uv run agent.py serve --host 0.0.0.0 --port 8080"]
```
GPU (Dockerfile.gpu) - For local model inference (~8GB)
```dockerfile
FROM pytorch/pytorch:2.9.1-cuda12.8-cudnn9-runtime
WORKDIR /app
RUN pip install uv
COPY pyproject.toml uv.lock agent.py ./
EXPOSE 8080
ENV UV_LINK_MODE=copy
CMD ["sh", "-c", "uv sync --frozen && uv run agent.py serve --host 0.0.0.0 --port 8080"]
```
Build for Linux (required for cloud deployment):
```bash
docker buildx build --platform linux/amd64 -t vision-agent .
```
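To smoke-test the image locally before deploying, run it with the HTTP port published. This is a minimal sketch: it assumes a `.env` file with your API keys (see Environment Variables below) and the `vision-agent` tag from the build command above.

```bash
# Run the CPU image locally, publishing the agent's HTTP port.
# --env-file injects the API keys from the .env file described below.
docker run --rm \
  --env-file .env \
  -p 8080:8080 \
  vision-agent
```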
Only use the GPU Dockerfile if running local models (Roboflow, local VLMs). Most voice agents use cloud APIs and don't need GPUs. Make sure the host has NVIDIA drivers installed and that the base image's CUDA version matches them.
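If you do need the GPU image, the container must be granted access to the host's GPUs at runtime. A minimal sketch, assuming the NVIDIA Container Toolkit is installed on the host; the `vision-agent-gpu` tag is an illustrative name, not one defined by this guide:

```bash
# Build from the GPU Dockerfile and expose all host GPUs to the container.
docker buildx build --platform linux/amd64 -f Dockerfile.gpu -t vision-agent-gpu .
docker run --rm --gpus all --env-file .env -p 8080:8080 vision-agent-gpu
```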
Environment Variables
Create a .env file with your API keys:
```bash
STREAM_API_KEY=your_key
STREAM_API_SECRET=your_secret
DEEPGRAM_API_KEY=your_key
ELEVENLABS_API_KEY=your_key
GOOGLE_API_KEY=your_key
```
For Kubernetes, create a secret:
```bash
kubectl create secret generic vision-agent-env --from-env-file=.env
```
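To consume that secret, reference it from the pod spec with `envFrom`, which exposes every key in the secret as an environment variable. A minimal sketch (the deployment name, image tag, and replica count are illustrative assumptions; the full Helm chart is covered in the Kubernetes Deployment guide):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vision-agent              # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vision-agent
  template:
    metadata:
      labels:
        app: vision-agent
    spec:
      containers:
        - name: vision-agent
          image: vision-agent:latest   # assumes your pushed image tag
          ports:
            - containerPort: 8080
          envFrom:
            - secretRef:
                name: vision-agent-env # the secret created above
```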
Next Steps
- Built-in HTTP Server: API endpoints, session limits, and authentication
- Horizontal Scaling: Scale across multiple servers with Redis
- Kubernetes Deployment: Helm chart, Prometheus, and Grafana
- Telemetry & Metrics: OpenTelemetry, Prometheus, and Jaeger setup