```python
import logging

from dotenv import load_dotenv

from vision_agents.core import Agent, AgentLauncher, Runner, User
from vision_agents.plugins import deepgram, elevenlabs, gemini, getstream

load_dotenv()
logging.basicConfig(level=logging.INFO)


async def create_agent(**kwargs) -> Agent:
    """Factory function that creates and configures an agent."""
    agent = Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Assistant", id="agent"),
        instructions="You are a helpful voice assistant.",
        llm=gemini.LLM("gemini-3.1-flash-lite-preview"),
        tts=elevenlabs.TTS(),
        stt=deepgram.STT(eager_turn_detection=True),
    )

    @agent.llm.register_function(description="Get the current weather for a location")
    async def get_weather(location: str) -> str:
        return f"The weather in {location} is sunny and 72°F."

    return agent


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    """Called when the agent should join a call."""
    call = await agent.create_call(call_type, call_id)
    async with agent.join(call):
        await agent.simple_response("Hello! How can I help you today?")
        await agent.finish()


if __name__ == "__main__":
    runner = Runner(AgentLauncher(create_agent=create_agent, join_call=join_call))
    runner.cli()
```
For development and testing, use console mode to run a single agent:
```shell
uv run <your_agent.py> run
```
| Option | Default | Description |
| --- | --- | --- |
| `--call-type` | `default` | Call type for the video call |
| `--call-id` | auto-generated | Call ID for the video call |
| `--debug` | `false` | Enable debug mode |
| `--log-level` | `INFO` | Set the logging level |
| `--no-demo` | `false` | Disable opening the demo UI |
| `--video-track-override` | — | Local video file to play instead of incoming video |
| `--no-splash` | `false` | Disable the splash screen |
The splash screen is only shown in interactive terminals. It is automatically suppressed in non-interactive environments such as CI pipelines and Docker containers. Use `--no-splash` to suppress it explicitly.
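The interactive-terminal check can be illustrated with a small sketch. This is not the library's actual implementation, just the conventional `isatty` pattern such a check typically uses:

```python
import sys


def should_show_splash(no_splash_flag: bool = False) -> bool:
    """Illustrative only: show the splash in an interactive TTY unless disabled."""
    # An explicit opt-out always wins, mirroring the --no-splash flag
    if no_splash_flag:
        return False
    # CI pipelines and Docker containers typically run without a TTY attached,
    # so isatty() returns False there and the splash is suppressed
    return sys.stdout.isatty()
```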
`call_id` values must match the pattern `^[a-z0-9_-]+$` (lowercase alphanumeric characters, hyphens, and underscores only). Invalid call IDs return HTTP 400.
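A client-side check against the same pattern can catch bad IDs before a round trip that would end in a 400. A minimal sketch; the server remains the source of truth:

```python
import re

# Same pattern the server enforces: lowercase alphanumeric, hyphens, underscores
CALL_ID_PATTERN = re.compile(r"^[a-z0-9_-]+$")


def is_valid_call_id(call_id: str) -> bool:
    # fullmatch rejects empty strings and any disallowed character
    return CALL_ID_PATTERN.fullmatch(call_id) is not None
```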
Close operations (`DELETE` and `POST /close`) return HTTP 202 Accepted. The close request is processed asynchronously: the owning node shuts down the session on its next maintenance cycle.

Creating a Session:
Use authentication and permission callbacks to secure your agent server and control who can start, view, or close sessions. These callbacks are standard FastAPI dependencies, giving you access to headers, query parameters, and dependency injection.
Each permission callback receives call_id from the URL path and can use standard FastAPI dependencies for authentication:
```python
from fastapi import Header, HTTPException


async def can_start_session(
    call_id: str,
    authorization: str = Header(None),
) -> bool:
    """Check if the request is authorized to start a session."""
    if not authorization:
        raise HTTPException(status_code=401, detail="Authorization required")
    # validate_token is your own auth helper, not part of the framework
    user = await validate_token(authorization)
    if not user.has_permission("start_session"):
        raise HTTPException(status_code=403, detail="Permission denied")
    return True


async def can_close_session(
    call_id: str,
    authorization: str = Header(None),
) -> bool:
    """Check if the request is authorized to close a session."""
    if not authorization:
        raise HTTPException(status_code=401, detail="Authorization required")
    user = await validate_token(authorization)
    if not user.can_access_call(call_id):
        raise HTTPException(status_code=403, detail="Cannot close this session")
    return True


runner = Runner(
    AgentLauncher(create_agent=create_agent, join_call=join_call),
    serve_options=ServeOptions(
        can_start_session=can_start_session,
        can_close_session=can_close_session,
    ),
)
```
The Runner exposes its FastAPI instance via `runner.fast_api`, allowing you to add custom routes, middlewares, and other configuration after initialization.
```python
from fastapi.middleware.gzip import GZipMiddleware

runner = Runner(AgentLauncher(create_agent=create_agent, join_call=join_call))


# Add a custom endpoint
@runner.fast_api.get("/custom")
def custom_endpoint():
    return {"message": "Custom endpoint"}


# Add custom middleware
runner.fast_api.add_middleware(GZipMiddleware, minimum_size=1000)
```
For full control over the FastAPI configuration, provide your own instance via `ServeOptions`:
```python
from fastapi import FastAPI

app = FastAPI(
    title="My Agent Server",
    description="Custom agent server with additional features",
    version="1.0.0",
)


# Add your own routes before passing to Runner
@app.get("/custom")
def custom_endpoint():
    return {"message": "Custom endpoint"}


runner = Runner(
    AgentLauncher(create_agent=create_agent, join_call=join_call),
    serve_options=ServeOptions(fast_api=app),
)
```
When providing a custom FastAPI app via `ServeOptions(fast_api=app)`, the Runner uses it as-is without any configuration. It will not register the default endpoints (`/calls/{call_id}/sessions/...`, `/health`, `/ready`, etc.) nor apply CORS settings.
You are responsible for assembling the application yourself.
`AgentLauncher` provides options to control session lifecycle and resource usage:
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `max_concurrent_sessions` | `int \| None` | `None` | Maximum concurrent sessions across all calls |
| `max_sessions_per_call` | `int \| None` | `None` | Maximum sessions allowed per `call_id` |
| `max_session_duration_seconds` | `float \| None` | `None` | Maximum duration before a session is auto-closed |
| `agent_idle_timeout` | `float` | `60.0` | Seconds the agent stays alone on a call before auto-close |
| `maintenance_interval` | `float` | `5.0` | Interval between maintenance checks for expired sessions |
```python
runner = Runner(
    AgentLauncher(
        create_agent=create_agent,
        join_call=join_call,
        max_concurrent_sessions=10,         # Limit total concurrent agents
        max_sessions_per_call=1,            # One agent per call
        max_session_duration_seconds=3600,  # 1 hour max per session
        agent_idle_timeout=120.0,           # Disconnect after 2 min alone
    )
)
```
`max_concurrent_sessions` - Prevents resource exhaustion by capping how many agents can run simultaneously. Useful for cost control and server capacity planning.

`max_sessions_per_call` - Prevents duplicate agents from joining the same call. Set to 1 to ensure only one agent per conversation.

`max_session_duration_seconds` - Automatically terminates long-running sessions. Protects against runaway sessions that could accumulate costs.

`agent_idle_timeout` - Cleans up agents when all other participants have left the call. The agent disconnects after being alone for this duration.
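Because expired sessions are reaped by the periodic maintenance loop rather than by per-session timers, limits such as `agent_idle_timeout` are enforced with up to one `maintenance_interval` of slack. A rough model of the worst-case close delay, purely as an illustration of the timing (not the framework's code):

```python
import math


def worst_case_close_delay(timeout: float, maintenance_interval: float) -> float:
    """Limits are checked only on maintenance ticks, so a session is closed
    at the first tick at or after the timeout elapses."""
    return math.ceil(timeout / maintenance_interval) * maintenance_interval
```

With the defaults (`agent_idle_timeout=60.0`, `maintenance_interval=5.0`) the timeout lands exactly on a tick, so the delay is 60 seconds; a 62-second timeout would be enforced at the 65-second tick.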
The built-in HTTP server is just a thin wrapper around `AgentLauncher`. If you need a different transport (gRPC, WebSocket, message queue, or a custom protocol), you can use `AgentLauncher` directly.

`AgentLauncher` is transport-agnostic. It manages agent lifecycle, session limits, and the session registry. You provide the interface layer on top.
```python
from vision_agents.core import AgentLauncher

launcher = AgentLauncher(
    create_agent=create_agent,
    join_call=join_call,
    max_concurrent_sessions=10,
)

# Start the launcher (warmup + maintenance loop)
await launcher.start()

# Start a session; returns an AgentSession
session = await launcher.start_session(call_id="my-call-123", call_type="default")

# Query a session from the registry (works across nodes)
info = await launcher.get_session_info("my-call-123", session.id)

# Request session closure (works across nodes)
await launcher.request_close_session("my-call-123", session.id)

# Close a session running on this node
await launcher.close_session(session.id)

# Stop the launcher (closes all sessions + registry)
await launcher.stop()
```
“Local” methods operate on this node’s in-memory session map. “Registry” methods read from or write to shared storage, so they work across nodes when a SessionRegistry is configured.
By default, `AgentLauncher` tracks sessions in local memory, which works for single-node deployments. To scale horizontally across multiple servers, you can provide a `SessionRegistry` backed by Redis. This allows any node to query or close sessions running on other nodes, and removes the need for sticky sessions or session affinity.

See the Horizontal Scaling guide for setup instructions.