# Built-in HTTP Server

> Run agents as an HTTP server with session management, authentication, and real-time metrics

The `Runner` class provides two modes for running your agents:

* a single-agent console mode for development
* an HTTP server mode that spawns agents on demand for production deployments

<Note>
  For a complete working example, see [08\_agent\_server\_example](https://github.com/GetStream/Vision-Agents/tree/main/examples/08_agent_server_example) in the Vision Agents repository.
</Note>

## Core Components

Running agents as a server requires four components:

1. **`create_agent()`** - A factory function that configures and returns an Agent instance
2. **`join_call()`** - Defines what happens when an agent joins a call
3. **`AgentLauncher`** - Runs and monitors the agents
4. **`Runner`** - A wrapper around `AgentLauncher` that provides CLI commands for console and server modes

## Basic Example

```python theme={null}
import logging
from dotenv import load_dotenv
from vision_agents.core import Agent, AgentLauncher, Runner, User
from vision_agents.plugins import deepgram, elevenlabs, gemini, getstream

load_dotenv()
logging.basicConfig(level=logging.INFO)


async def create_agent(**kwargs) -> Agent:
    """Factory function that creates and configures an agent."""
    agent = Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Assistant", id="agent"),
        instructions="You are a helpful voice assistant.",
        llm=gemini.LLM("gemini-3.1-flash-lite-preview"),
        tts=elevenlabs.TTS(),
        stt=deepgram.STT(eager_turn_detection=True),
    )

    @agent.llm.register_function(description="Get the current weather for a location")
    async def get_weather(location: str) -> str:
        return f"The weather in {location} is sunny and 72°F."

    return agent


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    """Called when the agent should join a call."""
    call = await agent.create_call(call_type, call_id)

    async with agent.join(call):
        await agent.simple_response("Hello! How can I help you today?")
        await agent.finish()


if __name__ == "__main__":
    runner = Runner(AgentLauncher(create_agent=create_agent, join_call=join_call))
    runner.cli()
```

## Running the Server

Start the HTTP server with the `serve` command:

```bash theme={null}
uv run <your_agent.py> serve
```

The server starts on `http://127.0.0.1:8000` by default.
The interactive API documentation can be found at `http://127.0.0.1:8000/docs` (Swagger UI).

### CLI Options

| Option               | Default     | Description               |
| -------------------- | ----------- | ------------------------- |
| `--host`             | `127.0.0.1` | Server host               |
| `--port`             | `8000`      | Server port               |
| `--agents-log-level` | `INFO`      | Log level for agents      |
| `--http-log-level`   | `INFO`      | Log level for HTTP server |
| `--no-splash`        | `false`     | Disable the splash screen |

```bash theme={null}
uv run agent.py serve --host 0.0.0.0 --port 8000 --agents-log-level DEBUG
```

### Console Mode

For development and testing, use console mode to run a single agent:

```bash theme={null}
uv run <your_agent.py> run
```

| Option                   | Default        | Description                                        |
| ------------------------ | -------------- | -------------------------------------------------- |
| `--call-type`            | `default`      | Call type for the video call                       |
| `--call-id`              | auto-generated | Call ID for the video call                         |
| `--debug`                | `false`        | Enable debug mode                                  |
| `--log-level`            | `INFO`         | Set the logging level                              |
| `--no-demo`              | `false`        | Disable opening the demo UI                        |
| `--video-track-override` | —              | Local video file to play instead of incoming video |
| `--no-splash`            | `false`        | Disable the splash screen                          |

<Tip>
  The splash screen is only shown in interactive terminals. It is automatically suppressed in non-interactive environments such as CI pipelines and Docker containers. Use `--no-splash` to suppress it explicitly.
</Tip>

## API Endpoints

The server exposes these endpoints:

| Method | Endpoint                                         | Purpose                             |
| ------ | ------------------------------------------------ | ----------------------------------- |
| POST   | `/calls/{call_id}/sessions`                      | Spawn a new agent for a call        |
| DELETE | `/calls/{call_id}/sessions/{session_id}`         | Request closure of an agent session |
| POST   | `/calls/{call_id}/sessions/{session_id}/close`   | Request closure via sendBeacon      |
| GET    | `/calls/{call_id}/sessions/{session_id}`         | Get session information             |
| GET    | `/calls/{call_id}/sessions/{session_id}/metrics` | Real-time performance metrics       |
| GET    | `/health`                                        | Liveness check                      |
| GET    | `/ready`                                         | Readiness check                     |

Close operations (`DELETE` and `POST` `/close`) return **HTTP 202 Accepted**. The close request is processed asynchronously: the owning node shuts down the session on its next maintenance cycle.
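
Because a close request only returns 202 and the shutdown happens later, a client that needs to know when the session is actually gone has to poll. The following is a minimal sketch using only the Python standard library. The helper names (`close_request`, `session_status`, `wait_until_closed`) are hypothetical, and the sketch assumes that the session lookup endpoint stops returning HTTP 200 once the session has been closed, which this page does not specify.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

BASE_URL = "http://127.0.0.1:8000"  # default server address from this page


def close_request(call_id: str, session_id: str, beacon: bool = False) -> urllib.request.Request:
    """Build a close request: DELETE by default, POST .../close for sendBeacon-style calls."""
    url = f"{BASE_URL}/calls/{call_id}/sessions/{session_id}"
    if beacon:
        return urllib.request.Request(f"{url}/close", data=b"", method="POST")
    return urllib.request.Request(url, method="DELETE")


def session_status(call_id: str, session_id: str) -> int:
    """Return the HTTP status code of the session lookup endpoint."""
    url = f"{BASE_URL}/calls/{call_id}/sessions/{session_id}"
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code


def wait_until_closed(
    get_status: Callable[[], int],
    timeout: float = 30.0,
    interval: float = 1.0,
) -> bool:
    """Poll until the lookup no longer returns 200 (ASSUMED to mean "closed")."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_status() != 200:
            return True
        time.sleep(interval)
    return False
```

Against a running server, you would send `urllib.request.urlopen(close_request("my-call-123", "abc-123"))`, check for status 202, and then call `wait_until_closed(lambda: session_status("my-call-123", "abc-123"))`.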

**Creating a Session:**

```bash theme={null}
curl -X POST http://127.0.0.1:8000/calls/my-call-123/sessions \
  -H "Content-Type: application/json" \
  -d '{"call_type": "default"}'
```

Response:

```json theme={null}
{
  "session_id": "abc-123",
  "call_id": "my-call-123",
  "session_started_at": "2025-01-15T10:30:00Z"
}
```
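
The same request can be issued from Python. This is a sketch of a standard-library equivalent of the `curl` call above; the function names `start_session_request` and `start_session` are hypothetical and not part of Vision Agents.

```python
import json
import urllib.parse
import urllib.request


def start_session_request(
    call_id: str,
    call_type: str = "default",
    base_url: str = "http://127.0.0.1:8000",
) -> urllib.request.Request:
    """Build the POST request that asks the server to spawn an agent for a call."""
    body = json.dumps({"call_type": call_type}).encode("utf-8")
    # Quote the call ID so unusual characters cannot alter the URL path.
    path = f"/calls/{urllib.parse.quote(call_id, safe='')}/sessions"
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_session(call_id: str, call_type: str = "default") -> dict:
    """Send the request to a running server and return the decoded JSON body."""
    with urllib.request.urlopen(start_session_request(call_id, call_type)) as resp:
        return json.load(resp)
```

With the server running, `start_session("my-call-123")["session_id"]` gives the ID to use in the session and metrics endpoints.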

**Getting Session Metrics:**

```bash theme={null}
curl http://127.0.0.1:8000/calls/my-call-123/sessions/abc-123/metrics
```

Response:

```json theme={null}
{
  "session_id": "abc-123",
  "call_id": "my-call-123",
  "session_started_at": "2025-01-15T10:30:00Z",
  "metrics_generated_at": "2025-01-15T10:35:00Z",
  "metrics": {
    "llm_latency_ms__avg": 245.5,
    "llm_time_to_first_token_ms__avg": 120.3,
    "llm_input_tokens__total": 1500,
    "llm_output_tokens__total": 800,
    "stt_latency_ms__avg": 85.2,
    "tts_latency_ms__avg": 95.1
  }
}
```
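
The metric keys in this sample appear to follow a `<name>__<aggregation>` pattern (`__avg`, `__total`). Assuming that convention holds for all keys, which this page does not state explicitly, a small helper can group the values by aggregation for dashboards or logging. The `group_metrics` function is a hypothetical illustration.

```python
import json
from collections import defaultdict

# The sample metrics response from above.
SAMPLE = """
{
  "session_id": "abc-123",
  "call_id": "my-call-123",
  "metrics": {
    "llm_latency_ms__avg": 245.5,
    "llm_time_to_first_token_ms__avg": 120.3,
    "llm_input_tokens__total": 1500,
    "llm_output_tokens__total": 800,
    "stt_latency_ms__avg": 85.2,
    "tts_latency_ms__avg": 95.1
  }
}
"""


def group_metrics(metrics: dict) -> dict:
    """Group metric values by the aggregation suffix after the last '__'."""
    grouped = defaultdict(dict)
    for key, value in metrics.items():
        name, sep, aggregation = key.rpartition("__")
        if not sep:
            # Key has no '__' suffix; file it under a generic "value" bucket.
            name, aggregation = key, "value"
        grouped[aggregation][name] = value
    return dict(grouped)


grouped = group_metrics(json.loads(SAMPLE)["metrics"])
total_tokens = sum(grouped["total"].values())
print(f"total tokens: {total_tokens}")
```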

## Configuration with ServeOptions

The HTTP server behavior can be customized using `ServeOptions`:

```python theme={null}
from vision_agents.core import Runner, AgentLauncher, ServeOptions

runner = Runner(
    AgentLauncher(create_agent=create_agent, join_call=join_call),
    serve_options=ServeOptions(
        cors_allow_origins=["https://myapp.com"],
        cors_allow_methods=["GET", "POST", "DELETE"],
        cors_allow_headers=["Authorization"],
        cors_allow_credentials=True,
    ),
)
```

### CORS Options

| Option                   | Default | Description          |
| ------------------------ | ------- | -------------------- |
| `cors_allow_origins`     | `["*"]` | Allowed origins      |
| `cors_allow_methods`     | `["*"]` | Allowed HTTP methods |
| `cors_allow_headers`     | `["*"]` | Allowed headers      |
| `cors_allow_credentials` | `True`  | Allow credentials    |

### Authentication & Permissions

Use authentication and permission callbacks to secure your agent server and control who can start, view, or close sessions.

These callbacks are standard FastAPI dependencies, giving you access to headers, query parameters, and dependency injection.

| Option              | Default   | Description                            |
| ------------------- | --------- | -------------------------------------- |
| `can_start_session` | allow all | Permission check for starting sessions |
| `can_close_session` | allow all | Permission check for closing sessions  |
| `can_view_session`  | allow all | Permission check for viewing sessions  |
| `can_view_metrics`  | allow all | Permission check for viewing metrics   |

#### Permission Callbacks

Each permission callback receives `call_id` from the URL path and can use standard FastAPI dependencies for authentication:

```python theme={null}
from fastapi import Header, HTTPException
from vision_agents.core import AgentLauncher, Runner, ServeOptions

# validate_token() stands in for your own authentication logic; it is not part
# of Vision Agents. create_agent and join_call are defined as in the Basic Example.


async def can_start_session(
    call_id: str,
    authorization: str | None = Header(None),
) -> bool:
    """Check if the request is authorized to start a session."""
    if not authorization:
        raise HTTPException(status_code=401, detail="Authorization required")
    user = await validate_token(authorization)
    if not user.has_permission("start_session"):
        raise HTTPException(status_code=403, detail="Permission denied")
    return True


async def can_close_session(
    call_id: str,
    authorization: str | None = Header(None),
) -> bool:
    """Check if the request is authorized to close a session."""
    if not authorization:
        raise HTTPException(status_code=401, detail="Authorization required")
    user = await validate_token(authorization)
    if not user.can_access_call(call_id):
        raise HTTPException(status_code=403, detail="Cannot close this session")
    return True


runner = Runner(
    AgentLauncher(create_agent=create_agent, join_call=join_call),
    serve_options=ServeOptions(
        can_start_session=can_start_session,
        can_close_session=can_close_session,
    ),
)
```
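
The callbacks above call a `validate_token` helper that this page does not define; it represents your own authentication logic. As a minimal sketch, assuming static bearer tokens kept in an in-memory table, it could look like the following. The `AuthUser` class, the `InvalidToken` exception, the `TOKENS` table, and the token value are all hypothetical; a real deployment would more likely verify a signed token or call an auth service.

```python
import hmac
from dataclasses import dataclass


class InvalidToken(Exception):
    """Raised when the Authorization header is malformed or the token is unknown."""


@dataclass(frozen=True)
class AuthUser:
    name: str
    permissions: frozenset = frozenset()
    call_ids: frozenset = frozenset()

    def has_permission(self, permission: str) -> bool:
        return permission in self.permissions

    def can_access_call(self, call_id: str) -> bool:
        return call_id in self.call_ids


# Hypothetical token table, for illustration only.
TOKENS = {
    "secret-token-1": AuthUser(
        name="alice",
        permissions=frozenset({"start_session"}),
        call_ids=frozenset({"my-call-123"}),
    ),
}


async def validate_token(authorization: str) -> AuthUser:
    """Parse a 'Bearer <token>' header value and return the matching user."""
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token:
        raise InvalidToken("expected 'Bearer <token>'")
    for known, user in TOKENS.items():
        # compare_digest avoids leaking token contents through timing differences.
        if hmac.compare_digest(known.encode(), token.encode()):
            return user
    raise InvalidToken("unknown token")
```

In a permission callback you would catch `InvalidToken` and raise `HTTPException(status_code=401)` so that the client receives a 401 rather than a 500.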

### Customizing the Default FastAPI App

The `Runner` exposes its FastAPI instance via `runner.fast_api`, allowing you to add custom routes, middleware, and other configuration after initialization.

```python theme={null}
from fastapi.middleware.gzip import GZipMiddleware

runner = Runner(AgentLauncher(create_agent=create_agent, join_call=join_call))

# Add a custom endpoint
@runner.fast_api.get("/custom")
def custom_endpoint():
    return {"message": "Custom endpoint"}

# Add custom middleware
runner.fast_api.add_middleware(GZipMiddleware, minimum_size=1000)
```

### Using a Custom FastAPI Instance

For full control over the FastAPI configuration, provide your own instance via `ServeOptions`:

```python theme={null}
from fastapi import FastAPI

app = FastAPI(
    title="My Agent Server",
    description="Custom agent server with additional features",
    version="1.0.0",
)

# Add your own routes before passing to Runner
@app.get("/custom")
def custom_endpoint():
    return {"message": "Custom endpoint"}

runner = Runner(
    AgentLauncher(create_agent=create_agent, join_call=join_call),
    serve_options=ServeOptions(fast_api=app),
)
```

<Warning>
  When providing a custom FastAPI app via `ServeOptions(fast_api=app)`, the `Runner` will use it as-is without any configuration.

  It will not register the default endpoints (`/calls/{call_id}/sessions/...`, `/health`, `/ready`, etc.) nor apply CORS settings.
  You are responsible for assembling the application yourself.
</Warning>

## Session Limits & Resource Management

`AgentLauncher` provides options to control session lifecycle and resource usage:

| Parameter                      | Type            | Default | Description                                              |
| ------------------------------ | --------------- | ------- | -------------------------------------------------------- |
| `max_concurrent_sessions`      | `int \| None`   | `None`  | Maximum concurrent sessions across all calls             |
| `max_sessions_per_call`        | `int \| None`   | `None`  | Maximum sessions allowed per call\_id                    |
| `max_session_duration_seconds` | `float \| None` | `None`  | Maximum duration before session is auto-closed           |
| `agent_idle_timeout`           | `float`         | `60.0`  | Seconds agent stays alone on call before auto-close      |
| `maintenance_interval`         | `float`         | `5.0`   | Interval between maintenance checks for expired sessions |

```python theme={null}
runner = Runner(
    AgentLauncher(
        create_agent=create_agent,
        join_call=join_call,
        max_concurrent_sessions=10,       # Limit total concurrent agents
        max_sessions_per_call=1,          # One agent per call
        max_session_duration_seconds=3600, # 1 hour max per session
        agent_idle_timeout=120.0,         # Disconnect after 2 min alone
    )
)
```

* **`max_concurrent_sessions`** - Prevents resource exhaustion by capping how many agents can run simultaneously. Useful for cost control and server capacity planning.
* **`max_sessions_per_call`** - Prevents duplicate agents from joining the same call. Set to `1` to ensure only one agent per conversation.
* **`max_session_duration_seconds`** - Automatically terminates long-running sessions. Protects against runaway sessions that could accumulate costs.
* **`agent_idle_timeout`** - Cleans up agents when all other participants have left the call. The agent disconnects after being alone for this duration.

## Using AgentLauncher Without the HTTP Server

The built-in HTTP server is just a thin wrapper around `AgentLauncher`. If you need a different transport — gRPC, WebSocket, message queue, or a custom protocol — you can use `AgentLauncher` directly.

`AgentLauncher` is transport-agnostic. It manages agent lifecycle, session limits, and the session registry. You provide the interface layer on top.

```python theme={null}
import asyncio

from vision_agents.core import AgentLauncher

# create_agent and join_call are defined as in the Basic Example above.


async def main() -> None:
    launcher = AgentLauncher(
        create_agent=create_agent,
        join_call=join_call,
        max_concurrent_sessions=10,
    )

    # Start the launcher (warmup + maintenance loop)
    await launcher.start()

    # Start a session; returns an AgentSession
    session = await launcher.start_session(call_id="my-call-123", call_type="default")

    # Query a session from the registry (works across nodes)
    info = await launcher.get_session_info("my-call-123", session.id)

    # Request session closure (works across nodes)
    await launcher.request_close_session("my-call-123", session.id)

    # Close a session running on this node
    await launcher.close_session(session.id)

    # Stop the launcher (closes all sessions + registry)
    await launcher.stop()


asyncio.run(main())
```

### AgentLauncher Methods

| Method                                       | Scope    | Description                                         |
| -------------------------------------------- | -------- | --------------------------------------------------- |
| `start()`                                    | Local    | Initialize launcher, warmup, start maintenance loop |
| `stop()`                                     | Local    | Close all sessions and stop the launcher            |
| `start_session(call_id, call_type)`          | Both     | Create agent, join call, register in store          |
| `close_session(session_id)`                  | Local    | Close a session running on this node                |
| `get_session(session_id)`                    | Local    | Look up a session on this node only                 |
| `get_session_info(call_id, session_id)`      | Registry | Query session info from shared storage              |
| `request_close_session(call_id, session_id)` | Registry | Set a close flag for any node to process            |

<Tip>
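  "Local" methods operate on this node's in-memory session map. "Registry" methods read from or write to shared storage, so they work across nodes when a `SessionRegistry` is configured. `start_session` is marked "Both" because it creates the session on this node and also registers it in the store.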
</Tip>
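
The difference between the scopes can be pictured with a toy model: a shared dictionary stands in for the registry, and each node keeps its own in-memory session map. This only illustrates the flow described above, where a close flag is set through the registry and then processed by the owning node on a later maintenance pass. `ToyRegistry` and `ToyNode` are hypothetical names and do not mirror the library's classes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRegistry:
    """Stand-in for shared storage, keyed by (call_id, session_id)."""

    sessions: dict = field(default_factory=dict)

    def request_close(self, call_id: str, session_id: str) -> None:
        # Registry scope: any node can set the flag.
        self.sessions[(call_id, session_id)]["close_requested"] = True


@dataclass
class ToyNode:
    name: str
    registry: ToyRegistry
    local: dict = field(default_factory=dict)  # session_id -> call_id

    def start_session(self, call_id: str, session_id: str) -> None:
        # "Both" scope: tracked locally and recorded in the registry.
        self.local[session_id] = call_id
        self.registry.sessions[(call_id, session_id)] = {
            "node": self.name,
            "close_requested": False,
        }

    def run_maintenance(self) -> list:
        """Close local sessions whose close flag is set; return their IDs."""
        closed = []
        for session_id, call_id in list(self.local.items()):
            if self.registry.sessions[(call_id, session_id)]["close_requested"]:
                del self.local[session_id]
                del self.registry.sessions[(call_id, session_id)]
                closed.append(session_id)
        return closed


registry = ToyRegistry()
node_a = ToyNode("a", registry)
node_b = ToyNode("b", registry)

node_a.start_session("my-call-123", "abc-123")
registry.request_close("my-call-123", "abc-123")  # could come from any node
still_running = "abc-123" in node_a.local          # flag set, not yet processed
closed_by_b = node_b.run_maintenance()             # node B does not own the session
closed_by_a = node_a.run_maintenance()             # the owning node closes it
```

In this model the session stays alive between the close request and node A's maintenance pass, which is why the HTTP close endpoints answer with 202 rather than confirming the shutdown.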

### gRPC Example

Here's a sketch of how you might wrap `AgentLauncher` with a gRPC service:

```python theme={null}
import asyncio

import grpc
from vision_agents.core import AgentLauncher

# Stubs generated from your own .proto definition (not shipped with Vision Agents)
import agent_pb2
import agent_pb2_grpc


class AgentService(agent_pb2_grpc.AgentServiceServicer):
    def __init__(self, launcher: AgentLauncher):
        self.launcher = launcher

    async def StartSession(self, request, context):
        session = await self.launcher.start_session(
            call_id=request.call_id,
            call_type=request.call_type,
        )
        return agent_pb2.StartSessionResponse(session_id=session.id)

    async def CloseSession(self, request, context):
        await self.launcher.request_close_session(
            call_id=request.call_id,
            session_id=request.session_id,
        )
        return agent_pb2.CloseSessionResponse()


async def serve() -> None:
    # Start the launcher, then serve
    launcher = AgentLauncher(create_agent=create_agent, join_call=join_call)
    await launcher.start()

    server = grpc.aio.server()
    agent_pb2_grpc.add_AgentServiceServicer_to_server(AgentService(launcher), server)
    server.add_insecure_port("[::]:50051")
    await server.start()
    await server.wait_for_termination()


asyncio.run(serve())
```

## Scaling to Multiple Nodes

By default, `AgentLauncher` tracks sessions in local memory, which works for single-node deployments. To scale horizontally across multiple servers, you can provide a `SessionRegistry` backed by Redis. This allows any node to query or close sessions running on other nodes, and removes the need for sticky sessions or session affinity.

See the [Horizontal Scaling](/guides/horizontal-scaling) guide for setup instructions.

## Next Steps

<CardGroup cols={2}>
  <Card title="Horizontal Scaling" icon="circle-nodes" href="/guides/horizontal-scaling">
    Scale across multiple servers with Redis
  </Card>

  <Card title="Docker Deployment" icon="docker" href="/guides/deployment">
    Docker, Kubernetes, and scaling
  </Card>

  <Card title="Agent Server Example" icon="github" href="https://github.com/GetStream/vision-agents/tree/main/examples/08_agent_server_example">
    Complete working implementation
  </Card>
</CardGroup>
