# Local transport

The `local` plugin replaces the cloud edge with your machine's microphone, speakers, and camera. It is useful for local development, desktop apps, and demos where you don't want media to round-trip through a real-time cloud transport.

<Info>
  No Stream account is required for the local edge, but you'll still need API
  keys for whichever LLM, STT, and TTS plugins you use.
</Info>

## Installation

```sh theme={null}
uv add "vision-agents[local]"
```

The plugin uses [sounddevice](https://python-sounddevice.readthedocs.io/) for audio I/O and [PyAV](https://pyav.basswood-io.com/) for video. On some Linux systems you may need to install `portaudio` separately.
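If `sounddevice` fails with a PortAudio error, one quick diagnostic is the standard library's `ctypes.util.find_library`, which reports whether the system can locate the shared library. This is a minimal sketch, assuming the library is discoverable under the name `portaudio` (mainly relevant on Linux); `portaudio_path` is a hypothetical helper, not part of the plugin.

```python
import ctypes.util
from typing import Optional


def portaudio_path() -> Optional[str]:
    """Return the PortAudio shared library name/path, or None if it can't be found.

    Hypothetical helper: assumes the library is registered under the name
    "portaudio" (for example libportaudio.so.2 on Linux).
    """
    return ctypes.util.find_library("portaudio")


if __name__ == "__main__":
    found = portaudio_path()
    if found is None:
        print("PortAudio not found; install it with your system package manager.")
    else:
        print(f"PortAudio found: {found}")
```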

## Quick Start

```python theme={null}
from vision_agents.core import Agent, User
from vision_agents.plugins import deepgram, gemini
from vision_agents.plugins.local import LocalEdge
from vision_agents.plugins.local.devices import (
    select_audio_input_device,
    select_audio_output_device,
    select_video_device,
)

input_device = select_audio_input_device()
output_device = select_audio_output_device()
video_device = select_video_device()

agent = Agent(
    edge=LocalEdge(
        audio_input=input_device,
        audio_output=output_device,
        video_input=video_device,
    ),
    agent_user=User(name="Local AI", id="local-agent"),
    instructions="Keep responses short and conversational.",
    llm=gemini.LLM("gemini-3-flash-preview"),
    tts=deepgram.TTS(),
    stt=deepgram.STT(),
)
```

The `select_*` helpers prompt interactively in the terminal. For headless use, instantiate `AudioInputDevice`, `AudioOutputDevice`, and `CameraDevice` directly with a known device index.
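To find an index for headless use, `sounddevice.query_devices()` returns one dict per device (with keys such as `name`, `max_input_channels`, and `max_output_channels`), where position in the list equals the device index. The sketch below picks an index by name substring. `find_device_index` is a hypothetical helper, and exactly how the index is passed to `AudioInputDevice` or `AudioOutputDevice` is an assumption to confirm against the plugin's constructors.

```python
from typing import Iterable, Mapping, Optional


def find_device_index(
    devices: Iterable[Mapping],
    name_substring: str,
    kind: str = "input",
) -> Optional[int]:
    """Return the index of the first device whose name contains `name_substring`
    (case-insensitive) and that has channels in the requested direction.

    `devices` is assumed to be shaped like `sounddevice.query_devices()`:
    a sequence of dicts where position == device index.
    """
    if kind not in ("input", "output"):
        raise ValueError("kind must be 'input' or 'output'")
    channel_key = "max_input_channels" if kind == "input" else "max_output_channels"
    needle = name_substring.lower()
    for index, device in enumerate(devices):
        # Skip devices that can't capture (input) or play (output) audio.
        if needle in device["name"].lower() and device[channel_key] > 0:
            return index
    return None


if __name__ == "__main__":
    # Sample data in the shape sounddevice.query_devices() returns.
    sample = [
        {"name": "Built-in Microphone", "max_input_channels": 2, "max_output_channels": 0},
        {"name": "Built-in Output", "max_input_channels": 0, "max_output_channels": 2},
        {"name": "USB Headset", "max_input_channels": 1, "max_output_channels": 2},
    ]
    print(find_device_index(sample, "usb", kind="input"))  # 2
    print(find_device_index(sample, "built-in", kind="output"))  # 1
```

In a real script, pass `sounddevice.query_devices()` as `devices`; running `python -m sounddevice` prints the same list so you can pick a name to match.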

## Parameters

| Name           | Type                | Default | Description                                             |
| -------------- | ------------------- | ------- | ------------------------------------------------------- |
| `audio_input`  | `AudioInputDevice`  | —       | Microphone for capturing user audio.                    |
| `audio_output` | `AudioOutputDevice` | —       | Speaker for playing agent audio.                        |
| `video_input`  | `CameraDevice`      | `None`  | Camera for capturing user video. `None` disables video. |
| `video_width`  | `int`               | `640`   | Output video width in pixels.                           |
| `video_height` | `int`               | `480`   | Output video height in pixels.                          |
| `video_fps`    | `int`               | `30`    | Output video frame rate.                                |
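The video parameters trade image detail against per-frame work. As a back-of-the-envelope check, assuming uncompressed 24-bit RGB frames (an assumption; the plugin's internal pixel format isn't specified here), this sketch computes the frame interval and raw data volume for a given setting. `frame_budget` is a hypothetical helper.

```python
from typing import Tuple


def frame_budget(
    width: int = 640, height: int = 480, fps: int = 30, bytes_per_pixel: int = 3
) -> Tuple[float, int, int]:
    """Return (frame interval in ms, bytes per raw frame, raw bytes per second).

    bytes_per_pixel=3 assumes 24-bit RGB; adjust for other pixel formats.
    """
    if width <= 0 or height <= 0 or fps <= 0:
        raise ValueError("width, height, and fps must be positive")
    interval_ms = 1000.0 / fps
    frame_bytes = width * height * bytes_per_pixel
    return interval_ms, frame_bytes, frame_bytes * fps


interval_ms, frame_bytes, per_second = frame_budget()
print(f"{interval_ms:.1f} ms/frame, {frame_bytes:,} B/frame, {per_second / 1e6:.1f} MB/s")
# 33.3 ms/frame, 921,600 B/frame, 27.6 MB/s
```

Halving both `video_width` and `video_height` cuts raw data per frame by a factor of four (320×240 gives 230,400 bytes per frame under the same assumption), while lowering `video_fps` reduces the per-second volume proportionally.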

When `video_input` is set, agent video is rendered locally in a tkinter window. Subclass the device classes (`AudioInputDevice`, `AudioOutputDevice`, `CameraDevice`) to swap in alternative backends (e.g. GStreamer).

## Next Steps

<CardGroup cols={2}>
  <Card title="Build a Voice Agent" icon="microphone" href="/introduction/voice-agents">
    Get started with voice
  </Card>

  <Card title="Build a Video Agent" icon="video" href="/introduction/video-agents">
    Add video processing
  </Card>
</CardGroup>
