# Moondream

[Moondream](https://moondream.ai/) provides zero-shot object detection, visual question answering, and image captioning. You can detect any object by describing it in natural language, with no training required. It is available as a cloud-hosted API or as a local, on-device model.

<Info>
  Vision Agents requires a [Stream](https://getstream.io/try-for-free/) account
  for real-time transport. Most providers offer free tiers to get started.
</Info>

## Installation

```sh theme={null}
uv add "vision-agents[moondream]"
```

## Detection (Cloud)

```python theme={null}
from vision_agents.core import Agent, User
from vision_agents.plugins import moondream, gemini, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a vision assistant.",
    llm=gemini.Realtime(fps=10),
    processors=[
        moondream.CloudDetectionProcessor(
            detect_objects=["person", "car", "dog"],
            conf_threshold=0.3,
        )
    ],
)
```

<Warning>
  Set `MOONDREAM_API_KEY` in your environment or pass `api_key` directly.
</Warning>
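
The key lookup described in the warning can be sketched as follows. `resolve_api_key` is a hypothetical helper written for this example, not part of the plugin. It assumes only the `MOONDREAM_API_KEY` variable name and the `api_key` argument mentioned above; the plugin itself may resolve the key differently.

```python
import os
from typing import Optional


def resolve_api_key(explicit: Optional[str] = None) -> str:
    """Return the explicit key if given, else fall back to MOONDREAM_API_KEY.

    Hypothetical helper for illustration only.
    """
    key = explicit or os.environ.get("MOONDREAM_API_KEY")
    if not key:
        # Fail early with a clear message instead of at the first API call.
        raise RuntimeError(
            "Set MOONDREAM_API_KEY or pass api_key to the processor."
        )
    return key
```

The returned value could then be passed as the `api_key` argument, for example `moondream.CloudDetectionProcessor(api_key=resolve_api_key())`.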

| Name             | Type                 | Default    | Description                   |
| ---------------- | -------------------- | ---------- | ----------------------------- |
| `detect_objects` | `str` or `List[str]` | `"person"` | Objects to detect (zero-shot) |
| `conf_threshold` | `float`              | `0.3`      | Confidence threshold          |
| `fps`            | `int`                | `30`       | Frame processing rate         |
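
To illustrate what `conf_threshold` controls, here is a minimal sketch of confidence filtering. The `Detection` class, its fields, and the inclusive comparison are assumptions made for this example; the plugin's actual result format and comparison may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    """Hypothetical detection record used only in this sketch."""
    label: str
    confidence: float


def filter_detections(
    detections: List[Detection], conf_threshold: float = 0.3
) -> List[Detection]:
    """Keep detections whose confidence is at or above the threshold."""
    return [d for d in detections if d.confidence >= conf_threshold]


sample = [
    Detection("person", 0.82),
    Detection("dog", 0.12),
    Detection("car", 0.30),
]

# With the default threshold of 0.3, "dog" is dropped.
kept = filter_detections(sample)
print([d.label for d in kept])
```

Raising the threshold trades recall for precision: fewer false positives, but faint or partially occluded objects are more likely to be missed.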

## Detection (Local)

The local processor runs on-device and makes no API calls. It requires `HF_TOKEN` for model access.

```python theme={null}
from vision_agents.plugins import moondream

# Run the detection model on the local GPU.
processor = moondream.LocalDetectionProcessor(
    detect_objects=["person", "car"],
    device="cuda",
)
```
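
Before constructing a local processor, it can help to check the two prerequisites named above. The sketch below uses hypothetical helpers (`pick_device`, `hf_token_present`) and assumes a PyTorch backend, which this page does not state. Passing `"cpu"` as `device` is also an assumption; only `"cuda"` appears in the example above.

```python
import os


def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, else "cpu".

    Hypothetical helper; assumes the local model runs on PyTorch.
    """
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is no CUDA device to detect.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def hf_token_present() -> bool:
    """Report whether HF_TOKEN is set to a non-empty value."""
    return bool(os.environ.get("HF_TOKEN"))


if not hf_token_present():
    print("HF_TOKEN is not set; the model download may fail.")
print(f"Selected device: {pick_device()}")
```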

## VLM (Cloud)

`CloudVLM` performs visual question answering (`"vqa"`) or automatic captioning (`"caption"`), selected with `mode`.

```python theme={null}
from vision_agents.core import Agent, User
from vision_agents.plugins import moondream, deepgram, elevenlabs, getstream

llm = moondream.CloudVLM(mode="vqa")  # or "caption"

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    llm=llm,
    stt=deepgram.STT(),
    tts=elevenlabs.TTS(),
)
```

| Name   | Type  | Default | Description                   |
| ------ | ----- | ------- | ----------------------------- |
| `mode` | `str` | `"vqa"` | Mode (`"vqa"` or `"caption"`) |

## VLM (Local)

```python theme={null}
from vision_agents.plugins import moondream
llm = moondream.LocalVLM(mode="vqa", force_cpu=False)
```

## Cloud vs Local

|              | Cloud                                                 | Local                                      |
| ------------ | ----------------------------------------------------- | ------------------------------------------ |
| **Use when** | Simple setup, no infrastructure management            | Higher throughput, own GPU infrastructure  |
| **Pros**     | No model download, no GPU required, automatic updates | No rate limits, no API costs, full control |
| **Cons**     | Requires API key, 2 RPS rate limit (can be increased) | Requires GPU for best performance          |

<Note>
  Local models require `HF_TOKEN` for Hugging Face authentication. CUDA is
  recommended for best performance.
</Note>
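
One way to stay under the cloud rate limit listed in the table is to skip frames that arrive too soon after the last request. The sketch below is a generic throttle; `FrameThrottle` is a hypothetical name and is not part of Vision Agents. Whether the plugin already throttles requests internally is not covered here, so treat this as an illustration of the idea only.

```python
import time
from typing import Callable, Optional


class FrameThrottle:
    """Allow at most `max_per_second` sends by enforcing a minimum interval."""

    def __init__(
        self,
        max_per_second: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._min_interval = 1.0 / max_per_second
        self._clock = clock  # injectable so the throttle can be tested
        self._last_sent: Optional[float] = None

    def should_send(self) -> bool:
        """Return True if enough time has passed since the last accepted frame."""
        now = self._clock()
        if self._last_sent is None or now - self._last_sent >= self._min_interval:
            self._last_sent = now
            return True
        return False
```

A caller would check `throttle.should_send()` for each incoming frame and forward only the frames for which it returns `True`, dropping the rest.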

## Next Steps

<CardGroup cols={2}>
  <Card title="Build a Voice Agent" icon="microphone" href="/introduction/voice-agents">
    Get started with voice
  </Card>

  <Card title="Build a Video Agent" icon="video" href="/introduction/video-agents">
    Add video processing
  </Card>
</CardGroup>