> ## Documentation Index
> Fetch the complete documentation index at: https://visionagents.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# HuggingFace Inference

[HuggingFace Inference](https://huggingface.co/docs/inference-providers/en/index) is an inference platform that provides access to thousands of models through a unified API. It routes requests to multiple providers (Together AI, Groq, Cerebras, Replicate, Fireworks), so you can switch backends without changing code. The plugin supports both text (**LLM**) and vision (**VLM**) models.

<Info>
  Vision Agents requires a [Stream](https://getstream.io/try-for-free/) account
  for real-time transport. Most providers offer free tiers to get started.
</Info>

<Tip>
  For local on-device inference using open-weight models, see
  [HuggingFace Transformers](/integrations/llm/huggingface-transformers).
</Tip>

## Installation

```sh theme={null}
uv add "vision-agents[huggingface]"
```

## LLM

`huggingface.LLM` is a text-only language model with streaming and function calling.

```python theme={null}
from vision_agents.core import Agent, User
from vision_agents.plugins import huggingface, getstream, deepgram

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=huggingface.LLM(
        model="meta-llama/Meta-Llama-3-8B-Instruct",
        provider="fastest"
    ),
    stt=deepgram.STT(),
    tts=deepgram.TTS(),
)

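# Expose get_weather to the model as a callable function.
@agent.llm.register_function(description="Get weather for a location")
async def get_weather(location: str) -> dict:
    # Placeholder response; replace with a real weather lookup.
    return {"temperature": "72°F", "condition": "Sunny"}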
```

| Name       | Type  | Default | Description                                                                             |
| ---------- | ----- | ------- | --------------------------------------------------------------------------------------- |
| `model`    | `str` | —       | HuggingFace model ID                                                                    |
| `provider` | `str` | `None`  | Provider name or routing policy, e.g. `"together"`, `"groq"`, `"fastest"`, `"cheapest"` |
| `api_key`  | `str` | `None`  | API key (defaults to `HF_TOKEN` env var)                                                |
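
The `api_key` fallback described in the table can be pictured with a small standalone sketch. `resolve_api_key` is a hypothetical helper written for illustration only, not part of the plugin, and the exact lookup order inside `huggingface.LLM` is an assumption based on the table above.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: prefer an explicit key, else fall back to HF_TOKEN."""
    # Allow a custom mapping so the lookup can be exercised without touching os.environ.
    env = os.environ if env is None else env
    key = api_key or env.get("HF_TOKEN")
    if not key:
        # Assumed behavior: fail early instead of sending unauthenticated requests.
        raise RuntimeError("No API key found: pass api_key or set HF_TOKEN")
    return key


# An explicit argument wins over the environment variable.
explicit = resolve_api_key("hf_explicit", env={"HF_TOKEN": "hf_env"})
# With no argument, the HF_TOKEN value is used.
from_env = resolve_api_key(env={"HF_TOKEN": "hf_env"})
```

In practice this means exporting `HF_TOKEN` before starting the agent is usually enough, and `api_key` only needs to be passed when the token comes from somewhere else.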

## VLM

`huggingface.VLM` is a vision language model with automatic video frame buffering and function calling. It supports models such as Qwen2-VL.

```python theme={null}
from vision_agents.core import Agent, User
from vision_agents.plugins import huggingface, getstream, deepgram

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a visual assistant.",
    llm=huggingface.VLM(
        model="Qwen/Qwen2-VL-7B-Instruct",
        fps=1,
        frame_buffer_seconds=10,
    ),
    stt=deepgram.STT(),
    tts=deepgram.TTS(),
)
```

| Name                   | Type  | Default | Description                       |
| ---------------------- | ----- | ------- | --------------------------------- |
| `model`                | `str` | —       | HuggingFace VLM model ID          |
| `fps`                  | `int` | `1`     | Video frames per second to buffer |
| `frame_buffer_seconds` | `int` | `10`    | Seconds of video to buffer        |
| `provider`             | `str` | `None`  | Inference provider                |
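
To get a feel for how `fps` and `frame_buffer_seconds` interact, the sketch below models the buffer as a fixed-size rolling window. This is an illustration under assumptions: `make_frame_buffer` is a hypothetical helper, and whether the plugin stores frames this way internally is not confirmed by this page.

```python
from collections import deque


def make_frame_buffer(fps: int = 1, frame_buffer_seconds: int = 10) -> deque:
    """Hypothetical helper: a rolling window holding fps * seconds frames."""
    # deque(maxlen=...) silently drops the oldest item once the window is full.
    return deque(maxlen=fps * frame_buffer_seconds)


# Defaults from the table: 1 frame per second, 10 seconds of video.
buffer = make_frame_buffer(fps=1, frame_buffer_seconds=10)

# Simulate 25 seconds of incoming frames; strings stand in for image data.
for i in range(25):
    buffer.append(f"frame-{i}")

# Only the 10 most recent frames remain (frame-15 through frame-24).
oldest, newest = buffer[0], buffer[-1]
```

Under this model, raising either parameter increases the number of frames kept per request, which would typically raise latency and cost as well.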

## Next Steps

<CardGroup cols={2}>
  <Card title="Build a Voice Agent" icon="microphone" href="/introduction/voice-agents">
    Get started with voice
  </Card>

  <Card title="Build a Video Agent" icon="video" href="/introduction/video-agents">
    Add video processing
  </Card>
</CardGroup>
