HuggingFace Inference is an inference platform that provides access to thousands of models through a unified API. It routes requests to multiple providers (Together AI, Groq, Cerebras, Replicate, Fireworks), so you can switch backends without changing code. Both text LLMs and vision language models (VLMs) are supported.

Documentation Index
Fetch the complete documentation index at: https://visionagents.ai/llms.txt
Use this file to discover all available pages before exploring further.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
Installation
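A minimal setup sketch, assuming the plugin is published as a `huggingface` extra of the `vision-agents` package (the package and extra names here are assumptions; verify them against the install guide):

```shell
# Install the core package with the HuggingFace plugin extra (assumed name)
pip install "vision-agents[huggingface]"
```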
LLM
Text-only language model with streaming and function calling.

| Name | Type | Default | Description |
|---|---|---|---|
| model | str | — | HuggingFace model ID |
| provider | str | None | Inference provider ("together", "groq", "fastest", "cheapest") |
| api_key | str | None | API key (defaults to the HF_TOKEN environment variable) |
VLM
Vision language model with automatic video frame buffering and function calling. Supports models such as Qwen2-VL.

| Name | Type | Default | Description |
|---|---|---|---|
| model | str | — | HuggingFace VLM model ID |
| fps | int | 1 | Video frames per second to buffer |
| frame_buffer_seconds | int | 10 | Seconds of video to buffer |
| provider | str | None | Inference provider |
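Together, `fps` and `frame_buffer_seconds` bound how much video the VLM keeps: at most fps × frame_buffer_seconds frames. A self-contained sketch of that buffering behavior (a hypothetical helper, not the plugin's actual implementation):

```python
from collections import deque

class FrameBuffer:
    """Ring buffer holding the last fps * frame_buffer_seconds frames."""

    def __init__(self, fps: int = 1, frame_buffer_seconds: int = 10):
        # deque with maxlen evicts the oldest frame once capacity is hit
        self.frames = deque(maxlen=fps * frame_buffer_seconds)

    def add(self, frame) -> None:
        self.frames.append(frame)

buf = FrameBuffer(fps=1, frame_buffer_seconds=10)
for i in range(25):          # simulate 25 seconds of video at 1 fps
    buf.add(f"frame-{i}")
```

With the defaults the buffer caps at 10 frames, so after 25 simulated seconds it holds frame-15 through frame-24.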
Next Steps
Build a Voice Agent
Get started with voice
Build a Video Agent
Add video processing

