HuggingFace Inference provides access to thousands of models through a unified API. The plugin supports multiple inference providers (Together AI, Groq, Cerebras, Replicate, Fireworks) and offers both LLM (text) and VLM (vision) integrations.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.

Installation

uv add "vision-agents[huggingface]"
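
If you manage dependencies with pip rather than uv, the equivalent install is (assuming the package is published under the same name on PyPI):

pip install "vision-agents[huggingface]"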

LLM

Text-only language model with streaming and function calling.
from vision_agents.core import Agent, User
from vision_agents.plugins import huggingface, getstream, deepgram

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=huggingface.LLM(
        model="meta-llama/Meta-Llama-3-8B-Instruct",
        provider="fastest"
    ),
    stt=deepgram.STT(),
    tts=deepgram.TTS(),
)
Name      Type  Default     Description
model     str   (required)  HuggingFace model ID
provider  str   None        Provider ("together", "groq", "fastest", "cheapest")
api_key   str   None        API key (defaults to HF_TOKEN env var)
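
When you don't want the defaults, routing and authentication can both be set explicitly. A minimal sketch using only the parameters documented above (the provider choice and token handling here are illustrative):

import os

from vision_agents.plugins import huggingface

# Pin a specific provider instead of letting "fastest"/"cheapest" route the
# request, and pass the token explicitly rather than relying on HF_TOKEN.
llm = huggingface.LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    provider="groq",
    api_key=os.environ["HF_TOKEN"],
)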

VLM

Vision language model with automatic video frame buffering.
from vision_agents.core import Agent, User
from vision_agents.plugins import huggingface, getstream, deepgram

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a visual assistant.",
    llm=huggingface.VLM(
        model="Qwen/Qwen2-VL-7B-Instruct",
        fps=1,
        frame_buffer_seconds=10,
    ),
    stt=deepgram.STT(),
    tts=deepgram.TTS(),
)
Name                  Type  Default     Description
model                 str   (required)  HuggingFace VLM model ID
fps                   int   1           Video frames per second to buffer
frame_buffer_seconds  int   10          Seconds of video to buffer
provider              str   None        Inference provider
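
fps and frame_buffer_seconds together determine how much video the model sees: the buffer holds roughly fps × frame_buffer_seconds frames (10 with the defaults). A sketch of a denser configuration, using only the parameters documented above:

from vision_agents.plugins import huggingface

# 2 frames/second over a 15-second window keeps roughly 30 frames per request,
# trading a larger prompt for finer temporal detail.
vlm = huggingface.VLM(
    model="Qwen/Qwen2-VL-7B-Instruct",
    fps=2,
    frame_buffer_seconds=15,
    provider="together",
)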

Next Steps