- HuggingFace LLM - Text-only language model integration with streaming responses, function calling, and multi-provider support.
- HuggingFace VLM - Vision language model integration with automatic video frame buffering for real-time video understanding.
Installation
Install the Stream HuggingFace plugin with your Python package manager.

Configuration

Set your HuggingFace API token in the HF_TOKEN environment variable.

HuggingFace LLM

The HuggingFace LLM plugin provides text-only language model integration with streaming responses and function calling support.

Usage
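A minimal construction sketch, assuming the plugin exposes an LLM class (the commented-out import path and class name are assumptions, not the plugin's confirmed API; the parameters and the HF_TOKEN fallback are as documented in the table below):

```python
import os
from typing import Optional

# from vision_agents.plugins import huggingface  # import path assumed

def build_llm_kwargs(model: str,
                     api_key: Optional[str] = None,
                     provider: Optional[str] = None) -> dict:
    # api_key falls back to the HF_TOKEN environment variable, matching
    # the documented behavior; provider=None lets HuggingFace auto-select
    # an inference provider based on your account settings.
    return {
        "model": model,
        "api_key": api_key if api_key is not None else os.environ.get("HF_TOKEN"),
        "provider": provider,
    }

kwargs = build_llm_kwargs("meta-llama/Meta-Llama-3-8B-Instruct")
# llm = huggingface.LLM(**kwargs)  # class name is an assumption
print(kwargs["model"])  # meta-llama/Meta-Llama-3-8B-Instruct
```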
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| model | str | - | The HuggingFace model ID to use (e.g., "meta-llama/Meta-Llama-3-8B-Instruct"). |
| api_key | Optional[str] | None | HuggingFace API token. If not provided, reads from the HF_TOKEN environment variable. |
| provider | Optional[str] | None | Inference provider (e.g., "together", "groq", "fastest", "cheapest"). Auto-selects based on your HuggingFace settings if omitted. |
| client | Optional[AsyncInferenceClient] | None | Custom AsyncInferenceClient instance for dependency injection. |
Methods
simple_response(text, processors, participant)
Generate a response to text input.

create_response(messages, input, stream)

Create a response with full control over the request.

Function calling

You can register functions that the model can call.

Supported providers
HuggingFace’s Inference Providers API supports multiple backends. You can specify a provider explicitly or let HuggingFace auto-select based on your account preferences:

- Together AI
- Groq
- Cerebras
- Replicate
- Fireworks
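The plugin's own function-registration API is not shown on this page; as a generic sketch, function calling at the chat-completion layer uses an OpenAI-style tool schema, and the model's tool calls are dispatched back to local functions. All names below are illustrative:

```python
import json

# OpenAI-style tool schema, as accepted by chat-completion providers.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def get_weather(city: str) -> str:
    # The local function a model-emitted tool call is routed to.
    return f"Sunny in {city}"

REGISTRY = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> str:
    # A model that decides to call a tool returns its name plus
    # JSON-encoded arguments; decode them and invoke the function.
    args = json.loads(tool_call["arguments"])
    return REGISTRY[tool_call["name"]](**args)

# Simulated tool call, as a provider would return it:
print(dispatch({"name": "get_weather", "arguments": '{"city": "Amsterdam"}'}))
# Sunny in Amsterdam
```

The schema would be sent along with the chat request; the dispatch step runs whenever the model responds with a tool call instead of plain text.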
HuggingFace VLM
The HuggingFace VLM plugin provides vision language model integration with automatic video frame buffering for real-time video understanding.

Usage
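A minimal construction sketch, assuming the plugin exposes a VLM class (import path and class name are assumptions; the parameters and their defaults follow the table below):

```python
import os
from typing import Optional

# from vision_agents.plugins import huggingface  # import path assumed

def build_vlm_kwargs(model: str,
                     api_key: Optional[str] = None,
                     provider: Optional[str] = None,
                     fps: int = 1,
                     frame_buffer_seconds: int = 10) -> dict:
    # Defaults mirror the parameter table: 1 frame per second,
    # buffered over a 10-second window.
    return {
        "model": model,
        "api_key": api_key if api_key is not None else os.environ.get("HF_TOKEN"),
        "provider": provider,
        "fps": fps,
        "frame_buffer_seconds": frame_buffer_seconds,
    }

kwargs = build_vlm_kwargs("Qwen/Qwen2-VL-7B-Instruct", fps=2)
# vlm = huggingface.VLM(**kwargs)  # class name is an assumption
print(kwargs["frame_buffer_seconds"])  # 10
```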
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| model | str | - | The HuggingFace model ID to use (e.g., "Qwen/Qwen2-VL-7B-Instruct"). |
| api_key | Optional[str] | None | HuggingFace API token. If not provided, reads from the HF_TOKEN environment variable. |
| provider | Optional[str] | None | Inference provider. Auto-selects based on your HuggingFace settings if omitted. |
| fps | int | 1 | Number of video frames per second to buffer. |
| frame_buffer_seconds | int | 10 | Number of seconds of video to buffer for the model’s input. |
| client | Optional[AsyncInferenceClient] | None | Custom AsyncInferenceClient instance for dependency injection. |
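The two buffering parameters interact simply: the model's visual context is at most fps × frame_buffer_seconds frames. A ring-buffer illustration of that behavior (a sketch, not the plugin's actual internals):

```python
from collections import deque

fps = 1                     # frames sampled per second (default)
frame_buffer_seconds = 10   # seconds of video retained (default)

# Keep only the most recent fps * frame_buffer_seconds frames.
frames = deque(maxlen=fps * frame_buffer_seconds)

for i in range(25):         # simulate 25 seconds of 1 fps video
    frames.append(f"frame-{i}")

print(len(frames))           # 10 -- only the last 10 seconds survive
print(frames[0], frames[-1]) # frame-15 frame-24
```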

