NVIDIA provides vision language models through its Chat Completions API with NVCF (NVIDIA Cloud Functions) asset management. The NVIDIA plugin in the Vision Agents SDK enables real-time video understanding with models like Cosmos Reason2, and provides:
  • Video understanding: Automatically buffers and forwards video frames to NVIDIA VLM models
  • Streaming responses: Real-time text responses with chunk events
  • Asset management: Automatic upload and cleanup of frame assets via NVCF

Installation

Install the NVIDIA plugin with:
uv add vision-agents[nvidia]
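
If you are not using uv, the equivalent pip command should also work (quote the extra so your shell does not expand the brackets):
pip install "vision-agents[nvidia]"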

Configuration

Set your NVIDIA API key:
export NVIDIA_API_KEY=your_nvidia_api_key
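
Alternatively, pass the key explicitly through the api_key parameter (documented in the Parameters table below); a minimal sketch:
import os
from vision_agents.plugins import nvidia

# Explicit key instead of relying on the NVIDIA_API_KEY environment variable.
llm = nvidia.VLM(api_key=os.environ["NVIDIA_API_KEY"])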

Usage

from vision_agents.plugins import nvidia, getstream, deepgram, elevenlabs
from vision_agents.core import Agent, User

llm = nvidia.VLM(
    model="nvidia/cosmos-reason2-8b",
    fps=1,                    # buffer one frame per second
    frame_buffer_seconds=10,  # keep the most recent 10 seconds of frames
)

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="NVIDIA Video Assistant", id="agent"),
    instructions="You're a helpful video AI assistant. Analyze the video frames and respond to user questions about what you see.",
    llm=llm,
    tts=elevenlabs.TTS(),
    stt=deepgram.STT(),
)
The VLM automatically buffers video frames and includes them when responding to user questions via STT transcripts.
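
To make the buffering concrete: with fps=1 and frame_buffer_seconds=10, at most the 10 most recent frames are attached to a request. The sketch below is illustrative only, not the plugin's internal implementation:
from collections import deque

# Illustrative ring buffer -- the plugin manages this internally.
fps = 1
frame_buffer_seconds = 10
frames = deque(maxlen=fps * frame_buffer_seconds)

for i in range(25):              # pretend 25 frames arrive over 25 seconds
    frames.append(f"frame-{i}")  # oldest frames are evicted once full

print(len(frames))  # 10 -- only the most recent 10 seconds survive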

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | "nvidia/cosmos-reason2-8b" | NVIDIA model ID to use. |
| api_key | Optional[str] | None | NVIDIA API token. If not provided, read from the NVIDIA_API_KEY environment variable. |
| fps | int | 1 | Number of video frames per second to buffer. |
| frame_buffer_seconds | int | 10 | Number of seconds of video to buffer for the model's input. |
| frame_width | int | 800 | Width of video frames to send. |
| frame_height | int | 600 | Height of video frames to send. |
| max_tokens | int | 1024 | Maximum response tokens. |
| temperature | float | 0.2 | Sampling temperature. |
| top_p | float | 0.7 | Top-p sampling parameter. |
| frames_per_second | int | 8 | Frames per second sent to video models. |
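
As a sketch of how these parameters fit together (values are arbitrary; only parameters from the table above are used):
from vision_agents.plugins import nvidia

llm = nvidia.VLM(
    model="nvidia/cosmos-reason2-8b",
    fps=2,                   # buffer two frames per second
    frame_buffer_seconds=5,  # keep the last 5 seconds (10 frames total)
    frame_width=640,         # downscale frames before upload
    frame_height=480,
    max_tokens=512,
    temperature=0.2,
)
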
Generate a response to text input with video context (using the llm instance created above):
response = await llm.simple_response("What do you see?")
print(response.text)

Events

The NVIDIA VLM plugin emits events during conversations:
from vision_agents.core.llm.events import (
    LLMResponseChunkEvent,
    LLMResponseCompletedEvent,
)
from vision_agents.plugins.nvidia.events import LLMErrorEvent

@agent.llm.events.subscribe
async def on_chunk(event: LLMResponseChunkEvent):
    print(f"Chunk: {event.delta}")

@agent.llm.events.subscribe
async def on_complete(event: LLMResponseCompletedEvent):
    print(f"Response: {event.text}")

@agent.llm.events.subscribe
async def on_error(event: LLMErrorEvent):
    print(f"Error: {event.error_message}")

Example

Check out the NVIDIA example for a complete implementation using NVIDIA VLM with Deepgram STT, ElevenLabs TTS, and Stream for real-time communication.