OpenAI provides industry-leading language models. The plugin supports the Responses API via openai.LLM (for GPT-5 and later) and any OpenAI-compatible API via openai.ChatCompletionsLLM. Both require separate STT and TTS components.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
OpenAI also provides Realtime speech-to-speech and text-to-speech.

Installation

uv add "vision-agents[openai]"

LLM (Responses API)

Uses the Responses API, the default for GPT-5 and later models.

from vision_agents.core import Agent, User
from vision_agents.plugins import openai, deepgram, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful assistant.",
    llm=openai.LLM(model="gpt-5.4"),
    stt=deepgram.STT(),
    tts=openai.TTS(),
)
Name      Type  Default  Description
model     str            Model name (e.g., "gpt-5.4")
api_key   str   None     API key (defaults to the OPENAI_API_KEY env var)
base_url  str   None     Custom API endpoint

ChatCompletionsLLM

Works with any OpenAI-compatible API (Together AI, Fireworks, DeepSeek, etc.).

from vision_agents.plugins import openai

llm = openai.ChatCompletionsLLM(
    model="deepseek-chat",
    base_url="https://api.deepseek.com",
    api_key="your_api_key"
)

Function Calling

@agent.llm.register_function(description="Get weather for a location")
async def get_weather(location: str) -> dict:
    # Replace with a real weather lookup for `location`
    return {"temperature": "72°F", "condition": "Sunny"}

See the Function Calling guide for details.
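register_function needs enough signature information to advertise the tool to the model. As a rough illustration of what such a decorator can derive from type hints (not the plugin's actual implementation; `build_tool_schema` is a hypothetical helper), here is a tool schema in the OpenAI function-calling format:

```python
from typing import get_type_hints

# Maps Python annotations to JSON Schema types
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean", dict: "object"}

def build_tool_schema(func, description: str) -> dict:
    """Derive an OpenAI-style tool schema from a function's type hints."""
    hints = get_type_hints(func)
    hints.pop("return", None)  # the return annotation is not a parameter
    properties = {name: {"type": _JSON_TYPES.get(hint, "string")} for name, hint in hints.items()}
    return {
        "type": "function",
        "name": func.__name__,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": list(properties),
        },
    }

async def get_weather(location: str) -> dict:
    return {"temperature": "72°F", "condition": "Sunny"}

schema = build_tool_schema(get_weather, "Get weather for a location")
print(schema["name"])  # get_weather
```

The schema is what the LLM sees when deciding whether and how to call your function, which is why accurate type hints and a clear description matter.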

Events

The OpenAI plugin emits a low-level event carrying raw stream data. Most applications should subscribe to the core events (LLMResponseCompletedEvent, RealtimeUserSpeechTranscriptionEvent, etc.) instead.

from vision_agents.plugins.openai.events import OpenAIStreamEvent

@agent.events.subscribe
async def on_openai_stream(event: OpenAIStreamEvent):
    # Access raw OpenAI stream data
    print(f"Raw event: {event.event_type}, {event.event_data}")
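The subscribe decorator above routes events by the handler's parameter annotation. A minimal, self-contained sketch of that dispatch pattern (illustrative only, not the Vision Agents implementation) looks like this:

```python
import asyncio
import inspect
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class OpenAIStreamEvent:
    event_type: str
    event_data: dict

class EventBus:
    """Sketch of annotation-based subscription, similar in spirit to agent.events."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, handler):
        # The event type is read from the handler's first parameter annotation
        params = list(inspect.signature(handler).parameters.values())
        self._handlers[params[0].annotation].append(handler)
        return handler

    async def emit(self, event):
        for handler in self._handlers[type(event)]:
            await handler(event)

bus = EventBus()
received = []

@bus.subscribe
async def on_openai_stream(event: OpenAIStreamEvent):
    received.append(event.event_type)

asyncio.run(bus.emit(OpenAIStreamEvent("response.delta", {"text": "hi"})))
print(received)  # ['response.delta']
```

Because dispatch keys on the annotated type, one bus can carry both low-level plugin events and core events without handlers interfering with each other.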

Next Steps

OpenAI Realtime: speech-to-speech over WebRTC
OpenAI TTS: text-to-speech synthesis