xAI’s Grok is a powerful language model with advanced reasoning capabilities and real-time knowledge. The xAI plugin for Vision Agents provides native integration with xAI’s chat completion API, letting you use Grok models in your conversational AI applications with full support for function calling and tool use, conversation memory management, streaming responses, and multimodal support.

Installation

Install the xAI plugin with:
uv add vision-agents[xai]
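If you use pip instead of uv, the equivalent install is:
pip install "vision-agents[xai]"
(The quotes keep the extras brackets from being interpreted by your shell.)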

Example

from vision_agents.plugins import xai
from vision_agents.core import Agent, User
from vision_agents.plugins import getstream, deepgram, elevenlabs

# Create agent with Grok
agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="AI Assistant", id="agent"),
    instructions="You are a helpful assistant with access to real-time information.",
    llm=xai.LLM(model="grok-4.1"),
    stt=deepgram.STT(),
    tts=elevenlabs.TTS()
)

Initialization

The xAI plugin is exposed through the LLM class:
from vision_agents.plugins import xai

llm = xai.LLM(
    model="grok-4.1",
    api_key="your_xai_api_key"  # or set XAI_API_KEY environment variable
)
To initialize without passing the API key explicitly, make sure XAI_API_KEY is set as an environment variable. You can do this either by defining it in a .env file or by exporting it directly in your terminal.
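If you keep the key in a .env file, it must be loaded into the process environment before the LLM is constructed. A minimal sketch, assuming the python-dotenv package is installed (it is not part of the plugin):
from dotenv import load_dotenv
from vision_agents.plugins import xai

load_dotenv()  # copies XAI_API_KEY from ./.env into os.environ
llm = xai.LLM(model="grok-4.1")  # picks the key up from the environment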

Parameters

These are the parameters available in the xAI LLM plugin:
  • model (str, default "grok-4"): The xAI model to use. Options include "grok-4", "grok-4.1", and other available Grok models.
  • api_key (str | None, default None): Your xAI API key. If not provided, the plugin looks for the XAI_API_KEY environment variable.
  • client (AsyncClient | None, default None): Optional pre-configured xAI AsyncClient instance. If provided, it is used instead of creating a new one.
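
Passing your own client is useful if you want to reuse a single connection or customize client settings. A minimal sketch, assuming AsyncClient comes from xAI’s xai_sdk package (verify the import against the SDK version you have installed):
from xai_sdk import AsyncClient
from vision_agents.plugins import xai

# Build one shared client and hand it to the plugin instead of
# letting the plugin construct its own.
client = AsyncClient(api_key="your_xai_api_key")
llm = xai.LLM(model="grok-4.1", client=client)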

Features

Conversation Memory

The xAI plugin automatically manages conversation history, allowing for natural multi-turn conversations:
llm = xai.LLM(model="grok-4.1")

# First message
await llm.simple_response("My name is Alice and I have 2 cats")

# Second message - the LLM remembers the context
response = await llm.simple_response("How many pets do I have?")
print(response.text)  # Will mention the 2 cats

Streaming Responses

The plugin supports streaming responses for real-time text generation:
response = await llm.create_response(
    input="Tell me about quantum computing",
    instructions="You are a helpful science educator.",
    stream=True
)
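With stream=True, the text arrives incrementally rather than all at once. One way to consume it, sketched from the standard events described under Events below, is to subscribe to chunk events before creating the response:
from vision_agents.core.llm.events import LLMResponseChunkEvent

@llm.events.on(LLMResponseChunkEvent)
async def print_chunk(event: LLMResponseChunkEvent):
    # Each event carries the newly generated text delta.
    print(event.delta, end="", flush=True)

response = await llm.create_response(
    input="Tell me about quantum computing",
    instructions="You are a helpful science educator.",
    stream=True
)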

Function Calling (Grok 4.1+)

Grok 4.1 and later models support function calling, allowing the model to use tools and take actions:
from vision_agents.plugins import xai

llm = xai.LLM(model="grok-4.1")

@llm.register_function(
    description="Get the current weather for a location"
)
async def get_weather(location: str) -> str:
    # Your weather API logic here
    return f"The weather in {location} is sunny and 72°F"

# The model can now call this function when appropriate
response = await llm.simple_response("What's the weather in San Francisco?")
Function calling is supported in Grok 4.1 and later models. Earlier versions of Grok do not support this feature.

Methods

simple_response(text, processors, participant)

Generate a response to text input. This method is called automatically when new STT transcripts are received.
response = await llm.simple_response("Hello, how are you?")
print(response.text)
Parameters:
  • text (str): Input text to respond to
  • processors (list[Processor] | None): Optional list of processors for video/voice AI context
  • participant (Participant | None): Optional participant object
Returns: LLMResponseEvent with the generated text

create_response(input, instructions, model, stream)

Create a response with full control over parameters:
response = await llm.create_response(
    input="Explain machine learning",
    instructions="You are a technical educator. Be concise and clear.",
    model="grok-4.1",
    stream=True
)
Parameters:
  • input (str): Input text
  • instructions (str): System instructions for the model
  • model (str | None): Override the default model
  • stream (bool): Whether to stream the response (default: True)
Returns: LLMResponseEvent with the generated text

Events

The xAI plugin emits standard Vision Agents LLM events:
from vision_agents.core.llm.events import (
    LLMResponseChunkEvent,
    LLMResponseCompletedEvent
)

@llm.events.on(LLMResponseChunkEvent)
async def on_chunk(event: LLMResponseChunkEvent):
    print(f"Chunk: {event.delta}")

@llm.events.on(LLMResponseCompletedEvent)
async def on_completed(event: LLMResponseCompletedEvent):
    print(f"Complete response: {event.text}")

Usage with Agent

Use the xAI LLM as part of a complete voice or video agent:
from vision_agents.core import Agent, User
from vision_agents.plugins import xai, getstream, deepgram, elevenlabs

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Grok Assistant", id="agent"),
    instructions="You are a helpful AI assistant powered by Grok.",
    llm=xai.LLM(model="grok-4.1"),
    stt=deepgram.STT(),
    tts=elevenlabs.TTS()
)

# Join a call (assumes `client` is your Stream client and `call_id` is already defined)
call = client.video.call("default", call_id)
await call.get_or_create(data={"created_by_id": agent.agent_user.id})

with await agent.join(call):
    await agent.say("Hello! I'm powered by Grok. How can I help you today?")
    await agent.finish()
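
Since agent.join() and the surrounding calls are awaited, they must run inside a coroutine. A minimal entrypoint sketch (the main function name is illustrative):
import asyncio

async def main():
    # ... agent and call setup from the example above ...
    with await agent.join(call):
        await agent.say("Hello! I'm powered by Grok. How can I help you today?")
        await agent.finish()

asyncio.run(main())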

Model Options

xAI offers several Grok models with different capabilities:
  • grok-4: The base Grok 4 model with strong reasoning capabilities
  • grok-4.1: Enhanced version with function calling support and improved performance
  • Check xAI’s documentation for the latest available models

Getting Started

  1. Get your xAI API key from the xAI Console
  2. Set the XAI_API_KEY environment variable:
    export XAI_API_KEY="your_api_key_here"
  3. Use the plugin in your Vision Agents application
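
A quick way to verify the setup is a one-off text request, shown here as a minimal sketch run from an async entrypoint:
import asyncio
from vision_agents.plugins import xai

async def main():
    llm = xai.LLM(model="grok-4.1")  # reads XAI_API_KEY from the environment
    response = await llm.simple_response("Say hello in one sentence.")
    print(response.text)

asyncio.run(main())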