Many AI models are stateless, meaning they don’t retain previous messages when a user makes a request. This can lead to conversations that feel unnatural since the model can’t remember details from earlier in the conversation. To solve this, Vision Agents provides a complete chat and memory system that manages conversations between users and AI agents. This system supports both in-memory storage for development/testing and persistent storage through external providers like Stream Chat.

Building with Persistent Conversations

For production applications, the Agent uses StreamConversation automatically. This uses our Chat API under the hood, which calls an ephemeral endpoint that collects the events from the LLM responses before persisting them. Responses from the LLM are still streamed to the user in real time, allowing for smooth UI updates, while the database writes happen without affecting performance or rate limits. Since this is the default strategy, no additional setup is required to use conversations with Stream. The following example automatically creates a chat channel under the hood, linked to the call ID, and persists the conversation as both the user and the bot speak. No additional accounts or API keys are required.
async def start_agent() -> None:
    llm = gemini.Realtime()

    # create an agent to run with Stream's edge, Gemini llm
    agent = agents.Agent(
        edge=getstream.Edge(),  # low latency edge. clients for React, iOS, Android, RN, Flutter etc.
        agent_user=User(name="My persistent AI friend", id="agent"),  # the user object for the agent (name, image etc)
        instructions="You're a conversational AI assistant with persistent memory. Keep responses short and conversational. Don't use special characters or formatting. Be friendly and helpful. Remember details from our conversation across sessions.",
        llm=llm,
    )

    await agent.create_user()

    # Create a call
    call = agent.edge.client.video.call("default", str(uuid4()))

    # Open the demo UI
    agent.edge.open_demo(call)

    # Have the agent join the call/room
    with await agent.join(call):
        # The conversation is now active and will:
        # 1. Store user messages from speech-to-text
        # 2. Store agent responses from the LLM
        # 3. Persist all messages to Stream's chat API
        # 4. Maintain conversation context across sessions

        await agent.llm.simple_response("Hello! I can remember our conversation even after I restart. What's your name?")
        # run till the call ends
        await agent.finish()

Building with In-Memory Conversations

For development and testing, you can use InMemoryConversation to store messages locally. This approach is perfect for prototyping and doesn’t require any external services. Let’s build a simple example using in-memory conversations. First, we’ll need to install the required dependencies:
uv add vision-agents vision-agents[getstream] vision-agents[gemini]
Next, in our main.py file, we can start by importing the packages required for our project:
import asyncio
import logging
from uuid import uuid4

from dotenv import load_dotenv

from vision_agents.core.edge.types import User
from vision_agents.plugins import getstream, gemini
from vision_agents.core.agents.conversation import InMemoryConversation
from vision_agents.core import agents, cli

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

load_dotenv()
This sets up some basic logging and loads in the .env variables required for our sample. Since we are running the Gemini model in this example, you will need to have the following in your .env:
# Stream API credentials
STREAM_API_KEY=
STREAM_API_SECRET=

# Gemini
GOOGLE_API_KEY=
Both Stream and Google offer free API keys. For Gemini, developers can get a free API key from Google's AI Studio, while Stream developers can get theirs from the Stream Dashboard.
Next, we can define our start_agent function, where most of our code will live. In this method, we set up the Agent with conversation support:
async def start_agent() -> None:
    llm = gemini.LLM()
    # Attach an in-memory conversation: system instructions plus an initial (empty) message list
    llm._conversation = InMemoryConversation("Be friendly", [])

    # create an agent to run with Stream's edge, Gemini llm
    agent = agents.Agent(
        edge=getstream.Edge(),  # low latency edge. clients for React, iOS, Android, RN, Flutter etc.
        agent_user=User(name="My memory-enabled AI friend", id="agent"),  # the user object for the agent (name, image etc)
        instructions="You're a conversational AI assistant with memory. Keep responses short and conversational. Don't use special characters or formatting. Be friendly and helpful. Remember details from our conversation.",
        processors=[],  # processors can fetch extra data, check images/audio data or transform video
        llm=llm,
    )

    await agent.create_user()

    # Create a call
    call = agent.edge.client.video.call("default", str(uuid4()))

    # Open the demo UI
    agent.edge.open_demo(call)

    # Have the agent join the call/room
    with await agent.join(call):
        await agent.llm.simple_response("Hello! I can remember our conversation. What's your name?")
        # run till the call ends
        await agent.finish()


if __name__ == "__main__":
    asyncio.run(cli.start_dispatcher(start_agent))
To run our example, we can call uv run main.py, which kicks off the agent with conversation memory and automatically opens the Stream Video demo app as the UI.

Advanced Conversation Features

Both conversation types support advanced features like streaming messages and real-time updates. The system automatically handles:
  • Message Structure: Each message includes content, role, user_id, timestamp, and unique ID
  • Streaming Support: Real-time message updates as the LLM generates responses
  • Event Integration: Integration with the framework’s event system
  • Thread Safety: Background workers handle API calls asynchronously
The conversation system works automatically when you use the Agent class. Messages are stored and retrieved transparently, providing context to your LLM for more natural conversations.
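To make the message structure and streaming flow more concrete, here is a minimal sketch that drives a conversation by hand. The add_message and update_message signatures match the Conversation interface shown in the next section; the Message constructor fields used here (id, content, role, user_id) are assumptions based on the structure described above, since the Agent normally creates these objects for you.
from uuid import uuid4

from vision_agents.core.agents.conversation import InMemoryConversation, Message

# System instructions plus an initial (empty) message list, as in the in-memory example above
conversation = InMemoryConversation("Be friendly", [])

# Store a completed user message (field names are assumed, see the note above)
conversation.add_message(Message(content="Hi, my name is Ada.", role="user", user_id="ada"))

# Streaming: store a partial agent response, then append to it as more text arrives
message_id = str(uuid4())
conversation.add_message(
    Message(id=message_id, content="Nice to", role="assistant", user_id="agent"),
    completed=False,
)
conversation.update_message(
    message_id=message_id,
    input_text=" meet you, Ada!",
    user_id="agent",
    replace_content=False,  # append to the partial content instead of replacing it
    completed=True,
)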

Custom Conversation Implementations

If you need custom conversation behavior, you can implement the Conversation abstract base class:
from vision_agents.core.agents.conversation import Conversation, Message

class CustomConversation(Conversation):
    def add_message(self, message: Message, completed: bool = True):
        """Add a message to your custom storage."""
        # Your custom logic here
        pass

    def update_message(self, message_id: str, input_text: str, user_id: str,
                      replace_content: bool, completed: bool):
        """Update an existing message in your custom storage."""
        # Your custom logic here
        pass
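Once implemented, you can attach your custom conversation to the LLM the same way the in-memory example above does. The constructor arguments here are placeholders; they depend entirely on your implementation.
llm = gemini.LLM()
llm._conversation = CustomConversation()  # constructor arguments depend on your implementation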