Many AI models are stateless, meaning they don’t retain previous messages when a user makes a request. This can lead to conversations that feel unnatural since the model can’t remember details from earlier in the conversation. To solve this, Vision Agents provides a complete chat and memory system that manages conversations between users and AI agents. This system supports both in-memory storage for development/testing and persistent storage through external providers like Stream Chat.

Building with Persistent Conversations

For production applications, the Agent uses StreamConversation automatically. This uses our Chat API under the hood, which calls an ephemeral endpoint that collects the events from the LLM responses before persisting them. Responses from the LLM are still streamed to the user in real time, allowing for smooth UI updates, while the database writes happen without affecting performance or rate limits. Since this is the default strategy, no additional setup is required to use conversations with Stream. The following example automatically creates a chat channel under the hood, linked to the call ID, and persists the conversation as both the user and the bot speak. No additional accounts or API keys are required.
async def start_agent() -> None:
    llm = gemini.Realtime()

    # create an agent to run with Stream's edge, Gemini llm
    agent = agents.Agent(
        edge=getstream.Edge(),  # low latency edge. clients for React, iOS, Android, RN, Flutter etc.
        agent_user=User(name="My persistent AI friend", id="agent"),  # the user object for the agent (name, image etc)
        instructions="You're a conversational AI assistant with persistent memory. Keep responses short and conversational. Don't use special characters or formatting. Be friendly and helpful. Remember details from our conversation across sessions.",
        llm=llm,
    )

    await agent.create_user()

    # Create a call
    call = agent.edge.client.video.call("default", str(uuid4()))

    # Open the demo UI
    agent.edge.open_demo(call)

    # Have the agent join the call/room
    with await agent.join(call):
        # The conversation is now active and will:
        # 1. Store user messages from speech-to-text
        # 2. Store agent responses from the LLM
        # 3. Persist all messages to Stream's chat API
        # 4. Maintain conversation context across sessions

        await agent.llm.simple_response("Hello! I can remember our conversation even after I restart. What's your name?")
        # run till the call ends
        await agent.finish()

Building with In-Memory Conversations

For development and testing, you can use InMemoryConversation to store messages locally. This approach is perfect for prototyping and doesn’t require any external services. Let’s build a simple example using in-memory conversations. First, we’ll need to install the required dependencies:
uv add vision-agents vision-agents[getstream] vision-agents[gemini]
Next, in our main.py file, we can start by importing the packages required for our project:
import asyncio
import logging
from uuid import uuid4

from dotenv import load_dotenv

from vision_agents.core.edge.types import User
from vision_agents.plugins import getstream, gemini
from vision_agents.core.agents.conversation import InMemoryConversation
from vision_agents.core import agents, cli

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

load_dotenv()
This sets up some basic logging and loads in the .env variables required for our sample. Since we are running the Gemini model in this example, you will need to have the following in your .env:
# Stream API credentials
STREAM_API_KEY=
STREAM_API_SECRET=

# Gemini
GOOGLE_API_KEY=
Both Stream and Google offer free API keys. For Gemini, developers can get a free API key from Google's AI Studio, while Stream developers can get theirs from the Stream Dashboard.
Next, we can define our start_agent function, where most of our code will live. In this method, we set up the Agent with conversation support:
async def start_agent() -> None:
    llm = gemini.LLM()
    # Attach an in-memory conversation: system instructions plus an initial (empty) message list
    llm._conversation = InMemoryConversation("Be friendly", [])

    # create an agent to run with Stream's edge, Gemini llm
    agent = agents.Agent(
        edge=getstream.Edge(),  # low latency edge. clients for React, iOS, Android, RN, Flutter etc.
        agent_user=User(name="My memory-enabled AI friend", id="agent"),  # the user object for the agent (name, image etc)
        instructions="You're a conversational AI assistant with memory. Keep responses short and conversational. Don't use special characters or formatting. Be friendly and helpful. Remember details from our conversation.",
        processors=[],  # processors can fetch extra data, check images/audio data or transform video
        llm=llm,
    )

    await agent.create_user()

    # Create a call
    call = agent.edge.client.video.call("default", str(uuid4()))

    # Open the demo UI
    agent.edge.open_demo(call)

    # Have the agent join the call/room
    with await agent.join(call):
        await agent.llm.simple_response("Hello! I can remember our conversation. What's your name?")
        # run till the call ends
        await agent.finish()


if __name__ == "__main__":
    asyncio.run(cli.start_dispatcher(start_agent))
To run our example, we can call uv run main.py, which kicks off the agent with conversation memory and automatically opens the Stream Video demo app as the UI.

Advanced Conversation Features

Both conversation types support advanced features like streaming messages and real-time updates. The system automatically handles:
  • Message Structure: Each message includes content, role, user_id, timestamp, and unique ID
  • Streaming Support: Real-time message updates as the LLM generates responses
  • Event Integration: Integration with the framework’s event system
  • Thread Safety: Background workers handle API calls asynchronously
The conversation system works automatically when you use the Agent class. Messages are stored and retrieved transparently, providing context to your LLM for more natural conversations.
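To make the message structure and streaming flow more concrete, here is a minimal sketch that drives a conversation by hand. The add_message and update_message signatures match the Conversation interface shown in the next section; the Message constructor fields used here (id, content, role, user_id) are assumptions based on the structure described above, since the Agent normally creates these objects for you.
from uuid import uuid4

from vision_agents.core.agents.conversation import InMemoryConversation, Message

# System instructions plus an initial (empty) message list, as in the in-memory example above
conversation = InMemoryConversation("Be friendly", [])

# Store a completed user message (field names are assumed, see the note above)
conversation.add_message(Message(content="Hi, my name is Ada.", role="user", user_id="ada"))

# Streaming: store a partial agent response, then append to it as more text arrives
message_id = str(uuid4())
conversation.add_message(
    Message(id=message_id, content="Nice to", role="assistant", user_id="agent"),
    completed=False,
)
conversation.update_message(
    message_id=message_id,
    input_text=" meet you, Ada!",
    user_id="agent",
    replace_content=False,  # append to the partial content instead of replacing it
    completed=True,
)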

Custom Conversation Implementations

If you need custom conversation behavior, you can implement the Conversation abstract base class:
from vision_agents.core.agents.conversation import Conversation, Message

class CustomConversation(Conversation):
    def add_message(self, message: Message, completed: bool = True):
        """Add a message to your custom storage."""
        # Your custom logic here
        pass

    def update_message(self, message_id: str, input_text: str, user_id: str,
                      replace_content: bool, completed: bool):
        """Update an existing message in your custom storage."""
        # Your custom logic here
        pass
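Once implemented, you can attach your custom conversation to the LLM the same way the in-memory example above does. The constructor arguments here are placeholders; they depend entirely on your implementation.
llm = gemini.LLM()
llm._conversation = CustomConversation()  # constructor arguments depend on your implementation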