# Quickstart

> Build and run your first AI voice agent in under 5 minutes

You'll build a real-time voice agent you can talk to in your browser, using [Gemini Realtime](https://ai.google.dev/gemini-api/docs/live) on [Stream's](https://getstream.io/video/) edge network. About 18 lines of Python.

## Build your agent

Create a project directory and install Vision Agents with the `getstream` and `gemini` plugins. If you don't have **[uv](https://docs.astral.sh/uv/getting-started/installation/)** yet, install it first.

```bash
uv init --python 3.12 my-agent && cd my-agent
uv add "vision-agents[getstream,gemini]" python-dotenv
```

Get the keys you'll need:

* Create a free **[Stream account](https://getstream.io/try-for-free/)** for `STREAM_API_KEY` and `STREAM_API_SECRET`.
* Get an API key from **[Google AI Studio](https://aistudio.google.com/)** for `GOOGLE_API_KEY`.

Then create a `.env` file in the project root. Vision Agents auto-loads these for each plugin.

```bash .env
STREAM_API_KEY=your_stream_api_key
STREAM_API_SECRET=your_stream_api_secret
GOOGLE_API_KEY=your_google_api_key
```

Create `main.py`. The agent joins a Stream call and responds via Gemini Realtime.

```python main.py
from dotenv import load_dotenv
from vision_agents.core import Agent, AgentLauncher, User, Runner
from vision_agents.plugins import getstream, gemini

load_dotenv()


async def create_agent(**kwargs) -> Agent:
    return Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Assistant", id="agent"),
        instructions="You're a helpful voice assistant. Be concise.",
        llm=gemini.Realtime(),
    )


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    call = await agent.create_call(call_type, call_id)
    async with agent.join(call):
        await agent.simple_response("Greet the user")
        await agent.finish()


if __name__ == "__main__":
    Runner(AgentLauncher(create_agent=create_agent, join_call=join_call)).cli()
```

Start the agent. The CLI prints a join link; open it to talk to your agent in the browser.

```bash
uv run main.py run
```

The agent greets you as soon as you join the call.

## Next steps

* Custom STT/LLM/TTS pipelines, function calling, provider options
* VLMs, YOLO processors, real-time video analysis
* Docker, Kubernetes, and monitoring
* 25+ AI providers to mix and match
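
## Optional: check your keys

The quickstart depends on three environment variables from the `.env` step. As a minimal sketch, the helper below reports which of them are missing or blank. It is not part of Vision Agents: `missing_keys` and `REQUIRED_KEYS` are hypothetical names used only for illustration, and the helper works on any mapping, so it makes no assumptions about how the values were loaded.

```python
from typing import Mapping

# Variable names taken from the .env step of this quickstart.
REQUIRED_KEYS = ("STREAM_API_KEY", "STREAM_API_SECRET", "GOOGLE_API_KEY")


def missing_keys(env: Mapping[str, str]) -> list[str]:
    """Return the required keys that are absent or blank in `env`.

    Hypothetical helper for illustration; not a Vision Agents API.
    """
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
```

One way to use it would be to call `missing_keys(os.environ)` right after `load_dotenv()` in `main.py` and stop with a clear message if the returned list is not empty.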