
View Example on GitHub

Both outbound_phone_example.py and inbound_phone_and_rag_example.py live in the examples/03_phone_and_rag_example folder.
Build inbound and outbound phone agents with Twilio Media Streams, Stream edge transport, and Gemini. This tutorial covers phone plumbing only — no RAG. For knowledge retrieval on calls, continue to Phone Support Agent (RAG).
Vision Agents requires a Stream account for real-time transport.

What You Will Build

  • Make an outbound call programmatically (e.g. call your cell to test audio)
  • Answer inbound calls on your Twilio number with a voice AI agent
  • Handle bidirectional audio via Twilio Media Streams over WebSocket
  • Bridge phone audio into a Stream call with attach_phone_to_call
For optimal latency, deploy in US-east. Local development adds round-trip latency through ngrok and your machine.
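To make the Media Streams item above concrete: Twilio sends JSON text frames over the WebSocket, and each "media" event carries base64-encoded 8 kHz mu-law audio. The sketch below decodes one such frame to 16-bit PCM using only the standard library. It illustrates the wire format and is not the code inside TwilioMediaStream; the helper names mulaw_byte_to_pcm16 and decode_media_frame are hypothetical.

```python
import base64
import json
import struct
from typing import Optional


def mulaw_byte_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit PCM sample."""
    byte = ~byte & 0xFF                      # mu-law bytes are stored bit-inverted
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_media_frame(raw: str) -> Optional[bytes]:
    """Return little-endian PCM16 bytes for a "media" event, else None."""
    message = json.loads(raw)
    if message.get("event") != "media":
        return None                          # e.g. "connected", "start", "stop"
    mulaw = base64.b64decode(message["media"]["payload"])
    samples = [mulaw_byte_to_pcm16(b) for b in mulaw]
    return struct.pack(f"<{len(samples)}h", *samples)
```

In the example scripts this conversion is handled for you by the Twilio plugin and attach_phone_to_call; the sketch is only meant to show what travels over the socket.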

Prerequisites

Create a .env file at the Vision Agents repo root:
STREAM_API_KEY=
STREAM_API_SECRET=
GOOGLE_API_KEY=
TWILIO_ACCOUNT_SID=
TWILIO_AUTH_TOKEN=
You also need ngrok and a Twilio phone number. See the Twilio integration for webhook and API details.
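A missing credential usually surfaces only once a call is in progress, so a quick pre-flight check can save a debugging round. The sketch below is a hypothetical helper, not part of the example scripts. It assumes the values from .env have already been loaded into the process environment and does not parse the file itself.

```python
import os

# Variable names taken from the .env template above.
REQUIRED_VARS = (
    "STREAM_API_KEY",
    "STREAM_API_SECRET",
    "GOOGLE_API_KEY",
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
)


def missing_env_vars(env=os.environ) -> list:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling missing_env_vars() before starting the server returns an empty list when everything is set, or the names that still need a value.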

Run the example

Clone and install

Clone the repo and install dependencies from the root:
git clone git@github.com:GetStream/Vision-Agents.git
cd Vision-Agents
uv sync

Start ngrok

Expose port 8000 so Twilio can reach your local server:
ngrok http 8000
Copy the HTTPS hostname (without https://) — you’ll use it as NGROK_URL.

Configure your Twilio number

In the Twilio Console:
  1. Go to Phone Numbers → Manage → Active numbers
  2. Select your number (or buy one)
  3. Under Voice Configuration, set "A call comes in" to "Webhook"
  4. Enter https://<NGROK_URL>/twilio/voice with method HTTP POST
See Twilio integration for how the webhook handler works.
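Because NGROK_URL is expected without the scheme while the Twilio Console wants a full URL, it is easy to paste the wrong form into one of them. The following sketch shows one way to normalize either form and derive the webhook URL. The function names are hypothetical and not part of Vision Agents.

```python
from urllib.parse import urlsplit


def ngrok_hostname(value: str) -> str:
    """Reduce an ngrok forwarding URL (or bare host) to just the hostname."""
    value = value.strip()
    if "://" not in value:
        value = "https://" + value   # lets urlsplit treat a bare host as the netloc
    host = urlsplit(value).hostname
    if not host:
        raise ValueError(f"no hostname found in {value!r}")
    return host


def voice_webhook_url(ngrok_url: str) -> str:
    """Build the URL to enter in the Twilio Console for incoming calls."""
    return f"https://{ngrok_hostname(ngrok_url)}/twilio/voice"
```

For example, voice_webhook_url("https://your-subdomain.ngrok-free.app/") and voice_webhook_url("your-subdomain.ngrok-free.app") both yield the same webhook URL.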

Make an outbound call

In a new terminal, from the example directory:
cd examples/03_phone_and_rag_example
NGROK_URL=your-subdomain.ngrok-free.app uv run outbound_phone_example.py \
  --from +15551234567 \
  --to +15557654321
Replace the placeholder numbers with your Twilio number (--from) and a destination you can answer (--to, often your cell). The script starts the HTTP server and initiates the outbound call.
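Twilio expects phone numbers in E.164 format, as in the placeholders above: a plus sign followed by the country code and number, with no spaces or dashes. A small, hypothetical check like the one below (not part of outbound_phone_example.py) can catch a malformed number before a call is attempted.

```python
import re

# E.164: "+" followed by at most 15 digits, the first of which is non-zero.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(number: str) -> bool:
    """Return True if the string looks like an E.164 phone number."""
    return E164_PATTERN.fullmatch(number) is not None
```

The check is purely syntactic: it does not confirm that the number exists or that your Twilio account is allowed to call it.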

Run the inbound agent

With ngrok still running and your Twilio webhook configured, start the inbound server:
cd examples/03_phone_and_rag_example
NGROK_URL=your-subdomain.ngrok-free.app uv run inbound_phone_and_rag_example.py
RAG is optional at this stage — the agent runs with Gemini even without extra RAG configuration.

Call your number

Dial your Twilio number from any phone. You should hear the AI agent answer and respond in real time.

How it works

Twilio uses TwiML to control calls. The voice webhook returns a <Connect><Stream> response that pipes audio to your WebSocket:
  1. POST /twilio/voice — validates the Twilio signature, registers the call in TwilioCallRegistry, returns TwiML with a media stream URL
  2. WS /twilio/media/{call_id}/{token} — accepts the WebSocket, runs TwilioMediaStream, validates the token
  3. attach_phone_to_call — bridges Twilio mulaw audio ↔ the Stream call where your agent runs
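The first two steps can be sketched with the standard library alone. The block below builds a <Connect><Stream> response whose URL follows the /twilio/media/{call_id}/{token} pattern, and compares a presented token in constant time. It is a minimal illustration under assumptions: the UUID call ID, the random token, and both function names are hypothetical, and the real handler and TwilioCallRegistry may work differently.

```python
import hmac
import secrets
import uuid
import xml.etree.ElementTree as ET


def build_stream_twiml(host: str, call_id: str, token: str) -> str:
    """Return TwiML that connects the call audio to a WebSocket media stream."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=f"wss://{host}/twilio/media/{call_id}/{token}")
    return ET.tostring(response, encoding="unicode")


def token_matches(expected: str, presented: str) -> bool:
    """Constant-time comparison for the token in the WebSocket path."""
    return hmac.compare_digest(expected.encode(), presented.encode())


call_id = str(uuid.uuid4())          # hypothetical ID scheme
token = secrets.token_urlsafe(16)    # hypothetical per-call token
print(build_stream_twiml("your-subdomain.ngrok-free.app", call_id, token))
```

Putting a per-call token in the media URL means a WebSocket client that did not receive the TwiML cannot attach to the call just by guessing a call ID.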
The inbound script uses ProxyHeadersMiddleware so signature validation works when ngrok terminates HTTPS.
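The reason the proxy matters is that Twilio signs the exact public URL it requested. For a form POST, the signature is the base64-encoded HMAC-SHA1 of that URL followed by the sorted parameter names and values, keyed with your auth token. Behind ngrok the app sees a plain http request unless it trusts the forwarded scheme, so the URL it reconstructs would differ from the one Twilio signed. The sketch below shows the computation and the effect of the scheme. The function names are hypothetical, and in practice the Twilio helper library's RequestValidator is the usual way to perform this check.

```python
import base64
import hashlib
import hmac
from typing import Dict, Optional


def twilio_signature(auth_token: str, url: str, params: Dict[str, str]) -> str:
    """Compute the value Twilio sends in X-Twilio-Signature for a form POST."""
    payload = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(auth_token.encode(), payload.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def public_url(scheme: str, host: str, path: str, forwarded_proto: Optional[str]) -> str:
    """Rebuild the URL Twilio called, preferring the proxy's forwarded scheme."""
    return f"{forwarded_proto or scheme}://{host}{path}"


def is_valid(auth_token: str, url: str, params: Dict[str, str], header: str) -> bool:
    """Compare the expected signature with the header in constant time."""
    expected = twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected.encode(), header.encode())
```

With the forwarded scheme applied, public_url("http", host, "/twilio/voice", "https") matches what Twilio signed. Without it, the same request fails validation even though the auth token is correct.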

Next Steps

  • Phone Support Agent (RAG): Add Gemini FileSearch or TurboPuffer knowledge retrieval
  • Twilio Integration: Plugin API reference and components
  • Phone Calling: Provider overview and learning path
  • Telnyx: Alternative telephony provider