Twilio is a programmable telephony provider. The twilio plugin bridges PSTN phone calls into a Vision Agents + Stream call through Twilio voice webhooks, TwiML Media Streams, and bidirectional WebSocket audio.
Vision Agents requires a Stream account for real-time transport.

What the plugin provides

  • Twilio Media Streams over WebSocket with bidirectional audio
  • TwilioCallRegistry for call/session/token tracking
  • attach_phone_to_call to bridge Twilio mulaw audio ↔ Stream WebRTC
  • Built-in FastAPI helpers: verify_twilio_signature, CallWebhookInput, TwiML response builders
  • Automatic mulaw/PCM conversion at 8 kHz

Prerequisites

| Requirement | Notes |
| --- | --- |
| Twilio account | TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN |
| Twilio phone number (E.164) | Used as caller ID and inbound number |
| Webhook public URL | ngrok for local dev → https://<NGROK_URL>/twilio/voice |
| Stream credentials | STREAM_API_KEY, STREAM_API_SECRET |

Installation

uv add "vision-agents[twilio]"
Or install the plugin package directly:
uv add vision-agents-plugins-twilio

Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| TWILIO_ACCOUNT_SID | Yes | Twilio account SID |
| TWILIO_AUTH_TOKEN | Yes | Auth token for REST API and webhook signature verification |
| NGROK_URL | Local dev | Public hostname without https:// |
| STREAM_API_KEY / STREAM_API_SECRET | Yes | Stream edge transport |
| GOOGLE_API_KEY | Examples | Required by the phone agent examples (Gemini) |

Twilio account setup

  1. Buy or assign a Twilio phone number.
  2. Under Phone Numbers → Manage → Active numbers, open your number.
  3. Set "A call comes in" to "Webhook", pointing at:
    https://<NGROK_URL>/twilio/voice
  4. Set the request method to HTTP POST.
For a full walkthrough with copy-paste commands, see the Twilio Phone Agent example.
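
The console steps above can also be scripted. The following is a minimal sketch, assuming the official `twilio` Python SDK is installed and the environment variables from the table above are set. The helper names `voice_webhook_url` and `point_number_at_webhook` are hypothetical and are not part of the plugin.

```python
import os


def voice_webhook_url(ngrok_host: str) -> str:
    """Build the public voice webhook URL from a bare hostname."""
    # NGROK_URL is documented as a hostname without a scheme;
    # tolerate a pasted scheme or trailing slash anyway.
    host = ngrok_host.strip()
    host = host.removeprefix("https://").removeprefix("http://").rstrip("/")
    return f"https://{host}/twilio/voice"


def point_number_at_webhook(phone_number: str, ngrok_host: str) -> str:
    """Set the number's 'A call comes in' webhook via the Twilio REST API."""
    # Imported lazily so the URL helper above works without the SDK installed.
    from twilio.rest import Client

    client = Client(
        os.environ["TWILIO_ACCOUNT_SID"], os.environ["TWILIO_AUTH_TOKEN"]
    )
    # Look up the number (E.164 format) among the account's active numbers.
    matches = client.incoming_phone_numbers.list(phone_number=phone_number, limit=1)
    if not matches:
        raise LookupError(f"{phone_number} is not an active number on this account")

    url = voice_webhook_url(ngrok_host)
    client.incoming_phone_numbers(matches[0].sid).update(
        voice_url=url, voice_method="POST"
    )
    return url
```

Calling `point_number_at_webhook("+15550100", os.environ["NGROK_URL"])` after each ngrok restart keeps the number pointed at the current tunnel; the phone number shown here is a placeholder.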

Quick Start

The plugin gives you registry, media stream, and bridge primitives. Your FastAPI server wires them to Twilio webhooks and WebSockets:
from vision_agents.plugins import twilio

registry = twilio.TwilioCallRegistry()

# 1. In the voice webhook handler: register the call
#    (call_id and data come from the incoming request)
call = registry.create(
    call_id,
    form_data=data.model_dump(by_alias=True),
    prepare=prepare_call,  # optional: pre-warm agent + Stream call
)

# 2. Still in the webhook handler: return TwiML that starts a media stream
url = f"wss://{NGROK_URL}/twilio/media/{call_id}/{call.token}"
return twilio.create_media_stream_response(url)

# 3. In the WebSocket handler: accept the stream and bridge it to the Stream call
stream = twilio.TwilioMediaStream(websocket)
await stream.accept()

agent, phone_user, stream_call = await call.await_prepare()
await twilio.attach_phone_to_call(stream_call, stream, phone_user.id)
await stream.run()
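
The media URL above carries a per-call token, so the WebSocket handler can reject connections that do not belong to a registered call. To illustrate the idea only, here is a minimal sketch of a token-checking registry. `SketchCall` and `SketchRegistry` are hypothetical stand-ins, not the plugin's TwilioCallRegistry, whose internals and error behavior may differ.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SketchCall:
    call_id: str
    # Random, URL-safe token that is embedded in the media WebSocket path.
    token: str = field(default_factory=lambda: secrets.token_urlsafe(32))


class SketchRegistry:
    """Hypothetical illustration of call/token tracking."""

    def __init__(self) -> None:
        self._calls: Dict[str, SketchCall] = {}

    def create(self, call_id: str) -> SketchCall:
        call = SketchCall(call_id)
        self._calls[call_id] = call
        return call

    def validate(self, call_id: str, token: str) -> SketchCall:
        call = self._calls.get(call_id)
        # compare_digest runs in constant time, so the comparison does not
        # leak how many leading characters of the token were correct.
        if call is None or not secrets.compare_digest(
            call.token.encode(), token.encode()
        ):
            raise PermissionError("unknown call or invalid token")
        return call
```

The design point is that the call ID alone is not a secret (it appears in URLs and logs), while the token is only known to the party that received the TwiML response.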

Inbound calls

Twilio sends a webhook when someone calls your number. Validate the signature, register the call, and return TwiML to start the media stream:
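import uuid

from fastapi import Depends, FastAPI, WebSocket

from vision_agents.plugins import twilio

app = FastAPI()
registry = twilio.TwilioCallRegistry()
# NGROK_URL and prepare_call are defined elsewhere in your server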

@app.post("/twilio/voice")
async def voice_webhook(
    _: None = Depends(twilio.verify_twilio_signature),
    data: twilio.CallWebhookInput = Depends(twilio.CallWebhookInput.as_form),
):
    call_id = str(uuid.uuid4())
    twilio_call = registry.create(
        call_id,
        form_data=data.model_dump(by_alias=True),
        prepare=lambda: prepare_call(call_id),
    )
    url = f"wss://{NGROK_URL}/twilio/media/{call_id}/{twilio_call.token}"
    return twilio.create_media_stream_response(url)


@app.websocket("/twilio/media/{call_id}/{token}")
async def media_stream(websocket: WebSocket, call_id: str, token: str):
    twilio_call = registry.validate(call_id, token)
    stream = twilio.TwilioMediaStream(websocket)
    await stream.accept()
    # attach to agent and run stream (see Quick Start)
When running behind ngrok, add ProxyHeadersMiddleware so Twilio signature validation sees the public HTTPS URL. See the phone agent example source.
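
To see why the URL matters, it helps to look at how Twilio signs form-encoded webhooks: the full request URL is concatenated with every POST field (sorted by name, as name followed by value), signed with HMAC-SHA1 using the auth token, and Base64-encoded into the X-Twilio-Signature header. The sketch below follows that published scheme using only the standard library. The function names are hypothetical, and this is not the source of the plugin's verify_twilio_signature.

```python
import base64
import hashlib
import hmac
from typing import Mapping


def expected_twilio_signature(
    auth_token: str, url: str, params: Mapping[str, str]
) -> str:
    """Compute the signature Twilio would send for a form-encoded POST."""
    # Full URL, then each field as name+value, ordered by field name.
    payload = url + "".join(f"{key}{params[key]}" for key in sorted(params))
    digest = hmac.new(auth_token.encode(), payload.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def signature_matches(
    auth_token: str, url: str, params: Mapping[str, str], header_value: str
) -> bool:
    """Constant-time comparison against the X-Twilio-Signature header."""
    expected = expected_twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected.encode(), header_value.encode())


# Placeholder values for illustration only.
token = "example-auth-token"
form = {"CallSid": "CA123", "From": "+15550100", "To": "+15550199"}
public_url = "https://example.ngrok.app/twilio/voice"
local_url = "http://127.0.0.1:8000/twilio/voice"

sent_by_twilio = expected_twilio_signature(token, public_url, form)
print(signature_matches(token, public_url, form, sent_by_twilio))
print(signature_matches(token, local_url, form, sent_by_twilio))
```

The first check succeeds and the second fails: if the server reconstructs the request URL as the local http:// address instead of the public https:// one, the signature cannot match, which is the situation proxy-header handling is meant to avoid.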

Outbound calls

Pre-register the call in the registry, start your server, then dial via the Twilio REST API:
import uuid

from twilio.rest import Client

from vision_agents.plugins import twilio

call_id = str(uuid.uuid4())
twilio_call = registry.create(call_id, prepare=lambda: prepare_call(call_id))
url = f"wss://{NGROK_URL}/twilio/media/{call_id}/{twilio_call.token}"

client = Client(TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN)
client.calls.create(
    twiml=twilio.create_media_stream_twiml(url),
    to=to_number,
    from_=from_number,
)
The WebSocket media handler is the same as inbound.
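
For orientation, bidirectional Media Streams are started with a `<Connect><Stream>` TwiML document. The sketch below builds that shape with the standard library so it can be inspected without any SDK. `build_connect_stream_twiml` is a hypothetical name, and the output of the plugin's create_media_stream_twiml may include additional attributes or elements.

```python
import xml.etree.ElementTree as ET


def build_connect_stream_twiml(ws_url: str) -> str:
    """Return TwiML that connects the call to a bidirectional media stream."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # The url attribute must be a wss:// endpoint reachable by Twilio.
    ET.SubElement(connect, "Stream", {"url": ws_url})
    return ET.tostring(response, encoding="unicode")


# Placeholder URL in the same format as the examples above.
twiml = build_connect_stream_twiml(
    "wss://example.ngrok.app/twilio/media/call-1/token-1"
)
print(twiml)
```

Building the document with an XML library rather than string formatting means special characters in the URL are escaped correctly.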

Key Components

| Component | Description |
| --- | --- |
| TwilioCallRegistry | Tracks active calls, tokens, optional async prepare tasks |
| TwilioCall | Call session with from_number, to_number, await_prepare() |
| TwilioMediaStream | WebSocket media handler; exposes audio_track, send_audio(), run() |
| attach_phone_to_call | Bridges Twilio mulaw audio ↔ Stream call participant |
| verify_twilio_signature | FastAPI dependency for webhook authentication |
| CallWebhookInput | Typed model for Twilio voice webhook form data |
| create_media_stream_response / create_media_stream_twiml | TwiML helpers for bidirectional streaming |
| Audio helpers | mulaw_to_pcm, pcm_to_mulaw, TWILIO_SAMPLE_RATE |
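
TwilioMediaStream handles the Media Streams wire format for you. As background, each WebSocket message is a JSON object with an `event` field, and audio travels as Base64-encoded mulaw inside `media` events. The sketch below shows that framing with the standard library; both function names are hypothetical and are not part of the plugin.

```python
import base64
import json
from typing import Optional


def extract_media_payload(message: str) -> Optional[bytes]:
    """Return raw mulaw bytes from a 'media' event, or None for other events."""
    event = json.loads(message)
    # Other events (for example 'connected', 'start', 'stop') carry no audio.
    if event.get("event") != "media":
        return None
    return base64.b64decode(event["media"]["payload"])


def build_outbound_media(stream_sid: str, mulaw_audio: bytes) -> str:
    """Wrap mulaw bytes in the message shape used to send audio to the caller."""
    return json.dumps(
        {
            "event": "media",
            "streamSid": stream_sid,
            "media": {"payload": base64.b64encode(mulaw_audio).decode("ascii")},
        }
    )
```

The `streamSid` value is provided by Twilio in the `start` event and must be echoed back on outbound messages.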

Audio

Twilio Media Streams use mulaw encoding at 8 kHz. The plugin converts between mulaw and PCM automatically in TwilioMediaStream and exposes conversion helpers if you need them directly.
| Constant | Default | Description |
| --- | --- | --- |
| TWILIO_SAMPLE_RATE | 8000 | Twilio media stream sample rate |
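
To make the conversion concrete, here is a standalone sketch of G.711 mu-law encoding and decoding for 16-bit PCM. It is an illustration of the codec only: the function names are hypothetical, and the plugin's mulaw_to_pcm and pcm_to_mulaw helpers may use different signatures or an optimized implementation.

```python
import struct

BIAS = 0x84   # 132, added to the magnitude before the segment search
CLIP = 32635  # largest magnitude that still fits in 15 bits after adding BIAS


def decode_sample(code: int) -> int:
    """Decode one mu-law byte to a signed 16-bit PCM sample."""
    code = ~code & 0xFF  # mu-law bytes are stored bit-inverted
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if code & 0x80 else magnitude


def encode_sample(sample: int) -> int:
    """Encode one signed 16-bit PCM sample to a mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment: exponent e corresponds to the highest set bit e + 7.
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def mulaw_frame_to_pcm16(frame: bytes) -> bytes:
    """Convert mu-law bytes to little-endian 16-bit PCM (2 bytes per sample)."""
    return struct.pack(f"<{len(frame)}h", *(decode_sample(b) for b in frame))


def pcm16_to_mulaw_frame(pcm: bytes) -> bytes:
    """Convert little-endian 16-bit PCM (even length) to mu-law bytes."""
    samples = struct.unpack(f"<{len(pcm) // 2}h", pcm)
    return bytes(encode_sample(s) for s in samples)


# 20 ms of audio at 8 kHz is 160 samples: 160 mu-law bytes, 320 PCM bytes.
silence = bytes([0xFF]) * 160
pcm = mulaw_frame_to_pcm16(silence)
```

Mu-law is lossy for arbitrary PCM input, but decoding and re-encoding a mu-law byte returns the same byte, apart from the "negative zero" code 0x7F, which decodes to 0 and re-encodes as 0xFF.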

Common setup errors

| Error | Fix |
| --- | --- |
| Webhook URL mismatch | Update the Twilio number webhook to your current ngrok URL |
| Invalid Twilio signature | Ensure TWILIO_AUTH_TOKEN is set; add ProxyHeadersMiddleware behind ngrok |
| No audio on call | Confirm the media WebSocket URL uses wss:// and the server is running before dialing |
| Outbound call fails | Verify --from is a Twilio number on your account |
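
Several of these errors come down to missing or malformed configuration. A small preflight check run at server startup can surface them early. This is a sketch under the assumption that the variables listed under Environment Variables are the ones your server reads; `preflight` is a hypothetical helper, not part of the plugin.

```python
import os
from typing import List, Mapping

REQUIRED = (
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "STREAM_API_KEY",
    "STREAM_API_SECRET",
)


def preflight(env: Mapping[str, str]) -> List[str]:
    """Return a list of human-readable configuration problems."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]

    ngrok = env.get("NGROK_URL", "")
    # NGROK_URL is interpolated into https:// and wss:// URLs,
    # so it must be a bare hostname.
    if "://" in ngrok:
        problems.append("NGROK_URL should be a hostname without a scheme")
    if ngrok.endswith("/"):
        problems.append("NGROK_URL should not end with a slash")
    return problems
```

A typical use is `for problem in preflight(os.environ): print(problem)` before starting the server, exiting if the list is non-empty.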

Next Steps

  • Twilio Phone Agent: Step-by-step inbound and outbound phone tutorial
  • Phone Support Agent (RAG): Add knowledge retrieval to phone calls
  • Phone Calling: Provider overview and learning path
  • Telnyx: Alternative telephony provider
  • Stream Video RTC: Default edge transport for agent calls
  • Build a Voice Agent: Get started with voice