# Multiple Speakers

Handle calls where multiple human participants talk to the same agent. The framework routes audio per-participant and gates who the agent listens to at any given moment.

## How It Works

When several participants join a call, the agent maintains a separate audio queue for each one. A **multi-speaker filter** decides whose audio actually reaches the pipeline:

1. Each participant gets their own audio queue.
2. A `FirstSpeakerWinsFilter` (enabled by default) uses Silero VAD to detect speech.
3. The first participant whose speech exceeds the VAD threshold acquires a **lock**, and only their audio passes through.
4. Everyone else's audio is dropped until the lock is released.
5. The lock releases when the active speaker's turn ends (signaled by STT or turn detection) or when they stay silent longer than the configured timeout.

```
Participant A audio ─┐
                     ├──→ Multi-speaker filter ──→ STT → LLM → TTS
Participant B audio ─┘     (first speaker wins)
```

<Tip>
  The filter only activates when two or more participants are on the call. Single-speaker calls bypass it entirely with no overhead.
</Tip>
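
The routing described above can be pictured with a small toy model. This is a sketch under stated assumptions, not the framework's internal code: `ToyRouter`, `Gate`, and the use of plain `bytes` frames and string participant ids are all hypothetical stand-ins. It shows one queue per participant and a gate that is consulted only when two or more participants are present.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Tuple

Frame = bytes  # hypothetical stand-in for a chunk of PCM audio
Gate = Callable[[str, Frame], Optional[Frame]]  # returns None to drop a frame


class ToyRouter:
    """Illustrative only: per-participant queues feeding one pipeline."""

    def __init__(self, gate: Gate) -> None:
        self.gate = gate
        self.queues: Dict[str, Deque[Frame]] = {}
        self.pipeline: List[Tuple[str, Frame]] = []

    def join(self, participant_id: str) -> None:
        # Each participant gets their own queue.
        self.queues[participant_id] = deque()

    def push(self, participant_id: str, frame: Frame) -> None:
        self.queues[participant_id].append(frame)

    def drain(self) -> None:
        # With a single participant the gate is skipped entirely.
        use_gate = len(self.queues) >= 2
        for participant_id, queue in self.queues.items():
            while queue:
                frame = queue.popleft()
                out = self.gate(participant_id, frame) if use_gate else frame
                if out is not None:
                    self.pipeline.append((participant_id, out))
```

In this model the gate is any callable, so a first-speaker-wins policy or a custom one can be swapped in without touching the routing.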

## Configuration

Pass a `multi_speaker_filter` to the `Agent` constructor to customize the behavior:

```python
from vision_agents.core import Agent, User
from vision_agents.core.utils.audio_filter import FirstSpeakerWinsFilter
from vision_agents.plugins import deepgram, elevenlabs, gemini, getstream

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Assistant", id="agent"),
    instructions="You are a helpful voice assistant.",
    llm=gemini.LLM("gemini-flash-lite-latest"),
    tts=elevenlabs.TTS(),
    stt=deepgram.STT(),
    multi_speaker_filter=FirstSpeakerWinsFilter(
        speech_threshold=0.5,
        silence_release_ms=1500.0,
    ),
)
```

<Note>
  If you omit `multi_speaker_filter` (or pass `None`), the agent uses `FirstSpeakerWinsFilter()` with the parameter values shown above.
</Note>

## FirstSpeakerWinsFilter Parameters

| Parameter            | Type    | Default  | Description                                                               |
| -------------------- | ------- | -------- | ------------------------------------------------------------------------- |
| `speech_threshold`   | `float` | `0.5`    | Silero VAD score (0.0–1.0) a participant must exceed to acquire the lock  |
| `silence_release_ms` | `float` | `1500.0` | Milliseconds of silence from the active speaker before releasing the lock |

**Lock lifecycle:**

1. **No lock held** — all audio passes through. The first participant whose VAD score exceeds `speech_threshold` acquires the lock.
2. **Lock held** — only the locked speaker's audio reaches the pipeline. Other participants' audio is dropped without running VAD (no extra cost).
3. **Silence timeout** — if the active speaker goes silent for `silence_release_ms`, the lock is released.
4. **Turn end** — a `TurnEnded` signal releases the lock unconditionally.
5. **Participant disconnects** — the lock is cleared immediately if that participant held it.
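
The lifecycle above can be modeled as a small state machine. The sketch below is a simplified illustration, not the library's implementation: `ToyFirstSpeakerLock`, its `offer` and `release` methods, and the 20 ms frame duration are assumptions made for the example. VAD scores are supplied by the caller here, whereas the real filter computes them with Silero VAD and skips VAD for non-holders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyFirstSpeakerLock:
    """Toy model of the lock lifecycle; names and frame size are assumed."""

    speech_threshold: float = 0.5
    silence_release_ms: float = 1500.0
    holder: Optional[str] = None
    silent_ms: float = 0.0

    def offer(self, participant_id: str, vad_score: float,
              frame_ms: float = 20.0) -> bool:
        """Return True if this frame would reach the pipeline."""
        speaking = vad_score > self.speech_threshold
        if self.holder is None:
            # No lock held: audio passes, and a speaker may acquire the lock.
            if speaking:
                self.holder = participant_id
                self.silent_ms = 0.0
            return True
        if participant_id != self.holder:
            # Lock held by someone else: drop the frame.
            return False
        if speaking:
            self.silent_ms = 0.0
        else:
            # Accumulate the holder's silence and release on timeout.
            self.silent_ms += frame_ms
            if self.silent_ms >= self.silence_release_ms:
                self.release()
        return True

    def release(self, participant_id: Optional[str] = None) -> None:
        """Turn end (no argument) or disconnect of a specific participant."""
        if participant_id is None or participant_id == self.holder:
            self.holder = None
            self.silent_ms = 0.0
```

With the default values and 20 ms frames, the holder would need 75 consecutive silent frames before the lock is released by the silence timeout.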

## Building a Custom AudioFilter

Replace the default filter with your own by implementing the `AudioFilter` interface:

```python
from typing import Optional

from getstream.video.rtc import PcmData

from vision_agents.core.edge.types import Participant
from vision_agents.core.utils.audio_filter import AudioFilter


class MyCustomFilter(AudioFilter):
    async def process_audio(
        self, pcm: PcmData, participant: Participant
    ) -> Optional[PcmData]:
        """Return PcmData to pass the audio through, or None to drop it."""
        # Your logic here
        return pcm

    def clear(self, participant: Optional[Participant] = None) -> None:
        """Called on turn end or participant disconnect.

        If participant is provided, only clear state for that participant.
        If None, clear all state unconditionally.
        """
        pass
```

Then pass it to the agent:

```python
agent = Agent(
    ...,
    multi_speaker_filter=MyCustomFilter(),
)
```
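
As one possible custom policy, the sketch below passes audio only from an allowlist of users, for example a host in a moderated call. To keep it runnable without the framework installed, it uses hypothetical stand-ins: `FakeParticipant` replaces `Participant` (the `user_id` field is an assumption), plain `bytes` replace `PcmData`, and the class does not inherit from `AudioFilter`. In a real agent you would subclass `AudioFilter` and use the real types as in the interface shown earlier.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class FakeParticipant:
    """Stand-in for Participant; the `user_id` field is an assumption."""

    user_id: str


class AllowlistFilter:  # in a real agent: class AllowlistFilter(AudioFilter)
    """Pass audio only from allowlisted users; drop everything else."""

    def __init__(self, allowed_user_ids: Set[str]) -> None:
        self._allowed = set(allowed_user_ids)
        self.dropped = 0  # frames dropped since the last full clear

    async def process_audio(
        self, pcm: bytes, participant: FakeParticipant
    ) -> Optional[bytes]:
        if participant.user_id in self._allowed:
            return pcm
        self.dropped += 1
        return None

    def clear(self, participant: Optional[FakeParticipant] = None) -> None:
        # No per-participant state is kept, so only a full clear resets
        # the drop counter.
        if participant is None:
            self.dropped = 0


async def demo() -> None:
    audio_filter = AllowlistFilter({"host"})
    passed = await audio_filter.process_audio(b"\x00\x01", FakeParticipant("host"))
    blocked = await audio_filter.process_audio(b"\x00\x01", FakeParticipant("guest"))
    print(passed is not None, blocked is None, audio_filter.dropped)


if __name__ == "__main__":
    asyncio.run(demo())
```

Unlike the default filter, this policy never hands the floor to a non-allowlisted participant, so it suits calls with a fixed set of speakers rather than open conversations.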

## Best Practices

**Tune thresholds for your environment** — Lower `speech_threshold` for quiet speakers; raise it to reject background noise. Adjust `silence_release_ms` based on expected pause lengths in your use case.
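
When tuning, it can help to translate the two parameters into concrete frame counts. The helpers below are a hypothetical sketch, not part of the library: `frames_until_release` and `count_speech_frames` are illustrative names, and the frame duration depends on your audio setup (20 ms is assumed in the comments).

```python
import math
from typing import Iterable


def frames_until_release(silence_release_ms: float, frame_ms: float) -> int:
    """Consecutive silent frames needed before the silence timeout is hit."""
    return math.ceil(silence_release_ms / frame_ms)


def count_speech_frames(scores: Iterable[float], speech_threshold: float) -> int:
    """Count frames whose VAD score exceeds the given threshold."""
    return sum(1 for score in scores if score > speech_threshold)


# Assuming 20 ms frames, the default 1500 ms timeout spans 75 frames.
default_frames = frames_until_release(1500.0, 20.0)

# Replaying recorded scores shows how a lower threshold admits quieter speech.
sample_scores = [0.2, 0.45, 0.6, 0.9]
at_default = count_speech_frames(sample_scores, 0.5)
at_lower = count_speech_frames(sample_scores, 0.4)
```

Replaying scores captured from your own environment against a few candidate thresholds gives a rough sense of how often background noise would acquire the lock.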

**Combine with turn detection** — The multi-speaker filter gates *which* speaker's audio reaches the pipeline. [Turn detection](/ai-technologies/turn-detection) determines *when* the speaker has finished. They work together automatically.

## Next Steps

<CardGroup cols={2}>
  <Card title="Interruption Handling" icon="hand" href="/guides/interruption-handling">
    Handle user interruptions
  </Card>

  <Card title="Turn Detection" icon="wave-pulse" href="/ai-technologies/turn-detection">
    VAD and turn detection concepts
  </Card>
</CardGroup>