
Audio and Video Processors

Processors extend the agent's capabilities by analysing and transforming audio/video streams in real time. They have access to audio and video, and can provide state to the LLM. Examples of what you can support with processors:
  • API calls or state: often you need additional state, such as the score or stats of a video game or sports match.
  • Video analysis: pose detection, object recognition, etc. Share the output with the realtime LLM.
  • Video/image capture: easily support AI-driven video or image capture.
  • Video/audio transforms: video avatars, video effects, etc.
A few examples make this concrete.
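As a sketch of the shape a processor takes (the class and hook names here are assumptions, not the SDK's exact base class; only the process_audio signature mirrors the API shown below), a minimal logging processor might look like:

```python
import asyncio


class LoggingProcessor:
    """Hypothetical minimal processor that logs incoming audio chunks.

    The real SDK provides a base class to subclass; this sketch only
    mirrors the process_audio signature documented below.
    """

    def __init__(self):
        self.chunks_seen = 0

    async def process_audio(self, audio_data: bytes, participant) -> None:
        # Count and log each chunk; a real processor could also update
        # shared state that is surfaced to the LLM.
        self.chunks_seen += 1
        print(f"chunk {self.chunks_seen}: {len(audio_data)} bytes from {participant}")


async def main():
    proc = LoggingProcessor()
    for _ in range(3):
        await proc.process_audio(b"\x00\x01" * 160, participant="alice")
    return proc.chunks_seen


chunks = asyncio.run(main())
```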

Simple Examples

  • Simple logging
  • Thumbnail
  • Green hue effect

Advanced examples

YoloPose: this processor implements YOLO pose detection. The detected pose is passed to the AI.

Processor API

Audio
# process incoming audio
async def process_audio(
    self, audio_data: bytes, participant: models_pb2.Participant
) -> None:
    """Process audio data. Override this method to implement audio processing."""
    pass
    
# add outgoing audio
def create_audio_track(self):
    return aiortc.AudioStreamTrack()
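For instance, here is a hedged sketch of a process_audio override that derives an RMS loudness level from the incoming bytes (the assumption that audio_data is little-endian 16-bit PCM is ours; check what your tracks actually deliver):

```python
import array
import asyncio
import math


class LevelMeter:
    """Sketch: compute a loudness level inside process_audio.

    Assumes audio_data is little-endian 16-bit PCM; the participant
    argument is accepted but unused here.
    """

    def __init__(self):
        self.last_rms = 0.0

    async def process_audio(self, audio_data: bytes, participant) -> None:
        samples = array.array("h")  # signed 16-bit samples
        samples.frombytes(audio_data)
        if samples:
            self.last_rms = math.sqrt(sum(s * s for s in samples) / len(samples))


meter = LevelMeter()
# RMS of the two samples [1000, -1000] is exactly 1000.
pcm = array.array("h", [1000, -1000]).tobytes()
asyncio.run(meter.process_audio(pcm, participant=None))
```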
Images
async def process_image(
    self, image: Image.Image, participant: models_pb2.Participant
):
    pass
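A sketch of a process_image override that keeps a thumbnail of each frame, using Pillow's in-place Image.thumbnail (the thumbnail size is an arbitrary choice for illustration, not part of the API):

```python
import asyncio

from PIL import Image


class ThumbnailProcessor:
    """Sketch: shrink each incoming frame to a thumbnail."""

    def __init__(self, size=(64, 64)):
        self.size = size
        self.last_thumbnail = None

    async def process_image(self, image: Image.Image, participant) -> None:
        thumb = image.copy()
        thumb.thumbnail(self.size)  # resizes in place, preserving aspect ratio
        self.last_thumbnail = thumb


proc = ThumbnailProcessor()
# A 640x480 frame fits into a 64x64 box as 64x48.
asyncio.run(proc.process_image(Image.new("RGB", (640, 480)), participant=None))
```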
Video
# process incoming video
async def process_video(
    self,
    track: aiortc.mediastreams.MediaStreamTrack,
    participant: models_pb2.Participant,
):
    pass

# outgoing video
def create_video_track(self):
    return aiortc.VideoStreamTrack()
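Pulling this together for incoming video, here is a sketch of a process_video override that consumes frames off the track. A stand-in track class replaces aiortc so the sketch is self-contained; the real MediaStreamTrack exposes the same async recv(), though it signals end-of-stream with its own exception rather than EOFError:

```python
import asyncio


class FakeTrack:
    """Stand-in for aiortc's MediaStreamTrack: recv() yields frames."""

    def __init__(self, frames):
        self._frames = list(frames)

    async def recv(self):
        if not self._frames:
            raise EOFError("track ended")
        return self._frames.pop(0)


class FrameCounter:
    """Sketch: read frames from the track inside process_video."""

    def __init__(self):
        self.frames_seen = 0

    async def process_video(self, track, participant) -> None:
        while True:
            try:
                await track.recv()
            except EOFError:
                break
            self.frames_seen += 1


proc = FrameCounter()
asyncio.run(proc.process_video(FakeTrack(["f1", "f2", "f3"]), participant=None))
```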