Processors extend the agent’s capabilities by analyzing and transforming audio/video streams in real time. They can access incoming media, provide state to the LLM, and publish transformed streams back to the call. Common use cases:
  • State injection — Feed external data (game scores, stats, API responses) to the LLM
  • Video analysis — Pose detection, object recognition, scene understanding
  • Media transformation — Video effects, avatars, filters

Base Classes

All processors inherit from the abstract Processor base class and implement specific capabilities:
Class                    | Purpose
Processor                | Abstract base class with name, close(), and attach_agent()
AudioProcessor           | Implement process_audio() to receive audio streams
VideoProcessor           | Implement process_video() to receive video tracks
AudioPublisher           | Implement publish_audio_track() to output audio
VideoPublisher           | Implement publish_video_track() to output video
AudioProcessorPublisher  | Combines AudioProcessor + AudioPublisher
VideoProcessorPublisher  | Combines VideoProcessor + VideoPublisher

Abstract Methods

Base Processor

All processors must implement name and close(); attach_agent() is an optional hook:
from vision_agents.core.processors import Processor

class MyProcessor(Processor):
    @property
    def name(self) -> str:
        """Processor name identifier."""
        return "my_processor"

    async def close(self) -> None:
        """Clean up resources when the application exits."""
        pass

    def attach_agent(self, agent: "Agent") -> None:
        """Optional: Perform actions with the Agent (e.g., register custom events)."""
        pass
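
attach_agent() runs when the processor is attached to an agent, which makes it the natural place to keep a handle on the running Agent for later callbacks. A minimal sketch (ScoreProcessor is illustrative; what you do with the handle depends on your processor):

class ScoreProcessor(Processor):
    @property
    def name(self) -> str:
        return "score_processor"

    def attach_agent(self, agent: "Agent") -> None:
        # Keep the agent handle so later callbacks can reach it, e.g. to
        # register custom events (see the Agent API for available hooks).
        self._agent = agent

    async def close(self) -> None:
        pass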

Audio Processing

from getstream.video.rtc import PcmData
from vision_agents.core.processors import AudioProcessor

class MyAudioProcessor(AudioProcessor):
    @property
    def name(self) -> str:
        return "my_audio_processor"

    async def process_audio(self, audio_data: PcmData) -> None:
        """Process incoming audio. Participant info is in audio_data.participant."""
        pass

    async def close(self) -> None:
        pass
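
As a concrete illustration, the sketch below logs a per-chunk RMS level. LevelMeterProcessor is illustrative, and it assumes PcmData exposes the raw PCM samples as a NumPy-compatible array via a samples attribute; check the getstream SDK for the exact field names:

import numpy as np

from getstream.video.rtc import PcmData
from vision_agents.core.processors import AudioProcessor

class LevelMeterProcessor(AudioProcessor):
    @property
    def name(self) -> str:
        return "level_meter"

    async def process_audio(self, audio_data: PcmData) -> None:
        # Assumption: audio_data.samples is an int16 PCM array.
        samples = np.asarray(audio_data.samples, dtype=np.float32)
        if samples.size:
            rms = float(np.sqrt(np.mean(samples ** 2)))
            print(f"{self.name}: rms={rms:.1f}")

    async def close(self) -> None:
        pass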

Video Processing

import aiortc
from typing import Optional
from vision_agents.core.processors import VideoProcessor
from vision_agents.core.utils.video_forwarder import VideoForwarder

class MyVideoProcessor(VideoProcessor):
    @property
    def name(self) -> str:
        return "my_video_processor"

    async def process_video(
        self,
        track: aiortc.VideoStreamTrack,
        participant_id: Optional[str],
        shared_forwarder: Optional[VideoForwarder] = None,
    ) -> None:
        """Process incoming video track."""
        pass

    async def close(self) -> None:
        pass
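
process_video() receives the raw aiortc track, so a common pattern is to spawn a background task that pulls decoded frames with track.recv(), which yields av.VideoFrame objects. A minimal sketch that logs frame sizes (FrameLoggerProcessor is illustrative; when a shared_forwarder is supplied you would normally read frames from it instead, so several processors can share one decode):

import asyncio
from typing import Optional

import aiortc
from aiortc.mediastreams import MediaStreamError
from vision_agents.core.processors import VideoProcessor
from vision_agents.core.utils.video_forwarder import VideoForwarder

class FrameLoggerProcessor(VideoProcessor):
    @property
    def name(self) -> str:
        return "frame_logger"

    async def process_video(
        self,
        track: aiortc.VideoStreamTrack,
        participant_id: Optional[str],
        shared_forwarder: Optional[VideoForwarder] = None,
    ) -> None:
        # Consume frames without blocking the caller.
        self._task = asyncio.create_task(self._consume(track))

    async def _consume(self, track: aiortc.VideoStreamTrack) -> None:
        try:
            while True:
                frame = await track.recv()  # av.VideoFrame
                print(f"{self.name}: {frame.width}x{frame.height} pts={frame.pts}")
        except MediaStreamError:
            pass  # track ended

    async def close(self) -> None:
        if getattr(self, "_task", None) is not None:
            self._task.cancel()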

Publishing Tracks

import aiortc
from vision_agents.core.processors import VideoPublisher, AudioPublisher

class MyVideoPublisher(VideoPublisher):
    @property
    def name(self) -> str:
        return "my_video_publisher"

    def publish_video_track(self) -> aiortc.VideoStreamTrack:
        """Return a video track to publish transformed video."""
        return aiortc.VideoStreamTrack()

    async def close(self) -> None:
        pass

class MyAudioPublisher(AudioPublisher):
    @property
    def name(self) -> str:
        return "my_audio_publisher"

    def publish_audio_track(self) -> aiortc.AudioStreamTrack:
        """Return an audio track to publish transformed audio."""
        return aiortc.AudioStreamTrack()

    async def close(self) -> None:
        pass
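
Both return values are ordinary aiortc media tracks, so a real publisher typically subclasses VideoStreamTrack and implements recv(). The sketch below (GrayFrameTrack is illustrative) emits solid gray frames, using aiortc's next_timestamp() helper for pacing; a real implementation would return processed frames instead:

import av
import numpy as np
import aiortc

class GrayFrameTrack(aiortc.VideoStreamTrack):
    async def recv(self) -> av.VideoFrame:
        pts, time_base = await self.next_timestamp()
        # 640x480 mid-gray frame; replace with your transformed output.
        frame = av.VideoFrame.from_ndarray(
            np.full((480, 640, 3), 128, dtype=np.uint8), format="bgr24"
        )
        frame.pts = pts
        frame.time_base = time_base
        return frame

publish_video_track() would then return GrayFrameTrack() rather than the bare placeholder track.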

Combined Processor + Publisher

For processors that both consume and produce media (e.g., video processors that annotate frames):
import aiortc
from vision_agents.core.processors import VideoProcessorPublisher

class MyAnnotationProcessor(VideoProcessorPublisher):
    def __init__(self) -> None:
        super().__init__()
        # Output track returned by publish_video_track(); replace with your own.
        self._video_track = aiortc.VideoStreamTrack()

    @property
    def name(self) -> str:
        return "annotation_processor"

    async def process_video(self, track, participant_id, shared_forwarder=None) -> None:
        """Receive and process incoming video."""
        pass

    def publish_video_track(self) -> aiortc.VideoStreamTrack:
        """Return the annotated video track."""
        return self._video_track

    async def close(self) -> None:
        pass
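
A common shape for such a processor is a queue-backed output track: process_video() consumes and annotates incoming frames, then pushes the results into a queue that the published track drains in recv(). A sketch of the track half (QueuedVideoTrack is illustrative; the annotation logic is your own):

import asyncio

import av
import aiortc

class QueuedVideoTrack(aiortc.VideoStreamTrack):
    def __init__(self) -> None:
        super().__init__()
        self.queue: "asyncio.Queue[av.VideoFrame]" = asyncio.Queue(maxsize=4)

    async def recv(self) -> av.VideoFrame:
        frame = await self.queue.get()
        # Restamp so published frames carry monotonic timestamps.
        pts, time_base = await self.next_timestamp()
        frame.pts = pts
        frame.time_base = time_base
        return frame

process_video() would then push each annotated frame with self._video_track.queue.put_nowait(frame), guarding against asyncio.QueueFull if the consumer falls behind.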

Usage

from vision_agents.core import Agent, User
from vision_agents.plugins import getstream, openai

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="AI Assistant", id="agent"),
    llm=openai.Realtime(),
    processors=[your_processor]
)
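
processors accepts a list, so several processors (for example, an audio analyzer alongside a video publisher) can run side by side on the same agent.
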
For implementation examples including YOLO pose detection and object detection, see the Video Processors Guide.