Audio and Video Processors
Processors extend the agent’s capabilities by analysing and transforming audio and video streams in real time. Examples of what you can support with processors include:
- API calls or state: Often you need additional state, such as the score or stats of a video game or sports match.
- Video analysis: Pose detection, object recognition, etc. The annotated video is sent to the realtime LLM.
- Video/image capture: Easily support AI-driven video or image capture.
- Video/audio transforms: Video avatars, video effects, etc.
Simple Examples
Simple logging
Advanced Examples
YoloPose: This processor implements YOLO pose detection and annotates video frames with skeleton overlays. Use cases include:
- Golf swing analysis
- Fitness form checking
- Dance instruction
- Physical therapy monitoring
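Use cases like these typically build on the keypoints a pose model returns. As a minimal sketch, assuming each keypoint is an (x, y) pixel coordinate (a hypothetical format, not necessarily what YoloPose exposes), the angle at a joint such as the elbow can be computed from three keypoints. The function name joint_angle is illustrative only.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Return the angle ABC in degrees, i.e. the angle at joint b."""
    # Vectors from the joint to its two neighbouring keypoints
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("keypoints must be distinct from the joint")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to guard against floating-point drift outside [-1, 1]
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


# Hypothetical shoulder, elbow and wrist keypoints for one frame
shoulder, elbow, wrist = (100.0, 100.0), (100.0, 200.0), (200.0, 200.0)
print(round(joint_angle(shoulder, elbow, wrist), 1))  # 90.0
```

A form-checking processor could compare such angles against target ranges on each frame; the ranges themselves would be specific to the sport or exercise.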
Processor API
Using Processors
Add processors to your agent:
Processor Base Classes
- AudioVideoProcessor: Base class with interval-based processing support
- AudioProcessorMixin: Implement process_audio() to process audio streams
- ImageProcessorMixin: Implement process_image() to process video frames as PIL Images
- VideoProcessorMixin: Implement process_video() to process raw video tracks
- VideoPublisherMixin: Implement publish_video_track() to publish transformed video
- AudioPublisherMixin: Implement publish_audio_track() to publish transformed audio
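To illustrate how a base class and a mixin combine, here is a self-contained sketch. The classes AudioVideoProcessor and AudioProcessorMixin below are local stand-ins named after the classes above, not imports from the library; the process_audio(pcm, sample_rate) signature, the 16-bit mono PCM format, the LoggingAudioProcessor class and the dispatch_audio loop (standing in for the agent) are all hypothetical assumptions for illustration.

```python
from abc import ABC, abstractmethod
from typing import List


# Local stand-ins named after the documented classes; the real library's
# constructors and method signatures may differ.
class AudioVideoProcessor:
    def __init__(self, interval: float = 0.0) -> None:
        self.interval = interval  # seconds between processing runs (assumed unit)


class AudioProcessorMixin(ABC):
    @abstractmethod
    def process_audio(self, pcm: bytes, sample_rate: int) -> None: ...


class LoggingAudioProcessor(AudioVideoProcessor, AudioProcessorMixin):
    """Hypothetical processor that records the size of each audio chunk."""

    def __init__(self, interval: float = 0.0) -> None:
        super().__init__(interval)
        self.log: List[str] = []

    def process_audio(self, pcm: bytes, sample_rate: int) -> None:
        # Assumes 16-bit mono PCM, so two bytes per sample
        duration_ms = len(pcm) / 2 / sample_rate * 1000
        self.log.append(f"audio chunk: {len(pcm)} bytes, {duration_ms:.0f} ms")


def dispatch_audio(processors: List[object], pcm: bytes, sample_rate: int) -> None:
    # Stand-in for the agent: only processors that implement the audio
    # mixin receive audio chunks; everything else is skipped.
    for p in processors:
        if isinstance(p, AudioProcessorMixin):
            p.process_audio(pcm, sample_rate)


logger = LoggingAudioProcessor()
dispatch_audio([logger, object()], b"\x00" * 3200, 16000)
print(logger.log)  # ['audio chunk: 3200 bytes, 100 ms']
```

Dispatching on the mixin type keeps each processor responsible only for the streams it declares interest in, which is the design the mixin list above suggests.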
The should_process() method respects the interval parameter, allowing you to control processing frequency and reduce computational overhead.
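The interval check can be sketched as a timestamp comparison. This is a minimal illustration, not the library's implementation: it assumes interval is measured in seconds and that should_process() returns True at most once per interval. IntervalGate is a hypothetical helper, and its clock is injectable so the behaviour can be shown deterministically.

```python
import time
from typing import Callable, Optional


class IntervalGate:
    """Hypothetical helper mirroring the idea behind should_process()."""

    def __init__(self, interval: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval  # minimum seconds between processing runs
        self._clock = clock
        self._last: Optional[float] = None  # time of the last accepted run

    def should_process(self) -> bool:
        now = self._clock()
        # Always process the first frame, then at most once per interval
        if self._last is None or now - self._last >= self.interval:
            self._last = now
            return True
        return False


# Simulate frames arriving every 0.25 s with a 1 s interval
timestamps = iter([0.0, 0.25, 0.5, 0.75, 1.0, 1.25])
gate = IntervalGate(interval=1.0, clock=lambda: next(timestamps))
print([gate.should_process() for _ in range(6)])
# [True, False, False, False, True, False]
```

With frames arriving at 4 fps and an interval of 1.0, only one frame per second reaches the expensive processing step; the rest are skipped cheaply.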