Video Call Moderator - Vision Agents

The complete Moderation example is available in our GitHub repository.

In this example, we build a real-time video moderation agent that detects offensive gestures with a custom Roboflow model running locally, censors them with a Gaussian blur, and issues escalating verbal warnings, kicking the user from the call on the third offense. The agent uses Gemini Flash Lite as its language model and Deepgram for speech. It processes video at 15 FPS and runs inference locally to keep detection latency low.
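The escalation policy can be modeled as a small per-user strike counter. The sketch below is hypothetical: `StrikeTracker`, `Action`, and the 5-second cooldown are assumptions for illustration, not names or values from the example. The cooldown matters because at 15 FPS a single gesture appears in many consecutive frames, and without some debouncing one gesture would register as three offenses in a fraction of a second.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    """Hypothetical moderation outcomes (not part of the Vision Agents API)."""
    WARN = "warn"
    FINAL_WARN = "final_warn"
    KICK = "kick"


@dataclass
class StrikeTracker:
    """Per-user strike counter: warn, give a final warning, then kick.

    Assumption: detections for the same user closer together than
    cooldown_s belong to one ongoing gesture and count as one offense.
    """
    max_strikes: int = 3
    cooldown_s: float = 5.0
    strikes: dict[str, int] = field(default_factory=dict)
    last_counted: dict[str, float] = field(default_factory=dict)

    def record_detection(self, user_id: str, now: float) -> Action | None:
        """Register a detection at time `now` (seconds); return the action to take, if any."""
        last = self.last_counted.get(user_id)
        if last is not None and now - last < self.cooldown_s:
            return None  # still the same gesture; do not add a strike
        self.last_counted[user_id] = now
        count = self.strikes.get(user_id, 0) + 1
        self.strikes[user_id] = count
        if count >= self.max_strikes:
            return Action.KICK
        if count == self.max_strikes - 1:
            return Action.FINAL_WARN
        return Action.WARN
```

Feeding detections for one user at t = 0.0, 0.1, 6.0 and 12.0 seconds yields `WARN`, `None`, `FINAL_WARN` and `KICK`. The `KICK` result is the point where the agent would remove the user through the Stream API; the `WARN` results are where it would prompt the LLM to speak a warning.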

Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.

What You Will Build

Detect offensive content in real time using a custom Roboflow model with local inference
Censor detected content with Gaussian blur applied directly to the video stream
Issue escalating verbal warnings via Gemini Flash Lite and Deepgram TTS
Automatically kick users from the call on the third offense via the Stream API
Run detection locally with no cloud round-trip per frame for low latency
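To make the censoring step concrete, here is a dependency-free sketch that applies a separable Gaussian blur to one detected region of a grayscale frame. It assumes detections arrive as pixel boxes `(x1, y1, x2, y2)`; the box format, `blur_box`, and its default radius and sigma are illustrative assumptions, not the example's actual code. A real pipeline on color frames would more likely call OpenCV's `cv2.GaussianBlur` on the region of interest.

```python
from __future__ import annotations

import math


def gaussian_kernel(radius: int, sigma: float) -> list[float]:
    """1-D Gaussian weights for offsets -radius..radius, normalized to sum to 1."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_box(image: list[list[float]], box: tuple[int, int, int, int],
             radius: int = 3, sigma: float = 2.0) -> list[list[float]]:
    """Return a copy of a grayscale image with only the pixels inside `box` blurred.

    `box` is a hypothetical (x1, y1, x2, y2) pixel box, half-open and clamped
    to the image. Sampling is clamped to the box edges, so the blur only mixes
    pixels inside the detected region; everything outside is left untouched.
    """
    h, w = len(image), len(image[0])
    x1, y1, x2, y2 = box
    x1, x2 = max(0, x1), min(w, x2)
    y1, y2 = max(0, y1), min(h, y2)
    out = [row[:] for row in image]
    if x1 >= x2 or y1 >= y2:
        return out  # empty or out-of-frame box: nothing to blur

    kernel = gaussian_kernel(radius, sigma)

    # Horizontal pass: blur each row of the box, replicating the box's edge pixels.
    tmp = [row[:] for row in image]
    for y in range(y1, y2):
        for x in range(x1, x2):
            tmp[y][x] = sum(k * image[y][min(x2 - 1, max(x1, x + i - radius))]
                            for i, k in enumerate(kernel))

    # Vertical pass over the horizontally blurred values completes the 2-D Gaussian.
    for y in range(y1, y2):
        for x in range(x1, x2):
            out[y][x] = sum(k * tmp[min(y2 - 1, max(y1, y + i - radius))][x]
                            for i, k in enumerate(kernel))
    return out
```

Splitting the blur into two 1-D passes costs about 2(2r+1) multiplications per pixel instead of (2r+1)² for a full 2-D kernel, which is one reason separable blurs are the usual choice for per-frame censoring.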

Next Steps

Roboflow Integration

Train or find your own detection model on Roboflow

Docker Deployment

Docker setup and environment configuration
