Vision Agents is an open-source Video AI framework for building real-time voice and video applications. It is built and maintained by the team at Stream. It ships with Stream Video as its default low-latency transport, powered by our global edge network. The framework is edge- and transport-agnostic, so developers can also bring any edge layer they like.
Each integration is built on extensible base classes. For example, with BaseProcessor or VideoProcessorMixin, you can plug in custom computer-vision models like Ultralytics YOLO.

👉 Ready to dive in? Follow the installation guide to build your first Agent.
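
To make the base-class idea above more concrete, here is a minimal sketch of the general pattern in plain Python. It does not use the Vision Agents API: `SketchProcessor`, `BrightnessProcessor`, `run_pipeline`, and the `Frame` type are all hypothetical names invented for this illustration, and the real BaseProcessor and VideoProcessorMixin interfaces may look quite different. Check the installation guide and the framework's own documentation for the actual signatures.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

# Hypothetical frame type: a grayscale image as rows of 0-255 pixel values.
Frame = List[List[int]]


class SketchProcessor(ABC):
    """Hypothetical stand-in for a framework base class (not the Vision Agents API)."""

    @abstractmethod
    def process(self, frame: Frame) -> Dict[str, Any]:
        """Return a dictionary of results for a single frame."""


class BrightnessProcessor(SketchProcessor):
    """Toy 'model' that flags frames whose mean pixel value exceeds a threshold."""

    def __init__(self, threshold: float = 128.0) -> None:
        self.threshold = threshold

    def process(self, frame: Frame) -> Dict[str, Any]:
        # Flatten the frame and compute the mean pixel value.
        pixels = [p for row in frame for p in row]
        mean = sum(pixels) / len(pixels) if pixels else 0.0
        return {"mean": mean, "bright": mean > self.threshold}


def run_pipeline(processors: List[SketchProcessor], frame: Frame) -> List[Dict[str, Any]]:
    """Apply every processor to the frame, in order, and collect the results."""
    return [p.process(frame) for p in processors]


if __name__ == "__main__":
    frame = [[0, 200], [200, 200]]
    print(run_pipeline([BrightnessProcessor(threshold=100)], frame))
```

In this sketch the pipeline depends only on the abstract `process` method, so a wrapper around a real detector could be swapped in for the toy brightness check without changing the calling code.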