View virtual try-on example on GitHub
Check out the complete virtual try-on example in our GitHub repository
Powered by Decart's real-time Lucy-2 model (lucy_2_rt), the agent listens for voice requests and restyles your video feed so you appear to be wearing different outfits, driven by both a text prompt and a reference image.
Lucy-2 is purpose-built for virtual try-on and costume-swap use cases. It accepts a reference image alongside a prompt, enabling accurate outfit transfer onto the user’s live video. Prompt and image updates are applied atomically via update_state, so the output video never shows a half-updated frame.
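The atomic update described above can be sketched with a stand-in processor. This is an illustrative model of the behavior, not the actual Vision Agents API: the class and field names here are assumptions, but the key idea matches the text, a new state is built in full and then swapped in as one unit so no frame ever mixes an old image with a new prompt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RestyleState:
    """Prompt and reference image that drive the restyled output."""
    prompt: str = ""
    image: Optional[bytes] = None


class Lucy2Processor:
    """Illustrative stand-in for the Lucy-2 video processor."""

    def __init__(self) -> None:
        self._state = RestyleState()

    def update_state(self, prompt: Optional[str] = None,
                     image: Optional[bytes] = None) -> None:
        # Build the complete new state first, then replace the old state in
        # a single assignment, so a frame is never rendered with a fresh
        # prompt but a stale reference image (or vice versa).
        self._state = RestyleState(
            prompt=prompt if prompt is not None else self._state.prompt,
            image=image if image is not None else self._state.image,
        )

    @property
    def state(self) -> RestyleState:
        return self._state
```

Because the swap is a single reference assignment, any frame rendered mid-update sees either the old prompt-and-image pair or the new one, never a mix.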
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
What you will build
- Listen to voice input and swap outfits in real time
- Use Decart Lucy-2 to restyle your video feed with both a prompt and a reference image
- Atomically swap costumes via processor.update_state(prompt=..., image=...)
- Fall back to prompt-only changes for freeform outfit requests
- Speak with an expressive voice using ElevenLabs
- Run on Stream’s low-latency edge network
Next steps
Decart Integration
Explore Decart’s video restyling and try-on capabilities
Expressive Voice Narrator
See another storytelling example with Cartesia’s expressive TTS

