Ultralytics YOLO

Ultralytics YOLO provides state-of-the-art computer vision models for object detection, pose estimation, and more. The Ultralytics plugin for Vision Agents enables real-time pose detection with skeleton overlays that your LLM can analyze.

Features

Pose Detection: Detect human body keypoints in real-time
Hand Tracking: Optional detailed hand skeleton connections
Wrist Highlights: Visual markers for wrist positions
Video Publishing: Annotated frames published back to the call

Installation

Install the Ultralytics plugin with:

uv add "vision-agents[ultralytics]"

The YOLO model file downloads automatically on first use.

Example

from vision_agents.core import Agent, User
from vision_agents.plugins import getstream, gemini, ultralytics

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Fitness Coach", id="agent"),
    instructions="Analyze the user's form and provide feedback.",
    llm=gemini.Realtime(fps=10),
    processors=[
        ultralytics.YOLOPoseProcessor(
            model_path="yolo11n-pose.pt",
            conf_threshold=0.5,
            enable_hand_tracking=True
        )
    ],
)

YOLOPoseProcessor

The YOLOPoseProcessor detects human poses and annotates video frames with skeleton overlays.

Parameters

Parameter	Type	Default	Description
`model_path`	`str`	`"yolo11n-pose.pt"`	Path to YOLO pose model file
`conf_threshold`	`float`	`0.5`	Minimum confidence for keypoint detection
`imgsz`	`int`	`512`	Input image size for inference
`device`	`str`	`"cpu"`	Device for inference (`"cpu"` or `"cuda"`)
`fps`	`int`	`30`	Output video frame rate
`interval`	`int`	`0`	Processing interval in seconds (0 = every frame)
`enable_hand_tracking`	`bool`	`True`	Draw hand skeleton connections
`enable_wrist_highlights`	`bool`	`True`	Highlight wrist positions with markers
`max_workers`	`int`	`24`	Thread pool size for pose processing
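
To make the `interval` semantics concrete, here is a minimal sketch of how a time-based gate could decide which frames to run inference on. `IntervalGate` is a hypothetical helper written for illustration; it is not part of the plugin, and the plugin's internal scheduling may differ.

```python
from typing import Optional


class IntervalGate:
    """Hypothetical helper: decide whether a frame should be processed."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self._last: Optional[float] = None  # timestamp of the last processed frame

    def should_process(self, now: float) -> bool:
        # An interval of 0 (or less) means "process every frame".
        if self.interval <= 0:
            return True
        # Otherwise process only when enough time has passed since the last run.
        if self._last is None or now - self._last >= self.interval:
            self._last = now
            return True
        return False


gate = IntervalGate(interval=2)
decisions = [gate.should_process(t) for t in (0.0, 1.0, 2.0, 3.0, 4.5)]
print(decisions)  # [True, False, True, False, True]
```

With `interval=0`, every call returns `True`, which matches the "every frame" default in the table.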

How It Works

1. Receives video frames from the call participant
2. Detects human poses using YOLO inference
3. Annotates frames with skeleton overlays and keypoint markers
4. Publishes the annotated video stream back to the call
5. Provides pose data that can be accessed via the processor’s state
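
As an illustration of the last step, the sketch below shows one way to work with pose data once you have it. It assumes the 17-keypoint COCO layout used by YOLO pose models, where each keypoint is an (x, y, confidence) triple and indices 9 and 10 are the left and right wrists. The `visible_wrists` helper and the sample values are hypothetical; how the processor exposes its state is not shown here.

```python
from typing import Dict, List, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, confidence)

# COCO keypoint indices for the wrists
LEFT_WRIST, RIGHT_WRIST = 9, 10


def visible_wrists(
    person: List[Keypoint], conf_threshold: float = 0.5
) -> Dict[str, Tuple[float, float]]:
    """Return wrist positions whose confidence meets the threshold."""
    wrists: Dict[str, Tuple[float, float]] = {}
    for name, idx in (("left", LEFT_WRIST), ("right", RIGHT_WRIST)):
        x, y, conf = person[idx]
        if conf >= conf_threshold:
            wrists[name] = (x, y)
    return wrists


# Made-up sample: 17 keypoints, only the left wrist is confidently detected.
person: List[Keypoint] = [(0.0, 0.0, 0.0)] * 17
person[LEFT_WRIST] = (120.0, 340.0, 0.91)
person[RIGHT_WRIST] = (410.0, 335.0, 0.22)

print(visible_wrists(person))  # {'left': (120.0, 340.0)}
```

Lowering `conf_threshold` keeps more keypoints at the cost of more false positives, which is the same trade-off the processor's `conf_threshold` parameter controls.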

Skeleton Colors

Blue: Main body skeleton (torso, legs, head)
Cyan: Right hand connections
Yellow: Left hand connections
Red circles: Wrist position markers

Use Cases

Sports Coaching: Golf swing analysis, tennis form, batting stance
Fitness Training: Exercise form checking, rep counting
Dance Instruction: Movement analysis, choreography feedback
Physical Therapy: Range of motion tracking, posture correction
Gaming: Motion-based game controls, gesture recognition
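
Several of these use cases, such as form checking and range of motion tracking, come down to measuring joint angles from keypoints. The following is a minimal sketch of that calculation using plain 2D coordinates. The `joint_angle` function and the sample points are illustrative assumptions, not part of the plugin.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle at b, in degrees, between the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("joint_angle needs three distinct points")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


# Elbow bent at a right angle: shoulder -> elbow -> wrist
shoulder, elbow, wrist = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
print(round(joint_angle(shoulder, elbow, wrist)))  # 90
```

For an elbow angle, pass the shoulder, elbow, and wrist keypoints. A fully extended arm gives a value close to 180 degrees.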

GPU Acceleration

For faster inference, run the model on a CUDA-capable GPU by setting device to "cuda":

ultralytics.YOLOPoseProcessor(
    model_path="yolo11n-pose.pt",
    device="cuda",  # Use GPU
    imgsz=640       # Larger input for better accuracy
)
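
If the same code has to run on machines with and without a GPU, one option is to choose the device at startup. The sketch below assumes PyTorch, which Ultralytics depends on, and falls back to "cpu" when it is not importable or reports no CUDA device. `pick_device` is a hypothetical helper, not part of the plugin.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment; stay on the CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

The result can then be passed as the `device` argument of `YOLOPoseProcessor`.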

Model Options

The YOLO pose models come in several sizes that trade speed for accuracy:

Model	Speed	Accuracy	Use Case
`yolo11n-pose.pt`	Fastest	Good	Real-time on CPU
`yolo11s-pose.pt`	Fast	Better	Real-time on GPU
`yolo11m-pose.pt`	Medium	High	Quality-focused
`yolo11l-pose.pt`	Slower	Higher	Maximum accuracy

Models download automatically from Ultralytics on first use.
