Ultralytics YOLO provides state-of-the-art computer vision for object detection, pose estimation, and segmentation. The plugin enables real-time pose detection with skeleton overlays.
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.

Installation

uv add "vision-agents[ultralytics]"

Quick Start

from vision_agents.core import Agent, User
from vision_agents.plugins import ultralytics, gemini, getstream

agent = Agent(
    # Stream provides the real-time transport
    edge=getstream.Edge(),
    agent_user=User(name="Fitness Coach", id="agent"),
    instructions="Analyze the user's form and provide feedback.",
    llm=gemini.Realtime(fps=10),
    processors=[
        # Pose detection with skeleton overlays
        ultralytics.YOLOPoseProcessor(
            model_path="yolo11n-pose.pt",
            conf_threshold=0.5,
            enable_hand_tracking=True,
        )
    ],
)

YOLO models download automatically on first use.

Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| model_path | str | "yolo11n-pose.pt" | YOLO pose model |
| conf_threshold | float | 0.5 | Keypoint confidence threshold |
| device | str | "cpu" | Device ("cpu" or "cuda") |
| enable_hand_tracking | bool | True | Draw hand skeleton connections |
| enable_wrist_highlights | bool | True | Highlight wrist positions |
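
To make the effect of conf_threshold concrete, here is a minimal sketch of keypoint confidence filtering. It is an illustration only, not the plugin's actual drawing code: the visible_edges helper, the ARM_EDGES subset, and the sample confidences are hypothetical, and whether the plugin compares with >= or > is an assumption. The keypoint indices follow the 17-point COCO layout used by the pretrained YOLO pose models.

```python
# COCO keypoint indices for the arms (17-keypoint layout).
LEFT_SHOULDER, RIGHT_SHOULDER = 5, 6
LEFT_ELBOW, RIGHT_ELBOW = 7, 8
LEFT_WRIST, RIGHT_WRIST = 9, 10

# Hypothetical subset of skeleton connections: arm segments only.
ARM_EDGES = [
    (LEFT_SHOULDER, LEFT_ELBOW),
    (LEFT_ELBOW, LEFT_WRIST),
    (RIGHT_SHOULDER, RIGHT_ELBOW),
    (RIGHT_ELBOW, RIGHT_WRIST),
]


def visible_edges(confidences, edges, conf_threshold=0.5):
    """Return the edges whose two endpoints both meet the threshold."""
    return [
        (a, b)
        for a, b in edges
        if confidences[a] >= conf_threshold and confidences[b] >= conf_threshold
    ]


# Made-up per-keypoint confidences for one person.
confidences = [0.9] * 17
confidences[RIGHT_WRIST] = 0.2  # e.g. the right wrist is occluded

# The right elbow-to-wrist segment is dropped; the other three remain.
print(visible_edges(confidences, ARM_EDGES))
```

In this sketch, raising conf_threshold hides uncertain joints and produces a sparser skeleton, while lowering it draws more of the skeleton but can include jittery or misplaced keypoints.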

Model Sizes

| Model | Speed | Use Case |
| --- | --- | --- |
| yolo11n-pose.pt | Fastest | Real-time on CPU |
| yolo11s-pose.pt | Fast | Real-time on GPU |
| yolo11m-pose.pt | Medium | Quality-focused |
| yolo11l-pose.pt | Slower | Maximum accuracy |
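
One way to apply the table is to choose the model from the available device. The sketch below is a hypothetical helper, not part of the plugin: detect_device, pick_pose_model, and the goal names are assumptions made for illustration. It uses PyTorch's torch.cuda.is_available() when PyTorch is importable and falls back to "cpu" otherwise.

```python
def detect_device():
    """Return "cuda" if PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


# Hypothetical goal names mapped to the models in the table above.
MODEL_BY_GOAL = {
    "quality": "yolo11m-pose.pt",
    "max_accuracy": "yolo11l-pose.pt",
}


def pick_pose_model(device, goal="realtime"):
    """Pick a pose model file name for the given device and goal."""
    if goal == "realtime":
        # Nano for real-time on CPU, small for real-time on GPU.
        return "yolo11s-pose.pt" if device == "cuda" else "yolo11n-pose.pt"
    return MODEL_BY_GOAL[goal]


device = detect_device()
model_path = pick_pose_model(device)
print(device, model_path)
```

The resulting values can then be passed as the device and model_path parameters of YOLOPoseProcessor.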

Next Steps