Ultralytics YOLO provides state-of-the-art computer vision models for object detection, pose estimation, and more. The Ultralytics plugin for Vision Agents enables real-time pose detection with skeleton overlays that your LLM can analyze.

Features

  • Pose Detection: Detect human body keypoints in real-time
  • Hand Tracking: Optional detailed hand skeleton connections
  • Wrist Highlights: Visual markers for wrist positions
  • Video Publishing: Annotated frames published back to the call

Installation

Install the Ultralytics plugin with:
uv add "vision-agents[ultralytics]"
The YOLO model file downloads automatically on first use.

Example

from vision_agents.core import Agent, User
from vision_agents.plugins import getstream, gemini, ultralytics

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Fitness Coach", id="agent"),
    instructions="Analyze the user's form and provide feedback.",
    llm=gemini.Realtime(fps=10),
    processors=[
        ultralytics.YOLOPoseProcessor(
            model_path="yolo11n-pose.pt",
            conf_threshold=0.5,
            enable_hand_tracking=True
        )
    ],
)

YOLOPoseProcessor

The YOLOPoseProcessor detects human poses and annotates video frames with skeleton overlays.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model_path | str | "yolo11n-pose.pt" | Path to the YOLO pose model file |
| conf_threshold | float | 0.5 | Minimum confidence for keypoint detection |
| imgsz | int | 512 | Input image size for inference |
| device | str | "cpu" | Device for inference ("cpu" or "cuda") |
| fps | int | 30 | Output video frame rate |
| interval | int | 0 | Processing interval in seconds (0 = every frame) |
| enable_hand_tracking | bool | True | Draw hand skeleton connections |
| enable_wrist_highlights | bool | True | Highlight wrist positions with markers |
| max_workers | int | 24 | Thread pool size for pose processing |
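The interval parameter controls how often frames go through pose inference. The following is a minimal sketch of that kind of throttling, assuming the interval is measured against a monotonic clock and that skipped frames are simply not sent to the model. IntervalGate is a hypothetical helper for illustration, not part of the plugin's API.

```python
import time
from typing import Optional


class IntervalGate:
    """Hypothetical helper: decides whether the current frame should be processed."""

    def __init__(self, interval: float = 0.0) -> None:
        self.interval = interval  # seconds between processed frames; 0 = every frame
        self._last: Optional[float] = None  # timestamp of the last processed frame

    def should_process(self, now: Optional[float] = None) -> bool:
        if now is None:
            now = time.monotonic()
        if self.interval <= 0:
            return True  # interval=0: run inference on every frame
        if self._last is None or now - self._last >= self.interval:
            self._last = now
            return True
        return False  # too soon since the last processed frame


gate = IntervalGate(interval=1.0)
print([gate.should_process(t) for t in (0.0, 0.5, 1.0, 1.2, 2.1)])
# [True, False, True, False, True]
```

With interval=1.0, only frames arriving at least one second after the previous processed frame pass the gate; with the default of 0, every frame does.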

How It Works

  1. Receives video frames from the call participant
  2. Detects human poses using YOLO inference
  3. Annotates frames with skeleton overlays and keypoint markers
  4. Publishes the annotated video stream back to the call
  5. Provides pose data that can be accessed via the processor’s state
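The five steps above can be sketched as a small asyncio loop. This is an illustration under stated assumptions, not the plugin's actual implementation: detect_pose and annotate are hypothetical stand-ins for YOLO inference and skeleton drawing, the published list stands in for the outgoing video track, and the thread pool (sized by a max_workers argument) is assumed to exist so that blocking inference does not stall the event loop.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Frame:
    index: int
    pixels: bytes  # hypothetical stand-in for decoded image data


@dataclass
class Pose:
    frame_index: int
    keypoints: List[Tuple[float, float, float]]  # (x, y, confidence) per keypoint


def detect_pose(frame: Frame) -> Pose:
    # Step 2 stand-in: real code would run blocking YOLO inference here.
    return Pose(frame.index, [(0.5, 0.5, 0.9)])


def annotate(frame: Frame, pose: Pose) -> Frame:
    # Step 3 stand-in: real code would draw skeleton lines and keypoint markers.
    return Frame(frame.index, frame.pixels + b"+overlay")


async def run_pipeline(frames: List[Frame], max_workers: int = 24):
    loop = asyncio.get_running_loop()
    published: List[Frame] = []  # step 4 stand-in for the outgoing video track
    latest_pose: Optional[Pose] = None  # step 5: most recent pose, readable by other code
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for frame in frames:  # step 1: frames arriving from the participant
            # Offload blocking inference so the event loop keeps serving the call.
            pose = await loop.run_in_executor(pool, detect_pose, frame)
            published.append(annotate(frame, pose))
            latest_pose = pose
    return published, latest_pose


published, latest = asyncio.run(run_pipeline([Frame(i, b"raw") for i in range(3)], max_workers=2))
print([f.index for f in published], latest.frame_index)  # [0, 1, 2] 2
```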

Skeleton Colors

  • Blue: Main body skeleton (torso, legs, head)
  • Cyan: Right hand connections
  • Yellow: Left hand connections
  • Red circles: Wrist position markers
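YOLO pose models report 17 keypoints per person in COCO order, with wrists at indices 9 and 10. The sketch below plans the blue body skeleton (torso, legs, head) and the red wrist markers, assuming conf_threshold is applied per keypoint and colors are BGR tuples as OpenCV drawing functions expect. The hand connections (cyan and yellow) are omitted because this page does not say which points they join. BODY_EDGES and plan_overlay are hypothetical names, not the plugin's internals.

```python
# COCO keypoint indices used by YOLO pose models (17 points per person).
NOSE, L_EYE, R_EYE, L_EAR, R_EAR = 0, 1, 2, 3, 4
L_SHOULDER, R_SHOULDER = 5, 6
L_WRIST, R_WRIST = 9, 10
L_HIP, R_HIP, L_KNEE, R_KNEE, L_ANKLE, R_ANKLE = 11, 12, 13, 14, 15, 16

# Torso, legs and head connections (the "blue" skeleton); arm/hand links omitted.
BODY_EDGES = [
    (L_SHOULDER, R_SHOULDER), (L_SHOULDER, L_HIP), (R_SHOULDER, R_HIP), (L_HIP, R_HIP),  # torso
    (L_HIP, L_KNEE), (L_KNEE, L_ANKLE), (R_HIP, R_KNEE), (R_KNEE, R_ANKLE),  # legs
    (NOSE, L_EYE), (NOSE, R_EYE), (L_EYE, L_EAR), (R_EYE, R_EAR),  # head
]

# Colors as BGR tuples, the channel order OpenCV drawing functions expect.
BLUE = (255, 0, 0)
RED = (0, 0, 255)


def plan_overlay(keypoints, conf_threshold=0.5, enable_wrist_highlights=True):
    """Return (lines, circles) to draw for one person.

    keypoints: 17 (x, y, confidence) tuples in COCO order.
    """
    visible = {i for i, (_, _, conf) in enumerate(keypoints) if conf >= conf_threshold}
    # Draw an edge only when both of its endpoints are confident enough.
    lines = [
        (keypoints[a][:2], keypoints[b][:2], BLUE)
        for a, b in BODY_EDGES
        if a in visible and b in visible
    ]
    circles = []
    if enable_wrist_highlights:
        circles = [(keypoints[i][:2], RED) for i in (L_WRIST, R_WRIST) if i in visible]
    return lines, circles


person = [(float(i), float(i), 0.9) for i in range(17)]
lines, circles = plan_overlay(person)
print(len(lines), len(circles))  # 12 2
```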

Use Cases

  • Sports Coaching: Golf swing analysis, tennis form, batting stance
  • Fitness Training: Exercise form checking, rep counting
  • Dance Instruction: Movement analysis, choreography feedback
  • Physical Therapy: Range of motion tracking, posture correction
  • Gaming: Motion-based game controls, gesture recognition
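For use cases such as rep counting, one common approach is to compute a joint angle from three keypoints (for example shoulder, elbow and wrist) and count a rep each time the angle drops below a low threshold and then rises above a high one. The sketch below shows that technique; joint_angle and RepCounter are hypothetical helpers, not part of the plugin, and the 70/160 degree thresholds are illustrative values for an elbow curl.

```python
import math


def joint_angle(a, b, c):
    """Angle at vertex b (degrees) formed by 2-D points a-b-c, e.g. shoulder-elbow-wrist."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("keypoints must be distinct")
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp for float error


class RepCounter:
    """Counts a rep each time the angle goes below `low` and then above `high`."""

    def __init__(self, low=70.0, high=160.0):
        self.low, self.high = low, high  # illustrative thresholds for an elbow curl
        self.reps = 0
        self._bent = False  # True once the joint has flexed past `low`

    def update(self, angle):
        if angle < self.low:
            self._bent = True
        elif angle > self.high and self._bent:
            self.reps += 1
            self._bent = False
        return self.reps


counter = RepCounter()
for angle in (170, 100, 60, 100, 165, 120, 50, 170):
    counter.update(angle)
print(counter.reps)  # 2
```

Using two thresholds instead of one keeps small jitters around a single boundary from being counted as extra reps.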

GPU Acceleration

For faster inference on a machine with a CUDA-capable GPU, set device="cuda":
ultralytics.YOLOPoseProcessor(
    model_path="yolo11n-pose.pt",
    device="cuda",  # Use GPU
    imgsz=640       # Larger input for better accuracy
)
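Setting device="cuda" assumes a GPU is actually visible to PyTorch, which Ultralytics runs on. A small sketch that falls back to the CPU otherwise follows; pick_device is a hypothetical helper, and it assumes torch is importable whenever the Ultralytics plugin is installed.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # installed alongside Ultralytics, which runs on PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
# Then, for example:
# ultralytics.YOLOPoseProcessor(model_path="yolo11n-pose.pt", device=device)
```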

Model Options

YOLO pose models come in several sizes that trade speed for accuracy:
| Model | Speed | Accuracy | Use case |
|---|---|---|---|
| yolo11n-pose.pt | Fastest | Good | Real-time on CPU |
| yolo11s-pose.pt | Fast | Better | Real-time on GPU |
| yolo11m-pose.pt | Medium | High | Quality-focused |
| yolo11l-pose.pt | Slower | Higher | Maximum accuracy |
Models download automatically from Ultralytics on first use.
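The use-case column of the table above can be encoded as a simple selection rule. suggest_model is a hypothetical helper that just restates the table (nano for real-time on CPU, small for real-time on GPU, large for maximum accuracy); it is not part of the plugin.

```python
def suggest_model(device: str, prefer_accuracy: bool = False) -> str:
    """Hypothetical helper: pick a pose model following the table's use-case column."""
    if prefer_accuracy:
        return "yolo11l-pose.pt"  # maximum accuracy, slowest
    # Real-time: nano on CPU, small when a GPU is available.
    return "yolo11s-pose.pt" if device == "cuda" else "yolo11n-pose.pt"


print(suggest_model("cpu"))   # yolo11n-pose.pt
print(suggest_model("cuda"))  # yolo11s-pose.pt
```

The returned name can be passed as model_path to YOLOPoseProcessor.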