Moonshine is a high-performance Speech-to-Text (STT) engine optimized for local and edge deployments. Designed to run efficiently in low-resource environments such as browsers and embedded devices, Moonshine is ideal for real-time, offline voice applications where speed, privacy, and low latency are critical. The Moonshine plugin in the Vision Agents SDK lets you integrate its STT capabilities into your application.

Installation

Install the Stream Moonshine plugin with:
uv add vision-agents[moonshine]
You’ll also need to install the Moonshine ONNX version from GitHub:
uv add "useful-moonshine-onnx@git+https://github.com/usefulsensors/moonshine.git#subdirectory=moonshine-onnx"

Example

Check out our Moonshine example to see a practical implementation of the plugin and get inspiration for your own projects, or read on for some key details.

Initialisation

The Moonshine plugin for Stream is exposed as the STT class:
from vision_agents.plugins import moonshine

stt = moonshine.STT()
We recommend combining the Moonshine plugin with a VAD plugin such as Silero, so audio is only sent for transcription when speech is detected and excessive local processing is avoided; a minimal pairing is sketched below.
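As a rough sketch, the two plugins can be instantiated side by side. This assumes the Silero plugin exposes a VAD class; the exact constructor arguments and how the pair is wired into your agent or call pipeline may differ by SDK version:
from vision_agents.plugins import moonshine, silero

# Assumption: the Silero plugin exposes a VAD class; check your SDK version
# for the exact constructor arguments.
vad = silero.VAD()   # gates audio so only detected speech reaches the STT
stt = moonshine.STT()

# Both instances are then passed to your agent / call pipeline,
# with the VAD-gated audio feeding stt.process_audio(...).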

Parameters

These are the parameters available in the MoonshineSTT plugin for you to customise:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| model_name | str | "moonshine/base" | The Moonshine model to use. Supported options are "moonshine/tiny" and "moonshine/base". |
| sample_rate | int | 16000 | The sample rate (in Hz) of the audio input. Must match Moonshine's expected rate. |
| language | str | "en-US" | Language code for transcription. Currently, only "en-US" is supported. |
| min_audio_length_ms | int | 100 | Minimum length (in milliseconds) of audio required before processing. |
| target_dbfs | float | -26.0 | Target RMS loudness level (in dBFS) for audio normalization before transcription. |
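These defaults can be overridden at construction time. A minimal sketch using the parameters listed above (the values here are purely illustrative):
from vision_agents.plugins import moonshine

# Illustrative values only: a smaller model and a longer minimum chunk,
# which may suit a constrained device.
stt = moonshine.STT(
    model_name="moonshine/tiny",
    sample_rate=16000,
    language="en-US",
    min_audio_length_ms=200,
    target_dbfs=-26.0,
)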

Functionality

Process Audio

Once you join the call, you can listen to the connection for audio events and forward them to the STT class for processing:
from getstream.video import rtc

async with rtc.join(call, bot_user_id) as connection:

    @connection.on("audio")
    async def on_audio(pcm: PcmData, user):
        # Forward the incoming audio frame (PcmData from the getstream SDK) to Moonshine STT
        await stt.process_audio(pcm, user)

Events

Transcript Event

The transcript event is triggered when a final transcript is available from Moonshine:
@stt.on("transcript")
async def on_transcript(text: str, user: any, metadata: dict):
    # Process transcript event here

Error Event

If an error occurs, an error event is fired:
@stt.on("error")
async def on_stt_error(error):
    # Process error event here

Close

You can close the STT connection with the close() method:
stt.close()
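Putting it together, one way to tie the pieces above into a single flow is sketched below. The call object, bot_user_id, and the way you keep the session alive are assumptions specific to your application:
import asyncio

from getstream.video import rtc
from vision_agents.plugins import moonshine


async def transcribe_call(call, bot_user_id: str):
    # call and bot_user_id are assumed to come from your Stream setup.
    stt = moonshine.STT()

    @stt.on("transcript")
    async def on_transcript(text, user, metadata):
        print(f"{user}: {text}")

    try:
        async with rtc.join(call, bot_user_id) as connection:
            @connection.on("audio")
            async def on_audio(pcm, user):
                await stt.process_audio(pcm, user)

            # Placeholder: block until your own shutdown signal fires.
            await asyncio.Event().wait()
    finally:
        stt.close()
Closing the STT instance in a finally block ensures its resources are released even if the call ends with an error.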