
LLM (Large Language Model)

The LLM component handles text generation and conversation logic, supporting both traditional request-response and real-time streaming patterns. The base interface provides simple_response() for generating a response from text input, supports function calling with automatic tool execution, and manages conversation context. Multiple providers are supported, including OpenAI, Anthropic, Google, and others. Some LLM implementations also support real-time speech-to-speech communication, which eliminates the need for separate STT and TTS components:
```python
# Traditional mode: separate LLM, STT, and TTS components
agent = Agent(
    llm=openai.LLM(model="gpt-4o-mini"),
    stt=deepgram.STT(),
    tts=elevenlabs.TTS(),
)

# Realtime mode: a single speech-to-speech model
agent = Agent(
    llm=openai.Realtime(model="gpt-4o-realtime-preview")
)
```
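To make the request-response and streaming patterns concrete, here is a minimal sketch built around a toy EchoLLM stand-in. Only simple_response() mirrors a method named in the base interface above; stream_response(), EchoResponse, and their signatures are illustrative assumptions, not the library's actual API.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class EchoResponse:
    """Illustrative response wrapper; the real response type may differ."""
    text: str


class EchoLLM:
    """Toy stand-in for a provider LLM (e.g. openai.LLM), for shape only."""

    async def simple_response(self, text: str) -> EchoResponse:
        # A real implementation would call the provider's chat API here.
        return EchoResponse(text=f"echo: {text}")

    async def stream_response(self, text: str) -> AsyncIterator[str]:
        # A streaming provider would yield tokens as they arrive over the wire.
        for token in f"echo: {text}".split():
            yield token


async def collect(stream: AsyncIterator[str]) -> list[str]:
    # Drain an async token stream into a list (handy for tests and demos).
    return [token async for token in stream]


llm = EchoLLM()
reply = asyncio.run(llm.simple_response("hello"))
chunks = asyncio.run(collect(llm.stream_response("hello")))
print(reply.text)   # echo: hello
print(chunks)       # ['echo:', 'hello']
```

In a real application the streaming variant matters for voice latency: tokens can be handed to TTS as they arrive instead of waiting for the full completion.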
Each LLM follows our philosophy of "thin wrapping". Out of the box, developers can pass their own client to an LLM or interact with the provider's native API directly, with full support for passing native method arguments. LLMs can be combined with other features, such as processors, to provide real-time feedback on the world around you, or used in a simple voice-only mode as shown in the example above. Models running in non-realtime mode require both an STT service and a TTS service: the user's speech is converted to text, the text is passed to the model, and the model's response is converted back into voice output.
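The non-realtime turn described above (speech → STT → LLM → TTS) can be sketched with stub components. FakeSTT, FakeLLM, FakeTTS, and run_turn are hypothetical names for illustration; the real plugins (e.g. deepgram.STT, elevenlabs.TTS) operate on actual audio frames rather than strings.

```python
class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        # A real STT plugin would decode audio; here we treat bytes as UTF-8 text.
        return audio.decode("utf-8")


class FakeLLM:
    def respond(self, text: str) -> str:
        # A real LLM would generate a reply from conversation context.
        return f"You said: {text}"


class FakeTTS:
    def synthesize(self, text: str) -> bytes:
        # A real TTS plugin would return synthesized audio frames.
        return text.encode("utf-8")


def run_turn(stt: FakeSTT, llm: FakeLLM, tts: FakeTTS, audio: bytes) -> bytes:
    # One conversational turn: user audio in, agent audio out.
    transcript = stt.transcribe(audio)
    reply = llm.respond(transcript)
    return tts.synthesize(reply)


audio_out = run_turn(FakeSTT(), FakeLLM(), FakeTTS(), b"hello")
print(audio_out)  # b'You said: hello'
```

A realtime model collapses all three stages into one component, which is why the realtime Agent above needs only an llm argument.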