LlmClient
Trait for provider-specific LLM API implementations.
Overview
LlmClient is the unified interface that all provider-specific LLM clients implement. It defines a single streaming method with callback-based event handling, enabling the agent runtime to work with any provider through the same API surface.
You typically do not implement this trait directly -- use the built-in provider clients (AnthropicClient, OpenAIClient, OpenAICodexClient, OpenRouterCompletionsClient, OpenRouterClient, VertexClient) or the DynamicLlmClient wrapper for runtime dispatch.
Definition
```rust
#[async_trait]
pub trait LlmClient: Send + Sync {
    async fn chat_with_tools_streaming<
        FContent,
        FTool,
        FReason,
        FToolPartial,
        FContentBlock,
        FUsage,
    >(
        &self,
        messages: &[UnifiedMessage],
        tools: &[UnifiedTool],
        on_content: FContent,
        on_tool_calls: FTool,
        on_reasoning: FReason,
        on_tool_calls_partial: FToolPartial,
        on_content_block_complete: FContentBlock,
        on_usage: FUsage,
    ) -> Result<()>
    where
        FContent: FnMut(&str) -> Result<()> + Send,
        FTool: FnMut(Vec<UnifiedToolCall>) -> Result<()> + Send,
        FReason: FnMut(&str) -> Result<()> + Send,
        FToolPartial: FnMut(&[UnifiedToolCall]) -> Result<()> + Send,
        FContentBlock: FnMut(UnifiedContentBlock) -> Result<()> + Send,
        FUsage: FnMut(UnifiedUsage) -> Result<()> + Send;

    fn provider_name(&self) -> &str;
}
```
Required Methods
chat_with_tools_streaming
Sends a conversation to the LLM and streams the response through callbacks. This is the core method that drives every agent interaction.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| messages | &[UnifiedMessage] | Conversation history in unified format (system, user, assistant, tool results) |
| tools | &[UnifiedTool] | Tool specifications the LLM can invoke |
| on_content | FnMut(&str) -> Result<()> | Called for each chunk of generated text |
| on_tool_calls | FnMut(Vec<UnifiedToolCall>) -> Result<()> | Called when tool calls are finalized with complete arguments |
| on_reasoning | FnMut(&str) -> Result<()> | Called for reasoning/thinking tokens (text only, for streaming display) |
| on_tool_calls_partial | FnMut(&[UnifiedToolCall]) -> Result<()> | Called for incremental tool call updates during streaming |
| on_content_block_complete | FnMut(UnifiedContentBlock) -> Result<()> | Called when a complete content block is finalized (preserves signatures and structured data) |
| on_usage | FnMut(UnifiedUsage) -> Result<()> | Called at completion with token usage statistics |
Returns: Result<()> -- succeeds when the full response has been streamed, or returns an error on failure.
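The callback-wiring pattern can be sketched as follows. This is a minimal synchronous stand-in, not the real appam API: MockClient and its two-callback signature are illustrative (the real method is async and takes six callbacks), but the control flow -- repeated content callbacks, a final usage callback, and early abort when a callback errors -- mirrors the contract described above.

```rust
// Hypothetical MockClient demonstrating the callback-based streaming
// pattern. All names here are stand-ins for illustration.
type Result<T> = std::result::Result<T, String>;

struct MockClient;

impl MockClient {
    // Simplified stand-in for chat_with_tools_streaming: emits two text
    // chunks, then reports a usage count, invoking callbacks in order.
    fn chat_streaming<FContent, FUsage>(
        &self,
        mut on_content: FContent,
        mut on_usage: FUsage,
    ) -> Result<()>
    where
        FContent: FnMut(&str) -> Result<()>,
        FUsage: FnMut(u32) -> Result<()>,
    {
        for chunk in ["Hello, ", "world!"] {
            on_content(chunk)?; // a callback error aborts the stream
        }
        on_usage(2)
    }
}

fn run() -> Result<(String, u32)> {
    let mut text = String::new();
    let mut tokens = 0;
    MockClient.chat_streaming(
        |chunk| { text.push_str(chunk); Ok(()) },
        |usage| { tokens = usage; Ok(()) },
    )?;
    Ok((text, tokens))
}

fn main() {
    let (text, tokens) = run().unwrap();
    println!("{text} ({tokens})");
}
```

Because the callbacks are FnMut closures, the caller can accumulate state (here, the full text and the token count) across chunk deliveries.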
provider_name
Returns the provider name string for logging and debugging.
```rust
fn provider_name(&self) -> &str;
```
Conversation Flow
The streaming callback sequence follows this pattern:
- Text generation -- on_content is called repeatedly with text chunks as the LLM generates its response.
- Reasoning -- on_reasoning is called with thinking tokens for models that support extended thinking (e.g., Claude with thinking enabled, o-series models). These arrive interleaved with or before content.
- Content blocks -- on_content_block_complete is called when a complete block (text, thinking with signature, etc.) is finalized. This preserves structured data that the text-only callbacks cannot represent.
- Tool calls -- if the LLM decides to invoke tools:
  - on_tool_calls_partial is called with incremental updates as arguments stream in
  - on_tool_calls is called once with the finalized tool calls and complete arguments
- Usage -- on_usage is called at the end of the response with token counts.
The caller (agent runtime) then executes any requested tools, appends results to the message history, and calls chat_with_tools_streaming again. This loop continues until the LLM stops requesting tools.
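The loop described above can be sketched with stand-in types. This is not the real agent runtime: Turn, MockLlm, and the hard-coded tool result are hypothetical simplifications, but the shape -- call the LLM, execute requested tools, append results, repeat until no tools are requested -- follows the description.

```rust
// Hypothetical sketch of the agent loop; all types are illustrative
// stand-ins for UnifiedMessage/UnifiedToolCall and the real client.
enum Turn {
    ToolRequest(String), // LLM wants a tool executed
    Done(String),        // LLM finished with a text answer
}

struct MockLlm { calls: u32 }

impl MockLlm {
    // Stand-in for chat_with_tools_streaming: requests a tool on the
    // first turn, then answers with text on the second.
    fn chat(&mut self, _history: &[String]) -> Turn {
        self.calls += 1;
        if self.calls == 1 {
            Turn::ToolRequest("get_time".into())
        } else {
            Turn::Done("It is noon.".into())
        }
    }
}

fn run_agent() -> String {
    let mut llm = MockLlm { calls: 0 };
    let mut history = vec!["user: what time is it?".to_string()];
    loop {
        match llm.chat(&history) {
            Turn::ToolRequest(name) => {
                // Execute the tool, append its result, and loop again.
                history.push(format!("tool {name} -> 12:00"));
            }
            // No more tool requests: the loop terminates.
            Turn::Done(text) => return text,
        }
    }
}

fn main() {
    println!("{}", run_agent());
}
```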
Thread Safety
All implementations must be Send + Sync so they can be shared across async tasks; this bound is enforced at the trait level.
Error Handling
Implementations return errors for:
- Authentication failures (missing or invalid API keys)
- Network request failures
- API error responses (rate limits, content policy, server errors)
- Response parsing failures
- Callback errors (if any callback returns Err, the stream is aborted)
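The abort-on-callback-error behavior can be illustrated with a simplified, hypothetical signature (the String error type and the stream function are stand-ins, not the real API):

```rust
// Simplified stand-in showing how a callback error aborts delivery.
fn stream<F>(chunks: &[&str], mut on_content: F) -> Result<usize, String>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut delivered = 0;
    for c in chunks {
        on_content(c)?; // propagate the callback's error, aborting the stream
        delivered += 1;
    }
    Ok(delivered)
}

fn main() {
    // The consumer rejects the second chunk; the third is never delivered.
    let res = stream(&["a", "b", "c"], |c| {
        if c == "b" { Err("consumer gave up".into()) } else { Ok(()) }
    });
    assert_eq!(res, Err("consumer gave up".to_string()));
}
```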
Built-in Implementations
| Client | Provider | Module |
|---|---|---|
| AnthropicClient | Anthropic Messages API | appam::llm::anthropic |
| OpenAIClient | OpenAI Responses API | appam::llm::openai |
| OpenAICodexClient | OpenAI Codex subscription-backed Responses API | appam::llm::openai_codex |
| OpenRouterCompletionsClient | OpenRouter Completions API | appam::llm::openrouter::completions |
| OpenRouterClient | OpenRouter Responses API | appam::llm::openrouter::responses |
| VertexClient | Google Vertex AI Gemini API | appam::llm::vertex |
| DynamicLlmClient | Runtime dispatch to any provider | appam::llm::provider |
Azure OpenAI uses OpenAIClient with Azure-specific configuration. Azure Anthropic and Bedrock both use AnthropicClient with transport-specific configuration.
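An enum wrapper like DynamicLlmClient typically delegates each trait method to the wrapped variant. The sketch below shows the general enum-dispatch pattern with illustrative stand-in types, not the real appam definitions:

```rust
// Illustrative enum-dispatch pattern; Client, Anthropic, OpenAi, and
// Dynamic are stand-ins for the real trait and provider clients.
trait Client {
    fn provider_name(&self) -> &str;
}

struct Anthropic;
struct OpenAi;

impl Client for Anthropic {
    fn provider_name(&self) -> &str { "anthropic" }
}
impl Client for OpenAi {
    fn provider_name(&self) -> &str { "openai" }
}

enum Dynamic {
    Anthropic(Anthropic),
    OpenAi(OpenAi),
}

impl Client for Dynamic {
    // Each call is forwarded to whichever variant is wrapped.
    fn provider_name(&self) -> &str {
        match self {
            Dynamic::Anthropic(c) => c.provider_name(),
            Dynamic::OpenAi(c) => c.provider_name(),
        }
    }
}

fn main() {
    let client = Dynamic::OpenAi(OpenAi);
    assert_eq!(client.provider_name(), "openai");
}
```

Enum dispatch avoids boxing a trait object while still letting the runtime pick the provider at startup; the trade-off is that every provider must be listed in the enum.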
Related Types
- DynamicLlmClient -- enum wrapper that delegates to the correct provider client
- LlmProvider -- selects which client implementation to use
- UnifiedMessage -- the message format passed to chat_with_tools_streaming
- StreamConsumer -- higher-level abstraction that receives stream events from the agent runtime