Skip to content

Provider Abstraction

How Alloy normalizes provider differences (~3 minutes).


  • Request assembly: messages, tools, and schema built once; adapters map to provider requests.
  • Streaming: adapters expose text streaming when supported; structured streaming preview assembles list elements.
  • Structured outputs: JSON Schema sent via provider‑native mechanisms; primitives wrapped/unwrapped as needed.
  • Error handling: normalize transient vs configuration vs parse errors.

Provider Mapping

OpenAI

  • API: Responses API (responses.create / responses.stream)
  • Tools: yes (function calling); parallel tool requests possible
  • Structured outputs: yes (json_schema) with strict parse; primitives wrapped via {value: ...}
  • Streaming: text‑only
  • Finalization: one extra turn (no tools) to produce final structured answer when missing (auto‑finalize)
  • Code: src/alloy/models/openai.py

Anthropic (Claude)

  • API: messages.create
  • Tools: yes (tool_use/tool_result)
  • Structured outputs: yes (schema guidance + prefill)
  • Streaming: text‑only
  • Requirements: max_tokens required (defaults to 512 if unset)
  • Code: src/alloy/models/anthropic.py

Google Gemini

  • API: google-genai (responses + tool config)
  • Tools: yes
  • Structured outputs: yes (response_json_schema)
  • Streaming: text‑only
  • Requirements: max_tool_turns must be configured
  • Code: src/alloy/models/gemini.py

Ollama (local)

  • API: ollama.chat
  • Tools: not implemented in scaffold
  • Structured outputs: limited (prompt steering for primitives)
  • Streaming: not implemented in scaffold
  • Code: src/alloy/models/ollama.py

Fake (offline)

  • Purpose: deterministic outputs for CI/examples
  • Tools: no; Structured: yes (stubbed objects); Streaming: text chunks
  • Code: src/alloy/models/base.py (inlined class)

Shared Tool Loop & LoopState (for contributors)

All providers now share a single tool‑calling loop implemented in the base backend. This removes duplicated control flow and makes adding new providers straightforward.

  • Shared logic: ModelBackend.run_tool_loop() and ModelBackend.arun_tool_loop() handle request/response iteration, turn‑limit enforcement, and parallel tool execution.
  • Contract: Providers implement a *LoopState(BaseLoopState) that supplies only provider‑specific behavior.

BaseLoopState contract

  • make_request(client): build and fire one model request using the state’s transcript/config.
  • amake_request(client): async version of make_request.
  • extract_text(response): return the assistant’s final text from this step (used when no tools are present).
  • extract_tool_calls(response): return a list of normalized ToolCall(id, name, args); return [] or None if there are no calls.
  • add_tool_results(calls, results): append provider‑native tool‑result messages/parts to the transcript so the next request can use them.

Loop semantics

  • Turn limit: increments only when tool calls are present; raises ToolLoopLimitExceeded if turns > max_tool_turns. The exception includes partial_text from the last assistant content.
  • Parallel tools: serial for one call; otherwise bounded by Config.parallel_tools_max (default), using threads in sync and asyncio.to_thread in async.
  • Streaming: text‑only. Provider front‑ends enforce this; streaming with tools or structured outputs raises a configuration error.

Provider responsibilities

  • Message shaping: build initial transcript (system/user prompts), tools/functions declarations, and any provider extras (e.g., tool_choice).
  • Tool extraction: parse provider responses into ToolCalls; where call IDs are unavailable (e.g., Gemini), rely on order.
  • Tool result injection: map ToolResult values into provider‑native tool result blocks/messages for the next turn.
  • Finalization (post‑loop): when structured outputs are requested and the primary turn produced no final JSON, issue a constrained follow‑up without tools to obtain the final object.

Adding a new provider

  • Create YourProviderLoopState(BaseLoopState) implementing the methods above.
  • In your backend, prepare the initial state (system/prompt/tools) and call run_tool_loop or arun_tool_loop.
  • Implement provider‑specific finalize‑JSON if applicable.