Models: Ollama¶
See also
- Guide → Providers: guide/providers.md
- Architecture → Provider Abstraction: architecture/provider-abstraction.md#provider-mapping
alloy.models.ollama ¶
OllamaBackend ¶
Bases: ModelBackend
Ollama backend using the `ollama` Python SDK (chat endpoint).
Supports native tool-calling and strict structured outputs via the `format` parameter on `/api/chat`, aligned with the shared tool-loop semantics.
Usage¶
```shell
export ALLOY_MODEL=ollama:<model>
# Ensure the model is running locally: ollama run <model>
```
API strategy¶
Ollama supports two API strategies internally:

- `native` (default for most models): uses the Ollama Python SDK and `/api/chat`.
- `openai_chat`: uses the OpenAI SDK pointed at Ollama's Chat Completions‑compatible endpoint (`base_url=http://localhost:11434/v1`, `api_key="ollama"`).

The default is `native`. The config layer auto‑routes `ollama:*gpt-oss*` models to `openai_chat` unless you explicitly set `extra["ollama_api"]`. Override via `Config.extra["ollama_api"] = "native" | "openai_chat"` to control it.
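The auto‑routing rule above can be sketched as a small helper. This is an illustrative assumption about the behavior described in the text, not Alloy's actual implementation; the function name `pick_ollama_api` is made up:

```python
from fnmatch import fnmatch


def pick_ollama_api(model: str, extra: dict) -> str:
    """Sketch of the routing rule described above (illustrative, not
    Alloy's internals). An explicit extra["ollama_api"] always wins;
    otherwise ollama models matching *gpt-oss* route to openai_chat."""
    if "ollama_api" in extra:
        return extra["ollama_api"]
    name = model.removeprefix("ollama:")
    if fnmatch(name, "*gpt-oss*"):
        return "openai_chat"
    return "native"
```

Under this sketch, `pick_ollama_api("ollama:gpt-oss:20b", {})` yields `"openai_chat"`, while an explicit `extra={"ollama_api": "native"}` keeps the native path.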
Notes:

- Tools: both strategies support function tools. The native path uses `role="tool"` messages; the OpenAI Chat path uses `tool_calls`/`tool_call_id`.
- Structured outputs: the native path supports strict JSON Schema via `format={...}`. The OpenAI‑compat path works with Chat Completions parsing; Alloy can add one final follow‑up request (no tools) when `auto_finalize_missing_output` is enabled.
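The two tool‑result wire shapes differ as follows. These dicts illustrate the respective protocols; the ids, function name, and payloads are placeholders, not Alloy's internal types:

```python
# Native Ollama /api/chat: the tool result is returned as a role="tool" message.
native_tool_result = {
    "role": "tool",
    "content": '{"temp_c": 21}',
}

# OpenAI Chat Completions: the assistant turn carries tool_calls with ids,
# and the tool-result message echoes the matching tool_call_id.
openai_assistant_turn = {
    "role": "assistant",
    "tool_calls": [
        {
            "id": "call_1",  # placeholder id
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'},
        }
    ],
}
openai_tool_result = {
    "role": "tool",
    "tool_call_id": "call_1",  # must match the assistant's tool_calls[].id
    "content": '{"temp_c": 21}',
}
```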
Configuration extras¶
- Key: `ollama_api`
- Values: `"native"` | `"openai_chat"`
- Default: `"native"`
Example
```python
from alloy import configure

cfg = configure(extra={"ollama_api": "openai_chat"})
```
Streaming¶
- Supports text‑only streaming (`ask.stream(...)` and `command.stream(...)`).
- Streaming with tools or typed outputs is not supported.
Model compatibility (tools + structured outputs)¶
Tool calling requires a “tool‑capable” model. Structured outputs (strict JSON) are enforced by the API; larger instruction‑tuned models adhere best. Quick picks (see Providers guide for details):
- Llama 3.1 (8B/70B): tools + structured OK; prefer 70B for reliability.
- Qwen 2.5/3 (mid/large): strong tool following; works with both APIs.
- Mistral Nemo 12B, Mixtral: good balance; tool‑tagged variants recommended.
- Command‑R / Command‑R+: robust tool chains.
- Firefunction‑v2 (70B): purpose‑built for function calls.
- Gemma 2 IT: tools require careful prompting or fine‑tunes; not first choice.
Caveats¶
- The OpenAI‑compatible layer focuses on Chat Completions; it does not implement the OpenAI Responses API. Features tied to `/v1/responses` won't work through the shim.
- Some Ollama options (e.g., `num_ctx`) aren't exposed via the OpenAI shim; use the native API for full control.
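For options the shim can't reach, a native‑path request body might look like the sketch below. The model name and option values are placeholders, and the SDK call is shown only as a comment:

```python
# Payload for Ollama's native /api/chat endpoint, which accepts an
# "options" object that the OpenAI-compatible shim does not expose.
payload = {
    "model": "llama3.1",  # placeholder model name
    "messages": [{"role": "user", "content": "Summarize this repo."}],
    "options": {
        "num_ctx": 8192,     # context window; unavailable via the OpenAI shim
        "temperature": 0.2,
    },
}

# With the ollama Python SDK this would be sent roughly as:
# import ollama
# ollama.chat(model=payload["model"], messages=payload["messages"],
#             options=payload["options"])
```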