Skip to content

Alloy Roadmap

This page outlines the planned work for Alloy. It captures themes and release-sized milestones. Day-to-day tasks are tracked in GitHub Issues and grouped under Milestones.

  • Source of truth: this document (high-level) + GitHub Issues/Milestones (detailed).
  • Versioning: pre-1.0 semantic versioning. New features bump the minor version; fixes bump patch.

Themes

  • Provider parity: close feature gaps across OpenAI, Anthropic, Gemini, and Ollama.
  • Structured outputs: robust, typed outputs for primitives, collections, dataclasses, TypedDict, and (opt-in) Pydantic.
  • Tooling & reasoning: native tool use where supported; ReAct fallback for others.
  • Performance & reliability: sensible defaults, retries/backoff, and efficient client reuse.
  • Developer experience: strong type stubs, docs, examples, and contribution workflow.
  • Testing: unit-first with fake backends; guarded integration tests per provider.

Milestones

v0.2 – Type Safety & Tool Schema

  • Improve type stubs
  • Add a Protocol for decorated commands exposing __call__, .stream(), and .async_() so IDEs/type-checkers know about these methods.
  • Tighten ask.stream_async return type to AsyncIterable[str] in .pyi.
  • Tool schema fidelity
  • Generate JSON schema from Python type hints and defaults (str/int/float/bool, Optional/Union, dataclasses), not just "string" placeholders.
  • Mark optional vs required based on defaults; include simple nested objects.
  • Tests
  • Add unit tests for @require/@ensure contract failures.
  • Add tests for TypedDict parsing and common generics (list[TypedDict], dict[str, int]).
  • Docs
  • Document stub behavior and how .stream()/.async_() type as part of the decorated command object.

v0.3 – ReAct Fallback & Streaming Improvements

  • ReAct fallback
  • Implement minimal ReAct loop for models without native tool/function calling.
  • Safety: max turns, clear termination, and error surfacing.
  • Routing: fallback path for providers lacking native tools.
  • Streaming
  • OpenAI: support streaming with tool calls (stream tool-call events; invoke tools mid-stream).
  • Anthropic: add streaming (where the SDK supports) for content-only flows.

v0.4 – Structured Outputs Beyond Dataclasses

  • TypedDict: add schema and parsing support equivalent to dataclasses.
  • Pydantic (opt-in): parse to models (v1/v2 detection); feature-flag behind an extra.
  • Parser hooks: allow custom parsers for project-specific types.

v0.5 – Provider Parity & Performance

  • Gemini: implement native tool-calling where supported by the SDK.
  • Streaming parity: explore streaming for Gemini and Ollama (subject to SDK capabilities).
  • Client/session reuse: cache SDK clients per backend to reduce per-call overhead while keeping thread/process safety.
  • Reliability: add exponential backoff with jitter on retryable failures (configurable).

v0.6 – DX, Docs, and CI Polish

  • Contributing
  • Add CONTRIBUTING.md and issue/PR templates.
  • Clarify release process and versioning policy.
  • Examples & docs
  • Expand examples (e.g., notebook walkthrough, expected outcomes section).
  • Auto-generate/maintain feature support matrix from code/tests.
  • CI
  • Add coverage reporting; keep unit/integration split and gating for provider keys.

v1.0 – Stability & API Freeze

  • Finalize API shapes for commands, tools, config, and backends.
  • Ensure provider parity meets documented support matrix.
  • Remove temporary or deprecated shims; lock public surface in __all__ and .pyi.

Incorporating Review Feedback

  • Performance
  • Keep overhead minimal; prioritize provider-side JSON schema and light parsing.
  • Reuse provider clients across calls (micro-optimization) without holding brittle global state.
  • Watch large JSON parsing costs; add note and guidance for chunked/streaming scenarios.
  • Code quality
  • Reduce sync/async duplication in backends by extracting shared helpers for tool loops and schema wrapping.
  • Maintain clear error messages (parsing failures with context snippets).
  • Testing
  • Strengthen contract tests and edge cases around retries and parsing.
  • Keep integration tests guarded by secrets; assert types/ranges rather than exact text.
  • Documentation & examples
  • Add outcome notes to examples; keep offline/dev tips like ALLOY_BACKEND=fake.
  • Expand contribution guidance and roadmap visibility.

Tracking & Process

  • Issues & Milestones: create GitHub milestones for each version above and group issues accordingly.
  • Labels: provider/openai, provider/anthropic, provider/gemini, provider/ollama, type-system, tools, react, performance, docs, tests.
  • Release notes: summarize changes by theme and link to issues/PRs.