Start with the big providers
This page shows you how to ship your first AI feature on a frontier model — Claude Opus, GPT-5, or another top-tier provider — and capture training data from day one so you can switch to a cheaper local model later. It's for engineers under quarterly deadlines who need a working feature in weeks, not months. By the end you'll have a runnable agent on a frontier model and a path to cut its cost by ~80% in a later quarter without rewriting application code.
Before you start
- Python 3.10+ and pip install sagewai.
- An API key for at least one frontier provider (Anthropic, OpenAI, or another supported provider) in ~/.sagewai/.env.
- A task you can describe in one sentence (e.g. "categorise support tickets as refund, technical, account, or other").
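Before running anything, it can help to confirm that the key file contains at least one provider key. The following is a minimal sketch, not part of Sagewai: it assumes the file uses plain KEY=VALUE lines and that the keys are named ANTHROPIC_API_KEY and OPENAI_API_KEY (the names the Anthropic and OpenAI SDKs read by default). Whether Sagewai expects the same names is an assumption, and parse_env and configured_providers are hypothetical helpers.

```python
import os
from pathlib import Path

# Assumed variable names: the Anthropic and OpenAI SDKs read these by default.
# Whether Sagewai expects the same names is an assumption of this sketch.
PROVIDER_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def configured_providers(env: dict[str, str]) -> list[str]:
    """Return the provider key names that have a non-empty value."""
    return [name for name in PROVIDER_KEYS if env.get(name)]


if __name__ == "__main__":
    env_path = Path(os.path.expanduser("~/.sagewai/.env"))
    text = env_path.read_text() if env_path.exists() else ""
    found = configured_providers(parse_env(text))
    print("Configured providers:", found or "none")
```

If the script prints "none", add a key to the file before moving on.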
Why a frontier model first
If you reach for the cheapest open model under deadline, you risk burning two weeks debugging context windows and another week tuning prompts, then shipping an agent that still misclassifies 8% of inputs. Quality is unknown until you measure it.
A frontier model gives you known quality on day one. The bill is real, but a working feature with a known cost is a much easier conversation with finance than a missed deadline. Once it ships, you have something concrete to optimise.
Sagewai is built around this arc: ship on a frontier model, capture every response as training data, then fine-tune a local model on that data when cost matters more than time-to-ship.
Configure a frontier model
Set the model string on Agent. The same code runs against Claude, GPT, and (later) a local Ollama model; only the model value changes.
import asyncio

from sagewai.core import Agent


async def main():
    agent = Agent(
        name="support-triage",
        model="claude-opus-4-7",  # frontier model
        system_prompt=(
            "Categorise the support ticket as: refund, "
            "technical, account, or other."
        ),
    )
    result = await agent.run("My order #1234 never arrived")
    print(result.output)


asyncio.run(main())
To swap to GPT-5, change one string:
agent = Agent(name="support-triage", model="gpt-5", ...)
Once you have a fine-tuned local model (covered later in the lifecycle), the same swap points at Ollama:
agent = Agent(name="support-triage", model="ollama/my-finetuned-llama:latest", ...)
Application code does not change. See Example 18 — local LLM routing for the full swap demonstration across Claude, GPT, and Ollama.
Capture training data from day one
The cost reduction later only works if you've been collecting input/output pairs the whole time. Configure the Curator on the same project as your agent; every successful run becomes a candidate training sample, scored by user feedback (thumbs-up / thumbs-down) or by an LLM-judge classifier you configure.
from sagewai.curator import Curator

curator = Curator(
    project_id="acme-support",
    quality_filter="user_rating >= 4 AND human_override == False",
)
After a few weeks of production traffic on the frontier model, the Curator typically holds several thousand high-quality input/output pairs — enough to fine-tune a 3B local model that hits 90%+ of Opus quality on your specific task at roughly 1% of the per-call cost.
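To make the idea concrete, here is a small standalone sketch of what "filter captured runs, then write training pairs" can look like. It does not use the Curator API: the CapturedRun record shape, the prompt/completion field names, and the sample tickets are all made up for illustration, and the Curator's real schema and export format may differ. The keep function simply mirrors the quality filter described above.

```python
import json
from dataclasses import dataclass


@dataclass
class CapturedRun:
    """Hypothetical record shape; the Curator's real schema may differ."""
    prompt: str
    output: str
    user_rating: int      # assumed 1-5 scale
    human_override: bool  # True if a person corrected the output


def keep(run: CapturedRun, min_rating: int = 4) -> bool:
    """Mirror the filter: user_rating >= 4 AND human_override == False."""
    return run.user_rating >= min_rating and not run.human_override


def to_jsonl(runs: list[CapturedRun]) -> str:
    """Serialise the kept runs as one JSON object per line."""
    lines = [
        json.dumps({"prompt": r.prompt, "completion": r.output}, ensure_ascii=False)
        for r in runs
        if keep(r)
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    runs = [
        CapturedRun("My order #1234 never arrived", "refund", 5, False),
        CapturedRun("App crashes on login", "account", 2, False),     # low rating
        CapturedRun("Change my email", "technical", 5, True),         # overridden
    ]
    print(to_jsonl(runs))
```

Only the first sample passes the filter; the low-rated and human-corrected runs are dropped before they can reach a training set.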
For the full Curator → JSONL → Unsloth → Ollama loop, see Example 36 — autopilot training loop and Example 38 — Unsloth fine-tune.
Monitor cost as you go
Wire the Observatory cost dashboard on the same project. It tracks per-agent spend in near real-time so you can:
- Show the bill with a screenshot, not a spreadsheet estimate.
- Catch a runaway loop before it costs more than a weekend of fine-tuning.
- Know when the per-call cost crosses the threshold where a local model starts paying back.
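The payback threshold in the last point is simple arithmetic once you have real numbers from the dashboard. The sketch below assumes a one-off cost for the fine-tune (engineer time, compute) and a fixed per-call cost for each model; break_even_calls is a hypothetical helper and the figures in the usage section are placeholders, not measurements.

```python
import math


def break_even_calls(frontier_cost: float, local_cost: float, one_off_cost: float) -> int:
    """Number of calls before per-call savings cover the one-off fine-tune cost."""
    saving = frontier_cost - local_cost
    if saving <= 0:
        raise ValueError("local model must be cheaper per call to pay back")
    # Round up: a partial call cannot pay back the remainder.
    return math.ceil(one_off_cost / saving)


if __name__ == "__main__":
    # Placeholder figures: replace them with numbers from your own dashboard.
    calls = break_even_calls(frontier_cost=0.02, local_cost=0.0002, one_off_cost=500.0)
    print(f"Break-even after {calls} calls")
```

Dividing the result by your daily call volume gives the payback time in days, which is usually the number finance wants to see.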
The full lifecycle
A typical Sagewai project moves through four stages. You start at the top; each row builds on the last.
| Stage | Model | Why |
|---|---|---|
| Ship | Claude Opus / GPT-5 | Known quality, deadline met |
| Show the cost | Same model + Observatory | Visible spend numbers |
| Cut the cost | Curator → fine-tune → Ollama | ~80% bill drop |
| Vendor independence | Local model on your infra | No provider lock-in |
On the application side, each model change is a one-line swap of the model value in your Agent config. The rest of the application code does not change.
Anti-patterns
- Optimising before you have a product. The cheapest model is the one that ships your feature on time. That is almost never the smallest open model in week one.
- Skipping the Curator. Training data has to exist before you can fine-tune. Capturing retroactively from logs is a week of pain; capturing prospectively via the Curator is a one-line wire-up.
- Treating the cost reduction as a 10-week project. With Sagewai, the same engineer can run an Unsloth fine-tune on a free Colab T4 over a weekend. See Free CUDA via Colab.
Next steps
- Run the agent above against your real task and confirm the output quality.
- Wire the Curator from day one — it costs nothing and it makes the cost reduction possible.
- Open the Observatory cost dashboard so you have spend numbers ready.
- Read Free CUDA via Colab before you start the cost reduction, so you already know how the fine-tune works.
- When Colab's 16 GB cap or 12-hour session limit bites, move to Rent when you grow.
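For the first step in the list above, one way to confirm output quality is to score the agent's answers against a small hand-labelled sample. This is a minimal sketch under stated assumptions: error_rate is a hypothetical helper, the comparison is a case-insensitive exact match on the category name, and you collect the predictions yourself by running the agent over your sample.

```python
def error_rate(predictions: list[str], labels: list[str]) -> float:
    """Fraction of predictions that differ from the hand-assigned label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("need at least one labelled example")
    wrong = sum(
        pred.strip().lower() != label.strip().lower()
        for pred, label in zip(predictions, labels)
    )
    return wrong / len(labels)


if __name__ == "__main__":
    # Made-up values, for illustration only.
    predictions = ["refund", "technical", "other", "account"]
    labels = ["refund", "technical", "account", "account"]
    print(f"Misclassified: {error_rate(predictions, labels):.0%}")
```

Running the same check again after each model swap gives you a like-for-like quality number to set against the cost figures.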