Training Loop

Bootstrap on a frontier model (Opus, GPT-5). Capture every successful agent answer through the Curator. Fine-tune a small open-weight model on commodity GPUs. Deploy via Ollama or mlx_lm.server. Subsequent traffic runs at near-zero per-token cost.

End-to-end, the loop is achievable in a weekend for under $5 of compute. Once your fine-tuned model handles the bulk of traffic, the frontier provider becomes a fallback for the long tail — and your AI bill stops being a function of your usage.

What you can do with it

  • Capture every answer automatically. The Curator instruments agent runs and writes successful responses to JSONL at ~/.sagewai/training/. No manual capture code.
  • Promote training-grade samples. Filter, dedupe, and promote the captured corpus into a dataset ready for fine-tuning.
  • Trigger fine-tunes on a threshold. When the dataset crosses a configured size, Sagewai kicks off a fine-tune job — manually or via the autopilot loop.
  • Fine-tune small models on cheap GPUs. Unsloth integration runs real LoRA fine-tunes of 3-7B base models on a free Colab T4 or a $0.34/hr RunPod A10G.
  • Deploy locally. Ship the LoRA via Ollama on your laptop, mlx_lm.server on Apple Silicon, or any OpenAI-compatible endpoint.
  • Measure cost-down end-to-end. Each fine-tune example reports $/call before and after, against the cloud baseline.

See it in action

Primary example — the loop end-to-end

Train your own model walks the full capture → fine-tune → deploy arc. Real numbers, real LoRA, real cost-down.

Deploy half — inference deployment

Inference deployment covers the deploy half in detail: five GPU-provisioning tiers, two local-deploy paths, one SDK surface.

Pattern + foundation examples

Where to go to ship it

See also

  • Autopilot — the loop closes automatically when Autopilot triggers a fine-tune as the corpus crosses the threshold.
  • Observatory — the cost-down measurement surface.
  • All products — the other components.