Training Loop
Bootstrap on a frontier model (Opus, GPT-5). Capture every successful agent answer through the Curator. Fine-tune a small open-weight model on commodity GPUs. Deploy via Ollama or mlx_lm.server. Subsequent traffic runs at near-zero per-token cost.
End-to-end, the loop is achievable in a weekend for under $5 of compute. Once your fine-tuned model handles the bulk of traffic, the frontier provider becomes a fallback for the long tail — and your AI bill stops being a function of your usage.
What you can do with it
- Capture every answer automatically. The Curator instruments agent runs and writes successful responses to JSONL at
~/.sagewai/training/. No manual capture code. - Promote training-grade samples. Filter, dedupe, and promote the captured corpus into a dataset ready for fine-tuning.
- Trigger fine-tunes on a threshold. When the dataset crosses a configured size, Sagewai kicks off a fine-tune job — manually or via the autopilot loop.
- Fine-tune small models on cheap GPUs. Unsloth integration runs real LoRA fine-tunes of 3-7B base models on a free Colab T4 or a $0.34/hr RunPod A10G.
- Deploy locally. Ship the LoRA via Ollama on your laptop,
mlx_lm.serveron Apple Silicon, or any OpenAI-compatible endpoint. - Measure cost-down end-to-end. Each fine-tune example reports $/call before and after, against the cloud baseline.
See it in action
Primary example — the loop end-to-end
Train your own model walks the full capture → fine-tune → deploy arc. Real numbers, real LoRA, real cost-down.
Deploy half — inference deployment
Inference deployment covers the deploy half in detail: five GPU-provisioning tiers, two local-deploy paths, one SDK surface.
Pattern + foundation examples
- Example 25 —
training_data_pipeline— the Curator capture surface. - Example 36 —
autopilot_training_loop— the autopilot closes the loop automatically. - Example 38 —
unsloth_finetune— real Unsloth LoRA fine-tune. - Example 38a —
mlx_lm_server_deploy— Apple Silicon deploy. - Example 18 —
local_llm_routing— Ollama / LM Studio swap.
Where to go to ship it
- Self-learning agents — the SDK concept page for the loop.
- Training & fine-tuning guide — operator-level guide to running a fine-tune.
- Inference overview — the five-tier comparison.
- Inference — start with juggernauts — why a frontier model is the right starting point.
- Inference — free CUDA via Colab — the cheapest fine-tune path.
- Inference — rent when you grow — RunPod, Vast.ai, Modal — when to pick which.
- Inference — deploy locally — Ollama and LiteLLM for the cost-down deploy.
See also
- Autopilot — the loop closes automatically when Autopilot triggers a fine-tune as the corpus crosses the threshold.
- Observatory — the cost-down measurement surface.
- All products — the other components.