Training Loop

Bootstrap on a frontier model (Opus, GPT-5). Capture every successful agent answer through the Curator. Fine-tune a small open-weight model on commodity GPUs. Deploy via Ollama or mlx_lm.server. Subsequent traffic runs at near-zero per-token cost.

End-to-end, the loop is achievable in a weekend for under $5 of compute. Once your fine-tuned model handles the bulk of traffic, the frontier provider becomes a fallback for the long tail — and your AI bill stops being a function of your usage.

What you can do with it

Capture every answer automatically. The Curator instruments agent runs and writes successful responses to JSONL at ~/.sagewai/training/. No manual capture code.
Promote training-grade samples. Filter, dedupe, and promote the captured corpus into a dataset ready for fine-tuning.
Trigger fine-tunes on a threshold. When the dataset crosses a configured size, Sagewai kicks off a fine-tune job — manually or via the autopilot loop.
Fine-tune small models on cheap GPUs. Unsloth integration runs real LoRA fine-tunes of 3-7B base models on a free Colab T4 or a $0.34/hr RunPod A10G.
Deploy locally. Ship the LoRA via Ollama on your laptop, mlx_lm.server on Apple Silicon, or any OpenAI-compatible endpoint.
Measure cost-down end-to-end. Each fine-tune example reports $/call before and after, against the cloud baseline.

See it in action

Primary example — the loop end-to-end

Train your own model walks the full capture → fine-tune → deploy arc. Real numbers, real LoRA, real cost-down.

Deploy half — inference deployment

Inference deployment covers the deploy half in detail: five GPU-provisioning tiers, two local-deploy paths, one SDK surface.

Pattern + foundation examples

Example 25 — training_data_pipeline — the Curator capture surface.
Example 36 — autopilot_training_loop — the autopilot closes the loop automatically.
Example 38 — unsloth_finetune — real Unsloth LoRA fine-tune.
Example 38a — mlx_lm_server_deploy — Apple Silicon deploy.
Example 18 — local_llm_routing — Ollama / LM Studio swap.

Where to go to ship it

Self-learning agents — the SDK concept page for the loop.
Training & fine-tuning guide — operator-level guide to running a fine-tune.
Inference overview — the five-tier comparison.
Inference — start with juggernauts — why a frontier model is the right starting point.
Inference — free CUDA via Colab — the cheapest fine-tune path.
Inference — rent when you grow — RunPod, Vast.ai, Modal — when to pick which.
Inference — deploy locally — Ollama and LiteLLM for the cost-down deploy.