Train your own model — the $5 weekend training loop

"Anthropic raised prices 10x. We were fine — we already had our own model."

This is the company's value proposition on one page. Bootstrap with Opus or GPT-5 in Q1. Capture every answer through the Curator. Fine-tune a 3B base model on the captured dataset using Unsloth on a free Colab T4 or a $0.34/hr RunPod A10G. Deploy the resulting LoRA via Ollama or mlx-lm. Serve real traffic at zero per-token cost.

End to end, the loop costs under $5 and a weekend. By Q3 your CFO has a credible answer to "why did the API bill quadruple?": it didn't.
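The sub-$5 figure is simple arithmetic over the GPU tiers listed later on this page; a minimal sketch, where the one-hour training estimate comes from the wall-clock note below and the rates from Examples 44 and 47:

```python
# Back-of-the-envelope compute budget for the weekend fine-tune.
tiers = {
    "colab_t4": 0.00,       # free Colab Tesla T4 (Example 44)
    "runpod_a10g": 0.34,    # $/hr RunPod A10G (Example 47)
}
finetune_hours = 1.0        # 30-60 minutes of training plus setup headroom

for name, dollars_per_hour in tiers.items():
    print(f"{name:12s}  ${dollars_per_hour * finetune_hours:.2f}")  # both land well under $5
```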

What this proves

Four invariants the target reader (the audience pin) needs before they trust this in front of their actual customers:

  1. The capture step is invisible. The Curator wraps every agent call so you don't have to instrument anything; a minimal capture sketch follows this list. Examples 25 and 36 ship the capture loop.
  2. The fine-tune step is honest. Example 38 runs a real Unsloth fine-tune of Qwen2.5-3B-Instruct on a real dataset, prints real loss curves, and deploys the resulting LoRA. No synthetic numbers.
  3. The deploy step is portable. mlx-lm on Apple Silicon (Example 38a), Ollama everywhere else. Same captured dataset, two backends, one swap.
  4. The cost-down is forecastable. Each inference-tier example prints a $/call comparison against the cloud baseline. You point at it and the CFO line falls out.
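A minimal sketch of what invariant 1 looks like in practice. The real capture surface ships in Examples 25 and 36; the decorator, the record fields, and the captured.jsonl filename below are illustrative assumptions, not the Curator's actual API:

```python
# Hypothetical stand-in for the Curator wrapper: every agent call is
# logged as a prompt/completion pair in JSONL under ~/.sagewai/training/.
import functools
import json
from datetime import datetime, timezone
from pathlib import Path

CAPTURE_DIR = Path.home() / ".sagewai" / "training"   # the path named in Example 25

def capture(agent_fn):
    """Wrap an agent call so its input/output lands in the training JSONL."""
    @functools.wraps(agent_fn)
    def wrapper(prompt: str, **kwargs):
        answer = agent_fn(prompt, **kwargs)            # the cloud model answers as usual
        CAPTURE_DIR.mkdir(parents=True, exist_ok=True)
        record = {                                     # field names are assumptions
            "ts": datetime.now(timezone.utc).isoformat(),
            "prompt": prompt,
            "completion": answer,
        }
        with open(CAPTURE_DIR / "captured.jsonl", "a") as f:
            f.write(json.dumps(record) + "\n")
        return answer
    return wrapper

@capture
def triage_agent(prompt: str) -> str:
    # In Q1 this calls Opus/GPT-5; later it points at the local LoRA instead.
    return "P1: billing outage, route to on-call"      # placeholder response
```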

Architecture

(Diagram: Curator captures → JSONL accumulates at ~/.sagewai/training/ → Unsloth fine-tunes → LoRA deploys via Ollama or mlx-lm.)

Run it

The training loop is composed of shipped examples, each runnable on its own. Read them in this order:

  1. Example 25 — training data pipeline — the Curator capture surface. Run a few agent calls, see the JSONL accumulate at ~/.sagewai/training/.
  2. Example 36 — autopilot training loop — the loop closes: capture triggers a FineTuneJob when the dataset crosses the threshold.
  3. Pick a GPU tier:
     - Example 44: free Colab Tesla T4 via Drive-sync ($0)
     - Example 45: Vast.ai marketplace bid, the bid-cheapest aggregator with reliability scoring
     - Example 47: RunPod reliable rental (the $0.34/hr A10G)
  4. Example 38 — real Unsloth fine-tune — runs the actual fine-tune on the chosen tier, prints loss curves, saves the LoRA. A minimal sketch of this step follows below.
  5. Deploy:
     - Example 38a: mlx_lm.server on Apple Silicon
     - Ollama everywhere else
     - Example 48: Modal per-second serverless inference (or Example 46 to bring your own endpoint)

Total for a weekend developer: ~2 hours of dev time, ~30-60 minutes of fine-tuning wall-clock, and $0-5 in compute depending on the tier.
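Example 38 is the authoritative script; the sketch below only shows the standard Unsloth LoRA recipe it follows. Argument names track recent Unsloth and trl releases and drift between versions, and the dataset path, hyperparameters, and output directory are assumptions:

```python
# Minimal Unsloth LoRA fine-tune of Qwen2.5-3B-Instruct on the captured JSONL.
# Fits on a free Colab T4 or a RunPod A10G; see Example 38 for the real script.
from pathlib import Path

from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-3B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,                  # 4-bit base weights keep this inside T4 VRAM
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Captured prompt/completion pairs, pre-rendered into a single "text" field.
data_path = str(Path.home() / ".sagewai" / "training" / "captured.jsonl")
dataset = load_dataset("json", data_files=data_path, split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",          # moved into SFTConfig in newer trl versions
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        logging_steps=10,               # the loss curve the CTO wants to see
        output_dir="outputs",
    ),
)
trainer.train()
model.save_pretrained("lora_out")       # the LoRA adapter Ollama or mlx-lm will serve
```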

Real-world use cases

The pattern in this loop — Curator captures, JSONL accumulates, Unsloth fine-tunes, Ollama deploys — is what a senior engineer at a 50-500-person SaaS reaches for once they've shipped the AI feature in Q1 and the CFO is asking the Q3 question. Four domains:

1. SaaS support-triage cost-down

You've shipped Example 42's triage agent in Q1 on Claude Haiku. By Q3 you're triaging 12,000 emails a month at $0.0007 per call.

| Concern | How this pattern solves it |
| --- | --- |
| The CFO wants the API bill cut in half | Fine-tune Qwen2.5-3B-Instruct on the 12K captured triage decisions; deploy via Ollama on a $40/month VPS; per-call cost drops to zero |
| Quality must not regress on the P0/P1 cases | Curator records every triage; the fine-tune sees real production data, not synthetic; the soak harness in _soaks/directives_soak.py grades the candidate model before promotion |
| The CTO wants to see the receipts | Example 38 prints the loss curve, the eval dataset accuracy, and the $/call delta — paste it into the OKR doc |
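The regression row is the one that needs tooling rather than faith. A minimal sketch of the grading gate the soak harness implements; the real logic lives in _soaks/directives_soak.py, and the eval-set format, function names, and 98% threshold here are assumptions:

```python
# Grade a candidate LoRA against the captured eval set before promoting it.
# Promotion only happens if it holds the line on the P0/P1 cases.
import json
from pathlib import Path

EVAL_SET = Path.home() / ".sagewai" / "training" / "eval.jsonl"   # held-out triage decisions

def accuracy(model_fn, cases: list[dict]) -> float:
    """Fraction of eval cases where the model reproduces the recorded triage label."""
    hits = sum(1 for c in cases if model_fn(c["prompt"]).strip() == c["label"])
    return hits / len(cases)

def should_promote(candidate_fn, baseline_fn, min_ratio: float = 0.98) -> bool:
    cases = [json.loads(line) for line in EVAL_SET.read_text().splitlines() if line]
    p0p1 = [c for c in cases if c.get("priority") in ("P0", "P1")]   # the cases that must not regress
    cand, base = accuracy(candidate_fn, p0p1), accuracy(baseline_fn, p0p1)
    print(f"candidate={cand:.3f} baseline={base:.3f}")
    return cand >= base * min_ratio
```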

2. E-commerce product description generation

Your catalogue has 50K SKUs. You've been generating descriptions on GPT-4o at $0.0024 per SKU, which is $120/month and growing as the catalogue does.

| Concern | How this pattern solves it |
| --- | --- |
| Description quality must match the brand voice | Capture 1K human-edited descriptions, fine-tune Mistral-7B on them, the LoRA learns the voice |
| You want to add more categories without re-fine-tuning from scratch | The captured dataset stays in ~/.sagewai/training/; the next fine-tune trains on the merged corpus |
| Catalogue ingestion runs nightly and can't depend on a flaky third party | Ollama runs on the same machine as the ingestion job; no network call leaves the box |
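For the nightly-ingestion row, the job talks to the local Ollama daemon over its HTTP API on localhost, so nothing leaves the box. A sketch, assuming the fine-tuned Mistral-7B LoRA has been registered with Ollama under the illustrative name brand-voice-7b:

```python
# Generate a product description against the local Ollama endpoint.
# Ollama listens on localhost:11434 by default; nothing leaves the machine.
import requests

def describe_sku(sku: dict) -> str:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "brand-voice-7b",   # the fine-tuned LoRA, registered with Ollama (name is an assumption)
            "prompt": f"Write a product description for: {sku['title']} ({sku['attributes']})",
            "stream": False,             # return one JSON object instead of a token stream
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(describe_sku({"title": "Waxed canvas tote", "attributes": "16L, olive, brass hardware"}))
```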

3. Healthcare-compliant note summarisation

Your scribe app summarises clinical notes for primary-care physicians. HIPAA forbids sending PHI to OpenAI without a BAA — and the BAA cost is on top of the API spend.

| Concern | How this pattern solves it |
| --- | --- |
| Compliance forbids PHI leaving the boundary | The fine-tuned model runs on a HIPAA-eligible Modal endpoint or on-prem; PHI never touches a third-party LLM |
| The audit committee wants a model card | Example 38 emits the training-set hash, eval-set accuracy, and the LoRA SHA; that's the model card |
| You want to upgrade the base model when a new one ships | The Curator dataset is base-model-agnostic; re-run Example 38 against Llama-3.2-3B instead of Qwen2.5-3B and compare |
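The model-card row is mostly bookkeeping once the fine-tune finishes. A sketch of emitting the three artefacts named above; the file paths, field names, and JSON layout are assumptions, and Example 38's real output may differ:

```python
# Emit a minimal model card: training-set hash, eval accuracy, LoRA SHA.
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def write_model_card(train_file: Path, lora_file: Path, eval_accuracy: float, out: Path) -> None:
    card = {
        "base_model": "Qwen2.5-3B-Instruct",           # swap for Llama-3.2-3B on the next run
        "training_set_sha256": sha256_of(train_file),  # ties the card to the exact captured dataset
        "eval_accuracy": eval_accuracy,
        "lora_sha256": sha256_of(lora_file),
        "phi_left_boundary": False,                    # the whole point of the on-prem deploy
    }
    out.write_text(json.dumps(card, indent=2))

# Hypothetical paths; Example 38's actual layout may differ.
write_model_card(
    Path.home() / ".sagewai" / "training" / "captured.jsonl",
    Path("lora_out/adapter_model.safetensors"),
    eval_accuracy=0.0,   # filled in from the soak-harness grading run
    out=Path("lora_out/model_card.json"),
)
```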

4. Internal knowledge-base Q&A on engineering wikis

Your platform team is on the hook for "why is X failing?" questions across 200 services and a 5K-page Confluence. You've been pointing GPT-4o at it via RAG and the bill is $300/month.

| Concern | How this pattern solves it |
| --- | --- |
| The cost is unjustifiable for an internal tool | Fine-tune a 3B model on captured Q&A pairs; deploy on a $40/month VPS; the cost line vanishes |
| The team writes new runbooks every week | Curator captures Q&A from the live tool; the next fine-tune trains on the latest corpus |
| You want self-hosted to avoid vendor risk on internal data | Same as healthcare — Ollama on-prem, no PHI/IP leaves the boundary |
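The cost row is back-of-the-envelope arithmetic; a sketch of the $/call comparison the inference-tier examples print, using the $300/month RAG bill and $40/month VPS quoted above. The 10,000 questions/month volume is an assumption:

```python
# Compare cloud $/call against the fixed cost of self-hosting the fine-tuned 3B model.
def cost_comparison(calls_per_month: int, cloud_bill: float, vps_bill: float) -> None:
    cloud_per_call = cloud_bill / calls_per_month
    local_per_call = vps_bill / calls_per_month   # fixed VPS cost amortised over the same traffic
    print(f"cloud : ${cloud_bill:7.2f}/mo  (${cloud_per_call:.5f}/call)")
    print(f"local : ${vps_bill:7.2f}/mo  (${local_per_call:.5f}/call)")
    print(f"saving: ${cloud_bill - vps_bill:7.2f}/mo")

# Figures from the wiki-Q&A case above; the question volume is an assumption.
cost_comparison(calls_per_month=10_000, cloud_bill=300.0, vps_bill=40.0)
```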

Companion examples

| # | Example | What it adds |
| --- | --- | --- |
| 25 | training_data_pipeline | The Curator capture surface |
| 36 | autopilot_training_loop | The loop closes — capture triggers FineTuneJob |
| 38 | unsloth_finetune | Real Unsloth fine-tune, real numbers |
| 38a | mlx_lm_server_deploy | Apple Silicon deploy via mlx_lm.server |
| 44 | colab_free_cuda | Free Tesla T4 via Drive-sync |
| 45 | vastai_marketplace_bid | Bid-cheapest aggregator with reliability scoring |
| 46 | custom_inference_as_tool | Bring-your-own endpoint |
| 47 | runpod_finetune_orchestration | RunPod reliable rental |
| 48 | modal_serverless_inference | Per-second serverless inference |