Moderation and classification — ML and LLM, both first-class

"Three HuggingFace classifiers carry the deterministic half. A cheap LLM judges the boundary cases. Sandbox keeps the workload boundaries clean."

This page is the cleanest single proof of three Sagewai claims at once: ML and LLM are both first-class (three Transformers classifiers run in a sealed container, not just LLM calls), sealed containers protect production workloads (the ML and the agent run with scoped credentials and no cross-tenant leak), and cheap LLMs hold their own (the judge is Haiku, gpt-4o-mini, or local Ollama, not Opus).

The pattern: pre-trained classifiers carry the deterministic half of the decision; an LLM agent reasons over their structured output and resolves the boundary cases. Audit trail per call. Local + rented modes.
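A minimal sketch of that loop, assuming HuggingFace transformers pipelines for the deterministic half and a prompt handed to the judge model for the LLM half — the classifier names and the prompt wording here are placeholders, not the example's pinned configuration:

```python
from transformers import pipeline

# Placeholder classifier choices -- the real example pins its own three models (<500MB total).
CLASSIFIERS = {
    "toxicity": pipeline("text-classification", model="unitary/toxic-bert"),
    "offense":  pipeline("text-classification",
                         model="cardiffnlp/twitter-roberta-base-offensive"),
    "hate":     pipeline("text-classification",
                         model="facebook/roberta-hate-speech-dynabench-r4-target"),
}

def classify(post: str) -> dict:
    """Deterministic half: every classifier votes; scores are kept per tool."""
    return {name: clf(post)[0] for name, clf in CLASSIFIERS.items()}

def judge_prompt(post: str, verdicts: dict) -> str:
    """LLM half: the judge reasons over structured verdicts, not raw text generation.
    In the example this prompt is dispatched to Haiku, gpt-4o-mini, or local Ollama."""
    lines = [f"- {name}: {v['label']} ({v['score']:.2f})" for name, v in verdicts.items()]
    return (
        "Classifier verdicts for the post below:\n" + "\n".join(lines)
        + f"\n\nPost: {post!r}\n"
        + "Decide allow / flag / remove. Override the ensemble only with a stated reason."
    )

post = "oh great, another 'helpful' update that breaks everything"
print(judge_prompt(post, classify(post)))
```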

What this proves

Four invariants the target reader needs to see before trusting this in front of real user-generated content:

  1. The classifiers run on commodity hardware. Three HuggingFace transformers (under 500MB total) run inside a sealed sandbox-ml container on laptop CPU for development, on Vast.ai when production load arrives.
  2. The LLM is cheap. The judge model is Haiku, gpt-4o-mini, or local Ollama — it's evaluating structured classifier output, not generating from scratch. Cost per moderated post is in the sub-cent range.
  3. The audit trail is per-classifier. Every flagged post records each classifier's verdict, the per-tool latency, the per-tool cost, and the LLM's final reasoning (a record sketch follows this list). The community team sees why, not just that.
  4. Context-sensitivity is demonstrated. The example includes test cases where the classifiers vote one way but the LLM disagrees with reasoning — sarcasm, reclaimed language, in-group/out-group framing.
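For concreteness, a per-post audit record of the kind items 3 and 4 describe could be shaped like this — the field names are an assumption for illustration, not the example's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class ToolVerdict:
    label: str          # e.g. "toxic" / "not_offensive"
    score: float        # classifier confidence
    latency_ms: float   # per-tool latency
    cost_usd: float     # per-tool cost (0.0 on laptop CPU)

@dataclass
class ModerationAudit:
    post_id: str
    verdicts: dict[str, ToolVerdict] = field(default_factory=dict)  # keyed by classifier
    judge_model: str = ""            # e.g. "claude-haiku" or "ollama/llama3.2"
    judge_decision: str = ""         # allow / flag / remove
    judge_reasoning: str = ""        # the sentence the community team reads
    overrode_ensemble: bool = False  # True for sarcasm / reclaimed-language cases
```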

Architecture

[Architecture diagram: the three classifiers run inside the sealed sandbox-ml container, surfaced as MCP tools, with the final judgement dispatched to a cheap LLM]

Run it

On a clean machine (free path, laptop CPU)

pip install sagewai
python 49_community_moderation.py

The example pulls the three HuggingFace classifiers (one-time, ~500MB), runs them on a small set of representative posts on laptop CPU, and dispatches the final judgement to local Ollama. No paid spend.

Full live path (rented GPU + paid LLM)

export VASTAI_API_KEY=...
export ANTHROPIC_API_KEY=sk-ant-...
python 49_community_moderation.py --live

The classifiers spin up on a Vast.ai GPU pod (paired with Example 45's orchestration); the LLM judge stays on Haiku for the cost-conscious half of the demo.

Triage variant: support-ticket triage

The same pattern in a different shape — the classifier ensemble is replaced by a single LLM call doing tier+reason+draft. Strict JSON output, swap-proof across LLMs:

python 42_support_triage_agent.py
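The strict output is a single JSON object with exactly three keys — tier, reason, draft. A validator for it takes only a few lines; this sketch is illustrative, not Example 42's own code:

```python
import json

REQUIRED = {"tier", "reason", "draft"}
TIERS = {"P0", "P1", "P2", "P3"}

def parse_triage(raw: str) -> dict:
    """Reject anything that is not exactly the tier/reason/draft object."""
    obj = json.loads(raw)                      # must be valid JSON, no prose wrapper
    if set(obj) != REQUIRED:
        raise ValueError(f"unexpected keys: {set(obj) ^ REQUIRED}")
    if obj["tier"] not in TIERS:
        raise ValueError(f"unknown tier: {obj['tier']}")
    return obj

print(parse_triage('{"tier": "P2", "reason": "password reset, known answer", '
                   '"draft": "Hi -- you can reset your password at ..."}'))
```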

Real-world use cases

The pattern in this lighthouse — classifier ensemble inside a sealed boundary, surfaced as MCP tools, judged by a cheap LLM, audit per call — is what a senior engineer at a 50-500-person SaaS reaches for when they need a moderation or classification surface that's both defensible to compliance and affordable at scale. Five domains:

1. Community-moderation for a SaaS forum

You run the community surface for a developer-tools SaaS. 800 posts a day. Today a part-time moderator reads every one.

| Concern | How this pattern solves it |
| --- | --- |
| Posts must not leave the boundary (privacy, GDPR) | Three classifiers run in a sealed container; the LLM judge is local Ollama; the post text never reaches a third-party API |
| Moderators want to see why a post was flagged | The audit log records each classifier's verdict and the LLM's reasoning sentence; click any flag, see the full chain |
| Sarcasm and reclaimed language must not get auto-rejected | The LLM judge can override the classifier ensemble with reasoning; the test suite includes adversarial cases (a sample case is sketched after this table) |
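An adversarial case of the kind the last row mentions might be recorded like this — a hypothetical fixture shape, shown only to make the override behaviour concrete:

```python
# Hypothetical fixture: the ensemble leans "remove", the judge should allow with a reason.
SARCASM_CASE = {
    "post": "wow, this outage is my favourite feature so far, truly killing it",
    "ensemble_votes": {"toxicity": "toxic", "offense": "offensive", "hate": "not_hate"},
    "expected_decision": "allow",
    "expected_reasoning_mentions": ["sarcasm"],
}
```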

2. Customer-support triage (drop-in for tonight)

You have 200 tickets a day. Half are the same five questions. You're personally on the hook.

| Concern | How this pattern solves it |
| --- | --- |
| The CTO wants AI shipped this quarter; the CFO wants the cost capped | Example 42 ships a single LLM call returning strict JSON — tier, reason, draft. Soaked at 100% JSON validity across 150 calls on three local 7B models |
| Auto-responding to a P0 by accident is the worst outcome | The router never auto-responds to P0/P1; tier semantics are pinned in the system prompt |
| You want to start cheap and only escalate if quality slips | Run Ollama as primary; promote to Haiku only for the boundary cases the soak identifies (a routing sketch follows this table) |
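A routing sketch covering the last two rows — never auto-respond to P0/P1, pay for Haiku only on boundary cases. The triage() signature and the boundary test are assumptions for illustration:

```python
def needs_second_opinion(result: dict) -> bool:
    # Placeholder boundary test -- the real signal is the disagreement set the soak surfaces.
    return len(result["reason"].split()) < 3

def route(ticket: str, triage) -> str:
    """Assumes triage(ticket, model=...) returns the strict {tier, reason, draft} object."""
    result = triage(ticket, model="ollama/llama3.2:latest")   # cheap local primary
    if result["tier"] in ("P0", "P1"):
        return "escalate_to_human"                            # never auto-respond to P0/P1
    if needs_second_opinion(result):
        result = triage(ticket, model="claude-haiku")         # paid judge, boundary cases only
    return "auto_respond: " + result["draft"]
```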

3. Sales-lead qualification from a contact form

Your marketing site fires off 100-300 contact-form submissions a week, mostly junk plus a few real deals.

| Concern | How this pattern solves it |
| --- | --- |
| AEs spend the day clicking through 80% obvious junk | Re-label the tiers: P0 = "real deal, has budget"; P3 = "spam / wrong fit"; the router does the routing |
| You want to trial frontier models, then move to a cheaper one | Run Haiku week 1, GPT-4o-mini week 2, and compare the swap-proof agreement numbers — pick the cheaper model if agreement is 95%+ (see the sketch after this table) |
| Qualified leads need a response in under a minute | Sub-10s p50 on Haiku, sub-1s on local llama3.2 — the latency block is your SLA evidence |
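The agreement number in the middle row is just row-by-row tier matching over the same leads; a minimal sketch, with the two result lists assumed to come from the week-1 and week-2 runs:

```python
def tier_agreement(run_a: list[dict], run_b: list[dict]) -> float:
    """Fraction of leads where both models assigned the same tier."""
    assert len(run_a) == len(run_b), "compare the same leads"
    same = sum(a["tier"] == b["tier"] for a, b in zip(run_a, run_b))
    return same / len(run_a)

# e.g. keep the cheaper model if tier_agreement(haiku_results, mini_results) >= 0.95
```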

4. GitHub-issue triage on an OSS repo

You maintain an open-source project with 10-50 incoming issues a week. Most are duplicates, doc questions, or feature requests.

| Concern | How this pattern solves it |
| --- | --- |
| Maintainer time gets eaten by re-asking for repro on every "doesn't work" issue | The auto-respond pile asks for repro politely in your voice; you only see issues with repro already in hand |
| Keep the human in the loop on judgement calls | P0/P1 always escalate; the agent does the boring 80% |
| Don't pay anything for OSS tooling | The Ollama path is $0/month |

5. Internal IT helpdesk

Your "submit a ticket" portal at HQ gets 30-60 tickets a day asking for password resets, software access, and "it's slow."

| Concern | How this pattern solves it |
| --- | --- |
| L1 work is mostly "click reset password" | Auto-respond drains the password-reset and software-access pile with grounded drafts; L1 reviews and clicks send |
| Compliance forbids employee data going to a third-party LLM | Pin --primary ollama/llama3.2:latest; data never leaves the machine |
| You need an audit trail of every triage decision | Every triage has a reason string; log it next to the email ID and the tier |

Companion examples

| # | Example | What it adds |
| --- | --- | --- |
| 49 | community_moderation | Three classifiers + LLM judge, sealed sandbox |
| 42 | support_triage_agent | Single-LLM triage with strict JSON, cost forecast, swap-proof |
| 06 | guardrails | Foundation — safety filters before exposing an agent to users |
| 08 | directives | Foundation — directive library, the harness-over-any-LLM moat |
  • Primary pillar: SDK — UniversalAgent, tools, MCP, directives; the surface this lighthouse exercises.
  • Sibling lighthouse: Production multitenancy — the Sealed-spine story Example 49 leans on.
  • Sibling lighthouse: Inference deployment — the bring-your-own-endpoint pattern Example 49 uses for the GPU half.
  • Prerequisite foundation: Example 06 — guardrails and Example 08 — directives.
  • Soak data: the directive-library soak in _soaks/directives_soak.py reports JSON validity and swap-agreement across LLMs — paste it into the model-selection slide.