Safety & Guardrails
Sagewai runs guardrails automatically on every agent call: input guardrails check incoming messages, output guardrails check LLM responses, budget enforcement caps per-request and per-project spend, and audit logging records every action for compliance. This page shows how to configure each layer and how to write custom guardrails.
Prerequisites: Agents · Next: Self-Learning · Context Engine
How Guardrails Work
When a check fails, the guardrail applies one of three core actions (some guardrails, such as PIIGuard, support additional actions):
- block: Reject the message and raise GuardrailViolationError
- warn: Log the violation but let the message through
- escalate: Emit a GUARDRAIL_ESCALATION event and let the message through
Guardrails apply to all entry points: chat(), chat_with_history(), and chat_stream().
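To make the three core actions concrete, here is a minimal, self-contained sketch of how a runner might act on a failed check. The exception name and the GUARDRAIL_ESCALATION event name come from this page; the `Violation` dataclass, the `apply_action` function, and the `emit` callback are hypothetical stand-ins, not Sagewai internals.

```python
import logging
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger("guardrail_actions_sketch")


class GuardrailViolationError(Exception):
    """Local stand-in for Sagewai's GuardrailViolationError."""


@dataclass
class Violation:
    """Hypothetical record of a failed guardrail check."""
    guardrail: str
    message: str
    action: str  # "block", "warn", or "escalate"


def apply_action(violation: Violation, text: str, emit: Callable[[str, dict], None]) -> str:
    """Return the text to pass on, or raise if the action is "block"."""
    if violation.action == "block":
        # block: reject the message outright
        raise GuardrailViolationError(f"{violation.guardrail}: {violation.message}")
    if violation.action == "warn":
        # warn: log the violation, pass the message through unchanged
        logger.warning("%s: %s", violation.guardrail, violation.message)
        return text
    if violation.action == "escalate":
        # escalate: emit an event for a human or another system, pass the message through
        emit("GUARDRAIL_ESCALATION", {"guardrail": violation.guardrail, "message": violation.message})
        return text
    raise ValueError(f"unknown action: {violation.action}")
```

Only block changes control flow; warn and escalate differ in where the violation is reported, not in whether the message continues.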
PIIGuard
PIIGuard detects personally identifiable information in agent inputs and outputs and takes the action you configure.
from sagewai import UniversalAgent, PIIGuard
from sagewai.safety.pii import PIIEntityType

agent = UniversalAgent(
    name="safe-agent",
    model="gpt-4o",
    guardrails=[
        PIIGuard(
            action="redact",
            entity_types=[
                PIIEntityType.EMAIL,
                PIIEntityType.PHONE,
                PIIEntityType.SSN,
                PIIEntityType.CREDIT_CARD,
            ],
        ),
    ],
)
Supported Entity Types
| Entity Type | Example | Redaction Label |
|---|---|---|
| EMAIL | user@example.com | [REDACTED_EMAIL] |
| PHONE | (555) 123-4567 | [REDACTED_PHONE] |
| SSN | 123-45-6789 | [REDACTED_SSN] |
| CREDIT_CARD | 4111 1111 1111 1111 | [REDACTED_CARD] |
| IBAN | DE89370400440532013000 | [REDACTED_IBAN] |
| IP_ADDRESS | 192.168.1.1 | [REDACTED_IP] |
| PASSPORT | AB1234567 | [REDACTED_PASSPORT] |
Actions
| Action | Behavior |
|---|---|
| block | Raise GuardrailViolationError |
| redact | Replace PII with labels, then pass through |
| warn | Log violation, allow through |
| escalate | Emit event, allow through |
| log_only | Log without triggering any action |
Standalone Usage
You can use PIIGuard outside of an agent when you need direct detection or redaction:
guard = PIIGuard(action="redact")
findings = guard.detect("Contact me at john@example.com or 555-123-4567")
# [(PIIEntityType.EMAIL, "john@example.com"), (PIIEntityType.PHONE, "555-123-4567")]
clean_text = guard.redact("Contact me at john@example.com or 555-123-4567")
# "Contact me at [REDACTED_EMAIL] or [REDACTED_PHONE]"
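This page does not describe how PIIGuard detects entities. As a rough illustration of what `detect` and `redact` do, here is a stdlib-only sketch that uses simple regular expressions for two entity types and the redaction labels from the table above. The patterns are deliberately simplistic assumptions, not the ones PIIGuard uses; real PII detection needs far more robust matching.

```python
import re

# Hypothetical, simplified patterns for illustration only
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\(?\d{3}\)?[ -]?\d{3}-\d{4}"),
}
LABELS = {"EMAIL": "[REDACTED_EMAIL]", "PHONE": "[REDACTED_PHONE]"}


def detect(text: str) -> list[tuple[str, str]]:
    """Return (entity_type, matched_text) pairs in order of appearance."""
    findings = []
    for entity, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            findings.append((m.start(), entity, m.group()))
    # Sort by position so results read left to right
    return [(entity, value) for _, entity, value in sorted(findings)]


def redact(text: str) -> str:
    """Replace every detected span with its redaction label."""
    for entity, pattern in PATTERNS.items():
        text = pattern.sub(LABELS[entity], text)
    return text
```

Running both functions on the sample sentence above yields the same shapes as the PIIGuard example, with plain strings in place of PIIEntityType members.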
HallucinationGuard
HallucinationGuard checks whether the LLM's output is grounded in the RAG context that was retrieved for the request. It uses keyword overlap scoring to compute a grounding score from 0.0 to 1.0.
from sagewai import UniversalAgent, HallucinationGuard

agent = UniversalAgent(
    name="grounded-agent",
    model="gpt-4o",
    memory=rag_engine,  # a RAG engine you have already configured (see Context Engine)
    guardrails=[
        HallucinationGuard(
            threshold=0.3,
            action="warn",
        ),
    ],
)
How It Works
- The agent generates a response using retrieved RAG context.
- HallucinationGuard compares the response against that context.
- It computes a grounding score based on keyword overlap (0.0–1.0).
- If the score falls below the threshold, the configured action triggers.
The guard only activates when RAG context is present. Without it, the check always passes.
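The page says only that the score is based on keyword overlap. One plausible way to compute such a score (an assumption, not taken from Sagewai's source) is the fraction of the response's keywords that also appear in the retrieved context. The tokenizer and stop-word list below are illustrative choices.

```python
import re
from typing import Optional

# Small illustrative stop-word list; a real implementation would use a fuller one
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "or", "to", "in", "on", "it", "for"}


def keywords(text: str) -> set[str]:
    """Lowercase alphanumeric tokens, minus stop words."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS}


def grounding_score(response: str, context: str) -> float:
    """Fraction of response keywords found in the context, from 0.0 to 1.0."""
    resp = keywords(response)
    if not resp:
        return 1.0  # nothing to check; treat an empty response as grounded
    return len(resp & keywords(context)) / len(resp)


def is_grounded(response: str, context: Optional[str], threshold: float = 0.3) -> bool:
    if not context:
        # Mirrors the documented behavior: no RAG context means the check passes
        return True
    return grounding_score(response, context) >= threshold
```

With the context "The Eiffel Tower is located in Paris and opened in 1889.", the response "The Eiffel Tower is made of cheese." shares two of its four keywords and scores 0.5, so it passes at 0.3 despite being false. This is why overlap scoring is a heuristic and why stricter thresholds trade false negatives for flagged paraphrases.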
Threshold Tuning
| Range | Behavior |
|---|---|
| 0.1–0.2 | Very permissive. Only flags near-zero overlap |
| 0.3–0.5 | Balanced. Good default for most applications |
| 0.6–0.8 | Strict. May flag valid paraphrases |
| 0.9+ | Very strict. Only accepts near-verbatim responses |
ContentFilter
ContentFilter blocks messages containing specific words or regex patterns.
from sagewai import UniversalAgent, ContentFilter

agent = UniversalAgent(
    name="filtered-agent",
    model="gpt-4o",
    guardrails=[
        ContentFilter(
            blocklist=["password", "secret", "confidential"],
            patterns=[r"\d{3}-\d{2}-\d{4}"],  # SSN pattern
            action="block",
        ),
    ],
)
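The matching rules behind ContentFilter are not specified beyond "words or regex patterns". The sketch below shows one reasonable interpretation, assuming case-insensitive substring matching for blocklist entries and `re.search` for patterns; the `first_match` function is hypothetical.

```python
import re
from typing import Iterable, Optional


def first_match(message: str, blocklist: Iterable[str], patterns: Iterable[str]) -> Optional[str]:
    """Return a description of the first blocklist word or pattern found, else None."""
    lowered = message.lower()
    for word in blocklist:
        # Assumed: case-insensitive substring match
        if word.lower() in lowered:
            return f"blocked word: {word}"
    for pattern in patterns:
        # re.search finds the pattern anywhere in the message
        if re.search(pattern, message):
            return f"matched pattern: {pattern}"
    return None
```

Using the blocklist and SSN pattern from the example above, "What is the admin PASSWORD?" is caught by the word list and "My SSN is 123-45-6789" by the pattern.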
TokenBudgetGuard
TokenBudgetGuard tracks estimated token costs during the agent loop and blocks further processing when the per-request budget is exceeded.
from sagewai import UniversalAgent, TokenBudgetGuard

agent = UniversalAgent(
    name="budget-agent",
    model="gpt-4o",
    guardrails=[
        TokenBudgetGuard(max_usd=1.0),
    ],
)
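The estimate behind a per-request budget is simple arithmetic: tokens times a per-token price, accumulated over every LLM call in the agent loop. The sketch below uses a hypothetical `RequestBudget` class and placeholder prices per million tokens; actual prices depend on the model and provider and change over time, and this is not Sagewai's implementation.

```python
from dataclasses import dataclass


@dataclass
class RequestBudget:
    max_usd: float
    # Placeholder prices in USD per 1M tokens; substitute your provider's current rates
    input_price_per_m: float = 2.50
    output_price_per_m: float = 10.00
    spent_usd: float = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add one LLM call's estimated cost and return the running total."""
        cost = (input_tokens * self.input_price_per_m
                + output_tokens * self.output_price_per_m) / 1_000_000
        self.spent_usd += cost
        return self.spent_usd

    @property
    def exceeded(self) -> bool:
        return self.spent_usd > self.max_usd
```

With these placeholder prices, a call using 200,000 input and 20,000 output tokens costs about $0.70, so a second call of the same size pushes a $1.00 budget over the limit.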
OutputSchemaGuard
OutputSchemaGuard validates that the agent's output matches a JSON schema before returning it to the caller.
from sagewai import UniversalAgent, OutputSchemaGuard

schema = {
    "type": "object",
    "required": ["title", "body"],
    "properties": {
        "title": {"type": "string"},
        "body": {"type": "string"},
    },
}

agent = UniversalAgent(
    name="structured-agent",
    model="gpt-4o",
    guardrails=[OutputSchemaGuard(schema=schema)],
)
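To see what passing and failing outputs look like against this schema, here is a stdlib-only sketch that checks just the two rules it uses, `required` and `"type": "string"`. It is a deliberately tiny subset of JSON Schema written for illustration, not a general validator and not how OutputSchemaGuard is implemented; `schema_errors` is a hypothetical helper.

```python
import json

schema = {
    "type": "object",
    "required": ["title", "body"],
    "properties": {
        "title": {"type": "string"},
        "body": {"type": "string"},
    },
}


def schema_errors(raw_output: str, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the output passes."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["expected a JSON object"]
    errors = [f"missing required field: {key}"
              for key in schema.get("required", []) if key not in data]
    for key, rule in schema.get("properties", {}).items():
        # Only the "string" type is handled in this sketch
        if key in data and rule.get("type") == "string" and not isinstance(data[key], str):
            errors.append(f"field {key} should be a string")
    return errors
```

For production use, a full JSON Schema validator covers nested objects, arrays, and the other keywords this sketch ignores.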
Budget Enforcement
In addition to the per-request TokenBudgetGuard, Sagewai enforces project-level budgets in BaseAgent._call_llm(). Every LLM call checks the budget and records spend automatically, so no extra code is needed.
Budget actions:
| Action | Effect |
|---|---|
| stop | Raises SagewaiBudgetExceededError, blocks the LLM call |
| throttle | Delays the LLM call with exponential backoff |
| warn | Logs a warning but allows the call |
Budget limits are configured per-project via the admin API or dashboard. When a budget is exceeded, the notification system can alert via email, Slack, or in-app notifications.
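from sagewai.errors import SagewaiBudgetExceededError

async def analyze(agent):
    try:
        response = await agent.chat("Analyze this large dataset")
    except SagewaiBudgetExceededError as e:
        # Raised when the project's budget action is "stop"
        print(f"Budget exceeded: {e}")
        return None
    return response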
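The three budget actions amount to a small decision made before each LLM call. The sketch below illustrates that decision together with an exponential backoff for throttle. The `check_budget` and `backoff_delay` functions, the `base_delay` and `max_delay` values, and the local exception class are hypothetical stand-ins; the page does not document how BaseAgent._call_llm() computes its delays.

```python
import asyncio
import logging


class SagewaiBudgetExceededError(Exception):
    """Local stand-in for sagewai.errors.SagewaiBudgetExceededError."""


def backoff_delay(attempt: int, base_delay: float = 1.0, max_delay: float = 30.0) -> float:
    """Exponential backoff: 1s, 2s, 4s, ... capped at max_delay (assumed values)."""
    return min(base_delay * 2 ** attempt, max_delay)


async def check_budget(spent: float, limit: float, action: str, attempt: int = 0) -> None:
    """Run before an LLM call; returns normally if the call may proceed."""
    if spent <= limit:
        return
    if action == "stop":
        # stop: refuse the call entirely
        raise SagewaiBudgetExceededError(f"spent ${spent:.2f} of ${limit:.2f}")
    if action == "throttle":
        # throttle: wait longer on each successive over-budget attempt, then proceed
        await asyncio.sleep(backoff_delay(attempt))
        return
    if action == "warn":
        # warn: record the overrun and proceed
        logging.getLogger("budget_sketch").warning("budget exceeded: $%.2f of $%.2f", spent, limit)
        return
    raise ValueError(f"unknown budget action: {action}")
```

The cap on the delay keeps a long-running over-budget loop from stalling indefinitely while still slowing spend.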
Audit Logging
AuditLogger records agent actions for compliance and debugging:
from sagewai.observability.audit import AuditLogger
logger = AuditLogger(backend="file", path="/var/log/sagewai/audit.jsonl")
# Attach to an agent via event listeners
agent.on_event(logger.handle_event)
The audit log records run starts and finishes, tool calls, guardrail violations, budget checks, and workflow state transitions. Logs can be archived to S3 or GCS using the archival backend.
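The .jsonl path in the example points to the JSON Lines format: one JSON object per line, appended as events arrive. The sketch below shows that format with a hypothetical `JsonlAuditWriter` whose `handle_event` takes an event name and a payload dict. The event objects Sagewai actually passes to on_event, and the fields AuditLogger writes, are not documented on this page, so the record shape here is an assumption.

```python
import json
import time
from pathlib import Path


class JsonlAuditWriter:
    """Append-only audit log, one JSON object per line (hypothetical record shape)."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def handle_event(self, event_type: str, payload: dict) -> None:
        record = {"ts": time.time(), "event": event_type, **payload}
        # Append mode never rewrites earlier entries, which suits compliance logs
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def read_all(self) -> list[dict]:
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]
```

Because each line is a complete JSON document, tools can stream or tail the file without parsing it as a whole.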
Combining Guardrails
Guardrails execute in the order you list them. Stack multiple guardrails to address different failure modes at each stage:
from sagewai import (
    UniversalAgent,
    PIIGuard,
    ContentFilter,
    HallucinationGuard,
    TokenBudgetGuard,
)
from sagewai.safety.pii import PIIEntityType

agent = UniversalAgent(
    name="production-agent",
    model="gpt-4o",
    guardrails=[
        PIIGuard(
            action="redact",
            entity_types=[
                PIIEntityType.EMAIL,
                PIIEntityType.SSN,
                PIIEntityType.CREDIT_CARD,
            ],
        ),
        ContentFilter(blocklist=["DROP TABLE", "DELETE FROM"]),
        HallucinationGuard(threshold=0.3, action="warn"),
        TokenBudgetGuard(max_usd=2.0),
    ],
)
Recommended order:
- PIIGuard first — redact sensitive data before it reaches the LLM
- ContentFilter — block injection attempts early
- HallucinationGuard — check output grounding before returning
- TokenBudgetGuard — prevent runaway costs
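The effect of ordering is easiest to see with two checks that overlap. The sketch below reuses the SSN pattern from the ContentFilter example: one toy check redacts SSNs, the other blocks any message containing one. The functions are hypothetical stand-ins for guardrail classes; `run_checks` passes each check's output to the next in list order and stops at the first block.

```python
import re
from typing import Callable, Optional

SSN = r"\d{3}-\d{2}-\d{4}"

# Each toy check returns (possibly modified text, block reason or None)
Check = Callable[[str], tuple[str, Optional[str]]]


def redact_ssn(text: str) -> tuple[str, Optional[str]]:
    # Redacting check: rewrites the text, never blocks
    return re.sub(SSN, "[REDACTED_SSN]", text), None


def filter_ssn(text: str) -> tuple[str, Optional[str]]:
    # Blocking check: rejects any text that still contains an SSN
    if re.search(SSN, text):
        return text, "SSN pattern matched"
    return text, None


def run_checks(text: str, checks: list[Check]) -> tuple[str, Optional[str]]:
    """Apply checks in order; stop at the first one that blocks."""
    for check in checks:
        text, reason = check(text)
        if reason is not None:
            return text, reason
    return text, None
```

With redaction first, "My SSN is 123-45-6789" passes as "My SSN is [REDACTED_SSN]"; with the filter first, the same message is blocked. Neither outcome is wrong in general; the list order decides which one you get.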
Custom Guardrails
Implement the Guardrail abstract class to write your own:
from sagewai import Guardrail, GuardrailResult

class ProfanityFilter(Guardrail):
    # Placeholder word list; replace with your own terms
    bad_words = {"badword1", "badword2"}

    async def check_input(self, message: str, context: dict) -> GuardrailResult:
        # Block the incoming message if it contains any listed word
        if any(word in message.lower() for word in self.bad_words):
            return GuardrailResult(
                passed=False,
                violation="Profanity detected",
                action="block",
            )
        return GuardrailResult(passed=True)

    async def check_output(self, response: str, context: dict) -> GuardrailResult:
        # This guardrail does not inspect LLM output
        return GuardrailResult(passed=True)
Both check_input and check_output must be async and return a GuardrailResult.
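Because check_input is an ordinary coroutine, you can unit-test a custom guardrail's logic without calling an LLM. To keep this sketch runnable without Sagewai installed, it uses a local `GuardrailResult` dataclass that mirrors only the three fields used in the ProfanityFilter example (passed, violation, action) and drops the Guardrail base class; with the real library you would import the actual classes instead. It also swaps the substring test for a word-boundary regex, and the word list is a placeholder.

```python
import asyncio
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class GuardrailResult:
    """Local stand-in mirroring only the fields used in the ProfanityFilter example."""
    passed: bool
    violation: Optional[str] = None
    action: Optional[str] = None


class ProfanityFilter:
    # Placeholder terms; replace with your own list
    bad_words = {"darn", "heck"}

    def __init__(self) -> None:
        # \b word boundaries avoid matching inside longer words
        alternatives = "|".join(re.escape(w) for w in sorted(self.bad_words))
        self._pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)

    async def check_input(self, message: str, context: dict) -> GuardrailResult:
        if self._pattern.search(message):
            return GuardrailResult(passed=False, violation="Profanity detected", action="block")
        return GuardrailResult(passed=True)

    async def check_output(self, response: str, context: dict) -> GuardrailResult:
        return GuardrailResult(passed=True)
```

With a plain substring test, "Check the heckelphone section" would be flagged twice over, since both "Check" and "heckelphone" contain "heck"; the word-boundary pattern lets it through while still catching "Oh heck, it broke".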
What's Next
- Agents — Where guardrails are applied in the agent lifecycle
- Workflows — Budget enforcement across multi-step workflows
- Context Engine — RAG context that powers the HallucinationGuard