Safety & Guardrails

Sagewai runs guardrails automatically on every agent call: input guardrails check incoming messages, output guardrails check LLM responses, budget enforcement caps per-request and per-project spend, and audit logging records every action for compliance. This page shows how to configure each layer and how to write custom guardrails.

Prerequisites: Agents · Next: Self-Learning · Context Engine

How Guardrails Work

Guardrails share three core actions (individual guards such as PIIGuard support additional ones, listed in their own sections):

  • block — Reject the message, raise GuardrailViolationError
  • warn — Log the violation but let the message through
  • escalate — Emit a GUARDRAIL_ESCALATION event, let the message through

Guardrails apply to all entry points: chat(), chat_with_history(), and chat_stream().
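
The three actions can be pictured with a small, self-contained model. This is only a sketch of the behavior described above, not Sagewai's internals: `Result`, `apply_action`, and the local `GuardrailViolationError` stand-in are hypothetical names defined here for illustration.

```python
from dataclasses import dataclass
from typing import Optional


class GuardrailViolationError(Exception):
    """Local stand-in for the error named above; not imported from Sagewai."""


@dataclass
class Result:
    passed: bool
    action: Optional[str] = None      # "block", "warn", or "escalate"
    violation: Optional[str] = None


def apply_action(message: str, result: Result, events: list) -> str:
    """Return the message if it may continue; raise if it is blocked."""
    if result.passed:
        return message
    if result.action == "block":
        raise GuardrailViolationError(result.violation)
    if result.action == "warn":
        events.append(("log", result.violation))
    elif result.action == "escalate":
        events.append(("GUARDRAIL_ESCALATION", result.violation))
    # warn and escalate both let the message through
    return message
```

Only `block` stops the message. `warn` and `escalate` differ in what they record, not in whether the message continues.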


PIIGuard

PIIGuard detects personally identifiable information in agent inputs and outputs and takes the action you configure.

from sagewai import UniversalAgent, PIIGuard
from sagewai.safety.pii import PIIEntityType

agent = UniversalAgent(
    name="safe-agent",
    model="gpt-4o",
    guardrails=[
        PIIGuard(
            action="redact",
            entity_types=[
                PIIEntityType.EMAIL,
                PIIEntityType.PHONE,
                PIIEntityType.SSN,
                PIIEntityType.CREDIT_CARD,
            ],
        ),
    ],
)

Supported Entity Types

Entity Type    Example                       Redaction Label
EMAIL          user@example.com              [REDACTED_EMAIL]
PHONE          (555) 123-4567                [REDACTED_PHONE]
SSN            123-45-6789                   [REDACTED_SSN]
CREDIT_CARD    4111 1111 1111 1111           [REDACTED_CARD]
IBAN           DE89370400440532013000        [REDACTED_IBAN]
IP_ADDRESS     192.168.1.1                   [REDACTED_IP]
PASSPORT       AB1234567                     [REDACTED_PASSPORT]

Actions

Action      Behavior
block       Raise GuardrailViolationError
redact      Replace PII with labels, then pass through
warn        Log violation, allow through
escalate    Emit event, allow through
log_only    Log without triggering any action

Standalone Usage

You can use PIIGuard outside of an agent when you need direct detection or redaction:

from sagewai import PIIGuard

guard = PIIGuard(action="redact")

findings = guard.detect("Contact me at john@example.com or 555-123-4567")
# [(PIIEntityType.EMAIL, "john@example.com"), (PIIEntityType.PHONE, "555-123-4567")]

clean_text = guard.redact("Contact me at john@example.com or 555-123-4567")
# "Contact me at [REDACTED_EMAIL] or [REDACTED_PHONE]"
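
To make the detect/redact behavior concrete, here is a toy regex-based version covering three of the entity types. It is a sketch under stated assumptions: the patterns and the `detect`/`redact` helpers below are illustrative and are not how PIIGuard is implemented. Real PII detection needs far broader patterns and validation.

```python
import re

# Illustrative patterns only (assumptions, not Sagewai's detection rules).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"(?:\(\d{3}\)\s*|\b\d{3}-)\d{3}-\d{4}\b"),
}


def detect(text: str) -> list:
    """Return (label, matched_text) pairs in order of appearance."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((match.start(), label, match.group()))
    findings.sort()
    return [(label, value) for _, label, value in findings]


def redact(text: str) -> str:
    """Replace every match with a [REDACTED_<LABEL>] placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text
```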

HallucinationGuard

HallucinationGuard checks whether the LLM's output is grounded in the RAG context that was retrieved for the request. It uses keyword overlap scoring to compute a grounding score from 0.0 to 1.0.

from sagewai import UniversalAgent, HallucinationGuard

agent = UniversalAgent(
    name="grounded-agent",
    model="gpt-4o",
    memory=rag_engine,  # a RAG/context engine instance you have already configured
    guardrails=[
        HallucinationGuard(
            threshold=0.3,
            action="warn",
        ),
    ],
)

How It Works

  1. The agent generates a response using retrieved RAG context.
  2. HallucinationGuard compares the response against that context.
  3. It computes a grounding score based on keyword overlap (0.0–1.0).
  4. If the score falls below the threshold, the configured action triggers.

The guard only activates when RAG context is present. Without it, the check always passes.
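
The steps above can be sketched as a simple keyword-overlap score. This is one plausible formulation, not necessarily the one HallucinationGuard uses: the stopword list, the tokenizer, and the `grounding_score` and `is_grounded` helpers are assumptions made for illustration.

```python
import re

# A tiny stopword list for the sketch; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "it", "was"}


def keywords(text: str) -> set:
    """Lowercase alphanumeric tokens, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def grounding_score(response: str, context: str) -> float:
    """Fraction of response keywords that also appear in the context."""
    resp = keywords(response)
    if not resp:
        return 1.0  # nothing to verify
    return len(resp & keywords(context)) / len(resp)


def is_grounded(response: str, context: str, threshold: float = 0.3) -> bool:
    """Pass when there is no context, mirroring the behavior described above."""
    if not context:
        return True
    return grounding_score(response, context) >= threshold
```

Keyword overlap is cheap but shallow. A response that reuses most of the context's words while changing one key fact still scores high, which is one reason to treat the threshold as a tuning knob and not as proof of correctness.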

Threshold Tuning

Range      Behavior
0.1–0.2    Very permissive. Only flags near-zero overlap
0.3–0.5    Balanced. Good default for most applications
0.6–0.8    Strict. May flag valid paraphrases
0.9+       Very strict. Only accepts near-verbatim responses

ContentFilter

ContentFilter blocks messages that contain blocklisted words or match regex patterns.

from sagewai import UniversalAgent, ContentFilter

agent = UniversalAgent(
    name="filtered-agent",
    model="gpt-4o",
    guardrails=[
        ContentFilter(
            blocklist=["password", "secret", "confidential"],
            patterns=[r"\d{3}-\d{2}-\d{4}"],  # SSN pattern
            action="block",
        ),
    ],
)
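
As a rough mental model, a blocklist-plus-regex check can be written in a few lines. The `find_violations` helper is hypothetical, and the case-insensitive blocklist match is an assumption. The source does not say how ContentFilter handles case.

```python
import re


def find_violations(text: str, blocklist: list, patterns: list) -> list:
    """Return a description of each blocklist word or regex that matches."""
    hits = []
    lowered = text.lower()
    for word in blocklist:
        # Assumption: blocklist matching ignores case.
        if word.lower() in lowered:
            hits.append(f"blocklist:{word}")
    for pattern in patterns:
        if re.search(pattern, text):
            hits.append(f"pattern:{pattern}")
    return hits
```

An empty result means the message is clean. With `action="block"`, any hit would correspond to rejecting the message.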

TokenBudgetGuard

TokenBudgetGuard tracks estimated token costs during the agent loop and blocks further processing when the per-request budget is exceeded.

from sagewai import UniversalAgent, TokenBudgetGuard

agent = UniversalAgent(
    name="budget-agent",
    model="gpt-4o",
    guardrails=[
        TokenBudgetGuard(max_usd=1.0),
    ],
)
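
The idea of tracking estimated cost across an agent loop can be sketched as a running total checked after each step. Everything here is hypothetical: the `RequestBudget` class, the `BudgetExceeded` error, and the flat per-1k-token price are illustrative and do not reflect how TokenBudgetGuard estimates cost.

```python
class BudgetExceeded(Exception):
    """Stand-in error for this sketch; not a Sagewai class."""


class RequestBudget:
    def __init__(self, max_usd: float, usd_per_1k_tokens: float) -> None:
        self.max_usd = max_usd
        self.usd_per_1k_tokens = usd_per_1k_tokens  # assumed flat price
        self.spent_usd = 0.0

    def record(self, tokens: int) -> float:
        """Add the estimated cost of one step; raise once the cap is passed."""
        self.spent_usd += tokens / 1000 * self.usd_per_1k_tokens
        if self.spent_usd > self.max_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} of ${self.max_usd:.2f}"
            )
        return self.spent_usd
```

Because the check runs after each step, the request can overshoot the cap by at most one step's cost before processing stops.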

OutputSchemaGuard

OutputSchemaGuard validates that the agent's output matches a JSON schema before returning it to the caller.

from sagewai import UniversalAgent, OutputSchemaGuard

schema = {
    "type": "object",
    "required": ["title", "body"],
    "properties": {
        "title": {"type": "string"},
        "body": {"type": "string"},
    },
}

agent = UniversalAgent(
    name="structured-agent",
    model="gpt-4o",
    guardrails=[OutputSchemaGuard(schema=schema)],
)
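
To show what such a check involves, here is a standard-library sketch that validates a JSON string against the small schema above. It covers only `required` and flat `properties` types on an object, which is a small subset of JSON Schema. The `validate` helper is hypothetical and is not the validator OutputSchemaGuard uses.

```python
import json

schema = {
    "type": "object",
    "required": ["title", "body"],
    "properties": {
        "title": {"type": "string"},
        "body": {"type": "string"},
    },
}

# Maps a few JSON Schema type names to Python types (subset only).
TYPE_MAP = {"string": str, "object": dict, "array": list, "boolean": bool}


def validate(raw: str, schema: dict) -> list:
    """Return a list of problems; an empty list means the output is valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["expected object"]
    problems = []
    for key in schema.get("required", []):
        if key not in data:
            problems.append(f"missing required field: {key}")
    for key, rule in schema.get("properties", {}).items():
        if key in data and not isinstance(data[key], TYPE_MAP[rule["type"]]):
            problems.append(f"{key}: expected {rule['type']}")
    return problems
```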

Budget Enforcement

Beyond per-request TokenBudgetGuard, Sagewai enforces project-level budgets in BaseAgent._call_llm(). Every LLM call checks the budget and records spend automatically — no extra code needed.

Budget actions:

Action      Effect
stop        Raises SagewaiBudgetExceededError, blocks the LLM call
throttle    Delays the LLM call with exponential backoff
warn        Logs a warning but allows the call

Budget limits are configured per-project via the admin API or dashboard. When a budget is exceeded, the notification system can alert via email, Slack, or in-app notifications.

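from sagewai.errors import SagewaiBudgetExceededError

async def analyze(agent):
    # Call from async code; `agent` is any agent configured as shown above.
    try:
        return await agent.chat("Analyze this large dataset")
    except SagewaiBudgetExceededError as e:
        print(f"Budget exceeded: {e}")
        return None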
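
For the `throttle` action, exponential backoff means each successive delay grows by a constant factor, usually up to a cap. The sketch below shows the arithmetic only. The base delay, factor, and cap are hypothetical values, since the source does not state which ones Sagewai uses.

```python
def backoff_delays(
    base_seconds: float = 0.5,
    factor: float = 2.0,
    max_seconds: float = 8.0,
    attempts: int = 6,
) -> list:
    """Delay before each attempt: base * factor**i, capped at max_seconds."""
    return [min(base_seconds * factor ** i, max_seconds) for i in range(attempts)]


print(backoff_delays())  # [0.5, 1.0, 2.0, 4.0, 8.0, 8.0]
```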

Audit Logging

AuditLogger records agent actions for compliance and debugging:

from sagewai.observability.audit import AuditLogger

logger = AuditLogger(backend="file", path="/var/log/sagewai/audit.jsonl")

# Attach to an agent via event listeners
agent.on_event(logger.handle_event)

The log captures run starts and finishes, tool calls, guardrail violations, budget checks, and workflow state transitions, with one entry per event. Logs can be archived to S3 or GCS using the archival backend.
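
The `.jsonl` path above suggests one JSON object per line. A minimal sketch of that layout follows. The field names (`ts`, `event`) and the `write_entry` and `read_entries` helpers are assumptions for illustration, not AuditLogger's actual schema.

```python
import io
import json
import time


def write_entry(stream, event_type: str, payload: dict) -> dict:
    """Append one event as a single JSON line to any text stream."""
    entry = {"ts": time.time(), "event": event_type, **payload}
    stream.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


def read_entries(text: str) -> list:
    """Parse JSON Lines text back into a list of dicts."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Usage with an in-memory stream; a real log would use a file opened in append mode.
buffer = io.StringIO()
write_entry(buffer, "tool_call", {"tool": "search"})
write_entry(buffer, "guardrail_violation", {"guard": "PIIGuard"})
print(len(read_entries(buffer.getvalue())))  # 2
```

One object per line keeps the log append-only and lets you process it with line-oriented tools without parsing the whole file.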


Combining Guardrails

Guardrails execute in the order you list them. Stack multiple guardrails to address different failure modes at each stage:

from sagewai import (
    UniversalAgent,
    PIIGuard,
    ContentFilter,
    HallucinationGuard,
    TokenBudgetGuard,
)
from sagewai.safety.pii import PIIEntityType

agent = UniversalAgent(
    name="production-agent",
    model="gpt-4o",
    guardrails=[
        PIIGuard(action="redact", entity_types=[
            PIIEntityType.EMAIL,
            PIIEntityType.SSN,
            PIIEntityType.CREDIT_CARD,
        ]),
        ContentFilter(blocklist=["DROP TABLE", "DELETE FROM"]),
        HallucinationGuard(threshold=0.3, action="warn"),
        TokenBudgetGuard(max_usd=2.0),
    ],
)

Recommended order:

  1. PIIGuard first — redact sensitive data before it reaches the LLM
  2. ContentFilter — block injection attempts early
  3. HallucinationGuard — check output grounding before returning
  4. TokenBudgetGuard — prevent runaway costs
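
Why order matters can be seen in a toy pipeline where each guard receives the text produced by the one before it. The guard functions and `run_pipeline` below are hypothetical stand-ins, not Sagewai classes. The sketch only assumes, as stated above, that guardrails run in list order.

```python
import re


def redact_email(text: str):
    """Rewrite the text; never reports a violation."""
    cleaned = re.sub(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", "[REDACTED_EMAIL]", text)
    return cleaned, None


def block_sql(text: str):
    """Leave the text alone; report a violation on risky phrases."""
    for phrase in ("DROP TABLE", "DELETE FROM"):
        if phrase in text.upper():
            return text, f"blocked phrase: {phrase}"
    return text, None


def run_pipeline(text: str, guards: list):
    """Run guards in order; stop at the first violation."""
    for guard in guards:
        text, violation = guard(text)
        if violation:
            return text, violation
    return text, None
```

With redaction first, every later guard (and the model) sees only the redacted text. Reversing the order would expose the raw email address to the content check.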

Custom Guardrails

Implement the Guardrail abstract class to write your own:

from sagewai import Guardrail, GuardrailResult

class ProfanityFilter(Guardrail):
    # Placeholder word list; replace with your own.
    bad_words = {"darn", "heck"}

    async def check_input(self, message: str, context: dict) -> GuardrailResult:
        # Substring match, so "heck" also matches "check"; tighten as needed.
        if any(word in message.lower() for word in self.bad_words):
            return GuardrailResult(
                passed=False,
                violation="Profanity detected",
                action="block",
            )
        return GuardrailResult(passed=True)

    async def check_output(self, response: str, context: dict) -> GuardrailResult:
        # Outputs are not checked in this example.
        return GuardrailResult(passed=True)

Both check_input and check_output must be async and return a GuardrailResult.
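
The substring test in the example above flags any message that merely contains a listed word inside a longer one. A word-boundary regex is one way to tighten it. This is an optional refinement sketched with a hypothetical `build_matcher` helper, not part of the Guardrail API.

```python
import re


def build_matcher(bad_words):
    """Compile a case-insensitive pattern that matches whole words only."""
    words = sorted(bad_words)
    if not words:
        return re.compile(r"(?!)")  # never matches
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


matcher = build_matcher({"darn", "heck"})
print(bool(matcher.search("Well, DARN it")))      # True
print(bool(matcher.search("checkered darning")))  # False
```

Inside `check_input`, the condition would then become `matcher.search(message)`.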


What's Next

  • Agents — Where guardrails are applied in the agent lifecycle
  • Workflows — Budget enforcement across multi-step workflows
  • Context Engine — RAG context that powers the HallucinationGuard