Safety & Guardrails
Guardrails validate agent inputs and outputs. Attach them to any agent to enforce content policies, detect PII, flag potentially hallucinated responses, and control costs.
```python
from sagewai import UniversalAgent, ContentFilter, PIIGuard

agent = UniversalAgent(
    name="safe-agent",
    model="gpt-4o",
    guardrails=[
        ContentFilter(blocklist=["password"], action="block"),
        PIIGuard(action="redact"),
    ],
)
```
Guardrail (ABC)
Abstract base class for all guardrails. Implement `check_input` and `check_output` to create custom guardrails.
```python
from sagewai import Guardrail, GuardrailResult

class MyGuardrail(Guardrail):
    async def check_input(self, message: str, context: dict) -> GuardrailResult:
        if "forbidden" in message:
            return GuardrailResult(passed=False, violation="Forbidden content")
        return GuardrailResult(passed=True)

    async def check_output(self, response: str, context: dict) -> GuardrailResult:
        return GuardrailResult(passed=True)
```
Required Methods
| Method | Signature | Returns | Description |
|---|---|---|---|
| `check_input` | `async check_input(message: str, context: dict)` | `GuardrailResult` | Validate input before the LLM call |
| `check_output` | `async check_output(response: str, context: dict)` | `GuardrailResult` | Validate output before returning it to the user |
GuardrailResult
Result of a guardrail check.
```python
from sagewai import GuardrailResult

# Passed
result = GuardrailResult(passed=True)

# Failed
result = GuardrailResult(
    passed=False,
    violation="Blocked content detected",
    action="block",
)
```
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| `passed` | `bool` | required | Whether the check passed |
| `violation` | `str \| None` | `None` | Description of the violation |
| `action` | `Literal["block", "warn", "escalate"]` | `"block"` | Action to take on violation |
GuardrailViolationError
Raised when a guardrail blocks an input or output (`action="block"`).
```python
from sagewai import GuardrailViolationError

try:
    response = await agent.chat("forbidden content")
except GuardrailViolationError as e:
    print(e.result.violation)
```
Attributes
| Attribute | Type | Description |
|---|---|---|
| `result` | `GuardrailResult` | The guardrail result that triggered the error |
ContentFilter
Block messages containing forbidden words or regex patterns.
```python
from sagewai import ContentFilter

guard = ContentFilter(
    blocklist=["password", "secret"],
    patterns=[r"\d{3}-\d{2}-\d{4}"],  # SSN pattern
    action="block",
)
```
Constructor
| Parameter | Type | Default | Description |
|---|---|---|---|
| `blocklist` | `list[str] \| None` | `None` | Forbidden words (case-insensitive) |
| `patterns` | `list[str] \| None` | `None` | Regex patterns to block |
| `action` | `Literal["block", "warn", "escalate"]` | `"block"` | Action on detection |
PIIGuard
Detect and handle personally identifiable information in agent inputs and outputs.
```python
from sagewai import PIIGuard
from sagewai.safety.pii import PIIEntityType

guard = PIIGuard(
    action="redact",
    entity_types=[PIIEntityType.EMAIL, PIIEntityType.PHONE],
)

# Direct detection
findings = guard.detect("Contact user@example.com")
# Returns: [(PIIEntityType.EMAIL, "user@example.com")]

# Redaction
clean = guard.redact("Contact user@example.com")
# Returns: "Contact [REDACTED_EMAIL]"
```
Constructor
| Parameter | Type | Default | Description |
|---|---|---|---|
| `action` | `Literal["block", "redact", "warn", "escalate", "log_only"]` | `"block"` | How to handle detected PII |
| `entity_types` | `list[PIIEntityType] \| None` | `None` | Which PII types to detect (all by default) |
PIIEntityType Enum
| Value | Redaction Label |
|---|---|
| `EMAIL` | `[REDACTED_EMAIL]` |
| `PHONE` | `[REDACTED_PHONE]` |
| `SSN` | `[REDACTED_SSN]` |
| `CREDIT_CARD` | `[REDACTED_CARD]` |
| `IBAN` | `[REDACTED_IBAN]` |
| `IP_ADDRESS` | `[REDACTED_IP]` |
| `PASSPORT` | `[REDACTED_PASSPORT]` |
Methods
| Method | Returns | Description |
|---|---|---|
| `detect(text)` | `list[tuple[PIIEntityType, str]]` | Detect all PII in text |
| `redact(text)` | `str` | Replace PII with redaction labels |
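Regex-based redaction with the labels above can be sketched as follows. The patterns here are simplified assumptions for illustration; real PII detection (including sagewai's) typically involves validation, checksums, and context, not just regexes:

```python
import re

# Pattern -> label pairs mirroring the redaction labels above.
# More specific patterns (SSN) run before broader ones (PHONE) so an
# SSN is not swallowed by the looser phone-number regex.
PII_PATTERNS = {
    "EMAIL": (r"[\w.+-]+@[\w-]+\.[\w.]+", "[REDACTED_EMAIL]"),
    "SSN": (r"\b\d{3}-\d{2}-\d{4}\b", "[REDACTED_SSN]"),
    "PHONE": (r"\+?\d[\d\s().-]{7,}\d", "[REDACTED_PHONE]"),
}

def redact(text: str) -> str:
    """Replace every match of each pattern with its redaction label."""
    for pattern, label in PII_PATTERNS.values():
        text = re.sub(pattern, label, text)
    return text

print(redact("Contact user@example.com or 555-123-4567"))
```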
HallucinationGuard
Detect potential hallucinations by checking response grounding against RAG context. Uses keyword overlap scoring (lightweight, no additional LLM calls).
```python
from sagewai import HallucinationGuard

guard = HallucinationGuard(threshold=0.3, action="warn")
```
Constructor
| Parameter | Type | Default | Description |
|---|---|---|---|
| `threshold` | `float` | `0.3` | Minimum grounding score (0-1); scores below this trigger a violation |
| `action` | `Literal["block", "warn", "escalate"]` | `"warn"` | Action on low grounding |
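A keyword-overlap grounding score can be sketched as the fraction of response keywords that also appear in the retrieved context. The tokenizer and stopword list below are illustrative assumptions, not sagewai's exact algorithm:

```python
import re

def grounding_score(response: str, context: str) -> float:
    """Fraction of response keywords also present in the RAG context."""
    stopwords = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "on"}

    def keywords(text: str) -> set[str]:
        return {w for w in re.findall(r"[a-z0-9]+", text.lower())
                if w not in stopwords}

    resp = keywords(response)
    if not resp:
        return 1.0  # nothing to ground
    return len(resp & keywords(context)) / len(resp)

context = "The Eiffel Tower is 330 metres tall and located in Paris."
grounded = grounding_score("The tower is 330 metres tall.", context)    # 1.0
ungrounded = grounding_score("It was painted green in 1889.", context)  # 0.0
```

With `threshold=0.3`, the second response would score below the threshold and trigger a violation, while the first passes. This is why the check is cheap: it is pure set arithmetic, with no extra LLM calls.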
TokenBudgetGuard
Block requests that would exceed a cost budget.
```python
from sagewai import TokenBudgetGuard

guard = TokenBudgetGuard(max_usd=1.0)
```
Constructor
| Parameter | Type | Default | Description |
|---|---|---|---|
| `max_usd` | `float` | required | Maximum cost in USD before blocking |
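A pre-flight budget check of this kind usually estimates the worst-case cost of a call from token counts and per-token prices, and blocks before the call rather than after. The helper and the prices below are made-up assumptions for illustration, not sagewai's pricing logic:

```python
def check_budget(prompt_tokens: int, max_completion_tokens: int,
                 usd_per_1k_input: float, usd_per_1k_output: float,
                 max_usd: float) -> bool:
    """Return True if the worst-case cost of the call fits the budget."""
    worst_case = (prompt_tokens / 1000 * usd_per_1k_input
                  + max_completion_tokens / 1000 * usd_per_1k_output)
    return worst_case <= max_usd

# 8k-token prompt, up to 2k output tokens, at hypothetical prices:
ok = check_budget(prompt_tokens=8000, max_completion_tokens=2000,
                  usd_per_1k_input=0.005, usd_per_1k_output=0.015,
                  max_usd=1.0)  # worst case $0.07 -> fits the $1.00 budget
```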
OutputSchemaGuard
Validate that agent output conforms to a JSON schema. Checks for valid JSON and required fields.
```python
from sagewai import OutputSchemaGuard

guard = OutputSchemaGuard(
    schema={"type": "object", "required": ["title", "body"]},
)
```
Constructor
| Parameter | Type | Default | Description |
|---|---|---|---|
| `schema` | `dict[str, Any]` | required | JSON Schema to validate against |