Agents

Agents are the core building block of Sagewai. An agent wraps an LLM with a tool-calling loop: send a message, the LLM responds (possibly requesting tool calls), tools are executed, and the loop continues until the LLM produces a final text response.

BaseAgent

BaseAgent is the abstract base class from which all agents inherit. You never instantiate BaseAgent directly — instead, use one of the concrete engines.

Architecture

User Message
    |
    v
BaseAgent.chat(message)
    |
    v
[Build Messages] --> [Inject Memory Context]
    |
    v
[Check Input Guardrails]
    |
    v
[ExecutionStrategy.execute()]
    |
    +---> [_call_llm()] --> LLM Response
    |         |
    |         v
    |    Has tool_calls?
    |    YES --> [Execute Tools] --> Loop back
    |    NO  --> Return text response
    |
    v
[Check Output Guardrails]
    |
    v
Return response text
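
The loop above can be sketched in plain Python. This is an illustrative stand-in, not Sagewai's implementation: run_agent, call_llm, and the message dict shapes are all hypothetical names chosen for the sketch.

```python
# Illustrative sketch of the tool-calling loop (not Sagewai's real code).
# The "LLM" is a deterministic stub: it requests one tool call, then answers.

def run_agent(message: str, tools: dict, max_iterations: int = 10) -> str:
    messages = [{"role": "user", "content": message}]
    for _ in range(max_iterations):
        reply = call_llm(messages)              # stub defined below
        if not reply.get("tool_calls"):
            return reply["content"]             # final text response
        for call in reply["tool_calls"]:        # execute requested tools
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "name": call["name"], "content": result})
    return "Max iterations reached"

def call_llm(messages):
    # Stub: request a tool call first, then answer once a tool result exists.
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"tool_calls": [{"name": "add", "args": {"a": 2, "b": 3}}]}
    return {"content": f"The answer is {tool_results[-1]['content']}", "tool_calls": None}

tools = {"add": lambda a, b: str(a + b)}
print(run_agent("What is 2 + 3?", tools))  # -> The answer is 5
```

The real loop also handles streaming, guardrails, and memory injection, but the shape — call the LLM, execute any requested tools, feed results back, repeat — is the same.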

Constructor Parameters

Parameter           Type                    Default          Description
name                str                     required         Agent name (used in logging, events, admin)
model               str                     "gpt-4o"         LLM model identifier
system_prompt       str                     ""               System message prepended to all conversations
tools               list[ToolSpec]          []               Tools the agent can use
temperature         float                   0.7              LLM temperature
max_tokens          int | None              None             Max output tokens per LLM call
max_iterations      int                     10               Max tool-calling loop iterations
strategy            ExecutionStrategy       ReActStrategy()  Reasoning loop strategy
memory              Any                     None             Memory backend (ContextEngine, VectorMemory, RAGEngine)
guardrails          list[Guardrail]         []               Input/output safety guardrails
max_context_tokens  int | None              None             Auto-compact context when exceeded
directives          bool | DirectiveEngine  None             Enable directive preprocessing
api_base            str | None              None             Override LLM API base URL
api_key             str | None              None             Override LLM API key

Public Methods

chat(message: str) -> str

Send a single message and get a text response. This is the simplest interface.

response = await agent.chat("What is quantum computing?")

chat_with_history(messages: list[ChatMessage]) -> ChatMessage

Run the agent loop with an explicit conversation history. Useful for multi-turn conversations where you manage state externally.

from sagewai import ChatMessage

messages = [
    ChatMessage.system("You are an expert physicist."),
    ChatMessage.user("Explain quantum entanglement"),
]
response = await agent.chat_with_history(messages)

chat_stream(message: str) -> AsyncGenerator[str, None]

Stream text chunks in real time. Tool calls are handled internally — only text content is yielded.

async for chunk in agent.chat_stream("Tell me about black holes"):
    print(chunk, end="", flush=True)

on_event(callback)

Register a listener for agent lifecycle events (run started, tool calls, errors, etc.):

from sagewai.core.events import AgentEvent

async def my_handler(event: AgentEvent, data: dict):
    print(f"Event: {event.value}, Data: {data}")

agent.on_event(my_handler)

Events emitted: RUN_STARTED, RUN_FINISHED, RUN_ERROR, RUN_CANCELLED, STEP_STARTED, STEP_FINISHED, TOOL_CALL_START, TOOL_CALL_END, TOOL_CALL_RESULT, TEXT_MESSAGE_START, TEXT_MESSAGE_CONTENT, TEXT_MESSAGE_END, GUARDRAIL_ESCALATION, CONTEXT_COMPACTED.


UniversalAgent

UniversalAgent is the primary concrete agent, backed by LiteLLM. It supports 100+ LLM providers through a unified interface.

from sagewai import UniversalAgent

agent = UniversalAgent(
    name="assistant",
    model="gpt-4o",
    system_prompt="You are a helpful assistant.",
    temperature=0.7,
)

response = await agent.chat("Hello!")

Supported Models

Any model that LiteLLM supports works out of the box:

Provider        Model Examples                                     Prefix
OpenAI          gpt-4o, gpt-4o-mini, o1-preview                    (none)
Anthropic       claude-sonnet-4-20250514, claude-3-haiku-20240307  (none)
Google Gemini   gemini/gemini-2.0-flash, gemini/gemini-2.5-pro     gemini/
Mistral         mistral/mistral-large-latest                       mistral/
Cohere          command-r-plus                                     (none)
Azure OpenAI    azure/gpt-4o                                       azure/
AWS Bedrock     bedrock/anthropic.claude-3-sonnet                  bedrock/
Ollama (local)  ollama/llama3.1, ollama/codellama                  ollama/
Together AI     together_ai/meta-llama/Llama-3.1-405B              together_ai/
Groq            groq/llama-3.1-70b-versatile                       groq/

Streaming

UniversalAgent implements true token-level streaming via _stream_llm, accumulating tool call fragments across chunks:

async for chunk in agent.chat_stream("Explain relativity"):
    print(chunk, end="")
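
Streaming providers deliver tool calls as partial fragments: the function name may arrive in one chunk and the JSON argument string spread across several. A minimal accumulator in the spirit of what _stream_llm must do — the chunk shape and function name here are assumptions for illustration, not Sagewai's API — could look like:

```python
import json

def accumulate_tool_calls(chunks):
    """Merge streamed tool-call fragments, keyed by call index."""
    calls: dict[int, dict] = {}
    for chunk in chunks:
        for frag in chunk.get("tool_calls", []):
            call = calls.setdefault(frag["index"], {"name": "", "arguments": ""})
            if frag.get("name"):
                call["name"] = frag["name"]
            call["arguments"] += frag.get("arguments", "")
    # Parse the accumulated JSON argument strings once streaming ends.
    return [{"name": c["name"], "args": json.loads(c["arguments"])} for c in calls.values()]

chunks = [
    {"tool_calls": [{"index": 0, "name": "search", "arguments": '{"que'}]},
    {"tool_calls": [{"index": 0, "arguments": 'ry": "black holes"}'}]},
]
print(accumulate_tool_calls(chunks))
# -> [{'name': 'search', 'args': {'query': 'black holes'}}]
```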

GoogleNativeAgent

GoogleNativeAgent uses the google.genai SDK for native Gemini access, bypassing LiteLLM. This provides access to Gemini-specific features like native function calling.

from sagewai import GoogleNativeAgent

agent = GoogleNativeAgent(
    name="gemini-agent",
    model="gemini-2.0-flash",
    system_prompt="You are a helpful assistant.",
)

response = await agent.chat("Hello!")

Use GoogleNativeAgent when you need:

  • Native Gemini function calling format
  • Direct access to Google GenAI SDK features
  • Vertex AI integration

Use UniversalAgent with model="gemini/..." for general-purpose Gemini access via LiteLLM.


Tools

Tools are defined using the @tool decorator, which converts a function into a ToolSpec:

from sagewai import tool

@tool
async def search_database(query: str, limit: int = 10) -> str:
    """Search the knowledge base for relevant documents.

    Args:
        query: The search query string.
        limit: Maximum number of results to return.
    """
    results = await db.search(query, limit=limit)
    return format_results(results)

The decorator extracts:

  • Name from the function name
  • Description from the docstring
  • Parameters from function type annotations
  • Handler reference to execute the function

Both sync and async handlers are supported. Tool execution is parallelized when multiple tools are called in a single LLM response.
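
The extraction can be sketched with the standard inspect module. This mirrors what a @tool decorator plausibly does; describe_tool is a hypothetical helper for illustration, not Sagewai's internals:

```python
import inspect

def describe_tool(fn):
    """Build a tool schema from a function's name, docstring, and annotations."""
    params = {}
    for pname, p in inspect.signature(fn).parameters.items():
        params[pname] = {
            # Annotation becomes the parameter type; no default means required.
            "type": p.annotation.__name__ if p.annotation is not inspect.Parameter.empty else "any",
            "required": p.default is inspect.Parameter.empty,
        }
    doc = (fn.__doc__ or "").strip()
    return {
        "name": fn.__name__,
        "description": doc.splitlines()[0] if doc else "",
        "parameters": params,
        "handler": fn,
    }

def search_database(query: str, limit: int = 10) -> str:
    """Search the knowledge base for relevant documents."""
    return f"{limit} results for {query!r}"

spec = describe_tool(search_database)
print(spec["name"], spec["parameters"])
# -> search_database {'query': {'type': 'str', 'required': True},
#                     'limit': {'type': 'int', 'required': False}}
```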

MCP Tools

Tools can also be discovered from MCP (Model Context Protocol) servers:

from sagewai import McpClient

# Connect to an MCP server via stdio
tools = await McpClient.connect(["python", "-m", "mcp_stripe"])

# Or via SSE
tools = await McpClient.connect_sse("http://localhost:8080/sse")

# Use discovered tools with any agent
agent = UniversalAgent(name="auditor", model="gpt-4o", tools=tools)

Agent Composition

Sagewai provides four deterministic workflow agents for composing sub-agents into pipelines. Unlike a single agent that decides what to do via LLM, workflow agents follow a fixed structure while each sub-agent within uses its own LLM.

SequentialAgent

Execute sub-agents one after another, passing each agent's output as input to the next:

from sagewai import UniversalAgent, SequentialAgent

researcher = UniversalAgent(
    name="researcher",
    model="gpt-4o",
    system_prompt="You research topics and return key findings.",
)
writer = UniversalAgent(
    name="writer",
    model="claude-sonnet-4-20250514",
    system_prompt="You write polished articles from research notes.",
)
reviewer = UniversalAgent(
    name="reviewer",
    model="gpt-4o-mini",
    system_prompt="You review articles for accuracy.",
)

pipeline = SequentialAgent(
    name="article-pipeline",
    agents=[researcher, writer, reviewer],
)

result = await pipeline.chat("Write about the future of quantum computing")

ParallelAgent

Run multiple agents concurrently on the same input and merge their outputs:

from sagewai import UniversalAgent, ParallelAgent

legal = UniversalAgent(name="legal", system_prompt="Review for legal issues.")
financial = UniversalAgent(name="financial", system_prompt="Review for financial accuracy.")
grammar = UniversalAgent(name="grammar", model="gpt-4o-mini", system_prompt="Review grammar.")

review_panel = ParallelAgent(
    name="review-panel",
    agents=[legal, financial, grammar],
)

result = await review_panel.chat("Review this contract: ...")

All agents process the same input via asyncio.gather(). Results are merged with a default newline joiner, or you can pass a custom merge function.
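
The fan-out/merge behavior can be sketched without Sagewai. The agents here are plain async callables, and the merge parameter name is an assumption for the sketch, not the library's actual signature:

```python
import asyncio

async def parallel_run(agents, message, merge=None):
    """Fan the same input out to all agents concurrently, then merge the results."""
    results = await asyncio.gather(*(agent(message) for agent in agents))
    if merge is None:
        return "\n".join(results)   # default newline joiner
    return merge(results)

# Stub "agents" standing in for real sub-agents.
async def legal(msg): return "Legal: no issues found."
async def grammar(msg): return "Grammar: two typos."

merged = asyncio.run(parallel_run([legal, grammar], "Review this contract"))
print(merged)  # prints both reviews, joined by a newline
```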

ConditionalAgent

Route input to different agents based on a condition function:

from sagewai import UniversalAgent, ConditionalAgent

escalation = UniversalAgent(name="escalation", system_prompt="Handle complaints.")
auto_reply = UniversalAgent(name="auto-reply", system_prompt="Respond helpfully.")

router = ConditionalAgent(
    name="sentiment-router",
    condition=lambda text: "negative" if "terrible" in text.lower() else "positive",
    branches={
        "negative": escalation,
        "positive": auto_reply,
    },
    default_branch=auto_reply,
)

result = await router.chat("This product is terrible!")
# Routes to the escalation agent

The condition can be synchronous or async. For LLM-based classification, pass an async function that calls a classifier model.
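
The routing logic, including support for both sync and async conditions, can be sketched like this (route and the stub agents are hypothetical stand-ins, not Sagewai's API):

```python
import asyncio
import inspect

async def route(condition, branches, text, default=None):
    """Evaluate a sync or async condition, then dispatch to the matching branch."""
    key = condition(text)
    if inspect.isawaitable(key):       # async conditions return an awaitable
        key = await key
    agent = branches.get(key, default)
    return await agent(text)

async def escalation(text): return "Escalated to a human."
async def auto_reply(text): return "Thanks for your message!"

async def classify(text):              # stand-in for an LLM classifier call
    return "negative" if "terrible" in text.lower() else "positive"

result = asyncio.run(route(
    classify,
    {"negative": escalation, "positive": auto_reply},
    "This product is terrible!",
    default=auto_reply,
))
print(result)  # -> Escalated to a human.
```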

LoopAgent

Repeat a single agent until a condition is met or max_iterations is exhausted:

from sagewai import UniversalAgent, LoopAgent

refiner = UniversalAgent(
    name="refiner",
    model="gpt-4o",
    system_prompt="Improve the text. Output DONE when satisfied.",
)

loop = LoopAgent(
    name="iterative-refiner",
    agent=refiner,
    max_iterations=5,
    should_stop=lambda result, iteration: "DONE" in result,
)

result = await loop.chat("Draft: AI is good at many things...")
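
The loop semantics can be sketched as plain Python — each iteration feeds the previous output back in until should_stop fires or the budget runs out (loop_run and the stub refiner are hypothetical stand-ins):

```python
import asyncio

async def loop_run(agent, message, max_iterations=5, should_stop=None):
    """Feed the agent its own output until should_stop fires or iterations run out."""
    result = message
    for iteration in range(1, max_iterations + 1):
        result = await agent(result)
        if should_stop and should_stop(result, iteration):
            break
    return result

async def refiner(text):
    # Stub: "improves" the draft once, then declares DONE on the second pass.
    return text + " (improved)" if "improved" not in text else text + " DONE"

final = asyncio.run(loop_run(
    refiner, "Draft", max_iterations=5,
    should_stop=lambda result, iteration: "DONE" in result,
))
print(final)  # -> Draft (improved) DONE
```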

Agent-as-Tool

Wrap any agent as a tool so that an orchestrator agent can invoke sub-agents dynamically based on LLM reasoning:

from sagewai import UniversalAgent, agent_as_tool

researcher = UniversalAgent(name="researcher", model="gpt-4o")
writer = UniversalAgent(name="writer", model="claude-sonnet-4-20250514")

orchestrator = UniversalAgent(
    name="orchestrator",
    model="gpt-4o",
    tools=[
        agent_as_tool(researcher, description="Researches a topic thoroughly"),
        agent_as_tool(writer, description="Writes polished content"),
    ],
)

result = await orchestrator.chat("Research and write about quantum computing")

The orchestrator's LLM decides which sub-agents to invoke and in what order. This differs from SequentialAgent (fixed order) and ConditionalAgent (rule-based routing) because the LLM makes dynamic delegation decisions.
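
Conceptually, agent_as_tool just exposes a sub-agent's chat interface as a tool handler. A rough stand-alone sketch, not Sagewai's implementation:

```python
import asyncio

def agent_as_tool_sketch(agent, name, description):
    """Wrap an agent's chat interface so an orchestrator can call it like a tool."""
    async def handler(task: str) -> str:
        return await agent(task)       # delegate the task to the sub-agent
    return {"name": name, "description": description, "handler": handler}

async def researcher(task):            # stub sub-agent
    return f"Findings on {task}"

tool = agent_as_tool_sketch(researcher, "researcher", "Researches a topic thoroughly")
print(asyncio.run(tool["handler"]("quantum computing")))
# -> Findings on quantum computing
```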


Choosing an Agent Pattern

Pattern                Orchestration           Use Case
Single UniversalAgent  LLM decides everything  Simple Q&A, single-domain tasks
SequentialAgent        Fixed pipeline          Research -> Write -> Review
ParallelAgent          Fan-out, merge          Multi-perspective analysis
ConditionalAgent       Rule-based routing      Intent classification, triage
LoopAgent              Iterative refinement    Edit until quality threshold
agent_as_tool          LLM-decided delegation  Dynamic multi-agent orchestration

Patterns compose freely. A SequentialAgent can contain a ParallelAgent as one of its steps, which itself contains UniversalAgent sub-agents with different models and strategies.
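
As an illustration of such nesting, here is a stand-alone sketch where a sequential pipeline's middle step is itself a parallel fan-out. All names are hypothetical stand-ins for the real agents:

```python
import asyncio

async def sequential(agents, message):
    """Each agent's output becomes the next agent's input."""
    for agent in agents:
        message = await agent(message)
    return message

async def parallel(agents, message):
    """All agents see the same input; outputs are joined with newlines."""
    results = await asyncio.gather(*(agent(message) for agent in agents))
    return "\n".join(results)

async def researcher(msg): return f"Notes: {msg}"
async def legal(msg): return "Legal: OK"
async def style(msg): return "Style: OK"
async def writer(msg): return f"Article from [{msg}]"

# A sequential pipeline whose middle step is a parallel review panel.
async def review_panel(msg): return await parallel([legal, style], msg)

result = asyncio.run(sequential([researcher, review_panel, writer], "topic"))
print(result)
```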


What's Next

  • Strategies — Control how agents reason: ReAct, Tree of Thoughts, LATS, Planning
  • Memory — Give agents long-term memory with vector, graph, and hybrid retrieval
  • Workflows — Durable execution with checkpointing, human approval, distributed workers