Getting Started

Welcome to Sagewai — agent infrastructure you own. Sagewai is an LLM-agnostic framework for building production-grade AI agents with built-in memory, tool use, guardrails, and durable workflows. It works with any model provider: OpenAI, Anthropic, Google, Mistral, or open-source models via Ollama.

This guide takes you from zero to a working agent in under five minutes.


Two Ways to Use Sagewai

Self-Hosted (Open Source): Install the SDK, start your own Sagewai server, and manage everything on your infrastructure. On first launch, the admin panel shows a setup wizard to create your admin account and configure your organization. After setup, users log in to access the dashboard.

Sagewai Cloud: Purchase and activate your account at sagewai.ai, then manage your agents through the cloud admin panel at cloud.sagewai.ai. No infrastructure to manage — just connect and start building.

Both paths use the same SDK and the same APIs. Choose self-hosted for full control, or cloud for convenience.


Install

Install the SDK with pip:

pip install sagewai

Or with uv (recommended):

uv add sagewai

Sagewai ships optional extras for advanced features:

  • sagewai[memory]: Docling document parsing, Milvus vectors, NebulaGraph, tiktoken
  • sagewai[fastapi]: FastAPI + SSE for serving agents over HTTP
  • sagewai[postgres]: asyncpg + SQLAlchemy + Alembic for durable state
  • sagewai[storage]: S3 and GCS archival backends
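Extras combine using standard pip/uv extras syntax, for example:

```shell
# Quote the package spec so your shell does not expand the brackets.
pip install "sagewai[memory,fastapi]"

# or with uv
uv add "sagewai[memory,fastapi]"
```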

Environment Setup

Create a .env file with your API key:

# .env
OPENAI_API_KEY=sk-...

Sagewai uses LiteLLM under the hood, which reads standard provider environment variables automatically. You only need the key for the provider you want to use:

# Pick one (or several)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AI...
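LiteLLM reads these keys from the process environment. If your launcher does not load .env automatically, python-dotenv is the usual answer; a minimal stdlib equivalent looks like this (load_env is a hypothetical helper for illustration, not part of sagewai):

```python
import os
from pathlib import Path

def load_env(path: str = ".env") -> None:
    """Minimal .env loader; python-dotenv is the full-featured equivalent."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            # Variables already set in the environment win over .env values.
            os.environ.setdefault(key.strip(), value.strip())
```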

Your First Agent

Create a file called hello_agent.py:

import asyncio
from sagewai import UniversalAgent

async def main():
    agent = UniversalAgent(
        name="my-agent",
        model="gpt-4o",
        system_prompt="You are a helpful assistant.",
    )

    response = await agent.chat("What is the capital of France?")
    print(response)

asyncio.run(main())

Run it:

python hello_agent.py

That is a single constructor call to create an agent and one line to get a response. The chat() method handles the full agentic loop: it sends your message to the LLM, processes any tool calls, and returns the final text response as a string.
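Conceptually, an agentic loop like this can be sketched in a few lines (send_to_llm and execute_tool are stand-ins for illustration, not sagewai APIs):

```python
# Conceptual sketch of an agentic loop: call the model, execute any tool
# requests, feed the results back, and stop once the model returns plain text.
def agentic_loop(send_to_llm, execute_tool, user_message):
    messages = [{"role": "user", "content": user_message}]
    while True:
        reply = send_to_llm(messages)
        if "tool_call" in reply:
            name, args = reply["tool_call"]
            result = execute_tool(name, args)
            messages.append({"role": "tool", "name": name, "content": result})
            continue  # let the model see the tool result on the next call
        return reply["content"]  # final text answer
```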

Switch Models Instantly

Change the model parameter to use any LiteLLM-supported provider. No code changes needed:

# Anthropic
agent = UniversalAgent(name="claude", model="claude-sonnet-4-20250514")

# Google Gemini
agent = UniversalAgent(name="gemini", model="gemini/gemini-2.0-flash")

# Mistral
agent = UniversalAgent(name="mistral", model="mistral/mistral-large-latest")

# Local models via Ollama
agent = UniversalAgent(name="local", model="ollama/llama3.1")

Add a Tool

Agents become powerful when they can call tools. Use the @tool decorator to turn any typed Python function into a tool the LLM can invoke:

import asyncio
from sagewai import UniversalAgent, tool

@tool
async def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"Sunny, 22°C in {city}"

@tool
async def calculate(expression: str) -> str:
    """Safely evaluate a basic arithmetic expression."""
    import ast
    import operator

    # ast.literal_eval only accepts literals and rejects arithmetic like
    # "42 * 17", so walk the AST with a whitelist of operators instead.
    ops = {
        ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
        ast.Div: operator.truediv, ast.Pow: operator.pow, ast.USub: operator.neg,
    }

    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            return ops[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in ops:
            return ops[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")

    return str(_eval(ast.parse(expression, mode="eval").body))

async def main():
    agent = UniversalAgent(
        name="assistant",
        model="gpt-4o",
        tools=[get_weather, calculate],
    )

    response = await agent.chat(
        "What's the weather in Berlin, and what is 42 * 17?"
    )
    print(response)

asyncio.run(main())

The @tool decorator extracts the function name, docstring, and type hints to build a tool specification that the LLM understands. The agent decides which tools to call based on the user message, executes them, and incorporates the results into its response.
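As an illustration of that introspection, here is a toy version of the spec such a decorator might derive from get_weather (build_tool_spec and the exact schema are assumptions for illustration, not sagewai's real output):

```python
import inspect
from typing import get_type_hints

# Map Python annotations to JSON Schema type names.
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def build_tool_spec(fn):
    """Derive an OpenAI-style tool spec from a function's name, docstring, and hints."""
    hints = get_type_hints(fn)
    hints.pop("return", None)  # only parameters go into the schema
    params = {name: {"type": PY_TO_JSON.get(tp, "string")} for name, tp in hints.items()}
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {
            "type": "object",
            "properties": params,
            "required": list(params),
        },
    }

# Same tool as above, minus the decorator, for a self-contained demo.
async def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"Sunny, 22°C in {city}"

spec = build_tool_spec(get_weather)
```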

You can also connect external tools via the Model Context Protocol (MCP):

from sagewai import McpClient

# Connect to any MCP server
tools = await McpClient.connect_stdio(["npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"])
agent = UniversalAgent(name="files", model="gpt-4o", tools=tools)

Add Memory

For multi-turn conversations where the agent remembers previous exchanges, use chat_with_history() with explicit message management:

from sagewai import UniversalAgent, ChatMessage

agent = UniversalAgent(name="tutor", model="gpt-4o")

messages = [
    ChatMessage.system("You are a helpful math tutor."),
    ChatMessage.user("What is 2 + 2?"),
]

response = await agent.chat_with_history(messages)
print(response.content)  # "4"

# Continue the conversation — the agent has full context
messages.append(response)
messages.append(ChatMessage.user("Now multiply that by 3"))

response = await agent.chat_with_history(messages)
print(response.content)  # "12"

For automatic state management, use ConversationManager:

from sagewai.core.conversation import ConversationManager

manager = ConversationManager(agent=agent)
await manager.send("What is 2 + 2?")
await manager.send("Now multiply that by 3")  # remembers context

Long-Term Memory with the Context Engine

For persistent, searchable memory that survives across sessions, use the Context Engine. It ingests documents, chunks them, embeds them into a vector store, and retrieves relevant context on every chat turn:

from sagewai import UniversalAgent, ContextEngine, ContextScope
from sagewai.context import (
    InMemoryMetadataStore,
    InMemoryVectorStore,
)

# 1. Create a context engine (in-memory — no infra needed)
engine = ContextEngine(
    metadata_store=InMemoryMetadataStore(),
    vector_store=InMemoryVectorStore(),
    project_id="my-project",
)

# 2. Ingest some knowledge
await engine.ingest_text(
    text="Sagewai supports 100+ LLM providers via LiteLLM.",
    title="About Sagewai",
    scope=ContextScope.PROJECT,
    scope_id="my-project",
)

# 3. Pass the engine as memory — the agent retrieves context automatically
agent = UniversalAgent(
    name="researcher",
    model="gpt-4o",
    memory=engine,
    tools=engine.get_tools(),  # gives the agent memory_store, memory_search, etc.
    auto_learn=True,           # auto-extracts facts from conversations
)

response = await agent.chat("What LLM providers does Sagewai support?")

The Context Engine implements the MemoryProvider protocol, so passing memory=engine gives the agent automatic RAG retrieval on every turn. Adding tools=engine.get_tools() lets the agent curate its own memory with store, search, forget, and update operations.
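Under the hood, that retrieval is ordinary vector search. A dependency-free toy sketch of the idea, using a fake character-count "embedding" in place of a real model (ToyVectorStore is illustrative only, not a sagewai class):

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': character frequency counts stand in for a real model."""
    return Counter(text.lower())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToyVectorStore:
    def __init__(self):
        self.chunks = []  # (text, vector) pairs

    def ingest(self, text):
        self.chunks.append((text, embed(text)))

    def search(self, query, k=1):
        # Rank stored chunks by cosine similarity to the query vector.
        qv = embed(query)
        ranked = sorted(self.chunks, key=lambda c: cosine(qv, c[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = ToyVectorStore()
store.ingest("Sagewai supports 100+ LLM providers via LiteLLM.")
store.ingest("The admin console shows cost and token usage.")
print(store.search("Which LLM providers are supported?"))
```

A real engine swaps in model embeddings and a persistent store (Milvus via sagewai[memory]), but the retrieval step is the same shape.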


Stream Responses

For real-time output, use chat_stream():

async def main():
    agent = UniversalAgent(name="writer", model="gpt-4o")

    async for chunk in agent.chat_stream("Tell me a short story"):
        print(chunk, end="", flush=True)
    print()

Streaming handles tool calls internally. When the LLM requests a tool call, it is executed behind the scenes, and only text content is yielded to the caller.
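Conceptually, the wrapper looks something like this (a toy illustration with stand-in chunk dicts, not sagewai's internals):

```python
import asyncio

# Toy sketch: a raw stream interleaves tool-call events with text chunks;
# the wrapper executes tool calls silently and yields only the text.
async def raw_stream():
    yield {"type": "text", "content": "Once upon "}
    yield {"type": "tool_call", "name": "lookup", "args": {}}
    yield {"type": "text", "content": "a time."}

async def text_only(stream, execute_tool):
    async for chunk in stream:
        if chunk["type"] == "tool_call":
            execute_tool(chunk["name"], chunk["args"])  # run behind the scenes
        else:
            yield chunk["content"]

async def collect():
    return "".join([piece async for piece in text_only(raw_stream(), lambda n, a: None)])

print(asyncio.run(collect()))  # Once upon a time.
```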


Use Directives

Sagewai's Directive Engine is a unique prompt preprocessor that enriches prompts before the LLM sees them. Use @ sigils to pull in context, invoke tools, or delegate to other agents — directly inside your prompt text:

agent = UniversalAgent(
    name="analyst",
    model="gpt-4o",
    memory=engine,      # context engine from above
    directives=True,    # enable directive preprocessing
)

# @context pulls relevant documents, @memory searches stored facts
response = await agent.chat(
    "@context('Q1 revenue') Summarize our Q1 financial performance."
)

Available directives:

  • @context('query'): retrieves matching documents from the Context Engine
  • @context('query', scope='org', tags='finance'): scoped retrieval with tag filtering
  • @memory('query'): searches agent memory for stored facts
  • @agent:name('task'): delegates a task to another agent
  • @wf:name('input'): invokes a saved workflow
  • /tool.name('args'): calls a tool inline
  • #model:name: overrides the model for this request

Directives adapt automatically to the model's capability. For large models (GPT-4o, Claude), full context is injected. For small models (Phi-3, Llama), the engine compresses context and rewrites tool descriptions to fit within the token budget.
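To make the mechanics concrete, here is a toy sketch of how a preprocessor might extract @ sigils from a prompt before the model sees it (illustrative only; sagewai's actual directive grammar is richer):

```python
import re

# Matches sigils of the form @name('argument'), e.g. @context('Q1 revenue').
DIRECTIVE_RE = re.compile(r"@(\w+)\('([^']*)'\)")

def extract_directives(prompt):
    """Return (directive, argument) pairs plus the prompt with sigils stripped."""
    calls = DIRECTIVE_RE.findall(prompt)
    clean = DIRECTIVE_RE.sub("", prompt).strip()
    return calls, clean

calls, clean = extract_directives("@context('Q1 revenue') Summarize our Q1 financial performance.")
print(calls)  # [('context', 'Q1 revenue')]
print(clean)  # Summarize our Q1 financial performance.
```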


Launch the Admin Console

Sagewai includes a built-in admin API for monitoring your agents. Start it with:

sagewai admin serve

This launches a FastAPI server (requires sagewai[fastapi]) where you can:

  • View agent runs and execution traces
  • Monitor cost and token usage
  • Manage workflows and approvals
  • Access the playground for interactive agent testing

The full admin panel with a web UI is available when running the Sagewai platform. See the Admin Panel guide for details.


What's Next?

You now have a working agent with tools, memory, streaming, and directives. Here is where to go from here:

Learn

  • Tutorials — 8 progressive hands-on tutorials from first agent to enterprise deployment
  • Video Tutorials — 20 video walkthroughs covering every feature
  • Your First Agent — Step-by-step tutorial building a real-world research agent

Build

  • Agents — BaseAgent, UniversalAgent, GoogleNativeAgent, composition patterns
  • Strategies — ReAct, Tree of Thoughts, LATS, Planning, Self-Correction, Routing
  • Memory & RAG — Vector memory, graph memory, hybrid RAG, episodic memory
  • Workflows — Sequential, parallel, durable workflows, human approval, distributed workers
  • Context Engine — Document ingestion, scoped access, multi-strategy retrieval
  • Directives — Prompt preprocessing with @context, @memory, @agent sigils
  • Safety & Guardrails — PII protection, hallucination detection, budget enforcement

Integrate

Deploy

Reference