Memory & RAG

Memory gives agents access to information beyond the current conversation. Sagewai layers four mechanisms: conversation history for short-term context, vector and graph stores for long-term retrieval, a hybrid RAG engine that combines both, and the Context Engine for production document management with scoped access.

Prerequisites: Agents · Next: Context Engine · Directives

Architecture Overview

User Query
    |
    v
ContextEngine / RAGEngine (orchestrator)
    |
    +---> Vector search (semantic similarity)
    +---> BM25 search (keyword matching)
    +---> Graph search (relationship traversal)
    |
    v
Reciprocal Rank Fusion (merge results)
    |
    v
Optional re-ranking (cross-encoder)
    |
    v
Retrieved context injected into agent messages

Memory integrates directly into BaseAgent. When you set the memory parameter, the agent retrieves relevant context and injects it into the message list before each LLM call, with no extra code on your side.
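The fusion step in the diagram can be illustrated with a minimal, library-independent sketch of Reciprocal Rank Fusion. The function name, the constant k=60 (a commonly used default), and the sample document IDs are assumptions for illustration; this is not Sagewai's internal implementation.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document IDs; earlier ranks contribute more."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list adds 1 / (k + rank) to the document's fused score
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical result lists from the three search paths
vector_hits = ["doc-1", "doc-3", "doc-2"]
bm25_hits = ["doc-3", "doc-1", "doc-4"]
graph_hits = ["doc-3", "doc-5"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits, graph_hits])
print(fused)
```

Because RRF works only on ranks, it can merge lists whose raw scores are not comparable, such as cosine similarities and BM25 scores.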


ConversationManager

For multi-turn conversations where the agent needs to remember previous exchanges, ConversationManager handles state automatically:

from sagewai import UniversalAgent
from sagewai.core.conversation import ConversationManager

agent = UniversalAgent(name="tutor", model="gpt-4o")

manager = ConversationManager(agent=agent)
await manager.send("What is 2 + 2?")      # "4"
await manager.send("Now multiply that by 3")  # "12" — remembers context

If you prefer to manage message history yourself, use chat_with_history() directly:

from sagewai import UniversalAgent, ChatMessage

agent = UniversalAgent(name="tutor", model="gpt-4o")

messages = [
    ChatMessage.system("You are a helpful math tutor."),
    ChatMessage.user("What is 2 + 2?"),
]
response = await agent.chat_with_history(messages)

messages.append(response)
messages.append(ChatMessage.user("Now multiply that by 3"))
response = await agent.chat_with_history(messages)

VectorMemory

Semantic similarity search over embedding vectors. Use it for retrieving documents, passages, and factual knowledge.

In-Memory (Development)

from sagewai.memory.vector import VectorMemory

memory = VectorMemory()

await memory.store("doc-1", "Quantum computing uses qubits instead of classical bits.")
await memory.store("doc-2", "Machine learning models learn patterns from data.")

results = await memory.retrieve("How do quantum computers work?")
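To make "semantic similarity search" concrete, here is a toy sketch of top-k retrieval by cosine similarity. The embed function is a hypothetical stand-in that counts words; a real vector store would use a learned embedding model, and nothing below reflects how VectorMemory is implemented.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    cleaned = text.lower().replace("?", "").replace(".", "")
    return Counter(cleaned.split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, top_k=1):
    """Return the IDs of the top_k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda doc_id: cosine(q, embed(docs[doc_id])), reverse=True)
    return ranked[:top_k]

docs = {
    "doc-1": "Quantum computing uses qubits instead of classical bits.",
    "doc-2": "Machine learning models learn patterns from data.",
}
top = retrieve("How do quantum computers work?", docs)
print(top)
```

With word counts, only the shared term "quantum" links the query to doc-1; a learned embedding would also relate "computers" to "computing".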

MilvusVectorMemory (Production)

For production workloads, use Milvus:

from sagewai.memory.milvus import MilvusVectorMemory

memory = MilvusVectorMemory(
    collection="knowledge_base",
    uri="http://localhost:19530",
    embedding_model="text-embedding-3-small",
    dimension=1536,
    top_k=5,
)

await memory.initialize()
await memory.store("doc-1", "Quantum computing uses qubits...")
results = await memory.retrieve("quantum computing basics", top_k=3)

Configuration

Parameter        Type  Default                    Description
collection       str   required                   Milvus collection name
uri              str   "http://localhost:19530"   Milvus server URI
embedding_model  str   "text-embedding-3-small"   Embedding model name
dimension        int   1536                       Embedding vector dimension
top_k            int   5                          Default number of results

GraphMemory

Knowledge graph storage for relational data. Use it when queries ask about entity relationships, hierarchies, or structured knowledge rather than semantic similarity.

In-Memory (Development)

from sagewai.memory.graph import GraphMemory

memory = GraphMemory()

await memory.store_relation("Alice", "works_at", "Acme Corp")
await memory.store_relation("Acme Corp", "located_in", "Berlin")
await memory.store_relation("Bob", "works_at", "Acme Corp")

results = await memory.retrieve("Who works at Acme Corp?")
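Relationship queries like the one above reduce to lookups over (subject, predicate, object) triples. The following sketch uses a hypothetical TripleStore class to show a single-hop lookup and a two-hop traversal; it does not use Sagewai's GraphMemory API.

```python
from collections import defaultdict

class TripleStore:
    """Hypothetical in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self._by_object = defaultdict(set)   # (predicate, object) -> subjects
        self._by_subject = defaultdict(set)  # (subject, predicate) -> objects

    def add(self, subject, predicate, obj):
        self._by_object[(predicate, obj)].add(subject)
        self._by_subject[(subject, predicate)].add(obj)

    def subjects(self, predicate, obj):
        """Who or what stands in `predicate` relation to `obj`?"""
        return sorted(self._by_object.get((predicate, obj), set()))

    def objects(self, subject, predicate):
        """What does `subject` relate to via `predicate`?"""
        return sorted(self._by_subject.get((subject, predicate), set()))

store = TripleStore()
store.add("Alice", "works_at", "Acme Corp")
store.add("Acme Corp", "located_in", "Berlin")
store.add("Bob", "works_at", "Acme Corp")

# Single hop: who works at Acme Corp?
employees = store.subjects("works_at", "Acme Corp")

# Two hops: in which city is Alice's employer located?
cities = [
    city
    for org in store.objects("Alice", "works_at")
    for city in store.objects(org, "located_in")
]
print(employees, cities)
```

The two-hop query is the kind of question that similarity search alone tends to miss, because no single stored sentence connects Alice to Berlin.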

NebulaGraphMemory (Production)

For production, use NebulaGraph for persistent storage with temporal fact tracking:

from sagewai.memory.nebula import NebulaGraphMemory

memory = NebulaGraphMemory(
    space="knowledge",
    hosts="127.0.0.1:9669",
    user="root",
    password="nebula",
)

await memory.initialize()
await memory.store_relation("Alice", "manages", "Project X")
results = await memory.retrieve("What does Alice manage?")

NebulaGraph tracks facts over time using valid_from and superseded_at timestamps, so you can store and query the history of a relationship, not just its current value.
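The idea behind valid_from and superseded_at can be sketched in plain Python. The Fact and FactHistory classes below are hypothetical, and the sketch assumes each (subject, predicate) pair holds one value at a time; the actual storage schema may differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    superseded_at: Optional[datetime] = None  # None means still current

class FactHistory:
    """Hypothetical append-only log of time-bounded facts."""

    def __init__(self):
        self.facts = []

    def assert_fact(self, subject, predicate, obj, at):
        # Close the currently open fact for this subject and predicate, if any
        for fact in self.facts:
            same_key = (fact.subject, fact.predicate) == (subject, predicate)
            if same_key and fact.superseded_at is None:
                fact.superseded_at = at
        self.facts.append(Fact(subject, predicate, obj, valid_from=at))

    def value_at(self, subject, predicate, at):
        # Return the value whose validity interval contains `at`
        for fact in self.facts:
            if (fact.subject, fact.predicate) != (subject, predicate):
                continue
            still_open = fact.superseded_at is None or at < fact.superseded_at
            if fact.valid_from <= at and still_open:
                return fact.obj
        return None

history = FactHistory()
history.assert_fact("Alice", "manages", "Project X", datetime(2024, 1, 1))
history.assert_fact("Alice", "manages", "Project Y", datetime(2024, 6, 1))

past = history.value_at("Alice", "manages", datetime(2024, 3, 1))
current = history.value_at("Alice", "manages", datetime(2024, 9, 1))
print(past, current)
```

Old facts are closed rather than deleted, which is what makes point-in-time queries possible.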


RAGEngine

RAGEngine orchestrates the vector and graph backends for hybrid retrieval. Its QueryRouter classifies each incoming query and routes it to the vector backend, the graph backend, or both.

from sagewai.memory.rag import RAGEngine, RetrievalStrategy
from sagewai.memory.milvus import MilvusVectorMemory
from sagewai.memory.nebula import NebulaGraphMemory

rag = RAGEngine(
    vector=MilvusVectorMemory(collection="articles"),
    graph=NebulaGraphMemory(space="knowledge"),
    strategy=RetrievalStrategy.HYBRID,
)

Retrieval Strategies

Strategy     Behavior
VECTOR_ONLY  Only use vector similarity search
GRAPH_ONLY   Only use graph relationship traversal
HYBRID       Use both and merge results (recommended)

Using RAG with Agents

Pass the RAG engine as the memory parameter:

from sagewai import UniversalAgent

agent = UniversalAgent(
    name="rag-agent",
    model="gpt-4o",
    memory=rag,
)

response = await agent.chat("What are the key findings from our Q4 report?")

Episodic Memory

Episodic memory stores structured records of completed tasks: the goal, context used, actions taken, outcome, and lessons. On future similar tasks, the agent retrieves relevant past episodes to inform its approach.

from sagewai.context import EpisodeStore, Episode, InMemoryVectorStore

store = EpisodeStore(vector_store=InMemoryVectorStore())

# Capture a completed task
episode = Episode(
    goal="Audit Q3 financial statements",
    actions_taken=["Retrieved P&L data", "Compared against budget"],
    outcome="Found 3 discrepancies in marketing spend",
    lessons=["Always cross-reference with bank statements"],
    success=True,
)
await store.capture(episode)

# Later: retrieve relevant past experience
similar = await store.retrieve("audit financial statements", top_k=3)

For episodes that survive restarts, use PersistentEpisodeStore, which delegates storage to a ContextEngine instance.


MemoryWriter

MemoryWriter periodically extracts key facts from conversation history (every extract_every_n_turns turns) and stores them in the memory backend. Use it in long-running conversations so that important context is not lost during compaction.

from sagewai.core.memory_writer import MemoryWriter

writer = MemoryWriter(
    model="gpt-4o-mini",
    extract_every_n_turns=5,
)

if writer.should_extract(turn_count=10):
    facts = await writer.extract_and_store(messages, memory)

A small, fast model does the extraction, and extract_every_n_turns controls how often it runs.
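One plausible reading of the interval check is that extraction fires on every positive multiple of the configured turn count. The helper below, is_extraction_turn, is a hypothetical illustration of that assumption and may not match the exact rule MemoryWriter.should_extract applies.

```python
def is_extraction_turn(turn_count, every_n_turns=5):
    """Assumed rule: extract on turns 5, 10, 15, ... when every_n_turns is 5."""
    return turn_count > 0 and turn_count % every_n_turns == 0

# Which of the first 15 turns would trigger an extraction under this rule?
extraction_turns = [t for t in range(1, 16) if is_extraction_turn(t)]
print(extraction_turns)
```

A smaller interval captures facts sooner but spends more calls on the extraction model.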


QueryRouter

QueryRouter classifies a query as factual, relational, or hybrid; the label determines which backend serves it:

from sagewai.memory.query_router import QueryRouter

router = QueryRouter()

router.classify("What is quantum computing?")      # "factual" -> VectorMemory
router.classify("Who manages Project X?")           # "relational" -> GraphMemory
router.classify("Team leads working on AI?")        # "hybrid" -> both

RAGEngine uses QueryRouter internally. You can also call it directly to route queries yourself.
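If you route queries yourself, a simple keyword heuristic shows the general shape of such a classifier. The cue word sets and the classify function below are assumptions made for illustration; the real QueryRouter may use entirely different logic, and this heuristic will not reproduce its labels for every query.

```python
# Hypothetical cue words; a production router would need something more robust
RELATIONAL_CUES = {"who", "manages", "reports", "works", "owns", "leads"}
FACTUAL_CUES = {"what", "how", "why", "explain", "define"}

def classify(query):
    """Label a query as 'factual', 'relational', or 'hybrid' by cue words."""
    words = set(query.lower().rstrip("?").split())
    relational = bool(words & RELATIONAL_CUES)
    factual = bool(words & FACTUAL_CUES)
    if relational and factual:
        return "hybrid"      # needs both backends
    if relational:
        return "relational"  # graph backend
    return "factual"         # vector backend (also the fallback)

labels = [
    classify("What is quantum computing?"),
    classify("Who manages Project X?"),
    classify("How does the team Alice manages deploy models?"),
]
print(labels)
```

Falling back to "factual" keeps unmatched queries on the vector path, which is usually the safer default for free-form questions.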


Best Practices

  1. Start with in-memory stores for development and testing, then switch to Milvus/NebulaGraph before going to production.

  2. Use the Context Engine when you need document ingestion, scoped access, and lifecycle management. See the Context Engine page for details.

  3. Use RetrievalStrategy.HYBRID when you have both factual documents and relational data. The RAG engine merges results from both backends using reciprocal rank fusion.

  4. Set top_k carefully: too few results may miss relevant context, while too many may overflow the LLM's context window. Start with 5 and adjust based on output quality.

  5. Use MemoryWriter for long-running conversations to keep important facts in scope even after context compaction.

  6. Enable episodic memory for agents that run the same types of tasks repeatedly. Past outcomes inform future runs without any additional training.
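The top_k trade-off in item 4 can also be handled by trimming retrieved passages to a token budget before they reach the prompt. The sketch below is a hypothetical helper, not part of Sagewai, and it approximates tokens by whitespace-separated words; a real tokenizer would give different counts.

```python
def fit_to_budget(passages, max_tokens, count_tokens=lambda text: len(text.split())):
    """Keep the highest-ranked passages whose combined size fits the budget."""
    kept, used = [], 0
    for passage in passages:  # assumed to be sorted best-first
        cost = count_tokens(passage)
        if used + cost > max_tokens:
            break  # stop at the first passage that would overflow the budget
        kept.append(passage)
        used += cost
    return kept

ranked = [
    "Qubits can hold superpositions of 0 and 1.",    # 8 words
    "Entanglement links the states of two qubits.",  # 7 words
    "Error correction remains an open challenge.",   # 6 words
]
selected = fit_to_budget(ranked, max_tokens=16)
print(len(selected))
```

With a budget in place, you can retrieve a generous top_k and let the budget decide how much actually reaches the model.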


What's Next

  • Context Engine — Production-grade document ingestion, scoped access, and multi-strategy retrieval
  • Directives — Inline @context and @memory directives for prompt-level retrieval
  • Safety — HallucinationGuard validates responses against RAG context