Memory & RAG
Memory gives agents access to information beyond the current conversation. Sagewai layers four mechanisms: conversation history for short-term context, vector and graph stores for long-term retrieval, a hybrid RAG engine that combines both, and the Context Engine for production document management with scoped access.
Prerequisites: Agents · Next: Context Engine · Directives
Architecture Overview
User Query
|
v
ContextEngine / RAGEngine (orchestrator)
|
+---> Vector search (semantic similarity)
+---> BM25 search (keyword matching)
+---> Graph search (relationship traversal)
|
v
Reciprocal Rank Fusion (merge results)
|
v
Optional re-ranking (cross-encoder)
|
v
Retrieved context injected into agent messages
Memory integrates directly into BaseAgent. When you set the memory parameter, the agent retrieves relevant context and injects it before each LLM call — no extra code on your side.
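The Reciprocal Rank Fusion step in the diagram can be illustrated with a small standalone sketch. It is not Sagewai's implementation: the function name reciprocal_rank_fusion is hypothetical, and the constant k=60 is a commonly used default assumed here, not a documented Sagewai setting.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists; each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the three search paths in the diagram.
vector_hits = ["doc-1", "doc-2", "doc-3"]
bm25_hits = ["doc-2", "doc-1", "doc-4"]
graph_hits = ["doc-2", "doc-5"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits, graph_hits])
print(fused)
```

Because the fusion works on ranks instead of raw scores, the three backends do not need comparable scoring scales; a document that ranks well in several lists rises to the top.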
ConversationManager
For multi-turn conversations where the agent needs to remember previous exchanges, ConversationManager handles state automatically:
from sagewai import UniversalAgent
from sagewai.core.conversation import ConversationManager
agent = UniversalAgent(name="tutor", model="gpt-4o")
manager = ConversationManager(agent=agent)
await manager.send("What is 2 + 2?") # "4"
await manager.send("Now multiply that by 3") # "12" — remembers context
If you prefer to manage message history yourself, use chat_with_history() directly:
from sagewai import UniversalAgent, ChatMessage
agent = UniversalAgent(name="tutor", model="gpt-4o")
messages = [
    ChatMessage.system("You are a helpful math tutor."),
    ChatMessage.user("What is 2 + 2?"),
]
response = await agent.chat_with_history(messages)
messages.append(response)
messages.append(ChatMessage.user("Now multiply that by 3"))
response = await agent.chat_with_history(messages)
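Both approaches follow the same pattern: append the user turn, send the full history, then record the reply. The sketch below shows that pattern with a stand-in agent so it runs without any model access. Message, EchoAgent, and HistoryKeeper are hypothetical names for illustration, not Sagewai classes.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str
    content: str


class EchoAgent:
    """Stand-in for a real agent: reports how many messages it was given."""

    async def chat_with_history(self, messages: list[Message]) -> Message:
        return Message("assistant", f"saw {len(messages)} messages")


@dataclass
class HistoryKeeper:
    agent: EchoAgent
    messages: list[Message] = field(default_factory=list)

    async def send(self, text: str) -> str:
        # Append the user turn, send the full history, then record the reply.
        self.messages.append(Message("user", text))
        reply = await self.agent.chat_with_history(self.messages)
        self.messages.append(reply)
        return reply.content


async def demo() -> list[str]:
    keeper = HistoryKeeper(agent=EchoAgent())
    first = await keeper.send("What is 2 + 2?")
    second = await keeper.send("Now multiply that by 3")
    return [first, second]


replies = asyncio.run(demo())
print(replies)
```

The second call sees three messages (user, assistant, user), which is what lets a real model resolve "that" in the follow-up question.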
VectorMemory
Semantic similarity search over embedding vectors. Use it for retrieving documents, passages, and factual knowledge.
In-Memory (Development)
from sagewai.memory.vector import VectorMemory
memory = VectorMemory()
await memory.store("doc-1", "Quantum computing uses qubits instead of classical bits.")
await memory.store("doc-2", "Machine learning models learn patterns from data.")
results = await memory.retrieve("How do quantum computers work?")
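To make "semantic similarity search" concrete, here is a conceptual sketch of ranking documents by cosine similarity. It is an assumption-laden toy, not how VectorMemory is implemented: the embed function uses word counts in place of a learned embedding model, so it only captures word overlap.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. Real systems use a learned model."""
    return Counter(text.lower().replace("?", "").replace(".", "").split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = {
    "doc-1": "Quantum computing uses qubits instead of classical bits.",
    "doc-2": "Machine learning models learn patterns from data.",
}


def retrieve(query: str, top_k: int = 1) -> list[str]:
    """Return the IDs of the top_k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        docs,
        key=lambda doc_id: cosine(query_vec, embed(docs[doc_id])),
        reverse=True,
    )
    return ranked[:top_k]


best = retrieve("How do quantum computers work?")
print(best)
```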
MilvusVectorMemory (Production)
For production workloads, use Milvus:
from sagewai.memory.milvus import MilvusVectorMemory
memory = MilvusVectorMemory(
    collection="knowledge_base",
    uri="http://localhost:19530",
    embedding_model="text-embedding-3-small",
    dimension=1536,
    top_k=5,
)
await memory.initialize()
await memory.store("doc-1", "Quantum computing uses qubits...")
results = await memory.retrieve("quantum computing basics", top_k=3)
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| collection | str | required | Milvus collection name |
| uri | str | "http://localhost:19530" | Milvus server URI |
| embedding_model | str | "text-embedding-3-small" | Embedding model name |
| dimension | int | 1536 | Embedding vector dimension |
| top_k | int | 5 | Default number of results |
GraphMemory
Knowledge graph storage for relational data. Use it when queries ask about entity relationships, hierarchies, or structured knowledge rather than semantic similarity.
In-Memory (Development)
from sagewai.memory.graph import GraphMemory
memory = GraphMemory()
await memory.store_relation("Alice", "works_at", "Acme Corp")
await memory.store_relation("Acme Corp", "located_in", "Berlin")
await memory.store_relation("Bob", "works_at", "Acme Corp")
results = await memory.retrieve("Who works at Acme Corp?")
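The relations stored above are (subject, predicate, object) triples. As a rough sketch of why a graph store suits relationship questions, the hypothetical TripleStore below indexes triples in both directions and answers a two-hop query. It is an illustration only and makes no claim about GraphMemory internals.

```python
from collections import defaultdict


class TripleStore:
    """Hypothetical in-memory store of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self._by_subject: dict[tuple[str, str], list[str]] = defaultdict(list)
        self._by_object: dict[tuple[str, str], list[str]] = defaultdict(list)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        # Index each triple in both directions for cheap lookups.
        self._by_subject[(subject, predicate)].append(obj)
        self._by_object[(predicate, obj)].append(subject)

    def subjects(self, predicate: str, obj: str) -> list[str]:
        """Which subjects have this relation to obj?"""
        return list(self._by_object.get((predicate, obj), []))

    def objects(self, subject: str, predicate: str) -> list[str]:
        """Which objects does subject have this relation to?"""
        return list(self._by_subject.get((subject, predicate), []))


store = TripleStore()
store.add("Alice", "works_at", "Acme Corp")
store.add("Acme Corp", "located_in", "Berlin")
store.add("Bob", "works_at", "Acme Corp")

employees = store.subjects("works_at", "Acme Corp")
# Two-hop traversal: in which cities are Alice's employers located?
cities = [
    city
    for org in store.objects("Alice", "works_at")
    for city in store.objects(org, "located_in")
]
print(employees, cities)
```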
NebulaGraphMemory (Production)
For production, use NebulaGraph for persistent storage with temporal fact tracking:
from sagewai.memory.nebula import NebulaGraphMemory
memory = NebulaGraphMemory(
    space="knowledge",
    hosts="127.0.0.1:9669",
    user="root",
    password="nebula",
)
await memory.initialize()
await memory.store_relation("Alice", "manages", "Project X")
results = await memory.retrieve("What does Alice manage?")
NebulaGraph tracks facts over time using valid_from and superseded_at timestamps — so you can store and query the history of a relationship, not just its current value.
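A minimal sketch of how valid_from and superseded_at timestamps can represent relationship history follows. Fact and FactLog are hypothetical names, and the sketch assumes each (subject, predicate) pair holds one value at a time; the actual NebulaGraphMemory schema may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    superseded_at: Optional[datetime] = None


class FactLog:
    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def assert_fact(self, subject: str, predicate: str, obj: str, at: datetime) -> None:
        # Close any open fact for the same subject and predicate, then add the new one.
        for fact in self.facts:
            if (fact.subject, fact.predicate) == (subject, predicate) and fact.superseded_at is None:
                fact.superseded_at = at
        self.facts.append(Fact(subject, predicate, obj, valid_from=at))

    def value_at(self, subject: str, predicate: str, at: datetime) -> Optional[str]:
        """Return the value that was valid at the given time, if any."""
        for fact in self.facts:
            started = fact.valid_from <= at
            still_open = fact.superseded_at is None or at < fact.superseded_at
            if (fact.subject, fact.predicate) == (subject, predicate) and started and still_open:
                return fact.obj
        return None


def utc(year: int, month: int) -> datetime:
    return datetime(year, month, 1, tzinfo=timezone.utc)


log = FactLog()
log.assert_fact("Alice", "manages", "Project X", utc(2023, 1))
log.assert_fact("Alice", "manages", "Project Y", utc(2024, 6))

past = log.value_at("Alice", "manages", utc(2023, 9))
current = log.value_at("Alice", "manages", utc(2025, 1))
print(past, current)
```

Superseding a fact instead of overwriting it is what keeps the earlier value queryable.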
RAGEngine
RAGEngine orchestrates both vector and graph backends for hybrid retrieval. Its QueryRouter classifies each incoming query and routes it to the right backend — or both.
from sagewai.memory.rag import RAGEngine, RetrievalStrategy
from sagewai.memory.milvus import MilvusVectorMemory
from sagewai.memory.nebula import NebulaGraphMemory
rag = RAGEngine(
    vector=MilvusVectorMemory(collection="articles"),
    graph=NebulaGraphMemory(space="knowledge"),
    strategy=RetrievalStrategy.HYBRID,
)
Retrieval Strategies
| Strategy | Behavior |
|---|---|
| VECTOR_ONLY | Only use vector similarity search |
| GRAPH_ONLY | Only use graph relationship traversal |
| HYBRID | Use both and merge results (recommended) |
Using RAG with Agents
Pass the RAG engine as the memory parameter:
from sagewai import UniversalAgent
agent = UniversalAgent(
    name="rag-agent",
    model="gpt-4o",
    memory=rag,
)
response = await agent.chat("What are the key findings from our Q4 report?")
Episodic Memory
Episodic memory stores structured records of completed tasks: the goal, context used, actions taken, outcome, and lessons. On future similar tasks, the agent retrieves relevant past episodes to inform its approach.
from sagewai.context import EpisodeStore, Episode, InMemoryVectorStore
store = EpisodeStore(vector_store=InMemoryVectorStore())
# Capture a completed task
episode = Episode(
    goal="Audit Q3 financial statements",
    actions_taken=["Retrieved P&L data", "Compared against budget"],
    outcome="Found 3 discrepancies in marketing spend",
    lessons=["Always cross-reference with bank statements"],
    success=True,
)
await store.capture(episode)
# Later: retrieve relevant past experience
similar = await store.retrieve("audit financial statements", top_k=3)
For episodes that survive restarts, use PersistentEpisodeStore, which delegates storage to a ContextEngine instance.
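One way retrieved episodes could inform a later run is by rendering their lessons into a short preamble for the prompt. The sketch below assumes this approach for illustration; PastEpisode is a stand-in carrying a subset of the Episode fields shown above, and lessons_preamble is a hypothetical helper, not part of Sagewai.

```python
from dataclasses import dataclass, field


@dataclass
class PastEpisode:
    """Stand-in with a subset of the fields from the Episode example."""

    goal: str
    outcome: str
    lessons: list[str] = field(default_factory=list)
    success: bool = True


def lessons_preamble(episodes: list[PastEpisode]) -> str:
    """Format past episodes as plain text suitable for a system message."""
    lines = ["Lessons from similar past tasks:"]
    for episode in episodes:
        status = "succeeded" if episode.success else "failed"
        lines.append(f"- {episode.goal} ({status}): {episode.outcome}")
        lines.extend(f"  * {lesson}" for lesson in episode.lessons)
    return "\n".join(lines)


similar = [
    PastEpisode(
        goal="Audit Q3 financial statements",
        outcome="Found 3 discrepancies in marketing spend",
        lessons=["Always cross-reference with bank statements"],
    )
]
preamble = lessons_preamble(similar)
print(preamble)
```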
MemoryWriter
MemoryWriter extracts key facts from conversation history on a schedule and stores them in the memory backend. Use it for long-running conversations to prevent important context from being lost during compaction.
from sagewai.core.memory_writer import MemoryWriter
writer = MemoryWriter(
    model="gpt-4o-mini",
    extract_every_n_turns=5,
)
# Extract only when the turn count says an extraction is due.
if writer.should_extract(turn_count=10):
    facts = await writer.extract_and_store(messages, memory)
A small, fast model does the extraction — you control how often it runs by setting extract_every_n_turns.
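The cadence check can be pictured as a simple modulo test on the turn count. This is an assumption about the behavior, not the MemoryWriter source: the real should_extract may, for example, track the turn of the last extraction instead. ExtractionSchedule is a hypothetical stand-in.

```python
from dataclasses import dataclass


@dataclass
class ExtractionSchedule:
    """Hypothetical stand-in for the cadence check, not the MemoryWriter source."""

    extract_every_n_turns: int = 5

    def should_extract(self, turn_count: int) -> bool:
        # Fire on every n-th turn, never on turn 0.
        return turn_count > 0 and turn_count % self.extract_every_n_turns == 0


schedule = ExtractionSchedule(extract_every_n_turns=5)
due_turns = [turn for turn in range(1, 16) if schedule.should_extract(turn)]
print(due_turns)
```

A smaller interval captures facts sooner at the cost of more extraction calls; a larger one does the opposite.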
QueryRouter
QueryRouter classifies a query and returns the best retrieval strategy for it:
from sagewai.memory.query_router import QueryRouter
router = QueryRouter()
router.classify("What is quantum computing?") # "factual" -> VectorMemory
router.classify("Who manages Project X?") # "relational" -> GraphMemory
router.classify("Team leads working on AI?") # "hybrid" -> both
RAGEngine uses QueryRouter internally. You can also call it directly to route queries yourself.
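If you route queries yourself, the labels in the example above need to be mapped to backends. The sketch below shows one way to do that dispatch. Strategy is a local stand-in enum mirroring the names in the Retrieval Strategies table, and route, fake_vector, and fake_graph are hypothetical; the sketch does not attempt the classification itself.

```python
from enum import Enum
from typing import Callable

Search = Callable[[str], list[str]]


class Strategy(Enum):
    VECTOR_ONLY = "vector_only"
    GRAPH_ONLY = "graph_only"
    HYBRID = "hybrid"


# Mapping taken from the comments in the classify() example.
LABEL_TO_STRATEGY = {
    "factual": Strategy.VECTOR_ONLY,
    "relational": Strategy.GRAPH_ONLY,
    "hybrid": Strategy.HYBRID,
}


def route(label: str, query: str, vector_search: Search, graph_search: Search) -> list[str]:
    """Call the backend(s) that the label's strategy selects."""
    strategy = LABEL_TO_STRATEGY[label]
    results: list[str] = []
    if strategy in (Strategy.VECTOR_ONLY, Strategy.HYBRID):
        results += vector_search(query)
    if strategy in (Strategy.GRAPH_ONLY, Strategy.HYBRID):
        results += graph_search(query)
    return results


def fake_vector(query: str) -> list[str]:
    return [f"vector:{query}"]


def fake_graph(query: str) -> list[str]:
    return [f"graph:{query}"]


hybrid_hits = route("hybrid", "Team leads working on AI?", fake_vector, fake_graph)
print(hybrid_hits)
```

This version simply concatenates hybrid results; a production setup would merge them with a rank-aware method such as reciprocal rank fusion.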
Best Practices
- Start with in-memory stores for development and testing, then switch to Milvus/NebulaGraph before going to production.
- Use the Context Engine when you need document ingestion, scoped access, and lifecycle management. See the Context Engine page for details.
- Use RetrievalStrategy.HYBRID when you have both factual documents and relational data. The RAG engine merges results from both backends using reciprocal rank fusion.
- Set top_k carefully: too few results may miss relevant context; too many may push past the LLM's context window. Start with 5 and adjust based on output quality.
- Use MemoryWriter for long-running conversations to keep important facts in scope even after context compaction.
- Enable episodic memory for agents that run the same types of tasks repeatedly. Past outcomes inform future runs without any additional training.
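The top_k trade-off in the list above can be made concrete by capping retrieved passages against a token budget. This is a sketch under stated assumptions: fit_to_budget is a hypothetical helper, and the four-characters-per-token estimate is a rough heuristic that a real tokenizer should replace.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: about four characters per token."""
    return max(1, len(text) // 4)


def fit_to_budget(passages: list[str], max_tokens: int) -> list[str]:
    """Keep passages in rank order until the next one would exceed the budget."""
    kept: list[str] = []
    used = 0
    for passage in passages:
        cost = estimate_tokens(passage)
        if used + cost > max_tokens:
            break
        kept.append(passage)
        used += cost
    return kept


# Three ranked passages of roughly 100 estimated tokens each.
ranked = ["a" * 400, "b" * 400, "c" * 400]
context = fit_to_budget(ranked, max_tokens=250)
print(len(context))
```

With a guard like this, a generous top_k degrades gracefully: lower-ranked passages are dropped before the prompt overflows.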
What's Next
- Context Engine: Production-grade document ingestion, scoped access, and multi-strategy retrieval
- Directives: Inline @context and @memory directives for prompt-level retrieval
- Safety: HallucinationGuard validates responses against RAG context