LiteLLM Proxy Integration
This guide shows you how to run LiteLLM Proxy as a shared gateway and point Sagewai agents at it. Every LLM call from Claude Code, Cursor, and your agents flows through a single endpoint. You get unified spend tracking, model access control, and rate limiting without changing any application code.
Prerequisites: Docker or Podman installed, Sagewai SDK installed, at least one provider API key.
Architecture
+----------------------------------------------------+
| Your Organization |
| |
| +---------+ +---------+ +------------------+ |
| | Claude | | Cursor | | Sagewai Agents | |
| | Code | | | | (orchestration) | |
| +----+----+ +----+----+ +--------+---------+ |
| | | | |
| +------------+----------------+ |
| | |
| +---------v----------+ |
| | LiteLLM Proxy | <-- Virtual Keys |
| | (AI Gateway) | Budget Limits |
| | | Model Routing |
| +---------+----------+ |
| | |
+--------------------+--------------------------------+
|
+-----------+-----------+
v v v
+---------+ +---------+ +---------+
| OpenAI | | Claude | | Gemini |
| GPT-4o | | Sonnet | | Flash |
+---------+ +---------+ +---------+
Sagewai uses LiteLLM as its inference backend via litellm.acompletion. Pointing agents at a LiteLLM Proxy requires only setting api_base — no code changes beyond that.
Quick start
1. Deploy LiteLLM Proxy
# docker-compose.litellm.yml
services:
litellm:
image: ghcr.io/berriai/litellm:main-latest
ports:
- "4000:4000"
volumes:
- ./litellm-config.yaml:/app/config.yaml
environment:
LITELLM_MASTER_KEY: sk-your-master-key
DATABASE_URL: postgresql://postgres:secret@postgres:5432/litellm
command: --config /app/config.yaml
# litellm-config.yaml
model_list:
- model_name: gpt-4o
litellm_params:
model: gpt-4o
api_key: os.environ/OPENAI_API_KEY
- model_name: claude-sonnet
litellm_params:
model: claude-sonnet-4-5-20250929
api_key: os.environ/ANTHROPIC_API_KEY
- model_name: gemini-flash
litellm_params:
model: gemini/gemini-2.0-flash
api_key: os.environ/GOOGLE_API_KEY
general_settings:
master_key: sk-your-master-key
2. Point Sagewai at the proxy
Set api_base on any UniversalAgent to redirect its LLM calls through the proxy:
from sagewai import UniversalAgent, AgentConfig, InferenceConfig
agent = UniversalAgent(
name="my-agent",
config=AgentConfig(
inference=InferenceConfig(
model="gpt-4o",
api_base="http://litellm:4000", # proxy URL
api_key="sk-your-master-key", # proxy key
)
),
)
To apply this to all agents without modifying each one, set environment variables instead:
LITELLM_PROXY_URL=http://litellm:4000
LITELLM_API_KEY=sk-your-master-key
3. Verify connectivity
Use LiteLLMProxyClient to check the proxy is reachable and to see which models are registered:
from sagewai.integrations import LiteLLMProxyClient
client = LiteLLMProxyClient(
proxy_url="http://litellm:4000",
api_key="sk-your-master-key",
)
health = await client.health_check()
# {"healthy": True, "status": 200}
models = await client.list_models()
for m in models:
print(f"{m.model_name} ({m.provider}) - max {m.max_tokens} tokens")
Virtual keys per project
Give each Sagewai project its own LiteLLM virtual key to track spend separately and restrict which models the project can reach.
Create a virtual key on the proxy:
curl -X POST http://litellm:4000/key/generate \
-H "Authorization: Bearer sk-your-master-key" \
-H "Content-Type: application/json" \
-d '{
"models": ["gpt-4o", "claude-sonnet"],
"max_budget": 50.0,
"budget_duration": "30d",
"metadata": {"project": "research-team"}
}'
Configure the project's agents to use that key:
agent = UniversalAgent(
name="research-agent",
config=AgentConfig(
inference=InferenceConfig(
model="gpt-4o",
api_base="http://litellm:4000",
api_key="sk-project-virtual-key", # project-scoped key
)
),
)
Full docker-compose example
A complete stack with Sagewai admin, LiteLLM Proxy, and PostgreSQL:
# docker-compose.yml
services:
postgres:
image: postgres:16
environment:
POSTGRES_DB: sagewai
POSTGRES_PASSWORD: secret
volumes:
- pgdata:/var/lib/postgresql/data
litellm:
image: ghcr.io/berriai/litellm:main-latest
ports:
- "4000:4000"
depends_on:
- postgres
volumes:
- ./litellm-config.yaml:/app/config.yaml
environment:
LITELLM_MASTER_KEY: sk-master-key
DATABASE_URL: postgresql://postgres:secret@postgres:5432/litellm
OPENAI_API_KEY: ${OPENAI_API_KEY}
ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
GOOGLE_API_KEY: ${GOOGLE_API_KEY}
command: --config /app/config.yaml
sagewai-admin:
image: sagewai/admin:latest
ports:
- "8000:8000"
depends_on:
- postgres
- litellm
environment:
DATABASE_URL: postgresql://postgres:secret@postgres:5432/sagewai
LITELLM_PROXY_URL: http://litellm:4000
LITELLM_API_KEY: sk-master-key
sagewai-worker:
image: sagewai/worker:latest
depends_on:
- sagewai-admin
environment:
FLEET_GATEWAY_URL: http://sagewai-admin:8000
ENROLLMENT_KEY: ${ENROLLMENT_KEY}
WORKER_MODELS: gpt-4o,claude-sonnet,gemini-flash
volumes:
pgdata:
Start with:
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
docker compose up -d
Spend monitoring
Retrieve current spend from the proxy using LiteLLMProxyClient:
from sagewai.integrations import LiteLLMProxyClient
client = LiteLLMProxyClient(
proxy_url="http://litellm:4000",
api_key="sk-your-master-key",
)
spend = await client.get_spend()
LiteLLM Proxy tracks spend per virtual key automatically. Combined with Sagewai's check_budget() / record_spend() in BaseAgent, you have two independent layers of cost control.
Model routing strategies
Configure how LiteLLM Proxy distributes requests across providers:
| Strategy | Behavior |
|---|---|
simple-shuffle | Round-robin across provider deployments |
least-busy | Routes to the deployment with the fewest in-flight requests |
latency-based-routing | Routes to the fastest-responding deployment |
cost-based-routing | Routes to the cheapest available deployment |
Set the strategy in litellm-config.yaml:
router_settings:
routing_strategy: cost-based-routing
num_retries: 3
retry_after: 5
allowed_fails: 2
Note: LiteLLM routing and Sagewai fleet routing operate at different levels. LiteLLM decides which LLM provider deployment to call; Sagewai fleet decides which worker machine runs the agent.
Budget governance: two layers
| Layer | Scope | Enforcement |
|---|---|---|
| LiteLLM Proxy | Per virtual key | Hard — proxy rejects requests when budget is exhausted |
| Sagewai SDK | Per project | Soft — check_budget() can warn, throttle, or stop agents |
Set the LiteLLM virtual key budget about 20% above the Sagewai project budget. The SDK provides graceful degradation (warnings, throttling); the proxy acts as a hard ceiling.
# Sagewai budget: $80/month per project
# LiteLLM key budget: $100/month (safety margin)
Fleet workers with LiteLLM
Workers running on your own infrastructure can also route through the proxy. Set api_base in worker credentials so that workers in different pools use different virtual keys:
from sagewai import WorkflowWorker
worker = WorkflowWorker(
project_id="production",
pool="gpu-workers",
labels={"env": "production"},
models=["gpt-4o", "claude-sonnet"],
gateway_url="https://admin.internal:8000",
enrollment_key="wrt-1.eyJ...",
credentials={
"inference_overrides": {
"api_base": "http://litellm:4000",
"api_key": "sk-worker-pool-key",
}
},
)
await worker.start()
inference_overrides are injected via a ContextVar into every UniversalAgent that runs on the worker, so all LLM calls flow through the proxy without any per-agent configuration.
Security considerations
- API keys encrypted at rest: Sagewai stores credentials with Fernet encryption in PostgreSQL. LiteLLM stores provider keys server-side; virtual keys never expose the underlying provider key to callers.
- SSRF protection: The SDK validates proxy URLs before use.
- Network isolation: In production, put the LiteLLM proxy inside your internal network. Agents and workers connect to it; it does not need a public address.
- Virtual key scoping: Each project key has an explicit model allowlist and budget cap. A compromised key cannot reach models outside its scope.
- Master key management: Store the LiteLLM master key in a secrets manager (GCP Secret Manager, AWS Secrets Manager, HashiCorp Vault) and inject it via environment variable.
Troubleshooting
Agents return connection errors
Check that api_base points to the correct proxy URL and port. From the agent's container:
curl http://litellm:4000/health
Models not showing up
Model names in litellm-config.yaml must match what you pass to UniversalAgent. Run LiteLLMProxyClient.list_models() to see exactly what names the proxy exposes.
Spend data is empty
LiteLLM requires a PostgreSQL database to persist spend logs. Confirm DATABASE_URL is set in the LiteLLM container and the database is reachable.
Virtual key budget exceeded
The proxy returns HTTP 429 when a key's budget is exhausted. Increase the key's budget via the LiteLLM admin UI or API, or issue a new key with a higher limit.