Enterprise LLM Gateway with LiteLLM Proxy

Deploy LiteLLM Proxy as your organization's AI gateway with Sagewai as the agent platform. One bill, one governance layer, multiple models.

Why a proxy? Every LLM call from Claude Code, Cursor, and Sagewai agents flows through a single gateway. You get unified spend tracking, model access control, and rate limiting without changing application code.


Architecture

+----------------------------------------------------+
|                  Your Organization                  |
|                                                     |
|  +---------+  +---------+  +------------------+    |
|  | Claude  |  | Cursor  |  | Sagewai Agents   |    |
|  | Code    |  |         |  | (orchestration)  |    |
|  +----+----+  +----+----+  +--------+---------+    |
|       |            |                |               |
|       +------------+----------------+               |
|                    |                                |
|          +---------v----------+                     |
|          |  LiteLLM Proxy     | <-- Virtual Keys    |
|          |  (AI Gateway)      |     Budget Limits   |
|          |                    |     Model Routing   |
|          +---------+----------+                     |
|                    |                                |
+--------------------+--------------------------------+
                     |
         +-----------+-----------+
         v           v           v
    +---------+ +---------+ +---------+
    | OpenAI  | | Claude  | | Gemini  |
    | GPT-4o  | | Sonnet  | | Flash   |
    +---------+ +---------+ +---------+

Sagewai uses LiteLLM as its inference backend (via litellm.acompletion), so pointing every agent at a LiteLLM Proxy requires only setting api_base -- no application code changes.


Quick Start

1. Deploy LiteLLM Proxy

# docker-compose.litellm.yml
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    ports:
      - "4000:4000"
    volumes:
      - ./litellm-config.yaml:/app/config.yaml
    environment:
      LITELLM_MASTER_KEY: sk-your-master-key
      DATABASE_URL: postgresql://postgres:secret@postgres:5432/litellm
    command: --config /app/config.yaml

# litellm-config.yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: gpt-4o
      api_key: os.environ/OPENAI_API_KEY

  - model_name: claude-sonnet
    litellm_params:
      model: claude-sonnet-4-20250514
      api_key: os.environ/ANTHROPIC_API_KEY

  - model_name: gemini-flash
    litellm_params:
      model: gemini/gemini-2.0-flash
      api_key: os.environ/GOOGLE_API_KEY

general_settings:
  master_key: sk-your-master-key

2. Point Sagewai at the Proxy

Every UniversalAgent routes LLM calls through litellm.acompletion. Set api_base to redirect those calls to your proxy:

from sagewai import UniversalAgent, AgentConfig, InferenceConfig

agent = UniversalAgent(
    name="my-agent",
    config=AgentConfig(
        inference=InferenceConfig(
            model="gpt-4o",
            api_base="http://litellm:4000",  # proxy URL
            api_key="sk-your-master-key",    # proxy key
        )
    ),
)

Or via environment variables (applies to all agents):

LITELLM_PROXY_URL=http://litellm:4000
LITELLM_API_KEY=sk-your-master-key

3. Verify Connectivity

Use the built-in LiteLLMProxyClient to check the proxy is reachable and discover available models:

from sagewai.integrations import LiteLLMProxyClient

client = LiteLLMProxyClient(
    proxy_url="http://litellm:4000",
    api_key="sk-your-master-key",
)

health = await client.health_check()
# {"healthy": True, "status": 200}

models = await client.list_models()
for m in models:
    print(f"{m.model_name} ({m.provider}) - max {m.max_tokens} tokens")

Virtual Keys Per Project

Each Sagewai project can get its own LiteLLM virtual key. This gives you:

  • Per-project spend tracking
  • Independent rate limits
  • Model access control (restrict which models a project can use)

Create a virtual key on the LiteLLM proxy, then assign it to a project:

# Create a virtual key on the LiteLLM proxy
curl -X POST http://litellm:4000/key/generate \
  -H "Authorization: Bearer sk-your-master-key" \
  -H "Content-Type: application/json" \
  -d '{
    "models": ["gpt-4o", "claude-sonnet"],
    "max_budget": 50.0,
    "budget_duration": "30d",
    "metadata": {"project": "research-team"}
  }'
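The same key can be created from Python instead of curl. A minimal sketch using only the standard library; `build_key_request` and `generate_virtual_key` are illustrative helpers, not part of the Sagewai SDK:

```python
import json
import urllib.request

def build_key_request(models, max_budget, budget_duration, project):
    """Build the JSON payload for LiteLLM's POST /key/generate endpoint."""
    return {
        "models": models,
        "max_budget": max_budget,
        "budget_duration": budget_duration,
        "metadata": {"project": project},
    }

def generate_virtual_key(proxy_url, master_key, payload):
    """POST the payload to /key/generate and return the parsed response."""
    req = urllib.request.Request(
        f"{proxy_url}/key/generate",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_key_request(
    models=["gpt-4o", "claude-sonnet"],
    max_budget=50.0,
    budget_duration="30d",
    project="research-team",
)
# key = generate_virtual_key("http://litellm:4000", "sk-your-master-key", payload)
```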

Then configure the project's agents to use that key:

agent = UniversalAgent(
    name="research-agent",
    config=AgentConfig(
        inference=InferenceConfig(
            model="gpt-4o",
            api_base="http://litellm:4000",
            api_key="sk-project-virtual-key",  # project-scoped key
        )
    ),
)

Full docker-compose Example

A complete stack with Sagewai admin, LiteLLM Proxy, and Postgres:

# docker-compose.yml
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: sagewai
      POSTGRES_PASSWORD: secret
    volumes:
      - pgdata:/var/lib/postgresql/data

  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    ports:
      - "4000:4000"
    depends_on:
      - postgres
    volumes:
      - ./litellm-config.yaml:/app/config.yaml
    environment:
      LITELLM_MASTER_KEY: sk-master-key
      DATABASE_URL: postgresql://postgres:secret@postgres:5432/litellm
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
      GOOGLE_API_KEY: ${GOOGLE_API_KEY}
    command: --config /app/config.yaml

  sagewai-admin:
    image: sagewai/admin:latest
    ports:
      - "8000:8000"
    depends_on:
      - postgres
      - litellm
    environment:
      DATABASE_URL: postgresql://postgres:secret@postgres:5432/sagewai
      LITELLM_PROXY_URL: http://litellm:4000
      LITELLM_API_KEY: sk-master-key

  sagewai-worker:
    image: sagewai/worker:latest
    depends_on:
      - sagewai-admin
    environment:
      FLEET_GATEWAY_URL: http://sagewai-admin:8000
      ENROLLMENT_KEY: ${ENROLLMENT_KEY}
      WORKER_MODELS: gpt-4o,claude-sonnet,gemini-flash

volumes:
  pgdata:

Start with:

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
docker compose up -d

Spend Monitoring

The LiteLLMProxyClient exposes spend data from the proxy:

from sagewai.integrations import LiteLLMProxyClient

client = LiteLLMProxyClient(
    proxy_url="http://litellm:4000",
    api_key="sk-your-master-key",
)

spend = await client.get_spend()

LiteLLM Proxy tracks spend per virtual key automatically. Combined with Sagewai's built-in budget enforcement (check_budget() / record_spend() in BaseAgent), you get two independent layers of cost control.


Model Routing Strategies

LiteLLM Proxy supports routing strategies that distribute requests across providers:

Strategy               Behavior
simple-shuffle         Round-robin across provider deployments
least-busy             Route to the provider with fewest in-flight requests
latency-based-routing  Route to the fastest responding provider
cost-based-routing     Route to the cheapest available deployment

Configure in litellm-config.yaml:

router_settings:
  routing_strategy: cost-based-routing
  num_retries: 3
  retry_after: 5
  allowed_fails: 2

This is complementary to Sagewai's fleet routing. LiteLLM routes at the LLM provider level (which OpenAI deployment to hit); Sagewai fleet routes at the worker level (which machine runs the agent).


Budget Governance: Two Layers

Layer          Scope            Enforcement
LiteLLM Proxy  Per virtual key  Hard limit -- proxy rejects requests when budget is exhausted
Sagewai SDK    Per project      Soft limit -- check_budget() can warn, throttle, or stop agents

Best practice: set the LiteLLM virtual key budget about 25% above the Sagewai project budget. The SDK handles graceful degradation (throttling, warnings); the proxy acts as a hard safety net.

# Sagewai budget: $80/month per project
# LiteLLM key budget: $100/month (safety margin)
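If you provision keys programmatically, the margin can be applied mechanically. A hypothetical helper (not part of either SDK), using the 25% margin implied by the $80 -> $100 example:

```python
def key_budget_for(project_budget: float, margin: float = 0.25) -> float:
    """LiteLLM virtual key budget: the Sagewai project budget plus a
    safety margin, so the SDK's soft limit always trips before the
    proxy's hard limit."""
    return round(project_budget * (1 + margin), 2)

print(key_budget_for(80.0))  # -> 100.0
```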

Fleet Workers with LiteLLM

Fleet workers running on your own infrastructure can also route through the proxy. Set api_base in worker credentials so that workers in different pools use different virtual keys:

from sagewai import WorkflowWorker

worker = WorkflowWorker(
    project_id="production",
    pool="gpu-workers",
    labels={"env": "production"},
    models=["gpt-4o", "claude-sonnet"],
    gateway_url="https://admin.internal:8000",
    enrollment_key="wrt-1.eyJ...",
    credentials={
        "inference_overrides": {
            "api_base": "http://litellm:4000",
            "api_key": "sk-worker-pool-key",
        }
    },
)
await worker.start()

The worker's inference_overrides are injected into every UniversalAgent that runs on it via a ContextVar, so all LLM calls flow through the proxy.
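The mechanism can be pictured with a plain contextvars sketch. This is conceptual only, using a hypothetical `inference_overrides` variable; Sagewai's internal names and plumbing will differ:

```python
from contextvars import ContextVar

# Worker-level overrides, set once when the worker picks up an agent run.
inference_overrides: ContextVar[dict] = ContextVar("inference_overrides", default={})

def resolve_inference_config(base: dict) -> dict:
    """Merge the agent's own config with any worker-level overrides.
    Overrides win, so every agent on this worker hits the proxy."""
    return {**base, **inference_overrides.get()}

inference_overrides.set(
    {"api_base": "http://litellm:4000", "api_key": "sk-worker-pool-key"}
)
cfg = resolve_inference_config(
    {"model": "gpt-4o", "api_base": "https://api.openai.com/v1"}
)
print(cfg["api_base"])  # -> http://litellm:4000
```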


Security Considerations

  • API keys encrypted at rest: Sagewai stores credentials with Fernet encryption in Postgres. LiteLLM stores provider keys server-side; virtual keys never expose the underlying provider key.
  • SSRF protection: The SDK validates proxy URLs to prevent server-side request forgery.
  • Network isolation: In production, the LiteLLM proxy should only be accessible from your internal network. Agents and workers connect to it; it never needs public exposure.
  • Virtual key scoping: Each project gets a virtual key with model allowlists and budget caps. A compromised key cannot access models outside its scope.
  • Master key management: The LiteLLM master key should be stored in a secrets manager (GCP Secret Manager, AWS Secrets Manager, HashiCorp Vault) and injected via environment variable.

Troubleshooting

Agents return connection errors

Check that api_base points to the correct proxy URL and port. From the agent's container:

curl http://litellm:4000/health

Models not showing up

Ensure the model names in litellm-config.yaml match what you pass to UniversalAgent. Use LiteLLMProxyClient.list_models() to see what the proxy exposes.

Spend data is empty

LiteLLM requires a Postgres database to persist spend logs. Verify DATABASE_URL is set in the LiteLLM container and the database is reachable.

Virtual key budget exceeded

The proxy returns HTTP 429 when a virtual key's budget is exhausted. Increase the key's budget via the LiteLLM admin UI or API, or create a new key.
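Client code can treat that 429 as a back-off signal rather than a hard failure. A hedged sketch (`call_with_budget_backoff` is illustrative, not an SDK function; adapt it to whatever your HTTP client returns):

```python
import time

def call_with_budget_backoff(send, max_retries=3, base_delay=1.0):
    """Call send(); on HTTP 429 (budget exhausted), wait with
    exponential backoff and retry up to max_retries times."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_retries:
            time.sleep(base_delay * (2 ** attempt))
    return status, body

# Fake transport: budget exhausted once, then a successful completion.
responses = iter([(429, "budget exceeded"), (200, "ok")])
print(call_with_budget_backoff(lambda: next(responses), base_delay=0))  # -> (200, 'ok')
```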