Fleet Architecture

A deep-dive into how the Sagewai Fleet system is designed — enrollment flow, task dispatch, heartbeat management, security layers, multi-project isolation, and production deployment.

For the operator how-to (deploy workers, enrollment steps, pool configuration), see Fleet Deployment. For a conceptual overview, see Fleet.


System overview

┌─────────────────────────────────────────────────────────────────┐
│                        Sagewai Cloud                            │
│                                                                 │
│  ┌──────────────┐    ┌─────────────────┐    ┌───────────────┐  │
│  │ Admin Panel  │    │  Fleet REST API  │    │  Task Queue   │  │
│  │ (approval,   │◄──►│  /api/v1/fleet/  │◄──►│ (Postgres     │  │
│  │  monitoring) │    │                 │    │  SKIP LOCKED) │  │
│  └──────────────┘    └────────┬────────┘    └───────┬───────┘  │
│                               │                     │          │
│                    ┌──────────▼──────────┐           │          │
│                    │   FleetRegistry     │           │          │
│                    │  (worker state,     │           │          │
│                    │   enrollment keys,  │           │          │
│                    │   approval FSM)     │           │          │
│                    └──────────┬──────────┘           │          │
│                               │                     │          │
│                    ┌──────────▼──────────┐  ┌────────▼───────┐ │
│                    │   WRTTokenManager   │  │ FleetDispatcher│ │
│                    │  (JWT issuance,     │  │ (long-poll,    │ │
│                    │   JTI revocation)   │  │  encryption,   │ │
│                    └─────────────────────┘  │  audit)        │ │
│                                             └────────┬───────┘ │
└──────────────────────────────────────────────────────┼─────────┘
                                                       │ HTTPS / WRT
                              ─────────────────────────┤
                             │                         │
               ┌─────────────▼──────┐    ┌─────────────▼──────┐
               │   Worker A          │    │   Worker B          │
               │   pool: gpu-prod    │    │   pool: default     │
               │   models: llama3    │    │   models: gpt-4o    │
               │   labels: gpu=true  │    │   labels: env=prod  │
               │   (your data center)│    │   (your K8s cluster)│
               └─────────────────────┘    └─────────────────────┘

Component reference

ComponentLocationResponsibility
FleetRegistrysagewai/fleet/registry.pyWorker registration, approval state machine, enrollment key CRUD (SHA-256 hashed keys)
WRTTokenManagersagewai/fleet/tokens.pyIssuing and verifying Worker Registration Tokens (JWT, wrt-1. prefix, JTI revocation list)
FleetDispatchersagewai/fleet/dispatcher.pyLong-poll task claiming, Fernet payload encryption, dispatch audit events
FleetAuditEventsagewai/fleet/audit.pyAppend-only audit trail with 13 event types (registered, approved, revoked, task_claimed, etc.)
FleetAnomalyDetectorsagewai/fleet/anomaly.pyDetects rate anomalies, excessive failures, heartbeat timeouts, model mismatches; auto-revokes
LLMHealthProbesagewai/fleet/probe.pyProbes Ollama and OpenAI-compatible endpoints to verify model availability
MTLSVerifiersagewai/fleet/mtls.pyMutual TLS verification for Enterprise tier (planned)
ModelNormalizersagewai/fleet/models.pyNormalizes model name variants (e.g. gpt4o, gpt-4o, openai/gpt-4o → canonical form)

Enrollment flow

Worker starts
    │
    ▼
Read ENROLLMENT_KEY env var (wrt-1.eyJ...)
    │
    ▼
POST /api/v1/fleet/register
  body: { pool, labels, models, capabilities }
    │
    ▼
FleetRegistry.register_worker()
  → store worker record with status=PENDING
  → emit WORKER_REGISTERED audit event
    │
    ▼
Admin sees worker in panel (status: PENDING)
    │
    ▼
Admin approves → PATCH /api/v1/fleet/workers/{id}/approve
  → FleetRegistry transitions: PENDING → APPROVED
  → WRTTokenManager issues short-lived task token
  → emit WORKER_APPROVED audit event
    │
    ▼
Worker polls for tasks (approved status required)

Approval states: PENDINGAPPROVEDREVOKED (one-way; revoked workers cannot re-register with the same key).


Task dispatch flow

Workflow step ready for execution
    │
    ▼
FleetDispatcher.claim_task()
  SELECT ... FOR UPDATE SKIP LOCKED
  WHERE pool = $target_pool
    AND labels @> $target_labels      (JSONB containment)
    AND models_canonical @> $model    (GIN index on TEXT[])
    │
    ▼
Task payload encrypted with org's Fernet key
  → only workers in that org can decrypt
    │
    ▼
Worker receives encrypted task over HTTPS long-poll
  → decrypts payload
  → injects worker credentials into ContextVar
  → UniversalAgent reads credentials, overrides model/api_key
    │
    ▼
Worker executes task, reports result
  → POST /api/v1/fleet/tasks/{id}/complete
  → emit TASK_COMPLETED audit event
    │
    ▼
WorkflowMonitor marks step complete, continues workflow

Three dimensions of matching

DimensionHow It Works
ModelTask requires gpt-4o → only workers declaring gpt-4o can claim it
PoolTask targets gpu pool → only workers in the gpu pool see it
LabelsTask requires {region: eu-west} → only workers with that label match

Model normalization

openai/gpt-4o, gpt-4o, and GPT-4o are all treated as the same model. The normalizer strips provider prefixes, lowercases, and replaces colons with hyphens.


Heartbeat flow

Workers send a heartbeat every 30 seconds to signal liveness:

Worker heartbeat loop
    │
    ▼
POST /api/v1/fleet/workers/{id}/heartbeat
  body: { timestamp, tasks_completed, tasks_failed }
    │
    ▼
FleetRegistry updates last_heartbeat_at
    │
    ▼
FleetAnomalyDetector checks (runs every 60s):
  - missed_heartbeats > 3 → HEARTBEAT_TIMEOUT anomaly
  - failure_rate > threshold → EXCESSIVE_FAILURES anomaly
  - model mismatch → MODEL_MISMATCH anomaly
  - request_rate spike → RATE_ANOMALY anomaly
    │
    ▼
If 2+ anomaly types → auto-revoke worker
  → emit WORKER_REVOKED audit event
  → admin alerted via notification channel

Security layers

Layer 1: Enrollment token (WRT)

Each enrollment key is single-use or time-limited. The stored key is SHA-256 hashed; the plaintext is only shown once at creation time.

# Key structure (JWT claims)
{
  "sub": "worker-<uuid>",
  "iss": "sagewai-fleet",
  "jti": "<unique token id>",   # added to revocation list on revoke
  "scopes": ["worker:register", "worker:heartbeat", "worker:claim"],
  "pool": "gpu-prod",
  "org_id": "acme",
  "exp": 1750000000
}

Layer 2: Payload encryption

Task payloads (prompts, tool inputs, agent configs) are encrypted with a per-org Fernet key before leaving the gateway. Workers in other organizations cannot decrypt tasks even if they intercept the traffic.

Layer 3: Approval gate

No worker receives tasks until an admin explicitly approves it. The approval is logged in the audit trail with the approver's identity and timestamp.

Layer 4: Anomaly detection

The FleetAnomalyDetector runs continuously. Any worker exhibiting suspicious patterns (sudden rate spike, repeated failures, claiming tasks for models it doesn't have) is automatically revoked pending investigation.

Auto-alerts fire on:

  • More than 60 claims per minute (possible bot)
  • More than 10 failures per hour (unhealthy worker)
  • Missed heartbeats for 5+ minutes (crashed worker)
  • Model mismatches (worker claiming tasks it can't serve)

Layer 5: Per-worker credential injection

Different workers can use different LLM backends. Credentials are injected via ContextVar at execution time — never stored in the database, never sent to the server.

from sagewai import WorkerCredentials

# GPU worker in EU: uses local Ollama
gpu_creds = WorkerCredentials(
    model_overrides={"default": "ollama/llama3.1:70b"},
    inference_overrides={"api_base": "http://localhost:11434"},
)

# Cloud worker in US: uses OpenAI
cloud_creds = WorkerCredentials(
    model_overrides={"default": "gpt-4o"},
    inference_overrides={"api_key": "sk-..."},
)

Layer 6: mTLS (Enterprise, planned)

Workers will present client certificates issued by the org's CA. The gateway verifies both the WRT token and the client certificate. Revoked certificates are propagated via OCSP.


Multi-project isolation

Every operation in Sagewai is scoped to a project. Each project gets its own namespace, quotas, and data isolation.

from sagewai import ProjectContext

async with ProjectContext(
    project_id="team-marketing",
    max_tokens_per_minute=100_000,
    max_requests_per_minute=50,
    max_cost_per_day_usd=30.0,
):
    # All agent runs, memory queries, and budget checks
    # are automatically scoped to "team-marketing"
    response = await agent.chat("Generate Q4 campaign ideas")

Team A (marketing, $30/day, cpu pool) and Team B (engineering, $200/day, gpu pool) run on the same server. Each team's spend, agents, memory, and workflow runs are fully isolated.

Per-project quotas

QuotaDescriptionEnforcement
max_tokens_per_minuteToken throughput limit60-second sliding window
max_requests_per_minuteRequest rate limit60-second sliding window
max_cost_per_day_usdDaily spend capResets at midnight UTC

Quotas are enforced in the ProjectContext before every LLM call.


Server setup

The server container runs the complete Sagewai platform:

  • Gateway API — Fleet task dispatch, webhook triggers, OpenAI-compatible endpoint
  • Admin Console — Project management, analytics, budget enforcement
  • Fleet Registry — Worker enrollment, approval, health monitoring
  • Workflow Supervisor — Stale run detection and recovery (5-min heartbeat timeout)

Compose spec (container-runtime agnostic)

services:
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: sagewai
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    ports: ["5432:5432"]
    volumes: ["postgres_data:/var/lib/postgresql/data"]
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "postgres"]

  redis:
    image: redis:7-alpine
    ports: ["6379:6379"]
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]

  sagewai-server:
    image: sagewai/server:latest
    ports: ["8000:8000"]
    environment:
      DATABASE_URL: postgresql://postgres:${POSTGRES_PASSWORD}@postgres:5432/sagewai
      REDIS_URL: redis://redis:6379
      JWT_SECRET: ${JWT_SECRET}
      SAGEWAI_ENCRYPTION_KEY: ${ENCRYPTION_KEY}
    depends_on:
      postgres: { condition: service_healthy }
      redis: { condition: service_healthy }
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/api/v1/health"]
    deploy:
      resources:
        limits: { memory: 2G }

volumes:
  postgres_data:

Container runtime support

RuntimeCommandNotes
Docker Compose V2docker compose up -dDefault on modern Docker Desktop
Podman Composepodman-compose up -dRootless, daemonless
nerdctl (containerd)nerdctl compose up -dLightweight
KubernetesSee K8s section belowProduction-grade

Production Kubernetes deployment

Server deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sagewai-server
spec:
  replicas: 2
  selector:
    matchLabels: { app: sagewai-server }
  template:
    metadata:
      labels: { app: sagewai-server }
    spec:
      containers:
      - name: server
        image: sagewai/server:latest
        ports:
        - containerPort: 8000
        envFrom:
        - secretRef: { name: sagewai-secrets }
        resources:
          requests: { memory: "1Gi", cpu: "500m" }
          limits: { memory: "2Gi", cpu: "2000m" }
        livenessProbe:
          httpGet: { path: /api/v1/health, port: 8000 }
          initialDelaySeconds: 10
        readinessProbe:
          httpGet: { path: /api/v1/health, port: 8000 }
          initialDelaySeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: sagewai-server
spec:
  selector: { app: sagewai-server }
  ports:
  - port: 8000
    targetPort: 8000

Terraform module

module "sagewai_fleet" {
  source = "sagewai/fleet/kubernetes"

  server_replicas = 2
  server_image    = "sagewai/server:latest"

  worker_pools = {
    cpu = {
      replicas      = 3
      models        = ["gpt-4o", "claude-sonnet-4"]
      node_selector = {}
      resources     = { memory = "2Gi", cpu = "2" }
    }
    gpu = {
      replicas      = 0  # DaemonSet on GPU nodes instead
      models        = ["ollama/llama3.1:70b"]
      node_selector = { "nvidia.com/gpu" = "true" }
      resources     = { memory = "8Gi", gpu = 1 }
    }
  }

  database_url   = var.database_url
  redis_url      = var.redis_url
  encryption_key = var.encryption_key
}

Pulumi (TypeScript)

import * as k8s from "@pulumi/kubernetes";

const server = new k8s.apps.v1.Deployment("sagewai-server", {
  spec: {
    replicas: 2,
    selector: { matchLabels: { app: "sagewai-server" } },
    template: {
      metadata: { labels: { app: "sagewai-server" } },
      spec: {
        containers: [{
          name: "server",
          image: "sagewai/server:latest",
          ports: [{ containerPort: 8000 }],
          envFrom: [{ secretRef: { name: "sagewai-secrets" } }],
          resources: {
            requests: { memory: "1Gi", cpu: "500m" },
            limits: { memory: "2Gi", cpu: "2000m" },
          },
        }],
      },
    },
  },
});

const cpuWorkers = new k8s.apps.v1.Deployment("sagewai-cpu-workers", {
  spec: {
    replicas: 3,
    selector: { matchLabels: { app: "sagewai-worker", pool: "cpu" } },
    template: {
      metadata: { labels: { app: "sagewai-worker", pool: "cpu" } },
      spec: {
        containers: [{
          name: "worker",
          image: "sagewai/worker:latest",
          env: [
            { name: "WORKER_POOL", value: "cpu" },
            { name: "WORKER_MODELS", value: "gpt-4o,claude-sonnet-4" },
          ],
          envFrom: [{ secretRef: { name: "sagewai-worker-secrets" } }],
        }],
      },
    },
  },
});

Database schema

Three tables support the fleet system (migration 005_fleet.py):

-- Worker registry
workers (
  id, org_id, pool, labels JSONB, models TEXT[], models_canonical TEXT[],
  status, enrollment_key_id, last_heartbeat_at, capabilities JSONB,
  target_pool, target_labels JSONB, target_worker_id    -- routing columns (migration 004)
)

-- Enrollment keys
enrollment_keys (
  id, org_id, key_hash TEXT, pool, labels JSONB, expires_at,
  created_by, used_at, revoked_at
)

-- Audit trail
fleet_audit_events (
  id, org_id, worker_id, event_type, actor_id, metadata JSONB, created_at
)

The models_canonical column uses a GIN index for efficient @> containment queries during task routing.


Example: multi-team deployment

Acme Corp runs three teams on one Sagewai server:

TeamProjectPoolBudgetModelsUse Case
Marketingmktcpu-fast$30/dayGPT-4o-miniContent generation agents
Engineeringenggpu-local$0/dayLlama 3.1 70B (Ollama)Code review agents
Researchresearchcpu-smart$100/dayClaude SonnetAnalysis and reasoning agents
  1. Ops deploys 1 server + 5 workers (2 cpu-fast, 1 gpu-local, 2 cpu-smart)
  2. Each team gets a scoped enrollment key for their pool
  3. Workers auto-register and start claiming tasks
  4. Marketing submits a content workflow → routed to cpu-fast → GPT-4o-mini
  5. Engineering submits a code review → routed to gpu-local → Llama 3.1 70B ($0)
  6. Research submits analysis → routed to cpu-smart → Claude Sonnet
  7. Each team's spend is tracked independently; engineering runs for free

Three teams, fully isolated, with automatic routing and cost control. No team can exhaust another team's budget.


See also

  • Fleet — conceptual overview
  • Fleet Deployment — step-by-step worker setup, enrollment, and monitoring