Production

Configuration and best practices for production deployments.

Production hardening checklist

Tick these off before pointing real traffic at a bigRAG deployment:

  • BIGRAG_ENV=prod — enables the startup guard.
  • BIGRAG_MASTER_KEY set to a Fernet key, which is 32 random bytes encoded as URL-safe base64 (see Encryption at rest).
  • Rotate POSTGRES_PASSWORD from the shipped default (bigrag).
  • BIGRAG_SESSION_COOKIE_SECURE=true and terminate TLS at your reverse proxy (nginx, Caddy, Traefik, Cloudflare).
  • BIGRAG_LOG_LEVEL=info and BIGRAG_LOG_FORMAT=json for production log collection. Local development via dev.sh also defaults to the info level, for readable progress output.
  • BIGRAG_CORS_ORIGINS set to an explicit list of your admin UI / app origins — no *.
  • BIGRAG_TRUSTED_PROXIES set to the CIDRs of your own reverse proxies if bigRAG sits behind nginx, Caddy, Traefik, or a load balancer.
  • Redis with requirepass set and appendonly yes persisted to a mounted volume.
  • Turbopuffer API key and region saved from the admin UI for managed vector and full-text search.
  • Postgres warm-standby replica for failover.
  • Disaster-recovery strategy for Postgres, Turbopuffer exports, Redis, and the local staging volume.
  • Set metadata_schema on collections that accept untrusted metadata so uploads with invalid shape are rejected at the edge.
  • Wire GET /v1/admin/audit into your SIEM / log pipeline.
  • Configure webhooks to your monitoring stack for connector.sync.failed events so connector errors don't sit silently.
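
The BIGRAG_MASTER_KEY item above calls for a Fernet key. A minimal sketch of generating one with only the Python standard library follows; the function name is illustrative and not part of bigRAG. The `cryptography` package's `Fernet.generate_key()` produces the same format, so use that instead if the package is already installed.

```python
import base64
import os


def generate_fernet_key() -> str:
    """Return 32 random bytes as URL-safe base64, the Fernet key format."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


if __name__ == "__main__":
    # Run once, store the output in your secrets manager,
    # and export it as BIGRAG_MASTER_KEY.
    print(generate_fernet_key())
```

The result is always 44 characters long. Generate it once and keep it stable, since data encrypted under one key cannot be read with another.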

Startup safety guard

bigRAG refuses to boot when BIGRAG_ENV=prod is set and any of the following insecure defaults are still active:

  • BIGRAG_SESSION_COOKIE_SECURE is false — set to true so cookies are HTTPS-only.
  • BIGRAG_DATABASE_URL still uses the shipped bigrag:bigrag credentials — rotate the Postgres password.
  • BIGRAG_MASTER_KEY is unset — bigRAG needs a Fernet key to envelope-encrypt provider credentials, embedding-cache rows, and Redis cache payloads. See Encryption at rest.
  • BIGRAG_HOST binds to 0.0.0.0 or :: and BIGRAG_ALLOW_PUBLIC_BIND_IN_PROD is not true. Set the override only after you have confirmed the service sits behind TLS and a network boundary.
  • BIGRAG_SESSION_COOKIE_DOMAIN is set while BIGRAG_TRUSTED_PROXIES is empty. List your reverse-proxy CIDRs in BIGRAG_TRUSTED_PROXIES, or unset the cookie domain.

When the guard trips, it logs every violation before exiting so you can fix the whole list in one edit. To run with development defaults (e.g. on a private network behind a VPN), leave BIGRAG_ENV unset or set it to dev.
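
To catch these violations in CI before a container ever starts, the rules above can be mirrored in a small preflight script. This is a sketch based only on the list on this page, not bigRAG's actual guard. The function name `find_violations` is hypothetical, and the script inspects only values that are explicitly set, so it does not model bigRAG's built-in defaults.

```python
import json
import os
from urllib.parse import urlsplit


def find_violations(env: dict) -> list:
    """Return one message per documented guard rule that the environment breaks."""
    if env.get("BIGRAG_ENV") != "prod":
        return []  # the guard only applies in prod mode

    violations = []

    # Assumption: an unset BIGRAG_SESSION_COOKIE_SECURE counts as false.
    if env.get("BIGRAG_SESSION_COOKIE_SECURE", "false").lower() != "true":
        violations.append("BIGRAG_SESSION_COOKIE_SECURE is not true")

    db = urlsplit(env.get("BIGRAG_DATABASE_URL", ""))
    if db.username == "bigrag" and db.password == "bigrag":
        violations.append("BIGRAG_DATABASE_URL uses the shipped bigrag:bigrag credentials")

    if not env.get("BIGRAG_MASTER_KEY"):
        violations.append("BIGRAG_MASTER_KEY is unset")

    public_bind = env.get("BIGRAG_HOST", "") in ("0.0.0.0", "::")
    allowed = env.get("BIGRAG_ALLOW_PUBLIC_BIND_IN_PROD", "false").lower() == "true"
    if public_bind and not allowed:
        violations.append("BIGRAG_HOST is public without BIGRAG_ALLOW_PUBLIC_BIND_IN_PROD=true")

    # BIGRAG_TRUSTED_PROXIES is a JSON list, as in the examples on this page.
    proxies = json.loads(env.get("BIGRAG_TRUSTED_PROXIES") or "[]")
    if env.get("BIGRAG_SESSION_COOKIE_DOMAIN") and not proxies:
        violations.append("BIGRAG_SESSION_COOKIE_DOMAIN is set but BIGRAG_TRUSTED_PROXIES is empty")

    return violations


if __name__ == "__main__":
    for message in find_violations(dict(os.environ)):
        print(message)
```

Like the guard itself, the sketch collects every violation rather than stopping at the first, so one run shows the full list to fix.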

Production Docker Compose

services:
  bigrag-api:
    image: yoginth/bigrag-api:2026.4.30
    ports:
      - "4000:4000"
    volumes:
      - bigrag_data:/data
    environment:
      BIGRAG_ENV: prod
      BIGRAG_DATABASE_URL: postgres://bigrag:strongpassword@postgres:5432/bigrag
      BIGRAG_REDIS_URL: ${BIGRAG_REDIS_URL}
      BIGRAG_HOST: 0.0.0.0
      BIGRAG_MASTER_KEY: ${BIGRAG_MASTER_KEY}
      BIGRAG_MASTER_KEY_PREVIOUS: '${BIGRAG_MASTER_KEY_PREVIOUS:-[]}'
      BIGRAG_ALLOW_PUBLIC_BIND_IN_PROD: "true"
      BIGRAG_SESSION_COOKIE_SECURE: "true"
      BIGRAG_SESSION_COOKIE_SAMESITE: "lax"
      BIGRAG_CORS_ORIGINS: '["https://admin.example.com"]'
      BIGRAG_TRUSTED_PROXIES: '["10.0.0.0/8"]'
      BIGRAG_LOG_LEVEL: info
      BIGRAG_LOG_FORMAT: json
      BIGRAG_UPLOAD_DIR: /data/uploads
      BIGRAG_WORKER_HEALTHCHECK_KEY: "${BIGRAG_WORKER_HEALTHCHECK_KEY:-bigrag:dramatiq:worker:heartbeat}"
    deploy:
      resources:
        limits:
          memory: 4g
          cpus: "4"
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:4000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  bigrag-worker:
    image: yoginth/bigrag-api:2026.4.30
    command: ["bigrag-worker", "--processes", "1", "--threads", "8"]
    volumes:
      - bigrag_data:/data
    environment:
      BIGRAG_ENV: prod
      BIGRAG_DATABASE_URL: postgres://bigrag:strongpassword@postgres:5432/bigrag
      BIGRAG_REDIS_URL: ${BIGRAG_REDIS_URL}
      BIGRAG_MASTER_KEY: ${BIGRAG_MASTER_KEY}
      BIGRAG_MASTER_KEY_PREVIOUS: '${BIGRAG_MASTER_KEY_PREVIOUS:-[]}'
      BIGRAG_SESSION_COOKIE_SECURE: "true"
      BIGRAG_SESSION_COOKIE_SAMESITE: "lax"
      BIGRAG_CORS_ORIGINS: '["https://admin.example.com"]'
      BIGRAG_TRUSTED_PROXIES: '["10.0.0.0/8"]'
      BIGRAG_LOG_LEVEL: info
      BIGRAG_LOG_FORMAT: json
      BIGRAG_UPLOAD_DIR: /data/uploads
    depends_on:
      bigrag-api:
        condition: service_healthy
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test:
        [
          "CMD-SHELL",
          "python -c 'import os, sys, redis; client = redis.Redis.from_url(os.environ.get(\"BIGRAG_REDIS_URL\", \"redis://redis:6379/0\")); key = os.environ.get(\"BIGRAG_WORKER_HEALTHCHECK_KEY\", \"bigrag:dramatiq:worker:heartbeat\"); sys.exit(0 if client.ttl(key) > 0 else 1)'",
        ]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s

  bigrag-ui:
    image: yoginth/bigrag-ui:2026.4.30
    ports:
      - "3000:3000"
    environment:
      BIGRAG_URL: https://api.example.com
    depends_on:
      bigrag-api:
        condition: service_healthy

  postgres:
    image: postgres:17
    environment:
      POSTGRES_USER: bigrag
      POSTGRES_PASSWORD: strongpassword
      POSTGRES_DB: bigrag
    volumes:
      - postgres_data:/var/lib/postgresql/data
    deploy:
      resources:
        limits:
          memory: 1g
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U bigrag"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    environment:
      REDIS_PASSWORD: ${REDIS_PASSWORD}
    command:
      - sh
      - -c
      - |
        exec redis-server --appendonly yes --maxmemory 1gb --maxmemory-policy noeviction --requirepass "$$REDIS_PASSWORD"
    volumes:
      - redis_data:/data
    deploy:
      resources:
        limits:
          memory: 1536m
    healthcheck:
      test: ["CMD-SHELL", "redis-cli -a \"$${REDIS_PASSWORD}\" --no-auth-warning ping"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  bigrag_data:
  postgres_data:
  redis_data:
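
The bigrag-worker healthcheck in the Compose file above packs its logic into a single shell-quoted line. Unrolled, it amounts to the sketch below. The function names are illustrative, and the sketch assumes the worker refreshes the heartbeat key with an expiry, which the `ttl(key) > 0` test in the Compose file implies.

```python
import os


def heartbeat_is_fresh(client, key: str = "bigrag:dramatiq:worker:heartbeat") -> bool:
    """True when the heartbeat key exists and still has time to live.

    Redis TTL returns -2 for a missing key and -1 for a key without an
    expiry, so only a positive value indicates a live, expiring heartbeat.
    """
    return client.ttl(key) > 0


def check_from_env() -> bool:
    """Build a redis-py client from the same variables the Compose healthcheck reads."""
    import redis  # third-party redis-py package, as used in the Compose healthcheck

    client = redis.Redis.from_url(os.environ.get("BIGRAG_REDIS_URL", "redis://redis:6379/0"))
    key = os.environ.get("BIGRAG_WORKER_HEALTHCHECK_KEY", "bigrag:dramatiq:worker:heartbeat")
    return heartbeat_is_fresh(client, key)
```

A worker that has crashed or hung stops refreshing the key, the key expires, and the check starts failing.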

Set REDIS_PASSWORD and a matching BIGRAG_REDIS_URL in the Compose environment. If the password contains @, :, /, ?, #, or other URL-reserved characters, URL-encode it in BIGRAG_REDIS_URL.
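
A short sketch of that encoding step, using the standard library's `urllib.parse.quote`. The helper name `build_redis_url` and its default host, port, and database are illustrative, chosen to match the Compose file above.

```python
from urllib.parse import quote


def build_redis_url(password: str, host: str = "redis", port: int = 6379, db: int = 0) -> str:
    """Build a Redis URL with the password percent-encoded."""
    # safe="" makes quote() encode "/" as well, which it leaves alone by default.
    return f"redis://:{quote(password, safe='')}@{host}:{port}/{db}"


print(build_redis_url("p@ss:w/rd?#"))
# redis://:p%40ss%3Aw%2Frd%3F%23@redis:6379/0
```

REDIS_PASSWORD itself stays unencoded, because redis-server and redis-cli receive it as a plain string rather than as part of a URL.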

A reference set of production environment variables, including the main tuning knobs:

# Mode
BIGRAG_ENV=prod                     # refuses to boot with insecure defaults

# Sessions
BIGRAG_SESSION_COOKIE_SECURE=true   # HTTPS-only
BIGRAG_SESSION_COOKIE_SAMESITE=lax  # use none when API and admin UI are cross-site
BIGRAG_CORS_ORIGINS=["https://admin.example.com"]
BIGRAG_TRUSTED_PROXIES=["10.0.0.0/8"]  # your reverse-proxy/load-balancer CIDRs

# Logs
BIGRAG_LOG_LEVEL=info
BIGRAG_LOG_FORMAT=json

# Auth is admin accounts + minted bigrag_sk_ keys.
# Bootstrap with POST /v1/auth/setup, then mint service keys at /v1/admin/api-keys.

# Infrastructure
BIGRAG_DATABASE_URL=postgres://user:pass@host:5432/bigrag?sslmode=require
REDIS_PASSWORD=<strong Redis password>
BIGRAG_REDIS_URL=redis://:<url-encoded Redis password>@redis:6379/0
BIGRAG_MASTER_KEY=<generated Fernet key>
BIGRAG_ALLOW_PUBLIC_BIND_IN_PROD=true  # required when the API image binds 0.0.0.0 in prod

# Performance
BIGRAG_WORKERS=8                    # Match CPU cores
BIGRAG_DB_POOL_MAX=100              # Increase for high concurrency
BIGRAG_EMBEDDING_CONCURRENCY=16     # Parallel embedding requests

# Worker command
bigrag-worker --processes 1 --threads 8

# Storage
BIGRAG_UPLOAD_DIR=/data/uploads
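
BIGRAG_CORS_ORIGINS and BIGRAG_TRUSTED_PROXIES in the settings above are JSON lists, and a stray quote or a malformed CIDR is easy to miss. The sketch below validates both values before deploying. The function names are hypothetical, and the checks reflect the guidance on this page (explicit origins, no `*`) rather than bigRAG's own parsing.

```python
import ipaddress
import json


def parse_cors_origins(raw: str) -> list:
    """Parse a JSON list of origins and reject wildcards."""
    origins = json.loads(raw)
    if not isinstance(origins, list) or not origins:
        raise ValueError("expected a non-empty JSON list of origins")
    for origin in origins:
        if "*" in origin:
            raise ValueError(f"wildcard origin not allowed: {origin}")
        if not origin.startswith(("http://", "https://")):
            raise ValueError(f"origin must include a scheme: {origin}")
    return origins


def parse_trusted_proxies(raw: str) -> list:
    """Parse a JSON list of CIDRs into ip_network objects."""
    cidrs = json.loads(raw)
    if not isinstance(cidrs, list):
        raise ValueError("expected a JSON list of CIDRs")
    # ip_network raises ValueError for malformed CIDRs or set host bits.
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


print(parse_cors_origins('["https://admin.example.com"]'))
print(parse_trusted_proxies('["10.0.0.0/8"]'))
```

Both functions raise `ValueError` on bad input (`json.JSONDecodeError` is a subclass), so a CI step can fail on a single exception type.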

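Never commit secrets (API keys or database passwords) to version control. Keep them in a .env file that is excluded from version control, in Docker secrets, or in a secrets manager such as AWS Secrets Manager or HashiCorp Vault. The strongpassword value in the Compose file above is a placeholder to replace before deploying.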

Storage

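Active ingestion files are staged on the local filesystem, in the directory that BIGRAG_UPLOAD_DIR points to. In the Compose file above, the API and worker containers both mount the bigrag_data volume at /data, so they share the same /data/uploads directory. The corresponding setting is upload_dir: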

upload_dir = "./data/uploads"

Logging

API and worker logs are emitted through structlog with stable event names and key-value fields.

What's logged:

  • Request activity: method, path, sanitized query params, selected request headers, client IP, route, endpoint/action context, status, first_byte_ms, and total elapsed_ms — in JSON and debug request logs.
  • Vector-store, webhook, and auth activity.
  • Worker output: Dramatiq process label and PID (e.g. worker=worker-1 pid=12345) so concurrent output can be traced back to the active process.
  • Ingestion and RAG reads as concise one-line progress messages.
  • Every HTTP response includes X-Request-ID. If a client sends that header, bigRAG preserves it; otherwise the API generates one and includes it in request JSON logs and access-log rows.

What's redacted:

  • Raw request bodies, prompts, document content, cookies, authorization headers, and secret-like query params.
  • URL-valued headers such as Referer are logged without URL userinfo, with secret-like query params and fragments stripped.
  • Secrets are redacted before log rendering.
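
To make the URL redaction rule concrete, here is a sketch of the kind of transformation described for headers such as Referer: drop userinfo, drop secret-like query parameters, and drop the fragment. It is not bigRAG's implementation. The list of secret-like parameter hints is an assumption, and so is the choice to remove matching parameters rather than mask their values.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: bigRAG's actual list of secret-like names is not documented on this page.
SECRET_PARAM_HINTS = ("token", "key", "secret", "password", "signature")


def sanitize_url(url: str) -> str:
    """Return the URL without userinfo, secret-like query params, or fragment."""
    parts = urlsplit(url)

    # Rebuild the netloc from hostname and port only, which drops "user:pass@".
    host = parts.hostname or ""
    if ":" in host:
        host = f"[{host}]"  # restore brackets around an IPv6 literal
    netloc = host if parts.port is None else f"{host}:{parts.port}"

    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not any(hint in name.lower() for hint in SECRET_PARAM_HINTS)
    ]
    # An empty final component drops the fragment.
    return urlunsplit((parts.scheme, netloc, parts.path, urlencode(kept), ""))


print(sanitize_url("https://user:pw@example.com:8443/docs?page=2&api_key=abc#frag"))
# https://example.com:8443/docs?page=2
```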

Format and level notes:

  • Text logs use colored output with fixed columns for time, level, logger, event, and fields. Control characters in event and field values are escaped before writing to stdout, so request paths and headers cannot emit terminal control sequences.
  • Dependency loggers (Dramatiq, Alembic, OpenAI, Uvicorn access, HTTPX, Turbopuffer) are kept at warning level so local terminals show bigRAG actions instead of library startup chatter.
  • Text logs at info render each API request as one concise method/path/status/latency line. Internal chunk, cache, provider, job, request, and document identifiers stay out of the default text terminal stream.
  • Set BIGRAG_LOG_FORMAT=json when Docker, Railway, Fluent Bit, or another log shipper will parse stdout.
  • Use BIGRAG_LOG_LEVEL=debug only for short-lived diagnostics.

Basic operations checklist

Use this quick path before deeper debugging:

  1. GET /health — confirms the API process is alive.
  2. GET /health/ready — confirms Postgres, Redis, Turbopuffer search, and the embedding provider are reachable. Dependency errors are reported as category labels such as timeout, unreachable, auth_failed, misconfigured, or unknown.
  3. GET /v1/stats — confirms document counts, queue depth, queue health, and the latest bigrag-worker heartbeat.
  4. The admin UI Health tab shows the same dependency and queue state without opening logs.
  5. If documents stop moving, check worker logs, then inspect queue_health, workers.status, dead_lettered, retrying, and stale_processing.
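
The first two steps of the checklist above can be scripted for a cron job or deploy hook. The sketch below uses only the standard library and assumes each endpoint answers with a 2xx status when healthy, as the `curl -f` healthcheck in the Compose file implies for /health. The function names are illustrative, and /v1/stats is left out because it may require an API key.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status for a GET, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0


def first_failing_check(base_url: str, fetch=http_status):
    """Probe endpoints in checklist order; return (path, status) for the first failure, else None."""
    for path in ("/health", "/health/ready"):
        status = fetch(base_url.rstrip("/") + path)
        if not 200 <= status < 300:
            return path, status
    return None
```

For example, `first_failing_check("http://localhost:4000")` returns `None` when both probes succeed. A failure on /health/ready while /health passes points at a dependency rather than the API process itself.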
