Encryption at Rest

How bigRAG protects secrets on disk, and what the operator is responsible for.

bigRAG treats "at-rest encryption" as a split responsibility:

  • App-layer envelope encryption (handled by bigRAG): sensitive credential columns, persistent embedding-cache rows, and Redis cache payloads are Fernet-encrypted before they ever touch disk.
  • Storage-layer encryption (handled by the operator): document chunks, Turbopuffer vectors and payload attributes, temporary ingestion staging, Redis persistence, and operator-managed exports rely on the disk, managed service, or object store to encrypt its own data.

Both layers are required for a defensible "encrypted at rest" posture.

What bigRAG encrypts at the app layer

These values are encrypted transparently using a single Fernet master key:

Location          | Field          | What it holds
------------------|----------------|--------------
embedding_presets | api_key        | OpenAI / Cohere / compatible provider key
webhooks          | secret         | HMAC signing secret issued to the webhook subscriber
embedding_cache   | vector         | Persistent chunk embeddings reused during ingestion
Redis cache       | value envelope | Query embeddings, query results, principals, idempotency payloads, and short-lived platform cache values

Ciphertext looks like gAAAAABl... — the Fernet v1 envelope. Decryption is HMAC-authenticated, so a tampered row fails to load rather than returning corrupt data.
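
The tamper-detection behavior can be reproduced with the cryptography package directly. This is a minimal sketch of how any Fernet token behaves, not bigRAG's internal code; the key and plaintext below are throwaway demo values.

```python
import base64

from cryptography.fernet import Fernet, InvalidToken

# Throwaway key for the demo; a real deployment would use BIGRAG_MASTER_KEY.
key = Fernet.generate_key()
f = Fernet(key)

token = f.encrypt(b"sk-example-provider-key")

# Round trip with the right key returns the original bytes.
assert f.decrypt(token) == b"sk-example-provider-key"

# Flip one bit inside the ciphertext to simulate a tampered row.
raw = bytearray(base64.urlsafe_b64decode(token))
raw[30] ^= 0x01
tampered = base64.urlsafe_b64encode(bytes(raw))

try:
    f.decrypt(tampered)
    tamper_detected = False
except InvalidToken:
    # The HMAC check fails before any plaintext is returned.
    tamper_detected = True

print(token[:8], tamper_detected)
```

Because the HMAC covers the whole token, a single flipped bit is enough to make decryption raise InvalidToken instead of yielding garbage.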

User passwords are already Argon2id-hashed and API keys (bigrag_sk_…) are SHA-256-hashed; neither is reversible, so no envelope is needed.

Configuring the master key

bigRAG refuses to start in BIGRAG_ENV=prod without BIGRAG_MASTER_KEY. Generate one:

python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"

Hand the value to the process via env var, a secrets file mounted from your secret store, or an entrypoint that pulls from KMS before exec'ing bigRAG. Do not commit it to the repo or to your docker-compose file.

export BIGRAG_MASTER_KEY="<paste-the-output-of-the-fernet-generate-command>"

In BIGRAG_ENV=dev, an unset key leaves secret encryption unavailable and disables the persistent embedding cache. dev.sh ships a fixed dev key so fresh clones start cleanly.
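
The startup rule described above can be mirrored in a small pre-flight check that an operator runs before deploying. This is a sketch under assumptions: load_master_key is a hypothetical helper, not a bigRAG function, and it only models the documented behavior (prod refuses to run without a key, dev continues without secret encryption).

```python
from typing import Mapping, Optional

from cryptography.fernet import Fernet


def load_master_key(env: Mapping[str, str]) -> Optional[Fernet]:
    """Return a Fernet instance, or None when dev runs without a key.

    Hypothetical helper that mirrors the documented startup rule.
    """
    raw = env.get("BIGRAG_MASTER_KEY", "").strip()
    is_prod = env.get("BIGRAG_ENV") == "prod"
    if not raw:
        if is_prod:
            raise RuntimeError("BIGRAG_MASTER_KEY is required when BIGRAG_ENV=prod")
        return None  # dev: secret encryption unavailable
    try:
        # Fernet rejects anything that is not 32 url-safe base64-encoded bytes.
        return Fernet(raw.encode("ascii"))
    except ValueError as exc:
        raise RuntimeError("BIGRAG_MASTER_KEY is not a valid Fernet key") from exc
```

Calling load_master_key(os.environ) in a deploy script catches a missing or malformed key before the service itself is started.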

Losing BIGRAG_MASTER_KEY means the encrypted columns above become permanently unreadable. Back it up the same way you back up your Postgres superuser credentials: offline, split across owners.

What the operator must encrypt

bigRAG delegates the bulk of at-rest protection to your infrastructure. For a production deployment, configure at least the following:

Postgres

  • Self-hosted: put the data volume on LUKS. Run cryptsetup luksFormat /dev/sdX before initdb; luksFormat wipes the device, so run it on an empty volume.
  • RDS / Aurora: enable Storage encryption with a customer-managed KMS key. Enable the same on every read replica and snapshot target.
  • Postgres exports: pipe pg_dump into a GPG or age-encrypted file, or store it in an encrypted archival system.
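
The export bullet can be sketched as a small wrapper that streams pg_dump straight into age, so the plaintext dump never lands on disk. The function names, database name, recipient, and output path are illustrative assumptions; the sketch also assumes pg_dump and age are on PATH and that connection credentials come from the usual libpq sources (for example PGPASSWORD or .pgpass) rather than the command line.

```python
import subprocess
from typing import List, Tuple


def encrypted_dump_commands(dbname: str, recipient: str, out_path: str) -> Tuple[List[str], List[str]]:
    """Build the pg_dump and age argument lists (no shell, so no quoting bugs)."""
    dump = ["pg_dump", "--format=custom", "--dbname", dbname]
    encrypt = ["age", "--recipient", recipient, "--output", out_path]
    return dump, encrypt


def run_encrypted_dump(dbname: str, recipient: str, out_path: str) -> None:
    """Pipe pg_dump into age and fail loudly if either side exits non-zero."""
    dump_cmd, age_cmd = encrypted_dump_commands(dbname, recipient, out_path)
    dump = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE)
    age = subprocess.Popen(age_cmd, stdin=dump.stdout)
    dump.stdout.close()  # let pg_dump receive SIGPIPE if age exits early
    age_rc = age.wait()
    dump_rc = dump.wait()
    if dump_rc != 0 or age_rc != 0:
        raise RuntimeError(f"encrypted dump failed (pg_dump={dump_rc}, age={age_rc})")
```

Checking both exit codes matters here: a failed pg_dump can still leave a well-formed but truncated encrypted file behind.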

Turbopuffer

Turbopuffer stores vectors, full-text fields, payload attributes, and indexes in its managed service.

  • Select the production region deliberately.
  • Keep the API key scoped to the bigRAG service.
  • Save the Turbopuffer API key in the admin UI so it is encrypted in the instance_settings table with BIGRAG_MASTER_KEY.
  • Rotate it on the same cadence as other provider credentials, and restrict outbound network access where your platform allows it.

Redis

Redis holds the ingestion queue, the event bus, idempotency responses, and short-lived platform caches. bigRAG encrypts cache values when BIGRAG_MASTER_KEY is configured, but queue and event-bus payloads still need infrastructure protection.

  • Disable RDB/AOF persistence in prod, or put the persistence directory on an encrypted volume.
  • Use TLS (rediss://) between bigRAG and Redis.
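
A deploy-time guard for the TLS bullet can be as small as a scheme check on the connection URL. This is a hypothetical helper, and the hostnames are placeholders; how bigRAG itself receives the Redis URL is not shown here.

```python
from urllib.parse import urlparse


def redis_url_uses_tls(url: str) -> bool:
    """True only for rediss:// URLs, the scheme that requests TLS."""
    return urlparse(url).scheme == "rediss"


print(redis_url_uses_tls("rediss://cache.internal:6380/0"))
print(redis_url_uses_tls("redis://cache.internal:6379/0"))
```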

Ingestion staging

The upload directory (default ./data/uploads) is temporary ingestion staging. Use an encrypted local volume because active uploads remain there until ingestion reaches a terminal state.

Rotation

bigRAG supports dual-read key rotation: set the new key as BIGRAG_MASTER_KEY and pass old keys in BIGRAG_MASTER_KEY_PREVIOUS so existing rows can still decrypt while newly written rows use the new key. API-key authentication also checks hashes derived from both current and previous master keys during the rotation window.
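
The dual-read behavior has the same semantics as MultiFernet from the cryptography package, which encrypts with the first key and tries every key when decrypting. Whether bigRAG uses MultiFernet internally is an assumption; the sketch below only illustrates the rotation-window behavior with simulated environment values.

```python
import json

from cryptography.fernet import Fernet, MultiFernet

old_key = Fernet.generate_key().decode()
new_key = Fernet.generate_key().decode()

# Simulated environment during the rotation window.
env = {
    "BIGRAG_MASTER_KEY": new_key,
    "BIGRAG_MASTER_KEY_PREVIOUS": json.dumps([old_key]),
}

# A row written before the rotation, under the old key.
legacy_token = Fernet(old_key.encode()).encrypt(b"webhook-signing-secret")

# Primary key first: MultiFernet encrypts with keys[0] and decrypts with any key.
keys = [env["BIGRAG_MASTER_KEY"], *json.loads(env["BIGRAG_MASTER_KEY_PREVIOUS"])]
keyring = MultiFernet([Fernet(k.encode()) for k in keys])

legacy_plain = keyring.decrypt(legacy_token)  # old row is still readable
fresh_token = keyring.encrypt(b"new-secret")  # new rows use the new key
fresh_plain = Fernet(new_key.encode()).decrypt(fresh_token)  # new key alone suffices
```

Key order is the important design detail: putting an old key first would silently keep writing new rows under it.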

Manual process:

  1. Generate $NEW.
  2. Restart with BIGRAG_MASTER_KEY=$NEW and BIGRAG_MASTER_KEY_PREVIOUS=["$OLD"].
  3. Run a one-off rewrite that reads and updates every encrypted credential field, then purge or warm the embedding cache so new rows are written under $NEW.
  4. Remove $OLD from BIGRAG_MASTER_KEY_PREVIOUS after the rewrite is validated.

A built-in bigrag crypto rotate command is not available yet; use a controlled maintenance script for step 3.
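
A maintenance script for step 3 could follow the shape below. It is a sketch under stated assumptions: an in-memory SQLite table stands in for the real database, the id column and the rotate_column helper are hypothetical, and MultiFernet.rotate does the re-encryption. A real script would target Postgres and cover every encrypted field.

```python
import sqlite3

from cryptography.fernet import Fernet, MultiFernet

old, new = Fernet(Fernet.generate_key()), Fernet(Fernet.generate_key())
keyring = MultiFernet([new, old])  # new key first, so rotate() re-encrypts under it

# In-memory stand-in for the real database; the id column is assumed.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE embedding_presets (id INTEGER PRIMARY KEY, api_key BLOB)")
db.execute("INSERT INTO embedding_presets (api_key) VALUES (?)", (old.encrypt(b"sk-demo"),))


def rotate_column(conn: sqlite3.Connection, table: str, column: str) -> int:
    """Re-encrypt every non-null value in table.column under the primary key."""
    # table and column are trusted constants here; never pass user input.
    rows = conn.execute(
        f"SELECT id, {column} FROM {table} WHERE {column} IS NOT NULL"
    ).fetchall()
    with conn:  # one transaction: commit on success, roll back on error
        for row_id, token in rows:
            conn.execute(
                f"UPDATE {table} SET {column} = ? WHERE id = ?",
                (keyring.rotate(token), row_id),
            )
    return len(rows)


rotated = rotate_column(db, "embedding_presets", "api_key")
(stored,) = db.execute("SELECT api_key FROM embedding_presets").fetchone()
plain = new.decrypt(stored)  # succeeds with the new key alone
```

Decrypting a sample of rows with only the new key, as the last line does, is one way to validate the rewrite before removing $OLD in step 4.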

What's intentionally not app-layer encrypted

Data | Why
-----|----
Document chunks (document_chunks.content) | Per-query decrypt kills search latency; rely on disk encryption.
Vectors and text payloads in Turbopuffer | Search needs raw floats and filterable payloads, so use Turbopuffer API-key controls, tenant filters, network isolation, and managed-service encryption.
Audit log rows | Append-only. Store encrypted at the disk layer; querying encrypted metadata is prohibitive.
User display names / emails | PII, not credentials. Disk-layer + column-level access control is the right split.

If your compliance regime demands column-level encryption for any of the above, open an issue — it's plausible but non-trivial and needs a performance budget first.
