Usage

GET /v1/usage

Accepts a session cookie or an API key with audit:read. Returns each collection's resource footprint: query activity over a trailing window, plus lifetime document totals. Useful for quota enforcement, billing, and capacity planning.

GET /v1/status/usage returns the same response shape and is intended for admin UI polling. Collection-pinned API keys cannot call this status variant because it aggregates across collections.

Query parameters

Parameter	Type	Default	Notes
`window_days`	integer	`30`	Window length in days, 1–365. Applied as a sliding window over `query_log`. Document totals are lifetime, not windowed.
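A minimal sketch of building the request with Python's standard library. The base URL is hypothetical, and sending the API key as a Bearer token in the Authorization header is an assumption (this page does not specify the header format). The helper rejects `window_days` values outside the documented 1–365 range before building the URL.

```python
import urllib.parse
import urllib.request

# Hypothetical base URL; replace with your deployment's address.
BASE_URL = "https://api.example.com"


def build_usage_request(api_key: str, window_days: int = 30,
                        status_variant: bool = False) -> urllib.request.Request:
    """Build (but do not send) a GET request for the usage endpoint."""
    # Documented range for window_days is 1-365 (default 30).
    if not 1 <= window_days <= 365:
        raise ValueError("window_days must be between 1 and 365")
    # /v1/status/usage aggregates across collections, so collection-pinned
    # keys cannot call it; those keys should use /v1/usage instead.
    path = "/v1/status/usage" if status_variant else "/v1/usage"
    query = urllib.parse.urlencode({"window_days": window_days})
    return urllib.request.Request(
        f"{BASE_URL}{path}?{query}",
        # Assumption: API keys are accepted as a Bearer token.
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


req = build_usage_request("sk-example", window_days=7)
print(req.full_url)
# To send it: urllib.request.urlopen(req), then json.load() the response body.
```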

Response

{
  "window_days": 30,
  "queries_total": 9122,
  "queries_per_day_avg": 304.07,
  "documents_total": 482,
  "chunks_total": 18420,
  "storage_bytes_total": 52428800,
  "embedding_tokens_total": 1820432,
  "embedding_cost_usd_estimate": 0.36,
  "avg_latency_ms": 142.31,
  "timeline": [
    {
      "date": "2026-05-16T00:00:00Z",
      "queries": 230,
      "avg_latency_ms": 138.44
    }
  ],
  "by_collection": [
    {
      "collection": "knowledge_base",
      "documents": 15,
      "chunks": 482,
      "storage_bytes": 52428800,
      "embedding_tokens": 125000,
      "embedding_cost_usd_estimate": 0.0250,
      "queries": 1203,
      "avg_latency_ms": 142.31
    }
  ]
}
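A sketch of loading `by_collection` into typed records using only the standard library. The field names come from the example response; the class and function names are hypothetical. Keys the record does not model are ignored so that additional response fields would not break parsing. `implied_daily_avg` reflects an inference from the example (9122 / 30 rounds to 304.07), not a documented formula.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class CollectionUsage:
    """One by_collection entry (hypothetical class name)."""
    collection: str
    documents: int        # lifetime, not windowed
    chunks: int
    storage_bytes: int
    embedding_tokens: int
    embedding_cost_usd_estimate: float
    queries: int          # counted within the window_days window
    avg_latency_ms: float


_FIELDS = [f.name for f in fields(CollectionUsage)]


def parse_by_collection(body: str) -> list[CollectionUsage]:
    """Parse by_collection, ignoring keys this sketch does not model."""
    data = json.loads(body)
    return [CollectionUsage(**{k: row[k] for k in _FIELDS})
            for row in data["by_collection"]]


def implied_daily_avg(body: str) -> float:
    # Inferred from the example response; not a documented formula.
    data = json.loads(body)
    return round(data["queries_total"] / data["window_days"], 2)


example = json.dumps({
    "window_days": 30,
    "queries_total": 9122,
    "by_collection": [{
        "collection": "knowledge_base", "documents": 15, "chunks": 482,
        "storage_bytes": 52428800, "embedding_tokens": 125000,
        "embedding_cost_usd_estimate": 0.0250, "queries": 1203,
        "avg_latency_ms": 142.31,
    }],
})
for row in parse_by_collection(example):
    print(row.collection, row.queries, row.storage_bytes)
```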

Cost estimates use a fixed rate card covering text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002, and the Cohere embed-*-v3.0 variants. Usage from self-hosted openai_compatible endpoints, or from any model not in the rate card, is reported with a cost estimate of 0.
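The actual rate card values are not listed here, so this sketch takes the rate card as an input rather than guessing prices. It mirrors the documented rule that a model missing from the rate card yields 0. Expressing rates in USD per million tokens is an assumption, and the rate used in the usage line is a placeholder, not real pricing.

```python
def estimate_embedding_cost(model: str, tokens: int,
                            usd_per_million: dict[str, float]) -> float:
    """Estimate embedding cost in USD from a caller-supplied rate card.

    Models absent from the rate card (for example, self-hosted
    openai_compatible endpoints) are reported as 0.
    """
    rate = usd_per_million.get(model)
    if rate is None:
        return 0.0
    return tokens * rate / 1_000_000


# Placeholder rate for illustration only; not actual pricing.
rate_card = {"text-embedding-3-small": 1.0}
print(estimate_embedding_cost("text-embedding-3-small", 250_000, rate_card))
print(estimate_embedding_cost("my-self-hosted-model", 250_000, rate_card))
```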

Common patterns

Billing join: join by_collection[].collection against your own tenant_id ↔ collection_name table, then invoice per tenant. See Multi-tenant SaaS.
Quota alert: poll with window_days=1 and alert when a tenant's embedding_tokens exceeds their plan limit.
Capacity planning: pair this per-collection roll-up with GET /v1/stats for global health.
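The billing-join and quota-alert patterns can be sketched over `by_collection` rows. The collection-to-tenant mapping, the plan limits, the sample rows, and every function name here are hypothetical stand-ins for your own data. The quota check assumes the rows came from a poll with window_days=1, as the quota-alert pattern describes.

```python
from dataclasses import dataclass


@dataclass
class TenantUsage:
    """Per-tenant roll-up (hypothetical billing record)."""
    tenant_id: str
    embedding_tokens: int
    embedding_cost_usd_estimate: float
    queries: int


def join_usage_to_tenants(by_collection: list[dict],
                          collection_to_tenant: dict[str, str]) -> dict[str, TenantUsage]:
    """Sum by_collection rows per tenant using your own mapping table."""
    totals: dict[str, TenantUsage] = {}
    for row in by_collection:
        tenant = collection_to_tenant.get(row["collection"])
        if tenant is None:
            continue  # collection not mapped to a tenant; skip (or log) it
        t = totals.setdefault(tenant, TenantUsage(tenant, 0, 0.0, 0))
        t.embedding_tokens += row["embedding_tokens"]
        t.embedding_cost_usd_estimate += row["embedding_cost_usd_estimate"]
        t.queries += row["queries"]
    return totals


def tenants_over_quota(totals: dict[str, TenantUsage],
                       token_limits: dict[str, int]) -> list[str]:
    """Return tenants whose embedding_tokens exceed their plan limit."""
    return sorted(
        tid for tid, t in totals.items()
        if tid in token_limits and t.embedding_tokens > token_limits[tid]
    )


# Hypothetical rows shaped like by_collection entries.
rows = [
    {"collection": "acme_docs", "embedding_tokens": 120_000,
     "embedding_cost_usd_estimate": 0.024, "queries": 900},
    {"collection": "acme_faq", "embedding_tokens": 30_000,
     "embedding_cost_usd_estimate": 0.006, "queries": 100},
]
totals = join_usage_to_tenants(rows, {"acme_docs": "acme", "acme_faq": "acme"})
print(tenants_over_quota(totals, {"acme": 100_000}))
```

Unmapped collections are skipped rather than raising, so a newly created collection does not break the billing run; logging them is a sensible addition.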
