Token economics and cost control — first-class
Every prompt, every response, every embedding — counted, attributed, and capped. Set token budgets per project, team, user, or API key, in any time window. The same three-tier permission model (System / Project / Team) that owns REST quotas owns AI spend, so the people who own the workload also own the bill.
- Live token tracking — input, output, cached, and total — per request
- Per-window ceilings: minute, hour, day, month, or contract period
- Per-scope ceilings: user, API key, team, project, or model class
- Hard caps, soft caps with warnings, and burst windows for short spikes
- Cost attribution back to a project or cost center — finance gets a line item, not a mystery
- Automatic fallback to a cheaper model or a cached answer when a budget is reached
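As a rough illustration of how hard caps, soft caps, and fallback could fit together, here is a minimal sketch of a fixed-window token budget check. Everything in it is hypothetical and not the product's actual API: the names `Budget`, `BudgetTracker`, and `Decision`, the scope string format, and the choice of a simple fixed-window counter are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum

# Window lengths in seconds (assumed set; month and contract windows omitted).
WINDOW_SECONDS = {"minute": 60, "hour": 3600, "day": 86400}


class Decision(Enum):
    ALLOW = "allow"        # under the soft cap
    WARN = "warn"          # soft cap crossed: serve the request, raise a warning
    FALLBACK = "fallback"  # hard cap would be crossed: use a cheaper model or cache
    REJECT = "reject"      # hard cap would be crossed and fallback is disabled


@dataclass
class Budget:
    """Hypothetical budget for one scope in one time window."""
    window: str            # key into WINDOW_SECONDS
    hard_cap: int          # tokens per window that must never be exceeded
    soft_cap: int          # tokens per window that trigger a warning
    allow_fallback: bool = True


@dataclass
class BudgetTracker:
    budgets: dict                               # scope -> Budget
    usage: dict = field(default_factory=dict)   # (scope, window index) -> tokens

    def _key(self, scope, now):
        # Fixed windows: every timestamp in the same window maps to one counter.
        window = WINDOW_SECONDS[self.budgets[scope].window]
        return (scope, int(now // window))

    def check(self, scope, estimated_tokens, now):
        """Decide what to do with a request before it is sent to a model."""
        budget = self.budgets[scope]
        projected = self.usage.get(self._key(scope, now), 0) + estimated_tokens
        if projected > budget.hard_cap:
            return Decision.FALLBACK if budget.allow_fallback else Decision.REJECT
        if projected > budget.soft_cap:
            return Decision.WARN
        return Decision.ALLOW

    def record(self, scope, total_tokens, now):
        """Add the tokens a completed request actually consumed."""
        key = self._key(scope, now)
        self.usage[key] = self.usage.get(key, 0) + total_tokens
```

In this sketch a caller would run `check` with an estimated token count before dispatching a request, then `record` the real total afterwards. A burst window could be modeled as a second `Budget` on a shorter window for the same scope, but that is left out here to keep the example small.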
- Tracked per request: input · output · cached · total tokens
- Quota windows: minute · hour · day · month · custom
- Quota scopes: user · API key · team · project · model
- Enforcement: hard cap · soft warning · graceful fallback
- Reporting: cost by project · model · team · time range
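To make the reporting dimensions concrete, the sketch below rolls per-request token counts up into cost by project, team, or model. It is only an illustration under stated assumptions: the `UsageRecord` shape, the `cost_by` helper, and the prices are invented for the example (the prices are not real vendor rates), and cached tokens are assumed to be counted separately from input tokens and billed at a lower rate.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in micro-USD per token, kept as integers so totals are exact.
PRICE_MICRO_USD_PER_TOKEN = {
    "large": {"input": 10, "output": 30, "cached": 1},
    "small": {"input": 1, "output": 2, "cached": 0},
}


@dataclass(frozen=True)
class UsageRecord:
    """One request's token counts plus the dimensions used for attribution."""
    project: str
    team: str
    model: str
    input_tokens: int
    output_tokens: int
    cached_tokens: int = 0

    @property
    def total_tokens(self):
        return self.input_tokens + self.output_tokens + self.cached_tokens


def cost_micro_usd(record):
    """Price a single request using the assumed per-token rates."""
    price = PRICE_MICRO_USD_PER_TOKEN[record.model]
    return (
        record.input_tokens * price["input"]
        + record.output_tokens * price["output"]
        + record.cached_tokens * price["cached"]
    )


def cost_by(records, dimension):
    """Sum cost per value of one dimension: 'project', 'team', or 'model'."""
    totals = defaultdict(int)
    for record in records:
        totals[getattr(record, dimension)] += cost_micro_usd(record)
    return dict(totals)
```

Calling `cost_by(records, "project")` on a list of `UsageRecord` values would return a dictionary mapping each project to its spend in micro-USD. Filtering the records by timestamp before aggregating would give the time-range view; timestamps are omitted from the record here for brevity.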