Per-call token accounting
Prompt tokens, completion tokens, total tokens, cost — captured per call. No invoice-driven retrofits at month-end.
AI teams · Cost engineering
Apinizer's AI Gateway counts prompt tokens, completion tokens, cache savings, and fallbacks per call. Every number rolls up by consumer, intent, model, and time — the numerator and denominator of every cost decision.
The problem
Vendor invoices give you a monthly total, and that total tells you nothing about which feature, which team, or which prompt drove the spend. Engineers can't optimize what they can't see. Apinizer captures every token at the gateway (prompt and completion, cached and fallback), tags it to consumer, intent, and model, and stores it in the same Elasticsearch the rest of the platform uses.
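To make the idea concrete, here is a minimal Python sketch of a per-call record and a roll-up along any tag. It is an illustration only, not Apinizer's schema or API: the `CallRecord` fields, the `PRICES` table, the model names, and the rule that a cache hit costs nothing are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in EUR per 1,000 tokens; real rates come from the vendor's price list.
PRICES = {
    "small-model": {"prompt": 0.0005, "completion": 0.0015},
    "large-model": {"prompt": 0.0100, "completion": 0.0300},
}


@dataclass(frozen=True)
class CallRecord:
    """One AI call as it might be logged at a gateway (illustrative fields)."""
    consumer: str
    intent: str
    model: str
    prompt_tokens: int
    completion_tokens: int
    cached: bool = False

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

    @property
    def list_cost(self) -> float:
        """What the call costs at list price, ignoring the cache."""
        price = PRICES[self.model]
        return (self.prompt_tokens * price["prompt"]
                + self.completion_tokens * price["completion"]) / 1000

    @property
    def cost(self) -> float:
        """Actual spend; assumes a cache hit is served at zero model cost."""
        return 0.0 if self.cached else self.list_cost


def roll_up(records, axis):
    """Aggregate calls, tokens, spend, and cache savings by one tag (e.g. 'intent')."""
    totals = defaultdict(lambda: {"calls": 0, "tokens": 0, "cost": 0.0, "saved": 0.0})
    for record in records:
        bucket = totals[getattr(record, axis)]
        bucket["calls"] += 1
        bucket["tokens"] += record.total_tokens
        bucket["cost"] += record.cost
        bucket["saved"] += record.list_cost - record.cost
    return dict(totals)
```

Calling `roll_up(records, "intent")`, `roll_up(records, "consumer")`, or `roll_up(records, "model")` on the same list gives the per-intent, per-team, and per-model views without touching the invoice.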
Capabilities
Consumer, project, intent, model, region, env — every call tagged. Aggregation is a saved query, not a finance project.
Hard and soft caps per consumer, project, or intent. Burst allowance for spikes; throttle when the cap hits.
What does 'summarize this' actually cost? Sum prompt + completion tokens across a million calls; rank by intent, by team, by model.
See savings explicitly — how much would have been spent without the cache, without smart routing. Justify the platform with the numbers it generates.
Token accounting joins the audit ledger. Finance, engineering, and audit see one truth.
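The hard and soft caps with a burst allowance can be pictured with a small Python sketch. This is one possible reading of the behavior, not the gateway's actual policy engine: the `Budget` class, the `Decision` values, and the choice to add the burst allowance on top of the hard cap are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"        # under the soft cap
    WARN = "warn"          # over the soft cap, still within hard cap plus burst
    THROTTLE = "throttle"  # would exceed hard cap plus burst


@dataclass
class Budget:
    """Spend budget for one consumer, project, or intent (hypothetical model)."""
    soft_cap: float
    hard_cap: float
    burst_allowance: float = 0.0
    spent: float = 0.0

    def check(self, estimated_cost: float) -> Decision:
        """Decide what to do with a call before it is sent to the model."""
        projected = self.spent + estimated_cost
        if projected > self.hard_cap + self.burst_allowance:
            return Decision.THROTTLE
        if projected > self.soft_cap:
            return Decision.WARN
        return Decision.ALLOW

    def record(self, actual_cost: float) -> None:
        """Add the real cost of a completed call to the running total."""
        self.spent += actual_cost
```

A budget of `Budget(soft_cap=150, hard_cap=200, burst_allowance=20)` would warn once projected spend passes 150 and throttle only when a call would push it past 220.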
Use cases
Every squad sees its own AI burn in the existing FinOps dashboard. Cost showbacks land monthly; the central platform reclaims its bonus.
100% attributed
'Summarize claim' costs €0.04 per call; 'analyze adverse signal' costs €1.20. Cheap intents on the small model; expensive intents capped and reviewed.
Each directorate has a quarterly cap. Real-time burn dashboards remove all surprise from the close.
Spend €X; cache saved €Y. Marketing presents the platform's ROI to finance with one chart.
Savings visible monthly
Cost-per-intent showed short-form prompts were 90% of volume and 30% of cost. Routing shifted them to a 7B model; cost per call dropped 70%.
The anomaly detector watches token burn against a baseline. A prompt loop that started at 3 a.m. triggered an alarm; on-call killed it in 9 minutes.
Every token attributed to the workflow that triggered it. Workflow owners see cost-per-encounter and optimize their prompts directly.
Shift-level quotas. A faulty agent loop would have run up €4k overnight; the throttle stopped it at €200, and on-call rolled back the deploy.
€4k exposure held to €200
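The burn-versus-baseline check behind the 3 a.m. story can be sketched in a few lines of Python. The source does not say how the detector works, so this is a generic stand-in: a mean-plus-k-standard-deviations threshold over recent intervals. The function name, the default `k`, and the `min_spread` floor are all hypothetical.

```python
from statistics import mean, pstdev


def is_burn_anomaly(history, current, k=3.0, min_spread=1.0):
    """Flag `current` token burn if it sits far above the recent baseline.

    history: token counts for previous intervals (e.g. the last few hours)
    current: token count for the interval being checked
    k: how many standard deviations above the mean counts as anomalous
    min_spread: floor on the deviation so a flat history does not alarm on noise
    """
    if len(history) < 2:
        # Not enough data to form a baseline; do not alarm.
        return False
    baseline = mean(history)
    spread = max(pstdev(history), min_spread)
    return current > baseline + k * spread
```

With a history of `[1000, 1100, 900, 1050, 950]` tokens per interval, a reading of 1150 stays under the threshold, while a looping agent burning 5000 is flagged.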
Recommended products
Per-call accounting, multi-axis attribution, budgets, quotas — built into every AI call.
Open the AI Gateway page
Cost dashboards alongside latency, error rate, and traffic.
Open the Analytics page
Anomaly detection on burn rate; severity-aware alarms on budget thresholds.
Open the Monitoring page
Cache savings quantified in the same view as model spend.
Open the Cache page
Resources
Per-call counting, multi-axis attribution, dashboards, and budget enforcement.
Where token accounting lives — alongside routing, caching, and firewalls.
Cost and savings dashboards in the same Elasticsearch as everything else.
The executive view of the same problem.
Burn-rate anomaly detection and severity-aware alarms.
Where the cost plane sits in the AI lane.
Explore more
Tokens as units of work
A 30-minute walkthrough of per-call accounting, attribution, budgets, and dashboards, on a Kubernetes cluster of your choice.