# Cache

> Local in-pod tier for sub-millisecond hits. Distributed Hazelcast cluster on Kubernetes for cluster-wide truth. Twelve cache scopes, atomic invalidation on redeploy, throttle and quota on the same tier — all first-party, no add-on.

*Cache*

## Two cache tiers. One invalidation contract.

[Request a demo](https://calendly.com/apinizer/15min) · [Read the docs](https://apinizer.com/developers/docs)

**Highlights**

- **Tiers** — Local · Distributed
- **Scopes** — 12 first-party
- **Cluster** — Hazelcast on Kubernetes

---

## Capabilities

### 01 · One cache product. Two tiers. Decided per proxy.

Apinizer ships a true two-tier cache. The local tier lives inside every gateway pod — a hot path that never serializes, never crosses the network, and answers in under a millisecond. The distributed tier is a Hazelcast cluster on Kubernetes — the cluster-wide source of truth that survives a pod restart. Pick the tier on the cache policy: distributed when consistency matters, local when latency is the budget.

- Local tier lives in the gateway pod — no network hop, no serialization
- Distributed tier is a Hazelcast cluster on Kubernetes
- Per-proxy storage type — local or distributed on the same policy screen
- Local tier answers in under a millisecond — backed by an in-pod map
- Distributed tier is cluster-wide truth — survives pod restarts and rolls
- Same cache key contract on both tiers — switch without touching code
- Per-entity local cache sets — API proxies, routing, policy, circuit breaker
- Memory-tuned for the gateway request profile — no GC surprises

**Concepts:** `Local tier` · `Distributed tier` · `Per-proxy choice` · `Sub-ms hit` · `Hazelcast cluster`
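As a rough illustration of the per-proxy tier choice described above, the sketch below models the local tier as a per-pod dictionary and the distributed tier as a store shared by every pod. `TwoTierCache`, `get`, and the `tier` argument are hypothetical names chosen for this example; they are not Apinizer's API, and a plain dictionary only stands in for the Hazelcast cluster.

```python
from typing import Callable, Dict


class TwoTierCache:
    """Hypothetical model: a per-pod local map next to a shared store."""

    def __init__(self, shared: Dict[str, bytes]) -> None:
        self.local: Dict[str, bytes] = {}   # stands in for the in-pod tier
        self.shared = shared                # stands in for the distributed tier

    def get(self, key: str, tier: str, load: Callable[[], bytes]) -> bytes:
        # The tier is picked per policy: "local" or "distributed".
        store = self.local if tier == "local" else self.shared
        hit = store.get(key)
        if hit is not None:
            return hit
        value = load()          # cache miss: call the backend once
        store[key] = value
        return value


shared_store: Dict[str, bytes] = {}
pod_a, pod_b = TwoTierCache(shared_store), TwoTierCache(shared_store)
calls = []


def backend() -> bytes:
    calls.append(1)
    return b'{"sku": 42}'


pod_a.get("GET:/catalog", "distributed", backend)   # miss, fills the shared tier
pod_b.get("GET:/catalog", "distributed", backend)   # hit, served from the shared tier
pod_a.get("GET:/catalog", "local", backend)         # miss, the local tier is per pod
```

The same key works against either store, which is the point of a shared key contract: switching tiers changes where the entry lives, not how it is addressed.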

### 02 · Pick the key on a screen. Build it from any variable.

The cache key is what decides whether two calls are the same call. Apinizer gives you two strategies on the same policy form — pick the query parameters that count, or build a custom key from any request variable: header, path, body field, claim, environment. Override the proxy default on a single method when one endpoint needs a different shape. HTTP-GET-only, null-value caching, and on-error behaviour stay on the same screen.

- Query-parameter mode — tick which params count, ignore the rest
- Custom-key mode — template built from any request variable
- Variable list spans headers, path, query, body field, claim, environment
- Per-method override — `POST /catalog/refresh` uses one strategy, `GET /catalog` another
- HTTP-GET-only toggle on the same form — never accidentally cache a mutation
- Null-value toggle — negative cache is opt-in, not surprise behaviour
- On-error policy — continue and serve stale, or stop and surface the error
- Custom error code and message returned when the cache pipeline trips

**Concepts:** `Query mode` · `Custom template` · `Per-method override` · `GET-only` · `Negative cache`
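The two key strategies can be sketched as two small functions. This is a minimal illustration under assumptions: the function names, the `method:path?query` key layout, and the `${name}` template syntax are invented for the example and are not taken from the product.

```python
from string import Template
from typing import Dict, Iterable
from urllib.parse import parse_qsl, urlencode


def query_param_key(method: str, path: str, query: str, counted: Iterable[str]) -> str:
    """Query-parameter mode: only the ticked parameters take part in the key."""
    keep = set(counted)
    pairs = sorted((k, v) for k, v in parse_qsl(query, keep_blank_values=True) if k in keep)
    return f"{method}:{path}?{urlencode(pairs)}"


def custom_key(template: str, variables: Dict[str, str]) -> str:
    """Custom-key mode: substitute request variables into a template."""
    return Template(template).substitute(variables)


# Parameter order and the ignored "trace" parameter do not change the key.
a = query_param_key("GET", "/catalog", "lang=en&page=2&trace=abc", ["page", "lang"])
b = query_param_key("GET", "/catalog", "trace=xyz&page=2&lang=en", ["page", "lang"])

# A custom key built from a header-like value, the path, and a claim-like value.
c = custom_key("${tenant}:${path}:${locale}",
               {"tenant": "acme", "path": "/catalog", "locale": "en"})
```

Sorting the retained parameters is a design choice in this sketch: it makes two requests that differ only in parameter order resolve to the same cache entry.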

### 03 · Twelve cache scopes ship with the platform — not three.

Most gateways cache API responses and call it a cache. Apinizer ships twelve first-party scopes: four levels of API response caching, four identity caches the gateway needs to be fast, three counters for throttling and quota, and platform state for circuit breaker, routing, and banners. Every scope is tier-aware — pick local or distributed on the policy screen and the platform places the data correctly.

- API responses cached at proxy-group, proxy, proxy-method, or API-call scope
- OAuth 2.0 access and refresh tokens cached for sub-millisecond lookup
- OAuth 2.0 authorization codes cached for the single-use redirect window
- OIDC sessions shared across pods so SSO survives a pod replacement
- JWT refresh tokens with revoke-on-rotate semantics
- Throttling counters at second and minute granularity — in-memory
- Quota counters at hour, day, and month — backed by MongoDB persistence
- Circuit breaker state, routing tables, and client flow banners cached too

**Concepts:** `4 response scopes` · `OAuth + OIDC` · `JWT refresh` · `Throttle + quota` · `Circuit breaker`

### 04 · Cache invalidation is on the deploy contract, not a follow-up.

Cache invalidation is the hard part. Apinizer makes it atomic. When you redeploy a proxy, one signal fans out into five lanes — local cache, distributed cache, routing tables, marshalled routes, load balancer view — and all five flush in the same window. The first request after deploy is served fresh, not from a half-warm cache. The invalidation endpoint is auth-aware, and you choose the stop-or-continue behaviour when the cache pipeline trips.

- Single redeploy signal fans out to five invalidation lanes in parallel
- Local cache, distributed cache, routing, marshalled routes, load balancer — all atomic
- No half-state — the first request after deploy sees freshly flushed caches and current routing, never a mix of old and new
- Invalidation API requires authentication — turn off per policy if needed
- Stop or continue when invalidation fails — an explicit operator choice, not a fixed default
- Custom error code and message returned when the choice is stop
- On backend error during cache fill — continuing and serving the stale entry is a single toggle
- Same contract whether you redeploy from the UI, APIops, or the CI pipeline

**Concepts:** `Atomic flush` · `Five lanes` · `Auth-aware` · `Stop or continue` · `Stale-on-error`
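The fan-out and the stop-or-continue choice can be sketched as follows. This is only a model of the control flow: the lane names come from the text above, while `redeploy_flush`, the `on_failure` argument, and the callables are hypothetical. It also does not attempt real atomicity across lanes; it simply runs all five flushes in parallel and reports which ones failed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

LANES = ["local_cache", "distributed_cache", "routing_tables",
         "marshalled_routes", "load_balancer_view"]


def redeploy_flush(flushers: Dict[str, Callable[[], None]],
                   on_failure: str = "stop") -> List[str]:
    """Run every lane's flush in parallel and return the lanes that failed."""
    with ThreadPoolExecutor(max_workers=len(LANES)) as pool:
        futures = {lane: pool.submit(flushers[lane]) for lane in LANES}
    # Leaving the "with" block waits for all five lanes to finish.
    failed = [lane for lane, future in futures.items()
              if future.exception() is not None]
    if failed and on_failure == "stop":
        raise RuntimeError(f"invalidation failed in: {', '.join(failed)}")
    return failed


flushed: List[str] = []
flushers = {lane: (lambda lane=lane: flushed.append(lane)) for lane in LANES}
result = redeploy_flush(flushers)   # every lane succeeds, so nothing is reported
```

With `on_failure="continue"` the function returns the failed lanes instead of raising, which mirrors the operator's choice between surfacing the error and carrying on.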

### 05 · Kubernetes-native cluster — managed or remote, your call.

The distributed tier runs as a real Kubernetes deployment. Apinizer can manage it for you — namespace, replicas, CPU, memory, service, ports — or register an existing cluster you already operate. Either way, the platform talks to the cluster through a service inside the namespace, with split-brain quorum, partition-aware membership, and MongoDB-backed persistence for the counters that need to survive a full restart.

- Managed mode — Apinizer provisions namespace, deployment, and service
- Remote mode — register an existing cluster, no provisioning required
- Replica count, CPU, and memory set on the cache server form
- ClusterIP by default for in-cluster traffic, NodePort when you need it
- Hazelcast discovery on Kubernetes — no static peer list to maintain
- Split-brain protection with a quorum of two and a quorum-aware merge
- 271 partitions by default — tuned for throughput
- MongoDB write-behind keeps quota counters across full cluster restarts

**Concepts:** `Managed` · `Remote` · `ClusterIP · NodePort` · `Split-brain quorum` · `MongoDB persistence`
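Two of the numbers in the list above, the quorum of two and the 271 partitions, can be made concrete with a small sketch. The names `QuorumRule`, `operations_allowed`, and `partition_for` are hypothetical, and `zlib.crc32` is only a stand-in hash for illustration; the cluster uses its own hashing over the serialized key.

```python
import zlib
from dataclasses import dataclass

PARTITION_COUNT = 271   # default partition count quoted above


@dataclass(frozen=True)
class QuorumRule:
    minimum_members: int = 2   # the "quorum of two" from the list above


def operations_allowed(visible_members: int, rule: QuorumRule = QuorumRule()) -> bool:
    """One side of a network split keeps serving only while it sees enough members."""
    return visible_members >= rule.minimum_members


def partition_for(key: str, partition_count: int = PARTITION_COUNT) -> int:
    """Map a key to a partition: hash of the key modulo the partition count."""
    # Stand-in hash, an assumption made for this example only.
    return zlib.crc32(key.encode("utf-8")) % partition_count


# A three-member cluster splits 2 / 1: only the larger side keeps operating.
majority_side = operations_allowed(2)
minority_side = operations_allowed(1)
partition = partition_for("GET:/catalog?page=2")
```

The quorum check is what prevents both halves of a split cluster from accepting writes that would later have to be merged.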

### 06 · Throttle and quota on the same cluster as the response cache.

Rate-limit counters belong with the cache, not in a separate Redis. Apinizer puts throttling and quota on the same Hazelcast cluster: second and minute counters stay in memory for speed, while hour, day, and month counters use MongoDB write-behind so they survive a full restart. Increments happen atomically on the cluster, so there is no read-modify-write race and no ghost counts. The gateway emits standard rate-limit headers on every response, success or 429.

- Throttling counters at second and minute granularity — in-memory only
- Quota counters at hour, day, and month — MongoDB write-behind backed
- Atomic increments on the cluster — no read-modify-write race
- Fixed or sliding window per policy — operator choice on the form
- Standard rate-limit headers on every response — limit, remaining, reset
- Identity and type headers so clients can tell which limit they hit
- Quota counters survive a full cluster restart — no quota reset surprise
- Throttle policy and cache policy live on the same proxy settings screen

**Concepts:** `Second · minute` · `Hour · day · month` · `Atomic increment` · `Fixed · sliding` · `RateLimit headers`
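A fixed-window counter with rate-limit headers can be sketched in a few lines. This is a single-process illustration, not the gateway's implementation: `FixedWindowLimiter` and `hit` are hypothetical names, and a `threading.Lock` stands in for the cluster-side atomic increment. Old windows are never pruned here, which a real counter store would handle with expiry.

```python
import threading
from typing import Dict, Tuple


class FixedWindowLimiter:
    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window = window_seconds
        self._counts: Dict[Tuple[str, int], int] = {}
        self._lock = threading.Lock()   # stand-in for an atomic increment

    def hit(self, client: str, now: float) -> Tuple[int, Dict[str, str]]:
        bucket = int(now // self.window)          # index of the current window
        with self._lock:
            count = self._counts.get((client, bucket), 0) + 1
            self._counts[(client, bucket)] = count
        remaining = max(self.limit - count, 0)
        reset = int((bucket + 1) * self.window - now)   # seconds until the window ends
        status = 200 if count <= self.limit else 429
        headers = {
            "RateLimit-Limit": str(self.limit),
            "RateLimit-Remaining": str(remaining),
            "RateLimit-Reset": str(reset),
        }
        return status, headers


limiter = FixedWindowLimiter(limit=2, window_seconds=60)
results = [limiter.hit("app-1", now=120.0) for _ in range(3)]
next_window = limiter.hit("app-1", now=180.0)
```

The third call in the same window is rejected with 429 but still carries the headers, and the first call in the next window starts from a fresh count.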

### 07 · Operators see the cluster they run.

The Cache Dashboard is one screen, not a Hazelcast console you have to learn. Cluster members at the top with host and port. A navigation tree on the left by project, proxy, and proxy method. The keys for the selected scope in the middle with their counts and sizes. The cached payload on the right with creation time, expiry, hit count — and a clear button for the entry, or the whole scope, or every scope on the cluster. Multiple clusters supported when you run more than one environment.

- Cluster members panel — host, port, status — refreshed on demand
- Scope navigator by project, proxy group, proxy, and proxy method
- Key list with content count and byte size per entry
- Cached payload viewer with created time, TTL countdown, hit counter
- Clear one key, the whole scope, or every entry in the cluster
- Multi-cluster ready — register more than one cache server, switch tabs
- Same screen for managed and remote clusters — operations don't change
- Excel export from the key list — handy for capacity planning and audits

**Concepts:** `Cluster members` · `Scope tree` · `Payload viewer` · `Clear · one or all` · `Multi-cluster`

---

## Use cases

### Slash backend load on read-heavy traffic

Reference data, customer profiles, catalogs, regulatory lookups: pick the cache key on the policy form, pick the tier, and the gateway serves repeat requests from cache, in under a millisecond on the local tier. Backends breathe; clients see the answer faster.

- Per-proxy and per-method cache policy
- Local tier for sub-millisecond hot-path hits
- Distributed tier for cluster-wide consistency
- Atomic invalidation on every redeploy

### OAuth, OIDC, and JWT refresh at gateway speed

Token introspection, session lookup, and refresh-token rotation all run from the cluster — the same cluster that serves response caches. The identity surface stays fast even when the upstream IdP doesn't.

- OAuth 2.0 tokens and authorization codes cached
- OIDC sessions shared across all gateway pods
- JWT refresh tokens with revoke-on-rotate
- First-party — no separate Redis to operate

### Rate limits without a second store

Throttling and quota counters live on the same Hazelcast cluster as the cache. Second and minute counters stay in memory; hour, day, and month counters use MongoDB write-behind. Increments are atomic, and standard rate-limit headers go out on every response.

- Second + minute throttling, in-memory
- Hour + day + month quota, durable
- Atomic increments — no race conditions
- `RateLimit-Limit`, `RateLimit-Remaining`, and `RateLimit-Reset` headers

### Operate dev, test, and prod without trade-offs

Register one cluster per environment — managed by Apinizer or remote. The dashboard switches between them with a tab. Cache policies, key strategies, and tier choices promote alongside the proxy through APIops — no environment-specific cache code.

- Managed or remote per environment
- Multi-cluster registry in the dashboard
- Cache policy promotes with APIops
- Same operations playbook across environments

---

## What ships in the box

### Local tier

- In-pod cache sets for API proxies, routing, policy, and circuit breaker
- TTL ticker sweeps every minute — no stale-forever entries
- LRU eviction with a per-entity capacity cap
- Sub-millisecond hit latency, no network hop
- Coordinated flush on redeploy — atomic with routing tables
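
The combination of a TTL sweep and LRU eviction with a capacity cap can be sketched with an ordered dictionary. `LocalCacheSet` and its methods are hypothetical names for this example, and the clock is passed in explicitly so the behaviour is easy to follow; a real ticker would call `sweep` on a timer.

```python
from collections import OrderedDict
from typing import Any, Optional, Tuple


class LocalCacheSet:
    def __init__(self, capacity: int, ttl_seconds: float) -> None:
        self.capacity = capacity
        self.ttl = ttl_seconds
        # key -> (expiry deadline, value), ordered from least to most recently used
        self._entries: "OrderedDict[str, Tuple[float, Any]]" = OrderedDict()

    def put(self, key: str, value: Any, now: float) -> None:
        self._entries[key] = (now + self.ttl, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)    # evict the least recently used

    def get(self, key: str, now: float) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None or entry[0] <= now:
            self._entries.pop(key, None)         # drop an expired entry on read
            return None
        self._entries.move_to_end(key)           # mark as most recently used
        return entry[1]

    def sweep(self, now: float) -> int:
        """What a periodic TTL ticker would call; returns the entries removed."""
        expired = [k for k, (deadline, _) in self._entries.items() if deadline <= now]
        for k in expired:
            del self._entries[k]
        return len(expired)


cache = LocalCacheSet(capacity=2, ttl_seconds=60)
cache.put("a", 1, now=0)
cache.put("b", 2, now=0)
cache.get("a", now=1)          # touching "a" makes "b" the eviction candidate
cache.put("c", 3, now=1)       # over capacity, so "b" is evicted
```

The sweep matters because an entry that is never read again would otherwise sit in memory until capacity pressure pushed it out.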

### Distributed tier

- Hazelcast cluster on Kubernetes, managed or remote
- Twelve cache scopes — responses, identity, throttle, quota, platform state
- 271 partitions by default, split-brain protection with a quorum
- MongoDB write-behind keeps quota counters across full restarts
- Dashboard for cluster members, scopes, keys, payloads, and clears
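
Write-behind persistence for quota counters can be sketched as an in-memory counter that batches its latest values into a durable store. Everything named here is an assumption for illustration: `WriteBehindCounters`, the `persist` hook, and the key format are invented, a dictionary stands in for MongoDB, and the sketch is not thread-safe.

```python
from typing import Callable, Dict, Optional


class WriteBehindCounters:
    def __init__(self, persist: Callable[[Dict[str, int]], None],
                 initial: Optional[Dict[str, int]] = None) -> None:
        self._values: Dict[str, int] = dict(initial or {})   # restored on startup
        self._dirty: Dict[str, int] = {}
        self._persist = persist      # hypothetical hook, e.g. a bulk upsert

    def increment(self, key: str, amount: int = 1) -> int:
        value = self._values.get(key, 0) + amount
        self._values[key] = value
        self._dirty[key] = value     # remember the latest value for the next flush
        return value

    def flush(self) -> int:
        """Write the changed counters in one batch; returns how many were written."""
        if not self._dirty:
            return 0
        batch, self._dirty = self._dirty, {}
        self._persist(batch)
        return len(batch)


durable: Dict[str, int] = {}                      # stands in for the database
counters = WriteBehindCounters(durable.update)
for _ in range(3):
    counters.increment("app-1:month")
written = counters.flush()                        # three increments, one write

# After a full restart, a new instance picks up from the persisted values.
restarted = WriteBehindCounters(durable.update, initial=durable)
after_restart = restarted.increment("app-1:month")
```

The trade-off of write-behind is visible in the sketch: increments stay fast because they only touch memory, but anything not yet flushed is lost if the process stops before the next flush.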

---

## Resources

- [Two-tier cache reference](https://apinizer.com/developers/docs) — Pick the tier per proxy. Read the architecture, the key contract, and the eviction semantics.
- [Cache key strategies](https://apinizer.com/developers/docs/cache-keys) — Query-parameter mode and custom-template mode side by side, with per-method override examples.
- [Atomic invalidation](https://apinizer.com/developers/docs/cache-invalidation) — How redeploy flushes the cache, routing, marshalled routes, and the load balancer view in one window.
- [Throttle + quota policies](https://apinizer.com/developers/docs/rate-limits) — Second-minute throttling, hour-day-month quota, fixed or sliding windows, rate-limit headers.
- [Cluster deployment](https://apinizer.com/developers/docs/cache-cluster) — Managed and remote cache servers on Kubernetes — namespace, service type, host alias, node list.
- [Cache dashboard](https://apinizer.com/developers/docs/cache-dashboard) — Cluster members, scope tree, key list, payload viewer, clear actions — one screen.

---

## Next step

*Cache that ships with the gateway*

**Two tiers. Twelve scopes. One contract.**

See how Apinizer's two-tier cache replaces a stack of point caches — response, auth, rate-limit, and platform state on the same Kubernetes-native cluster.

[Book a Demo](https://calendly.com/apinizer/15min) · [Read the docs](https://apinizer.com/developers/docs)
