Cache

Two cache tiers. One invalidation contract.

Local in-pod tier for sub-millisecond hits. Distributed Hazelcast cluster on Kubernetes for cluster-wide truth. Twelve cache scopes, atomic invalidation on redeploy, throttle and quota on the same tier — all first-party, no add-on.


Tiers: Local · Distributed
Scopes: 12 first-party
Cluster: Hazelcast on Kubernetes

Capabilities · deep dive

Seven properties that make a two-tier cache part of the gateway, not a separate stack.

A local in-pod tier for sub-millisecond hits, a Hazelcast cluster on Kubernetes for cluster-wide truth, two cache-key strategies on one form, twelve first-party scopes, atomic invalidation on every redeploy, throttle and quota on the same cluster, and one operator dashboard — every property a real Apinizer capability, not a roadmap promise.

01 · Two-tier architecture

One cache product. Two tiers. Decided per proxy.

Apinizer ships a true two-tier cache. The local tier lives inside every gateway pod — a hot path that never serializes, never crosses the network, and answers in under a millisecond. The distributed tier is a Hazelcast cluster on Kubernetes — the cluster-wide source of truth that survives a pod restart. Pick the tier on the cache policy: distributed when consistency matters, local when latency is the budget.

Local tier lives in the gateway pod — no network hop, no serialization
Distributed tier is a Hazelcast cluster on Kubernetes
Per-proxy storage type — local or distributed on the same policy screen
Local tier answers in under a millisecond — backed by an in-pod map
Distributed tier is cluster-wide truth — survives pod restarts and rolls
Same cache key contract on both tiers — switch without touching code
Per-entity local cache sets — API proxies, routing, policy, circuit breaker
Memory-tuned for the gateway request profile — no GC surprises

Local tier
Distributed tier
Per-proxy choice
Sub-ms hit
Hazelcast cluster

Two stacked tier panels — the upper L1 local tier inside the gateway pod listing API responses, routing tables, proxy config, circuit breaker state, and engine bindings; the lower L2 distributed Hazelcast cluster listing the same scopes plus OAuth, OIDC, JWT refresh, throttling, and quota; status pills underneath confirm sub-millisecond hits on L1 and three-millisecond hits on L2.
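
The per-proxy tier choice can be sketched as two classes that honour one get/put contract, so moving a proxy from local to distributed changes only a policy field. This is a minimal sketch under stated assumptions: `LocalTier`, `DistributedTier`, `CachePolicy`, `storage_type`, and `tier_for` are hypothetical names, and a plain shared dict stands in for the Hazelcast map, so nothing here reflects Apinizer's internal API.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


class CacheTier(Protocol):
    """Both tiers honour the same get/put contract, so callers never branch on tier."""

    def get(self, key: str) -> Optional[bytes]: ...
    def put(self, key: str, value: bytes) -> None: ...


class LocalTier:
    """In-pod tier: a plain dict, no serialization, no network hop."""

    def __init__(self) -> None:
        self._entries: dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._entries.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._entries[key] = value


class DistributedTier:
    """Hypothetical stand-in for the Hazelcast cluster: one map shared by every pod."""

    def __init__(self, shared_map: dict[str, bytes]) -> None:
        self._shared = shared_map

    def get(self, key: str) -> Optional[bytes]:
        return self._shared.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._shared[key] = value


@dataclass
class CachePolicy:
    proxy: str
    storage_type: str  # hypothetical field: "LOCAL" or "DISTRIBUTED"


def tier_for(policy: CachePolicy, local: LocalTier, distributed: DistributedTier) -> CacheTier:
    """Pick the tier from the per-proxy policy; the key is built the same way either way."""
    return distributed if policy.storage_type == "DISTRIBUTED" else local


# Two pods: each has a private local tier, both point at the same distributed map.
shared: dict[str, bytes] = {}
pod_a = (LocalTier(), DistributedTier(shared))
pod_b = (LocalTier(), DistributedTier(shared))

catalog = CachePolicy(proxy="catalog", storage_type="DISTRIBUTED")
tier_for(catalog, *pod_a).put("GET /catalog?region=eu", b"[...]")
print(tier_for(catalog, *pod_b).get("GET /catalog?region=eu"))  # b'[...]'

pricing = CachePolicy(proxy="pricing", storage_type="LOCAL")
tier_for(pricing, *pod_a).put("GET /price/42", b"9.99")
print(tier_for(pricing, *pod_b).get("GET /price/42"))  # None
```

The catalog entry written on pod A is visible on pod B because both share the distributed map, while the pricing entry stays pod-private: that is the consistency-versus-latency trade-off the policy screen exposes.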

02 · Cache key strategy

Pick the key on a screen. Build it from any variable.

The cache key is what decides whether two calls are the same call. Apinizer gives you two strategies on the same policy form — pick the query parameters that count, or build a custom key from any request variable: header, path, body field, claim, environment. Override the proxy default on a single method when one endpoint needs a different shape. HTTP-GET-only, null-value caching, and on-error behaviour stay on the same screen.

Query-parameter mode — tick which params count, ignore the rest
Custom-key mode — template built from any request variable
Variable list spans headers, path, query, body field, claim, environment
Per-method override — POST /catalog/refresh uses one strategy, GET /catalog another
HTTP-GET-only toggle on the same form — never accidentally cache a mutation
Null-value toggle — negative cache is opt-in, not surprise behaviour
On-error policy — continue and serve stale, or stop and surface the error
Custom error code and message returned when the cache pipeline trips

Query mode
Custom template
Per-method override
GET-only
Negative cache

Two-column key strategy editor showing the same incoming request resolved two different ways — the left column picks query parameters region and tier and resolves to a query-shaped key; the right column uses a template combining tenant header, request path, and tier query, resolving to a tenant-shaped key; a settings strip pins HTTP-GET-only, cache-null-value, and on-error toggles underneath.
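
The two key strategies can be illustrated with a short sketch. It assumes that query-parameter mode sorts the ticked parameters so their order in the URL does not matter, and it uses Python's `string.Template` syntax for the custom template; Apinizer's actual template syntax, its variable names, the `X-Tenant` header, and the function names below are all hypothetical.

```python
from string import Template
from urllib.parse import parse_qsl, urlsplit


def query_param_key(method: str, url: str, counted: list[str]) -> str:
    """Query-parameter mode: only ticked params count, in sorted order; the rest are ignored."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    kept = [f"{name}={params[name]}" for name in sorted(counted) if name in params]
    return f"{method} {parts.path}?{'&'.join(kept)}"


def custom_template_key(template: str, variables: dict[str, str]) -> str:
    """Custom-key mode: a template filled from request variables (header, path, query, claim...)."""
    return Template(template).substitute(variables)


def is_cacheable(method: str, get_only: bool = True) -> bool:
    """HTTP-GET-only toggle: with it on, a mutation never reaches the cache."""
    return method == "GET" or not get_only


url = "/catalog?tier=gold&region=eu&utm_source=mail"
print(query_param_key("GET", url, counted=["region", "tier"]))
# GET /catalog?region=eu&tier=gold

headers = {"X-Tenant": "acme"}  # hypothetical tenant header
print(custom_template_key("${tenant}:${path}:tier=${tier}",
                          {"tenant": headers["X-Tenant"], "path": "/catalog", "tier": "gold"}))
# acme:/catalog:tier=gold

print(is_cacheable("POST"))  # False
```

The same request resolves to a query-shaped key on the left and a tenant-shaped key on the right; the untracked `utm_source` parameter never splits the cache.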

03 · Twelve cache scopes

Twelve cache scopes ship with the platform — not three.

Most gateways cache API responses and call it a cache. Apinizer ships twelve first-party scopes: four levels of API response caching, four identity caches the gateway needs to be fast, three counters for throttling and quota, and platform state for circuit breaker, routing, and banners. Every scope is tier-aware — pick local or distributed on the policy screen and the platform places the data correctly.

API responses cached at proxy-group, proxy, proxy-method, or API-call scope
OAuth 2.0 access and refresh tokens cached for sub-millisecond lookup
OAuth 2.0 authorization codes cached for the single-use redirect window
OIDC sessions shared across pods so SSO survives a pod replacement
JWT refresh tokens with revoke-on-rotate semantics
Throttling counters at second and minute granularity — in-memory
Quota counters at hour, day, and month — backed by MongoDB persistence
Circuit breaker state, routing tables, and client flow banners cached too

4 response scopes
OAuth + OIDC
JWT refresh
Throttle + quota
Circuit breaker

Twelve cache scope cards organized into three rows — API responses across proxy-group, proxy, proxy-method, and API-call; identity caches for OAuth tokens, OAuth codes, OIDC sessions, and JWT refresh; traffic control plus platform state for throttling, quota, circuit breaker, and routing — each card noting whether the scope is in-memory or persisted.

04 · Coordinated invalidation

Cache invalidation is on the deploy contract, not a follow-up.

Cache invalidation is the hard part. Apinizer makes it atomic. When you redeploy a proxy, one signal fans out into five lanes — local cache, distributed cache, routing tables, marshalled routes, load balancer view — and all five flush in the same window. The first request after deploy is served fresh, not from a half-warm cache. The invalidation endpoint is auth-aware, and you choose the stop-or-continue behaviour when the cache pipeline trips.

Single redeploy signal fans out to five invalidation lanes in parallel
Local cache, distributed cache, routing, marshalled routes, load balancer — all atomic
No half-state, so the first request after deploy is served fresh, never from a half-flushed cache
Invalidation API requires authentication — turn off per policy if needed
Stop or continue when invalidation fails — operator choice, not default
Custom error code and message returned when the choice is stop
On backend error during cache fill — continue and serve stale is one toggle away
Same contract whether you redeploy from the UI, APIops, or the CI pipeline

Atomic flush
Five lanes
Auth-aware
Stop or continue
Stale-on-error

A redeploy event at the top fires a single invalidate signal that fans out into five lanes — local cache, distributed cache, routing tables, marshalled routes, load balancer view — each with a green progress fill and a millisecond timing; a settings card on the right pins the auth-required toggle and the stop-or-continue choice; a footer strip confirms zero half-state and a fresh response in fourteen milliseconds.
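
A minimal sketch of the redeploy fan-out, assuming each lane exposes a flush callable: every lane runs in parallel, the function returns only after all of them finish, and the stop-or-continue choice decides whether a failed lane surfaces as an error with a custom code and message. The function, the lane names, and the `on_error` values are hypothetical. This sketch provides a wait-for-all barrier, not transactional rollback: a lane that already flushed stays flushed if another lane fails.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


class InvalidationError(Exception):
    """Raised in STOP mode; carries the operator-defined error code and message."""

    def __init__(self, code: int, message: str, failed: list[str]) -> None:
        super().__init__(message)
        self.code = code
        self.failed = failed


def invalidate_on_redeploy(
    lanes: dict[str, Callable[[], None]],
    on_error: str = "STOP",  # hypothetical values: "STOP" or "CONTINUE"
    error_code: int = 503,
    error_message: str = "cache invalidation failed",
) -> list[str]:
    """Fan one redeploy signal out to every lane in parallel; return the names of failed lanes."""
    with ThreadPoolExecutor(max_workers=max(1, len(lanes))) as pool:
        futures = {name: pool.submit(flush) for name, flush in lanes.items()}
    # Leaving the `with` block waits for every lane, so nothing is served mid-flush.
    failed = [name for name, fut in futures.items() if fut.exception() is not None]
    if failed and on_error == "STOP":
        raise InvalidationError(error_code, error_message, failed)
    return failed


lane_names = ["local_cache", "distributed_cache", "routing_tables",
              "marshalled_routes", "load_balancer_view"]
flushed: list[str] = []
lanes = {name: (lambda name=name: flushed.append(name)) for name in lane_names}
print(invalidate_on_redeploy(lanes))  # []
```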

05 · Kubernetes-native cluster

Kubernetes-native cluster — managed or remote, your call.

The distributed tier runs as a real Kubernetes deployment. Apinizer can manage it for you — namespace, replicas, CPU, memory, service, ports — or register an existing cluster you already operate. Either way, the platform talks to the cluster through a service inside the namespace, with split-brain quorum, partition-aware membership, and MongoDB-backed persistence for the counters that need to survive a full restart.

Managed mode — Apinizer provisions namespace, deployment, and service
Remote mode — register an existing cluster, no provisioning required
Replica count, CPU, and memory set on the cache server form
ClusterIP by default for in-cluster traffic, NodePort when you need it
Hazelcast discovery on Kubernetes — no static peer list to maintain
Split-brain protection with a quorum of two and a quorum-aware merge
271 partitions out of the box, the standard Hazelcast partition count
MongoDB write-behind keeps quota counters across full cluster restarts

Managed
Remote
ClusterIP · NodePort
Split-brain quorum
MongoDB persistence

A Kubernetes namespace frames a three-pod Hazelcast deployment with one primary and two backups joined by Kubernetes-native discovery, exposed through a ClusterIP service on port 5701; a managed-versus-remote toggle pins the top, a right-side panel lists replica count, CPU and memory, partition count, and split-brain quorum settings, and a MongoDB write-behind card sits below the cluster for quota durability.
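
Partition-aware membership and split-brain quorum come down to two small rules, sketched below. Hazelcast assigns every key to one of a fixed number of partitions (271 by default) by hashing the serialized key, and split-brain protection rejects operations on any side of a network split that has fewer members than the configured minimum. The sketch swaps in CRC32 for Hazelcast's own hash, and `owner_of` is a hypothetical round-robin rule; it illustrates the idea rather than Hazelcast's actual placement algorithm.

```python
import zlib

PARTITION_COUNT = 271  # Hazelcast's default partition count


def partition_for(key: str, partition_count: int = PARTITION_COUNT) -> int:
    """Map a cache key to a partition id with a stable hash.

    CRC32 stands in for the MurmurHash3 that Hazelcast applies to the serialized key;
    what matters is that every member computes the same partition for the same key.
    """
    return zlib.crc32(key.encode("utf-8")) % partition_count


def owner_of(partition_id: int, members: list[str]) -> str:
    """Hypothetical round-robin owner; real Hazelcast assignment also places backup replicas."""
    ordered = sorted(members)
    return ordered[partition_id % len(ordered)]


def has_quorum(reachable_members: int, quorum: int = 2) -> bool:
    """Split-brain check: a side of a partitioned cluster below the quorum refuses operations."""
    return reachable_members >= quorum


members = ["cache-0", "cache-1", "cache-2"]  # hypothetical pod names
pid = partition_for("GET /catalog?region=eu")
print(pid, owner_of(pid, members))
print(has_quorum(1), has_quorum(2))  # False True
```

With a quorum of two, a single pod cut off from its peers stops serving writes instead of diverging, while the two-pod side keeps running.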

06 · Throttle + quota on the same tier

Throttle and quota on the same cluster as the response cache.

Rate-limit counters belong with the cache, not in a separate Redis. Apinizer puts throttling and quota on the same Hazelcast cluster: second and minute counters stay in memory for speed, while hour, day, and month counters use MongoDB write-behind so they survive a full restart. Increments happen atomically on the cluster, so there is no read-modify-write race and no ghost counts. The gateway emits standard rate-limit headers on every response, whether the call succeeds or returns 429.

Throttling counters at second and minute granularity — in-memory only
Quota counters at hour, day, and month — MongoDB write-behind backed
Atomic increments on the cluster — no read-modify-write race
Fixed or sliding window per policy — operator choice on the form
Standard rate-limit headers on every response — limit, remaining, reset
Identity and type headers so clients can tell which limit they hit
Counters survive a full cluster restart — no quota reset surprise
Throttle policy and cache policy live on the same proxy settings screen

Second · minute
Hour · day · month
Atomic increment
Fixed · sliding
RateLimit headers

Two stacked rate-control rows — throttling at the top with second and minute buckets showing current call counts and progress fills, quota at the bottom with hour, day, and month buckets backed by MongoDB; a right-side response-headers card shows the rate-limit, remaining, reset, identity, and type headers; a footer strip pins the atomic-increment optimization, and a result strip splits a 200 OK call from a 429 throttled call.
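
Fixed and sliding windows differ only in how old calls leave the count, which the following sketch makes concrete. It rests on several assumptions: `RateLimiter` and its parameters are hypothetical, a `threading.Lock` stands in for the cluster's atomic increment, the clock is injectable for testing, and the `RateLimit-*` header names follow the IETF rate-limit header fields draft rather than anything confirmed about Apinizer's output.

```python
import math
import threading
import time
from collections import deque
from typing import Callable


class RateLimiter:
    """Hypothetical single throttle policy: `limit` calls per `window_s` seconds, fixed or sliding."""

    def __init__(self, limit: int, window_s: float, mode: str = "FIXED",
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit, self.window_s, self.mode, self.clock = limit, window_s, mode, clock
        self._lock = threading.Lock()  # stands in for the cluster's atomic increment
        self._origin = clock()         # fixed mode: windows are aligned to this instant
        self._window_index = 0
        self._count = 0
        self._hits: deque = deque()    # sliding mode: timestamps of admitted calls

    def check(self) -> tuple[int, dict[str, str]]:
        """Admit or reject one call; return (HTTP status, rate-limit headers)."""
        with self._lock:
            now = self.clock()
            if self.mode == "FIXED":
                # Fixed window: the counter resets at every window boundary.
                index = int((now - self._origin) // self.window_s)
                if index != self._window_index:
                    self._window_index, self._count = index, 0
                allowed = self._count < self.limit
                if allowed:
                    self._count += 1
                used = self._count
                reset_at = self._origin + (index + 1) * self.window_s
            else:
                # Sliding window: only calls admitted in the last window_s seconds count.
                while self._hits and now - self._hits[0] >= self.window_s:
                    self._hits.popleft()
                allowed = len(self._hits) < self.limit
                if allowed:
                    self._hits.append(now)
                used = len(self._hits)
                reset_at = (self._hits[0] if self._hits else now) + self.window_s
            headers = {
                "RateLimit-Limit": str(self.limit),
                "RateLimit-Remaining": str(self.limit - used),
                "RateLimit-Reset": str(max(0, math.ceil(reset_at - now))),
            }
            return (200 if allowed else 429), headers


fake_now = [0.0]
limiter = RateLimiter(limit=2, window_s=60, clock=lambda: fake_now[0])
for _ in range(3):
    status, headers = limiter.check()
    print(status, headers["RateLimit-Remaining"])
# 200 1
# 200 0
# 429 0
```

A fixed window lets a client spend its whole limit at the end of one window and again at the start of the next; a sliding window closes that gap at the cost of remembering one timestamp per admitted call.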

07 · Cache dashboard

Operators see the cluster they run.

The Cache Dashboard is one screen, not a Hazelcast console you have to learn. Cluster members at the top with host and port. A navigation tree on the left by project, proxy, and proxy method. The keys for the selected scope in the middle with their counts and sizes. The cached payload on the right with creation time, expiry, hit count — and a clear button for the entry, or the whole scope, or every scope on the cluster. Multiple clusters supported when you run more than one environment.

Cluster members panel — host, port, status — refreshed on demand
Scope navigator by project, proxy group, proxy, and proxy method
Key list with content count and byte size per entry
Cached payload viewer with created time, TTL countdown, hit counter
Clear one key, the whole scope, or every entry in the cluster
Multi-cluster ready — register more than one cache server, switch tabs
Same screen for managed and remote clusters — operations don't change
Excel export from the key list — handy for capacity planning and audits

Cluster members
Scope tree
Payload viewer
Clear · one or all
Multi-cluster

In the box

What's included

The capabilities below are part of the standard install — no add-on SKUs and no separate licenses.

Local tier

In-pod cache sets for API proxies, routing, policy, and circuit breaker
TTL ticker sweeps every minute — no stale-forever entries
LRU eviction with a per-entity capacity cap
Sub-millisecond hit latency, no network hop
Coordinated flush on redeploy — atomic with routing tables
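
The local tier's eviction rules (an LRU capacity cap plus TTL expiry swept by a ticker) fit in one small class. This is a hypothetical sketch: `LocalCacheSet` and its parameters are illustrative, it assumes one instance per entity type, and scheduling `sweep()` once a minute is left to the caller.

```python
import time
from collections import OrderedDict
from typing import Callable, Generic, Hashable, Optional, TypeVar

V = TypeVar("V")


class LocalCacheSet(Generic[V]):
    """Hypothetical in-pod cache set: LRU eviction at a capacity cap plus TTL expiry."""

    def __init__(self, capacity: int, ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity, self.ttl_s, self.clock = capacity, ttl_s, clock
        self._entries: "OrderedDict[Hashable, tuple[float, V]]" = OrderedDict()

    def put(self, key: Hashable, value: V) -> None:
        self._entries[key] = (self.clock() + self.ttl_s, value)
        self._entries.move_to_end(key)  # newest entry is the most recently used
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry

    def get(self, key: Hashable) -> Optional[V]:
        item = self._entries.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: drop lazily on read
            return None
        self._entries.move_to_end(key)  # a hit refreshes recency
        return value

    def sweep(self) -> int:
        """What a once-a-minute ticker would call: drop every expired entry, return the count."""
        now = self.clock()
        expired = [k for k, (exp, _) in self._entries.items() if now >= exp]
        for k in expired:
            del self._entries[k]
        return len(expired)


now = [0.0]
routes = LocalCacheSet(capacity=2, ttl_s=60, clock=lambda: now[0])
routes.put("/a", "svc-a")
routes.put("/b", "svc-b")
routes.get("/a")             # the hit moves /a to most recent
routes.put("/c", "svc-c")    # over capacity: evicts /b, the least recently used
print(routes.get("/b"))      # None
now[0] = 61
print(routes.sweep())        # 2
```

The sweep catches entries that are never read again, which lazy expiry on `get` alone would leave in memory indefinitely.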

Distributed tier

Hazelcast cluster on Kubernetes, managed or remote
Twelve cache scopes — responses, identity, throttle, quota, platform state
271 partitions by default, split-brain protection with a quorum
MongoDB write-behind keeps quota counters across full restarts
Dashboard for cluster members, scopes, keys, payloads, and clears
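
Write-behind persistence keeps the increment on the request path in memory and moves the database write to a later batch. The sketch below assumes a simple store interface in place of MongoDB; `CounterStore`, `InMemoryStore`, `WriteBehindQuota`, and the counter key format are hypothetical, and no MongoDB driver calls are shown.

```python
from collections import defaultdict
from typing import Protocol


class CounterStore(Protocol):
    """Stand-in for the MongoDB collection; any durable upsert-by-key store fits."""

    def save_many(self, counters: dict[str, int]) -> None: ...
    def load_all(self) -> dict[str, int]: ...


class InMemoryStore:
    """Test double for the durable store."""

    def __init__(self) -> None:
        self.rows: dict[str, int] = {}

    def save_many(self, counters: dict[str, int]) -> None:
        self.rows.update(counters)

    def load_all(self) -> dict[str, int]:
        return dict(self.rows)


class WriteBehindQuota:
    """Quota counters served from memory and persisted in batches off the request path."""

    def __init__(self, store: CounterStore) -> None:
        self.store = store
        # Warm from the store so counters survive a full restart.
        self.counters: dict[str, int] = defaultdict(int, store.load_all())
        self._dirty: set[str] = set()

    def increment(self, key: str) -> int:
        self.counters[key] += 1
        self._dirty.add(key)  # persisted by the next flush, not on this call
        return self.counters[key]

    def flush(self) -> int:
        """Called by a periodic timer: write only the counters changed since the last flush."""
        batch = {k: self.counters[k] for k in self._dirty}
        if batch:
            self.store.save_many(batch)
        self._dirty.clear()
        return len(batch)


store = InMemoryStore()
quota = WriteBehindQuota(store)
quota.increment("tenant-acme:day")
quota.increment("tenant-acme:day")
print(store.rows)  # {}
quota.flush()
print(store.rows)  # {'tenant-acme:day': 2}
after_restart = WriteBehindQuota(store)
print(after_restart.increment("tenant-acme:day"))  # 3
```

The flush interval is the trade-off: increments made after the last flush are lost if the cluster dies before the next one, so the interval bounds how far a restored quota counter can lag.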

Resources

Keep going

Two-tier cache reference

Pick the tier per proxy. Read the architecture, the key contract, and the eviction semantics.

Read the reference

Cache key strategies

Query-parameter mode and custom-template mode side by side, with per-method override examples.

Open the guide

Atomic invalidation

How redeploy flushes the cache, routing, marshalled routes, and the load balancer view in one window.

See the contract

Throttle + quota policies

Second-minute throttling, hour-day-month quota, fixed or sliding windows, rate-limit headers.

Read the doc

Cluster deployment

Managed and remote cache servers on Kubernetes — namespace, service type, host alias, node list.

Read the guide

Cache dashboard

Cluster members, scope tree, key list, payload viewer, clear actions — one screen.

Open the doc

Cache that ships with the gateway

Two tiers. Twelve scopes. One contract.

See how Apinizer's two-tier cache replaces a stack of point caches — response, auth, rate-limit, and platform state on the same Kubernetes-native cluster.
