Cache

Two cache tiers. One invalidation contract.

Local in-pod tier for sub-millisecond hits. Distributed Hazelcast cluster on Kubernetes for cluster-wide truth. Twelve cache scopes, atomic invalidation on redeploy, throttle and quota on the same tier — all first-party, no add-on.

  • Tiers: Local · Distributed
  • Scopes: 12 first-party
  • Cluster: Hazelcast on Kubernetes

Capabilities · deep dive

Seven properties that make a two-tier cache part of the gateway, not a separate stack.

A local in-pod tier for sub-millisecond hits, a Hazelcast cluster on Kubernetes for cluster-wide truth, two cache-key strategies on one form, twelve first-party scopes, atomic invalidation on every redeploy, throttle and quota on the same cluster, and one operator dashboard — every property a real Apinizer capability, not a roadmap promise.

01 · Two-tier architecture

One cache product. Two tiers. Decided per proxy.

Apinizer ships a true two-tier cache. The local tier lives inside every gateway pod — a hot path that never serializes, never crosses the network, and answers in under a millisecond. The distributed tier is a Hazelcast cluster on Kubernetes — the cluster-wide source of truth that survives a pod restart. Pick the tier on the cache policy: distributed when consistency matters, local when latency is the budget.

  • Local tier lives in the gateway pod — no network hop, no serialization
  • Distributed tier is a Hazelcast cluster on Kubernetes
  • Per-proxy storage type — local or distributed on the same policy screen
  • Local tier answers in under a millisecond — backed by an in-pod map
  • Distributed tier is cluster-wide truth — survives pod restarts and rolls
  • Same cache key contract on both tiers — switch without touching code
  • Per-entity local cache sets — API proxies, routing, policy, circuit breaker
  • Memory-tuned for the gateway request profile — no GC surprises
Local tier · Distributed tier · Per-proxy choice · Sub-ms hit · Hazelcast cluster

Figure: Two stacked tier panels. The upper L1 local tier inside the gateway pod lists API responses, routing tables, proxy config, circuit breaker state, and engine bindings; the lower L2 distributed Hazelcast cluster lists the same scopes plus OAuth, OIDC, JWT refresh, throttling, and quota; status pills underneath confirm sub-millisecond hits on L1 and three-millisecond hits on L2.
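
The per-proxy storage choice can be pictured with a minimal sketch. This is not Apinizer's implementation: the class name `TwoTierCache`, the `storage` argument, and the plain dictionaries standing in for the in-pod map and the Hazelcast cluster are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Store = Dict[str, Tuple[float, str]]  # key -> (expires_at, value)


class TwoTierCache:
    """Toy model of one gateway pod: a private local store plus a shared store."""

    def __init__(self, shared: Store, clock: Callable[[], float] = time.monotonic):
        self.local: Store = {}   # stand-in for the in-pod tier (never leaves the pod)
        self.shared = shared     # stand-in for the distributed tier (seen by every pod)
        self.clock = clock

    def _fresh(self, store: Store, key: str) -> Optional[str]:
        entry = store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if expires_at <= self.clock():
            store.pop(key, None)  # expired entries are dropped on read
            return None
        return value

    def get(self, key: str, loader: Callable[[], str], ttl: float,
            storage: str = "distributed") -> Tuple[str, str]:
        """Return (value, "hit" | "miss") using the tier named by the policy."""
        store = self.local if storage == "local" else self.shared
        value = self._fresh(store, key)
        if value is not None:
            return value, "hit"
        value = loader()  # call the backend on a miss
        store[key] = (self.clock() + ttl, value)
        return value, "miss"
```

Two `TwoTierCache` instances built over the same `shared` dictionary behave like two pods: an entry written with `storage="distributed"` is a hit on both, while an entry written with `storage="local"` is only visible to the pod that wrote it.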

02 · Cache key strategy

Pick the key on a screen. Build it from any variable.

The cache key is what decides whether two calls are the same call. Apinizer gives you two strategies on the same policy form — pick the query parameters that count, or build a custom key from any request variable: header, path, body field, claim, environment. Override the proxy default on a single method when one endpoint needs a different shape. HTTP-GET-only, null-value caching, and on-error behaviour stay on the same screen.

  • Query-parameter mode — tick which params count, ignore the rest
  • Custom-key mode — template built from any request variable
  • Variable list spans headers, path, query, body field, claim, environment
  • Per-method override — POST /catalog/refresh uses one strategy, GET /catalog another
  • HTTP-GET-only toggle on the same form — never accidentally cache a mutation
  • Null-value toggle — negative cache is opt-in, not surprise behaviour
  • On-error policy — continue and serve stale, or stop and surface the error
  • Custom error code and message returned when the cache pipeline trips
Query mode · Custom template · Per-method override · GET-only · Negative cache

Figure: Two-column key strategy editor showing the same incoming request resolved two different ways. The left column picks the query parameters region and tier and resolves to a query-shaped key; the right column uses a template combining tenant header, request path, and tier query, resolving to a tenant-shaped key; a settings strip pins HTTP-GET-only, cache-null-value, and on-error toggles underneath.
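
To make the two strategies concrete, here is a small sketch of how a key might be derived in each mode. The function names, the key layout, and the `${...}` template syntax are assumptions for illustration only; the product's actual key format is not described on this page.

```python
from string import Template
from typing import Dict, Iterable
from urllib.parse import parse_qsl, urlsplit


def query_param_key(method: str, url: str, counted_params: Iterable[str]) -> str:
    """Query-parameter mode: only the ticked parameters take part in the key."""
    counted = set(counted_params)
    parts = urlsplit(url)
    pairs = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in counted
    )
    # Sorting makes the key independent of parameter order in the request.
    query = "&".join(f"{name}={value}" for name, value in pairs)
    return f"{method}:{parts.path}?{query}"


def custom_key(template: str, variables: Dict[str, str]) -> str:
    """Custom-key mode: fill a template from request variables (header, path, claim, ...)."""
    return Template(template).substitute(variables)
```

With `region` and `tier` ticked, `GET /catalog?region=eu&tier=gold&trace=abc123` and `GET /catalog?trace=zzz&tier=gold&region=eu` resolve to the same key, because `trace` is ignored. A template such as `${tenant}:${path}:${tier}` instead separates entries per tenant.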

03 · Twelve cache scopes

Twelve cache scopes ship with the platform — not three.

Most gateways cache API responses and call it a cache. Apinizer ships twelve first-party scopes: four levels of API response caching, four identity caches the gateway needs to be fast, throttling and quota counters, and platform state for the circuit breaker and routing, with client flow banners cached alongside. Every scope is tier-aware: pick local or distributed on the policy screen and the platform places the data correctly.

  • API responses cached at proxy-group, proxy, proxy-method, or API-call scope
  • OAuth 2.0 access and refresh tokens cached for sub-millisecond lookup
  • OAuth 2.0 authorization codes cached for the single-use redirect window
  • OIDC sessions shared across pods so SSO survives a pod replacement
  • JWT refresh tokens with revoke-on-rotate semantics
  • Throttling counters at second and minute granularity — in-memory
  • Quota counters at hour, day, and month — backed by MongoDB persistence
  • Circuit breaker state, routing tables, and client flow banners cached too
4 response scopes · OAuth + OIDC · JWT refresh · Throttle + quota · Circuit breaker

Figure: Twelve cache scope cards organized into three rows: API responses across proxy-group, proxy, proxy-method, and API-call; identity caches for OAuth tokens, OAuth codes, OIDC sessions, and JWT refresh; traffic control plus platform state for throttling, quota, circuit breaker, and routing. Each card notes whether the scope is in-memory or persisted.

04 · Coordinated invalidation

Cache invalidation is on the deploy contract, not a follow-up.

Cache invalidation is the hard part. Apinizer makes it atomic. When you redeploy a proxy, one signal fans out into five lanes — local cache, distributed cache, routing tables, marshalled routes, load balancer view — and all five flush in the same window. The first request after deploy is served fresh, not from a half-warm cache. The invalidation endpoint is auth-aware, and you choose the stop-or-continue behaviour when the cache pipeline trips.

  • Single redeploy signal fans out to five invalidation lanes in parallel
  • Local cache, distributed cache, routing, marshalled routes, load balancer — all atomic
  • No half-state: the first request after deploy sees a fully flushed cache, not a half-warm one
  • Invalidation API requires authentication — turn off per policy if needed
  • Stop or continue when invalidation fails — operator choice, not default
  • Custom error code and message returned when the choice is stop
  • On backend error during cache fill — continue and serve stale is one toggle away
  • Same contract whether you redeploy from the UI, APIops, or the CI pipeline
Atomic flush · Five lanes · Auth-aware · Stop or continue · Stale-on-error

Figure: A redeploy event at the top fires a single invalidate signal that fans out into five lanes (local cache, distributed cache, routing tables, marshalled routes, load balancer view), each with a green progress fill and a millisecond timing; a settings card on the right pins the auth-required toggle and the stop-or-continue choice; a footer strip confirms zero half-state and a fresh response in fourteen milliseconds.
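
The fan-out and the stop-or-continue choice can be sketched as follows. The lane names come from the text above; the function `invalidate_on_redeploy`, the `on_failure` argument, and the thread pool are assumptions for illustration, and a thread pool only models the parallel fan-out, not whatever mechanism the platform uses to make the flush atomic.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple

LANES = (
    "local_cache",
    "distributed_cache",
    "routing_tables",
    "marshalled_routes",
    "load_balancer_view",
)


class InvalidationError(Exception):
    """Raised when a lane fails and the operator chose 'stop'."""


def invalidate_on_redeploy(flushers: Dict[str, Callable[[], None]],
                           on_failure: str = "stop") -> Dict[str, str]:
    """Run one flusher per lane in parallel and report the outcome of each lane."""

    def run(lane: str) -> Tuple[str, str]:
        try:
            flushers[lane]()
            return lane, "flushed"
        except Exception as exc:  # a lane failure must not hide the other lanes' results
            return lane, f"failed: {exc}"

    with ThreadPoolExecutor(max_workers=len(LANES)) as pool:
        results = dict(pool.map(run, LANES))

    failed = [lane for lane, status in results.items() if status != "flushed"]
    if failed and on_failure == "stop":
        raise InvalidationError(f"invalidation failed for: {', '.join(failed)}")
    return results
```

With `on_failure="continue"` the caller receives the per-lane report and decides what to do with a failed lane; with `"stop"` the redeploy surfaces the error instead.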

05 · Kubernetes-native cluster

Kubernetes-native cluster — managed or remote, your call.

The distributed tier runs as a real Kubernetes deployment. Apinizer can manage it for you — namespace, replicas, CPU, memory, service, ports — or register an existing cluster you already operate. Either way, the platform talks to the cluster through a service inside the namespace, with split-brain quorum, partition-aware membership, and MongoDB-backed persistence for the counters that need to survive a full restart.

  • Managed mode — Apinizer provisions namespace, deployment, and service
  • Remote mode — register an existing cluster, no provisioning required
  • Replica count, CPU, and memory set on the cache server form
  • ClusterIP by default for in-cluster traffic, NodePort when you need it
  • Hazelcast discovery on Kubernetes — no static peer list to maintain
  • Split-brain protection with a quorum of two and a quorum-aware merge
  • 271 partitions by default, tuned for throughput
  • MongoDB write-behind keeps quota counters across full cluster restarts
Managed · Remote · ClusterIP · NodePort · Split-brain quorum · MongoDB persistence

Figure: A Kubernetes namespace frames a three-pod Hazelcast deployment with one primary and two backups joined by Kubernetes-native discovery, exposed through a ClusterIP service on port 5701; a managed-versus-remote toggle pins the top, a right-side panel lists replica count, CPU and memory, partition count, and split-brain quorum settings, and a MongoDB write-behind card sits below the cluster for quota durability.
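
Two of the numbers above, 271 partitions and a quorum of two, are easier to reason about with a small sketch. The helper names are hypothetical, and `zlib.crc32` is a stand-in hash chosen for this example; it is not the hash function the cluster uses.

```python
import zlib

PARTITION_COUNT = 271   # default partition count named in the text above
MIN_CLUSTER_SIZE = 2    # quorum of two named in the text above


def partition_for(key: str, partition_count: int = PARTITION_COUNT) -> int:
    """Map a cache key to a partition: hash the key, then take it modulo the count."""
    # crc32 is an assumption for this sketch, not the cluster's real hash.
    return zlib.crc32(key.encode("utf-8")) % partition_count


def has_quorum(live_members: int, min_cluster_size: int = MIN_CLUSTER_SIZE) -> bool:
    """A side of a network split may keep serving only if it still sees enough members."""
    return live_members >= min_cluster_size
```

In a three-member cluster that splits two against one, `has_quorum(2)` holds for the larger side and `has_quorum(1)` fails for the isolated member, which is what keeps the isolated member from accepting writes that would later conflict.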

06 · Throttle + quota on the same tier

Throttle and quota on the same cluster as the response cache.

Rate-limit counters belong with the cache, not in a separate Redis. Apinizer puts throttling and quota on the same Hazelcast cluster: second and minute counters stay in memory for speed, while hour, day, and month counters use MongoDB write-behind so they survive a full restart. Increments happen atomically on the cluster, so there is no read-modify-write race and no ghost counts. The gateway emits standard rate-limit headers on every response, success or 429.

  • Throttling counters at second and minute granularity — in-memory only
  • Quota counters at hour, day, and month — MongoDB write-behind backed
  • Atomic increments on the cluster — no read-modify-write race
  • Fixed or sliding window per policy — operator choice on the form
  • Standard rate-limit headers on every response — limit, remaining, reset
  • Identity and type headers so clients can tell which limit they hit
  • Quota counters survive a full cluster restart, so there is no quota reset surprise
  • Throttle policy and cache policy live on the same proxy settings screen
Second · minute · Hour · day · month · Atomic increment · Fixed · sliding · RateLimit headers

Figure: Two stacked rate-control rows: throttling at the top with second and minute buckets showing current call counts and progress fills, quota at the bottom with hour, day, and month buckets backed by MongoDB; a right-side response-headers card shows the rate-limit, remaining, reset, identity, and type headers; a footer strip pins the atomic-increment optimization, and a result strip splits a 200 OK call from a 429 throttled call.
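
A fixed-window counter with an atomic increment can be sketched in a few lines. Everything here is illustrative: the class `FixedWindowLimiter` is hypothetical, a process-local lock stands in for the cluster-side atomic increment, and the `RateLimit-*` header names are placeholders modelled on a common convention rather than the gateway's documented header names.

```python
import threading
from typing import Dict, Tuple


class FixedWindowLimiter:
    """Counts calls per identity in fixed windows of `window_seconds`."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self._counts: Dict[Tuple[str, int], int] = {}
        self._lock = threading.Lock()  # stand-in for an atomic increment on the cluster

    def hit(self, identity: str, now: float) -> Tuple[int, Dict[str, str]]:
        """Record one call and return (HTTP status, rate-limit headers)."""
        bucket = int(now // self.window)
        with self._lock:
            # Read and write happen under one lock, so no increment is lost.
            # Old buckets are never purged in this sketch.
            count = self._counts.get((identity, bucket), 0) + 1
            self._counts[(identity, bucket)] = count
        remaining = max(self.limit - count, 0)
        reset_in = (bucket + 1) * self.window - int(now)  # seconds until the window rolls over
        status = 200 if count <= self.limit else 429
        headers = {
            "RateLimit-Limit": str(self.limit),          # placeholder header names
            "RateLimit-Remaining": str(remaining),
            "RateLimit-Reset": str(reset_in),
        }
        return status, headers
```

The headers are produced on both outcomes, matching the "success or 429" behaviour described above. A sliding window would replace the single bucket with a weighted view of the current and previous buckets; it is left out to keep the sketch short.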

07 · Cache dashboard

Operators see the cluster they run.

The Cache Dashboard is one screen, not a Hazelcast console you have to learn. Cluster members sit at the top with host and port. A navigation tree on the left is organized by project, proxy, and proxy method. The keys for the selected scope appear in the middle with their counts and sizes. The cached payload is on the right with creation time, expiry, and hit count, plus a clear button for the entry, the whole scope, or every scope on the cluster. Multiple clusters are supported when you run more than one environment.

  • Cluster members panel — host, port, status — refreshed on demand
  • Scope navigator by project, proxy group, proxy, and proxy method
  • Key list with content count and byte size per entry
  • Cached payload viewer with created time, TTL countdown, hit counter
  • Clear one key, the whole scope, or every entry in the cluster
  • Multi-cluster ready — register more than one cache server, switch tabs
  • Same screen for managed and remote clusters — operations don't change
  • Excel export from the key list — handy for capacity planning and audits
Cluster members · Scope tree · Payload viewer · Clear · one or all · Multi-cluster

Figure: A cache dashboard mockup with three panels: a left scope navigator tree organized by project, proxy, and proxy method; a middle key list showing six cached entries with their sizes and one selected row; a right key-content viewer showing the cached payload with created time, expiry countdown, hit counter, and a clear button. A top bar lists three cluster members with host and port and a refresh action.

In the box

What's included

The capabilities below are part of the standard install — no add-on SKUs and no separate licenses.

Local tier

  • In-pod cache sets for API proxies, routing, policy, and circuit breaker
  • TTL ticker sweeps every minute — no stale-forever entries
  • LRU eviction with a per-entity capacity cap
  • Sub-millisecond hit latency, no network hop
  • Coordinated flush on redeploy — atomic with routing tables
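
The local-tier behaviours listed above (capacity cap with LRU eviction, a periodic TTL sweep, and a flush on redeploy) fit in a short sketch. The class `LocalCacheSet` and its method names are hypothetical; the real in-pod structure is not described on this page.

```python
from collections import OrderedDict
from typing import Optional, Tuple


class LocalCacheSet:
    """One per-entity cache set: LRU eviction, TTL expiry, and a full flush."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries: "OrderedDict[str, Tuple[float, object]]" = OrderedDict()

    def put(self, key: str, value: object, ttl: float, now: float) -> None:
        self._entries[key] = (now + ttl, value)
        self._entries.move_to_end(key)          # newest entries sit at the end
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)   # evict the least recently used entry

    def get(self, key: str, now: float) -> Optional[object]:
        entry = self._entries.get(key)
        if entry is None or entry[0] <= now:
            self._entries.pop(key, None)        # missing or expired
            return None
        self._entries.move_to_end(key)          # a read counts as a use
        return entry[1]

    def sweep(self, now: float) -> int:
        """What a once-a-minute ticker would call: drop every expired entry."""
        expired = [k for k, (expires_at, _) in self._entries.items() if expires_at <= now]
        for key in expired:
            del self._entries[key]
        return len(expired)

    def flush(self) -> None:
        """What a redeploy signal would call: drop everything at once."""
        self._entries.clear()
```

Because `get` also checks expiry, an entry is never served past its TTL even between sweeps; the sweep only reclaims memory for entries nobody asks for again.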

Distributed tier

  • Hazelcast cluster on Kubernetes, managed or remote
  • Twelve cache scopes — responses, identity, throttle, quota, platform state
  • 271 partitions by default, split-brain protection with a quorum of two
  • MongoDB write-behind keeps quota counters across full restarts
  • Dashboard for cluster members, scopes, keys, payloads, and clears

Cache that ships with the gateway

Two tiers. Twelve scopes. One contract.

See how Apinizer's two-tier cache replaces a stack of point caches — response, auth, rate-limit, and platform state on the same Kubernetes-native cluster.