Engineering · Jul 9, 2025 · 6 min read

Hot deployment without dropping a request

What it takes for a gateway to accept a new proxy, a new policy, or a new route — while traffic is in flight. The runtime contract, not the marketing line.

Mehmet Karaca

Platform Engineer

"Hot deployment" is one of those phrases every API gateway puts on the product page. The interesting question isn't whether the marketing line exists — it's what the runtime contract behind it actually guarantees.

This post walks through what Apinizer's Workers do when a new proxy deploys, what they don't do, and what that means for traffic in flight.

The setup

Apinizer separates the control plane from the data plane.

Manager holds the source of truth — proxies, policies, routes, credentials, identity, AI configuration. It's the system you log into to change things.
Worker is the data plane. It accepts traffic, runs the policy pipeline, calls the upstream, returns the response. There are usually multiple Workers in active-active.

A change in the Manager — deploy this proxy, update this route, rotate this credential — has to reach every Worker. The interesting part is how it reaches them and what happens to in-flight requests during the transition.

What "hot deployment" actually means

We're going to be specific. Hot deployment means three properties hold when a configuration change applies:

Zero dropped requests. Every request that started under the old configuration completes under the old configuration.
No process restart. The Worker JVM keeps running. There is no pod recreation, no JIT warm-up loss, no socket re-establishment.
No client retry. The change is not visible to consumers. The gateway doesn't return 503 for a moment while it reloads.

If a platform doesn't have all three of these, "hot deployment" is doing some marketing work. We've seen plenty where it means "you can push a new config, and most of the time it works."

How the change propagates

When a Manager-side change applies, three things happen:

The change writes to the source of truth. MongoDB stores the updated entity. The audit aspect captures the change with the actor and timestamp. The framework refuses the write if either the permission check or the audit can't fire.

A version-vector notification fans out. Every Worker subscribes to configuration change events. The notification carries the entity ID and the new version. Workers don't get the payload — they pull it on demand.

Each Worker hot-loads the new version. The Worker fetches the new configuration from the Manager (or from its local cache, if it's already warm), validates it, and atomically swaps the in-memory pointer for that proxy's runtime config.

Manager:        change written  → fan-out notification
Worker A:                       ↳ pull new config → validate → atomic swap
Worker B:                       ↳ pull new config → validate → atomic swap
Worker N:                       ↳ pull new config → validate → atomic swap

The fan-out is fast — single-digit seconds across a cluster — but it's not synchronous. Workers don't wait for each other. Each one swaps when ready.
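
To make the Worker side of that flow concrete, here is a minimal sketch of the notify, pull, validate, swap sequence. It is an illustration under stated assumptions, not Apinizer's code: the Worker is a JVM process, while this sketch is Python, and `RuntimeConfig`, `Worker`, `on_notification`, the `fetch` callback, and the validation rule are all hypothetical names and shapes. On the JVM, the single-reference swap would more likely be an `AtomicReference` or a volatile field.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class RuntimeConfig:
    """Immutable snapshot of one proxy's configuration (hypothetical shape)."""
    proxy_id: str
    version: int
    upstream_url: str
    timeout_ms: int


class Worker:
    def __init__(self, fetch: Callable[[str, int], RuntimeConfig]):
        self._fetch = fetch  # stands in for "pull the payload from the Manager"
        self._configs: Dict[str, RuntimeConfig] = {}
        self._lock = threading.Lock()  # serializes writers only

    def current(self, proxy_id: str) -> RuntimeConfig:
        # Readers take no lock: they read one reference to an immutable object.
        return self._configs[proxy_id]

    def on_notification(self, proxy_id: str, version: int) -> bool:
        """Handle an (entity ID, version) event. Returns True if a swap happened."""
        with self._lock:
            active = self._configs.get(proxy_id)
            if active is not None and active.version >= version:
                return False  # stale or duplicate notification
            candidate = self._fetch(proxy_id, version)
            if not self._is_valid(candidate):
                return False  # keep serving the old version
            # The swap itself: a single reference assignment.
            self._configs[proxy_id] = candidate
            return True

    @staticmethod
    def _is_valid(cfg: RuntimeConfig) -> bool:
        # Assumed validation rule, for illustration only.
        return cfg.upstream_url.startswith(("http://", "https://")) and cfg.timeout_ms > 0
```

Two design points carry over regardless of language: the notification is cheap because it holds only an ID and a version, and a candidate that fails validation never replaces the version that is currently serving traffic.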

What happens to in-flight requests

The atomic swap is the load-bearing part. A request that arrives one microsecond before the swap runs the old configuration. A request that arrives one microsecond after runs the new one. There is no in-between state where a request runs half-old, half-new.

Specifically:

The policy pipeline is bound to the proxy's runtime config at request acceptance. Once a request enters the pipeline, the pipeline doesn't re-resolve.
Upstream calls use the credentials, timeouts, and target list from the version that was current at acceptance. A credential rotated mid-request doesn't fail the in-flight call.
The audit record carries the configuration version the request ran under. Two requests touching the same proxy can carry different versions in the audit trail — and that's correct.

Behavior under load: requests in flight continue to completion under their pinned version; new requests pick up the new version. No client sees a 503 caused by the swap.
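
The pinning behavior described above can be sketched in a few lines. This is a simplified model, not the Worker's implementation: `PinnedConfig`, `ConfigHolder`, `handle_request`, and the `on_policy` hook are hypothetical, and the hook exists only to simulate a swap landing in the middle of a request.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class PinnedConfig:
    """Immutable runtime config for one proxy (hypothetical shape)."""
    version: int
    credential: str
    policies: Tuple[str, ...]


class ConfigHolder:
    """Holds the active config behind a single reference."""

    def __init__(self, initial: PinnedConfig):
        self._active = initial

    def get(self) -> PinnedConfig:
        return self._active

    def swap(self, new: PinnedConfig) -> None:
        self._active = new  # one reference assignment


def handle_request(
    holder: ConfigHolder,
    audit_log: List[dict],
    on_policy: Optional[Callable[[], None]] = None,
) -> dict:
    cfg = holder.get()  # pinned at acceptance; never re-read below
    applied = []
    for policy in cfg.policies:
        applied.append(policy)
        if on_policy is not None:
            on_policy()  # test hook: lets a swap land mid-request
    # The audit record carries the version the request actually ran under.
    audit_log.append({"config_version": cfg.version})
    return {"version": cfg.version, "credential": cfg.credential, "policies": applied}
```

A swap that lands mid-request changes what the next request sees, not the current one:

```python
v1 = PinnedConfig(1, "key-old", ("auth", "rate-limit"))
v2 = PinnedConfig(2, "key-new", ("auth", "rate-limit", "prompt-firewall"))
holder, audit = ConfigHolder(v1), []

first = handle_request(holder, audit, on_policy=lambda: holder.swap(v2))
second = handle_request(holder, audit)
# first ran entirely under version 1; second ran under version 2
```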

What this doesn't fix

Hot deployment isn't a magic wand. Two failure modes still exist and are worth naming.

Upstream incompatibility. If your new configuration points to a new upstream and that upstream is down, you'll get errors — but those are real errors from a real failure, not from the gateway's swap. The gateway is doing the right thing; the upstream isn't.

Schema-breaking changes. Some changes can't be hot-applied safely: reordering an enum, changing a credential's encryption scheme, or removing a policy field that another policy depends on. The Manager refuses to deploy these as hot. They require a Worker drain-and-restart cycle, which Apinizer can do as a rolling deploy without dropping traffic, but that is a different operation with different guarantees.

The Manager tells you when a change is hot-applicable and when it isn't. You don't have to guess.
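
One way to picture that decision is a classifier that maps the kinds of change in a deployment to an apply mode. The sketch below is purely illustrative: the change-kind strings are made up from the three examples above, and `ApplyMode` and `classify` are hypothetical names, not part of the Manager's API.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class ApplyMode(Enum):
    HOT = "hot"
    ROLLING_RESTART = "rolling-restart"


# Hypothetical change kinds, named after the examples in the text.
SCHEMA_BREAKING = frozenset({
    "enum_reorder",
    "credential_encryption_change",
    "policy_field_removed_with_dependents",
})


def classify(change_kinds: Iterable[str]) -> Tuple[ApplyMode, List[str]]:
    """Return the apply mode plus the change kinds that forced it, if any."""
    breaking = sorted(set(change_kinds) & SCHEMA_BREAKING)
    mode = ApplyMode.ROLLING_RESTART if breaking else ApplyMode.HOT
    return mode, breaking
```

Returning the offending change kinds alongside the mode is what lets an operator see why a deployment was not accepted as hot, instead of guessing.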

What this lets you do in practice

Three patterns this enables in production that platforms without real hot deployment can't support:

Credential rotation. Rotate an upstream credential weekly. Workers pick up the new credential within seconds. Traffic doesn't notice.

Policy hotfixes. A new injection pattern shows up in the wild. You add it to the prompt firewall policy, the change propagates, every Worker enforces it within seconds. No deploy window. No incident bridge.

Per-environment promotion. Apply the same manifest to DEV, then STAGE, then PROD. Each Worker pool hot-loads the change. The audit trail captures who promoted, when, and which environment received the change. Same operation, three environments, no downtime.

# Apply the same manifest across environments — APIops
$ apinizer apiops apply --env dev   ./payments/*.yaml
$ apinizer apiops apply --env stage ./payments/*.yaml
$ apinizer apiops apply --env prod  ./payments/*.yaml

Each apply is idempotent. Each apply is hot. The audit trail tells you which actor promoted which entity to which environment when.
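
As a rough model of what "idempotent" means here, the sketch below fingerprints a manifest and bumps the entity version only when the content actually changes. It is an assumption about one reasonable way to get idempotence, not a description of how the `apinizer` CLI works: `fingerprint`, `Environment`, and the audit record fields are hypothetical, and so is the choice to skip the audit entry on a no-op apply.

```python
import hashlib
import json
from typing import Dict, List, Optional, Tuple


def fingerprint(manifest: dict) -> str:
    """Stable hash of a manifest, independent of key order."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Environment:
    def __init__(self, name: str):
        self.name = name
        # entity_id -> (version, fingerprint of the applied manifest)
        self.entities: Dict[str, Tuple[int, Optional[str]]] = {}
        self.audit: List[dict] = []

    def apply(self, entity_id: str, manifest: dict, actor: str) -> int:
        digest = fingerprint(manifest)
        version, current = self.entities.get(entity_id, (0, None))
        if current == digest:
            return version  # same content: no new version, nothing to propagate
        version += 1
        self.entities[entity_id] = (version, digest)
        self.audit.append(
            {"actor": actor, "entity": entity_id, "env": self.name, "version": version}
        )
        return version
```

Under this model, re-running the same apply against an environment is safe: the second run returns the existing version and triggers no fan-out.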

What we measure

We watch four numbers during hot-deploy validation:

Propagation time — time from Manager write to last Worker swap. Single-digit seconds across a typical 3–5 Worker cluster.
Dropped requests during swap — measured at the load balancer. We want zero. We get zero.
Worker memory churn — a hot swap allocates the new runtime config before discarding the old. We watch GC pause to make sure the Old-Gen pressure stays sane.
Audit completeness — every change should produce an audit record on every Worker that picks it up. Missing audits indicate a Worker fell behind.

These four numbers are observable on the Analytics Engine. They're also the four numbers the operator wants when something feels wrong.
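
Two of these numbers, propagation time and which Workers fell behind, fall out of simple arithmetic over per-Worker swap events. The sketch below assumes each Worker reports the version it swapped to and a timestamp; `propagation_report` and its input shapes are hypothetical and are not the Analytics Engine's data model.

```python
from typing import Dict, List, Optional, Tuple


def propagation_report(
    write_ts: float,
    target_version: int,
    workers: List[str],
    swaps: Dict[str, Tuple[int, float]],
) -> Tuple[Optional[float], List[str]]:
    """Compute propagation time and lagging Workers for one change.

    swaps maps a Worker name to (version it swapped to, swap timestamp in seconds).
    Propagation time is "Manager write to last Worker swap", so it is only
    defined once every Worker has reached the target version.
    """
    lagging = sorted(
        w for w in workers
        if w not in swaps or swaps[w][0] < target_version
    )
    if lagging or not workers:
        return None, lagging
    last_swap = max(swaps[w][1] for w in workers)
    return last_swap - write_ts, lagging
```

For example, with a Manager write at t=100.0 and three Workers swapping at 101.5, 102.25, and 103.0, the propagation time is 3.0 seconds. If one Worker is still on the previous version, the function reports it as lagging and leaves the propagation time undefined.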

Why this is load-bearing

Without real hot deployment, every change is an incident. Teams batch changes into weekly deploys. Deploys go through a change-control board. A typo in a single regex becomes a 9pm Saturday operation. The platform slows down, the team slows down, and the rest of the organization routes around the platform.

With real hot deployment, the gateway becomes safe to change. Iteration on policies, routes, credentials, and AI configurations happens at engineering speed instead of operations speed. The audit trail keeps the change explainable. The active-active topology keeps the cluster available. The result is a platform that doesn't feel like an outage risk.

If you want to see the swap behavior on a live cluster — including the load-balancer view of "dropped requests during swap" — the team is one call away.

