Backpressure & Flow Control

01

Why this matters

A producer emits 100k events/sec. A consumer can process 10k/sec. With no coordination, the consumer's queue grows unboundedly — memory bloats, latency cliffs, eventually the process OOMs. Backpressure is the mechanism by which downstream components signal upstream to slow down. Without it, any fast producer will eventually crash any slow consumer. This is THE problem in streaming systems.

02

The three options when overloaded

Buffer

Queue overflow

Absorb bursts in memory or disk. Fine for short spikes (seconds of headroom). Fails hard when sustained — memory exhausted, latency explodes, process OOMs.

Drop

Load shedding

When queue crosses a threshold, drop new messages. Keeps the consumer healthy. You choose what to drop (oldest, newest, low-priority). Acceptable for telemetry; unacceptable for payments.

Signal upstream

Real backpressure

Tell the producer "stop sending." Producer throttles, or buffers upstream, or drops there. Problem handled at the correct layer. This is what robust systems do.

03

Mechanisms of real backpressure

TCP-level — sockets have send + receive buffers. When receiver's buffer fills, TCP's sliding window shrinks; sender's send() blocks. Happens transparently. Covers "I'm overwhelmed at the transport layer."

HTTP 429 / 503 — server signals "too many requests" or "service unavailable" with Retry-After header. Caller honors (see retry with backoff).

Reactive Streams spec (Java, JS, .NET) — consumer calls request(n) to pull at most n items. Producer can only emit up to the outstanding demand. Built-in to RxJava, Project Reactor, akka-streams, Node.js streams.

Credit-based flow control — consumer grants producer N "credits." Each sent message consumes one credit. Producer stops when credits hit 0; more credits sent as consumer processes. Used by AMQP, gRPC streaming, HTTP/2.

Pull-based queues (Kafka, SQS long-poll) — consumer decides when to fetch. Producer writes to durable queue; consumer reads at its own pace. Queue itself acts as buffer with disk as the overflow floor.

Count-Min Sketch (streaming top-K)

import mmh3

class CountMinSketch:
    def __init__(self, width=10_000, depth=4):
        self.w, self.d = width, depth
        self.grid = [[0] * width for _ in range(depth)]

    def add(self, item, count=1):
        for i in range(self.d):
            self.grid[i][mmh3.hash(item, i) % self.w] += count

    def estimate(self, item):
        # MIN across rows — over-estimate only, never under
        return min(self.grid[i][mmh3.hash(item, i) % self.w]
                   for i in range(self.d))

# Trade: ~N×W×8 bytes for constant-space approx counts at any scale

04

Deep dive — why push without backpressure fails

Classic example: a Node.js stream pipeline.

source.on("data", chunk => destination.write(chunk));
// destination is slow. Chunks accumulate in memory. OOM.

The fix Node provides:

source.pipe(destination);
// .pipe() reads write()'s return value. false = slow; pauses source until
// destination emits "drain". Transparent backpressure, same API.

Same idea everywhere:

RxJava Flowable (backpressure-aware) vs Observable (not).
gRPC server streaming respects client's read() pace.
Kafka consumers poll at their own speed; broker never pushes.

Backpressure-aware APIs look almost identical to eager ones. The difference is correctness under load. Always choose the backpressure-aware variant. Cost: a little more code complexity. Benefit: your pipeline doesn't implode at peak traffic.

05

Backpressure vs rate limiting

Related but different:

Rate limiting = enforce a preset ceiling ("max 1000 RPS"). Protect against abusive clients, fair-share resources.
Backpressure = dynamically signal real-time downstream pressure. Protects the system from itself when legit traffic outpaces capacity.

You want both. Rate limits set the outer envelope; backpressure handles normal variance within it.

06

Real-world

Kafka consumer-pull model

Backpressure by design

Brokers never push. Consumers poll at their own pace. If a consumer falls behind, the lag metric rises but the system doesn't implode.

gRPC streaming + HTTP/2

Window-based flow control

HTTP/2's per-stream window controls how much data a sender can push before receiving ACKs. Native backpressure.

Reactive Streams

request(n) demand

Standardized JVM interface. Producer never emits beyond consumer demand. Project Reactor is the most popular implementation.

AWS Kinesis

Shard limits + throttling

Each shard allows 1 MB/s writes. Exceeding → ProvisionedThroughputExceededException. Clients must slow down or resize.

07

Used in problems

Distributed queue's core design challenge is backpressure between producers and consumers. Distributed logging uses credit-based backpressure to avoid overwhelming the storage tier. Live streaming applies per-viewer backpressure to smooth traffic.

📺