Concept · Reliability

Rate Limiting Algorithms

01

Why this matters

"Limit users to 100 requests/minute" sounds simple. It is not. Which minute? The last 60 seconds? The current calendar minute? Count as they come, or enforce an even drip? Each answer is a different algorithm with different behavior at boundaries, different memory cost, and different failure modes. Fixed window is easy to build and breaks under bursts. Sliding window log is accurate but expensive. Token bucket is the industry default.

Know all four. Interviews ask precisely this.

02

The four algorithms

AlgorithmHow it decidesMemoryBoundary behavior
Fixed windowCount requests in 12:00:00–12:00:59O(1) — one counter2× burst at minute boundary
Sliding window logStore every request timestamp; count in last 60sO(N) — grows with requestsExact
Sliding window counterWeighted blend of current + previous fixed windowsO(1)Approximately exact
Token bucketFill bucket at rate R; each request takes one tokenO(1) — bucket stateAllows burst up to bucket size, smooth beyond
03

Token bucket — the usual choice

A bucket holds up to capacity tokens. Tokens accrue at refill_rate per second. Each request takes 1 token. If the bucket is empty, the request is rejected (429) or queued.

Example: bucket capacity 100, refill 10/sec. Normal state: 100 tokens available. A burst of 50 requests arrives — all served (bucket had enough). Now 50 left; refilling at 10/sec. Sustained 10 RPS works indefinitely. A second burst of 60 requests — only 50 served (capacity); the 51st gets rejected until tokens refill.

This is the sweet spot: allows short bursts, enforces long-term rate, constant memory. Most APIs (Stripe, GitHub, Twitter) use some form of token bucket.

Token bucket rate limiter
import time

class TokenBucket:
    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self, cost=1):
        now = time.monotonic()
        # refill based on elapsed time
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# 100 rps sustained, 500 burst → can handle short spikes without dropping
Token Bucket Over TimeMermaid
flowchart LR R[Refill 10/sec] --> B[(Bucket
max 100)] B --> C{Request
arrives} C -->|tokens >= 1| A[Allow + decrement] C -->|tokens == 0| X[Reject · 429]
04

Fixed window — simple but broken

Count requests in each calendar minute. Reset at the minute boundary. Implementation: INCR key:user42:12:00 with 60s TTL.

Problem: at minute boundaries, a user can send 100 requests at 11:59:59 and another 100 at 12:00:00. 200 requests in 2 seconds despite a "100/min limit." 2× the intended rate in the worst case.

Fix: sliding window counter. Track current and previous fixed window counts. At time 12:00:30 (50% through current window), estimated count = current_count + 0.5 × previous_count. Simple, bounded, close to exact without per-request storage. Cloudflare's rate limiter uses this.

05

Deep dive — distributed rate limiting

A single-node rate limiter with an in-memory counter is easy. Across N servers, you need a shared counter — and every rate-limited request must hit it. Redis is the default.

Atomic Redis primitives make this tractable:

  • Fixed window: INCR window_key + EXPIRE 60. One round-trip per request.
  • Token bucket: Lua script to check + refill + decrement atomically. Ensures no race between two concurrent requests from the same user.
  • Sliding window log: ZADD the timestamp, ZREMRANGEBYSCORE old entries, ZCARD to count. Three ops, atomic via MULTI/EXEC.

Cost per request: 0.5–1ms Redis round-trip. At 10k RPS, that's 10k Redis ops/sec — well within single-node Redis capacity (~100k ops/sec). Shard Redis by user ID for higher scale.

Alternative: approximate local limiting with occasional sync. Each server runs a local counter; gossips to peers every 100ms. Reduces Redis dependency but allows brief over-limits. Good for hot APIs where perfect fairness isn't required.

06

Real-world

Stripe

Token bucket per API key

Returns Retry-After header when limiting. Bursts of 25 requests tolerated; sustained 100/sec.

GitHub API

Fixed window per hour + per minute

5000/hour for authenticated users. 60/hour unauthenticated. Headers expose remaining and reset time.

Cloudflare

Sliding window counter + adaptive limits

Per-IP, per-path, per-custom-rule. Adaptive rate adjusts during traffic spikes to preserve origin.

AWS API Gateway

Token bucket

Per-account, per-method, per-stage quotas. Supports burst + steady-state rates.

07

Used in problems

Rate limiter problem is entirely about these algorithms. Web crawler uses politeness-based rate limiting per domain. Payment gateway rate-limits per merchant API key to avoid abuse.

Next up