03
Token bucket — the usual choice
A bucket holds up to capacity tokens. Tokens accrue at refill_rate per second. Each request takes 1 token. If the bucket is empty, the request is rejected (429) or queued.
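Example: bucket capacity 100, refill 10/sec. In the normal state, 100 tokens are available. A burst of 50 requests arrives and all are served, because the bucket had enough tokens. Now 50 are left, refilling at 10/sec. Sustained traffic of 10 RPS works indefinitely, since it consumes tokens exactly as fast as they refill. If a second burst of 60 requests arrives right away, only 50 are served (the tokens remaining, not the capacity); the 51st onward is rejected until tokens refill.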
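The arithmetic in this example can be checked with a small simulation. The sketch below is illustrative only: `refill` and `serve_burst` are hypothetical helper names, elapsed time is passed in explicitly instead of being read from a clock, and the second burst is assumed to arrive immediately after the first.

```python
def refill(tokens, capacity, rate, elapsed):
    """Return the token count after `elapsed` seconds of refilling, capped at capacity."""
    return min(capacity, tokens + elapsed * rate)


def serve_burst(tokens, n_requests, cost=1):
    """Serve as many of n_requests as the tokens allow; return (served, tokens_left)."""
    served = min(n_requests, int(tokens // cost))
    return served, tokens - served * cost


capacity, rate = 100, 10   # numbers taken from the example above
tokens = capacity          # the bucket starts full

served, tokens = serve_burst(tokens, 50)
print(served, tokens)      # first burst: 50 served, 50 tokens left

served, tokens = serve_burst(tokens, 60)
print(served, tokens)      # second burst with no time elapsed: 50 served, 0 left

tokens = refill(tokens, capacity, rate, elapsed=1.0)
print(tokens)              # one second later: 10.0 tokens available again
```

Passing time in as a parameter keeps the example deterministic; a real limiter would read a monotonic clock instead.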
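This is the sweet spot: it allows short bursts, enforces the long-term rate (over any window of T seconds, at most capacity + refill_rate × T requests get through), and needs only constant memory per bucket. Many production APIs use some form of token bucket; Stripe, GitHub, and Twitter are commonly cited examples.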
Token bucket rate limiter
import time


class TokenBucket:
    """Token bucket: refills at `rate_per_sec`, holds at most `burst` tokens."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = burst  # start with a full bucket
        self.last = time.monotonic()

    def allow(self, cost=1):
        now = time.monotonic()
        # refill based on elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# 100 rps sustained, 500 burst → can handle short spikes without dropping
limiter = TokenBucket(rate_per_sec=100, burst=500)
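When a request is rejected with a 429, the caller usually wants to know how long to wait. One way to derive that from the bucket state is sketched below. `retry_after_seconds` is a hypothetical helper, not part of the class above, and it assumes the same refill model: tokens accrue linearly at `rate` per second.

```python
import math


def retry_after_seconds(tokens, rate, cost=1):
    """Seconds until `cost` tokens are available, given the current token count."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    deficit = cost - tokens
    if deficit <= 0:
        return 0.0  # enough tokens already, no wait needed
    return deficit / rate


# Empty bucket refilling at 10 tokens/sec: one token is back after 0.1 s.
wait = retry_after_seconds(tokens=0.0, rate=10, cost=1)

# The HTTP Retry-After header takes whole seconds, so round up.
header_value = math.ceil(wait)
```

A request whose cost exceeds the bucket capacity can never succeed, however long the caller waits, so a real limiter would likely reject it outright without suggesting a retry.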
Token Bucket Over Time
flowchart LR
    R[Refill 10/sec] --> B[(Bucket<br/>max 100)]
    B --> C{Request<br/>arrives}
    C -->|tokens >= 1| A[Allow + decrement]
    C -->|tokens == 0| X[Reject · 429]