Ticketmaster / StubHub

01

Problem Statement

Design a system like Ticketmaster or StubHub that allows users to browse events, view interactive seat maps, select and temporarily hold seats, and complete purchases. The system must handle extreme demand spikes — when a popular artist's tickets go on sale, millions of users may compete for tens of thousands of seats within seconds.

Unlike most e-commerce systems where inventory is abundant, ticket booking has a unique constraint: every seat is a unique, non-fungible item. Seat A1 in Row 3 is different from A2. This means the system must never oversell: selling the same seat to two people is a catastrophic failure, not just a data inconsistency.

Core question: How do you let thousands of people compete for a limited number of seats without overselling, while keeping the system responsive and fair?

The Two Fundamental Tensions

Availability vs. Correctness

You want the site to stay up under massive load (availability), but you absolutely cannot double-sell a seat (correctness). This is one of the rare systems where you lean toward consistency over availability — a brief "try again" is acceptable, selling two people the same seat is not.

UX vs. Inventory Accuracy

When a user browses the seat map, should they see real-time availability? If yes, you're hammering your inventory service. If no, they'll try to buy seats that are already gone. The solution: eventually consistent reads with strongly consistent writes.

02

Requirements

Functional Requirements

Browse & search events — by artist, venue, date, location, genre
View seat map — interactive map showing available seats with pricing tiers
Select & temporarily hold seats — when a user picks seats, hold them for ~8 minutes so nobody else can grab them
Purchase tickets — complete payment and issue a digital ticket with QR code
Cancel / refund — release seats back to inventory on cancellation
Virtual waiting room — for high-demand events, manage a fair queue before the sale starts

Non-Functional Requirements

Strong consistency on inventory — never oversell a single seat
High availability for reads — seat maps, event browsing should always work
Low latency under extreme concurrency — checkout flow must work when 100K+ users hit it simultaneously
Fairness — first-come-first-served during hot sales, no gaming the queue
Idempotent payments — never double-charge a customer

Out of scope: Secondary market (resale), dynamic pricing, recommendation engine, user authentication (assume existing OAuth/JWT).

03

Scale Estimation

We derive numbers from assumptions — the numbers drive the architecture, not the other way around.

Registered Users: 500M
Events on Sale: ~550/day
Total Tickets Available: 11M/day
Ticket Data Storage: ~2 TB/yr

The Spike That Drives the Design

Major On-Sale Event (Taylor Swift Scale)

50,000 seats, 5–10M users competing. 90% of seats sell in the first 2 minutes. That's ~375 write TPS for purchases — manageable. But the reads are the killer: 500K+ requests/second as millions refresh the seat map.

Derivation

200K events/year × 20K avg seats = 4B tickets/year. At ~500 bytes per ticket record: ~2 TB/year of ticket data. Event metadata, user profiles, and payment records add 5–10×, but still manageable. The read:write ratio during hot sales is ~1000:1 — this extreme ratio shapes the entire architecture. Reads must be cached aggressively; writes must be serialized per seat.
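The derivation above can be checked with a few lines of arithmetic. This is only a sketch of the stated assumptions; the constant names are illustrative and not part of the original design.

```python
# Back-of-envelope numbers from the assumptions stated in the derivation.
EVENTS_PER_YEAR = 200_000
AVG_SEATS_PER_EVENT = 20_000
BYTES_PER_TICKET = 500

tickets_per_year = EVENTS_PER_YEAR * AVG_SEATS_PER_EVENT           # 4 billion
storage_tb_per_year = tickets_per_year * BYTES_PER_TICKET / 1e12   # 2.0 TB
events_per_day = EVENTS_PER_YEAR / 365                             # about 548
tickets_per_day = tickets_per_year / 365                           # about 11 million

# Hot sale: 90% of 50,000 seats sold within the first 2 minutes.
HOT_SALE_SEATS = 50_000
purchase_tps = HOT_SALE_SEATS * 0.90 / 120                         # about 375

print(f"{tickets_per_year:,} tickets/year, {storage_tb_per_year:.1f} TB/year")
print(f"{events_per_day:.0f} events/day, {tickets_per_day / 1e6:.1f}M tickets/day")
print(f"{purchase_tps:.0f} purchase writes/second at peak")
```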

Parameter	Value	Drives...
Entrants (hot sale)	5–10M	Queue capacity, polling load
Seats per event	50K	Total admissions needed
Avg checkout time	~3 min	Batch admission rate
Hold timeout	8 min	Recapture window, queue drain speed
Inventory svc capacity	10K concurrent	Batch size per admission wave
Status poll interval	~5 sec	Queue status endpoint QPS (~1–2M)
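The table's parameters translate into queue sizing with simple arithmetic. The sketch below assumes every waiting user polls exactly once per interval and that the inventory service runs at a steady state, so Little's law (throughput = concurrency / time in system) applies; both are simplifying assumptions, not measurements.

```python
# Queue sizing from the parameter table.
ENTRANTS_LOW, ENTRANTS_HIGH = 5_000_000, 10_000_000
POLL_INTERVAL_S = 5
INVENTORY_CONCURRENCY = 10_000      # users the inventory service can serve at once
AVG_CHECKOUT_S = 3 * 60             # average time a user spends in checkout

# Status endpoint load: one poll per waiting user per interval.
poll_qps_low = ENTRANTS_LOW / POLL_INTERVAL_S      # 1,000,000
poll_qps_high = ENTRANTS_HIGH / POLL_INTERVAL_S    # 2,000,000

# Sustainable admission rate: slots free up as users finish checkout.
admissions_per_second = INVENTORY_CONCURRENCY / AVG_CHECKOUT_S
admissions_per_minute = admissions_per_second * 60

print(f"poll QPS: {poll_qps_low:,.0f} to {poll_qps_high:,.0f}")
print(f"admit about {admissions_per_minute:,.0f} users/minute")
```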

04

API Design

Search Events

GET /api/events?query=taylor+swift&location=dubai&date_from=2025-06-01
Authorization: Bearer {jwt}

Response 200:
{
  "events": [
    {
      "event_id": "evt_abc123",
      "name": "Taylor Swift — Eras Tour",
      "venue": "Dubai Arena",
      "date": "2025-09-15T20:00:00Z",
      "seats_available": 32847,
      "price_range": { "min": 150, "max": 950, "currency": "USD" },
      "queue_enabled": true,
      "sale_starts": "2025-06-01T10:00:00Z"
    }
  ],
  "total": 1,
  "cursor": "..."
}
      

Get Seat Map Availability

GET /api/events/{event_id}/availability
Response 200:
{
  "event_id": "evt_abc123",
  "total_seats": 50000,
  "available": 32847,
  "sections": {
    "SEC-A": { "available": 120, "tier": "platinum", "min_price": 950 },
    "SEC-B": { "available": 380, "tier": "gold", "min_price": 450 },
    "SEC-C": { "available": 1200, "tier": "silver", "min_price": 250 }
  },
  "cached_at": "2025-06-01T10:00:02Z",
  "ttl_seconds": 3
}
// NOTE: This is eventually consistent (2-5s stale). OK for display.
      

Join Waiting Room

POST /api/queue/join
Body: { "event_id": "evt_abc123" }
Response 200: { "queue_token": "qt_xyz789", "status": "waiting" }

GET /api/queue/status?token=qt_xyz789
Response 200:
{
  "position": 847293,
  "now_serving": 450000,
  "estimated_wait": "12 minutes",
  "status": "waiting"   // waiting | admitted | event_sold_out
}
      

Hold Seats (Critical Path)

POST /api/inventory/hold
Headers: X-Admission-Token: {queue_admission_token}
Body: {
  "event_id": "evt_abc123",
  "seats": ["SEC-A_ROW-3_SEAT-15", "SEC-A_ROW-3_SEAT-16"],
  "hold_duration_seconds": 480
}
Response 200:
{
  "hold_id": "hold_abc",
  "seats": ["SEC-A_ROW-3_SEAT-15", "SEC-A_ROW-3_SEAT-16"],
  "expires_at": "2025-06-01T10:08:00Z",
  "total_price": 1900.00
}
Response 409: { "error": "seat_taken", "unavailable": ["SEC-A_ROW-3_SEAT-15"] }
      

Purchase Tickets

POST /api/payments/charge
Headers: Idempotency-Key: {user_id}:{event_id}:{hold_id}
Body: {
  "hold_id": "hold_abc",
  "payment_method_id": "pm_stripe_xyz",
  "total": 1900.00
}
Response 200:
{
  "order_id": "ord_def456",
  "tickets": [
    { "ticket_id": "tkt_001", "seat": "SEC-A_ROW-3_SEAT-15", "qr_code": "..." },
    { "ticket_id": "tkt_002", "seat": "SEC-A_ROW-3_SEAT-16", "qr_code": "..." }
  ],
  "status": "confirmed"
}
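
The Idempotency-Key header is what makes a retried charge safe. Below is a minimal in-memory sketch of that behavior; `IdempotentCharger`, `ChargeResult`, and `charge_fn` are hypothetical names, and a real service would keep the key-to-result mapping in a durable store with a uniqueness guarantee rather than a Python dict.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ChargeResult:
    order_id: str
    status: str


class IdempotentCharger:
    """Replays the stored result when the same idempotency key is seen again."""

    def __init__(self, charge_fn: Callable[[str, float], ChargeResult]):
        self._charge_fn = charge_fn                   # hypothetical call to the payment provider
        self._results: Dict[str, ChargeResult] = {}   # stand-in for a durable store

    def charge(self, idempotency_key: str, payment_method_id: str, total: float) -> ChargeResult:
        previous = self._results.get(idempotency_key)
        if previous is not None:
            return previous                           # retry: do not charge again
        result = self._charge_fn(payment_method_id, total)
        self._results[idempotency_key] = result
        return result


calls = []

def fake_provider(payment_method_id: str, total: float) -> ChargeResult:
    calls.append((payment_method_id, total))
    return ChargeResult(order_id=f"ord_{len(calls)}", status="confirmed")

charger = IdempotentCharger(fake_provider)
key = "user_1:evt_abc123:hold_abc"
first = charger.charge(key, "pm_stripe_xyz", 1900.00)
second = charger.charge(key, "pm_stripe_xyz", 1900.00)   # client retry after a timeout
print(first == second, len(calls))
```

This sketch is not safe under concurrent retries; it only illustrates the replay semantics.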
      

05

High-Level Architecture

Every component exists to serve one step of the user journey: search → view seats → queue → hold → pay → confirm. Services are separated by scaling characteristics — read-heavy event browsing scales independently from write-critical inventory holds.

Component Responsibilities

Component	Role	Scaling Note
CDN	Serves static assets + seat map base images. 90% of traffic never reaches origin.	Edge-cached globally
Load Balancer	Path-based routing: /api/events/* → Event Svc, /api/inventory/* → Inventory Svc, etc.	Ensures hot-path isolation
Queue Service	Virtual waiting room for hot events. Randomized position assignment at on-sale time. Controlled batch admission.	Redis Sorted Set, stateless workers
Event Service	Read-heavy workhorse. Full-text search (Elasticsearch), event details (Redis cache), availability overlay.	50+ instances during hot sales
Inventory Service	Seat state machine: AVAILABLE → HELD → SOLD. Two-layer: Redis SET NX (fast gate) + PostgreSQL (authority).	Sharded by event_id
Payment Service	Decoupled, async. Charges via Stripe/Adyen with idempotency keys. Confirms or releases hold on result.	Independent scaling
Notification Service	Consumes Kafka events. Generates QR tickets, sends emails/push. Fire-and-forget from payment's perspective.	Async, retry from queue

Seat Map: CDN + Lightweight Data Overlay

A 50,000-seat stadium map is a complex visual. Rendering it from scratch per-request would be prohibitively expensive. Instead, the base seat map (venue layout, section boundaries, seat positions) is a pre-rendered static asset served from CDN — think of it as the empty blueprint.

The availability overlay is a lightweight JSON payload fetched separately (~200KB for 50K seats), cacheable for 2–5 seconds. The browser loads the cached image from CDN and overlays colored dots based on the fresh JSON. This turns a heavy rendering problem into a lightweight data-fetch problem.
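
A short-lived cache in front of the availability query is what keeps those reads cheap. The following is a minimal sketch of a TTL cache; the `TTLCache` class and its injectable `clock` are illustrative assumptions, and in practice this role would more likely be played by Redis or a CDN cache header.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Caches loader results per key and reloads them once they are older than the TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]                  # still fresh: skip the inventory query
        value = loader()                     # stale or missing: hit the origin once
        self._entries[key] = (now, value)
        return value


# Usage with a fake clock so the expiry is deterministic.
fake_now = [0.0]
origin_hits = []

def load_availability() -> dict:
    origin_hits.append(1)
    return {"SEC-A": 120, "SEC-B": 380}

cache = TTLCache(ttl_seconds=3, clock=lambda: fake_now[0])
cache.get_or_load("evt_abc123", load_availability)
cache.get_or_load("evt_abc123", load_availability)   # served from cache
fake_now[0] = 3.5
cache.get_or_load("evt_abc123", load_availability)   # TTL elapsed, reloads
print(len(origin_hits))
```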

Request Flow — Step Through

Client→CDN→Event Svc→Queue Svc→Inventory Svc→Redis (NX)→PostgreSQL→Payment Svc→Kafka→Notification


06

Deep Dive — Preventing Double-Selling Under Extreme Concurrency

This is THE interesting problem in this design. When 10,000 users click on the same seat within a second, exactly one must win and 9,999 must be told "seat taken" — instantly, with no race conditions.

The Naive Approach (and Why It Fails)

-- Thread 1 and Thread 2 both run this simultaneously
SELECT status FROM seats WHERE seat_id = 'A1' AND event_id = 'E1';
-- Both see: AVAILABLE  ← race condition window

UPDATE seats SET status = 'HELD', user_id = 'U1' WHERE seat_id = 'A1';
-- Thread 1 wins
UPDATE seats SET status = 'HELD', user_id = 'U2' WHERE seat_id = 'A1';
-- Thread 2 ALSO succeeds → DOUBLE SOLD
      

Two reads happen before either write. Both see "available" and both proceed. This is a classic read-then-write race condition.
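
One way to close that window is to fold the check into the write itself, so the database evaluates the condition and applies the update as a single statement. The sketch below uses SQLite from the Python standard library as a stand-in for PostgreSQL and runs the two attempts sequentially; `try_hold` is a hypothetical helper.

```python
import sqlite3


def try_hold(conn: sqlite3.Connection, event_id: str, seat_id: str, user_id: str) -> bool:
    """Claim the seat only if it is still AVAILABLE; the check and write are one statement."""
    cur = conn.execute(
        "UPDATE seats SET status = 'HELD', user_id = ? "
        "WHERE event_id = ? AND seat_id = ? AND status = 'AVAILABLE'",
        (user_id, event_id, seat_id),
    )
    conn.commit()
    return cur.rowcount == 1     # 0 rows means someone else already holds it


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE seats (event_id TEXT, seat_id TEXT, status TEXT, user_id TEXT, "
    "PRIMARY KEY (event_id, seat_id))"
)
conn.execute("INSERT INTO seats VALUES ('E1', 'A1', 'AVAILABLE', NULL)")
conn.commit()

print(try_hold(conn, "E1", "A1", "U1"))  # True
print(try_hold(conn, "E1", "A1", "U2"))  # False
```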

The Two-Layer Hold Pattern

The solution uses Redis as a fast gate and PostgreSQL as the authority. Redis rejects 99% of contention without ever touching the database.

sequenceDiagram
    participant U as User
    participant IS as Inventory Service
    participant R as Redis
    participant PG as PostgreSQL
    U->>IS: POST /hold (seat A1)
    IS->>R: SET seat:E1:A1 user_123 NX EX 480
    alt Key already exists
        R-->>IS: nil (FAIL)
        IS-->>U: 409 Seat Taken
    else Key set successfully
        R-->>IS: OK
        IS->>PG: UPDATE seats SET status='HELD' WHERE status='AVAILABLE' AND version=N
        alt rows_affected = 1
            PG-->>IS: 1 row updated
            IS-->>U: 200 Hold Confirmed (8 min)
        else rows_affected = 0
            PG-->>IS: 0 rows
            IS->>R: DEL seat:E1:A1
            IS-->>U: 409 Seat Taken
        end
    end

Why SET NX Instead of Redlock?

SET NX — Simple Claim

Single atomic command. 1 network round-trip. The NX flag means "only set if not exists." The EX 480 gives an 8-minute TTL. We're claiming, not locking — the hold itself is the state.

Redlock — Overkill

Typically runs against 5 independent Redis masters, so each acquisition costs 5 requests plus clock-synchronization assumptions. It is designed for mutual exclusion (lock → work → unlock), but we don't have a critical section. It also has known issues with GC pauses and clock drift (see Kleppmann's critique).

What if Redis Succeeds but DB Fails?

The Redis SET NX succeeds, but the PostgreSQL UPDATE fails (timeout, crash, disk full). Now Redis thinks the seat is held, but the DB thinks it's available — split-brain state.

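from enum import Enum

class HoldResult(Enum):
    SUCCESS = "success"
    SEAT_TAKEN = "seat_taken"
    RETRY = "retry"

# Assumptions: `redis` is an async Redis client (for example redis.asyncio.Redis)
# and `db.execute` is a hypothetical async helper returning the affected row count.
async def hold_seat(event_id, seat_id, user_id):
    redis_key = f"seat:{event_id}:{seat_id}"

    # Phase 1: fast gate. SET NX succeeds only if nobody has claimed the key yet.
    acquired = await redis.set(redis_key, user_id, nx=True, ex=480)
    if not acquired:
        return HoldResult.SEAT_TAKEN

    # Phase 2: authoritative write, conditional on the seat still being AVAILABLE.
    try:
        rows = await db.execute("""
            UPDATE seats SET status = 'HELD', user_id = %s,
                   held_until = NOW() + INTERVAL '8 min',
                   version = version + 1
            WHERE seat_id = %s AND event_id = %s
              AND status = 'AVAILABLE'
        """, [user_id, seat_id, event_id])

        if rows == 0:
            await redis.delete(redis_key)  # DB disagrees: clean up the claim
            return HoldResult.SEAT_TAKEN
        return HoldResult.SUCCESS

    except Exception:
        await redis.delete(redis_key)  # Roll back the Redis claim
        return HoldResult.RETRY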

Defense in Depth — 4 Safety Layers

The Hierarchy of Truth

Layer 1 — PostgreSQL is the source of truth (survives restarts, is ACID).
Layer 2 — Redis is the performance optimization (fast filter, may be stale).
Layer 3 — TTL is the self-healing mechanism (bounds duration of any inconsistency to 8 min).
Layer 4 — Reconciliation job is the safety net (scans every 30s, catches anything TTL hasn't fixed).
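
The reconciliation pass in Layer 4 can be sketched as a pure function that compares the two stores and lists repairs. Everything here is an assumption for illustration: the `SeatRow` shape, the action names, and the rule that an expired `held_until` releases a hold. The source only states that the job scans every 30 seconds and treats PostgreSQL as the authority.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class SeatRow(NamedTuple):
    status: str                    # "AVAILABLE", "HELD", or "SOLD"
    user_id: Optional[str]
    held_until: Optional[float]    # epoch seconds; None unless HELD


def find_repairs(redis_holds: Dict[str, str],
                 db_seats: Dict[str, SeatRow],
                 now: float) -> List[Tuple[str, str]]:
    """Return (action, seat_key) pairs, treating the database as the authority."""
    repairs = []
    for seat_key in redis_holds:
        row = db_seats.get(seat_key)
        # Redis says held, DB says available: the Redis claim is an orphan.
        if row is not None and row.status == "AVAILABLE":
            repairs.append(("delete_redis_key", seat_key))
    for seat_key, row in db_seats.items():
        # A DB hold past its deadline should return to inventory.
        if row.status == "HELD" and row.held_until is not None and row.held_until <= now:
            repairs.append(("release_db_hold", seat_key))
    return repairs


db = {
    "E1:A1": SeatRow("AVAILABLE", None, None),
    "E1:A2": SeatRow("HELD", "u2", 100.0),
    "E1:A3": SeatRow("HELD", "u3", 500.0),
}
redis_view = {"E1:A1": "u1", "E1:A3": "u3"}
print(find_repairs(redis_view, db, now=200.0))
```

A real job would also need a grace period so it does not delete a Redis claim whose database write is still in flight.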

"Best Available" — FOR UPDATE SKIP LOCKED

Many users don't pick specific seats — they request "2 best available in Section B." PostgreSQL's FOR UPDATE SKIP LOCKED is perfect:

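-- Run inside a transaction and UPDATE the returned rows to HELD before COMMIT,
-- because the row locks are released when the transaction ends.
SELECT seat_id FROM seats
WHERE event_id = ? AND status = 'AVAILABLE' AND price_tier = ?
ORDER BY row_number ASC, seat_position ASC  -- front-center is "best"
LIMIT ?
FOR UPDATE SKIP LOCKED;  -- Skip rows locked by other transactions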

If 10 people request "best available" simultaneously, they each get different seats without blocking each other. No deadlocks, no waiting.

General Admission — Atomic Counter

For events without reserved seating, per-seat locking is unnecessary. Instead, use a single Redis DECR:

remaining = DECR event:E1:remaining
if remaining >= 0:
    # Purchase succeeds — write to DB async
else:
    INCR event:E1:remaining  # Roll back
    # Sold out
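
The decrement-then-roll-back logic can be exercised locally with an in-memory stand-in for the Redis key. `AtomicCounter` below is a hypothetical substitute that uses a lock to mimic the atomicity Redis gives DECR and INCR; it is a sketch of the pattern, not a Redis client.

```python
import threading


class AtomicCounter:
    """In-memory stand-in for a Redis integer key with atomic DECR/INCR."""

    def __init__(self, value: int):
        self._value = value
        self._lock = threading.Lock()

    def decr(self) -> int:
        with self._lock:
            self._value -= 1
            return self._value

    def incr(self) -> int:
        with self._lock:
            self._value += 1
            return self._value

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def try_buy(remaining: AtomicCounter) -> bool:
    if remaining.decr() >= 0:
        return True          # got a ticket; the DB write can follow asynchronously
    remaining.incr()         # went below zero: undo the decrement
    return False             # sold out


counter = AtomicCounter(10)
results = []
threads = [threading.Thread(target=lambda: results.append(try_buy(counter))) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(results), counter.value)  # 10 0
```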
      

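07

Key Design Decisions & Tradeoffs

1. Consistency Model: Strong Writes + Eventual Reads vs. Fully Real-Time Reads
2. Queue Activation: Conditional Queue (per-event flag) vs. Always-On Queue
3. Hold Duration: 8-Minute Hold with TTL vs. 15-Minute Hold
4. Data Store for Inventory: PostgreSQL (ACID) vs. DynamoDB / NoSQL
5. Queue Position Assignment: Random Shuffle at On-Sale Time vs. FIFO (First-Come-First-Served)

FIFO (First-Come-First-Served)

FIFO feels fair intuitively, but in practice it rewards people with faster internet, refresh-spamming, and bot scripts. It is not actually fair, just fast. With a random shuffle at on-sale time, late arrivals (after on-sale) still join in FIFO order as a tail.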
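
The alternative to pure FIFO is the randomized draw used by the Queue Service: everyone who joins before on-sale gets a random position, and later arrivals queue behind them in arrival order. The sketch below is an in-memory illustration only; `WaitingRoom` and its methods are hypothetical, and it ignores duplicate joins and persistence.

```python
import random
from typing import List, Optional


class WaitingRoom:
    """Random positions for pre-sale joiners, FIFO tail for late arrivals."""

    def __init__(self, seed: Optional[int] = None):
        self._rng = random.Random(seed)
        self._presale: List[str] = []    # joined before on-sale; arrival order is ignored
        self._line: List[str] = []       # final admission order
        self._on_sale = False

    def join(self, user_id: str) -> None:
        if self._on_sale:
            self._line.append(user_id)   # late arrival: appended to the tail
        else:
            self._presale.append(user_id)

    def start_sale(self) -> None:
        self._rng.shuffle(self._presale) # arriving early gives no speed advantage
        self._line = self._presale + self._line
        self._presale = []
        self._on_sale = True

    def admit_batch(self, size: int) -> List[str]:
        batch, self._line = self._line[:size], self._line[size:]
        return batch


room = WaitingRoom(seed=7)
for user in ["a", "b", "c", "d"]:
    room.join(user)
room.start_sale()
room.join("late1")
room.join("late2")
first_batch = room.admit_batch(4)
second_batch = room.admit_batch(10)
print(sorted(first_batch), second_batch)
```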

08

What Can Go Wrong

🔴 Payment Service Goes Down

Hold is already in place, so the seat is safe. The system retries payment within the hold window. If the hold expires before payment succeeds, the seat is released. User must re-select. Mitigation: hold is the safety net — worst case is lost sale, never a double-sell.

🔴 Redis Goes Down

Fall back to PostgreSQL optimistic locking only. Higher latency (5ms vs. <1ms) but still correct. This is why PostgreSQL is the source of truth, not Redis. Feature-flag the Redis layer so it degrades gracefully.

🔴 Hold Expires During Payment Processing

The scariest edge case. User submits payment at minute 7, hold expires at minute 8, payment completes at minute 8:15 — but the seat was released and re-held by someone else. Solution: the confirm_purchase call does an atomic WHERE check — if the seat is no longer held by this user, the payment is refunded immediately.
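
The atomic WHERE check can be sketched as a conditional UPDATE whose row count decides between confirming and refunding. SQLite stands in for PostgreSQL here, and `confirm_purchase` with its return values is a hypothetical shape rather than the system's actual API.

```python
import sqlite3


def confirm_purchase(conn: sqlite3.Connection, event_id: str, seat_id: str, user_id: str) -> str:
    """Mark the seat SOLD only if this user still holds it; otherwise signal a refund."""
    cur = conn.execute(
        "UPDATE seats SET status = 'SOLD' "
        "WHERE event_id = ? AND seat_id = ? AND status = 'HELD' AND user_id = ?",
        (event_id, seat_id, user_id),
    )
    conn.commit()
    return "confirmed" if cur.rowcount == 1 else "refund"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE seats (event_id TEXT, seat_id TEXT, status TEXT, user_id TEXT, "
    "PRIMARY KEY (event_id, seat_id))"
)
# U1's hold expired and the seat was re-held by U2 before U1's payment completed.
conn.execute("INSERT INTO seats VALUES ('E1', 'A1', 'HELD', 'U2')")
conn.commit()

print(confirm_purchase(conn, "E1", "A1", "U1"))  # refund
print(confirm_purchase(conn, "E1", "A1", "U2"))  # confirmed
```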

🔴 Bot Attacks

Bots try to grab hundreds of seats using multiple accounts. Defenses: CAPTCHA at queue entry, device fingerprinting, rate limiting on token generation (one per IP per event), random queue position assignment (bots can't gain speed advantage), second CAPTCHA at admission.

🔴 Hot Partition

A single popular event means all requests hit the same DB shard. Redis absorbs most contention (99% of rejections happen at the SET NX layer). The DB only sees successful holds (~50K writes over 15 minutes). Shard by event_id so other events are unaffected.
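
Routing by event_id can be as simple as a stable hash modulo the shard count. This is a minimal sketch under that assumption; `shard_for_event` is a hypothetical helper, and a production system might prefer consistent hashing or a lookup table so shards can be rebalanced.

```python
import hashlib


def shard_for_event(event_id: str, num_shards: int) -> int:
    """Map an event to a shard with a hash that is stable across processes."""
    # Python's built-in hash() is randomized per process for strings, so use hashlib.
    digest = hashlib.sha256(event_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# All seats of one event land on the same shard; other events spread out.
print(shard_for_event("evt_abc123", 16) == shard_for_event("evt_abc123", 16))  # True
```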

🔴 Split-Brain: Redis Says Held, DB Says Available

Redis SET NX succeeds, DB write fails. The rollback logic immediately DELs the Redis key. If even the DEL fails, the 8-minute TTL self-heals — the key auto-expires and the seat becomes available again. Background reconciliation job catches stragglers every 30 seconds.

⚠ Anti-patterns

🚫 Optimistic concurrency on seat row

100K people all try to grab seat A12 simultaneously → 99,999 retries.

✓ Better: Pessimistic lock + queue-based waiting room; users enter sequentially.

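🚫 Treating cached seat availability as authoritative

Even a TTL of a few seconds means 10K users can see the same seat as available.

✓ Better: Use the short-TTL cache for display only and validate every hold on the atomic write path; push SSE/WebSocket updates where fresher data matters; cache static (event, venue) data freely.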

🚫 One monolithic DB transaction from reserve → payment

The transaction holds seat locks for minutes while the user types card info.

✓ Better: Two-phase: soft hold (8-minute TTL), then payment; explicit release on timeout.

09

Interview Tips

💡

Lead with the constraint, not the components.
"The core challenge here is preventing double-selling under extreme concurrency. Let me design around that." This immediately shows you understand what makes this problem unique — it's not a generic CRUD app.

⚡

Clarify the seating model early.
Ask: "Are we designing for reserved seating (pick your seat) or general admission?" This fundamentally changes the concurrency approach — reserved needs per-seat locking (SET NX), GA needs an atomic counter (DECR).

🎯

The virtual queue is your scaling secret weapon.
Proactively say "we need to protect downstream systems by controlling the admission rate." Interviewers love this — it shows you think about production realities, not just component diagrams.

🧠

Don't forget: the happy path is boring.
The interesting design is in edge cases: hold expiration during payment, Redis/DB inconsistency, queue fairness under bot attacks. Volunteer these — "let me talk about what happens when things go wrong."

🔑

Know the magic words: SET NX, FOR UPDATE SKIP LOCKED, idempotency key.
These three techniques — Redis atomic set-if-not-exists, PostgreSQL row-level skip-locking, and payment idempotency — are the concrete implementation details that turn a hand-wavy design into a credible one.

📊

State the read:write ratio.
"During a hot sale, reads outnumber writes ~1000:1. This means I can serve reads from a cache with 2–5 second staleness, while writes go through an atomic path." This single sentence justifies the entire caching strategy.

10

Evolution

How this design grows from a single-server prototype to a planet-scale ticketing platform.

1

MVP — Single Server, Small Venues

Single PostgreSQL database with optimistic locking (version column). No Redis, no queue. Direct seat selection → payment. Works for venues up to ~5,000 seats where concurrent demand is manageable. Simple, correct, and easy to reason about.

2

Growth — Redis + Virtual Queue

Add Redis as a fast gate for seat holds (SET NX). Introduce the virtual waiting room for events with expected demand > 10× seat count. Add read replicas for seat map queries. Elasticsearch for event search. CDN for static assets and seat map images. Handles events up to ~50,000 seats.

3

Scale — Sharding + Global Reach

Shard PostgreSQL by event_id so hot events don't affect others. Async payment processing with idempotency keys. Multi-region deployment (CDN edge + regional API servers). Kafka event bus for decoupled notification and analytics. Dynamic queue batch sizing based on real-time inventory service load. Handles multiple simultaneous hot sales globally.

4

Platform — Secondary Market + Dynamic Pricing

Add verified resale marketplace (separate service layer on top of primary). Dynamic pricing service that adjusts prices based on demand signals from the queue. Mobile-first ticket delivery with NFC/Apple Wallet. Analytics pipeline for venue operators. Fraud detection ML for bot prevention. Transfer and gifting capabilities.
