System Design — 06

LinkedIn MutualConnection Search

Design a system that finds shared connections between any two users on a billion-user professional network — in under 50ms, at 100K+ queries per second.

Graph IntersectionBloom FiltersRedisProbabilistic Data StructuresSet Operations
01

Problem Statement

Given any two users on a professional social network like LinkedIn, efficiently find and display their shared (mutual) connections. It sounds simple — a set intersection. User A's connections ∩ User B's connections. You could write it in one line of SQL.

But at LinkedIn's scale, that one line becomes one of the hardest problems in social networking infrastructure. Before the page finishes rendering, the system has already answered: "You and this person share 47 connections — here are their faces." That query ran in under 50ms, against a graph with over 1 billion nodes and 250 billion edges.

Mutual connections don't just appear on profiles. They show up on search result cards (10–20 per page), in "People You May Know" feeds, in connection request screens, in messaging previews. A single user session might trigger 50–100 mutual connection lookups without the user even thinking about it.

Core question: How do you make set intersection feel instant at billion-user scale, when the underlying graph is constantly changing and every page view demands multiple intersection queries?

The Four Sub-Problems

The Intersection Problem

Computing the overlap between two potentially large sets (power users have 30,000+ connections) without scanning everything at query time.

The Freshness Problem

Connections change constantly (~5M new connections/day). Stale mutual counts erode user trust in the product.

The Fan-out Problem

A single page load can require mutual computation against dozens of profiles simultaneously — search results, PYMK, feed cards.

The Asymmetry Problem

The viewer's connection list is reused across all lookups in a session, but each target profile's list is different. This reuse is the key optimization.

02

Requirements

Functional Requirements

  • FR1 — Get Mutual Connections: Given viewer + target, return the list of shared connections with profile data, ranked by relevance
  • FR2 — Get Mutual Count: Lightweight count-only query for bulk surfaces (search cards, PYMK tiles)
  • FR3 — Paginated Retrieval: Cursor-based pagination over potentially hundreds of mutual connections
  • FR4 — Real-Time-ish Updates: New connections reflected in mutual counts within 30 seconds
  • FR5 — Bulk Mutual Resolution: Batch API for 10–50 profiles in a single request, reusing viewer's connection list

Non-Functional Requirements

  • NFR1 — Latency: sub-50ms p99 for count, sub-100ms for full list
  • NFR2 — Throughput: ~120K–140K peak QPS (80% count-only, 20% full list)
  • NFR3 — Consistency: Eventual, with read-your-own-writes for the active user
  • NFR4 — Availability: 99.99% — graceful degradation (show profile without mutuals vs. fail)
  • NFR5 — Storage: ~5TB adjacency data for 1B users, must fit in memory for hot path
  • NFR6 — Scalability: Horizontal — adding users solved by adding machines

Out of Scope

2nd-degree connections (friends-of-friends), PYMK ranking algorithm, connection suggestions explanations, privacy controls, and blocking/restricting.

03

Scale Estimation

The numbers here directly force the architecture. By the end, we'll know whether precomputation is feasible and whether adjacency lists fit in memory.

1B
Total Users
60M
Daily Active Users
~500
Avg Connections/User
250B
Total Graph Edges
~140K
Peak QPS
5M/day
New Connections
~1.2 TB
Active User Adj. Data
1018
Possible Pairs (impossible)

Key Derivation

60M DAU × 50 lookups/session = 3B lookups/day = ~35K avg QPS. Peak 3–4× = ~140K QPS. Split: 110K count-only + 28K full-list. Precomputing all pairs: 1018 entries — impossible. Even friend-of-friend pairs: 250 trillion. Query-time intersection is mandatory.

Critical conclusion: Adjacency lists must live in memory (200K+ list fetches/sec would crush disk). Redis cluster of ~20–40 nodes for active users is non-negotiable.

04

API Design

Three public endpoints reflecting the three access patterns. Viewer identity is implicit (auth token) — you can only ask "what are MY mutuals with target", preventing graph scraping.

GET — Full Mutual List (Profile View)
GET /v1/connections/mutual?target_user_id={id}&cursor={cursor}&limit=10&sort=relevance

Response: {
  "mutual_connections": [
    { "user_id": "u_83a2f1", "name": "Priya Sharma",
      "headline": "Staff Engineer at Google",
      "profile_photo_url": "https://...",
      "mutual_context": { "shared_company": "Google" } }
  ],
  "total_count": 47,
  "next_cursor": "eyJsYX...",
  "has_more": true
}
GET — Count Only (Search Cards, PYMK)
GET /v1/connections/mutual/count?target_user_id={id}

Response: {
  "target_user_id": "u_83a2f1",
  "mutual_count": 47,
  "sample_mutuals": [                    // 2–3 faces for social proof
    { "user_id": "u_7cd312", "name": "Priya Sharma", "profile_photo_url": "..." }
  ]
}
POST — Batch Counts (Search Results, Feed)
POST /v1/connections/mutual/batch
Body: { "target_user_ids": ["u_001", ..., "u_020"], "include_samples": true }

// Viewer's adjacency list loaded ONCE — intersected against all 20 targets
// Bloom filter pre-check eliminates ~70% of full list loads
Response: {
  "results": [
    { "target_user_id": "u_001", "mutual_count": 47, "sample_mutuals": [...] },
    { "target_user_id": "u_002", "mutual_count": 0, "sample_mutuals": [] }
  ]
}
Internal Event — Connection Changed
Topic: connection_events (Kafka)
{ "event_type": "connection_created",
  "user_a_id": "u_83a2f1", "user_b_id": "u_44e891",
  "timestamp": "2025-06-15T14:32:01Z" }

→ Invalidate adj + bloom cache for both users
→ Proactively warm cache for both users
05

High-Level Architecture

The read path is optimized around two insights: the viewer's adjacency list is reused across batch queries, and most profile pairs share zero mutuals (skippable via Bloom filter).

Client Web / Mobile API Gateway Auth · Rate Limit Mutual Connection Service Stateless · Batch Intersection Engine Bloom Filters Redis · ~185 GB Adjacency Cache Redis · ~1.2 TB Profile Service Name · Photo Kafka Connection Events Connection Svc Create · Remove Cache Invalidator Invalidate + Warm Graph DB PostgreSQL Sharded Relevance Ranker Score · Sort HTTPS Batch MGET blooms MGET adj lists Enrich samples Events Consume DEL + Warm cache miss
Request Flow — Step Through
ClientAPI GatewayMutual ServiceRedis (Bloom)Bloom CheckRedis (Adj)IntersectProfile SvcResponse
Click Next Step to walk through the request flow.
06

Deep Dive — Bloom Filter Pre-check & Set Intersection

The most impactful optimization isn't a faster intersection algorithm — it's avoiding the intersection entirely for the ~70% of profile pairs that share zero mutuals.

The Adaptive Intersection Engine

The system picks the best algorithm at runtime based on set sizes:

ScenarioAlgorithmWhyTime
Zero mutuals (70%)Bloom pre-checkSkip intersection entirely~0.0001ms
Small × Small (300×300)Sorted mergeSequential access, cache-friendly~0.002ms
Small × Large (300×25K)Galloping searchO(|S| × log|L|) beats O(S+L)~0.01ms
Large × Large (25K×25K)Roaring bitmap ANDSIMD bitwise ops, 3× faster~0.05ms
Batch (1 viewer × 20 targets)Hash-set probeBuild viewer set once, probe N times~0.02ms total

Bloom Filter — The Zero-Mutual Fast Path

A Bloom filter is a probabilistic data structure that answers "is X possibly in this set?" If it says no, the element is definitely not present. If it says yes, the element is probably present (with a tunable false positive rate).

The Compound FPR Trap

With 1% per-element FPR and 500 viewer IDs probed against a target bloom, the chance of at least one false positive is 1 − 0.99⁵⁰⁰ = 99.3%. The bloom is nearly useless as a yes/no gate! Solution: count hits vs. expected false positives. If hits ≤ expected_FP × 1.5, declare zero mutuals. This gives ~90% skip rate on true-zero pairs.

Statistical Threshold Approach
bloom_check(viewer_list, target_bloom, fpr=0.01):
    hits = sum(1 for id in viewer_list if target_bloom.might_contain(id))
    expected_fp = len(viewer_list) × fpr        // e.g. 500 × 0.01 = 5
    
    if hits ≤ expected_fp × 1.5:                // ≤ 7.5 → likely zero mutuals
        return ZERO_MUTUALS                     // Skip full intersection
    else:
        return NEEDS_INTERSECTION               // Load full list, intersect

Bloom Filter Sizing

ConnectionsFPRFilter SizeHash Functionsvs Full List
2001%240 bytes715× smaller
5001%598 bytes76.7× smaller
5,0001%5.9 KB76.7× smaller
25,0001%29.3 KB76.7× smaller

Batch Pipeline with Bloom Pre-check

sequenceDiagram participant C as Client participant M as Mutual Service participant RB as Redis (Bloom) participant RA as Redis (Adj) participant P as Profile Service C->>M: POST /batch (20 targets) M->>RA: GET viewer adjacency list RA-->>M: 500 IDs (4KB) M->>RB: MGET 20 target bloom filters RB-->>M: 20 blooms (~12KB total) Note over M: Probe viewer IDs against
each bloom. 14 return ZERO.
6 return NEEDS_INTERSECTION. M->>RA: MGET 6 target adj lists RA-->>M: 6 lists (~24KB) Note over M: Hash-set intersect viewer
set against 6 targets. M->>P: Enrich 12 sample mutuals P-->>M: Names + photos M-->>C: 20 results (14 zero, 6 with counts)

Impact: Bloom Saves 60% of Redis Operations

Without bloom: 21 Redis calls (1 viewer + 20 target lists), 84KB transferred. With bloom: 8 Redis calls (1 viewer + 1 MGET blooms + 6 selected lists), 40KB transferred. At 100K+ batch QPS, that's hundreds of thousands of saved Redis calls per second.

07

Key Design Decisions & Tradeoffs

1. Query-Time Intersection vs. Precomputed Tables

✓ Chosen

Query-Time Intersection

Pull both sorted lists from Redis, intersect in-memory. No precomputed state to maintain when graph changes. Sub-50ms with cached lists.

✗ Alternative

Precomputed Mutual Table

Trillions of possible pairs. Each new connection invalidates up to 250K entries. 5M connections/day → 1.25 trillion recomputations/day. Impossible.

2. Serialized Sorted Arrays vs. Redis ZSET

✓ Chosen

Sorted Byte Arrays

500 IDs × 8 bytes = 4KB per user. 310M active users = 1.2TB. Dense, cache-friendly, always read as full list.

✗ Alternative

Redis ZSET

~70 bytes overhead per entry → 35KB per user → 10.8TB total. 9× more memory for data structures we never query individually.

3. Application-Layer vs. Redis-Side Intersection

✓ Chosen

Application Layer (Mutual Service)

Enables viewer-list reuse across batch. Supports adaptive algorithm selection. No cross-shard constraints.

✗ Alternative

Redis SINTER

Both sets must be on same shard — violated 99% of time with user-ID sharding. Can't exploit batch viewer reuse.

4. Bloom Filter: Statistical Threshold vs. Ultra-Low FPR

✓ Chosen

Statistical Threshold (1% FPR)

598-byte filters, 7 hash functions. Count hits vs. expected FP. ~90% skip rate. 185GB total storage.

✗ Alternative

Ultra-Low FPR (0.01%)

1.2KB filters, 13 hash functions. 95% skip rate. 370GB storage. Marginal improvement doesn't justify doubled cost.

5. Split Storage vs. Combined Keys

✓ Chosen

Separate Keys (adj + bloom)

Batch pipeline fetches 20 blooms (12KB) first, then selectively fetches ~6 adj lists (24KB). 70% of the time we never load the adj list.

✗ Alternative

Combined Key

Pulls adjacency list every time — even for zero-mutual cases. Wastes 308 MB/s bandwidth at scale.

6. Exact vs. Approximate Counts

✓ Chosen

Exact for Display, MinHash for Ranking

"47 mutual connections" must be exact — it's social proof. PYMK pipeline uses MinHash (~9% error) to score 5,000 candidates cheaply, exact only for top 200.

✗ Alternative

Approximate Everywhere

Users notice wrong counts. "~50" undermines trust. Exact intersection is fast enough (sub-50ms) for display surfaces.

08

What Can Go Wrong

Redis Shard Failure

One shard down → 5% of single queries affected, but 40% of batch requests (probability at least 1 of 20 targets is on dead shard). Mitigation: Replica promotion in 5–15s, DB fallback with rate limiting, client renders profiles without mutuals for affected targets.

Kafka Consumer Lag — Stale Mutual Counts

30-minute lag → 208K users with stale adjacency lists (0.35% of DAU). Most won't notice, but "just connected" users will. Mitigation: 1-hour TTL safety net bounds staleness. Freshness tokens for read-your-own-writes bypass. Consumer lag alerting.

Bloom Filter Corruption

If bloom becomes a subset of reality (missing bits), it returns false "zero mutuals" — the dangerous case. Mitigation: Atomic rebuild (adj + bloom together via Redis pipeline). Bloom TTL shorter than adj TTL (45min vs 60min). FP-rate monitoring detects corrupted filters.

Power-User Thundering Herd

Influencer's 240KB adjacency list TTL expires. 500 requests simultaneously miss cache, all hit Graph DB. Mitigation: Singleflight (request coalescing) — 500 requests collapse into 1 DB read. VIP cache warming pre-refreshes before TTL expiry. Stale-while-revalidate.

Graph DB Overload from Cache Miss Storm

Redis eviction event → thousands of cache misses → DB receives 50K QPS (sized for 1K). Mitigation: Rate-limit DB fallback at 5K QPS. Tiered degradation: return nulls for uncached targets. Token-bucket controlled cache backfill.

Celebrity Hot-Key in Redis

Viral CEO profile → 1M reads/sec on single Redis shard (capacity: 100K). Latency spikes for all keys on that shard. Mitigation: Local LRU cache (30s TTL) on service instances. Hot-key detection → auto-replicate key to 8 shards. Redis read replicas.

Event Pipeline Data Loss

Kafka event lost → connection exists in DB but cache never invalidated. Mitigation: Transactional outbox pattern (event written in same DB transaction as connection). Hourly reconciliation job samples 10K users, rebuilds mismatched caches.

09

Interview Tips

💡
Lead with the combinatorial impossibility.
Within 2 minutes, before drawing any box: "Precomputing all pairs is 10¹⁸ entries. Each new connection invalidates thousands. So query-time intersection is the only option — the entire design follows from that constraint."
Name the three access patterns unprompted.
Single profile view (100ms budget), bulk count for search cards (50ms, batch of 20), PYMK ranking pipeline (seconds, thousands of candidates). Each needs a different optimization. When you say "the viewer's list is reused across all 20," you've revealed the key insight.
🎯
Don't say "just use a graph database."
Neo4j can express the query elegantly but can't hit 100K QPS at sub-50ms. It's the persistence layer, not the query engine. The hot path runs through Redis with app-layer intersection.
🔢
Know the Bloom filter compound FPR trap.
1% per-element FPR × 500 probes = 99.3% compound FP rate. The bloom is useless as a yes/no gate. Bring up the statistical threshold approach proactively — if the interviewer catches this before you do, it's a red flag.
📊
Derive scale from product behavior, not thin air.
"60M DAU × 50 lookups/session × 3–4× peak = ~140K QPS." Trace the logic so they can verify. Split count vs. list — shows you understand the different performance profiles.
🏗️
Show the single-machine baseline first.
"1M users × 500 connections × 8 bytes = 4GB. Fits in one machine. Sorted merge takes 5μs. We distribute for storage (5TB), not CPU." Demonstrates you add complexity only when data forces it.
🧪
Prepare for "now add 2nd-degree connections."
"That's union of 500 connection lists (250K candidates), not intersection of 2. Can't be real-time — needs batch precomputation. Our mutual system handles the real-time layer; 2nd-degree is a batch layer on top."
11

Evolution

How this design grows from MVP to planet-scale. Each transition is triggered by a specific bottleneck — not arbitrary growth.

1

MVP — SQL JOIN, No Caching (<500K users)

Single PostgreSQL database. SELECT friend_id FROM connections c1 JOIN connections c2 ON c1.friend_id = c2.friend_id. Works in <10ms for small lists. Build and ship in a day. Trigger: P95 latency exceeds 100ms as connection density grows.

2

Cache Layer — Redis + Batch API (500K–10M users)

Redis caches adjacency lists. App-layer intersection. Batch endpoint (viewer-list reuse). Direct cache invalidation on connection changes. 2–3 Redis nodes. Trigger: DB fallback overload from growing cache-miss traffic.

3

Events + Bloom Filters (10M–100M users)

Kafka decouples invalidation. Bloom filters skip 70% of list loads. Adaptive intersection algorithm. Profile Service for enrichment. Trigger: Power-user latency spikes, Graph DB needs sharding.

4

Sharding + Power-User Optimizations (100M–500M users)

Sharded Graph DB (5–10 primaries). VIP cache warming. Singleflight for thundering herds. Roaring bitmaps for large-large intersections. Local LRU cache for hot keys. Trigger: PYMK pipeline cost, multi-region latency requirements.

5

Planet Scale — Multi-Region + Approximate Methods (500M–1B+ users)

Regional Redis clusters (US, Europe, APAC, India). MinHash signatures for PYMK scoring. Tiered storage (hot/warm/cold). Community-based graph partitioning. CSR in-memory graph index for hottest users. You're LinkedIn.

Next up