Problem Statement
Given any two users on a professional social network like LinkedIn, efficiently find and display their shared (mutual) connections. It sounds simple — a set intersection. User A's connections ∩ User B's connections. You could write it in one line of SQL.
But at LinkedIn's scale, that one line becomes one of the hardest problems in social networking infrastructure. Before the page finishes rendering, the system has already answered: "You and this person share 47 connections — here are their faces." That query ran in under 50ms, against a graph with over 1 billion nodes and 250 billion edges.
Mutual connections don't just appear on profiles. They show up on search result cards (10–20 per page), in "People You May Know" feeds, in connection request screens, in messaging previews. A single user session might trigger 50–100 mutual connection lookups without the user even thinking about it.
Core question: How do you make set intersection feel instant at billion-user scale, when the underlying graph is constantly changing and every page view demands multiple intersection queries?
The Four Sub-Problems
The Intersection Problem
Computing the overlap between two potentially large sets (power users have 30,000+ connections) without scanning everything at query time.
The Freshness Problem
Connections change constantly (~5M new connections/day). Stale mutual counts erode user trust in the product.
The Fan-out Problem
A single page load can require mutual computation against dozens of profiles simultaneously — search results, PYMK, feed cards.
The Asymmetry Problem
The viewer's connection list is reused across all lookups in a session, but each target profile's list is different. This reuse is the key optimization.