Kafka Internals

01

Why this matters

Kafka is the backbone of event streaming at many large companies: around 500k messages/sec per broker, retention configurable from days to years, and replayable history. In interviews, "we'll use Kafka" isn't an answer; how Kafka works is. Understand partitions, consumer groups, offsets, ISR, and leader election, and you can reason about throughput, ordering, delivery, and failover without hand-waving.

02

The core abstraction — a distributed log

A Kafka topic is a partitioned, distributed, replicated commit log. Producers append; consumers read by offset; both scale horizontally.

Topic — named stream of messages (e.g. orders).
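Partition — one topic splits into N partitions. Each partition is an append-only log, stored on disk as a sequence of segment files. Messages within a partition are strictly ordered; there is no ordering guarantee across partitions.
Offset — a monotonically increasing integer that identifies a message's position within its partition. A consumer reads from an offset and commits its progress.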
Broker — a Kafka server. A cluster has many; each hosts some partitions.
Replication — each partition replicated to N brokers (e.g., RF=3). One is leader; others are followers.
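
The terms above can be made concrete with a toy in-memory model. This is only a sketch: the `Partition` and `Topic` classes are hypothetical names for illustration, a Python list stands in for the on-disk log, and replication is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Append-only list standing in for one partition's log."""
    records: list = field(default_factory=list)

    def append(self, value):
        # A record's offset is simply its position in the log.
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset, max_records=100):
        # Reading never removes data, so any consumer can replay from any offset.
        return self.records[offset:offset + max_records]


@dataclass
class Topic:
    name: str
    partitions: list

    @classmethod
    def create(cls, name, num_partitions):
        return cls(name, [Partition() for _ in range(num_partitions)])


orders = Topic.create("orders", num_partitions=3)
first = orders.partitions[0].append("order-1 created")
second = orders.partitions[0].append("order-1 paid")
other = orders.partitions[1].append("order-2 created")

print(first, second, other)           # 0 1 0
print(orders.partitions[0].read(1))   # ['order-1 paid']
```

Note that `other` is 0 even though it was written third: offsets are scoped to a partition, which is why ordering only holds within one.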

03

The write path

Producer wants to send message M with key K to topic T:

Partition selection. For a keyed message, the producer computes hash(K) mod num_partitions(T). Same key → same partition → order preserved per key, as long as the partition count does not change.
Batching. Producer batches messages by partition (up to linger.ms or batch.size). Sent as one network call.
Leader write. Leader broker for the partition appends to its log file, returns an offset.
Replication. Followers pull from the leader and append to their own logs. Once every replica in the ISR (in-sync replica set) has the message, it is committed and becomes visible to consumers.
Producer ACK modes. acks=0 (don't wait: fastest, can lose), acks=1 (leader ack only: can lose on leader crash), acks=all (all ISR: durable but slower; set min.insync.replicas to at least 2 so the ISR cannot shrink to the leader alone).
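
The write path above can be sketched in a few lines. This is a simplified model under stated assumptions, not Kafka's implementation: `choose_partition` and `ReplicatedPartition` are hypothetical names, `zlib.crc32` stands in for the murmur2 hash used by Kafka's Java client, and the ISR is fixed rather than shrinking when a follower lags.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    # Stand-in hash: crc32 is stable across runs, unlike Python's built-in
    # hash() on bytes. Kafka's Java client uses murmur2 instead.
    return zlib.crc32(key) % num_partitions


class ReplicatedPartition:
    """One leader log plus follower logs, all held in memory."""

    def __init__(self, replica_ids, leader_id):
        self.logs = {rid: [] for rid in replica_ids}
        self.leader_id = leader_id
        self.isr = set(replica_ids)  # assumption: every replica starts in sync

    def leader_append(self, record):
        log = self.logs[self.leader_id]
        log.append(record)
        return len(log) - 1  # offset assigned by the leader

    def follower_fetch(self, follower_id):
        # Followers pull: copy whatever the leader has that they lack.
        leader_log = self.logs[self.leader_id]
        follower_log = self.logs[follower_id]
        follower_log.extend(leader_log[len(follower_log):])

    def high_watermark(self):
        # Offsets below this value exist on every ISR member, i.e. are committed.
        return min(len(self.logs[rid]) for rid in self.isr)


partition_id = choose_partition(b"user-42", 6)

p = ReplicatedPartition(replica_ids=[1, 2, 3], leader_id=1)
offset = p.leader_append(b"order-1 created")
hw_before = p.high_watermark()  # 0: followers have not fetched yet
p.follower_fetch(2)
p.follower_fetch(3)
hw_after = p.high_watermark()   # 1: the record at offset 0 is now committed
```

In this model, acks=1 corresponds to replying to the producer right after `leader_append`, while acks=all corresponds to replying only once `high_watermark()` has moved past the record's offset.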

Top-K with heap + CMS

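import heapq
import mmh3  # third-party MurmurHash3 bindings: pip install mmh3


class CountMinSketch:
    """Minimal stand-in so the snippet runs; assumed API is add() and estimate()."""
    def __init__(self, width, depth):
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _cols(self, item):
        # One MurmurHash3 per row, using the row index as the seed.
        return [mmh3.hash(str(item), seed) % self.width for seed in range(len(self.rows))]

    def add(self, item):
        for row, col in zip(self.rows, self._cols(item)):
            row[col] += 1

    def estimate(self, item):
        return min(row[col] for row, col in zip(self.rows, self._cols(item)))


class TopK:
    """Track top K heavy hitters in a stream using CMS + min-heap."""
    def __init__(self, k=10, cms_width=10**4, cms_depth=4):
        self.k = k
        self.cms = CountMinSketch(cms_width, cms_depth)
        self.heap = []  # min-heap of (count, item)
        self.in_heap = set()

    def add(self, item):
        self.cms.add(item)
        count = self.cms.estimate(item)
        if item in self.in_heap:
            # Drop the stale entry, restore the heap invariant, then re-insert.
            self.heap = [(c, i) for c, i in self.heap if i != item]
            heapq.heapify(self.heap)
            heapq.heappush(self.heap, (count, item))
        elif len(self.heap) < self.k:
            heapq.heappush(self.heap, (count, item))
            self.in_heap.add(item)
        elif count > self.heap[0][0]:
            # Evict the current minimum in favor of the new, larger item.
            _, removed = heapq.heapreplace(self.heap, (count, item))
            self.in_heap.remove(removed)
            self.in_heap.add(item)

    def top(self):
        # Largest estimated count first.
        return sorted(self.heap, reverse=True)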

~500k msgs/sec per broker
1 KB typical message size
100s–1000s partitions per topic (heavy-use)
RF=3 standard replication factor
7 days default retention

[Figure: partition layout with RF=3]
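
Taking the figures above at face value, a quick back-of-envelope calculation shows what they imply for bandwidth and storage. The assumptions are mine, not the source's: "1 KB" is read as 1,000 bytes, the peak rate is sustained for the whole retention window, and compression is ignored, so treat the result as an upper bound.

```python
MSGS_PER_SEC = 500_000
MSG_BYTES = 1_000            # "1 KB", taken as 1,000 bytes
REPLICATION_FACTOR = 3
RETENTION_DAYS = 7
SECONDS_PER_DAY = 86_400

# Bytes arriving at the leader per second.
ingest_bytes_per_sec = MSGS_PER_SEC * MSG_BYTES

# Bytes kept on disk for one copy of the data over the retention window.
retained_bytes = ingest_bytes_per_sec * SECONDS_PER_DAY * RETENTION_DAYS

# Every byte is stored once per replica.
cluster_bytes = retained_bytes * REPLICATION_FACTOR

print(f"ingest: {ingest_bytes_per_sec / 1e6:.0f} MB/s")            # ingest: 500 MB/s
print(f"retained, one copy: {retained_bytes / 1e12:.1f} TB")       # retained, one copy: 302.4 TB
print(f"retained, RF=3: {cluster_bytes / 1e12:.1f} TB")            # retained, RF=3: 907.2 TB
```

The storage number is the useful takeaway: peak throughput and default retention multiply quickly, so real clusters spread partitions over many brokers and rarely run each one at peak for a full week.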

04

Consumer groups

A consumer group is a set of consumers sharing the workload. Kafka's coordinator assigns each partition to exactly one consumer in the group.

10 partitions, 10 consumers in a group → each consumer owns 1 partition.
10 partitions, 5 consumers → each owns 2.
10 partitions, 15 consumers → 5 consumers idle (partitions are the unit of parallelism).
Different groups are independent. Group analytics and group billing both read every message — each group has its own offsets. Pub/sub for free.
Rebalance. When a consumer joins or leaves the group, partitions are re-divided. Causes a brief pause; the main operational pain.

Partition count = maximum parallelism per consumer group. Too few partitions = workers idle. Too many = rebalance cost + broker overhead. 100–1000 partitions per heavily-used topic is typical.
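
The three partition/consumer ratios above can be checked with a small assignment function. This is a plain round-robin sketch with hypothetical names (`assign`, `idle`); Kafka ships several assignment strategies, and none of them is reproduced exactly here.

```python
def assign(num_partitions, consumers):
    """Round-robin partitions over consumers; returns {consumer: [partitions]}."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


def idle(assignment):
    """Consumers that ended up owning no partition."""
    return [c for c, parts in assignment.items() if not parts]


ten = assign(10, [f"c{i}" for i in range(10)])
five = assign(10, [f"c{i}" for i in range(5)])
fifteen = assign(10, [f"c{i}" for i in range(15)])

print(five["c0"])            # [0, 5]
print(len(idle(fifteen)))    # 5
```

Each partition appears in exactly one consumer's list, which is the invariant that gives per-partition ordering inside a group. A rebalance amounts to calling `assign` again with the new membership.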

05

Deep dive — why Kafka is fast

Kafka's throughput comes from four design choices, each exploiting the OS rather than fighting it:

1. Sequential disk I/O. Appending to the end of a log file is the cheapest write pattern for both spinning disks and SSDs. Sequential throughput reaches hundreds of MB/s, while random I/O can be orders of magnitude slower, especially on spinning disks. Kafka only appends.

2. Page cache as buffer. Writes go to the OS page cache; reads are served from it. Linux flushes to disk in the background. Kafka doesn't maintain its own in-process cache; it trusts the OS to do that. This is counter-intuitive, since most databases manage their own buffer pool instead.

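3. Zero-copy sends. When a consumer fetches, Kafka uses sendfile() to transfer bytes from the page cache directly to the socket, bypassing user space. The broker never copies the data into its own buffers and never re-serializes it, because the on-disk format is the same as the wire format. (With TLS enabled the broker has to encrypt in user space, so this path does not apply.)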

4. Batching everywhere. Producers batch. Brokers batch replication. Consumers batch fetches. Per-message overhead amortizes toward zero at high throughput.

Combined: a Kafka broker on commodity hardware can handle on the order of 500k msgs/sec, far more than a design that issues a random write per message. No magic, just alignment with what the kernel is already good at.
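
To make the zero-copy point concrete, here is a minimal sketch using Python's `socket.sendfile()`, which calls `os.sendfile()` where the platform supports it and falls back to ordinary `send()` otherwise. It is an illustration of the system call, not Kafka's code (the broker is written in Java and reaches sendfile through `FileChannel.transferTo`). The helper name `send_file_range`, the temp file standing in for a log segment, and the socketpair standing in for a consumer connection are all assumptions for the demo.

```python
import os
import socket
import tempfile


def send_file_range(sock, path, offset=0, count=None):
    """Send file bytes starting at `offset` over a connected stream socket."""
    with open(path, "rb") as f:
        # The kernel moves data from the page cache to the socket; the bytes
        # are never read into a Python buffer on the sendfile path.
        return sock.sendfile(f, offset=offset, count=count)


# Demo: a temp file plays the log segment, a socketpair plays the connection.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"msg0msg1msg2msg3")
    segment_path = tmp.name

broker_side, consumer_side = socket.socketpair()
sent = send_file_range(broker_side, segment_path, offset=8)
received = consumer_side.recv(64)

broker_side.close()
consumer_side.close()
os.remove(segment_path)

print(sent, received)
```

The `offset` argument mirrors how a fetch works: the consumer asks for data from a position in the log, and the broker hands the kernel a file and a byte range rather than a buffer of messages.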

The interview answer

"Partitioned append-only log. Producers hash-partition; messages within a partition are strictly ordered. Replication via pull (followers fetch from leader). Consumer groups share partitions one-per-consumer. Throughput from sequential I/O + page cache + zero-copy sendfile."

06

Real-world

LinkedIn

Originator

Built Kafka for activity streams. Now runs 1T+ messages/day across thousands of brokers.

Uber

Message bus

Trip events, driver locations, pricing updates — all on Kafka. Trillion messages per day.

Netflix

Event backbone

All user interaction events, 8M+ events/sec at peak. Powers recommendation, A/B testing, operational metrics.

Confluent Cloud

Managed Kafka

A common choice for teams that want to avoid the ops burden. Same API, SLA-backed, auto-scaling.

07

Used in problems

News feed publishes post-events to Kafka for fan-out processing. Notification system uses Kafka per-channel topics. Distributed logging ingests via Kafka before landing in storage. Distributed queue patterns model themselves on Kafka.
