Message Queue vs Pub/Sub

01

Why this matters

Two teams both say "we use a queue." One means point-to-point work distribution (one message, one consumer takes it, runs a job). The other means broadcast (one event, N subscribers each process it). These have different correctness properties, different failure modes, different products. Confusing them has wrecked many systems.

In interviews: "would you use SQS or Kafka for this?" is really asking if you understand the difference.

02

The two shapes

Message queue (point-to-point)

One consumer wins each message

Producers enqueue, N consumers compete. Each message is handed to one consumer at a time, so each job is normally processed once. The queue is a work distribution mechanism. Examples: SQS, RabbitMQ (classic queues), Redis BLPOP. Used for: job queues, email sending, image processing.
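
As a rough illustration of the competing-consumers shape, here is a minimal sketch using Python's standard asyncio.Queue. The names worker and run_workers and the job strings are hypothetical, chosen only for this example; a real broker adds persistence and ACKs on top of this idea.

```python
import asyncio

async def worker(name, queue, handled):
    # Workers compete: each get() hands one job to exactly one caller.
    while True:
        job = await queue.get()
        handled.append((name, job))
        queue.task_done()
        await asyncio.sleep(0)  # yield so the other workers get a turn

async def run_workers(jobs, n_workers=3):
    queue = asyncio.Queue()
    handled = []
    for job in jobs:
        queue.put_nowait(job)
    workers = [
        asyncio.create_task(worker(f"w{i}", queue, handled))
        for i in range(n_workers)
    ]
    await queue.join()  # returns once every job has been marked done
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return handled

handled = asyncio.run(run_workers(["a.png", "b.png", "c.png", "d.png"]))
```

Whichever worker picks up a given job, no job appears twice in handled, which is the property that distinguishes a queue from a broadcast.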

Pub/Sub (broadcast)

Every subscriber gets every message

Publisher emits, all N subscribers receive. Each subscriber processes independently. Pub/Sub is a fan-out mechanism. Kafka (topic + consumer groups), SNS, Redis pub/sub, NATS. Used for: event streaming, cache invalidation, audit logs.

The confusing middle

Kafka is technically pub/sub, but a "consumer group" acts like a queue (each partition consumed by one member of the group). So Kafka does both — which is why it's everywhere. RabbitMQ also does both via exchange types. The difference is per-usage, not per-product.
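
The consumer-group behavior described above can be sketched as a partition assignment. This is a simplified round-robin stand-in, not Kafka's actual assignor logic (which is configurable and more involved); assign_partitions and the group and member names are hypothetical.

```python
def assign_partitions(partitions, members):
    """Round-robin the partitions over the members of one consumer group."""
    assignment = {m: [] for m in members}
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment

partitions = [0, 1, 2, 3]
groups = {
    "billing": ["billing-1", "billing-2"],   # two members share the load
    "analytics": ["analytics-1"],            # a separate group sees everything
}

# Every group is assigned all partitions; inside a group each partition
# belongs to exactly one member.
plan = {g: assign_partitions(partitions, ms) for g, ms in groups.items()}
```

Within billing the topic behaves like a queue (each partition has one owner), while billing and analytics together behave like pub/sub (both groups cover every partition).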

03

Queue semantics

Durability. Messages persist on disk until ACKed. If a consumer crashes, the message is redelivered.
Visibility timeout. When a consumer picks up a message, it becomes invisible to others for N seconds. If the consumer ACKs in time, the message is deleted; if not, it is re-queued. This prevents duplicate work unless the consumer crashes or takes longer than the timeout.
At-least-once delivery. On retries, a message may be delivered more than once. Consumers must be idempotent.
FIFO vs unordered. SQS has both: standard queues give best-effort ordering with higher throughput; FIFO queues give strict order within a message group at lower throughput. RabbitMQ is FIFO within a queue, though redelivered messages can arrive out of order.
Dead letter queue. Messages that fail N times move to a DLQ for inspection. Critical for production.
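
To make the visibility timeout, redelivery, and DLQ rules concrete, here is a small in-memory sketch. SketchQueue, max_receives, and the explicit now argument (used instead of a real clock so the example is deterministic) are assumptions for illustration and do not mirror any particular product's API.

```python
import itertools
from collections import deque

class SketchQueue:
    """In-memory queue with a visibility timeout and a dead letter queue."""

    def __init__(self, visibility_timeout=30.0, max_receives=3):
        self.visibility_timeout = visibility_timeout
        self.max_receives = max_receives
        self.ready = deque()   # (msg_id, body) waiting for a consumer
        self.in_flight = {}    # msg_id -> (body, deadline)
        self.receives = {}     # msg_id -> number of deliveries so far
        self.dlq = []          # bodies that exhausted their attempts
        self._ids = itertools.count(1)

    def send(self, body):
        msg_id = next(self._ids)
        self.ready.append((msg_id, body))
        self.receives[msg_id] = 0
        return msg_id

    def _expire(self, now):
        # Messages not ACKed by their deadline become visible again,
        # or move to the DLQ once they have been delivered too often.
        for msg_id, (body, deadline) in list(self.in_flight.items()):
            if now >= deadline:
                del self.in_flight[msg_id]
                if self.receives[msg_id] >= self.max_receives:
                    self.dlq.append(body)
                else:
                    self.ready.append((msg_id, body))

    def receive(self, now):
        self._expire(now)
        if not self.ready:
            return None
        msg_id, body = self.ready.popleft()
        self.receives[msg_id] += 1
        self.in_flight[msg_id] = (body, now + self.visibility_timeout)
        return msg_id, body

    def ack(self, msg_id):
        # ACK deletes the message; an ACK after expiry is ignored.
        return self.in_flight.pop(msg_id, None) is not None

q = SketchQueue(visibility_timeout=30, max_receives=2)
q.send("resize:cat.png")
first = q.receive(now=0)    # delivered, hidden until t=30
hidden = q.receive(now=10)  # nothing visible: the message is in flight
second = q.receive(now=31)  # never ACKed, so it is redelivered
third = q.receive(now=62)   # attempts exhausted: moved to the DLQ
```

The second delivery of the same message is exactly the at-least-once case that makes idempotent consumers necessary.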

Simple in-memory pub/sub with backpressure

import asyncio
from collections import defaultdict

class PubSub:
    def __init__(self, buffer_size=1000):
        self.buffer_size = buffer_size     # max unread messages per subscriber
        self.channels = defaultdict(list)  # topic -> [asyncio.Queue], one per subscriber

    def subscribe(self, topic):
        # Each subscriber gets its own bounded queue for this topic.
        q = asyncio.Queue(maxsize=self.buffer_size)
        self.channels[topic].append(q)
        return q

    async def publish(self, topic, msg):
        dead = []
        for q in self.channels[topic]:
            try:
                q.put_nowait(msg)    # never block the publisher
            except asyncio.QueueFull:
                dead.append(q)       # slow consumer: drop the message and disconnect it
        for q in dead:
            self.channels[topic].remove(q)

# Redis pub/sub follows this shape over a network (still fire-and-forget,
# nothing is persisted); Kafka adds durability and partitioned, cluster-wide
# routing. The in-memory version is the mental model.

04

Pub/Sub semantics

Topics. Producers publish to a topic name. Consumers subscribe by topic.
Retention. Kafka retains messages by time or size (7 days by default). Subscribers can replay from any retained offset. SNS and most other pub/sub systems deliver and forget.
Consumer groups (Kafka). Members of the same group split a topic's partitions among themselves, sharing the load. Two different groups each get all messages: the fan-out property.
Ordering. Guaranteed within a partition (Kafka) or within one subscriber's queue (RabbitMQ topic exchange). Not guaranteed across partitions.
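
The per-partition ordering guarantee is easiest to see with keyed messages. The sketch below routes each event to a partition by hashing its key, so all events for one key stay in order. zlib.crc32 is a stand-in chosen because it is deterministic; Kafka's default partitioner uses a different hash. partition_for, produce, history, and the order keys are hypothetical names.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3

def partition_for(key, num_partitions=NUM_PARTITIONS):
    # Same key -> same partition, so per-key order is preserved.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def produce(events):
    partitions = defaultdict(list)  # partition number -> ordered records
    for key, value in events:
        partitions[partition_for(key)].append((key, value))
    return partitions

events = [
    ("order-1", "placed"),
    ("order-2", "placed"),
    ("order-1", "paid"),
    ("order-2", "cancelled"),
    ("order-1", "shipped"),
]
partitions = produce(events)

def history(key):
    # Reading one partition front to back yields that key's events in order.
    return [v for k, v in partitions[partition_for(key)] if k == key]
```

Events for different keys may land in different partitions, and nothing orders those partitions relative to each other, which is why cross-partition ordering is not guaranteed.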

05

When to reach for each

Use case	Shape	Product
Background jobs (thumbnails)	Queue	SQS, RabbitMQ, Sidekiq
Event streaming, audit log	Pub/Sub	Kafka
Email dispatch	Queue	SQS, RabbitMQ
Cache invalidation fan-out	Pub/Sub	Redis Pub/Sub, NATS
Multiple services react to an order	Pub/Sub	Kafka (per-consumer-group semantics)
Low-latency chat delivery	Pub/Sub	NATS, Redis Pub/Sub
Payment processing	Queue + idempotency keys	SQS FIFO, Kafka compacted
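
The idempotency keys in the payment row can be sketched as a consumer that remembers which keys it has already handled. PaymentProcessor and the message fields are hypothetical; a real system would store the key and perform the charge in the same database transaction rather than in memory.

```python
class PaymentProcessor:
    """Consumer that stays correct when the queue redelivers a message."""

    def __init__(self):
        self.results = {}  # idempotency_key -> result of the first attempt
        self.charges = []  # side effects actually performed

    def handle(self, message):
        key = message["idempotency_key"]
        if key in self.results:
            # Duplicate delivery: return the earlier result, charge nothing.
            return self.results[key]
        self.charges.append((message["account"], message["amount_cents"]))
        result = {"status": "charged", "charge_no": len(self.charges)}
        self.results[key] = result
        return result

processor = PaymentProcessor()
msg = {"idempotency_key": "pay-42", "account": "acct-7", "amount_cents": 1999}
first = processor.handle(msg)
retry = processor.handle(msg)  # same message delivered again
```

Under at-least-once delivery the retry is expected, so the consumer, not the queue, is what prevents the double charge.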

06

Deep dive — why Kafka ate the world

Kafka unified queue and pub/sub by storing messages in a distributed commit log. Every message appended to a partition file, retained by time. Consumers read by offset — they can rewind, replay, or skip.
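
A minimal sketch of the commit-log idea, assuming a single partition held in memory: records are only appended, and each consumer group keeps its own read position. CommitLog, poll, and seek are illustrative names, and committing the offset inside poll is a simplification of how real clients commit.

```python
class CommitLog:
    """Single-partition append-only log with per-group offsets."""

    def __init__(self):
        self.records = []    # position in the list is the offset
        self.committed = {}  # consumer group -> next offset to read

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def poll(self, group, max_records=10):
        start = self.committed.get(group, 0)
        batch = self.records[start:start + max_records]
        self.committed[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        # Rewind or skip: reading never deletes anything from the log.
        self.committed[group] = offset

log = CommitLog()
for event in ["order placed", "order paid", "order shipped"]:
    log.append(event)

billing_first = log.poll("billing")      # reads all three records
analytics_first = log.poll("analytics")  # independent group, same records
log.seek("billing", 1)                   # rewind billing to offset 1
billing_replay = log.poll("billing")     # re-reads the last two records
```

Because consuming only moves a pointer, adding a group or replaying old data costs the log nothing, which a delete-on-ACK queue cannot offer.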

Four properties follow from this design:

Replay. A new service can consume messages from 7 days ago. Classic queues delete on ACK — no replay possible.
Multiple independent consumers. Add a second analytics pipeline; it reads from the same topic without affecting the first.
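High throughput. Sequential append to disk is fast. A single Kafka broker can handle on the order of 500k small messages/sec, though the real figure depends on message size, batching, and replication. Partitioning scales horizontally.
Exactly-once semantics (with the transactional API). Consumers track offsets, and Kafka can commit consumed offsets and produced messages atomically. The guarantee covers read-process-write pipelines inside Kafka; side effects in external systems still need idempotency.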

The cost: Kafka is operationally heavy. You run ZooKeeper (or KRaft), brokers, topic and partition planning, and consumer lag monitoring. Don't reach for Kafka if SQS fits: SQS is fully managed and billed per request. Reach for Kafka when you need replay, high throughput, or many independent consumers of the same stream.
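
Consumer lag, one of the things you end up monitoring, is just the gap between the newest offset in each partition and the offset a group has committed. A small sketch with made-up numbers follows; consumer_lag is a hypothetical helper, and treating a partition with no committed offset as starting from 0 is an assumption.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: records written but not yet consumed by a group."""
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }

log_end = {0: 1200, 1: 950, 2: 400}  # newest offset per partition
committed = {0: 1200, 1: 900}        # the group has not read partition 2 yet

lag = consumer_lag(log_end, committed)
total_lag = sum(lag.values())
```

A lag that keeps growing means the group consumes more slowly than producers write, which is usually the first signal to add consumers or partitions.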

07

Real-world

Stripe

SQS for jobs, Kafka for events

Webhook delivery uses SQS (point-to-point retry). Event audit log uses Kafka (long retention, many consumers).

Uber

Kafka for everything streaming

Billions of events/day — trip updates, driver locations, analytics. Kafka's throughput and replay are non-negotiable at Uber's scale.

Shopify

Kafka for order processing

Pub/Sub model: order placed → cart, inventory, shipping, analytics, email services all consume independently.

Discord

Pub/Sub for chat delivery

Message written → published to a channel topic → all connected user sessions subscribed to the channel receive it.

08

Used in problems

News feed uses Kafka for fan-out events. Notification system uses queue (per-channel delivery). Distributed job scheduler uses Kafka as the task bus. WhatsApp uses pub/sub for broadcasting messages to active sessions.

