Concept · Networking & Delivery

Compression & Encoding

01

Why this matters

Sending 100 KB to a user takes about 100 ms over an 8 Mbps link. Compress it to 20 KB and that drops to about 20 ms. Compression is often the cheapest performance win in your stack: no infrastructure changes, roughly 80% of the bytes saved on text payloads, and only milliseconds of CPU. Yet many APIs still ship uncompressed JSON.
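The arithmetic behind those numbers can be checked with a small sketch. It assumes 1 KB = 1,000 bytes and 1 Mbps = 1,000,000 bits per second, and it ignores latency, TCP slow start, and protocol overhead. The helper name transfer_ms is hypothetical.

```python
def transfer_ms(size_kb: float, link_mbps: float) -> float:
    """Idealized transfer time in milliseconds (bandwidth only, no latency)."""
    bits = size_kb * 1000 * 8                # assumes 1 KB = 1,000 bytes
    bits_per_second = link_mbps * 1_000_000  # assumes 1 Mbps = 1,000,000 bit/s
    return bits * 1000 / bits_per_second


uncompressed = transfer_ms(100, 8)  # 100 KB payload on an 8 Mbps link
compressed = transfer_ms(20, 8)     # same payload after an ~80% reduction
print(f"{uncompressed:.0f} ms -> {compressed:.0f} ms")
```

On a real connection the round-trip time is added on top, so the relative gain is smaller for tiny payloads and larger for slow links.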

The choice between gzip, brotli, zstd, snappy, lz4, protobuf, and msgpack hinges on workload: storage vs network, hot path vs cold archive, request body vs response body, server vs CDN edge.

02

The compression algorithm matrix

Algorithm | Size reduction | Speed (compress / decompress) | Best for
gzip / deflate | ~70% on text | ~100 MB/s / ~400 MB/s | Universal default; HTTP compatibility
brotli | ~75-80% (best for text) | ~30 MB/s / ~400 MB/s | Pre-compressed static assets at the CDN
zstd (Zstandard) | ~70-78%, tunable | ~500 MB/s / ~1500 MB/s | Modern default: fast with a good ratio
snappy | ~50%, speed-prioritized | ~500 MB/s / ~1700 MB/s | Internal RPC, on-disk LSM blocks
lz4 | ~50% | ~750 MB/s / ~5000 MB/s | Fastest decompression: backups, RocksDB
xz / lzma | ~85% (densest) | ~5 MB/s / ~80 MB/s | Cold archives: Linux package mirrors, scientific data
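Figures like these depend heavily on the data and the hardware, so it is worth measuring on your own payloads. The following is a minimal sketch that uses only codecs from the Python standard library (zlib for deflate, lzma for xz); brotli, zstd, snappy, and lz4 usually need third-party packages and are left out. The measure helper and the synthetic sample are assumptions made for illustration.

```python
import lzma
import time
import zlib


def measure(name, compress, decompress, data):
    """Return (name, size reduction in %, compress MB/s, decompress MB/s)."""
    start = time.perf_counter()
    packed = compress(data)
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = decompress(packed)
    decompress_s = time.perf_counter() - start

    assert restored == data  # round-trip check
    mb = len(data) / 1_000_000
    reduction = 100 * (1 - len(packed) / len(data))
    # max() guards against a zero reading from a coarse timer
    return name, reduction, mb / max(compress_s, 1e-9), mb / max(decompress_s, 1e-9)


# Synthetic, highly repetitive sample; real payloads will compress less well.
sample = b'{"user_id": 12345, "event": "page_view", "path": "/products/42"}\n' * 20_000

results = [
    measure("deflate (zlib level 6)", lambda d: zlib.compress(d, 6), zlib.decompress, sample),
    measure("xz / lzma (default preset)", lzma.compress, lzma.decompress, sample),
]
for name, reduction, c_speed, d_speed in results:
    print(f"{name:28s} {reduction:5.1f}% smaller  {c_speed:8.1f} MB/s  {d_speed:8.1f} MB/s")
```

A single timed run is noisy. For a decision that matters, repeat the measurement several times on representative production data.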

03

Compression vs serialization

Two distinct savings:

  • Better serialization picks a denser format. JSON → Protobuf / MessagePack / Avro can save 50-80% before any compression. The data layout itself is more compact (binary integers, no field names per record, schema-aware).
  • Compression applies an algorithm to whatever bytes you have. It removes redundancy such as repeated patterns and common substrings.

Combine them: Protobuf + zstd on a wire payload can land 90-95% smaller than uncompressed JSON, depending on the data. Compact binary serialization, usually with optional compression, is the norm in high-performance internal RPC systems (gRPC, Cap'n Proto, Thrift).
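The two savings can be seen side by side in a short sketch. Protobuf and zstd need third-party packages, so this example substitutes a fixed binary layout built with the standard struct module and deflate from zlib. The record shape and field names are hypothetical, and the exact sizes will differ from what Protobuf + zstd would produce.

```python
import json
import struct
import zlib

# Hypothetical record shape: (user_id, score, active)
records = [(i, i * 0.5, i % 2 == 0) for i in range(1000)]

# Text serialization: field names are repeated in every record.
as_json = json.dumps(
    [{"user_id": u, "score": s, "active": a} for u, s, a in records]
).encode("utf-8")

# Schema-aware binary layout: uint32 + float64 + bool = 13 bytes per record,
# with no field names on the wire ("<" = little-endian, no padding).
record_format = struct.Struct("<Id?")
as_binary = b"".join(record_format.pack(u, s, a) for u, s, a in records)

sizes = {
    "json": len(as_json),
    "json + deflate": len(zlib.compress(as_json)),
    "binary": len(as_binary),
    "binary + deflate": len(zlib.compress(as_binary)),
}
for label, size in sizes.items():
    print(f"{label:18s} {size:6d} bytes")
```

Compression closes much of the gap between JSON and a binary format on repetitive data, which is why the combined saving is smaller than the two individual savings multiplied together.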

04

Where to compress

  • HTTP responses: gzip or brotli, negotiated per request. The client lists what it supports in Accept-Encoding and the server names its choice in Content-Encoding. All modern browsers accept both (brotli only over HTTPS). CDNs such as Cloudflare and Fastly can compress at the edge automatically.
  • HTTP request bodies: the same Content-Encoding header applies, but most clients don't compress requests and the server must be set up to accept them. Worth doing for large POST bodies (file uploads, batch APIs).
  • Internal RPC: gRPC compresses per message and signals the codec via grpc-encoding. Snappy or zstd is typical inside datacenters where CPU is cheaper than NIC bandwidth.
  • Database storage: Postgres TOAST, MySQL InnoDB table compression (ROW_FORMAT=COMPRESSED), Cassandra/RocksDB SSTable compression. Trade CPU for disk space and cache capacity.
  • Logs + metrics: structured logs often compress around 10:1 with zstd. Compress batches before shipping to S3 or a log aggregator.
  • Backups + archives: xz or zstd level 19 for maximum ratio. Restores are rare, so size matters more than compression speed.
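The HTTP response case can be sketched as a small negotiation step. This is a simplified illustration, not a complete implementation of the header grammar: it ignores the "*" wildcard and q-value ordering, supports only gzip (brotli needs a third-party package), and the 1024-byte threshold and both function names are assumptions.

```python
import gzip


def accepted_encodings(accept_encoding: str) -> set:
    """Parse an Accept-Encoding header into the set of codings with q > 0.

    Simplified: ignores the "*" wildcard and preference ordering by q-value.
    """
    accepted = set()
    for part in accept_encoding.split(","):
        coding, _, params = part.strip().partition(";")
        coding = coding.strip().lower()
        if not coding:
            continue
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q-value as "not acceptable"
        if q > 0:
            accepted.add(coding)
    return accepted


def encode_response(body: bytes, accept_encoding: str, min_size: int = 1024):
    """Return (body, headers), gzip-compressed when the client allows it."""
    headers = {"Vary": "Accept-Encoding"}  # caches must key on the request header
    if len(body) >= min_size and "gzip" in accepted_encodings(accept_encoding):
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return body, headers


payload = b'{"status": "ok", "items": []}' * 200
wire_body, wire_headers = encode_response(payload, "gzip, deflate, br")
print(wire_headers, len(payload), "->", len(wire_body))
```

In practice this logic usually lives in the web server, framework middleware, or CDN rather than in application code. The size threshold exists because very small bodies can grow when compressed.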

05

Deep dive — zstd's underrated win

Facebook open-sourced zstd in 2016. It sits close to the Pareto frontier: across its levels, zstd generally matches or beats gzip on both ratio and speed. Yet many stacks still default to gzip out of inertia.

What zstd does well:

  • Tunable: levels 1 (fastest) to 22 (densest). Choose per workload.
  • Streaming: works on chunks, so there is no need to buffer the whole file.
  • Trained dictionaries: pre-train a dictionary on representative data, and similar data can then compress 2-5× denser. Killer for compressing many small JSON payloads with similar shape.
  • Hardware-friendly: the reference implementation is tuned for modern CPUs (for example, it uses BMI2 instructions where available), and offload engines such as Intel QAT can accelerate compression in some configurations.
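The dictionary idea can be illustrated without zstd itself. The sketch below uses the preset-dictionary feature of deflate (the zdict parameter of zlib in the Python standard library) as a stand-in. It is not zstd and performs no real training: a few representative payloads are simply concatenated. The payload shape and helper names are hypothetical.

```python
import json
import zlib
from typing import Optional


def make_event(user_id: int, page: str) -> bytes:
    """Build a hypothetical event payload with a fixed shape."""
    return json.dumps(
        {"user_id": user_id, "event": "page_view", "page": page, "region": "eu-west-1"}
    ).encode("utf-8")


# A real dictionary would be trained on many samples; here a few
# representative payloads are concatenated to act as one.
dictionary = b"".join(make_event(i, f"/products/{i}") for i in range(5))


def compress(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    compressor = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return compressor.compress(data) + compressor.flush()


def decompress(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    decompressor = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decompressor.decompress(data) + decompressor.flush()


message = make_event(98765, "/checkout")
plain = compress(message)
with_dict = compress(message, dictionary)
assert decompress(with_dict, dictionary) == message  # both sides need the same dictionary
print(len(message), "bytes raw,", len(plain), "without dictionary,", len(with_dict), "with dictionary")
```

The gain comes from the compressor finding matches in the dictionary that a single small payload cannot supply on its own. The cost is operational: the sender and receiver must share and version the dictionary.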

Brotli still beats zstd slightly on web text (HTML, CSS, JS) at high levels, because Brotli ships a built-in static dictionary derived from web content. For most other workloads, zstd wins.

2025 defaults

Static web assets: brotli (precompressed at build, served with Content-Encoding). API responses: gzip (universal compatibility) or brotli (modern clients). Internal RPC: snappy or zstd. Logs/backups: zstd. Cold archives: xz or zstd-19.

06

Real-world

Cloudflare / Vercel

Brotli + gzip auto

Edge precompresses both formats; serves whichever the client accepts.

Kafka

Snappy / lz4 / zstd per topic

Producer compresses message batches. Storage and network savings; CPU paid once per batch.
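Why batching matters can be shown with a short comparison. This sketch is not Kafka code and uses deflate from the standard zlib module in place of snappy, lz4, or zstd. The message shape is hypothetical.

```python
import json
import zlib

# Hypothetical messages with a shared structure, as a producer might queue them.
messages = [
    json.dumps({"order_id": i, "status": "created", "currency": "USD"}).encode("utf-8")
    for i in range(500)
]

# Option A: compress each message on its own.
per_message_bytes = sum(len(zlib.compress(m)) for m in messages)

# Option B: concatenate the batch (newline-delimited) and compress once.
batch = b"\n".join(messages)
batch_bytes = len(zlib.compress(batch))

raw_bytes = sum(len(m) for m in messages)
print(f"raw: {raw_bytes}  per-message: {per_message_bytes}  batched: {batch_bytes}")
```

Small messages carry too little repetition to compress well individually, and each one pays the format's fixed overhead. A batch lets the compressor reuse the shared field names and values across messages.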

Cassandra / RocksDB

Snappy / lz4 / zstd per SSTable

Disk savings 2-5×; read overhead minimal. Default snappy or lz4 for hot, zstd for cold.

Linux kernel / btrfs / zfs

Native zstd compression

Filesystem-level and transparent to apps. Roughly 30% less disk for typical workloads at minimal CPU cost.

07

Used in problems

News feed serves gzip + brotli responses at the edge. YouTube/Netflix uses brotli for manifests, codec-specific compression for media. Distributed logging compresses log batches with zstd before shipping. Video conferencing uses real-time codecs (a different compression class entirely).
