Databases Compared — Comparison — System Design Portfolio

01

Why this matters

"Which database?" is the single most common design decision in every system design interview. Pick wrong and your design either can't scale, can't query, or can't stay consistent. PostgreSQL, MySQL, Cassandra, DynamoDB, and MongoDB each dominate a different workload shape.

This comparison arms you with a crisp mental model so you never wave your hands when the interviewer asks "Why not just use Postgres?"

02

Head-to-head comparison

Database	Consistency	Scaling model	Query flexibility	Ops complexity	Sweet spot
PostgreSQL	Strong (ACID, serializable)	Vertical + read replicas; Citus for horizontal	Full SQL, JSON, CTEs, window functions	Moderate (mature tooling)	General OLTP, complex queries, ACID-critical
MySQL	Strong (InnoDB ACID)	Vertical + read replicas; Vitess for sharding	Standard SQL, decent JSON support	Low-moderate	Web apps, read-heavy OLTP, proven at scale (Meta, GitHub)
Cassandra	Tunable (ONE to ALL quorum)	Horizontal — linear write scaling, shared-nothing	CQL (no joins, no aggregations)	High (compaction tuning, repair, tombstones)	Write-heavy time-series, IoT, activity feeds
DynamoDB	Eventual by default; strong reads optional	Horizontal — fully managed, auto-partitioned	Key-value + GSI; limited query patterns	Very low (serverless)	Serverless KV, session stores, high-scale simple access
MongoDB	Strong within replica set; tunable for sharded	Horizontal via sharding (config servers + mongos)	Rich document queries, aggregation pipeline	Moderate-high (shard key selection critical)	Flexible schema prototyping, content management

Key insight: Relational databases (Postgres, MySQL) give you query flexibility and strong consistency at the cost of harder horizontal scaling. NoSQL databases (Cassandra, DynamoDB, MongoDB) give you horizontal scaling at the cost of query constraints or consistency trade-offs.

03

Decision flowchart

Walk through these questions in order during your interview:

Do you need ACID transactions + complex queries (joins, aggregations)? → PostgreSQL. Default choice for most OLTP. Battle-tested, extensible, rich ecosystem.
Write-heavy time-series / append-only workload at massive scale? → Cassandra. Linear write scaling. Model data by query (denormalize aggressively). No joins.
Serverless, simple key-value / key-sort access patterns? → DynamoDB. Zero ops. Single-digit ms latency. Design your access patterns around partition key + sort key.
Rapidly evolving schema, document-shaped data? → MongoDB. Flexible schema for prototyping. Good aggregation pipeline. Careful with shard key selection for production.
Read-heavy web app, proven MySQL ecosystem (Vitess, ProxySQL)? → MySQL. Simpler than Postgres, massive community, proven at Meta/GitHub scale.

The interview answer: "Postgres for most OLTP. Cassandra for write-heavy time-series. DynamoDB for serverless KV. MongoDB for flexible schema prototyping."

04

When to pick what — deeper context

PostgreSQL — the default choice

If you don't have a specific reason NOT to use Postgres, use Postgres. It handles JSON (JSONB), full-text search, geospatial (PostGIS), and time-series (TimescaleDB) via extensions. Horizontal scaling via Citus or read replicas covers most workloads up to ~10TB. Beyond that, consider sharding or moving hot paths to a purpose-built store.

Cassandra — built for writes

Cassandra's LSM-tree storage engine is optimized for writes. It achieves linear write scaling by partitioning data across nodes with consistent hashing. The trade-off: you must model data by query pattern (no ad-hoc joins). Reads are fast only when you query by partition key. Time-series data (metrics, IoT, activity logs) is the canonical use case.

DynamoDB — zero ops at any scale

DynamoDB's single-table design pattern packs multiple entity types into one table using composite keys. This feels unnatural to SQL developers but enables single-digit-ms reads at any scale. Watch out for hot partitions (adaptive capacity helps but doesn't eliminate the problem) and GSI cost (each GSI is essentially a full table copy).

MongoDB — schema flexibility has a cost

MongoDB's flexible schema is a double-edged sword. Great for prototyping and evolving data models, but production systems need schema validation to prevent data rot. The aggregation pipeline is surprisingly powerful. Shard key selection is make-or-break: a bad shard key creates hot spots that are painful to fix post-deployment.

The polyglot persistence pattern

Real systems often use multiple databases: Postgres for transactional core + Redis for caching + Elasticsearch for search + Cassandra for analytics. Mention this pattern in interviews to show real-world awareness, but keep your primary store choice simple and justified.