Database Types

01

Why this matters

"We'll use a database" is not an architecture decision — it's punting. Every system design interview eventually asks: Postgres or Cassandra? Redis or DynamoDB? Mongo or Elasticsearch? The answer is never "it depends" alone — it's "this workload needs X because Y."

This page is the decision matrix. Learn the shape of each, the failure modes of each, and which real workloads fit each.

02

The seven families

Family	Exemplars	Best for	Hard no
Relational (SQL)	Postgres, MySQL, CockroachDB	Transactions, complex queries, strong consistency	Huge write throughput on one table without sharding
Key-value	Redis, Memcached, DynamoDB	Caches, sessions, hot lookups by single key	Any query that's not by key
Document	MongoDB, Couchbase, DynamoDB	Flexible schema, nested objects, per-object CRUD	Heavy relational queries, multi-doc ACID
Wide-column	Cassandra, HBase, ScyllaDB	Time-series, massive writes, known access patterns	Ad-hoc queries, joins, strong consistency
Graph	Neo4j, JanusGraph, Amazon Neptune	Relationship-heavy data (social graph, fraud detection)	Simple CRUD, huge scale
Search (inverted index)	Elasticsearch, OpenSearch, Solr	Full-text search, faceting, relevance ranking	Source-of-truth storage (use as secondary only)
Time-series	InfluxDB, TimescaleDB, Prometheus	Metrics, monitoring, IoT telemetry	Transactional workloads, low cardinality

03

The ten databases you must know

PostgreSQL

Relational — the default

ACID, mature, JSON support, full-text search built in, extensions for everything. Scale limit ≈ single-box writes (~10k–50k/sec). Shard or switch when you outgrow.

MySQL

Relational — operational simplicity

Simpler than Postgres, slightly less featureful. Huge operator knowledge. Vitess shards it at YouTube scale.

Redis

In-memory KV + data structures

0.5ms reads, 1M ops/sec per node. Sessions, caches, leaderboards (sorted sets), pub/sub, rate limiting. Data in RAM — expensive for huge datasets.

Memcached

Pure KV cache

No persistence, no data structures, no replication. Just a cache. Sub-millisecond. Still relevant when you need simplicity.

DynamoDB

Managed KV/document, any scale

Single-digit ms at any throughput. Primary/sort keys drive data model. Expensive at large scale but zero ops.

Cassandra

Wide-column, write-heavy

Leaderless, linearly scalable writes. Discord, Netflix, Instagram. Eventual consistency by default. Hard to operate well.

MongoDB

Document, flexible schema

BSON documents, rich query language, secondary indexes. Sharded clusters. ACID on single doc; multi-doc since 4.0 (slower).

Elasticsearch

Inverted index, search

Full-text + structured filtering + aggregations. NOT your source of truth — use alongside Postgres.

Neo4j

Graph — relationship-first

Traversals over millions of edges in milliseconds. Fraud detection, recommendation, social networks. Not a general-purpose DB.

Kafka

Distributed log

Technically not a DB — but often used as the system of record. Event sourcing, audit logs, stream processing. Retention from hours to years.

04

The decision tree

Do you need ACID transactions across multiple rows? → Postgres/MySQL. Nothing else does this well.
Do you need single-key lookups in < 1ms? → Redis (in-memory) or DynamoDB (managed).
Do you need > 100k writes/sec on one logical dataset? → Cassandra or DynamoDB. Avoid SQL sharding unless the team has scar tissue.
Do you need full-text search with relevance scoring? → Elasticsearch. Alongside, never instead of, a source-of-truth DB.
Is your data graph-shaped (traversals > joins)? → Neo4j or a graph extension.
Is your data timestamped metrics? → Prometheus (metrics) or TimescaleDB (analytics).
Default for greenfield apps? → Postgres. Can handle most workloads up to mid-scale and the migration path is well-trodden.

LSM-tree sketch (memtable + SSTables)

# LSM = Log-Structured Merge tree. Writes: append to memtable + WAL.
# Reads: check memtable → bloom filters → SSTables from newest to oldest.

class LSM:
    def __init__(self, memtable_limit=10**6):
        self.memtable = {}  # sorted map (in-mem B-tree in real engines)
        self.wal = WAL("lsm.wal")
        self.sstables = []  # list of on-disk immutable files, newest first

    def put(self, k, v):
        self.wal.append((k, v))
        self.memtable[k] = v
        if len(self.memtable) >= 10**6:
            self._flush()

    def _flush(self):
        sst = SSTable.from_dict(self.memtable)
        self.sstables.insert(0, sst)
        self.memtable = {}
        self.wal.rotate()

    def get(self, k):
        if k in self.memtable: return self.memtable[k]
        for sst in self.sstables:
            if k in sst.bloom:  # cheap probabilistic check
                v = sst.get(k)
                if v is not None: return v
        return None

# Compaction merges older SSTables; Cassandra, RocksDB, LevelDB all do this.

05

Deep dive — polyglot persistence, the reality

At scale, you don't pick one DB. You pick many. A realistic stack for a consumer social app:

Postgres — users, payments, account settings (relational + ACID)
Cassandra — feed posts, activity events (write volume + eventual OK)
Redis — session tokens, feed caches, rate limiters (sub-ms reads)
Elasticsearch — search (posts, users, hashtags)
S3 — media (images, video) with CDN in front
Kafka — event bus between all of the above

Each has one job it does 10× better than the alternatives. Trying to force Postgres to do search, or Elasticsearch to hold the ledger, loses ~90% of the specialized DB's value.

Operational reality

Running 6 databases means 6 on-call rotations, 6 failure modes, 6 backup strategies. Polyglot isn't free. Most companies pick 2–3 (Postgres + Redis + one of {Cassandra, Elasticsearch, Kafka}) and push each hard.

06

Used in problems

Every problem in this portfolio picks 2–5 DBs. The decision table in §4 is the framework; the specific choices are justified on each problem page.

📺

References & Videos

7 Database Paradigms

ByteByteGo · 10 min

SQL vs NoSQL — Which Database?

Hussein Nasser · 22 min

Which Database Model to Choose?

Why this matters

The seven families

The ten databases you must know

Relational — the default

Relational — operational simplicity

In-memory KV + data structures

Pure KV cache

Managed KV/document, any scale

Wide-column, write-heavy

Document, flexible schema

Inverted index, search

Graph — relationship-first

Distributed log

The decision tree

Deep dive — polyglot persistence, the reality

Used in problems

References & Videos

SQL vs NoSQL

Sharding

URL Shortener