Concept · Databases

SQL vs NoSQL

01

Why this matters

Picking the wrong database is the most expensive architectural mistake you can make early. You can change a cache vendor in a weekend; you can't change your database in a quarter. Interviewers want to see a justified choice, not "Postgres because I know it" or "Cassandra because it's web-scale."

The real answer is almost never "pure SQL" or "pure NoSQL" — it's pick the right tool per workload. A checkout flow needs ACID (SQL). A viral post counter needs horizontal writes (NoSQL). Most real systems run both.

02

The real dividing line

"SQL vs NoSQL" is a misleading name. The actual tradeoffs are:

  • Schema flexibility — SQL: strict, enforced at write time. NoSQL: flexible, interpreted at read time.
  • Transactions — SQL: multi-row, multi-table ACID transactions. NoSQL: typically single-row atomicity only.
  • Joins — SQL: the engine joins for you. NoSQL: you pre-join at write time (denormalize) or join in application code.
  • Scale model — SQL: scale reads with replicas, writes via sharding (hard). NoSQL: horizontal scaling built-in from day one.
  • Query language — SQL: one language, 50 years of tooling. NoSQL: per-vendor APIs.

None of this is "better." They're different tradeoff choices. If you need multi-entity transactions, you need SQL. If you need 500k writes/sec on one table, you need a distributed NoSQL.

03

The comparison matrix

DimensionSQL (Postgres, MySQL)NoSQL (Cassandra, Dynamo, Mongo)
SchemaFixed, migrations requiredFlexible, per-document
Consistency defaultStrong (ACID)Eventual (BASE) — tunable in some
Multi-row txnYes, freeNo, or expensive (multi-partition)
JoinsYes, engine does itNo — you denormalize
Write throughput (single node)~10k /secSame per node
Horizontal write scaleHard (sharding is an ordeal)Easy, built-in
Query flexibilityAny query, any index, ad-hocQuery must match your key design; ad-hoc is slow or impossible
Operational maturity50 years; every tool exists10–15 years; tooling catching up
04

When to pick each

Pick SQL when

Relationships + transactions matter

Orders ↔ items ↔ users ↔ inventory — you need multi-table atomic writes. Payments, inventory, bookings, subscriptions, anything finance-adjacent. Your data model has clear entities. Your queries change unpredictably over time. You're under ~10k writes/sec on the hot table.

Pick NoSQL when

Huge volume + simple access patterns

Time-series data, event logs, feed entries, session state, IoT telemetry, user activity. Known access patterns (always look up by user ID, always fetch newest 100). Write throughput exceeds a single SQL node. Reads don't need joins. You're okay with eventual consistency for this data.

The polyglot answer

"Postgres for the order ledger (ACID, relational). Cassandra for the event stream (volume, write throughput). Redis for session state (speed). Elasticsearch for search (inverted index). One datastore never fits all." Interviewers like this.

05

Deep dive — the two fallacies

Fallacy 1: "NoSQL scales better." Per-node write throughput is roughly the same (~10k/sec) on both. What differs is how they shard. SQL sharding is manual, often painful, and loses some features (cross-shard joins, cross-shard transactions). NoSQL sharding is automatic. So NoSQL scales more easily, not inherently faster.

Fallacy 2: "SQL is slow." Postgres with good indexes handles 50k+ reads/sec. MySQL with proper caching goes higher. The "slow" reputation comes from naive usage — missing indexes, N+1 queries, long-running transactions — not the database itself.

Real advantage of NoSQL: known, narrow access patterns. Cassandra's data model forces you to design your tables around your queries ("query-driven modeling"). If your queries never change and performance must be predictable, this is a superpower. If your queries change weekly as product evolves, it's a straitjacket.

Bulk insert via Postgres COPY
-- Single-row INSERTs: ~5,000/sec
-- Batched INSERT multi-VALUES: ~50,000/sec
-- COPY FROM STDIN: ~500,000/sec (100x faster)

-- Client side (psycopg):
with conn.cursor() as cur:
    with cur.copy("COPY events (user_id, ts, type) FROM STDIN") as copy:
        for row in rows:
            copy.write_row((row.user_id, row.ts, row.type))

-- Caveats:
-- * No triggers or constraints fire per-row as they do with INSERT
-- * Entire COPY is single transaction; huge copy = huge undo log
-- * Common technique: COPY to staging table, then INSERT INTO main FROM staging
06

Real-world choices

Stripe

Postgres everywhere

Payments, ledger, customer data. They shard manually for scale. Transactions matter more than write throughput — Postgres wins.

Netflix

Cassandra for viewing history

Billions of events/day. Simple access pattern (fetch by user). Never joins. Eventual consistency acceptable.

Uber

Hybrid — Postgres + MySQL + Cassandra + Redis

Postgres for trip records (ACID required). Cassandra for driver location history (volume). Redis for active driver index (speed).

Instagram

Sharded Postgres

~1B users on sharded Postgres. Proves SQL can scale — but required significant investment in custom sharding infra.

07

Used in problems

URL shortener (Postgres + Redis), News feed (Cassandra + Postgres + Redis), E-commerce (Postgres for orders, Redis for session), Reddit (sharded Postgres for posts/comments), Uber (all of them).

Next up