05
Deep dive — the two fallacies
Fallacy 1: "NoSQL scales better." Per-node write throughput is in the same ballpark on both (on the order of ~10k writes/sec). What differs is how they shard. SQL sharding is usually manual and often painful: the application has to route each query to the right shard, and you lose features that assume a single node (cross-shard joins, cross-shard transactions). Most NoSQL stores shard automatically by partition key. So NoSQL scales out more easily; it is not inherently faster.
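To make "manual sharding" concrete, here is a minimal sketch of what the application ends up owning. Everything in it is an assumption for illustration: the shard count, the `shard_for` helper, and the plain dicts standing in for separate database nodes.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard using a stable hash (not Python's per-process hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Each dict stands in for one independent database node.
shards = [dict() for _ in range(NUM_SHARDS)]


def put(user_id: str, record: dict) -> None:
    # Writes keyed by user_id land on exactly one shard.
    shards[shard_for(user_id)][user_id] = record


def get(user_id: str):
    # Reads keyed by user_id also touch exactly one shard.
    return shards[shard_for(user_id)].get(user_id)


def count_by_country(country: str) -> int:
    # A query that is not keyed by user_id has to fan out to every shard
    # and merge the results in application code.
    return sum(
        1
        for shard in shards
        for record in shard.values()
        if record["country"] == country
    )
```

Keyed reads and writes stay cheap. Anything that cuts across the shard key, like `count_by_country`, becomes a fan-out that the application must write and maintain itself, which is the part a store with built-in sharding handles for you.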
Fallacy 2: "SQL is slow." Postgres with good indexes handles 50k+ reads/sec. MySQL with proper caching goes higher. The "slow" reputation comes from naive usage — missing indexes, N+1 queries, long-running transactions — not the database itself.
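The N+1 pattern is easy to show in a few lines. This sketch uses Python's built-in `sqlite3` as a stand-in for Postgres or MySQL; the `authors` and `posts` tables and both function names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    CREATE INDEX posts_author_idx ON posts (author_id);
    """
)
conn.executemany(
    "INSERT INTO authors VALUES (?, ?)", [(1, "ann"), (2, "bob"), (3, "cy")]
)
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [(1, 1, "a1"), (2, 1, "a2"), (3, 2, "b1")],
)


def titles_n_plus_one(conn):
    """Naive version: 1 query for the authors, then 1 more query per author."""
    queries = 1
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    result = {}
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id",
            (author_id,),
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def titles_single_join(conn):
    """Same result from a single round trip using a LEFT JOIN."""
    rows = conn.execute(
        """
        SELECT a.name, p.title
        FROM authors a
        LEFT JOIN posts p ON p.author_id = a.id
        ORDER BY a.id, p.id
        """
    ).fetchall()
    result = {}
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:  # authors without posts yield a NULL title
            result[name].append(title)
    return result, 1
```

Both functions return the same data. The first issues one query per author on top of the initial one, so its cost grows with the number of rows; against a networked database each of those extra queries is a full round trip.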
Real advantage of NoSQL: known, narrow access patterns. Cassandra's data model forces you to design your tables around your queries ("query-driven modeling"). If your queries never change and performance must be predictable, this is a superpower. If your queries change weekly as the product evolves, it's a straitjacket.
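A toy model of query-driven design, as a rough sketch rather than how Cassandra is implemented: the table below is laid out for exactly one query ("events for a user in a time range"). The class name `EventsByUser` and its methods are hypothetical.

```python
import bisect
from collections import defaultdict


class EventsByUser:
    """Toy table: partition key = user_id, rows kept sorted by ts per partition."""

    def __init__(self):
        # user_id -> list of (ts, type) tuples, kept sorted by ts
        self._partitions = defaultdict(list)

    def insert(self, user_id, ts, type_):
        bisect.insort(self._partitions[user_id], (ts, type_))

    def events_for_user(self, user_id, start_ts, end_ts):
        """The query the table was designed for: one partition, one sorted slice.

        Returns events with start_ts <= ts < end_ts.
        """
        rows = self._partitions.get(user_id, [])
        lo = bisect.bisect_left(rows, (start_ts,))
        hi = bisect.bisect_left(rows, (end_ts,))
        return rows[lo:hi]

    def events_of_type(self, type_):
        """A query the layout was NOT designed for: scans every partition."""
        return [
            (user_id, ts)
            for user_id, rows in self._partitions.items()
            for ts, t in rows
            if t == type_
        ]
```

`events_for_user` stays fast and predictable no matter how much data other users have. `events_of_type` has to walk everything; in a real query-driven store the usual answer is a second table laid out for that query, which is the cost you pay each time the access pattern changes.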
Bulk insert via Postgres COPY
-- Single-row INSERTs: ~5,000 rows/sec
-- Batched multi-row INSERT ... VALUES: ~50,000 rows/sec
-- COPY FROM STDIN: ~500,000 rows/sec (~100x faster than single-row INSERTs)
-- Client side (psycopg 3; conn is an open connection, rows is any iterable of records):
with conn.cursor() as cur:
    with cur.copy("COPY events (user_id, ts, type) FROM STDIN") as copy:
        for row in rows:
            copy.write_row((row.user_id, row.ts, row.type))
-- Caveats:
-- * Triggers and constraints still fire on COPY FROM (rules do not), so one bad row aborts the whole load
-- * Entire COPY is a single all-or-nothing statement; a failed huge COPY leaves dead rows that take up space until VACUUM
-- * Common technique: COPY to a staging table, then INSERT INTO main SELECT * FROM staging
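The staging-table technique could look roughly like this. It is a sketch under assumptions: `conn` is a psycopg 3 connection, `rows` yields `(user_id, ts, type)` tuples, and the function name and the `events_staging` table are made up for illustration.

```python
def load_events_via_staging(conn, rows):
    """COPY rows into a temp staging table, then merge them into events.

    Assumes `conn` is a psycopg 3 connection and `rows` yields
    (user_id, ts, type) tuples. Returns the number of rows inserted.
    """
    # One transaction: the temp table is dropped at commit, and a failure
    # anywhere rolls back both the staging load and the final insert.
    with conn.transaction():
        with conn.cursor() as cur:
            # Staging table with the same column types as events but none
            # of its constraints, indexes, or triggers.
            cur.execute(
                "CREATE TEMP TABLE events_staging ON COMMIT DROP AS "
                "SELECT user_id, ts, type FROM events WITH NO DATA"
            )
            with cur.copy(
                "COPY events_staging (user_id, ts, type) FROM STDIN"
            ) as copy:
                for row in rows:
                    copy.write_row(row)
            # Validation, deduplication, or filtering would go in this SELECT.
            cur.execute(
                "INSERT INTO events (user_id, ts, type) "
                "SELECT user_id, ts, type FROM events_staging"
            )
            return cur.rowcount
```

The fast, unchecked part (COPY) hits a throwaway table, and the checked part (the INSERT ... SELECT into `events`) is one set-based statement where bad rows can be filtered out instead of aborting the load.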