03
Why sequential writes are a superpower
Random disk writes (updating a B-tree leaf in place) are slow because the disk arm (HDD) or SSD controller has to seek. Sequential writes are fast — just append to the end.
An HDD does ~100 random IOPS but ~100 MB/s sequential. A modern SSD does ~100k random IOPS but ~3 GB/s sequential. Every generation of storage has this gap.
WAL exploits this: writes are always sequential. The expensive random updates to the main data structures happen in big batches, asynchronously, when the system is idle. Commit latency = one fsync + sequential append. Orders of magnitude faster than "update the B-tree in place and fsync."
Write-ahead log append + flush
import json, os, time
class WAL:
def __init__(self, path, sync_every_ms=5):
self.f = open(path, "ab")
self.sync_every_ms = sync_every_ms
self.last_sync = time.monotonic() * 1000
def append(self, record):
line = json.dumps(record) + "\n"
self.f.write(line.encode())
now = time.monotonic() * 1000
if now - self.last_sync >= self.sync_every_ms:
self.f.flush()
os.fsync(self.f.fileno())
self.last_sync = now
# Group commit: batch records between fsyncs. Trade durability window
# (bounded by sync_every_ms) for dramatically higher write throughput.