Concept · Machine Learning Systems

Vector Databases

01

Why this matters

The query is "movies like Inception." Plain Postgres can't help: SQL has no operator for "semantically similar." With embeddings (vectors of 384-1536 floats that represent meaning), the answer becomes the nearest neighbors of Inception's vector. But brute force over 100 million vectors means reading every one of them: at 768 float32 dimensions, that is roughly 300 GB of data per query. Vector databases use approximate-nearest-neighbor (ANN) indexes to answer in roughly 5-50 ms instead.

Many LLM-powered apps use one: RAG (retrieval-augmented generation), semantic search, recommendation systems, dedup, image search. Pinecone, Weaviate, pgvector, and Milvus all implement the same core idea differently.

02

What an embedding is

An embedding is a dense vector that represents the meaning of something: a word, a sentence, a product, an image. The model is trained so that similar things get similar vectors (cosine similarity close to 1, or L2 distance close to 0).

  • "king" + "woman" - "man" ≈ "queen" (the famous word2vec example).
  • OpenAI text-embedding-3-small outputs 1536-dim vectors for any text.
  • CLIP outputs vectors that encode both images and text in the same space — "a red bicycle" and a photo of a red bike land near each other.

"Find similar" reduces to "find nearest neighbors in vector space." That's the whole game.
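
To make "nearest neighbors in vector space" concrete, here is a minimal sketch using NumPy. The 3-dimensional vectors are hand-picked for illustration and are not real word2vec output; the names `cosine_similarity` and `nearest` are hypothetical helpers, not part of any library.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between a and b: 1.0 = same direction, 0.0 = orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy "embeddings"; axes loosely mean (royalty, maleness, femaleness).
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([0.0, 0.3, 0.3]),
}

def nearest(query: np.ndarray, exclude: set) -> str:
    """Return the word whose vector has the highest cosine similarity to query."""
    candidates = {w: v for w, v in vectors.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine_similarity(query, candidates[w]))

# The analogy from the list above, done as plain vector arithmetic.
analogy = vectors["king"] + vectors["woman"] - vectors["man"]
result = nearest(analogy, exclude={"king", "woman", "man"})
print(result)
```

With these toy values the analogy vector lands closest to "queen". Real embedding models have hundreds of dimensions, but the lookup is the same operation.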

03

ANN — approximate nearest neighbor

Exact nearest neighbor over 100M vectors = compute distance to every vector = tens of seconds per query. Useless.
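
A brute-force search is only a few lines, which is also why it does not scale: every query touches every vector. The sketch below runs it on a small random dataset and then estimates the bytes scanned at 100M vectors. The 768-dimension float32 figure is an assumption chosen from the 384-1536 range mentioned earlier, and `exact_top_k` and `scan_bytes` are hypothetical helper names.

```python
import numpy as np

def exact_top_k(query: np.ndarray, vectors: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k rows of `vectors` closest to `query` by L2 distance."""
    dists = np.linalg.norm(vectors - query, axis=1)   # one distance per stored vector
    top = np.argpartition(dists, k - 1)[:k]           # k smallest, in no particular order
    return top[np.argsort(dists[top])]                # sort just those k

def scan_bytes(n_vectors: int, dim: int, bytes_per_float: int = 4) -> int:
    """Bytes that must be read to compare one query against every vector."""
    return n_vectors * dim * bytes_per_float

rng = np.random.default_rng(0)
vectors = rng.standard_normal((10_000, 64)).astype(np.float32)

# Query is a slightly perturbed copy of row 42, so row 42 should come back first.
query = vectors[42] + 0.01 * rng.standard_normal(64).astype(np.float32)
top5 = exact_top_k(query, vectors, k=5)

print(top5)
print(scan_bytes(100_000_000, 768) / 1e9, "GB scanned per query")
```

The cost grows linearly with the number of vectors, which is the work an ANN index is built to avoid.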

ANN indexes pre-organize vectors so that most are never scanned at query time. The main approaches:

| Algorithm | How | Cost | Recall / notes |
| --- | --- | --- | --- |
| HNSW | Hierarchical graph; query walks down layers narrowing in | O(N log N), high RAM | ~95-99% (default for most VDBs) |
| IVF (inverted file) | Cluster into K buckets via k-means; query scans nearest bucket | O(N), tunable | ~85-95% |
| Product Quantization (PQ) | Compress vectors to 8-bit codes per sub-space | Big RAM savings | Some recall loss; often combined with IVF/HNSW |
| ScaNN (Google) | Anisotropic vector quantization + tree | Best recall/latency tradeoff | Used inside Google Search |

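
The IVF row is the easiest of these to sketch from scratch. The code below is a simplified illustration in NumPy, not how any particular vector database implements it: it runs a few rounds of k-means, stores vector ids per cluster, and at query time scans only the `n_probe` closest clusters. All function names and parameter values here are assumptions made for the example.

```python
import numpy as np

def nearest_centroid(vectors, centroids):
    """Index of the closest centroid for each row of `vectors` (squared L2)."""
    d2 = ((vectors ** 2).sum(axis=1)[:, None]
          - 2.0 * vectors @ centroids.T
          + (centroids ** 2).sum(axis=1)[None, :])
    return d2.argmin(axis=1)

def build_ivf(vectors, n_lists, n_iters=10, seed=0):
    """Cluster with plain k-means; return centroids and one id list per cluster."""
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), n_lists, replace=False)].copy()
    for _ in range(n_iters):
        assign = nearest_centroid(vectors, centroids)
        for c in range(n_lists):
            members = vectors[assign == c]
            if len(members):                      # leave empty clusters where they are
                centroids[c] = members.mean(axis=0)
    assign = nearest_centroid(vectors, centroids)
    lists = [np.flatnonzero(assign == c) for c in range(n_lists)]
    return centroids, lists

def ivf_search(query, vectors, centroids, lists, k, n_probe):
    """Scan only the n_probe clusters whose centroids are closest to the query."""
    probe = np.argsort(np.linalg.norm(centroids - query, axis=1))[:n_probe]
    candidates = np.concatenate([lists[c] for c in probe])
    dists = np.linalg.norm(vectors[candidates] - query, axis=1)
    return candidates[np.argsort(dists)[:k]]

rng = np.random.default_rng(1)
vectors = rng.standard_normal((2_000, 32)).astype(np.float32)
query = rng.standard_normal(32).astype(np.float32)
centroids, lists = build_ivf(vectors, n_lists=16)

exact = np.argsort(np.linalg.norm(vectors - query, axis=1))[:5]
approx = ivf_search(query, vectors, centroids, lists, k=5, n_probe=4)
recall = len(set(approx.tolist()) & set(exact.tolist())) / 5
print("recall@5 with 4 of 16 lists probed:", recall)
```

Raising `n_probe` trades latency for recall; probing every list degenerates into the exact scan. That is the "tunable" part of the table entry.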
04

The vector-DB landscape

Pinecone

Managed, simplest

SaaS only. Index internals are proprietary. Pod-based or serverless pricing. Tight integration with LangChain/LlamaIndex.

Weaviate / Qdrant / Milvus

Self-hosted OSS

Run your own. HNSW + filtering, multi-tenancy, GraphQL APIs (Weaviate). Most flexible.

pgvector

Postgres extension

Stores vectors in an ordinary Postgres column. Supports IVFFlat and HNSW indexes. Underrated: for < 10M vectors, you often don't need a separate DB.
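
As a rough sketch of what this looks like in practice, the statements below use pgvector's `vector` column type, its `hnsw` index method, and its `<=>` cosine-distance operator. They are kept as Python strings so any Postgres driver could run them. The table name `items`, the 3-dimensional column, and the helper `to_vector_literal` are assumptions made for the example.

```python
# SQL for a hypothetical `items` table; run these with the Postgres driver of your choice.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS items (
    id bigserial PRIMARY KEY,
    body text,
    embedding vector(3)          -- use your model's dimension, e.g. 1536
);
CREATE INDEX IF NOT EXISTS items_embedding_hnsw
    ON items USING hnsw (embedding vector_cosine_ops);
"""

# `<=>` is cosine distance in pgvector, so ascending order means most similar first.
# The %s placeholders follow the DB-API style used by common Postgres drivers.
QUERY_SQL = """
SELECT id, body
FROM items
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""

def to_vector_literal(values) -> str:
    """Format a sequence of numbers as pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"

params = (to_vector_literal([0.1, 0.2, 0.3]), 5)
print(params)
```

The query vector is passed as a parameter and cast to `vector`, so the same statement serves every request.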

Elasticsearch / OpenSearch dense_vector

Existing search engine + vectors

Add HNSW alongside inverted-index search. Use when you already run Elasticsearch.

05

Deep dive — RAG, the killer use case

Retrieval-Augmented Generation is why so many teams are suddenly buying a vector DB:

  1. Your company has 10,000 internal documents.
  2. Embed each doc (or chunks of it) → 10k vectors stored in a vector DB.
  3. User asks: "What's our refund policy?"
  4. Embed the question → query vector.
  5. Vector DB returns top-5 nearest doc chunks (~10ms).
  6. Build LLM prompt: "Given these docs: [chunks], answer: [question]"
  7. LLM generates an answer grounded in your docs, not in its training data.
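
The retrieval half of these steps can be sketched end to end without any external service. In the example below, `toy_embed` is a hypothetical stand-in for a real embedding model (it just counts words over a fixed vocabulary), the three chunks are invented sample data, and the final LLM call is left out; only steps 2 through 6 are shown.

```python
import re
import numpy as np

# Invented sample chunks standing in for the company's documents.
chunks = [
    "Refund policy: customers can request a refund within 30 days of purchase.",
    "Office hours: the support desk is open Monday to Friday, 9am to 5pm.",
    "Security: rotate API keys every 90 days and never commit them to git.",
]

def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())

# Fixed vocabulary built from the corpus; unknown query words are ignored.
vocab = {tok: i for i, tok in enumerate(sorted({t for c in chunks for t in tokenize(c)}))}

def toy_embed(text: str) -> np.ndarray:
    """Stand-in for a real embedding model: L2-normalized word counts."""
    vec = np.zeros(len(vocab), dtype=np.float32)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Step 2: embed every chunk once and keep the matrix as the "vector DB".
index = np.stack([toy_embed(c) for c in chunks])

def retrieve(question: str, k: int = 2) -> list:
    """Steps 4-5: embed the question, return the k most similar chunks."""
    scores = index @ toy_embed(question)          # cosine similarity (unit vectors)
    top = np.argsort(-scores, kind="stable")[:k]
    return [chunks[i] for i in top]

def build_prompt(question: str, k: int = 2) -> str:
    """Step 6: put the retrieved chunks in front of the question."""
    context = "\n".join(f"- {c}" for c in retrieve(question, k))
    return f"Given these docs:\n{context}\n\nAnswer the question: {question}"

prompt = build_prompt("What's our refund policy?")
print(prompt)
```

In a real system, `toy_embed` would be replaced by an embedding model and `index @ ...` by an ANN query against the vector DB; the prompt-building step stays essentially the same.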

This is why so many chatbots shipped in 2024-2025 have a vector DB behind them. The pattern: vector DB for retrieval, LLM for synthesis. Rough costs: on the order of $0.0001 per query for the embedding and $0.001 per query for the LLM call, though both depend on the model and the prompt length. That is cheap enough to be negligible for most use cases.

Interview answer

"For semantic search and RAG we use a vector DB with HNSW indexing. Documents are embedded once at ingest time via OpenAI embeddings; queries are embedded at request time and ANN-searched in ~10ms. The top-5 chunks are injected into the LLM prompt for grounded answers."

06

Real-world systems

Spotify Annoy

2014-era ANN

Random projection trees. Used internally for music recommendation candidate generation. Open-sourced.

Facebook FAISS

The reference library

C++ ANN library (with Python bindings) covering most major index types, including flat, IVF, PQ, and HNSW. Several vector DBs, Milvus among them, build on FAISS or similar libraries internally.

Google MUVERA / ScaNN

Production ANN

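ScaNN is Google's open-source ANN library, reported to power large-scale retrieval such as Search and YouTube candidate generation. MUVERA is a Google Research algorithm that reduces multi-vector retrieval to single-vector search via fixed dimensional encodings.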

OpenAI File Search

Vectors-as-a-service

Hosted vector store with chunking + embedding + RAG built in. Skip ops entirely.

07

Used in problems

Typeahead can use vector search for semantic-similarity completions. Recommendation algorithm uses ANN over user/item embeddings as candidate generation. Web crawler can dedup near-identical pages via embedding similarity.
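
The dedup case is the simplest of the three to illustrate: two pages are near-duplicates if their embeddings have cosine similarity above some threshold. The sketch below compares every pair directly; the 0.95 threshold, the toy 3-dimensional page vectors, and the function name `near_duplicate_pairs` are assumptions for the example.

```python
import numpy as np

def near_duplicate_pairs(embeddings: np.ndarray, threshold: float = 0.95) -> list:
    """Return (i, j) pairs with i < j whose cosine similarity is at least `threshold`."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T                               # all-pairs cosine similarity
    i_idx, j_idx = np.triu_indices(len(unit), k=1)     # each unordered pair once
    mask = sims[i_idx, j_idx] >= threshold
    return list(zip(i_idx[mask].tolist(), j_idx[mask].tolist()))

# Hypothetical page embeddings: page 1 is a lightly edited copy of page 0.
pages = np.array([
    [1.00, 0.00, 0.20],
    [0.98, 0.02, 0.21],
    [0.00, 1.00, 0.00],
])
pairs = near_duplicate_pairs(pages)
print(pairs)
```

Comparing all pairs is quadratic in the number of pages, so a crawler at scale would instead query an ANN index for each new page's nearest neighbors and apply the same threshold to those few candidates.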
