The query is "movies like Inception." Plain Postgres can't help: there's no built-in SQL operator for "semantically similar." With embeddings (vectors of 384-1536 floats representing meaning), the answer is the nearest neighbors of Inception's vector. But brute-force search over 100 million vectors means scanning roughly 150-600 GB of float32 data per query. Vector databases use approximate-nearest-neighbor (ANN) indexes to answer in 5-50 ms.
Many LLM-powered apps use one: RAG (retrieval-augmented generation), semantic search, recommendation systems, deduplication, image search. Pinecone, Weaviate, pgvector, and Milvus all implement the same core idea differently.
02
What an embedding is
An embedding is a dense vector that represents the meaning of something: a word, a sentence, a product, an image. The model is trained so that similar things have similar vectors (cosine similarity close to 1, or L2 distance close to 0).
"king" + "woman" - "man" ≈ "queen" (the famous word2vec example).
OpenAI text-embedding-3-small outputs 1536-dimensional vectors by default.
CLIP outputs vectors that encode both images and text in the same space — "a red bicycle" and a photo of a red bike land near each other.
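To make the two similarity measures concrete, here is a minimal sketch in plain Python. The three 3-dimensional vectors are made-up toy values chosen for illustration, not the output of any real embedding model, and the variable names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def l2_distance(a, b):
    """Euclidean distance: 0.0 = identical vectors, larger = further apart."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy "embeddings" (invented values; real ones have hundreds of dimensions).
inception = [0.9, 0.8, 0.1]
interstellar = [0.8, 0.9, 0.2]
notebook = [0.1, 0.2, 0.9]

print(cosine_similarity(inception, interstellar), l2_distance(inception, interstellar))
print(cosine_similarity(inception, notebook), l2_distance(inception, notebook))
```

With these toy values the first pair scores a higher cosine similarity and a smaller L2 distance than the second, which is the property an embedding model is trained to produce for related items.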
"Find similar" reduces to "find nearest neighbors in vector space." That's the whole game.
03
ANN — approximate nearest neighbor
Exact nearest neighbor over 100M vectors = compute distance to every vector = tens of seconds per query. Useless.
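For reference, exact search is just a linear scan. The sketch below assumes L2 distance and small random vectors; `exact_knn` and `scan_bytes` are illustrative helper names, not part of any library. `scan_bytes` reproduces the back-of-envelope arithmetic for how much float32 data one query has to touch.

```python
import heapq
import math
import random

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_knn(query, vectors, k):
    """Indices of the k vectors closest to query, nearest first.
    Computes a distance to every vector, so cost is O(N * d)."""
    return heapq.nsmallest(k, range(len(vectors)), key=lambda i: l2(query, vectors[i]))

def scan_bytes(n_vectors, dim, bytes_per_float=4):
    """Bytes of vector data a single brute-force query must read."""
    return n_vectors * dim * bytes_per_float

rng = random.Random(0)
vectors = [[rng.random() for _ in range(8)] for _ in range(1000)]
top3 = exact_knn(vectors[42], vectors, k=3)
print(top3)

# 100M vectors at 384 and 1536 dimensions, in GB.
print(scan_bytes(100_000_000, 384) / 1e9, scan_bytes(100_000_000, 1536) / 1e9)
```

At 1,000 vectors this is instant; the same loop over 100M vectors is what an ANN index is designed to avoid.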
ANN indexes pre-organize vectors so most aren't scanned at query time. The main approaches:

| Algorithm | How | Build cost | Recall (typical) |
|---|---|---|---|
| HNSW | Hierarchical graph; query walks down layers, narrowing in | O(N log N), high RAM | ~95-99% (default for most VDBs) |
| IVF (inverted file) | Cluster into K buckets via k-means; query scans the nearest few buckets (often called nprobe) | O(N), tunable | ~85-95% |
| Product Quantization (PQ) | Compress each sub-vector to a short code (typically 8 bits per sub-space) | Big RAM savings | Some recall loss; often combined with IVF/HNSW |
| ScaNN (Google) | Anisotropic vector quantization + tree | Strong recall/latency tradeoff | Used inside Google Search |
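The IVF idea described above can be sketched in a few dozen lines. This is a toy under stated assumptions (pure Python, L2 distance, a handful of k-means iterations, random data), not how any particular vector database implements it; `build_ivf`, `ivf_search`, and the `nprobe` parameter name are chosen for illustration.

```python
import math
import random

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(point, candidates):
    """Index of the candidate closest to point."""
    return min(range(len(candidates)), key=lambda i: l2(point, candidates[i]))

def build_ivf(vectors, n_buckets, iters=5, seed=0):
    """Toy IVF build: a few rounds of k-means, then one posting list per centroid."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_buckets)]
    for _ in range(iters):
        members = [[] for _ in range(n_buckets)]
        for v in vectors:
            members[nearest(v, centroids)].append(v)
        for b, group in enumerate(members):
            if group:  # keep the old centroid if a bucket went empty
                centroids[b] = [sum(col) / len(group) for col in zip(*group)]
    buckets = [[] for _ in range(n_buckets)]
    for i, v in enumerate(vectors):
        buckets[nearest(v, centroids)].append(i)
    return centroids, buckets

def ivf_search(query, vectors, centroids, buckets, k, nprobe):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda b: l2(query, centroids[b]))
    candidates = [i for b in order[:nprobe] for i in buckets[b]]
    return sorted(candidates, key=lambda i: l2(query, vectors[i]))[:k]

rng = random.Random(1)
vectors = [[rng.random() for _ in range(8)] for _ in range(2000)]
centroids, buckets = build_ivf(vectors, n_buckets=20)
query = [rng.random() for _ in range(8)]

exact = sorted(range(len(vectors)), key=lambda i: l2(query, vectors[i]))[:10]
for nprobe in (1, 5, 20):
    approx = ivf_search(query, vectors, centroids, buckets, k=10, nprobe=nprobe)
    recall = len(set(approx) & set(exact)) / len(exact)
    print(f"nprobe={nprobe:2d}  recall@10={recall:.2f}")
```

Raising `nprobe` trades latency for recall: scanning all 20 buckets degenerates to exact search, while scanning one bucket touches only a fraction of the data and may miss neighbors that fell just across a cluster boundary.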
04
The vector-DB landscape
Pinecone
Managed, simplest
SaaS only. Proprietary ANN index under the hood. Priced per pod-hour (pod-based) or by usage (serverless). Integrates with LangChain/LlamaIndex.
Weaviate / Qdrant / Milvus
Self-hosted OSS
Run your own. HNSW + filtering, multi-tenancy, GraphQL APIs (Weaviate). Most flexible.
pgvector
Postgres extension
Stores vectors in a Postgres column. Supports IVFFlat and HNSW indexes. Underrated: for < 10M vectors, a separate DB is often unnecessary.
Elasticsearch dense_vector / OpenSearch knn_vector
Existing search engine + vectors
Add HNSW alongside inverted-index search. Use when you already run Elasticsearch.
05
Deep dive — RAG, the killer use case
Retrieval-Augmented Generation is why every team is suddenly buying a vector DB:
1. Your company has 10,000 internal documents.
2. Embed each doc (or chunks of it) → at least 10k vectors stored in a vector DB (more if you chunk).
3. User asks: "What's our refund policy?"
4. Embed the question → query vector.
5. Vector DB returns top-5 nearest doc chunks (~10ms).
6. Build LLM prompt: "Given these docs: [chunks], answer: [question]"
7. LLM generates an answer grounded in your docs, not in its training data.
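The retrieval half of this flow can be sketched as follows. Everything here is a stand-in: `toy_embed` is a hashed bag-of-words that only captures word overlap (a real system would call an embedding model), `top_k` is a brute-force scan in place of the vector DB, the three document chunks are invented, and the final LLM call is left out.

```python
import math
import re
import zlib

DIM = 4096  # large enough that hash collisions are rare in this toy

def toy_embed(text):
    """Stand-in for an embedding model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def top_k(query_vec, index, k):
    """Brute-force stand-in for ANN search. Vectors are unit length,
    so the dot product equals cosine similarity."""
    def score(item):
        return sum(q * d for q, d in zip(query_vec, item[0]))
    ranked = sorted(index, key=score, reverse=True)
    return [chunk for _, chunk in ranked[:k]]

def build_prompt(question, chunks):
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Given these docs:\n{context}\nAnswer: {question}"

# Invented example chunks, embedded once and stored alongside their text.
chunks = [
    "Refund policy: customers may request a refund within 30 days of purchase.",
    "Vacation policy: employees accrue 1.5 days of leave per month.",
    "Expense policy: submit receipts within 14 days of travel.",
]
index = [(toy_embed(c), c) for c in chunks]

# At request time: embed the question, retrieve nearest chunks, build the prompt.
question = "What's our refund policy?"
retrieved = top_k(toy_embed(question), index, k=2)
prompt = build_prompt(question, retrieved)
print(prompt)  # this string is what would be sent to the LLM
```

Swapping `toy_embed` for a real embedding call and `top_k` for a vector DB query leaves the shape of the pipeline unchanged, which is why the pattern is easy to adopt.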
This is why so many chatbots shipped in 2024-2025 have a vector DB behind them. The pattern: vector DB for retrieval, LLM for synthesis. Costs: roughly $0.0001 per query for the embedding and $0.001 per query for the LLM call (order-of-magnitude figures; actual cost depends on the model and prompt length). Cheap enough to be negligible for most use cases.
Interview answer
"For semantic search and RAG we use a vector DB with HNSW indexing. Documents embedded once via OpenAI embeddings; queries embedded at request time and ANN-searched in ~10ms. Top-5 chunks injected into the LLM prompt for grounded answers."
06
Real-world systems
Spotify Annoy
2014-era ANN
Random projection trees. Used internally for music recommendation candidate generation. Open-sourced.
Facebook FAISS
The reference library
C++ ANN library (with Python bindings) covering IVF, PQ, HNSW, and GPU search. Some vector DBs, such as Milvus, use FAISS or its derivatives internally.
Google ScaNN / MUVERA
Production ANN
ScaNN is Google's open-source ANN library, reported to back candidate generation in Google products. MUVERA is a Google Research algorithm that reduces multi-vector retrieval to single-vector search.
OpenAI File Search
Vectors-as-a-service
Hosted vector store with chunking + embedding + RAG built in. Skip ops entirely.
07
Used in problems
Typeahead can use vector search for semantic-similarity completions. Recommendation systems use ANN over user/item embeddings for candidate generation. Web crawlers can dedup near-identical pages via embedding similarity.
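As one illustration of the dedup case, a greedy filter keeps a page only if its embedding is not too similar to one already kept. This is a minimal sketch: the 0.95 threshold, the `dedup` helper, and the toy page vectors are all assumptions for illustration, and the pairwise loop is O(N^2), so a real crawler would look up candidates through an ANN index instead.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def dedup(vectors, threshold=0.95):
    """Return indices of items to keep, dropping any item whose cosine
    similarity to an already-kept item reaches the threshold."""
    kept = []
    for i, v in enumerate(vectors):
        if all(cosine_similarity(v, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return kept

# Invented page embeddings: pages 1 and 3 are near-copies of pages 0 and 2.
pages = [
    [1.0, 0.0, 0.0],
    [0.99, 0.05, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.98, 0.1],
]
kept = dedup(pages)
print(kept)
```

The threshold is the main tuning knob: set it too low and distinct pages on the same topic get merged, too high and lightly edited copies slip through.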