Table of Contents
Quick Answer
A vector database is a system that indexes high-dimensional vectors and returns the nearest neighbors of a query vector in milliseconds.
- Stores millions to billions of vectors
- Uses approximate nearest neighbor (ANN) algorithms like HNSW or IVF
- Common choices: pgvector, Pinecone, Weaviate, Qdrant, Milvus
What Does Vector Database Mean?
A traditional database answers "find rows where email = 'x'". A vector database answers "find the 10 rows whose meaning is closest to this query." The query itself is an embedding — a vector of numbers — not a text string (Pinecone docs, 2024).
How It Works
- Embed each item with an embedding model
- Insert the vector plus metadata (id, text, tags) into the index
- The DB builds an ANN index structure (graph or inverted file)
- Query: embed the query text, run nearest-neighbor search, get top-K results
- Optionally filter by metadata (tag = "billing")
ANN sacrifices perfect accuracy for 100x-10000x speed. Typical recall: 95-99%.
Examples
- ChatGPT custom GPT: uploaded PDFs stored as vectors for retrieval
- E-commerce: "find products similar to this item"
- Legal research: retrieve cases with similar arguments
- Customer support: match new tickets to past resolved ones
- Fraud detection: flag transactions far from normal user pattern
Vector DB vs Traditional DB
Feature
Traditional DB
Vector DB
Primary query
Exact match / range
Nearest neighbor
Index
B-tree, hash
HNSW, IVF, PQ
Data type
Structured rows
Float arrays
Use case
Transactions
Semantic search
Many teams combine both — pgvector adds vector search to PostgreSQL without a new system.
When to Use a Vector Database
- RAG (retrieval-augmented generation)
- Semantic site search replacing Elasticsearch
- Image / video / audio similarity search
- Recommendation engines
- Duplicate detection across millions of documents
FAQs
Do I need a dedicated vector DB? Not always — pgvector in Postgres handles up to ~10M vectors comfortably.
What is HNSW? Hierarchical Navigable Small World — a graph-based ANN algorithm used by most modern vector DBs.
Can vector DBs do filtering? Yes — most support pre-filter or post-filter on metadata.
Are they accurate? ANN is approximate; recall is tunable (higher recall = slower).
How much does it cost? pgvector is free. Managed services run $70-1000+/month depending on scale.
Do they support hybrid search? Yes — modern ones combine vector + keyword (BM25) scores.
How do I pick one? Start with pgvector. Move to a dedicated DB only at 10M+ vectors or strict latency SLAs.
Conclusion
Vector DBs are the plumbing of the AI era. Most apps need them. Learn more on Misar Blog↗.