A Practical Guide to AI Embeddings — Beyond the Hype
·ScaledByDesign·
Tags: ai, embeddings, vector-database, semantic-search, machine-learning
What Embeddings Actually Are
An embedding is a list of numbers that captures the meaning of text, images, or other data. Similar items have similar numbers. That's it — the concept is simple even if the math behind it is complex.
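"Similar numbers" is usually measured with cosine similarity, which compares the direction of two vectors and ignores their length. The guide's examples call a `cosineSimilarity` helper without defining it; the version below is a minimal sketch of what such a helper could look like, not the author's implementation. The zero-vector handling is an assumption made for this sketch.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// Ranges from -1 (opposite direction) through 0 (unrelated) to 1 (same direction).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Assumption: a zero vector has no direction, so treat it as unrelated.
  return denom === 0 ? 0 : dot / denom;
}
```

Because only direction matters, scaling a vector by a positive constant does not change its score against any other vector.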
// Two sentences with similar meaning
const embedding1 = await embed("How do I return a product?");
// → [0.023, -0.41, 0.87, 0.12, ... ] (1536 numbers)
const embedding2 = await embed("What's your return policy?");
// → [0.025, -0.39, 0.85, 0.14, ... ] (similar numbers!)
const embedding3 = await embed("The weather is nice today");
// → [-0.71, 0.33, -0.12, 0.56, ...] (very different numbers)
// Cosine similarity
cosineSimilarity(embedding1, embedding2); // 0.94 (very similar)
cosineSimilarity(embedding1, embedding3); // 0.11 (not related)
Choosing an Embedding Model
| Model | Dimensions | Speed | Quality | Cost |
|--------------------------|-----------|----------|---------|---------------|
| text-embedding-3-small | 1536 | Fast | Good | $0.02/1M tok |
| text-embedding-3-large | 3072 | Medium | Better | $0.13/1M tok |
| Cohere embed-v3 | 1024 | Fast | Good | $0.10/1M tok |
| all-MiniLM-L6-v2 (local)| 384 | Very fast| OK | Free (self-hosted)|
| nomic-embed-text (local) | 768 | Fast | Good | Free (self-hosted)|
Recommendation:
→ Start with text-embedding-3-small (best cost/quality for most use cases)
→ Use local models if you process >10M documents (cost savings)
→ Use text-embedding-3-large for legal/medical (accuracy-critical)
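To put the cost column in perspective, a back-of-the-envelope estimate helps. The sketch below multiplies corpus size by average document length and the per-million-token price from the table. The function name `estimateEmbeddingCost` and the 500-token average are illustrative assumptions, not figures from a real workload.

```typescript
// Rough API cost of embedding a corpus once.
// pricePerMillionTokens comes from the table above; avgTokensPerDoc is an
// assumption you need to measure on your own data.
function estimateEmbeddingCost(
  docCount: number,
  avgTokensPerDoc: number,
  pricePerMillionTokens: number
): number {
  const totalTokens = docCount * avgTokensPerDoc;
  return (totalTokens / 1_000_000) * pricePerMillionTokens;
}

// Hypothetical corpus: 10M documents averaging 500 tokens each.
const smallCost = estimateEmbeddingCost(10_000_000, 500, 0.02);
const largeCost = estimateEmbeddingCost(10_000_000, 500, 0.13);
console.log(smallCost.toFixed(2), largeCost.toFixed(2)); // prints "100.00 650.00"
```

This covers a single pass only. Re-embedding after content changes or a model switch repeats the cost, and every search query needs an embedding too, so the totals depend heavily on how often the corpus changes.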
Chunking Strategy
Before embedding documents, you need to split them into chunks. This is where most implementations fail:
// ✗ Bad: fixed-size chunks that split sentences
function badChunking(text: string, size: number = 500): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size)); // Might cut mid-sentence
  }
  return chunks;
}
// ✓ Good: semantic chunking that respects document structure
// All length thresholds below are in characters, not tokens.
type Chunk = { content: string };

function semanticChunking(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  // Split by headers first (natural document boundaries).
  // Note: split() drops the "#" markers themselves from the section text.
  const sections = markdown.split(/\n#{1,3}\s/);
  for (const section of sections) {
    if (!section.trim()) continue; // Skip empty sections
    if (section.length < 200) {
      // Too small: merge with the previous chunk, or keep it if it is the first
      if (chunks.length > 0) {
        chunks[chunks.length - 1].content += "\n" + section;
      } else {
        chunks.push({ content: section.trim() });
      }
    } else if (section.length > 1000) {
      // Too large: split by paragraphs
      const paragraphs = section.split(/\n\n/);
      let currentChunk = "";
      for (const para of paragraphs) {
        // Flush only a non-empty chunk, so an oversized first paragraph
        // does not emit an empty chunk
        if (currentChunk && (currentChunk + para).length > 800) {
          chunks.push({ content: currentChunk.trim() });
          currentChunk = para;
        } else {
          currentChunk += (currentChunk ? "\n\n" : "") + para;
        }
      }
      if (currentChunk.trim()) chunks.push({ content: currentChunk.trim() });
    } else {
      chunks.push({ content: section.trim() });
    }
  }
  return chunks;
}
Chunk Overlap
Add overlap between chunks so context isn't lost at boundaries:
// targetSize and overlap are measured in characters
function chunkWithOverlap(paragraphs: string[], targetSize: number = 600, overlap: number = 100): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const para of paragraphs) {
    if ((current + para).length > targetSize && current.length > 0) {
      chunks.push(current);
      // Start next chunk with overlap from end of current.
      // slice(-0) would return the whole string, so handle overlap = 0 explicitly.
      const tail = overlap > 0 ? current.slice(-overlap) : "";
      current = (tail ? tail + "\n\n" : "") + para;
    } else {
      current += (current ? "\n\n" : "") + para;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
Storing and Querying Embeddings
Vector Database Options
Managed services:
→ Pinecone: Easiest to start, good for < 10M vectors
→ Weaviate Cloud: Full-featured, good hybrid search
→ Qdrant Cloud: Best performance/cost ratio
Self-hosted:
→ pgvector (PostgreSQL): Good for < 1M vectors, a natural fit if you already run Postgres
→ Qdrant: Best for large scale self-hosted
→ Chroma: Good for prototyping, not for production scale
Recommendation:
→ Already use PostgreSQL? Start with pgvector
→ Need scale (>5M vectors)? Use Qdrant or Pinecone
→ Prototyping? Use Chroma locally
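To make concrete what any of these stores does at query time, here is a minimal in-memory sketch: score every stored vector against the query and keep the top k. The names `StoredVector`, `ScoredMatch`, and `bruteForceSearch` are hypothetical, and the sketch assumes all vectors are unit-length so that a plain dot product equals cosine similarity. Real vector databases avoid this full scan with approximate indexes, which is where the scale limits above come from.

```typescript
interface StoredVector {
  id: string;
  embedding: number[]; // Assumption: normalized to unit length
}

interface ScoredMatch {
  id: string;
  score: number;
}

// Dot product; equals cosine similarity when both vectors are unit-length.
function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Exact nearest-neighbor search: O(n) scoring plus a sort over all items.
function bruteForceSearch(
  query: number[],
  store: StoredVector[],
  topK: number
): ScoredMatch[] {
  return store
    .map((item) => ({ id: item.id, score: dot(query, item.embedding) }))
    .sort((x, y) => y.score - x.score) // Highest score first
    .slice(0, topK);
}
```

A scan like this gives exact results and can be a reasonable baseline for checking the quality of an approximate index on a small sample.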
pgvector Example
-- Enable the extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create table with vector column
CREATE TABLE documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  content TEXT NOT NULL,
  embedding vector(1536), -- Match your model's dimensions
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Create index for fast similarity search.
-- Build it after loading data: IVFFlat derives its clusters from existing rows.
CREATE INDEX ON documents
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100); -- Tune: pgvector's docs suggest rows / 1000 up to ~1M rows, sqrt(rows) beyond that
-- Query: find similar documents
-- <=> is cosine distance, so 1 - distance gives cosine similarity
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE metadata->>'category' = 'returns' -- Metadata filter (with an approximate index it may be applied after the index scan, so fewer than 5 rows can come back)
ORDER BY embedding <=> $1::vector
LIMIT 5;
Production Pipeline
// Complete embedding pipeline for a knowledge base
// (embed, batchEmbed, vectorDB, and buildFilter are assumed to be defined elsewhere)
async function indexDocuments(documents: Document[]) {
  for (const doc of documents) {
    // 1. Chunk the document
    const chunks = semanticChunking(doc.content);
    // 2. Generate embeddings in batches
    const embeddings = await batchEmbed(chunks.map(c => c.content), {
      batchSize: 100, // Most APIs accept batches
    });
    // 3. Store with metadata
    await vectorDB.upsert(
      chunks.map((chunk, i) => ({
        id: `${doc.id}-chunk-${i}`,
        embedding: embeddings[i],
        metadata: {
          documentId: doc.id,
          title: doc.title,
          chunkIndex: i,
          category: doc.category,
          updatedAt: doc.updatedAt,
        },
        content: chunk.content,
      }))
    );
  }
}
// Search with metadata filtering
async function search(query: string, filters?: SearchFilters) {
  const queryEmbedding = await embed(query);
  return vectorDB.search(queryEmbedding, {
    topK: 5,
    minScore: 0.7, // Only return relevant results
    filter: filters ? buildFilter(filters) : undefined,
  });
}
Common Mistakes
Mistake 1: Embedding entire documents
→ Problem: Long text dilutes the embedding signal
→ Fix: Chunk into 200-800 token segments
Mistake 2: Not cleaning text before embedding
→ Problem: HTML tags, navigation text pollute embeddings
→ Fix: Strip HTML, remove boilerplate, keep content only
Mistake 3: Mixing embedding models
→ Problem: Vectors from different models aren't comparable
→ Fix: Re-embed everything if you change models
Mistake 4: No metadata filtering
→ Problem: Vector search across all docs when you know the category
→ Fix: Filter by metadata FIRST, then vector search within results
Mistake 5: Not evaluating retrieval quality
→ Problem: You don't know if search results are actually good
→ Fix: Build a test set of queries with expected results, measure recall
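The fix for Mistake 5 can be sketched in a few lines. The example below computes recall@k: for each test query, the fraction of expected results that show up in the top k. The names `EvalCase`, `recallAtK`, and `evaluateRecall` are hypothetical, and `searchFn` stands in for whatever retrieval function you want to measure; it is assumed to return result ids ordered best-first.

```typescript
interface EvalCase {
  query: string;
  expectedIds: string[]; // Ids a good search should return for this query
}

// Fraction of expected ids found in the top-k results for one query.
function recallAtK(retrievedIds: string[], expectedIds: string[], k: number): number {
  if (expectedIds.length === 0) return 1; // Nothing to find
  const topK = new Set(retrievedIds.slice(0, k));
  const hits = expectedIds.filter((id) => topK.has(id)).length;
  return hits / expectedIds.length;
}

// Mean recall@k across a test set.
// Assumption: searchFn returns ids ordered from most to least relevant.
async function evaluateRecall(
  cases: EvalCase[],
  searchFn: (query: string) => Promise<string[]>,
  k: number
): Promise<number> {
  if (cases.length === 0) return 0;
  let total = 0;
  for (const testCase of cases) {
    const retrieved = await searchFn(testCase.query);
    total += recallAtK(retrieved, testCase.expectedIds, k);
  }
  return total / cases.length;
}
```

Running this before and after a change to chunking, the embedding model, or the score threshold turns "the results feel better" into a number you can compare.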
Embeddings are the foundation, not the product. Get chunking, storage, and retrieval right, and everything built on top — semantic search, RAG, recommendations — will work better. Get them wrong, and no amount of prompt engineering will save you.