TL;DR
Embedding drift is when the same text produces different vectors over time because of model updates, preprocessing changes, or partial re-embedding. It degrades RAG retrieval quality without throwing errors. Detect it by comparing cosine distances on known documents and tracking nearest-neighbor stability. Prevent it by pinning your pipeline, never mixing embedding generations, and versioning your vector data.
Your RAG pipeline shipped three months ago. Evaluations looked great, stakeholders were happy, and you moved on to the next project.
Then the answers started slipping. Not wrong, exactly, but less right. Users say the system feels “dumber” lately. You check the prompt, the model version, the retrieval config. Nothing changed.
Turns out the LLM is fine. The problem is further upstream: your embeddings have drifted.
Drift doesn't throw errors. It doesn't trip alerts. It just slowly erodes retrieval quality until someone finally notices the answers have gotten worse.
What Embedding Drift Actually Is
Here's the core issue: semantically identical text starts producing structurally different vectors over time. The text hasn't changed meaning. But the embedding has changed shape.
Vector search works by geometric proximity. When you query, you're asking “which stored vectors are closest to this query vector?” That only works if the stored vectors and the query vector were produced under the same conditions. When they weren't, cosine similarity stops reflecting semantic similarity.
The frustrating part is that the system keeps returning results. It looks like it's working. But relevant chunks that used to show up at position 2 are now buried at position 15. Recall drops from 0.92 to 0.74, and there's nothing in the logs to explain why.
The Five Causes That Actually Bite You
Most discussions about drift focus on model updates, which is the obvious cause. But the things that actually break production systems tend to be less visible.
1. Partial Re-embedding
This is the most common cause we see in production. A team re-embeds 20% of their corpus, maybe some updated docs or a new data source backfill. Now the vector store holds embeddings from two different runs.
Even if you're using the same model version, small differences in preprocessing or floating-point non-determinism can put vectors in slightly different regions of the space.
A document that ranked #2 last week might now rank #8, not because it became less relevant, but because the geometry around it shifted.
2. Preprocessing Pipeline Changes
A developer fixes a bug in the HTML stripper. Another adds Unicode normalization. Someone changes the chunk window from 512 to 480 tokens.
Each change is small and reasonable. Together, they mean the text being embedded today is structurally different from six months ago, even when the source document is identical. Because models use sub-word tokenization, changing a single space or punctuation mark can alter the entire token sequence for a sentence.
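A quick stdlib illustration of how invisible such a change can be: NFC and NFD Unicode normalization produce strings that render identically on screen but feed different code-point sequences to the tokenizer.

```python
import unicodedata

nfc = "café"                              # 'é' as one precomposed code point
nfd = unicodedata.normalize("NFD", nfc)   # 'e' plus a combining accent

print(nfc == nfd)          # False: visually identical, different sequences
print(len(nfc), len(nfd))  # 4 5 — a sub-word tokenizer sees different input
```

Flip your pipeline's normalization form and every stored embedding silently disagrees with every new one, even though the documents "haven't changed."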
3. Model Version Bumps
Vectors from text-embedding-ada-002 and text-embedding-3-small are not in the same space. You cannot compare them with cosine similarity.
The real danger is switching models for new documents while old documents stay on the previous generation. A mixed-model vector store produces unreliable neighbor rankings because the similarity between a query embedded with one model and documents embedded with another carries no semantic meaning.
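One cheap guardrail is to refuse to operate on a store holding more than one embedding generation. A minimal sketch, assuming each record carries a `model_version` metadata field (an illustrative schema, not any vendor's API):

```python
def assert_single_generation(records):
    """Fail fast if vectors from more than one embedding model are mixed.

    `records` is an iterable of dicts with a `model_version` field
    (hypothetical metadata you would store at ingest time).
    """
    versions = {r["model_version"] for r in records}
    if len(versions) > 1:
        raise ValueError(f"mixed embedding generations: {sorted(versions)}")

records = [
    {"id": "a", "model_version": "text-embedding-3-small"},
    {"id": "b", "model_version": "text-embedding-3-small"},
]
assert_single_generation(records)  # passes: exactly one generation
```

Run this before every query-path deploy and every bulk upsert; it turns a silent geometry mismatch into a loud error.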
4. Chunk Boundary Drift
Same text, same model, but segmentation changed. A chunk that used to include the end of paragraph A and the start of paragraph B now only covers B. Different context window, different embedding, different neighbors.
5. Infrastructure and Index Changes
HNSW parameters (like ef_construction) vary between index rebuilds. A database migration changes vector precision from float32 to bfloat16. These don't always change the raw vectors, but they alter the approximate nearest neighbor graph. None of these show up in a code diff. All of them produce measurably different retrieval behavior.
Detecting Drift
The good news is that drift is straightforward to detect once you know what to look for. The bad news is that most teams aren't measuring any of this.
Check 1: Cosine Distance on Identical Text
Re-embed a sample document with your current pipeline and compare against the stored vector. (The thresholds below are heuristics based on OpenAI's models; exact values vary by provider.)
| Distance | Status |
|---|---|
| < 0.001 | Stable (float math variance) |
| 0.001 – 0.02 | Minor drift, investigate preprocessing |
| 0.02 – 0.05 | Significant, retrieval affected |
| > 0.05 | Severe, likely model or chunking change |
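A minimal sketch of this check in plain Python, with the threshold bands above hard-coded (treat the cutoffs as starting points to tune for your provider):

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def classify_drift(dist):
    """Map a cosine distance onto the heuristic bands above."""
    if dist < 0.001:
        return "stable"
    if dist < 0.02:
        return "minor"
    if dist < 0.05:
        return "significant"
    return "severe"

stored = [0.12, 0.48, 0.87]   # vector saved at index time
fresh = [0.12, 0.48, 0.87]    # same text, re-embedded today
print(classify_drift(cosine_distance(stored, fresh)))  # stable
```

In production you would swap the toy vectors for a fixed sample of real documents re-embedded on a schedule, and alert when any of them leaves the "stable" band.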
Check 2: Nearest-Neighbor Stability
Run the same benchmark queries a week apart. Record top-k results each time.
- Healthy: 85–95% overlap between runs
- Degrading: 70–85% overlap, drift is starting
- Broken: <70% overlap, active quality loss
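Overlap here can be measured as the fraction of shared result IDs between the two top-k lists — a rough sketch:

```python
def topk_overlap(run_a, run_b):
    """Fraction of result IDs shared between two top-k retrieval runs."""
    a, b = set(run_a), set(run_b)
    return len(a & b) / max(len(a), len(b))

last_week = ["doc1", "doc2", "doc3", "doc4", "doc5"]
this_week = ["doc1", "doc3", "doc9", "doc4", "doc8"]
print(topk_overlap(last_week, this_week))  # 0.6 → under 70%, "broken" territory
```

Averaging this over a fixed benchmark query set gives you one number to chart week over week.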
Check 3: Vector Count Divergence
Compare the vector count in your database against the count your source of truth says should exist. A mismatch means ingestion failed, duplicates crept in, or vectors were deleted externally. Zero tolerance for unexplained deltas.
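Going one step beyond raw counts, diffing the ID sets tells you which vectors diverged — a sketch assuming you can list IDs from both sides:

```python
def id_divergence(source_ids, db_ids):
    """IDs present on one side but not the other; both sets should be empty."""
    source, db = set(source_ids), set(db_ids)
    return {"missing_in_db": source - db, "unexpected_in_db": db - source}

delta = id_divergence(["a", "b", "c"], ["a", "b", "x"])
print(delta)  # {'missing_in_db': {'c'}, 'unexpected_in_db': {'x'}}
```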
Check 4: Distribution Shift
Track L2 norm distribution over time. If the shape changes (higher variance, shifted mean, new outliers) the embedding process changed, even if you can't identify the cause yet.
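Tracking the norm distribution needs nothing more than stdlib statistics — a sketch with illustrative batches:

```python
import math
import statistics

def l2_norm_stats(vectors):
    """Mean and stdev of L2 norms for a batch of vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return statistics.mean(norms), statistics.stdev(norms)

baseline = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]   # illustrative unit vectors
current = [[2.0, 0.0], [0.0, 2.1], [1.2, 1.6]]    # norms roughly doubled

base_mean, _ = l2_norm_stats(baseline)
cur_mean, _ = l2_norm_stats(current)
print(cur_mean / base_mean)  # ~2x shift: the embedding process changed
```

Log the mean and stdev per embedding batch; a sudden jump in either is your earliest warning, long before users complain.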
Prevention: Think of Your Pipeline as a Build System
It helps to stop thinking of your index as a static file. Instead, think of it as a series of immutable states, each produced by a pinned pipeline. When the pipeline changes, you produce a new state and compare it against the old one before promoting.
Pin Everything
Model version, preprocessing dependencies, chunking config. If any of these change, it should be a deliberate decision to re-embed, not something you discover three weeks later when retrieval quality tanks.
Never Mix Embedding Generations
If you change any part of the pipeline, re-embed the entire corpus. Your vector store should contain vectors from exactly one pipeline configuration. Partial re-embedding is how most drift starts in the first place.
Store provenance with every vector: model version, preprocessing hash, chunking config, timestamp. When something breaks, you can trace exactly what changed instead of guessing.
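Provenance can be as simple as a metadata dict stored with each vector. The field names below are illustrative, not any vector database's schema:

```python
import hashlib
import json
import time

def provenance(model_version, preprocess_cfg, chunk_cfg):
    """Build a provenance record to store alongside each vector."""
    cfg = json.dumps(preprocess_cfg, sort_keys=True).encode()
    return {
        "model_version": model_version,
        "preprocess_hash": hashlib.sha256(cfg).hexdigest()[:12],
        "chunk_config": chunk_cfg,
        "embedded_at": int(time.time()),
    }

meta = provenance(
    "text-embedding-3-small",
    {"html_strip": True, "unicode": "NFC"},
    {"window_tokens": 512, "overlap": 64},
)
```

Because the preprocessing config is hashed deterministically, any two vectors with different `preprocess_hash` values are, by definition, from different pipeline configurations.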
Version Your Embeddings
This is what makes everything else practical. With versioned embeddings you can:
- Compare any two versions to see exactly what changed
- Roll back to a known-good state in seconds instead of re-embedding for hours
- Test retrieval against a pinned version while evaluating a new one
- Diff production against a fresh embedding run to catch drift early
Recovering from Drift with Decompressed
Drift is a pipeline problem, not a storage problem. But once you detect it, you need a way to recover quickly. That's where versioned vector storage comes in.
Decompressed stores embeddings as immutable, versioned datasets. Every upload or modification creates a new version, and old versions stick around. This gives you a few capabilities that matter when drift happens:
1. Instant Rollback
If retrieval quality drops and you suspect drift, you can roll back to a previous version without re-embedding:
```shell
# Roll back to the version that was working
$ dcp sync push my-dataset pinecone-prod --version 3 --mode full
✓ Full sync complete. 12,000 vectors pushed.
```
The old vectors are still there, so rollback is just a version checkout. No multi-hour re-embedding job required.
2. Version Diffing
When you sync a new version to your vector database, Decompressed computes the diff between the last synced version and the new one. You see exactly what changed:
```shell
$ dcp sync push my-dataset pinecone-prod --version 5
⠋ Computing diff v2 → v5...
+342 added, -18 deleted, ~27 updated, =11,613 unchanged
Pushing 387 changes (incremental sync)...
✓ Sync complete.
```
This tells you how many vectors were added, deleted, or updated between versions. If you see unexpected changes (thousands of vectors marked as “updated” when you only changed a few documents), that's a signal that something in your pipeline shifted.
3. Destination Sync Checks
Before syncing, Decompressed checks if your destination database (Pinecone, Qdrant, etc.) was modified outside of Decompressed:
```shell
$ dcp sync push my-dataset pinecone-prod
⠋ Checking destination for external changes...
⚠ Destination has been modified externally
Expected: 12,000 vectors
Actual: 12,847 vectors
This sync may overwrite external changes. Continue? [y/N]
```
This catches cases where someone manually added or deleted vectors in the destination. Without this check, those changes would cause inconsistencies that are hard to track down.
The Quarterly Drift Audit
If you take one thing from this post, take this checklist and run it quarterly:
- Re-embed a sample of known documents and compare cosine distance against the stored vectors
- Re-run your benchmark queries and measure top-k overlap against the previous run
- Reconcile vector counts against your source of truth
- Compare the L2 norm distribution against your baseline
- Confirm every vector carries provenance from exactly one pipeline configuration
Drift is the kind of problem where everything looks fine. The code hasn't changed, the model is the same, but retrieval keeps getting worse. It's maddening until you realize the vectors your system depends on are no longer consistent with each other.
The fix is discipline, not complexity. Pin your pipeline, version your embeddings, and measure drift regularly. When you need to change something, change everything at once and verify the results before you ship.