
How to Design a Reusable RAG Pipeline (Without Rewriting Everything)

Hardcoding chunking, embedding, and retrieval into a single function means every config change is a code change. Here's the strategy abstraction that fixes it: separate configuration from execution, test configs independently, and save the ones that work.

8 min read · March 28, 2026 · Decompressed

TL;DR

Hardcoding chunking method, embedding model, and retrieval mode into your pipeline means every config change is a code change. The fix is a strategy abstraction: define and benchmark configs in RAG Lab, save the ones that score well, then reference them by name in production via the SDK. The pipeline code does not change when the strategy does.

The first version of a RAG pipeline usually looks like a single function that does everything. It chunks the documents, embeds them with a specific model, runs vector search, and returns results. It works. And then you need to change something.

Maybe you want to try semantic chunking instead of recursive. Maybe you want to test whether hybrid search scores better than pure vector. Maybe you want to compare two embedding models side by side. In a hardcoded pipeline, each of those changes means editing the pipeline code itself.

You end up with commented-out lines, `if model == "small"` branches, and a function that nobody wants to touch. Testing requires running the whole pipeline. Reusing a config that worked on a previous project means copy-pasting code.

There is a better structure. It requires one mental shift: separate configuration from execution.

The Hardcoded Pattern

Here is the pipeline most teams end up with after the first working prototype:

```python
def run_rag(query, documents):
    # Chunking hardcoded
    chunks = recursive_chunk(documents, size=256, overlap=20)
    # Model hardcoded
    embeddings = openai_embed(chunks, model="text-embedding-3-small")
    # Retrieval mode hardcoded
    results = vector_search(embeddings, query, top_k=5)
    return results
```

This works for one config. The problems appear when you want a second one:

| Change | What it costs you |
| --- | --- |
| Changing chunk size | Edit the function, re-run everything |
| Swapping the model | Edit the function, incompatible vector space, full re-embed |
| Adding a reranker | Edit the function, add a dependency, re-test end-to-end |
| Comparing two configs | Duplicate the function or add branching logic |
| Reusing a config later | Copy-paste code or reconstruct from memory |

Every change is a code change. Every test is a full pipeline run. The config and the logic are fused together, and pulling them apart later is painful.

The Strategy Abstraction

The fix is to treat your pipeline configuration as a named, reusable artifact that lives outside your code. Define what chunking method, embedding model, and retrieval mode to use. Benchmark it against your actual documents. Save the configs that score well. Reference them by name in production.

RAG Lab is where the config lives. The SDK is how you execute against it. The two layers are deliberately separate.
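Before looking at the product layers, the mental shift itself can be sketched in plain, framework-agnostic Python: the config becomes data, and the pipeline becomes a function that consumes it. The class and field names below are illustrative, not part of any SDK.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RetrievalStrategy:
    """A pipeline config as plain data. All names here are hypothetical."""
    chunk_method: str       # e.g. "recursive" or "semantic"
    chunk_size: int
    chunk_overlap: int
    embedding_model: str
    search_type: str        # "vector" or "hybrid"
    top_k: int = 5

# Two configs, zero duplicated pipeline code
ECONOMY = RetrievalStrategy("recursive", 256, 20, "text-embedding-3-small", "vector")
HIGH_ACCURACY = RetrievalStrategy("semantic", 512, 50, "text-embedding-3-large", "hybrid")

def run_rag(query, documents, strategy: RetrievalStrategy):
    # The same pipeline logic serves every config; only the data changes.
    ...
```

Comparing two configs is now a matter of passing a different object, not editing the function. The RAG Lab + SDK split below applies the same idea, with the config stored and benchmarked outside your codebase entirely.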

| Layer | Where | What it does |
| --- | --- | --- |
| Config | RAG Lab | Define chunking, model, retrieval mode. Benchmark with a gold set. Save the winner. |
| Execution | SDK | Reference the saved strategy by name. Embed texts. Get vectors back. |

Built-in Presets as a Starting Point

Four preset strategies are available without any configuration. Each covers a different point on the cost-quality curve:

| Preset | ID | Config |
| --- | --- | --- |
| Economy | `ghost` | text-embedding-3-small · 256d · vector |
| Balanced | `balanced` | text-embedding-3-large · 3072d · vector |
| High Accuracy | `scholar` | text-embedding-3-large · 3072d · hybrid + rerank |
| Hybrid Search | `hybrid` | gte-large · 1024d · hybrid + rerank |

You can use any preset immediately with the SDK. No setup, no config file, no saved strategy required:

```python
from decompressed_sdk import DecompressedClient

dc = DecompressedClient(api_key="dck_your_key_here")

# Use a preset by ID
result = dc.lab.embed(
    texts=["Document 1", "Document 2"],
    preset_id="balanced",  # ghost=Economy | balanced=Balanced | scholar=High Accuracy | hybrid=Hybrid Search
)

print(f"Model: {result.model}")
print(f"Dimensions: {result.dimensions}")
print(f"Tokens used: {result.usage['token_count']}")
```

Start with Economy (`ghost`) for fast, cheap iteration, and move to Balanced (`balanced`) when you need higher precision. Run both against your gold set in RAG Lab before committing to either in production.

Saving and Reusing Custom Strategies

Presets cover common cases. For production use, you want a strategy tuned to your specific corpus. The process is: benchmark in RAG Lab, save the config that wins on your Recall@K and MRR numbers, then reference it by name in your application code.
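For readers unfamiliar with the two metrics, here is a minimal, self-contained sketch of Recall@K and MRR (mean reciprocal rank). The queries, document IDs, and rankings are made-up sample data; RAG Lab computes these for you, this only shows what the numbers mean.

```python
def recall_at_k(ranked, relevant, k=5):
    """1.0 if the relevant doc appears in the top k results, else 0.0."""
    return 1.0 if relevant in ranked[:k] else 0.0

def reciprocal_rank(ranked, relevant):
    """1/position of the relevant doc, or 0.0 if it was not retrieved."""
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id == relevant:
            return 1.0 / position
    return 0.0

# Hypothetical gold set: each query mapped to its one relevant doc id
gold = {"q1": "d3", "q2": "d7"}
# Hypothetical retrieval output for each query
results = {"q1": ["d3", "d1", "d2"], "q2": ["d4", "d7", "d9"]}

avg_recall = sum(recall_at_k(results[q], gold[q]) for q in gold) / len(gold)
avg_mrr = sum(reciprocal_rank(results[q], gold[q]) for q in gold) / len(gold)
# avg_recall = 1.0 (both relevant docs retrieved in the top 5)
# avg_mrr = (1.0 + 0.5) / 2 = 0.75 (d7 was ranked second for q2)
```

A strategy that retrieves the right chunks but ranks them low will show a high Recall@K and a weak MRR, which is often the signal to add a reranker.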

Once a strategy is saved in RAG Lab, you can reference it by name, by display name, or by its UUID:

```python
# Reference a saved strategy by name
result = dc.lab.embed(
    texts=["Document 1", "Document 2"],
    strategy="My High-Accuracy Config",
)

# Or by UUID for an exact, unambiguous reference
result = dc.lab.embed(
    texts=["Document 1", "Document 2"],
    strategy_id="abc-123-def-456",
)

print(f"Strategy used: {result.strategy_name}")
print(f"Base cost: ${result.usage['base_cost_usd']:.6f}")
print(f"Remaining tokens: {result.usage['remaining_tokens']}/{result.usage['token_limit']}")
```

The pipeline code does not change when you switch strategies. You change the name passed to `embed()`, not the pipeline logic. That is the separation that matters.
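Taken one step further, the strategy name itself can live in configuration rather than in source. A hypothetical sketch, using an environment variable (`RAG_STRATEGY` is a name invented here, not an SDK convention):

```python
import os

# Hypothetical pattern: the strategy name comes from configuration,
# so switching strategies is a config change, not a code change.
STRATEGY_NAME = os.environ.get("RAG_STRATEGY", "My High-Accuracy Config")

def embed_from_config(dc, texts):
    # `dc` is a DecompressedClient, as in the examples above
    return dc.lab.embed(texts=texts, strategy=STRATEGY_NAME)
```

With this in place, pointing staging at an experimental strategy and production at the proven one is an environment difference, not a branch.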

Listing Available Strategies

To see all presets and your saved strategies at any point:

```python
available = dc.lab.list_strategies()

# Built-in presets
for preset in available["presets"]:
    print(f"{preset['id']}: {preset['name']} ({preset['model']}, {preset['search_type']})")

# Your saved strategies
for strategy in available["saved_strategies"]:
    print(f"{strategy['name']} — used {strategy['usage_count']} times")
    print(f"  model: {strategy['model']}, search: {strategy['search_type']}")
```

The `usage_count` on a saved strategy is how many times it has been called via the SDK. It gives you a signal about which configs are actually being used in production versus which ones were tested and abandoned.
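That signal is easy to act on. A small sketch splitting live configs from abandoned ones, using the same dict shape `list_strategies()` returns above; the sample entries here are made up:

```python
# Hypothetical response data in the shape shown above
available = {
    "saved_strategies": [
        {"name": "Legal Doc Retrieval v2", "usage_count": 118},
        {"name": "Experiment 2024-11", "usage_count": 0},
    ]
}

# Strategies actually called in production vs. tested and abandoned
live = [s["name"] for s in available["saved_strategies"] if s["usage_count"] > 0]
abandoned = [s["name"] for s in available["saved_strategies"] if s["usage_count"] == 0]

print("live:", live)
print("candidates for cleanup:", abandoned)
```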

The Pipeline Wrapper Pattern

The separation between config and execution makes a clean pipeline wrapper possible. The pipeline function takes a strategy reference and a list of texts. It delegates config resolution to the SDK. Your application code never needs to know what model or chunking method is in use:

```python
from decompressed_sdk import DecompressedClient

dc = DecompressedClient(api_key="dck_your_key_here")

def embed_documents(texts, strategy_name):
    """Embed a list of texts using a named strategy from RAG Lab."""
    result = dc.lab.embed(
        texts=texts,
        strategy=strategy_name,
    )
    return result.embeddings

def embed_with_preset(texts, preset_id="balanced"):
    """Embed using a built-in preset. Default: balanced."""
    result = dc.lab.embed(
        texts=texts,
        preset_id=preset_id,
    )
    return result.embeddings

# Application code references the strategy name, not the config
embeddings = embed_documents(chunks, strategy_name="Legal Doc Retrieval v2")
```

When you want to swap to a new strategy, you update the name string. The embedding logic, error handling, and billing are all handled by the SDK. The only thing that changes is which config gets resolved.

Strategy names are case-sensitive when referenced by name. Use `strategy_id` for exact, stable references in production code. Names can change; UUIDs do not.
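In practice, that means pinning the UUID in your production settings. A minimal sketch, reusing the UUID format from the earlier example (the constant name and helper are hypothetical):

```python
# Pin the exact strategy by UUID so a rename in RAG Lab cannot
# silently change which config resolves in production.
PRODUCTION_STRATEGY_ID = "abc-123-def-456"  # UUID of the saved strategy

def embed_pinned(dc, texts):
    # Same call shape as above, keyed on the stable UUID instead of the name
    return dc.lab.embed(texts=texts, strategy_id=PRODUCTION_STRATEGY_ID)
```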


What This Unlocks at Scale

For small projects, swapping a name string in one place is a minor convenience. At scale, the separation between config and execution becomes more important.

Different document types in the same application can use different strategies. Support tickets might perform best with ghost (fast, cheap, good enough for general text). Legal contracts might need scholar (hybrid search with reranking for precision). You reference each strategy by name where it is relevant. The pipeline code for both paths looks identical.
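That routing can be a plain lookup table. A sketch, using the preset IDs from earlier; the document-type keys and the fallback choice are illustrative:

```python
# Hypothetical routing table: document type -> strategy reference
STRATEGY_BY_DOC_TYPE = {
    "support_ticket": "ghost",    # fast, cheap, good enough for general text
    "legal_contract": "scholar",  # hybrid search + rerank for precision
}

def strategy_for(doc_type: str) -> str:
    # Unknown document types fall back to the balanced preset
    return STRATEGY_BY_DOC_TYPE.get(doc_type, "balanced")
```

Both paths then call the same embed wrapper; only the resolved string differs.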

When you improve a strategy through re-evaluation in RAG Lab, production automatically uses the updated config on the next embed call. No deploy required for a config change.

The teams that rewrite their pipelines every time they want to try something new are the ones who delayed separating config from logic. The ones who move fast at scale separated them early, often before they fully understood why it mattered.

Run your strategies, save the ones that work

RAG Lab is the benchmarking layer. The SDK is the execution layer. Test chunking methods, embedding models, and retrieval modes side by side, save the winning config, and reference it by name in production. No pipeline rewrites required.
