RAG architecture for B2B SaaS: 5 patterns that actually work

Working draft — Sancto AI is filling in the full version. The outline below is the structure we'll ship next week.

Why naive RAG fails in production

The tutorial version (embed everything, top-k by cosine, stuff into context, prompt) ships great demos and breaks at scale. It fails on three axes: recall (the right chunk isn't in the top-k), precision (the top-k is full of noise), and citation (you can't trace why the LLM said what it said).
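Recall and precision are easier to fix once you measure them. Here is a minimal sketch of recall@k and precision@k over chunk IDs, assuming you have a small labeled set of which chunks are relevant to each query. The function names and the example IDs are ours, made up for illustration.

```python
from typing import Iterable, Sequence


def recall_at_k(retrieved: Sequence[str], relevant: Iterable[str], k: int) -> float:
    """Share of the relevant chunks that made it into the top-k."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    return len(set(retrieved[:k]) & relevant_set) / len(relevant_set)


def precision_at_k(retrieved: Sequence[str], relevant: Iterable[str], k: int) -> float:
    """Share of the top-k that is actually relevant (the rest is noise)."""
    top_k = list(retrieved[:k])
    if not top_k:
        return 0.0
    relevant_set = set(relevant)
    return sum(1 for chunk_id in top_k if chunk_id in relevant_set) / len(top_k)


# Hypothetical example: 5 retrieved chunks, 2 chunks labeled relevant.
retrieved = ["c1", "c7", "c3", "c9", "c2"]
relevant = {"c3", "c4"}
print(recall_at_k(retrieved, relevant, k=5))     # 0.5
print(precision_at_k(retrieved, relevant, k=5))  # 0.2
```

Here one of the two relevant chunks never shows up (a recall problem) and four of the five retrieved chunks are noise (a precision problem).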

Pattern 1: Query expansion

Before the embedding lookup, expand the user's query with related terms, synonyms, and clarifications via a cheap LLM call. This can double recall on technical domains, where users and documents often use different vocabulary for the same thing. It costs ~$0.001 per query. Worth it.
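A rough sketch of the expansion step, under a few assumptions: `generate` is a placeholder for whatever LLM client you use (any function that takes a prompt and returns text), the prompt wording is ours, and `fake_llm` is a canned stand-in so the example runs without a model.

```python
from typing import Callable, List


def expand_query(query: str, generate: Callable[[str], str], max_variants: int = 4) -> List[str]:
    """Return the original query plus up to max_variants LLM-suggested rewrites."""
    prompt = (
        "Rewrite the search query below in up to "
        f"{max_variants} different ways, using synonyms and related terms. "
        "Return one rewrite per line.\n\n"
        f"Query: {query}"
    )
    raw = generate(prompt)  # `generate` is a stand-in for your LLM client

    queries: List[str] = []
    seen = set()
    for candidate in [query] + raw.splitlines():
        cleaned = candidate.strip(" \t-*")  # drop bullet markers and padding
        key = cleaned.lower()
        if cleaned and key not in seen:  # skip blanks and case-insensitive repeats
            seen.add(key)
            queries.append(cleaned)
    return queries[: max_variants + 1]  # the original query always comes first


def fake_llm(prompt: str) -> str:
    """Canned response so the sketch runs without a model."""
    return "- SSO login failure\n- SAML authentication error\n- sso login failure\n"


print(expand_query("SSO not working", fake_llm))
# ['SSO not working', 'SSO login failure', 'SAML authentication error']
```

Each returned string then goes through the embedding lookup, and the result lists are merged before the next stage.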

Pattern 2: Hybrid search (BM25 + dense)

Pure embeddings miss exact matches (product codes, names, regulations). Pure keyword search misses semantic equivalents. Run both and merge the results with reciprocal rank fusion. Make this the default for any RAG system you're putting in front of a paying customer.
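The fusion step is small enough to show in full. This sketch assumes each retriever hands back an ordered list of document IDs, best first. The constant `k=60` is a commonly used default for reciprocal rank fusion and worth tuning on your own data. The IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge ranked lists: each list contributes 1 / (k + rank) per document."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for one query from the two retrievers.
bm25_hits = ["sku-42", "doc-a", "doc-b"]
dense_hits = ["doc-a", "doc-c", "sku-42"]
print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
# ['doc-a', 'sku-42', 'doc-c', 'doc-b']
```

Because the fusion only looks at ranks, you never have to put BM25 scores and cosine similarities on the same scale.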

Pattern 3: Re-ranking

Retrieve 40 candidates, then re-rank them down to 5 with a cross-encoder (Cohere Rerank, BAAI bge-reranker). This adds 80–200 ms of latency and dramatically reduces hallucination, because far less noise reaches the context window. The cheapest big win.
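The re-ranking step itself is just "score every (query, passage) pair, keep the best few". The sketch below keeps the scorer pluggable: `score_fn` is where a cross-encoder would go in a real system, while `overlap_score` is a deliberately crude word-overlap stand-in of our own so the example runs without a model. The passages are made up.

```python
from typing import Callable, List, Sequence


def rerank(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 5,
) -> List[str]:
    """Score every (query, passage) pair and keep the top_n passages."""
    scored = [(score_fn(query, passage), position) for position, passage in enumerate(candidates)]
    # Highest score first; on ties, keep the original retrieval order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [candidates[position] for _, position in scored[:top_n]]


def overlap_score(query: str, passage: str) -> float:
    """Toy stand-in for a cross-encoder: share of query words found in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & passage_words) / len(query_words)


candidates = [
    "Invoices are emailed on the first of each month.",
    "To rotate an API key, open Settings and choose Rotate key.",
    "API rate limits reset every minute.",
]
top = rerank("how do I rotate my API key", candidates, overlap_score, top_n=2)
print(top)  # the key-rotation passage first, then the rate-limit passage
```

Swapping in a real model means replacing `overlap_score` only. The retrieval code before it and the prompt assembly after it stay the same, which is why this pattern needs no architectural change.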

Pattern 4: Agentic retrieval

Instead of one retrieval pass, the agent decides what to look up and can issue follow-up retrievals based on what's missing. It is more expensive (3–8x token cost), but the answer quality is in a different league for complex queries.
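Stripped of the LLM, the control flow is a bounded loop. In this sketch `retrieve` and `plan_next_query` are placeholders: in a real system the planner would be an LLM call that inspects the context so far and either names the next lookup or returns `None` to stop. The toy index and planner below are hypothetical and exist only to make the loop runnable.

```python
from typing import Callable, Dict, List, Optional


def agentic_retrieve(
    question: str,
    retrieve: Callable[[str], List[str]],
    plan_next_query: Callable[[str, List[str]], Optional[str]],
    max_steps: int = 4,
) -> List[str]:
    """Retrieve, let the planner ask for what is missing, repeat up to max_steps."""
    context: List[str] = []
    query: Optional[str] = question
    for _ in range(max_steps):  # hard cap keeps token cost bounded
        if query is None:       # planner decided the context is sufficient
            break
        for chunk in retrieve(query):
            if chunk not in context:
                context.append(chunk)
        query = plan_next_query(question, context)
    return context


# Toy index keyed by exact query string, purely for illustration.
INDEX: Dict[str, List[str]] = {
    "Can we get SSO on the Team plan?": ["SSO is available on the Enterprise plan only."],
    "Enterprise plan pricing": ["Enterprise plan pricing starts with an annual contract."],
}


def toy_retrieve(query: str) -> List[str]:
    return INDEX.get(query, [])


def toy_planner(question: str, context: List[str]) -> Optional[str]:
    """Ask one follow-up about pricing if nothing retrieved so far mentions it."""
    if any("pricing" in chunk for chunk in context):
        return None
    return "Enterprise plan pricing"


chunks = agentic_retrieve("Can we get SSO on the Team plan?", toy_retrieve, toy_planner)
print(len(chunks))  # 2
```

The `max_steps` cap is the knob that trades answer quality against the extra token cost.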

Pattern 5: Structured outputs with citations

Force the LLM to return JSON with explicit citations to retrieved chunks. This makes the output auditable and lets you build "why did it say that?" UX. It's the trick most B2B teams skip until their first wrong answer in front of a customer.
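Asking for JSON is only half of it. You also want to reject answers that cite chunks you never retrieved. A minimal validation sketch, assuming a response shape of `{"answer": ..., "citations": [...]}`; that schema and the chunk IDs are our illustration, not a standard.

```python
import json
from typing import Iterable, List, Tuple


def parse_cited_answer(raw: str, retrieved_ids: Iterable[str]) -> Tuple[str, List[str]]:
    """Parse the model's JSON and check every citation against the retrieved set."""
    data = json.loads(raw)  # malformed JSON raises json.JSONDecodeError, a ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    answer = data.get("answer")
    citations = data.get("citations")
    if not isinstance(answer, str) or not isinstance(citations, list):
        raise ValueError("expected an 'answer' string and a 'citations' list")
    if not citations:
        raise ValueError("answer has no citations")
    allowed = set(retrieved_ids)
    unknown = [c for c in citations if not isinstance(c, str) or c not in allowed]
    if unknown:
        raise ValueError(f"citations not in the retrieved set: {unknown}")
    return answer, citations


retrieved_ids = ["chunk-1", "chunk-2"]
good = '{"answer": "SSO is Enterprise-only.", "citations": ["chunk-2"]}'
print(parse_cited_answer(good, retrieved_ids))
# ('SSO is Enterprise-only.', ['chunk-2'])
```

On a `ValueError` you can retry the generation or fall back to "I don't know". Either is better than shipping an answer nobody can trace.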

If you're picking one to add today, pick re-ranking. Cheapest, biggest jump, no architectural change.

Full version of this article (with code samples and benchmarks from three of our production deployments) drops next week. Want the early draft? Email us.
