Where the baseline breaks
The embed → retrieve → generate pipeline has three failure modes that tend to surface only on real production queries.
Compound queries. A question like "compare the Q3 2024 guidance with what the CFO said in the earnings call and flag any divergence" is not one retrieval. It is at least three. A single-shot nearest-neighbor lookup collapses the sub-questions into one vector and returns chunks that partially address each — none well. The generator then synthesizes across fragments that were never selected to cohere.
Contradictory sources. When two retrieved chunks assert incompatible facts (say, a policy document from February and a superseding amendment from August), the generator has no signal about which to trust. Similarity scoring does not encode recency, authority, or version lineage. Both chunks land in context with equal weight, and the model tends to blend them or pick one arbitrarily.
Vector similarity answers "what is near this query?" It does not answer "what should a system trust, and in what order?"
Long-horizon tasks. A task that requires building up context across multiple retrievals — a due-diligence sweep, a regulatory gap analysis, a contract comparison — cannot be served by a stateless top-k call. Each retrieval step needs to know what the prior step found, what is still missing, and what retrieval strategy to apply next.
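To make the contrast with a stateless top-k call concrete, here is a minimal sketch of a retrieval loop that carries state between steps. The names `SweepState`, `run_sweep`, and `toy_retrieve` are hypothetical and are not part of Super RAG v2.1. The toy retriever stands in for a real search backend, and it simply assumes that one clause can only be located once another has been found.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SweepState:
    """Hypothetical state carried from one retrieval step to the next."""
    open_questions: List[str]                                      # what is still missing
    findings: Dict[str, List[str]] = field(default_factory=dict)   # what prior steps found


Retriever = Callable[[str, SweepState], List[str]]


def run_sweep(state: SweepState, retrieve: Retriever, max_steps: int = 10) -> SweepState:
    """Work through open questions, letting each step see earlier findings."""
    for _ in range(max_steps):
        if not state.open_questions:
            break
        question = state.open_questions.pop(0)
        chunks = retrieve(question, state)  # the retriever may adapt to the state
        if chunks:
            state.findings[question] = chunks
        else:
            # Nothing usable yet: retry later, once other findings exist.
            state.open_questions.append(question)
    return state


def toy_retrieve(question: str, state: SweepState) -> List[str]:
    """Stand-in retriever: the cap clause is only found after governing law."""
    corpus = {
        "governing law": ["clause 12: governed by New York law"],
        "indemnity cap": ["clause 9: liability capped at fees paid"],
    }
    if question == "indemnity cap" and "governing law" not in state.findings:
        return []
    return corpus.get(question, [])


state = run_sweep(SweepState(["indemnity cap", "governing law"]), toy_retrieve)
print(list(state.findings))
```

The first attempt at "indemnity cap" returns nothing, so the question is re-queued and answered only after the "governing law" step has populated the state. A stateless call has no way to express that dependency.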
Retrieval as structured computation
Super RAG v2.1 reframes retrieval not as a lookup but as a computation with explicit stages, each producing a typed artifact that feeds the next.
Query decomposition. An incoming query is parsed into a directed acyclic graph of sub-queries. Each node carries a retrieval intent — lookup, comparison, temporal scan, aggregation. The graph determines the retrieval order and how results are merged before generation.
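One way to picture the decomposition graph is the sketch below, which uses Python's standard-library `graphlib` to derive a retrieval order. The `SubQuery` and `Intent` types and the hand-written plan are illustrative assumptions. The brief does not specify the decomposition grammar, and in a real system the plan would be produced by a parser, not typed in by hand.

```python
from dataclasses import dataclass
from enum import Enum
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import List, Tuple


class Intent(Enum):
    """The four retrieval intents named in the text."""
    LOOKUP = "lookup"
    COMPARISON = "comparison"
    TEMPORAL_SCAN = "temporal_scan"
    AGGREGATION = "aggregation"


@dataclass(frozen=True)
class SubQuery:
    """One node of the graph (field names are hypothetical)."""
    node_id: str
    text: str
    intent: Intent
    depends_on: Tuple[str, ...] = ()


def retrieval_order(nodes: List[SubQuery]) -> List[str]:
    """Return node ids so that every node comes after its dependencies."""
    graph = {n.node_id: set(n.depends_on) for n in nodes}
    # static_order() raises graphlib.CycleError if the plan is not acyclic.
    return list(TopologicalSorter(graph).static_order())


# Hand-written plan for the compound question used earlier in this brief.
plan = [
    SubQuery("guidance", "Q3 2024 guidance", Intent.LOOKUP),
    SubQuery("cfo_remarks", "CFO statements in the earnings call", Intent.LOOKUP),
    SubQuery("divergence", "compare guidance with CFO remarks",
             Intent.COMPARISON, depends_on=("guidance", "cfo_remarks")),
]

order = retrieval_order(plan)
print(order)
```

The two lookups have no dependency on each other, so they could run in either order or in parallel. The comparison node always comes last because it merges their results.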
Structural ranking before similarity. Before similarity scoring is consulted, a structural filter applies hard constraints: document authority tier, content type (policy, transcript, amendment), effective date range, and version lineage. Chunks that fail a structural constraint are excluded regardless of embedding proximity. Similarity operates within the surviving candidate set, not across the full corpus.
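The ordering described above, with hard constraints first and similarity second, can be sketched as follows. Everything here is an assumption made for illustration: the `Chunk` fields, the convention that tier 1 is the most authoritative, the use of a `superseded_by` pointer to model version lineage, and the two-dimensional toy embeddings and dates.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    content_type: str             # e.g. "policy", "transcript", "amendment"
    authority_tier: int           # assumed convention: 1 = most authoritative
    effective: date
    superseded_by: Optional[str]  # assumed lineage pointer; None = current version
    embedding: Tuple[float, ...]


def cosine(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def structural_filter(chunks: List[Chunk], *, max_tier: int,
                      allowed_types: Set[str], as_of: date) -> List[Chunk]:
    """Hard constraints: a chunk failing any one is dropped, however similar."""
    return [
        c for c in chunks
        if c.authority_tier <= max_tier
        and c.content_type in allowed_types
        and c.effective <= as_of
        and c.superseded_by is None
    ]


def rank(chunks: List[Chunk], query_vec: Tuple[float, ...], k: int,
         **constraints) -> List[Chunk]:
    """Similarity is consulted only within the surviving candidate set."""
    survivors = structural_filter(chunks, **constraints)
    return sorted(survivors, key=lambda c: cosine(c.embedding, query_vec),
                  reverse=True)[:k]


corpus = [
    Chunk("policy-feb", "policy", 1, date(2024, 2, 1), "amendment-aug", (1.0, 0.0)),
    Chunk("amendment-aug", "amendment", 1, date(2024, 8, 1), None, (0.8, 0.6)),
    Chunk("forum-post", "transcript", 3, date(2024, 9, 1), None, (0.9, 0.1)),
]
query = (1.0, 0.0)

top = rank(corpus, query, k=3, max_tier=2,
           allowed_types={"policy", "amendment"}, as_of=date(2024, 12, 31))
print([c.chunk_id for c in top])
```

In this toy corpus the superseded February policy is the nearest neighbor of the query, so a pure similarity ranking would put it first. The structural pass removes it, along with the low-authority chunk, before any similarity score is computed.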
Citation-trace binding. Every chunk selected for context is tagged with a provenance record: source identifier, version, retrieval stage, and the structural rules that admitted it. The generator receives context with its selection rationale attached. The final output can be traced back, chunk by chunk, to a deterministic retrieval decision.
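A provenance record of this kind might look like the sketch below. The four fields mirror the ones listed above, but the class names, the bracketed citation format, and the rule labels are hypothetical. The actual citation-trace data model is left to the in-depth companion.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Provenance:
    source_id: str
    version: str
    retrieval_stage: str           # which sub-query or stage selected the chunk
    admitted_by: Tuple[str, ...]   # structural rules the chunk satisfied


@dataclass(frozen=True)
class ContextChunk:
    text: str
    provenance: Provenance


def render_context(chunks: List[ContextChunk]) -> str:
    """Number each chunk and attach its selection rationale for the generator."""
    return "\n".join(
        f"[{i}] {c.text} "
        f"(source={c.provenance.source_id}, version={c.provenance.version}, "
        f"admitted_by={', '.join(c.provenance.admitted_by)})"
        for i, c in enumerate(chunks, start=1)
    )


def trace(chunks: List[ContextChunk], cited: List[int]) -> List[Dict]:
    """Map citation numbers in an answer back to provenance records."""
    return [asdict(chunks[i - 1].provenance) for i in cited]


context = [
    ContextChunk(
        "Remote work requires manager approval.",
        Provenance("policy-handbook", "v3", "lookup:current-policy",
                   ("authority_tier<=2", "not_superseded")),
    ),
]

print(render_context(context))
print(trace(context, cited=[1]))
```

Because the record travels with the chunk, an answer that cites [1] can be resolved to a source, a version, and the rules that admitted it without re-running retrieval.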
Why auditability is the forcing function
In a general-purpose chatbot, a retrieval failure is an inconvenience. In a regulated domain — clinical decision support, loan underwriting, compliance review — it is a liability event. The question is not only whether the answer is correct. It is whether the retrieval choices that produced it are defensible to a supervisor, an auditor, or a court.
Vector similarity is not defensible in that sense. It produces a ranked list with no explicit rationale. A structural retrieval pipeline — with decomposed queries, typed filters, and citation-trace records — produces a decision log. Every context chunk has an admission reason. Every sub-query has a result. The generation step operates on a reconstructible input.
That traceability is not a reporting convenience. In settings where an adverse output triggers regulatory review, the retrieval log is the evidence. A system that cannot produce it cannot be compliant by design, regardless of how accurate its average output is.
What the in-depth covers
This brief names the failure modes and sketches the architecture. The in-depth companion works through each component in full: the query decomposition grammar, the structural ranking schema, the citation-trace data model, and the integration points where v2.1 differs from v1.x pipelines already in production.
It also covers the cases where simpler retrieval is still correct — not every RAG deployment needs a decomposition layer — and the cost model for adding structural ranking to an existing embedding-based system.
Read the full analysis: Super RAG v2.1 — In Depth.