RAG Architectures: Simple, Multi-Hop, Graph-Assisted
Retrieval-augmented generation is a system pattern: generate answers with evidence that the system retrieves. The most important word is “system.” Success depends less on any single model and more on how retrieval, ranking, context construction, and answer synthesis cooperate under real constraints. When this cooperation is weak, the model fills gaps with plausible language. When it is strong, the system behaves like a dependable reader: it finds evidence, cites it, and refuses to pretend when evidence is missing.
RAG architectures vary because questions vary. Some questions have a single source of truth. Some require multiple documents. Some require reconciling conflicting sources. Some require scoping by permissions and time. Architecture is how these requirements become operational behavior.
The RAG loop as a disciplined pipeline
A basic RAG system follows a loop.
- Interpret the query and determine scope.
- Retrieve candidate evidence.
- Rank and select evidence.
- Construct context from evidence.
- Generate an answer grounded in the evidence.
- Optionally verify and revise based on checks.
Each step can fail in a way that looks like “model hallucination,” but the root cause often lives earlier: irrelevant retrieval, missing evidence, bad chunking, or a context packer that clipped the critical paragraph.
RAG architecture is about making each step explicit, measurable, and budgeted.
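The loop above can be sketched as a pipeline of plain functions, so each stage is individually measurable and swappable. The function names and the `RagResult` shape are illustrative, not a real library; the `min_evidence` stop condition is one assumed way to implement "refuse rather than pretend."

```python
from dataclasses import dataclass, field

@dataclass
class RagResult:
    answer: str
    citations: list = field(default_factory=list)

def answer_with_rag(query, retrieve, rank, pack, generate, min_evidence=1):
    """Run the basic RAG loop; each stage is a plain function so it can
    be measured, budgeted, and swapped independently."""
    candidates = retrieve(query)           # retrieve candidate evidence
    evidence = rank(query, candidates)     # rank and select evidence
    if len(evidence) < min_evidence:       # stop condition: weak evidence
        return RagResult("Not enough evidence to answer.", [])
    context = pack(evidence)               # construct context from evidence
    answer = generate(query, context)      # generate grounded in evidence
    return RagResult(answer, citations=[e["id"] for e in evidence])
```

Because each stage is injected, a failure can be localized: swap `retrieve` for a logged stub and you can tell an irrelevant-retrieval bug apart from a generation bug.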
Simple RAG: one query, one retrieval, one answer
Simple RAG is the entry point and still the right choice for many workloads.
Structure
- One user query
- One retrieval call to an index
- One reranking step or none
- One context bundle
- One answer generation step
Where it works well
- Questions that map to a single concept or document section
- FAQ-like queries where the corpus is well structured
- Support flows where latency is tight and scope is narrow
- Domains where evidence is typically localized
Simple RAG succeeds when the corpus is clean, chunking is strong, and the retrieval plan reliably returns the right evidence.
Failure modes
- The retrieved chunks are topically related but do not contain the needed claim.
- The answer is correct in general but not for the user’s specific scenario.
- The system cites a chunk that sounds relevant but does not actually support the statement.
- The query is ambiguous and needs clarification, but the system tries to answer anyway.
Simple RAG should not be treated as a universal solution. It is the best baseline when combined with clear stop conditions: if evidence is weak, the system should ask a clarifying question or return an evidence-limited response rather than guessing.
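One minimal way to make those stop conditions concrete is a three-way decision on the best retrieval score: answer, ask for clarification, or return an evidence-limited response. The thresholds below are assumptions for illustration and would need tuning against labeled queries for a real corpus.

```python
def decide_action(scored_hits, answer_min=0.75, clarify_min=0.45):
    """Decide whether to answer, clarify, or refuse from the best
    retrieval score. Thresholds are illustrative, not recommended values."""
    top = max((score for score, _doc in scored_hits), default=0.0)
    if top >= answer_min:
        return "answer"
    if top >= clarify_min:
        return "clarify"   # likely ambiguous: ask a clarifying question
    return "refuse"        # evidence-limited response, not a guess
```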
Multi-stage RAG: separate recall from precision
Many production systems separate candidate recall from precision ranking.
Structure
- Retrieve a larger candidate set with a cheap method.
- Rerank with a stronger model that can read query and content together.
- Select a small evidence set for context packing.
- Generate and cite.
This architecture exists because indexes are fast but approximate. Rerankers are slower but more precise. Separating stages keeps latency bounded while improving relevance and citation correctness.
Practical tradeoffs
- More candidates increase recall but raise reranking cost.
- Reranking improves precision but can add latency spikes if not budgeted.
- The selection logic must avoid duplicates and ensure coverage.
Multi-stage RAG often feels like the first “serious” architecture because it turns retrieval into a controlled process rather than a single black box call.
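The recall-then-precision split can be sketched in a few lines. `index_search` and `rerank_score` are assumed callables standing in for an ANN index and a cross-encoder style reranker; the deduplication step addresses the selection-logic tradeoff above.

```python
def two_stage_search(query, index_search, rerank_score, k_recall=50, k_final=5):
    """Cheap recall pass over the index, then a precise rerank on the
    small candidate set, with duplicate ids dropped before selection."""
    candidates = index_search(query, k=k_recall)       # fast, approximate
    scored = [(rerank_score(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    seen, final = set(), []
    for _score, doc in scored:
        if doc["id"] in seen:                          # avoid duplicates
            continue
        seen.add(doc["id"])
        final.append(doc)
        if len(final) == k_final:
            break
    return final
```

Raising `k_recall` trades reranking cost for recall, which is exactly the budget knob the section describes.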
Multi-hop RAG: when evidence is scattered
Some questions require evidence from multiple sources and intermediate reasoning steps.
- “What changed, why did it change, and what should be done now?”
- “Compare two approaches and explain the tradeoff.”
- “Find the procedure, then find the exceptions, then find the latest update.”
Multi-hop RAG treats retrieval as an iterative process.
Structure
- Decompose the question into sub-queries.
- Retrieve evidence for each sub-query.
- Accumulate intermediate notes or claims.
- Retrieve again based on what is missing.
- Synthesize with citations across sources.
Why multi-hop is risky
Multi-hop increases capability, but it can also increase drift. If an early step retrieves weak evidence, later steps may build on a false premise. The system becomes confident and wrong.
This risk is why multi-hop designs benefit from verification steps.
- Require evidence for intermediate claims before using them to plan further retrieval.
- Prefer retrieving canonical sources first, then supporting examples.
- Use stop conditions based on citation coverage and confidence thresholds.
- Enforce strict budgets on number of hops, candidate counts, and context size.
Multi-hop RAG is not a free upgrade. It must be engineered like a workflow, with controlled recursion and clear failure handling.
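A controlled multi-hop loop might look like the sketch below: a hard hop budget, and a support check that discards unverified evidence so later hops never build on a false premise. All four injected callables are assumptions standing in for real components.

```python
def multi_hop_answer(question, decompose, retrieve, has_support, synthesize,
                     max_hops=3):
    """Iteratively retrieve per sub-query, keep only supported evidence,
    and stop on the hop budget or when nothing is missing."""
    notes = []
    missing = decompose(question)            # sub-queries still to resolve
    for _hop in range(max_hops):             # hard budget on hops
        if not missing:
            break
        sub = missing.pop(0)
        hits = retrieve(sub)
        supported = [h for h in hits if has_support(sub, h)]
        if not supported:
            continue                         # never plan on an unverified claim
        notes.extend(supported)
    return synthesize(question, notes)
```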
Graph-assisted RAG: adding structure to retrieval
Graph-assisted RAG uses explicit relationships between entities and documents to improve retrieval and synthesis.
Graphs can represent:
- Entities and their relations, such as “service depends on database”
- Document references and citations, such as “policy references procedure”
- Knowledge base structures, such as “topic taxonomy and hierarchy”
- Workflow structures, such as “incident timeline and causal links”
Graph-assisted retrieval can improve:
- Disambiguation, by selecting the right entity when names collide
- Coverage, by retrieving connected documents that are likely relevant
- Reasoning, by providing structured paths through evidence
What graphs do well
Graphs shine when relationships matter more than pure text similarity.
- Dependencies between services
- Hierarchies like “component belongs to subsystem belongs to product”
- Procedural sequences like “step A precedes step B”
- Citation chains like “source of truth points to versioned update”
Graphs can also reduce hallucination by constraining what the system is allowed to claim. If a relationship is not in the graph and not supported by retrieved text, the system has a clear reason to decline or ask for more information.
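A small sketch of the coverage idea: seed documents found by text retrieval are expanded with their directly connected neighbors, under a cap. The adjacency-dict representation of the graph is an assumption for illustration.

```python
def expand_candidates(seed_ids, graph, max_extra=5):
    """Augment text-retrieved seed documents with directly connected ones.
    `graph` maps a doc id to the ids it references or depends on."""
    extra, seen = [], set(seed_ids)
    for sid in seed_ids:
        for neighbor in graph.get(sid, []):
            if neighbor not in seen:
                seen.add(neighbor)
                extra.append(neighbor)
            if len(extra) >= max_extra:      # cap to avoid popular-node blowup
                return list(seed_ids) + extra
    return list(seed_ids) + extra
```

The `max_extra` cap is one crude defense against the popular-node overweighting failure described below; real systems would also weight edges.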
Where graphs fail
Graph-assisted RAG can disappoint when teams treat graphs as magic.
- Building and maintaining graphs can be costly and fragile.
- Graph coverage is rarely complete, especially for unstructured corpora.
- If entity resolution is wrong, graph traversal can retrieve the wrong cluster of documents.
- Graph signals can overweight popular nodes and underweight the niche document that contains the true answer.
Graph-assisted RAG should be treated as a targeted tool: use it where structured relationships are stable and high value.
Context construction: the quiet determinant of faithfulness
Even a perfect retrieval result can fail if context packing is poor.
Context construction includes:
- Selecting the evidence set
- Ordering evidence in a way that preserves coherence
- Including headings and identifiers so citations are meaningful
- Trimming without removing the critical lines
- Avoiding redundant chunks that crowd out diversity
A common failure is citation drift: the system cites a chunk near the true evidence that does not actually contain it. This can happen when the packer includes too much surrounding text and the model anchors on the wrong paragraph. It can also happen when chunks are too large and contain multiple claims, only some of which support the answer.
Context construction is therefore part of architecture. It should be evaluated and improved like retrieval and ranking.
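A greedy packer covering the checklist above might look like this sketch: identifiers are attached so citations stay meaningful, duplicate text is skipped so redundancy does not crowd out diversity, and whole chunks are dropped rather than clipped mid-paragraph. The word-count token estimator is a deliberate simplification.

```python
def pack_context(evidence, token_budget=1000,
                 est_tokens=lambda text: len(text.split())):
    """Greedy context packer: preserve evidence order, attach ids for
    citations, skip duplicates, and never clip a chunk mid-claim."""
    parts, used, seen = [], 0, set()
    for doc in evidence:
        if doc["text"] in seen:              # avoid redundant chunks
            continue
        cost = est_tokens(doc["text"])
        if used + cost > token_budget:
            continue                         # drop whole chunk, never clip it
        seen.add(doc["text"])
        parts.append(f"[{doc['id']}] {doc['text']}")
        used += cost
    return "\n\n".join(parts)
```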
Citation selection is not decoration
Citations are a control surface. They are how the system proves it is grounded.
A strong citation plan does several things.
- It forces evidence selection to be precise.
- It provides user trust, especially when stakes are high.
- It makes debugging possible, because failures can be traced to retrieval and ranking.
- It enables measurement, such as citation coverage and faithfulness metrics.
A system that generates answers without citations can still be useful for brainstorming, but it cannot reliably serve as an evidence-backed system. RAG is most valuable when it behaves like a dependable reader, not a confident narrator.
Guardrails and refusal logic in RAG systems
RAG does not eliminate the need for guardrails. It reshapes them.
Guardrails in RAG often include:
- Refusal when evidence is missing or too weak
- Refusal or escalation when the query requests disallowed content
- Permission checks that prevent retrieval from violating boundaries
- Output checks that ensure citations support claims
- Logging and audit trails for which documents were accessed
These guardrails must be budgeted. A heavy verification pass on every request can add latency and cost. Many systems adopt a tiered approach: verify more when risk is higher, such as for policy answers, financial advice, or high-impact workflows.
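The tiered approach can be made explicit in code. Both functions below are sketches: the risk categories are assumptions, and the substring check stands in for a real entailment model in the citation-support guardrail.

```python
def verification_tier(intent, risk_tags=("policy", "financial", "medical")):
    """Pick verification depth by risk; the categories are illustrative."""
    if intent in risk_tags:
        return "full"   # check every claim against its citation
    return "spot"       # sample-check claims to bound latency and cost

def claims_supported(claims, cited_text):
    """Cheap output check: every claim must appear in the cited evidence.
    A production system would use an entailment model, not substring match."""
    return all(claim in cited_text for claim in claims)
```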
Monitoring and evaluation for RAG architectures
RAG systems need metrics that separate retrieval failures from generation failures.
- Retrieval recall at k for the candidate generator
- Reranking precision and citation correctness
- Context coverage: does the packed context contain the needed evidence?
- Faithfulness: do generated claims match cited evidence?
- Latency and cost distributions, especially in multi-hop paths
- Drift signals after corpus updates and index refreshes
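Two of these metrics are simple enough to define precisely. The sketch below assumes an eval set where relevant documents and claim-support labels are known; the function names are illustrative.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of known-relevant documents found in the top-k candidates."""
    if not relevant_ids:
        return 1.0
    top = set(retrieved_ids[:k])
    return len(top & set(relevant_ids)) / len(relevant_ids)

def citation_coverage(answer_claims, cited_ids, claim_to_ids):
    """Fraction of claims backed by at least one actually-cited document.
    `claim_to_ids` holds ground-truth support labels from an eval set."""
    if not answer_claims:
        return 1.0
    cited = set(cited_ids)
    backed = sum(1 for claim in answer_claims
                 if cited & set(claim_to_ids.get(claim, [])))
    return backed / len(answer_claims)
```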
Monitoring should also capture the architecture path.
- Was it simple RAG, multi-stage, multi-hop, or graph-assisted?
- How many retrieval calls occurred?
- How many candidates were reranked?
- How many citations were used?
- Did the system fall back to a cheaper mode under budget pressure?
Without this instrumentation, improvements become guesswork and regressions become mysterious.
Choosing the right architecture for a workload
The architecture should match the product promise.
- Simple RAG for narrow tasks with strong corpus structure and tight latency targets
- Multi-stage RAG for broader tasks where precision matters and reranking is affordable
- Multi-hop RAG for tasks that require evidence across sources, with strong budgets and verification
- Graph-assisted RAG for domains where relationships are stable, valuable, and maintained
A platform does not need to pick one architecture forever. It can route by intent, risk, and budget. The key is to keep routing policies explicit and observable so that users and operators can predict behavior.
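An explicit routing policy can be as small as a readable decision function; returning the path name makes the decision loggable and therefore observable. The labels and the latency threshold below are illustrative assumptions.

```python
def route(intent, risk, latency_budget_ms):
    """Explicit, observable routing policy; thresholds are illustrative.
    Returning the path name lets operators log and audit every decision."""
    if risk == "high":
        return "multi_hop_verified"   # pay for verification on high stakes
    if intent == "comparison":
        return "multi_hop"            # evidence is scattered across sources
    if latency_budget_ms < 500:
        return "simple"               # tight budget: one retrieval, one answer
    return "multi_stage"
```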
What good RAG looks like
A strong RAG system behaves predictably under change.
- It retrieves evidence that contains the needed claims, not only topically related text.
- It selects citations that actually support the answer.
- It asks for clarification when the query is ambiguous.
- It refuses to guess when evidence is missing.
- It maintains permission boundaries and auditability.
- It stays within latency and cost budgets without collapsing quality silently.
RAG is not a single technique. It is an infrastructure pattern for making AI systems accountable to evidence.
