
M11

Architecture & capstone

M11 · Architecture and Capstone

Module 11 — Week 10. Integrates the full curriculum: cross-cutting patterns, flow.json design, anti-patterns, evaluation as system testing, and the capstone (rebuild templates + design + exam).

RAGorbit nodes: all (13 categories). Capstone templates: 09 → 02 → 01.

Cross-cutting patterns
How to read a flow.json
How to design a flow.json
Anti-patterns
Design checklist
AI system testing (eval as test)
Reconstruction path 09 → 02 → 01
Checkpoint
Layer ③ explained: how to rebuild a template with a framework

1. Cross-cutting patterns

These patterns appear across multiple templates and modules. Mastering them is the course's criterion for expert level (PLAN.md §1).

1.1 RAG-as-tool

What it is: retrieval is not a fixed pipeline step — it is a tool the agent invokes when it needs documentary evidence.

Fixed pipeline (09 HR):     io.input → retrieval.vector → prompt → output
RAG-as-tool (01 Airline):   agent.react ──invokes──▶ tool.retriever (PolicyRAG)

When to use:

The agent must decide whether and when to consult documents (flight change: only after obtaining fare_class from the PNR).
There are multiple sources and the agent chooses filters dynamically.

When NOT to:

The question is always about the same corpus (HR) → a linear pipeline is simpler and more auditable.

RAGorbit node: tool.retriever. Templates: 01, 03, 10.

Scratch implementation: function policy_rag(query, fare_class, route_type) wrapped in a TOOLS dict.

Framework implementation: LangChain @tool that internally calls retriever.invoke() with filters.
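The scratch version can be sketched as follows. It assumes a tiny in-memory corpus and uses a toy word-overlap score in place of embeddings. The corpus, the `search` helper and the `get_pnr` mock are hypothetical, and a fixed call order stands in for the agent's choices. The point is the shape: retrieval is just one more entry in `TOOLS`, and its filters come from an earlier tool result.

```python
from typing import Callable

# Hypothetical policy chunks; metadata fields mirror the PolicyRAG filters.
CORPUS = [
    {"text": "Basic fares cannot be changed after check-in.",
     "fare_class": "basic", "route_type": "domestic"},
    {"text": "Flex fares allow free changes up to 2 hours before departure.",
     "fare_class": "flex", "route_type": "domestic"},
    {"text": "Flex fares on international routes pay a change fee.",
     "fare_class": "flex", "route_type": "international"},
]

def search(query: str, chunks: list[dict], k: int = 2) -> list[dict]:
    """Toy lexical ranking (shared words) standing in for vector similarity."""
    words = set(query.lower().split())
    return sorted(chunks, key=lambda c: len(words & set(c["text"].lower().split())),
                  reverse=True)[:k]

def policy_rag(query: str, fare_class: str, route_type: str) -> str:
    """Retrieval as a tool: apply the metadata filters first, then rank."""
    allowed = [c for c in CORPUS
               if c["fare_class"] == fare_class and c["route_type"] == route_type]
    return "\n".join(c["text"] for c in search(query, allowed))

def get_pnr(pnr: str) -> dict:
    """Hypothetical mock of the reservation service."""
    return {"pnr": pnr, "fare_class": "flex", "route_type": "domestic"}

# Retrieval is just one more entry in the agent's tool registry.
TOOLS: dict[str, Callable[..., object]] = {"get_pnr": get_pnr, "policy_rag": policy_rag}

# The policy is consulted only after the fare class is known from the PNR.
booking = TOOLS["get_pnr"]("ABC123")
evidence = TOOLS["policy_rag"]("can I change my flight",
                               booking["fare_class"], booking["route_type"])
print(evidence)  # → Flex fares allow free changes up to 2 hours before departure.
```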

1.2 Hard-filter-as-guardrail

What it is: metadata filters applied in the store/SQL, not as a suggestion to the LLM.

❌ Bad:  "Search only documents from the PPO plan" in the system prompt
✅ Good: retrieval.vector with hardFilters: [plan, condition] → SQL WHERE plan='PPO'

Why it matters: semantic embedding can retrieve chunks from another plan if the text is similar. Hard-filters are a precision guardrail — the system cannot violate them through hallucination.
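The following sketch shows why a hard filter cannot be overridden by the model. It uses the standard-library `sqlite3` module as a stand-in store and assumes toy 2-d vectors instead of real embeddings; the table and chunk ids are hypothetical. The `WHERE plan = ?` clause runs before any similarity ranking. As a result, chunk `c2`, which is the closest match to the query but belongs to the HMO plan, is never even a candidate.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id TEXT, plan TEXT, x REAL, y REAL, text TEXT)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?, ?, ?, ?)",
    [
        ("c1", "PPO", 0.9, 0.1, "PPO: physiotherapy covered after deductible."),
        ("c2", "HMO", 0.95, 0.05, "HMO: physiotherapy requires a referral."),  # closest, wrong plan
        ("c3", "PPO", 0.1, 0.9, "PPO: dental is not covered."),
    ],
)

def cosine(a, b):
    return (a[0] * b[0] + a[1] * b[1]) / (math.hypot(*a) * math.hypot(*b))

def retrieve(query_vec, plan, k=2):
    # Hard filter: enforced by SQL before ranking, not requested in a prompt.
    rows = conn.execute("SELECT id, x, y FROM chunks WHERE plan = ?", (plan,)).fetchall()
    ranked = sorted(rows, key=lambda r: cosine(query_vec, (r[1], r[2])), reverse=True)
    return [r[0] for r in ranked[:k]]

print(retrieve((1.0, 0.0), "PPO"))  # → ['c1', 'c3']
```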

When to use: regulated domains (healthcare, banking, aviation, legal) where mixing contexts has consequences.

Nodes: retrieval.vector with hardFilters; the filter fields are assigned beforehand by ingest.metadata.

Templates: 02 (doc_type/period), 01 (fare_class/route_type), 03, 08.

1.3 Deterministic vs LLM

Principle: the LLM reasons; business rules decide.

Task	Who does it	Node
Score credit risk	LLM (score 0–100)	`logic.structured`
Approve/reject by threshold	Deterministic rule	`logic.rules`
Classify P1/P2/P3 in disruption	Deterministic rule	`logic.rules`
Draft justification	LLM	`logic.structured` / `logic.prompt`
Charge with idempotency	Guardrail (code)	`guardrail.idempotency`

Anti-pattern: asking the LLM for "decision": "aprobar" (approve) and trusting it without logic.rules.

Templates: 02 (credit thresholds), 10 (simple vs complex track), 04 (deductible).
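Here is a sketch of the split, assuming the LLM returns JSON containing a `score` and an untrusted `decision` field. The 70/40 thresholds reuse the ones in 12.3, and the labels keep the course's Spanish values. The rule overwrites whatever the model proposed, so an "aprobar" suggestion attached to a score of 45 cannot push the case through.

```python
import json

APPROVE_AT, REVIEW_AT = 70, 40  # business thresholds live in code, not in the prompt

def apply_rules(llm_output: str) -> dict:
    """Parse the LLM's structured output and let a deterministic rule decide."""
    data = json.loads(llm_output)
    score = float(data["score"])
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= APPROVE_AT:
        decision = "aprobar"    # approve
    elif score >= REVIEW_AT:
        decision = "revisar"    # send to manual review
    else:
        decision = "rechazar"   # reject
    # Keep the model's suggestion only for auditing; it never decides.
    return {"score": score, "decision": decision, "llm_suggested": data.get("decision")}

# The model suggested approval, but a score of 45 only qualifies for review.
print(apply_rules('{"score": 45, "decision": "aprobar"}'))
# → {'score': 45.0, 'decision': 'revisar', 'llm_suggested': 'aprobar'}
```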

1.4 Fan-out at scale

What it is: process thousands of events in parallel with stateless sub-agents, not a single conversational agent.

io.event-source (Kafka)
    → logic.rules (segment)
    → logic.router (simple | complex)
    → agent.fanout (concurrency: 16)
        ├── tool.retriever (PolicyRAG)
        ├── tool.service (Alternatives)
        └── auto-confirm (rules) vs LLM (complex only)

When to use: high volume, independent events (logistics, mass notifications).

When NOT to: multi-turn conversation with one user → agent.react.

Template: 10-logistics. Module: M7.
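A minimal `asyncio` sketch of the fan-out follows. It assumes events are plain dicts and that each sub-agent is a stateless coroutine. The routing rule (multi-leg shipments go to the complex track) and `handle_event` are hypothetical stand-ins for the template's nodes. An `asyncio.Semaphore` caps in-flight work at 16, mirroring `concurrency: 16`.

```python
import asyncio

CONCURRENCY = 16

def route(event: dict) -> str:
    """Deterministic router (assumed rule): multi-leg shipments are complex."""
    return "complex" if event["legs"] > 1 else "simple"

async def handle_event(event: dict, sem: asyncio.Semaphore) -> dict:
    # Stateless sub-agent: everything it needs travels inside the event.
    async with sem:
        track = route(event)
        await asyncio.sleep(0)  # placeholder for retriever / service / LLM calls
        action = "auto-confirm" if track == "simple" else "llm-replan"
        return {"id": event["id"], "track": track, "action": action}

async def fan_out(events: list[dict]) -> list[dict]:
    sem = asyncio.Semaphore(CONCURRENCY)
    return await asyncio.gather(*(handle_event(e, sem) for e in events))

events = [{"id": i, "legs": 1 if i % 3 else 2} for i in range(100)]
results = asyncio.run(fan_out(events))
print(sum(r["track"] == "complex" for r in results))  # → 34
```

Only the complex track would pay for an LLM call. The rest are confirmed by rules, which is what keeps cost flat as volume grows.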

1.5 Mandatory citations

What it is: a post-processor that verifies that the response anchors its claims to retrieved chunks.

Flow:

logic.prompt → Message (LLM response)
                    ↓
logic.citations ← Chunks (from the retriever)
    mode: enforce → rejects/regenerates if a citation is missing
                    ↓
               io.output

Why it goes AFTER the LLM: the LLM may omit citations; enforce mode is code, and code does not negotiate.

Templates: 09, 02, 03, 05, 07, 08.
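The following is a sketch of enforce mode. It assumes the prompt asks the model to cite each claim with a `[chunk_id]` marker and that the post-processor receives the ids of the retrieved chunks. The marker format, the `generate` callable and the retry count are assumptions, not RAGorbit's contract. Every sentence must cite at least one retrieved chunk. Otherwise the answer is regenerated, and once retries run out it is replaced by a fixed fallback.

```python
import re
from typing import Callable

CITE = re.compile(r"\[([\w\-§.]+)\]")  # assumed marker format: [chunk_id]
FALLBACK = "I could not find a supported answer in the documents."

def is_grounded(answer: str, chunk_ids: set[str]) -> bool:
    """Every sentence needs a citation, and every citation must be retrieved."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    for s in sentences:
        cited = set(CITE.findall(s))
        if not cited or not cited <= chunk_ids:  # missing or invented citation
            return False
    return bool(sentences)

def enforce_citations(generate: Callable[[], str], chunk_ids: set[str],
                      retries: int = 1) -> str:
    for _ in range(retries + 1):
        answer = generate()  # call the LLM (or regenerate)
        if is_grounded(answer, chunk_ids):
            return answer
    return FALLBACK  # reject instead of returning an unanchored answer

answers = iter(["You get 18 days of vacation.", "You get 18 days of vacation [hr-3]."])
print(enforce_citations(lambda: next(answers), {"hr-3", "hr-7"}))
# → You get 18 days of vacation [hr-3].
```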

1.6 Agentic RAG

What it is: a ReAct agent combined with RAG-as-tool plus post-agent guardrails.

agent.react
  ├── tool.retriever (clinical guidelines)
  ├── tool.service (patient history)
  └── ...
       ↓
logic.citations (enforce)
       ↓
hitl.escalate (critical cases; structural, not decided by the LLM)

Difference from linear RAG: the agent decides the order of queries and can combine APIs + documents.

Templates: 01, 03. Module: M6.
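The control flow can be illustrated with a toy sketch. It assumes a scripted `plan_next` function in place of the LLM, whereas a real ReAct agent chooses tools and arguments itself. The tool names, the `severity` field and the escalation rule are hypothetical. Note where each guardrail sits: the step budget bounds the loop, the citation check runs after the agent, and escalation is a condition in code rather than a decision by the model.

```python
from typing import Optional

MAX_STEPS = 5  # bounded ReAct loop (maxSteps)

def clinical_guidelines(query: str) -> dict:
    """Mock RAG-as-tool: returns one retrieved chunk."""
    return {"chunk_id": "guide-12", "text": "Chest pain with ST elevation: immediate referral."}

def patient_history(patient_id: str) -> dict:
    """Mock service tool."""
    return {"patient_id": patient_id, "severity": "high"}

TOOLS = {"clinical_guidelines": clinical_guidelines, "patient_history": patient_history}

def plan_next(observations: list) -> Optional[tuple]:
    """Stand-in for the LLM's choice: (tool, argument), or None to stop."""
    script = [("patient_history", "P-001"), ("clinical_guidelines", "chest pain")]
    return script[len(observations)] if len(observations) < len(script) else None

def run_agent() -> dict:
    observations = []
    for _ in range(MAX_STEPS):
        action = plan_next(observations)
        if action is None:
            break
        tool, arg = action
        observations.append(TOOLS[tool](arg))
    chunk_ids = [o["chunk_id"] for o in observations if "chunk_id" in o]
    # Stand-in for the final LLM answer, citing what was retrieved.
    answer = (f"Immediate referral recommended [{chunk_ids[0]}]." if chunk_ids
              else "No guideline found.")
    # Post-agent guardrail: the answer must cite a retrieved chunk.
    cited = any(f"[{cid}]" in answer for cid in chunk_ids)
    # Structural HITL: decided by data, not by the model's wording.
    severity = next((o["severity"] for o in observations if "severity" in o), "unknown")
    escalate = severity == "high" or not cited
    return {"answer": answer, "escalate": escalate, "steps": len(observations)}

print(run_agent())
```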

1.7 Summary table: patterns → templates

Pattern	Templates that illustrate it	Main module
Linear RAG	09	M1
Batch + structured + rules	02, 04	M2–M5
RAG-as-tool	01, 03, 10	M6, M7
Hard-filters	02, 01, 03, 08	M4
Fan-out	10	M7
Citations enforce	09, 02, 03, 05	M5
Transactional guardrails	01, 06	M6, M9
Multi-index + router	05, 07	M4
HITL	03, 08	M9
MCP	01 (exportable PolicyRAG)	M8

Full map: referencia/plantillas-mapeadas.md.

2. How to read a flow.json

A flow.json is the Flow IR — the source of truth that RAGorbit translates to Python. Contract: docs/01-concepts.md.

2.1 High-level structure

{
  "irVersion": "1.0",
  "flow": { "id", "name", "deploymentTarget", "defaults" },
  "nodes": [ { "id", "type", "label", "config" } ],
  "edges": [ { "source", "sourcePort", "target", "targetPort", "loop?" } ],
  "secrets": [ { "name", "required", "usedBy" } ]
}

2.2 Reading in 5 steps

1. deploymentTarget: defines the runtime (chat-service, batch, or event-worker).
2. Implicit entry node: io.input, io.batch, or io.event-source.
3. Ingestion pipeline (if any): loaders → chunker → metadata → store, run offline.
4. Runtime pipeline: from the entry node to io.output, following the edges.
5. secrets: which credentials you need (never their values in the JSON).
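Steps 1, 2, 4 and 5 can be sketched as a small reader, assuming the structure shown in 2.1 and that the entry node is the one whose type is `io.input`, `io.batch` or `io.event-source`. The sample flow is a hypothetical, trimmed version of template 09: ingestion nodes, labels, configs and edge ports are omitted, and the secret name is invented. The reader walks the edges breadth-first from the entry node, so the resulting order is the runtime pipeline. Nodes that only feed in, such as `llm`, are not on that path.

```python
import json
from collections import deque

ENTRY_TYPES = {"io.input", "io.batch", "io.event-source"}

def read_flow(flow: dict) -> dict:
    nodes = {n["id"]: n for n in flow["nodes"]}
    entry = next(n["id"] for n in flow["nodes"] if n["type"] in ENTRY_TYPES)  # step 2
    succ: dict[str, list[str]] = {}
    for e in flow["edges"]:
        succ.setdefault(e["source"], []).append(e["target"])
    order, seen, queue = [], {entry}, deque([entry])  # step 4: BFS from the entry
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in succ.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return {
        "deploymentTarget": flow["flow"]["deploymentTarget"],     # step 1
        "entry": entry,
        "runtime": [f"{i}:{nodes[i]['type']}" for i in order],
        "secrets": [s["name"] for s in flow.get("secrets", [])],  # step 5
    }

FLOW_09 = json.loads("""{
  "irVersion": "1.0",
  "flow": {"id": "hr-rag", "name": "HR assistant", "deploymentTarget": "chat-service"},
  "nodes": [
    {"id": "chat_input", "type": "io.input"}, {"id": "retriever", "type": "retrieval.vector"},
    {"id": "prompt", "type": "logic.prompt"}, {"id": "citations", "type": "logic.citations"},
    {"id": "chat_output", "type": "io.output"}, {"id": "llm", "type": "model.llm"}
  ],
  "edges": [
    {"source": "chat_input", "target": "retriever"}, {"source": "chat_input", "target": "prompt"},
    {"source": "retriever", "target": "prompt"}, {"source": "retriever", "target": "citations"},
    {"source": "llm", "target": "prompt"}, {"source": "prompt", "target": "citations"},
    {"source": "citations", "target": "chat_output"}
  ],
  "secrets": [{"name": "OPENAI_API_KEY", "required": true, "usedBy": ["llm"]}]
}""")
print(read_flow(FLOW_09)["runtime"])
```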

2.3 Ports and types

Ports must be compatible; a mismatch between connected port types is a common error when reading a flow. Reference:

Port	Data type	Connects with
`Documents`	List of unindexed documents/chunks	`loader` → `ingest` → `store`
`Embeddings`	Embedding function/model	`model.embedding` → `store`
`Retriever`	Searchable object	`store` → `retrieval` / `tool.retriever`
`Query` / `Message`	User text	`io.input` → `retrieval` / `agent`
`Chunks`	Retrieved fragments	`retrieval` → `logic`
`Model`	Configured LLM	`model.llm` → `logic` / `agent`
`Tool`	Invocable function	`tool.*` → `agent`
`Decision`	Structured JSON	`logic.structured` → `logic.rules`
`Any`	Passthrough	`observability` → `io.output`

2.4 Example: trace template 09 HR

chat_input:Message ──▶ retriever:Query
hr_store:Retriever ──▶ retriever:Retriever
retriever:Chunks ──▶ prompt:Chunks
chat_input:Message ──▶ prompt:Message
llm:Model ──▶ prompt:Model
prompt:Message ──▶ citations:Message
retriever:Chunks ──▶ citations:Chunks
citations:Message ──▶ chat_output:Any

Offline ingestion (same design session):

hr_docs:Documents → chunker → hr_store
embedder:Embeddings → hr_store
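The port rules from 2.3 can be checked mechanically against the 09 trace above. Here is a simplified sketch that assumes one rule: an edge is valid when both ports carry the same data type or the target port is `Any`, with `Query` and `Message` treated as the same type (user text), as in the table. RAGorbit's real validator may apply more rules than this.

```python
# Port name -> data type, following the 2.3 table; Query and Message are both user text.
PORT_TYPE = {"Query": "text", "Message": "text", "Documents": "documents",
             "Embeddings": "embeddings", "Retriever": "retriever", "Chunks": "chunks",
             "Model": "model", "Tool": "tool", "Decision": "decision", "Any": "any"}

def compatible(source_port: str, target_port: str) -> bool:
    """Simplified rule (assumption): same data type, or the target accepts Any."""
    return PORT_TYPE[target_port] == "any" or PORT_TYPE[source_port] == PORT_TYPE[target_port]

def validate(edges: list[tuple[str, str]]) -> list[str]:
    errors = []
    for src, dst in edges:
        src_port, dst_port = src.split(":")[1], dst.split(":")[1]
        if not compatible(src_port, dst_port):
            errors.append(f"{src} -> {dst}")
    return errors

EDGES_09 = [
    ("chat_input:Message", "retriever:Query"),
    ("hr_store:Retriever", "retriever:Retriever"),
    ("retriever:Chunks", "prompt:Chunks"),
    ("chat_input:Message", "prompt:Message"),
    ("llm:Model", "prompt:Model"),
    ("prompt:Message", "citations:Message"),
    ("retriever:Chunks", "citations:Chunks"),
    ("citations:Message", "chat_output:Any"),
]
print(validate(EDGES_09))                                # → []
print(validate([("retriever:Chunks", "prompt:Model")]))  # → ['retriever:Chunks -> prompt:Model']
```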

3. How to design a flow.json

3.1 Recommended process

Business brief
    ↓
① Define deploymentTarget (chat? batch? events?)
    ↓
② List inputs/outputs (io.*)
    ↓
③ Design offline ingestion (if there is RAG)
    ↓
④ Design the runtime (pipeline or agent)
    ↓
⑤ Add guardrails, HITL, observability
    ↓
⑥ Validate the contract in RAGorbit
    ↓
⑦ Test with mocks → eval with a dataset

3.2 Decisions by category

Question	Options	Criterion
Agent or pipeline?	`agent.react` vs fixed chain	Is step order unpredictable? → agent
Which store?	Chroma vs pgvector vs multi-index	Chroma: prototype; pgvector: production; multi-index: multiple KBs
When to rerank?	Yes/No	Legal, medical, high precision → yes (M4)
Structured or free prompt?	`logic.structured` vs `logic.prompt`	Does output go to another system? → structured
Where do rules go?	`logic.rules` after the LLM	Whenever there is a business threshold

3.3 Mental template by complexity

Level 1 — Conversational RAG (09): io.input → retrieval → prompt → citations → io.output

Level 2 — Auditable batch (02): io.batch → loaders → chunker → metadata → store → retrieval → structured → rules → io.output

Level 3 — Transactional agent (01): ingestion → tool.retriever + io.input → agent.react ← tools ← guardrails → audit → io.output

4. Anti-patterns

4.1 Delegating thresholds to the LLM

// ❌ Anti-pattern
"logic.structured": { "schema": { "decision": "aprobar|rechazar" } }
// No logic.rules: the LLM decides "aprobar" (approve) with a score of 45

Fix: deterministic logic.rules after logic.structured.

4.2 Filters in the prompt instead of hard-filters

// ❌ "Filter by the PPO plan in your search"
// ✅
"retrieval.vector": { "hardFilters": ["plan", "condition"] }

4.3 Citations only in the system prompt

// ❌ "Always cite your sources" in the system prompt: the LLM may ignore it
// ✅ logic.citations with mode: enforce AFTER the LLM

4.4 HITL decided by the LLM

// ❌ "If the case is serious, say you will escalate to a human"
// ✅ hitl.escalate with structural conditions (severidad = severity, criterio_no_encontrado = criterion not found)

4.5 One monolithic index for everything

Mixing legal playbook + regulations + precedents in a single store.pgvector without router → cross-category noise.

Fix: store.multi-index + retrieval.router (templates 05, 07).

4.6 Conversational agent for high volume

Using agent.react for mass rebookings (thousands of shipments) → unsustainable latency and cost.

Fix: agent.fanout + logic.rules (template 10).

4.7 Guardrails in the agent prompt

// ❌ "Ask for confirmation if the amount exceeds 500" in the agent's system prompt
// ✅ guardrail.confirm wrapping the Payment tool

As the agent works through its maxSteps, it may "forget" an instruction that lives only in the prompt. Structural guardrails are code and run on every call.

4.8 No audit on transactional actions

Charges, flight changes, refunds without observability.audit → regulatory non-compliance.

5. Design checklist

Use this list before considering a flow.json complete:

Business and deployment

deploymentTarget matches the use case (chat / batch / event-worker)
Entry and exit (io.*) defined with correct format (markdown/json/streaming)
Secrets declared without embedded values

Ingestion and retrieval

Chunking strategy justified (by-section, by-clause, etc.)
Sufficient metadata for domain hard-filters
topK and store chosen with a documented trade-off

Generation and logic

Low temperature (0.0–0.2) for factual responses
logic.structured if output goes to another system
logic.rules for business thresholds (not delegated to the LLM)
logic.citations enforce if there are consequences for hallucination

Agents and tools

Bounded maxSteps in agent.react
Tools with clear description for the LLM
RAG-as-tool only when the agent must decide when to retrieve

Production and security

Guardrails on transactional tools (idempotency, confirm, resilience)
hitl.escalate in critical cases (structural)
observability.audit on regulated actions
Eval plan (dataset + minimum metrics)

Technical validation

0 errors when validating in RAGorbit
Testing with mocks responds coherently
Edges with loop: true only where there is a valid ReAct cycle
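Part of this checklist can be automated before validating in RAGorbit. The sketch below assumes node configs use keys named `maxSteps` and `temperature`, and that a secret entry containing a `value` key is a violation. These key names are assumptions for illustration, not RAGorbit's documented schema, and only a few checklist items are covered.

```python
def lint_flow(flow: dict) -> list[str]:
    """Flag a few checklist violations in a parsed flow.json (assumed key names)."""
    issues = []
    for s in flow.get("secrets", []):
        if "value" in s:
            issues.append(f"secret {s['name']}: value embedded in the JSON")
    node_type = {n["id"]: n["type"] for n in flow["nodes"]}
    for n in flow["nodes"]:
        cfg = n.get("config", {})
        steps = cfg.get("maxSteps")
        if n["type"] == "agent.react" and not (isinstance(steps, int) and steps > 0):
            issues.append(f"{n['id']}: maxSteps must be a positive integer")
        if n["type"] == "model.llm" and cfg.get("temperature", 0.0) > 0.2:
            issues.append(f"{n['id']}: temperature above 0.2 for factual answers")
    for e in flow.get("edges", []):
        ends = (node_type[e["source"]], node_type[e["target"]])
        if e.get("loop") and "agent.react" not in ends:
            issues.append(f"loop edge {e['source']} -> {e['target']} outside a ReAct cycle")
    types = set(node_type.values())
    if "logic.structured" in types and "logic.rules" not in types:
        issues.append("logic.structured without logic.rules: thresholds left to the LLM")
    return issues

bad = {
    "nodes": [
        {"id": "agent", "type": "agent.react", "config": {}},
        {"id": "llm", "type": "model.llm", "config": {"temperature": 0.7}},
        {"id": "score", "type": "logic.structured"},
    ],
    "edges": [{"source": "llm", "target": "score", "loop": True}],
    "secrets": [{"name": "OPENAI_API_KEY", "value": "sk-placeholder"}],
}
for issue in lint_flow(bad):
    print(issue)
```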

6. AI system testing (eval as test)

In AI systems, evaluation is system testing — there is no single deterministic output, but there are verifiable properties.

6.1 Test pyramid for RAG/agentic

                    ┌──────────────────┐
                    │ Eval end-to-end  │  ← RAGAS, business cases
                    │ (slow, costly)   │
                    ├──────────────────┤
                    │ Integration      │  ← full graph with mocks
                    │ (pytest + MOCK)  │
                    ├──────────────────┤
                    │ Unit / nodes     │  ← rules, filters, guardrails
                    │ (deterministic)  │
                    └──────────────────┘

6.2 What to test deterministically

Component	Test	Example
`logic.rules`	Fixed input → output	score=72 → "aprobar"
`guardrail.idempotency`	2nd call → deduplicated	M9 lab
hard-filters	Query returns no chunks from another plan	top-k comes only from the filtered file
`logic.citations` enforce	Response without citation → rejected	M5 lab
chunker	N chunks with expected metadata	M2 expected
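The following sketch shows two of these deterministic tests in plain `assert` form; pytest collects any `test_*` function, so no framework import is needed. It assumes a toy `apply_rules` threshold function and an in-memory `IdempotentPayments` guardrail, both written here for illustration. They stand in for the lab's real components.

```python
class IdempotentPayments:
    """Toy idempotency guardrail: a repeated key returns the first result."""

    def __init__(self) -> None:
        self.results: dict[str, dict] = {}
        self.charges = 0  # how many times the real charge ran

    def charge(self, key: str, amount: float) -> dict:
        if key in self.results:
            return self.results[key]
        self.charges += 1
        self.results[key] = {"key": key, "amount": amount, "status": "charged"}
        return self.results[key]

def apply_rules(score: float) -> str:
    return "aprobar" if score >= 70 else "revisar" if score >= 40 else "rechazar"

def test_rules_fixed_input():
    assert apply_rules(72) == "aprobar"
    assert apply_rules(70) == "aprobar"    # boundary belongs to the upper band
    assert apply_rules(69.9) == "revisar"
    assert apply_rules(39) == "rechazar"

def test_idempotency_deduplicates_second_call():
    payments = IdempotentPayments()
    first = payments.charge("pnr-ABC123-change", 120.0)
    second = payments.charge("pnr-ABC123-change", 120.0)
    assert first == second
    assert payments.charges == 1
```

These run in milliseconds and never call an LLM, which is why they can gate every commit.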

6.3 Eval metrics (upper layer)

Metric	What it measures	Tool
Faithfulness	Is the response anchored to context?	RAGAS
Context precision	Are retrieved chunks relevant?	RAGAS
Context recall	Was everything necessary retrieved?	RAGAS
Answer relevance	Does it answer the question?	RAGAS / DeepEval
Tool success rate	Did the agent complete the task?	Custom + LangSmith
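RAGAS estimates these metrics with an LLM judge, and its context precision also rewards ranking relevant chunks first. The following is a simplified, set-based version over hand-labeled chunk ids, offered as a deterministic sanity check rather than a replacement. The chunk ids are hypothetical.

```python
def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Share of retrieved chunks that are relevant (set-based, ignores rank)."""
    return sum(c in relevant for c in retrieved) / len(retrieved) if retrieved else 0.0

def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Share of the needed chunks that were actually retrieved."""
    return len(relevant & set(retrieved)) / len(relevant) if relevant else 1.0

retrieved = ["hr-3", "hr-7", "hr-9", "hr-1"]
relevant = {"hr-3", "hr-4"}
print(context_precision(retrieved, relevant), context_recall(retrieved, relevant))  # → 0.25 0.5
```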

6.4 Eval as CI

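# Recommended pattern: business properties checked like tests (pytest).
# `pipeline` is the system under test (assumed importable); it returns an object
# with `.text` (str) and `.citations` (list of cited section ids).
import pytest

DATASET = [
    # "Vacation after 3 years?" → must state "18 días" (18 days) and cite §3
    {"query": "¿Vacaciones 3 años?", "must_contain": ["18 días"], "must_cite": "§3"},
    # "Stock price?" (out of scope) → must say "no está disponible" (not available)
    {"query": "¿Precio acciones?", "must_contain": ["no está disponible"]},
]

@pytest.mark.parametrize("case", DATASET, ids=lambda c: c["query"])
def test_rag_properties(case):
    result = pipeline(case["query"])
    for expected in case["must_contain"]:
        assert expected in result.text, f"missing {expected!r}"
    if "must_cite" in case:
        assert case["must_cite"] in result.citations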

Rule: deterministic tests (rules, guardrails, filters) go in CI on every commit; eval with real LLM goes in nightly or pre-release.

6.5 Capstone system testing

For your 09→02→01 reconstructions:

09: assert retrieved indices and similarity scores (as in expected.md).
02: assert JSON schema + decision from rules, not from the LLM.
01: assert tool call sequence + idempotency + audit events.

7. Reconstruction path 09 → 02 → 01

Capstone order (plantillas-mapeadas.md § Ruta):

09 HR          02 Banking         01 Airline
~10 nodes      ~12 nodes          ~18 nodes
linear RAG     batch+rules        agent+guardrails+RAG-tool

7.1 Template 09 — what it validates

Full RAG cycle: Documents → Retriever → Chunks → Message.
Mandatory citations post-LLM.
Chroma/in-memory store for prototype.

Lab: lab/solucion_scratch.py — executable reference.

7.2 Template 02 — complexity jump

Adds on top of 09:

io.batch + two loaders
ingest.metadata + hardFilters
logic.structured + logic.rules
pgvector (production)

Key skill: the LLM scores; the rule decides.

7.3 Template 01 — full integration

Adds on top of 02:

agent.react with loop
tool.retriever (PolicyRAG)
4× tool.service
3× guardrails in chain
observability.audit

Key skill: end-to-end transactional agentic design.

8. Checkpoint

You know it if you can:

Draw from memory the flow of 09, 02, and 01 with labeled ports.
Explain when to use linear RAG vs RAG-as-tool vs fan-out.
Design a new flow.json that passes Validate in RAGorbit.
Name 5 anti-patterns and their structural fix.
Define what goes in deterministic CI vs nightly eval for a RAG system.
Rebuild 09 in scratch in < 2 hours without looking at the solution.

12. Layer ③ explained: how to rebuild a template with a framework

This section closes the arc of the layer-③ sections from M1–M6. It does not re-explain each API from scratch; it links to them and shows how to combine them to rebuild an entire template.

12.1 Map of previous ③ layers

Module	Section	What you learned	Template piece
M1	§11	TextLoader, splitter, Chroma, retriever, LCEL	09: RAG core
M2	§10	CharacterTextSplitter, metadata in Document	09/02: chunking
M3	§15	Chroma, FAISS, sentence-transformers	09: store
M4	§13	BM25, hybrid, filters, rerankers	02/01: hard-filters
M5	§10	Structured output, RAGAS, citations	02: JSON + eval
M6	§8	LangGraph, ReAct, tools, memory	01: agent

12.2 Block-by-block walkthrough: `solucion_framework.py` (template 09)

Open lab/solucion_framework.py and follow this map:

Block 1 — Loader (M1 §11.4, M2 §10)

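from langchain_community.document_loaders import TextLoader

loader = TextLoader("datos/politicas_rrhh.txt", encoding="utf-8")
documentos_raw = loader.load()  # a list with one Document for the whole file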

Equivalent to the loader.pdf node, since the document is already plain text. In production with real PDFs, use PyPDFLoader or UnstructuredPDFLoader (M2).

Block 2 — Splitter (M2 §10)

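from langchain_text_splitters import CharacterTextSplitter

# Split on the "---" separators between policy sections (remaining arguments as in the lab)
splitter = CharacterTextSplitter(separator="\n---\n", ...)
chunks = splitter.split_documents(documentos_raw)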

Equivalent to ingest.chunker with strategy: by-section. For by-clause (01, 05): RecursiveCharacterTextSplitter with legal separators.

Block 3 — Embeddings + Store (M1 §11.6–11.7, M3 §15)

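from langchain_chroma import Chroma  # older setups: langchain_community.vectorstores
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vectorstore = Chroma.from_documents(
    documents=chunks, embedding=embeddings, collection_name="hr_policies"
)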

Equivalent to model.embedding → store.chroma. For 02/01 in production: PGVector with connection_string (M3).

Block 4 — Retriever (M1 §11.8, M4 §13)

retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

Equivalent to retrieval.vector with topK: 4. For hard-filters (02):

# Chroma accepts a single top-level condition; combine several with "$and".
retriever = vectorstore.as_retriever(
    search_kwargs={
        "k": 6,
        "filter": {"$and": [{"doc_type": "financial_data"}, {"period": "2023"}]},
    }
)

Block 5 — Prompt + LLM (M1 §11.9)

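from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# SYSTEM and HUMAN are the prompt strings defined in the lab
prompt = ChatPromptTemplate.from_messages([("system", SYSTEM), ("human", HUMAN)])
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2)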

Equivalent to model.llm + logic.prompt.

Block 6 — LCEL chain (M1 §11.11)

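from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# The dict runs in parallel: the question goes through the retriever (then formatting)
# into "contexto", and unchanged into "pregunta".
rag_chain = (
    {"contexto": retriever | formatear_chunks, "pregunta": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)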

This is the linear chain of 09 without an agent.

Block 7 — Citations enforce (M5 §4)

def enforce_citations(respuesta, docs): ...

In RAGorbit it is a separate logic.citations node. In LangChain you can implement it as a post-processor or as a node in LangGraph.

12.3 Extend to template 02 (banking)

On the 09 skeleton, add:

Second loader — CSVLoader for tabular data (M2).
Metadata — doc.metadata["doc_type"] = ... before indexing (M2).
PGVector — PGVector.from_documents(...) (M3).
Structured output — llm.with_structured_output(CreditDecision) (M5 §10).
Rules — pure Python function post-LLM (M5 §3):

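# CreditDecision is the Pydantic model passed to with_structured_output
# (it defines `score` and `decision`); the thresholds are business rules.
def apply_rules(decision: CreditDecision) -> CreditDecision:
    if decision.score >= 70:
        decision.decision = "aprobar"    # approve
    elif decision.score >= 40:
        decision.decision = "revisar"    # manual review
    else:
        decision.decision = "rechazar"   # reject
    return decision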

12.4 Extend to template 01 (airline)

On top of 02, replace the linear chain with an agent:

PolicyRAG as tool (M6 §4):

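from langchain_core.tools import tool

@tool
def policy_rag(query: str, fare_class: str, route_type: str) -> str:
    """Search airline policy documents, filtered by fare class and route type."""
    # @tool needs the docstring: it becomes the description the LLM sees.
    docs = retriever.invoke(query, filter={"fare_class": fare_class, "route_type": route_type})
    return format_docs(docs)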

Service tools — functions that call mock APIs (M6).
LangGraph (M6 §8) — StateGraph with agent and tools nodes, conditional edge should_continue.
Guardrails — wrappers before registering the tool with the agent (M9):

payment_tool = with_resilience(with_confirm(with_idempotency(raw_payment_tool)))

Audit — callback or node that logs each tool call (M9).
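The wrappers in the composition above are ordinary higher-order functions. The following sketch assumes a payment tool that takes `key` and `amount`, a confirmation callback, and the 500 threshold from the anti-pattern in 4.7. The names match the composition line, but the signatures are hypothetical, and M9's versions may differ.

```python
import time
from typing import Callable

PaymentFn = Callable[[str, float], dict]

def with_idempotency(fn: PaymentFn) -> PaymentFn:
    seen: dict[str, dict] = {}
    def wrapped(key: str, amount: float) -> dict:
        if key not in seen:  # only the first call with a key reaches the tool
            seen[key] = fn(key, amount)
        return seen[key]
    return wrapped

def with_confirm(fn: PaymentFn, confirm: Callable[[float], bool] = lambda amt: False,
                 threshold: float = 500) -> PaymentFn:
    def wrapped(key: str, amount: float) -> dict:
        if amount > threshold and not confirm(amount):
            return {"key": key, "status": "needs_confirmation"}  # tool never runs
        return fn(key, amount)
    return wrapped

def with_resilience(fn: PaymentFn, attempts: int = 3, delay: float = 0.1) -> PaymentFn:
    def wrapped(key: str, amount: float) -> dict:
        for attempt in range(1, attempts + 1):
            try:
                return fn(key, amount)
            except ConnectionError:
                if attempt == attempts:
                    raise
                time.sleep(delay * attempt)  # linear backoff between retries
        raise RuntimeError("attempts must be >= 1")
    return wrapped

calls = []
def raw_payment_tool(key: str, amount: float) -> dict:
    calls.append(key)
    return {"key": key, "status": "charged", "amount": amount}

payment_tool = with_resilience(with_confirm(with_idempotency(raw_payment_tool)))
payment_tool("k1", 120)
payment_tool("k1", 120)
print(len(calls))                         # → 1
print(payment_tool("k2", 900)["status"])  # → needs_confirmation
```

Because the guardrails wrap the tool itself, they apply no matter what the agent's prompt says.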

12.5 Composition diagram

M1 (LCEL, Chroma) ──────┐
M2 (splitters, meta) ───┼──▶ Template 09 (linear RAG)
M3 (stores) ────────────┘
                        │
M4 (filters) ───────────┼──▶ Template 02 (+ structured + rules)
M5 (structured, eval) ──┘
                        │
M6 (LangGraph, tools) ──┼──▶ Template 01 (+ guardrails + audit)
M9 (production) ────────┘

12.6 When to use LangChain vs LangGraph vs LlamaIndex

Framework	Best for	Example template
LangChain LCEL	Linear RAG chains	09
LangGraph	Agents with cycle and state	01, 03
LlamaIndex	Query engines, complex indexes	05 (alternative)
CrewAI / AutoGen	Collaborative multi-agent	10 (M7)

Full table: tecnologias-comparadas.md.
