M11 · Architecture and Capstone

Module 11 — Week 10. Integrates the full curriculum: cross-cutting patterns, flow.json design, anti-patterns, evaluation as system testing, and the capstone (rebuild templates + design + exam).

RAGorbit nodes: all (13 categories). Capstone templates: 09 → 02 → 01.


Table of Contents

  1. Cross-cutting patterns
  2. How to read a flow.json
  3. How to design a flow.json
  4. Anti-patterns
  5. Design checklist
  6. AI system testing (eval as test)
  7. Reconstruction path 09 → 02 → 01
  8. Checkpoint
  9. Layer ③ explained: how to rebuild a template with a framework

1. Cross-cutting patterns

These patterns appear across multiple templates and modules. Mastering them is the expert criterion for the course (PLAN.md §1).

1.1 RAG-as-tool

What it is: retrieval is not a fixed pipeline step — it is a tool the agent invokes when it needs documentary evidence.

Fixed pipeline (09 HR):     io.input → retrieval.vector → prompt → output
RAG-as-tool (01 Airline):   agent.react ──invokes──▶ tool.retriever (PolicyRAG)

When to use:

  • The agent must decide whether and when to consult documents (flight change: only after obtaining fare_class from the PNR).
  • There are multiple sources and the agent chooses filters dynamically.

When NOT to:

  • Every question targets the same corpus (HR) → a linear pipeline is simpler and more auditable.

RAGorbit node: tool.retriever. Templates: 01, 03, 10.

Scratch implementation: function policy_rag(query, fare_class, route_type) wrapped in a TOOLS dict.

Framework implementation: LangChain @tool that internally calls retriever.invoke() with filters.
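
A minimal scratch sketch of this pattern, under stated assumptions: the corpus contents, the keyword-overlap scoring (a stand-in for vector search), and the `call_tool` dispatcher are hypothetical. Only the `policy_rag(query, fare_class, route_type)` signature and the `TOOLS` dict come from the description above.

```python
# Hypothetical in-memory policy corpus; a real system would query a vector store.
POLICY_CHUNKS = [
    {"text": "Flex fares allow free changes up to 2 hours before departure.",
     "fare_class": "flex", "route_type": "domestic"},
    {"text": "Basic fares pay a change fee on domestic routes.",
     "fare_class": "basic", "route_type": "domestic"},
    {"text": "Flex fares on international routes allow one free change.",
     "fare_class": "flex", "route_type": "international"},
]


def policy_rag(query: str, fare_class: str, route_type: str) -> list[str]:
    """Return policy texts for one fare class and route type, best match first."""
    candidates = [
        c for c in POLICY_CHUNKS
        if c["fare_class"] == fare_class and c["route_type"] == route_type
    ]
    words = set(query.lower().split())
    # Keyword overlap stands in for embedding similarity in this sketch.
    ranked = sorted(
        candidates,
        key=lambda c: len(words & set(c["text"].lower().split())),
        reverse=True,
    )
    return [c["text"] for c in ranked]


# The agent sees tools by name and decides if and when to call each one.
TOOLS = {"policy_rag": policy_rag}


def call_tool(name: str, **arguments):
    """Dispatch a tool call emitted by the agent loop (hypothetical helper)."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)


result = call_tool(
    "policy_rag", query="change fee", fare_class="basic", route_type="domestic"
)
```

The point of the design is that retrieval happens only when the agent emits a `policy_rag` call, and the filter arguments come from data the agent gathered earlier (for example, `fare_class` from the PNR).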


1.2 Hard-filter-as-guardrail

What it is: metadata filters applied in the store/SQL, not as a suggestion to the LLM.

❌ Bad:  "Search only documents from the PPO plan" in the system prompt
✅ Good: retrieval.vector with hardFilters: [plan, condition] → SQL WHERE plan='PPO'

Why it matters: semantic embedding can retrieve chunks from another plan if the text is similar. Hard-filters are a precision guardrail — the system cannot violate them through hallucination.

When to use: regulated domains (healthcare, banking, aviation, legal) where mixing contexts has consequences.

Node: retrieval.vector with hardFilters, or prior ingest.metadata.

Templates: 02 (doc_type/period), 01 (fare_class/route_type), 03, 08.
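
The effect can be sketched in a few lines of plain Python. This is an illustration under assumptions, not RAGorbit's implementation: the chunk ids, the two-dimensional vectors, and the `search` helper are hypothetical. The filter runs before ranking, so a chunk from another plan never reaches the top-k even when it is the closest semantic match.

```python
import math

# Hypothetical chunks with precomputed 2-d embeddings (illustrative values only).
CHUNKS = [
    {"id": "ppo-1", "plan": "PPO", "vec": [0.9, 0.1]},
    {"id": "hmo-1", "plan": "HMO", "vec": [1.0, 0.0]},  # closest to the query, wrong plan
    {"id": "ppo-2", "plan": "PPO", "vec": [0.2, 0.8]},
]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def search(query_vec, hard_filters, top_k=2):
    # 1) Hard filter first: non-matching chunks never reach the ranking step.
    allowed = [
        c for c in CHUNKS
        if all(c.get(key) == value for key, value in hard_filters.items())
    ]
    # 2) Semantic ranking only over what the filter allowed.
    allowed.sort(key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [c["id"] for c in allowed[:top_k]]


ids = search([1.0, 0.0], {"plan": "PPO"})
```

In a real store the same guarantee comes from the `WHERE` clause or the store's metadata filter rather than from a Python list comprehension.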


1.3 Deterministic vs LLM

Principle: the LLM reasons; business rules decide.

Task | Who does it | Node
Score credit risk | LLM (score 0–100) | logic.structured
Approve/reject by threshold | Deterministic rule | logic.rules
Classify P1/P2/P3 in disruption | Deterministic rule | logic.rules
Draft justification | LLM | logic.structured / logic.prompt
Charge with idempotency | Guardrail (code) | guardrail.idempotency

Anti-pattern: asking the LLM for "decision": "aprobar" and trusting it without logic.rules.

Templates: 02 (credit thresholds), 10 (simple vs complex track), 04 (deductible).


1.4 Fan-out at scale

What it is: process thousands of events in parallel with stateless sub-agents, not a single conversational agent.

io.event-source (Kafka)
    → logic.rules (segment)
    → logic.router (simple | complex)
    → agent.fanout (concurrency: 16)
        ├── tool.retriever (PolicyRAG)
        ├── tool.service (Alternatives)
        └── auto-confirm (rules) vs LLM (complex only)

When to use: high volume, independent events (logistics, mass notifications).

When NOT to: multi-turn conversation with one user → agent.react.

Template: 10-logistics. Module: M7.
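
A minimal sketch of bounded fan-out with `asyncio`, assuming each event carries everything its sub-agent needs. The event fields, the 24-hour routing rule, and the `handle_event` body are hypothetical; only the concurrency limit of 16 and the simple/complex split come from the diagram above.

```python
import asyncio

CONCURRENCY = 16  # mirrors the agent.fanout setting in the diagram


async def handle_event(event: dict) -> dict:
    """Stateless sub-agent: no shared conversation, only the event itself."""
    await asyncio.sleep(0)  # placeholder for retriever / service calls
    # Hypothetical deterministic rule standing in for logic.router.
    track = "complex" if event["delay_hours"] > 24 else "simple"
    return {"id": event["id"], "track": track}


async def fan_out(events: list[dict]) -> list[dict]:
    semaphore = asyncio.Semaphore(CONCURRENCY)

    async def bounded(event: dict) -> dict:
        async with semaphore:  # at most CONCURRENCY events in flight
            return await handle_event(event)

    # gather returns results in the same order as the input events.
    return await asyncio.gather(*(bounded(e) for e in events))


events = [{"id": i, "delay_hours": i % 48} for i in range(100)]
results = asyncio.run(fan_out(events))
```

Because the sub-agents share no state, the same function could be spread across workers; the semaphore only caps how many downstream calls run at once.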


1.5 Mandatory citations

What it is: post-processor that verifies the response anchors claims to retrieved chunks.

Flow:

logic.prompt → Message (LLM response)
                    ↓
logic.citations ← Chunks (from the retriever)
    mode: enforce → rejects/regenerates if there is no citation
                    ↓
               io.output

Why it goes AFTER the LLM: the LLM may omit citing; enforce is code that does not negotiate.

Templates: 09, 02, 03, 05, 07, 08.
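
One way such a post-processor could look, as a sketch under assumptions: the `[n]` citation format, the retry count, and the fallback message are hypothetical and not taken from RAGorbit. The function names are illustrative as well.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")  # assumed citation format: [1], [2], ...
FALLBACK = "I could not find a supported answer in the documents."


def enforce_citations(answer: str, chunks: list[str]) -> str:
    """Return the answer only if it cites retrieved chunks, and nothing else."""
    cited = {int(n) for n in CITATION.findall(answer)}
    if not cited:
        raise ValueError("no citation found")
    valid = set(range(1, len(chunks) + 1))
    if not cited <= valid:
        raise ValueError(f"citation to unknown chunk: {sorted(cited - valid)}")
    return answer


def answer_with_enforcement(generate, chunks: list[str], max_attempts: int = 2) -> str:
    """Regenerate up to max_attempts times, then fail closed with FALLBACK."""
    for _ in range(max_attempts):
        try:
            return enforce_citations(generate(), chunks)
        except ValueError:
            continue  # reject this draft and ask the LLM again
    return FALLBACK
```

Here `generate` is any zero-argument callable that returns a fresh LLM draft. The check itself never calls the model, so it can be unit-tested deterministically.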


1.6 Agentic RAG

What it is: combine ReAct agent with RAG-as-tool + post-agent guardrails.

agent.react
  ├── tool.retriever (clinical guidelines)
  ├── tool.service (patient history)
  └── ...
       ↓
logic.citations (enforce)
       ↓
hitl.escalate (critical cases: structural, not decided by the LLM)

Difference from linear RAG: the agent decides the order of queries and can combine APIs + documents.

Templates: 01, 03. Module: M6.


1.7 Summary table: patterns → templates

Pattern | Templates that illustrate it | Main module
Linear RAG | 09 | M1
Batch + structured + rules | 02, 04 | M2–M5
RAG-as-tool | 01, 03, 10 | M6, M7
Hard-filters | 02, 01, 03, 08 | M4
Fan-out | 10 | M7
Citations enforce | 09, 02, 03, 05 | M5
Transactional guardrails | 01, 06 | M6, M9
Multi-index + router | 05, 07 | M4
HITL | 03, 08 | M9
MCP | 01 (exportable PolicyRAG) | M8

Full map: referencia/plantillas-mapeadas.md.


2. How to read a flow.json

A flow.json is the Flow IR — the source of truth that RAGorbit translates to Python. Contract: docs/01-concepts.md.

2.1 High-level structure

{
  "irVersion": "1.0",
  "flow": { "id", "name", "deploymentTarget", "defaults" },
  "nodes": [ { "id", "type", "label", "config" } ],
  "edges": [ { "source", "sourcePort", "target", "targetPort", "loop?" } ],
  "secrets": [ { "name", "required", "usedBy" } ]
}

2.2 Reading in 5 steps

  1. deploymentTarget: defines the runtime (chat-service, batch, event-worker).
  2. Implicit entry node: io.input, io.batch, or io.event-source.
  3. Ingestion pipeline (if any): loaders → chunker → metadata → store (offline).
  4. Runtime pipeline: from entry to io.output, following edges.
  5. secrets: which credentials you need (never values in the JSON).
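
The five steps can be sketched as a small script. The flow below is a hypothetical, heavily trimmed example (node ids, the secret name, and the `usedBy` value are invented for illustration), and the path walk assumes a linear runtime with one outgoing edge per node.

```python
import json

# Hypothetical, trimmed flow.json; values are illustrative only.
FLOW_JSON = """
{
  "irVersion": "1.0",
  "flow": {"id": "hr-09", "name": "HR assistant", "deploymentTarget": "chat-service"},
  "nodes": [
    {"id": "chat_input", "type": "io.input"},
    {"id": "retriever", "type": "retrieval.vector"},
    {"id": "prompt", "type": "logic.prompt"},
    {"id": "chat_output", "type": "io.output"}
  ],
  "edges": [
    {"source": "chat_input", "sourcePort": "Message", "target": "retriever", "targetPort": "Query"},
    {"source": "retriever", "sourcePort": "Chunks", "target": "prompt", "targetPort": "Chunks"},
    {"source": "prompt", "sourcePort": "Message", "target": "chat_output", "targetPort": "Any"}
  ],
  "secrets": [{"name": "OPENAI_API_KEY", "required": true, "usedBy": ["prompt"]}]
}
"""

ENTRY_TYPES = {"io.input", "io.batch", "io.event-source"}


def summarize(flow: dict) -> dict:
    """Read a flow in the order suggested above: target, entry, path, secrets."""
    node_types = {n["id"]: n["type"] for n in flow["nodes"]}
    entry = [node_id for node_id, t in node_types.items() if t in ENTRY_TYPES]
    # Assumption: one outgoing edge per node, so the runtime path is a chain.
    next_hop = {e["source"]: e["target"] for e in flow["edges"]}
    path, current = [], entry[0]
    while current is not None and current not in path:
        path.append(current)
        current = next_hop.get(current)
    return {
        "target": flow["flow"]["deploymentTarget"],
        "entry": entry,
        "runtime_path": path,
        "secrets": [s["name"] for s in flow["secrets"]],
    }


summary = summarize(json.loads(FLOW_JSON))
```

For flows with branches or agent loops, the chain walk would need to become a graph traversal; the reading order stays the same.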

2.3 Ports and types

Connected ports must carry compatible types; overlooking port types is a common mistake when reading a flow:

Port | Data type | Connects with
Documents | List of unindexed documents/chunks | loader → ingest → store
Embeddings | Embedding function/model | model.embedding → store
Retriever | Searchable object | store → retrieval / tool.retriever
Query / Message | User text | io.input → retrieval / agent
Chunks | Retrieved fragments | retrieval → logic
Model | Configured LLM | model.llm → logic / agent
Tool | Invocable function | tool.* → agent
Decision | Structured JSON | logic.structured → logic.rules
Any | Passthrough | observability → io.output
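
A port check along these lines can be sketched in Python. The compatibility rules here are assumptions inferred from the table (same port name, an `Any` target, or `Message` feeding `Query`), not RAGorbit's actual validator, and the edge list is a hypothetical example with one deliberate mistake.

```python
# Hypothetical edges as (source, source_port, target, target_port).
EDGES = [
    ("chat_input", "Message", "retriever", "Query"),
    ("retriever", "Chunks", "prompt", "Chunks"),
    ("prompt", "Message", "citations", "Message"),
    ("citations", "Message", "chat_output", "Any"),
    ("retriever", "Chunks", "prompt", "Model"),  # deliberately wrong, for the demo
]


def compatible(source_port: str, target_port: str) -> bool:
    """Assumed rules: identical ports match, and 'Any' accepts everything."""
    if target_port == "Any" or source_port == target_port:
        return True
    # Assumption: user text may feed a Query input (Query / Message share a row).
    return (source_port, target_port) == ("Message", "Query")


def port_errors(edges) -> list[str]:
    """List every edge whose ports do not match under the assumed rules."""
    return [
        f"{source}:{sp} -> {target}:{tp}"
        for source, sp, target, tp in edges
        if not compatible(sp, tp)
    ]


errors = port_errors(EDGES)
```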

2.4 Example: trace template 09 HR

chat_input:Message ──▶ retriever:Query
hr_store:Retriever ──▶ retriever:Retriever
retriever:Chunks ──▶ prompt:Chunks
chat_input:Message ──▶ prompt:Message
llm:Model ──▶ prompt:Model
prompt:Message ──▶ citations:Message
retriever:Chunks ──▶ citations:Chunks
citations:Message ──▶ chat_output:Any

Offline ingestion (same design session):

hr_docs:Documents → chunker → hr_store
embedder:Embeddings → hr_store

3. How to design a flow.json

3.1 Recommended process

Business brief
    ↓
① Define deploymentTarget (chat? batch? events?)
    ↓
② List inputs/outputs (io.*)
    ↓
③ Design offline ingestion (if there is RAG)
    ↓
④ Design runtime (pipeline or agent)
    ↓
⑤ Add guardrails, HITL, observability
    ↓
⑥ Validate contract in RAGorbit
    ↓
⑦ Test with mocks → eval with dataset

3.2 Decisions by category

Question | Options | Criterion
Agent or pipeline? | agent.react vs fixed chain | Is step order unpredictable? → agent
Which store? | Chroma vs pgvector vs multi-index | Chroma: prototype; pgvector: production; multi-index: multiple KBs
When to rerank? | Yes/No | Legal, medical, high precision → yes (M4)
Structured or free prompt? | logic.structured vs logic.prompt | Does output go to another system? → structured
Where do rules go? | logic.rules after the LLM | Whenever there is a business threshold

3.3 Mental template by complexity

Level 1 — Conversational RAG (09): io.input → retrieval → prompt → citations → io.output

Level 2 — Auditable batch (02): io.batch → loaders → chunker → metadata → store → retrieval → structured → rules → io.output

Level 3 — Transactional agent (01): ingestion → tool.retriever + io.input → agent.react ← tools ← guardrails → audit → io.output


4. Anti-patterns

4.1 Delegating thresholds to the LLM

// ❌ Anti-pattern
"logic.structured": { "schema": { "decision": "aprobar|rechazar" } }
// No logic.rules: the LLM decides to approve with score 45

Fix: deterministic logic.rules after logic.structured.


4.2 Filters in the prompt instead of hard-filters

// ❌ "Filter by the PPO plan in your search"
// ✅
"retrieval.vector": { "hardFilters": ["plan", "condition"] }

4.3 Citations only in the system prompt

// ❌ "Always cite your sources" in the system prompt: the LLM can ignore it
// ✅ logic.citations with mode: enforce AFTER the LLM

4.4 HITL decided by the LLM

// ❌ "If the case is serious, say you will escalate to a human"
// ✅ hitl.escalate with structural conditions (severidad, criterio_no_encontrado)

4.5 One monolithic index for everything

Mixing legal playbook + regulations + precedents in a single store.pgvector without router → cross-category noise.

Fix: store.multi-index + retrieval.router (template 05, 07).


4.6 Conversational agent for high volume

Using agent.react for mass rebookings (thousands of shipments) → unsustainable latency and cost.

Fix: agent.fanout + logic.rules (template 10).


4.7 Guardrails in the agent prompt

// ❌ "Ask for confirmation if the amount exceeds 500" in the agent's system prompt
// ✅ guardrail.confirm wrapping the Payment tool

The agent consumes maxSteps and may "forget" the instruction. Structural guardrails are code.
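
A structural confirmation guardrail can be sketched as a plain decorator. This is an illustration under assumptions: `with_confirm`, `ConfirmationRequired`, and the `payment` function are hypothetical names, and only the 500 threshold comes from the example above.

```python
from functools import wraps

CONFIRM_THRESHOLD = 500  # amount taken from the example above


class ConfirmationRequired(Exception):
    """Raised when a call needs explicit user confirmation before it may run."""


def with_confirm(tool):
    """Wrap a tool so large amounts cannot be charged without confirmation."""
    @wraps(tool)
    def guarded(amount: float, *, confirmed: bool = False, **kwargs):
        if amount > CONFIRM_THRESHOLD and not confirmed:
            # The check runs in code, whatever the agent's prompt says.
            raise ConfirmationRequired(f"confirm charge of {amount}")
        return tool(amount, **kwargs)
    return guarded


@with_confirm
def payment(amount: float, currency: str = "USD") -> str:
    """Hypothetical payment tool; a real one would call a payment API."""
    return f"charged {amount} {currency}"
```

The agent only ever receives the wrapped `payment`, so no sequence of reasoning steps can reach the raw tool.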


4.8 No audit on transactional actions

Charges, flight changes, refunds without observability.audit → regulatory non-compliance.


5. Design checklist

Use this list before considering a flow.json complete:

Business and deployment

  • deploymentTarget matches the use case (chat / batch / event-worker)
  • Entry and exit (io.*) defined with correct format (markdown/json/streaming)
  • Secrets declared without embedded values

Ingestion and retrieval

  • Chunking strategy justified (by-section, by-clause, etc.)
  • Sufficient metadata for domain hard-filters
  • topK and store chosen with documented trade-off

Generation and logic

  • Low temperature (0.0–0.2) for factual responses
  • logic.structured if output goes to another system
  • logic.rules for business thresholds (not delegated to the LLM)
  • logic.citations enforce if there are consequences for hallucination

Agents and tools

  • Bounded maxSteps in agent.react
  • Tools with clear description for the LLM
  • RAG-as-tool only when the agent must decide when to retrieve

Production and security

  • Guardrails on transactional tools (idempotency, confirm, resilience)
  • hitl.escalate in critical cases (structural)
  • observability.audit on regulated actions
  • Eval plan (dataset + minimum metrics)

Technical validation

  • 0 errors when validating in RAGorbit
  • Testing with mocks responds coherently
  • Edges with loop: true only where there is a valid ReAct cycle

6. AI system testing (eval as test)

In AI systems, evaluation is system testing — there is no single deterministic output, but there are verifiable properties.

6.1 Test pyramid for RAG/agentic

                    ┌─────────────────────┐
                    │  End-to-end eval    │  ← RAGAS, business cases
                    │  (slow, expensive)  │
                    ├─────────────────────┤
                    │  Integration        │  ← full graph with mocks
                    │  (pytest + MOCK)    │
                    ├─────────────────────┤
                    │  Unit / nodes       │  ← rules, filters, guardrails
                    │  (deterministic)    │
                    └─────────────────────┘

6.2 What to test deterministically

Component | Test | Example
logic.rules | Fixed input → output | score=72 → "aprobar"
guardrail.idempotency | 2nd call → deduplicated | M9 lab
hard-filters | Query without chunks from another plan | top-k only from the file
logic.citations enforce | Response without citation → rejected | M5 lab
chunker | N chunks with expected metadata | M2 expected
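
The first row can be sketched as a deterministic test. The `decide` helper is hypothetical; its cut-offs (70 and 40) follow the apply_rules example in section 12.3, and the test functions use plain `assert` so they run with or without pytest.

```python
def decide(score: int) -> str:
    """Threshold rule; cut-offs follow the apply_rules example in section 12.3."""
    if score >= 70:
        return "aprobar"
    if score >= 40:
        return "revisar"
    return "rechazar"


def test_rule_thresholds():
    # Boundary values on each side of both cut-offs.
    cases = {
        0: "rechazar", 39: "rechazar",
        40: "revisar", 69: "revisar",
        70: "aprobar", 72: "aprobar", 100: "aprobar",
    }
    for score, expected in cases.items():
        assert decide(score) == expected, (score, expected)


def test_rule_overrides_llm():
    # The LLM proposed "aprobar" with a score of 45; the rule has the last word.
    llm_output = {"score": 45, "decision": "aprobar"}
    final = {**llm_output, "decision": decide(llm_output["score"])}
    assert final["decision"] == "revisar"


test_rule_thresholds()
test_rule_overrides_llm()
```

Tests like these need no model and no network, which is what makes them suitable for every commit.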

6.3 Eval metrics (upper layer)

Metric | What it measures | Tool
Faithfulness | Is the response anchored to context? | RAGAS
Context precision | Are retrieved chunks relevant? | RAGAS
Context recall | Was everything necessary retrieved? | RAGAS
Answer relevance | Does it answer the question? | RAGAS / DeepEval
Tool success rate | Did the agent complete the task? | Custom + LangSmith
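
To build intuition for the two retrieval metrics, here is a simplified set-based proxy. It is not how RAGAS computes them (RAGAS relies on LLM judgments); this sketch assumes you have hand-labeled relevant chunk ids, and the ids below are invented.

```python
def precision_recall_at_k(
    retrieved: list[str], relevant: set[str], k: int
) -> tuple[float, float]:
    """Precision: share of the top-k that is relevant. Recall: share of relevant found."""
    top = retrieved[:k]
    hits = sum(1 for chunk_id in top if chunk_id in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical labeled case: 2 of the 4 retrieved chunks are relevant,
# and 1 relevant chunk (c5) was never retrieved.
precision, recall = precision_recall_at_k(
    ["c1", "c7", "c3", "c9"], {"c1", "c3", "c5"}, k=4
)
```

Low precision points at noisy retrieval (consider hard-filters or reranking); low recall points at missing content (consider chunking or topK).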

6.4 Eval as CI

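# Pseudocode: recommended pattern. `pipeline` is the RAG entry point under test,
# assumed to return an object with `.text` and `.citations`.
DATASET = [
    # "Vacation with 3 years of service?" → must mention "18 días" and cite §3
    {"query": "¿Vacaciones 3 años?", "must_contain": ["18 días"], "must_cite": "§3"},
    # "Share price?" → out of scope, must say it is not available
    {"query": "¿Precio acciones?", "must_contain": ["no está disponible"]},
]

def test_rag_properties():
    for case in DATASET:
        result = pipeline(case["query"])
        # Property 1: every required phrase appears in the answer.
        assert all(s in result.text for s in case["must_contain"])
        # Property 2: the expected section is cited, when one is required.
        if "must_cite" in case:
            assert case["must_cite"] in result.citations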

Rule: deterministic tests (rules, guardrails, filters) go in CI on every commit; eval with real LLM goes in nightly or pre-release.

6.5 Capstone system testing

For your 09→02→01 reconstructions:

  1. 09: assert indices and similarities (like expected.md).
  2. 02: assert JSON schema + decision from rules, not from the LLM.
  3. 01: assert tool call sequence + idempotency + audit events.
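
For the third item, the assertions could look like the sketch below. The trace format, the tool names other than `policy_rag`, and the idempotency key are hypothetical; in practice the trace would be recorded from a mocked agent run.

```python
def is_subsequence(expected: list[str], actual: list[str]) -> bool:
    """True if the expected tool names appear in actual in that order (gaps allowed)."""
    remaining = iter(actual)
    # `name in remaining` consumes the iterator up to the first match.
    return all(name in remaining for name in expected)


# Hypothetical trace recorded from a mocked run of template 01.
trace = [
    {"tool": "get_pnr", "args": {"pnr": "ABC123"}},
    {"tool": "policy_rag", "args": {"fare_class": "flex", "route_type": "domestic"}},
    {"tool": "payment", "args": {"idempotency_key": "ABC123-change-1"}},
]
called = [step["tool"] for step in trace]

# Sequence: the PNR lookup must precede policy retrieval, which precedes payment.
assert is_subsequence(["get_pnr", "policy_rag", "payment"], called)

# Idempotency: no payment key is used twice within one run.
payment_keys = [s["args"]["idempotency_key"] for s in trace if s["tool"] == "payment"]
assert len(payment_keys) == len(set(payment_keys))
```

Asserting on the order of tool calls rather than on the generated text keeps the test stable across model versions.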

7. Reconstruction path 09 → 02 → 01

Capstone order (plantillas-mapeadas.md § Ruta):

09 HR          02 Banking         01 Airline
~10 nodes      ~12 nodes          ~18 nodes
Linear RAG     batch+rules        agent+guardrails+RAG-tool

7.1 Template 09 — what it validates

  • Full RAG cycle: Documents → Retriever → Chunks → Message.
  • Mandatory citations post-LLM.
  • Chroma/in-memory store for prototype.

Lab: lab/solucion_scratch.py — executable reference.

7.2 Template 02 — complexity jump

Adds on top of 09:

  • io.batch + two loaders
  • ingest.metadata + hardFilters
  • logic.structured + logic.rules
  • pgvector (production)

Key skill: the LLM scores; the rule decides.

7.3 Template 01 — full integration

Adds on top of 02:

  • agent.react with loop
  • tool.retriever (PolicyRAG)
  • tool.service
  • 3× guardrails in chain
  • observability.audit

Key skill: end-to-end transactional agentic design.


8. Checkpoint

You know it if you can:

  • Draw from memory the flow of 09, 02, and 01 with labeled ports.
  • Explain when to use linear RAG vs RAG-as-tool vs fan-out.
  • Design a new flow.json that passes Validate in RAGorbit.
  • Name 5 anti-patterns and their structural fix.
  • Define what goes in deterministic CI vs nightly eval for a RAG system.
  • Rebuild 09 in scratch in < 2 hours without looking at the solution.

12. Layer ③ explained: how to rebuild a template with a framework

This section closes the arc of the ③ layers from M1–M6. It does not re-explain each API from scratch; it links back to the earlier modules and shows how to combine their pieces to rebuild an entire template.

12.1 Map of previous ③ layers

Module | Section | What you learned | Template piece
M1 | §11 | TextLoader, splitter, Chroma, retriever, LCEL | 09: RAG core
M2 | §10 | CharacterTextSplitter, metadata in Document | 09/02: chunking
M3 | §15 | Chroma, FAISS, sentence-transformers | 09: store
M4 | §13 | BM25, hybrid, filters, rerankers | 02/01: hard-filters
M5 | §10 | Structured output, RAGAS, citations | 02: JSON + eval
M6 | §8 | LangGraph, ReAct, tools, memory | 01: agent

12.2 Block-by-block walkthrough: solucion_framework.py (template 09)

Open lab/solucion_framework.py and follow this map:

Block 1 — Loader (M1 §11.4, M2 §10)

loader = TextLoader("datos/politicas_rrhh.txt", encoding="utf-8")
documentos_raw = loader.load()

Equivalent to loader.pdf when the document is already text. In production with real PDFs: PyPDFLoader or UnstructuredPDFLoader (M2).

Block 2 — Splitter (M2 §10)

splitter = CharacterTextSplitter(separator="\n---\n", ...)
chunks = splitter.split_documents(documentos_raw)

Equivalent to ingest.chunker with strategy: by-section. For by-clause (01, 05): RecursiveCharacterTextSplitter with legal separators.

Block 3 — Embeddings + Store (M1 §11.6–11.7, M3 §15)

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings, collection_name="hr_policies")

Equivalent to model.embedding → store.chroma. For 02/01 in production: PGVector with connection_string (M3).

Block 4 — Retriever (M1 §11.8, M4 §13)

retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

Equivalent to retrieval.vector with topK: 4. For hard-filters (02):

retriever = vectorstore.as_retriever(
    search_kwargs={"k": 6, "filter": {"doc_type": "financial_data", "period": "2023"}}
)

Block 5 — Prompt + LLM (M1 §11.9)

prompt = ChatPromptTemplate.from_messages([("system", SYSTEM), ("human", HUMAN)])
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2)

Equivalent to model.llm + logic.prompt.

Block 6 — LCEL chain (M1 §11.11)

rag_chain = (
    {"contexto": retriever | formatear_chunks, "pregunta": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

This is the linear chain of 09 without an agent.

Block 7 — Citations enforce (M5 §4)

def enforce_citations(respuesta, docs): ...

In RAGorbit it is a separate logic.citations node. In LangChain you can implement it as a post-processor or as a node in LangGraph.

12.3 Extend to template 02 (banking)

On the 09 skeleton, add:

  1. Second loader: CSVLoader for tabular data (M2).
  2. Metadata: doc.metadata["doc_type"] = ... before indexing (M2).
  3. PGVector: PGVector.from_documents(...) (M3).
  4. Structured output: llm.with_structured_output(CreditDecision) (M5 §10).
  5. Rules: pure Python function post-LLM (M5 §3):
def apply_rules(decision: CreditDecision) -> CreditDecision:
    if decision.score >= 70:
        decision.decision = "aprobar"
    elif decision.score >= 40:
        decision.decision = "revisar"
    else:
        decision.decision = "rechazar"
    return decision

12.4 Extend to template 01 (airline)

On top of 02, replace the linear chain with an agent:

  1. PolicyRAG as tool (M6 §4):
@tool
def policy_rag(query: str, fare_class: str, route_type: str) -> str:
    docs = retriever.invoke(query, filter={"fare_class": fare_class, "route_type": route_type})
    return format_docs(docs)

  2. Service tools: functions that call mock APIs (M6).

  3. LangGraph (M6 §8): StateGraph with agent and tools nodes, conditional edge should_continue.

  4. Guardrails: wrappers before registering the tool with the agent (M9):
payment_tool = with_resilience(with_confirm(with_idempotency(raw_payment_tool)))

  5. Audit: callback or node that logs each tool call (M9).
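
Steps 4 and 5 can be sketched together with plain decorators. This is a simplified illustration, not the M9 lab code: the wrapper bodies, the in-memory `AUDIT_LOG`, and the `raw_payment_tool` stub are hypothetical, and only idempotency and audit are shown (confirmation and resilience wrappers would be composed the same way).

```python
from functools import wraps

AUDIT_LOG = []  # stand-in for a real audit sink
charges = []    # side effects of the hypothetical payment backend


def with_idempotency(tool):
    """Run the tool once per idempotency key; replay the first result on retries."""
    seen = {}

    @wraps(tool)
    def guarded(idempotency_key: str, **kwargs):
        if idempotency_key not in seen:
            seen[idempotency_key] = tool(**kwargs)
        return seen[idempotency_key]
    return guarded


def with_audit(tool):
    """Record every call, including deduplicated retries."""
    @wraps(tool)
    def guarded(*args, **kwargs):
        result = tool(*args, **kwargs)
        AUDIT_LOG.append({"tool": tool.__name__, "kwargs": kwargs, "result": result})
        return result
    return guarded


def raw_payment_tool(amount: float) -> str:
    """Hypothetical payment call; appends to `charges` to expose double charging."""
    charges.append(amount)
    return f"charge-{len(charges)}"


payment_tool = with_audit(with_idempotency(raw_payment_tool))

first = payment_tool(idempotency_key="booking-42", amount=120.0)
retry = payment_tool(idempotency_key="booking-42", amount=120.0)
```

The retry returns the first result without charging again, while the audit log still records both attempts. Only `payment_tool` is registered with the agent, never `raw_payment_tool`.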

12.5 Composition diagram

M1 (LCEL, Chroma) ──────┐
M2 (splitters, meta) ───┼──▶ Template 09 (linear RAG)
M3 (stores) ────────────┘
                        │
M4 (filters) ───────────┼──▶ Template 02 (+ structured + rules)
M5 (structured, eval) ──┘
                        │
M6 (LangGraph, tools) ──┼──▶ Template 01 (+ guardrails + audit)
M9 (production) ────────┘

12.6 When to use LangChain vs LangGraph vs LlamaIndex

Framework | Best for | Example template
LangChain LCEL | Linear RAG chains | 09
LangGraph | Agents with cycle and state | 01, 03
LlamaIndex | Query engines, complex indexes | 05 (alternative)
CrewAI / AutoGen | Collaborative multi-agent | 10 (M7)

Full table: tecnologias-comparadas.md.

