⚖️

Technologies compared

Compared technologies — RAG & Agentic AI course reference

Decision tables for choosing models, stores, frameworks, and tools against competing alternatives. Each section follows the course polarity: why it exists, when to use it, when NOT to, and what replaces it. Price figures and context window sizes are approx. 2025 — verify in the provider documentation before budgeting.

Audience: Python programmers, no prior RAG/AI knowledge. Anchor each decision to a RAGorbit node (catalogo-nodos.md) and the module where it is covered in depth.

1. LLM models

Table: closed providers (API)

	Claude (Anthropic)	GPT (OpenAI)	Gemini (Google)
Main models	Opus 4.8, Sonnet 4.6, Haiku 4.5	GPT-4o, GPT-4o-mini	Gemini 1.5 Pro, Flash
Context window (approx. 2025)	200K tokens	128K tokens	1M tokens
Strengths	Long reasoning, instruction following, safety	Broad ecosystem, mature tool calling	Huge window, multimodal, Google integration
Output price (approx. 2025)	Opus: ~$15/MTok	GPT-4o: ~$10/MTok	Pro: ~$7/MTok
How to run them	API (`ANTHROPIC_API_KEY`)	API (`OPENAI_API_KEY`)	API (Google AI / Vertex)
Privacy	Data leaves to provider cloud	Same	Same
Offline mode	No	No	No
RAGorbit default	`anthropic:claude-opus-4-8`	configurable	configurable

Table: open-weights models (public weights)

	Llama (Meta)	Mistral	Gemma (Google)
License	Llama 3 Community	Apache 2.0 (Mistral 7B)	Gemma Terms
Models	Llama 3.1 8B / 70B / 405B	Mistral 7B, Mixtral 8x7B, Mistral Large	Gemma 2 2B / 9B / 27B
Window (approx. 2025)	128K (3.1 70B)	128K (Large)	Variable by size
How to run them	Ollama, Hugging Face, vLLM	Ollama, Mistral API, HF	Ollama, Hugging Face
Cost	Infrastructure only (GPU/CPU)	Infrastructure only	Infrastructure only
Privacy	Total if local	Total if local	Total if local

Table: deployment forms

Form	What it is	When	Limitations
Provider API	HTTP call to Claude/OpenAI/Gemini	Fast prototype, maximum quality without GPU	Cost per token, data in cloud
Ollama	Local runtime with one command (`ollama run llama3.1`)	Development without network, confidential data	Lower quality than frontier; GPU recommended
Hugging Face	Model hub + Inference API or self-host	Experiment with open models, embeddings	Self-host requires DevOps; API with limits
vLLM / TGI	High-performance inference server	On-premise production at scale	Requires GPU and operations

RAGorbit node: model.llm · Module: M1 — Fundamentals

How to choose / when to use each

For prototypes where cost does not matter, Claude Opus 4.8 or GPT-4o offer the best reasoning quality with minimal integration effort (one model field in the model.llm node). In production with a flexible budget, Sonnet 4.6 or GPT-4o balance quality and cost; for high volume, Haiku 4.5 or GPT-4o-mini reduce the per-token bill. If data cannot leave your infrastructure — banking contracts, medical records, air-gapped environments — Llama 3.1 70B via Ollama or vLLM is the natural option, assuming you accept lower reasoning quality and GPU cost. Gemini 1.5 Pro stands out when you need a huge context window (1M tokens) for long-context or multimodal, but remember that RAG is usually cheaper and more precise than "stuffing the whole document" into the prompt.

2. Embedding models

Model	Dim	Max tokens	Multilingual	Cost (approx. 2025)	Privacy	Symmetric / asymmetric
`text-embedding-3-small` (OpenAI)	1 536	8 191	Yes	$0.02/1M tokens	External API	Symmetric (with optional prefixes)
`text-embedding-3-large` (OpenAI)	3 072 (reducible)	8 191	Yes	$0.13/1M tokens	External API	Symmetric
`text-embedding-ada-002` (OpenAI)	1 536	8 191	Yes	$0.10/1M tokens	External API	Legacy
`embed-english-v3.0` (Cohere)	1 024	512	No (English)	$0.10/1M tokens	External API	Asymmetric (`search_query` / `search_document`)
`embed-multilingual-v3.0` (Cohere)	1 024	512	Yes (~100 languages)	$0.10/1M tokens	External API	Asymmetric
`BAAI/bge-large-en-v1.5` (local)	1 024	512	No	Free	Total if local	Asymmetric (query/passage prefixes)
`BAAI/bge-m3` (local)	1 024	8 192	Yes	Free	Total if local	Asymmetric
`intfloat/e5-large-v2` (local)	1 024	512	No	Free	Total if local	Asymmetric (`query:` / `passage:`)
`intfloat/multilingual-e5-large` (local)	1 024	512	Yes	Free	Total if local	Asymmetric
`nomic-embed-text-v1` (local)	768	8 192	No	Free	Total if local	Symmetric

RAGorbit node: model.embedding · Module: M3 — Embeddings and stores

How to choose / when to use each

RAG retrieval is almost always asymmetric: the query is short ("vacation days?") and the document is a long paragraph. That is why E5 and BGE with task prefixes (query: / passage:) often beat purely symmetric embeddings on retrieval benchmarks. If you already have an OpenAI API key and want the shortest development time, text-embedding-3-large is the RAGorbit default and works well multilingual. Cohere fits if you already use its reranker or need the API's explicit asymmetric mode. For total privacy or offline mode, local BGE or E5 (via sentence-transformers, Ollama nomic-embed-text) eliminate external calls; you need a GPU to index at scale. Critical rule: the same model must be used at ingest and query; if you change it, re-index everything.

3. Vector stores

Store	Type	Metadata filters	CRUD	Persistence	Practical scale	On-prem	Cloud managed	Main strength
ChromaDB	Open source, embedded	Rich (operators)	add/update/delete native	Local disk	~10M vectors	✅	❌ native	Zero-config, ideal for prototypes
FAISS	Library (Meta)	Manual (external)	Manual	File on disk	100M+	✅	❌	Extreme speed, total control
pgvector	Postgres extension	Full SQL (`WHERE`)	Standard SQL	Postgres	~5M practical	✅	✅ (RDS, Supabase)	Joins, ACID transactions, complex filters
Qdrant	Dedicated vector DB	Very rich payload filtering	REST/gRPC API	Disk + snapshots	100M+	✅ Docker	✅ Qdrant Cloud	Advanced filters, Rust, good performance
Pinecone	SaaS	Metadata filters	API	Managed	Unlimited (SaaS)	❌	✅	Zero-ops, automatic scale
Weaviate	Vector DB + graph	GraphQL + hybrid BM25	API	Disk/cluster	100M+	✅ Docker	✅ WCS	Native hybrid, multimodal
Milvus	Open source enterprise	Rich	API	Distributed cluster	1B+	✅	✅ Zilliz	Massive scale, Attu ecosystem

RAGorbit nodes: store.chroma, store.pgvector, store.qdrant · Module: M3 — Embeddings and stores

How to choose / when to use each

ChromaDB is the first step in almost every course project: zero server, simple CRUD, native metadata filters — perfect for template 09 (HR) and demos. FAISS when you need maximum speed and control the infrastructure yourself, but accept implementing filters and CRUD by hand (anti-pattern: FAISS + complex business filters). pgvector if you already have Postgres: template 02 (Banking) uses it because hard SQL filters are a regulatory requirement and you can JOIN with operational tables. Qdrant balances on-premise production with rich payload filters without adding Postgres. Pinecone for teams that do not want to operate infrastructure and accept SaaS lock-in. Weaviate if you need hybrid search (semantic + BM25) without extra code. Milvus only when you exceed tens of millions of vectors and have a platform team. Anti-patterns: Chroma in production with 50M+ docs; pgvector > 5M without prior benchmarking; Pinecone for convenience without evaluating cost at scale.

4. Chunking strategies

Strategy	Deterministic	Requires structure	Natural metadata	Pros	Cons	Ideal case
Fixed	Yes	No	No	Simple, fast, predictable	Cuts sentences and paragraphs mid-way	Prototype, homogeneous free text
Recursive	Yes	Paragraphs/sentences	No	Robust default; respects text hierarchy	Does not understand legal clauses or ATA sections	Articles, reports, policies (RAGorbit default)
Semantic	No (uses embeddings)	No	No	Variable-size chunks by semantic coherence	Slower and costlier at ingest	Dense narrative texts without clear structure
By-layout	Yes (with Unstructured)	PDF visual structure	Block `tipo`	Preserves tables, titles, lists as units	Requires advanced parser (Unstructured)	Reports with tables, rich PDFs
By-clause / by-section	Yes	Domain structure	`clausula_id`, `ata_chapter`	Exact citability; precise hard filters	Requires knowing the document schema	Contracts, regulations, ATA manuals

RAGorbit node: ingest.chunker · Module: M2 — Ingestion

How to choose / when to use each

Start with recursive (RAGorbit default: chunkSize=1000, overlap=150) unless the domain forces something else. Move to by-clause or by-section when citability is legal or compliance-related: one chunk = one numbered clause or one ATA section (template 05 Legal, template 08 Manufacturing). Use by-layout when the PDF mixes tables, figures, and text and a character splitter destroys meaning — typically with Unstructured as a pre-step. Semantic chunking only when recursive produces incoherent chunks in very long narrative texts and you have GPU budget at ingest. Fixed only for quick prototypes or when documents are already pre-chunked. Overlap > 30% inflates the index without proportional benefit.

5. Ingestion frameworks

Framework	Abstraction	Strengths	Weaknesses	Best for	Avoid if
LangChain loaders	`Document` + 100+ loaders in `langchain-community`	Easy install; integration with LCEL splitters and stores	Extraction quality varies by underlying loader	Simple PDFs, CSV, web; LangChain stack	You need maximum quality on complex PDFs
LlamaIndex readers	`Node` + `llama-hub` readers	Rich metadata by default; multi-format `SimpleDirectoryReader`	Ecosystem separate from LangChain	LlamaIndex projects; mixed directories	You only use LangChain without mixing
Unstructured.io	Typed elements (`Title`, `Table`, `NarrativeText`)	Best parsing of rich PDFs; `hi_res` mode with vision	Slower; `hi_res` requires heavy dependencies or cloud API	Complex tables, multiple columns, figures	PDF is simple plain text
`loader.multimodal` RAGorbit	Integrated pipeline tables→JSON, images→vision	`sectionScheme` (ATA), contract with graph nodes	Vision cost and latency	Technical manuals, policies with photos	Document is text-only

RAGorbit nodes: loader.*, ingest.chunker · Module: M2 — Ingestion

How to choose / when to use each

If your stack is already LangChain/LangGraph (like RAGorbit codegen), LangChain loaders cover 80% of cases with minimal friction. If the project revolves around LlamaIndex indexes and query engines, its readers offer richer metadata from the start. When extraction quality is the bottleneck — legal PDFs with column tables, financial reports — Unstructured before the chunker is worth it even if it adds latency. The loader.multimodal RAGorbit node combines tabular extraction, vision, and sectionScheme in a contract that fits directly with ingest.chunker and hard filters from retrieval.vector.

6. Retrieval and rerankers

Search: dense vs BM25 vs hybrid

Method	Precision	Recall	Latency	When
BM25 (keyword)	High on exact terms	Low on semantics	Very low	IDs, codes, part numbers, proper names
Vector (dense)	Medium-high	High on natural language	Low	Everyday-language questions, synonyms
Hybrid	High	High	Medium	General case in technical + natural domains
GraphRAG	Very high (structure)	Medium	High	Relationships between entities (Neo4j)

List fusion (hybrid)

Method	When to prefer
RRF (Reciprocal Rank Fusion)	Scores on different scales (BM25 and cosine) — recommended default
Weighted sum (`alpha`)	Scores normalized to the same scale; fine control vector vs keyword
Cross-encoder (reranker)	Maximum precision after retrieving noisy top-K

Rerankers

Model	Quality	Latency	Cost (approx. 2025)	When
BGE-reranker-v2	Very high	50–150 ms local	Free	On-premise production, critical domains
Cohere Rerank v3	Very high	100–300 ms API	Pay per use	Fast prototype, Cohere stack
ColBERT	High	20–80 ms	Free	Large scale, efficient late interaction
FlashRank	Medium-high	5–20 ms	Free	Critical latency, edge

RAGorbit nodes: retrieval.vector, retrieval.hybrid, retrieval.reranker · Module: M4 — Retrieval and query

How to choose / when to use each

Pure vector retrieval is enough for HR or homogeneous FAQs; as soon as ATA codes, policy numbers, or exact technical jargon appear, add BM25 and fuse with RRF (do not sum raw scores from incompatible scales). The reranker goes after retrieve: recover noisy top-10 or top-20, the cross-encoder returns precise top-3 (~50–150 ms extra). In legal, medical, or banking, the reranker is almost always justified; in high-volume bots with latency < 1 s, evaluate whether metadata hard filtering already removes noise. BGE-reranker local for privacy; Cohere Rerank if you have no GPU. GraphRAG only when relationships between entities matter as much as text (template 05 Legal with Neo4j).

7. Structured output

Mechanism	Validity guarantee	Cloud APIs	Local models	Automatic retries	Typical use
Tool-calling	High (fine-tuned on frontier)	Yes	Variable	No	OpenAI/Anthropic/Google production
JSON-mode	Medium (valid JSON, not schema)	Yes	Variable	No	Very simple schemas
instructor	High (Pydantic + retries)	Yes	Yes	Yes (`max_retries`)	When tool-calling is unavailable
outlines	Total (formal grammar)	No	Yes	No	Local HF models, critical latency
`with_structured_output` (LangChain)	High (Pydantic)	Yes	Variable	Variable	Pipelines already in LCEL/LangGraph

Criterion	instructor	`with_structured_output`	JSON-mode
Already using LangChain	Less natural	Best	Manual parser
Retries with validation feedback	Native	Variable	No
Strict schema validation	Yes	Yes	No (JSON syntax only)
LangSmith / tracing	Extra callbacks	Native	Manual
Models without tool-calling	With retries	Not available	Only option

RAGorbit node: logic.structured · Module: M5 — Generation and logic

How to choose / when to use each

In RAGorbit pipelines (LangGraph/LCEL), with_structured_output is the most natural option: Pydantic validates shape, integrates with LangSmith, and fits the logic.structured node. Use instructor if you want structured output without coupling to LangChain or need automatic retries with validation error messages. Tool-calling when the model supports it well and the schema is complex — the production path on frontier models. JSON-mode only for simple objects without field validation. outlines exclusively with local Hugging Face models where you need a formal guarantee that output satisfies the grammar. Remember: Pydantic validates shape, not truth — combine with logic.citations and RAGAS faithfulness evaluation. Business thresholds (score >= 70) go in logic.rules, never in the LLM.

8. Evaluation frameworks

Framework	Type	CI/CD integration	Dashboard	Real time	Provider-agnostic	Main metrics
RAGAS	Batch/offline	Yes (via pytest)	No (exports CSV/JSON)	No	Yes	faithfulness, answer relevancy, context precision/recall
TruLens	Instrumentation	Partial	Yes (Streamlit)	Yes	Yes	groundedness, relevance per call
DeepEval	LLM unit tests	Yes (native pytest)	Yes (cloud)	No	Yes	Metrics as tests with `threshold`
promptfoo	Prompt/model evaluation	Yes (CLI/YAML)	Yes (HTML)	No	Yes	A/B comparison of prompts and providers

RAGorbit nodes: logic.citations, observability.feedback · Module: M5 — Generation and logic

How to choose / when to use each

RAGAS is the standard for evaluating a full RAG pipeline in batch before a release or in nightly CI: you need a dataset with question, answer, contexts, and optionally ground_truth. DeepEval turns the same metrics into pytest tests with thresholds — ideal if your team already thinks in "tests that fail the build". TruLens instruments each call in development and shows a real-time dashboard to iterate prompts without exporting datasets. promptfoo shines at comparing models and prompts in parallel (YAML + HTML table) — perfect for deciding between Claude and GPT-4o or between two system prompt versions. In mature production: TruLens (continuous monitoring) + RAGAS (periodic batch evaluation).

9. Agent and multi-agent frameworks

Framework	Mental model	Memory / state	Flow control	Learning curve	Ideal cases
LangGraph	State graph (`StateGraph`, checkpoints)	Checkpointer (memory, SQLite, Postgres)	Maximum — conditional edges, subgraphs, HITL	Medium-high	Transactional agents, supervisor multi-agent, RAGorbit production
CrewAI	Crew = agents + tasks + roles	Crew memory (short/long term)	Medium — declarative orchestration by roles	Low-medium	Agent teams with fixed roles (researcher, writer, reviewer)
AutoGen / AG2	Conversation between agents	Conversational history	Low-medium — emergent from dialogue	Medium	Collaborative prototypes, coding agents, exploration
BeeAI	Modular agents (IBM)	Configurable per agent	Medium	Medium	IBM/watsonx enterprise integration, governed agents
Semantic Kernel	Plugins + planners (Microsoft)	Semantic memory + embeddings	Medium-high — automatic planners	Medium-high	.NET/Azure ecosystem, orchestration with typed plugins

Agent patterns (complement)

Pattern	Flexibility	LLM cost	When
ReAct	High	Medium (N steps)	Starting point — conversational and transactional agents
Plan-and-Execute	Low (fixed plan)	Higher	Long tasks with well-defined steps
Reflexion	High	High (steps + evaluation)	Batch with reliable evaluation function

RAGorbit nodes: agent.react, agent.fanout, tool.* · Modules: M6 — Agents I, M7 — Agents II

How to choose / when to use each

LangGraph is the production framework of the course and RAGorbit: explicit graphs, checkpointing between turns, guardrails as conditional edges, and traceable audit. Start with create_react_agent (M6) and migrate to explicit StateGraph when you need HITL, fan-out, or subgraphs (M7). CrewAI accelerates multi-role prototypes where each agent has a fixed persona — excellent for the comparative rebooking workshop in M7, less fine control than LangGraph in regulated production. AutoGen/AG2 serves to explore emergent conversational dynamics; in transactional customer service the implicit flow makes audit difficult. BeeAI and Semantic Kernel make sense in already-adopted IBM or Microsoft stacks. For a single agent with 2–3 tools, ReAct (agent.react) is enough; multi-agent only when there is real parallelization (template 10 Logistics) or strong domain specialization.

10. MCP vs plugins and proprietary functions

Aspect	MCP (Model Context Protocol)	Plugins / proprietary functions
Standard	Open (Anthropic + ecosystem)	Closed per provider (OpenAI Plugins, Assistants)
Transport	STDIO, Streamable HTTP	Provider proprietary HTTP
Discovery	Client lists tools/resources/prompts from server	Manual definition per integration
Security	Sampling, roots, permission approval	Variable; depends on provider
Portability	One MCP server serves Claude, Cursor, custom agents	Lock-in to provider ecosystem
Implementation	FastMCP (Python), custom servers	Provider SDK
In RAGorbit	Node `tool.mcp`	`tool.service`, `tool.http`

Module: M8 — MCP

How to choose / when to use each

Use MCP when you want to expose tools in a standard, reusable way across different clients (IDE, airline agent, internal copilot) with an explicit permission model — the M8 workshop exposes PolicyRAG as an MCP server with approval of sensitive actions. Proprietary tools (tool.service, tool.http) remain valid for one-off REST integrations that do not need the protocol or portability. MCP does not replace your business APIs: it wraps them in a contract the agent discovers dynamically. In regulated production, combine MCP with guardrail.pre-tool and guardrail.confirm for irreversible actions.

11. Guardrails

Approach	What it offers	Strengths	Weaknesses	When
Guardrails AI	Python validators + guardrails hub (PII, toxicity, schema)	Integrable in pipeline; pre/post LLM validation	Curve for complex custom guardrails	Python-first startups and teams
NeMo Guardrails	Colang DSL + programmatic rails (NVIDIA)	Multi-turn conversation with declarative rails	NVIDIA stack; DSL learning curve	Enterprise environments with NeMo/NVIDIA
Custom (RAGorbit)	`guardrail.*` nodes in the graph	Deterministic, auditable, no external dependency	Must implement each rule	Production with legal/financial consequences

Equivalent RAGorbit nodes

Need	Native node	External alternative
Validate before executing tool	`guardrail.pre-tool`	Guardrails AI validator
User confirmation (payments)	`guardrail.confirm`	NeMo dialog rail
Transactional idempotency	`guardrail.idempotency`	Redis + composite key
Resilience (retry, circuit breaker)	`guardrail.resilience`	tenacity, Istio

RAGorbit nodes: §11 guardrail · Module: M9 — Production and security (pending, see PLAN.md §6 M9)

How to choose / when to use each

The course golden rule: restrictions with legal or financial consequences must be deterministic, not instructions in the prompt. RAGorbit guardrail.* nodes implement that in the graph — the LLM does not decide whether a payment > $500 requires confirmation; the guardrail.confirm node enforces it. Guardrails AI and NeMo Guardrails add content validation layers (PII, toxicity, jailbreaks) useful as a complement, especially in the exploration phase. In banking, airlines, or healthcare, prioritize custom guardrails in the graph + prompt injection tests (M9 workshop); external libraries are accelerators, not substitutes for business logic.

12. Observability

Tool	Approach	LLM traces	Infra metrics	Cost	Open source	LangChain integration
LangSmith	LangChain LLM platform	Excellent (chains, agents, tools)	Basic	SaaS (limited free tier)	No	Native
Langfuse	LLM observability	Complete (prompts, tokens, latency)	Basic	SaaS + self-host	Yes (core)	Good (callbacks)
OpenTelemetry + Phoenix	OTel standard + Arize Phoenix UI	Good (via instrumentation)	Excellent (Prometheus/Grafana)	Free (self-host)	Yes	Via OTel callbacks

RAGorbit nodes: observability.audit, observability.metrics, observability.feedback · Module: M9 — Production and security (pending)

How to choose / when to use each

LangSmith if your entire stack is LangChain/LangGraph and you want to debug chains and agents with minimal configuration — the lowest-friction path in M6+. Langfuse when you need open source or self-hosting with a dashboard of prompts, costs, and latency without lock-in to LangChain. OpenTelemetry + Phoenix (or Jaeger/Grafana) when observability must unify LLM with infrastructure — Kafka throughput, P95 latency, circuit breakers — as in template 10 (Logistics). In RAGorbit, observability.audit publishes tool calls to Kafka/log for regulatory audit; the tools above complement with token visibility and debugging. Combine audit in the graph + Langfuse/LangSmith for development + OTel for production.

13. UIs for RAG and agents

Framework	Paradigm	Chat UI	Deployment	Curve	Best for
Gradio	ML components (`gr.ChatInterface`, `gr.Blocks`)	Native, polished with little code	Hugging Face Spaces, local	Low	RAG demos, internal prototypes, quick chatbots
Streamlit	Reactive script (`st.chat_message`, `st.chat_input`)	Good with widgets	Streamlit Cloud, local	Low	Evaluation dashboards (TruLens), internal tools
Flask (+ FastAPI in RAGorbit)	Traditional API/web	Must build it	Any hosting	Medium	Production, total control, integration with existing systems

RAGorbit nodes: io.input, io.output · Modules: M9 — Production (pending), also covered in IBM syllabus (M1/M5 Flask)

How to choose / when to use each

Gradio is the fastest option for teaching and demonstrating RAG: gr.ChatInterface in ~20 lines connected to your chain. Streamlit shines when the UI is a monitoring or evaluation panel (TruLens dashboard, faithfulness metrics) rather than a production chat. Flask/FastAPI when you need authentication, SSE/WebSocket, rate limiting, and a stable API contract — RAGorbit generates deploymentTarget: chat-service with FastAPI for that. For the M9 workshop (payment with idempotency + guardrail), Gradio is enough to test the flow; in production you migrate the same logic to the FastAPI skeleton from codegen.

14. Production orchestration

Approach	Durability	State	Retry / saga	Ops complexity	When
Temporal	High (durable workflow)	Full workflow history	Native, with compensations	High (Temporal cluster)	Processes of days/weeks, HITL, multi-step approvals
Queues + DB state (Kafka + Postgres)	Medium-high	State in tables/event log	Manual (idempotency, retries)	Medium	High volume event-driven, massive fan-out
Cron + batch	Low	Files/checkpoints	Manual	Low	Nightly indexing, short jobs

RAGorbit mapping

Node	Deployment target	Pattern
`io.trigger`	`temporal`	Long workflows with cron and human waits
`io.event-source`	`event-worker`	Kafka + exactly-once + fan-out (`agent.fanout`)
`io.batch`	`batch`	Scheduled file processing
`io.input`	`chat-service`	Real-time FastAPI SSE/WebSocket

RAGorbit nodes: io.trigger, io.event-source · Module: M9 — Production (pending) · Template: 10-logistics-disruption-rebooking

How to choose / when to use each

Temporal when the flow can last days, includes human steps, and must survive server restarts — banking onboarding, multi-stage medical approvals. Operational complexity is only justified with truly long processes. Kafka + DB state is the template 10 pattern: thousands of disruption events, stateless agent.fanout, idempotency via guardrail.idempotency and exactly-once in the consumer. You do not need Temporal if each event is processed in seconds and state lives in Postgres. Batch with cron to re-index documents at night (templates 02, 04). Practical rule: real-time chat → FastAPI; massive events → Kafka; endless processes with humans → Temporal.

15. RAG vs fine-tuning vs pure prompting

Criterion	Pure prompting	RAG	Fine-tuning
Initial cost	Minimum	Medium (index + embeddings)	High (GPU + data + training)
Cost per query	Prompt tokens	Tokens + retrieval	Tokens only (model already specialized)
Data needed	None (or few-shot)	Updatable documents	500–5 000+ quality Q/A pairs
Knowledge update	Change prompt	Re-index documents	Retrain model
Traceability / citations	No (source hallucinations)	Yes (retrieved chunks)	Not inherently
Privacy	Data in prompt to provider	Documents in own index	High if training locally
Best for	General tasks, format, simple classification	FAQs, manuals, policies, compliance	Brand style, ultra-specialized domain
RAGorbit node	`logic.prompt`	pipeline `loader→store→retrieval→logic`	(external to graph; complements `model.llm`)

Decision tree

Do you have updatable proprietary documents?
  NO → Pure prompting (zero/few-shot)
  YES → RAG

Does RAG + base model give sufficient quality?
  YES → Stay with RAG
  NO → Do you have +1000 quality Q/A pairs?
         NO → Improve prompting / retrieval / reranker
         YES → Consider fine-tuning (+ RAG in mature systems)

Module: M1 — Fundamentals

How to choose / when to use each

Pure prompting solves writing, translation, and classification when the model already knows the topic. As soon as knowledge is private, changing, or must be cited, RAG is the first option — the central pattern of the entire course and the 10 templates. Fine-tuning does not replace RAG: it teaches how to reason or what tone to use, while RAG provides the factual what. The RAG + fine-tuning combination appears in mature healthcare or legal systems, but the course insists on mastering RAG first because it is cheaper, auditable, and iterable. If RAG fails, before fine-tuning improve chunking, metadata, hybrid, and reranker (M2–M4).

Criticisms of the LangChain / LangGraph / LangSmith stack and when NOT to use it

This section does not invalidate the tables above (§5, §7, §9, §12): LangChain/LangGraph remain the reference framework of the course and RAGorbit, but the tri-modal method requires naming criticisms honestly and knowing when another path is healthier.

Why so many criticisms

LangChain grew very fast between 2022 and 2024: from loose utilities to LCEL, to fragmentation into packages (langchain, langchain-core, langchain-community, langchain-openai…), and LangGraph as the agent orchestration layer. That pace left many abstractions, documentation that could not keep up with new versions, and tutorials on the web written for APIs already retired.

Part of the criticism still circulating in forums and posts is historical and outdated: complaints about monolithic LLMChain, imports from langchain.schema, or opaque pre-LCEL/LangGraph agents no longer describe the stack this course uses (LCEL + explicit LangGraph, as of 2025/2026). Another part remains valid: debugging inside composed chains, version churn, the question of whether a framework is needed, LangSmith lock-in, and the weight of the dependency tree. The course polarity applies here too: understand the mechanism in layer ② and choose layer ③ with criteria, not fashion.

Valid criticisms and their nuance

Criticism	What is true	Mitigation / when it does not apply
Over-abstraction and "leaky abstractions"	When something fails in the middle of an LCEL pipeline or a LangGraph node, the stack trace crosses layers (`RunnableSequence`, callbacks, wrappers) and it is hard to see whether the bug is in the model, retriever, or parser.	Build layer ② (scratch) first to know which step fails; in production, explicit graphs (`StateGraph`) instead of opaque chains; logging per node. Does not apply if the pipeline is short (retrieve → prompt → LLM) and you already master it.
Version churn / breaking changes	Between LangChain 0.1 and 0.2+ (and subsequent minors through 2025/2026) import paths, separated packages, and deprecated APIs changed. A `pip install -U` can break CI.	Pin versions in `requirements.txt` or lockfile; follow the style of the active module's `solucion_framework.py`; migrate package by package (`langchain-core` stable, integrations in `langchain-*`). Less pain if you do not update on every release.
Curve: too many ways to do the same thing	Loaders, retrievers, memory, "built-in" agents, and LCEL compete with LangGraph patterns; official documentation improves but third-party examples still show legacy paths.	This course reduces the menu: LCEL for linear RAG (M1 §11), `create_react_agent` → `StateGraph` for agents (M6 §8). If you already master one pattern, do not add another without reason.
"Do you really need a framework?"	For many cases — a chat with 3 tools, RAG with Chroma and an embedding — the provider SDK (`openai`, `anthropic`) + vector store + a 40-line `while` loop is enough. The framework adds composition and ecosystem, not magic.	Use a framework when you change providers often, the pipeline has many steps, or you need checkpointing/HITL (LangGraph). Does not apply to one-shot scripts or teams prioritizing minimal dependencies.
LangSmith: proprietary, limited free tier, observability lock-in	LangSmith is SaaS from LangChain Inc.; the free tier has trace/retention limits (see current provider pricing). The richest traces are optimized for LangChain chains.	LangSmith remains the lowest-friction path if the entire stack is LangChain/LangGraph (consistent with §12). To avoid lock-in: Langfuse (open-source, self-host) or OpenTelemetry + Phoenix/Jaeger. In RAGorbit, `observability.audit` in the graph complements any of them.
Dependency overhead and attack surface	`langchain` + integrations pull dozens of transitive packages; more code = more CVEs to watch and more `pip install` time in CI.	Course layer ② runs on stdlib; layer ③ only where it adds value. In regulated environments, audit `pip audit` / SBOM; consider native SDK + minimal libraries (`chromadb`, `instructor`).

How this course responds to those criticisms

The tri-modal method (PLAN §2, HANDOFF §3) is the main defense: in layer ② you implement retrieval, tool calling, and ReAct loops by hand; in layer ③ you see how the framework reimplements the same thing. If LangChain changes an import or a chain fails in production, you reason about the mechanism, not just the wrapper.

"Frameworks do not do magic" — demonstrated by M1 §11 (Document → splitter → embeddings → vector store → retriever → LCEL) and M6 §8 (your ReAct while mapped to nodes and edges).
You do not depend on the framework to design — the RAGorbit node catalog is agnostic; LangChain is one layer ③ implementation, not the system definition.
Explicit alternatives — §9 already compares LangGraph with CrewAI/AutoGen; hands-on without LangChain (table below) avoid monoculture.
Observability without lock-in — §12 positions Langfuse and OTel as legitimate peers of LangSmith; the course does not assume you pay SaaS to learn.

When LangGraph/LangChain ARE worth it — and when NOT

Situation	Recommendation	Why
Multi-step RAG pipeline (retrieve → rerank → structured output → rules) with provider change	LangChain/LCEL + LangGraph	`Runnable` composition, parsers, and auditable nodes; aligned with RAGorbit codegen
Transactional agent with checkpointing, HITL, subgraphs, fan-out	LangGraph (`StateGraph`)	Explicit flow control — consistent with §9 and M7
Multi-role prototype with fixed personas (researcher, writer)	CrewAI or LangGraph	CrewAI faster for roles; LangGraph if you later need guardrails in the graph
Single script, < 50 lines, one provider, no memory between sessions	Native SDK	Fewer dependencies, direct debugging
Simple RAG (embed → Chroma → top-k → prompt) without complex orchestration	Native SDK or LlamaIndex	See rag-sin-langchain.md; LangChain adds little if you do not compose many steps
Team already standardized on LlamaIndex, Haystack, or Pydantic-AI	That framework	Do not add LangChain on top without a planned migration
Unified LLM + infra observability (Kafka, P95, Prometheus)	OpenTelemetry (+ Phoenix/Jaeger)	§12; LangSmith only covers the LLM layer well
Environment with strict dependency audit / air-gap	Layer ② + minimal SDK	Smaller surface; full framework only if the business requires it

Alternatives by layer

Layer	Option without LangChain (course hands-on)	When to prefer it
RAG (ingest → retrieve → generate)	rag-sin-langchain.md — LlamaIndex, Haystack, native SDK	Project centered on indexes/query engines; minimal deps; IBM M2/M4 comparison
Agents (tools, ReAct, multi-agent)	agentes-sin-langchain.md — CrewAI, AutoGen/AG2, Pydantic-AI, native loop	Declarative roles (CrewAI), emergent dialogue (AutoGen), typed validation (Pydantic-AI), or total control (loop)
Observability	Langfuse, OpenTelemetry (+ Phoenix)	Self-host, no lock-in to LangChain Inc.; see §12
Structured output	`instructor`, `outlines`	Without coupling to `with_structured_output`; see §7

How to choose / when to use each

If you reach this document from scratch, do not interpret the criticisms as "avoid LangChain": the course uses it in layer ③ because RAGorbit codegen and the templates align with LCEL/LangGraph, and because changing providers or composing long pipelines is cheaper with those pieces. Do avoid it (or delay it) when you still do not understand what each step does — there layer ② is mandatory, not optional — or when your organization already chose another framework and mixing two stacks only doubles debt.

For observability, LangSmith in development + Langfuse/OTel in production is a reasonable combination and does not contradict §12. For simple agents, native SDK or Pydantic-AI may suffice; reserve LangGraph for flows that §9 already marks as production (checkpoints, HITL, audit). The course practical rule: understand the mechanism in ②, choose the tool in ③, and be able to name the alternative in the table above before a Reddit post decides for you.

Cross-links

Node catalog (per-node sheet): catalogo-nodos.md

Course plan (modules and "Competes"): PLAN.md — especially §6 and §11

Guides by module:

M1 Fundamentals — LLM, minimal RAG, model choice

M2 Ingestion — loaders, chunking, metadata

M3 Embeddings and stores — indexes, Chroma, FAISS, pgvector

M4 Retrieval and query — hybrid, rerank, GraphRAG

M5 Generation and logic — structured output, evaluation

M6 Agents I — ReAct, memory, LangGraph

Industry templates: examples/

Flow IR contract: docs/01-concepts.md

Technical node catalog: docs/02-node-catalog.md

RAGorbit course reference document. Generated for modular study: read it alongside the active module guide and return here when you need to compare alternatives.

← Back to course View on GitHub →