RAG without LangChain — building the same HR assistant with competing technologies
RAGorbit course reference. This document teaches how to build the same RAG from the M1 workshop (HR policy assistant, template 09-hr-policy-assistant) using LangChain alternatives: LlamaIndex, provider native SDK + Chroma, and Haystack. The pedagogy mirrors the course sections "Layer ③ explained": bridge table, API by API, block-by-block walkthrough, when to use / when NOT to, and gotchas.
Audience: Python programmers who have already completed layer ② (01-fundamentos/lab/solucion_scratch.py) and want to be full AI engineers, not only lang* experts.
Reference query throughout this document: ¿Cuántos días de vacaciones me corresponden si llevo 3 años en la empresa? ("How many vacation days am I entitled to if I have been with the company for 3 years?")
Expected answer (per §3 of the policies): 18 días hábiles de vacaciones (18 working days of vacation).
Introduction: why learn RAG without LangChain
In M1 you learned LangChain because it is RAGorbit's orchestration framework for codegen and the most documented ecosystem. But mastering LangChain is not mastering RAG. RAG is a four-step pattern (index → retrieve → augment prompt → generate) that exists before and after any framework.
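The four steps can be sketched without any framework. The block below is a minimal illustration, not the course's solucion_scratch.py: the function names (index, retrieve, augment, generate), the three sample chunks, and the stub LLM are assumptions made up for this example, and the bag-of-words embed() only stands in for a real embedding model.

```python
import math
import re
from collections import Counter

# Hypothetical sample chunks (placeholders, not the real HR policy file).
CHUNKS = [
    "Vacation policy: after 3 years employees get 18 working days of vacation.",
    "Remote work policy: employees may work from home two days per week.",
    "Expense policy: submit receipts within 30 days.",
]


def embed(text: str) -> Counter:
    """Bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Step 1: embed every chunk once (offline phase)."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(store: list[tuple[str, Counter]], query: str, k: int = 3) -> list[str]:
    """Step 2: rank chunks by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def augment(query: str, context: list[str]) -> str:
    """Step 3: build the augmented prompt with numbered fragments."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(context, start=1))
    return f"Context:\n{numbered}\n\nQuestion: {query}"


def generate(prompt: str) -> str:
    """Step 4: stub LLM; a real system would call a provider API here."""
    return f"(stub LLM received a prompt of {len(prompt)} characters)"


store = index(CHUNKS)
top = retrieve(store, "How many vacation days after 3 years?", k=2)
print(generate(augment("How many vacation days after 3 years?", top)))
```

Every framework in this document replaces one or more of these four functions; none of them changes the order of the steps.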
Learning the alternatives gives you three concrete advantages:
- Framework independence. If tomorrow your company adopts LlamaIndex, or decides to drop intermediate layers and call APIs directly, you do not start from zero.
- Engineering judgment. You will know when a framework adds value (abstractions, composition, observability) and when it is just dead weight in an 80-line script.
- Debugging. When a pipeline fails in production, the error is usually in embeddings, chunks, or the prompt — not in the LangChain import. Understanding each piece without the framework lets you isolate the problem.
WHAT YOU ALREADY KNOW (layer ②)      WHAT YOU WILL LEARN HERE
────────────────────────────────     ─────────────────────────────────────
cargar_chunks() + embed()            The same with LlamaIndex, native SDK, or Haystack
recuperar() + construir_prompt()     Same HR case, same query, different tools
main() orchestrating everything      When each approach wins or loses vs LangChain
Global bridge table: layer ② → each technology
This table maps each function from solucion_scratch.py to its equivalent in LangChain (M1 §11), LlamaIndex, native SDK, and Haystack:
| What you did by hand (layer ②) | LangChain (M1 §11) | LlamaIndex (§1) | Native SDK + Chroma (§2) | Haystack (§3) |
|---|---|---|---|---|
| cargar_chunks(ruta): read txt and split by --- | TextLoader + CharacterTextSplitter | Document + manual split or SentenceSplitter | open() + re.split(r"\n---\n", ...) (stdlib) | Document + manual split at index time |
| embed(texto): bag-of-words → dict | OpenAIEmbeddings | Settings.embed_model / OpenAIEmbedding | SentenceTransformer.encode() or embeddings API | SentenceTransformersDocumentEmbedder |
| chunks list in memory | Chroma.from_documents(...) | VectorStoreIndex.from_documents(...) | collection.upsert(...) | document_store.write_documents(...) |
| similitud_coseno() + sort | as_retriever(search_kwargs={"k": 3}) | as_query_engine(similarity_top_k=3) / as_retriever | collection.query(n_results=3) | InMemoryEmbeddingRetriever |
| recuperar() → (índice, sim, texto) | retriever.invoke(query) → list[Document] | retriever.retrieve(query) → list[NodeWithScore] | resultados["documents"] + distances | retriever.run(query_embedding=...) → documents |
| construir_prompt(): f-string | ChatPromptTemplate | PromptTemplate + text_qa_template | f-string / manual str.format | PromptBuilder (Jinja2 template) |
| (no LLM in scratch) | ChatOpenAI / ChatAnthropic | Settings.llm / Anthropic | anthropic.Anthropic().messages.create(...) | OpenAIGenerator / AnthropicChatGenerator |
| main() orchestrating | LCEL chain with \| | query_engine.query(...) | Sequential responder(query) function | Pipeline.run(...) |
RAGorbit nodes from template 09: loader → ingest.chunker → model.embedding → store.chroma → retrieval.vector → logic.prompt → model.llm. The four approaches in this document implement that same chain with different tools.
Environment: on the course study machine there is no pip or network (HANDOFF.md §5). The framework code in this document is ILLUSTRATIVE: each block has a header # Requiere: pip install ... ("Requires"). Run it in your environment when you have packages and API keys.
1. LlamaIndex (the main RAG alternative)
1.1 What LlamaIndex is and how it differs from LangChain
LlamaIndex (formerly GPT Index) is a Python framework focused on data + queries: load documents, build indexes, retrieve context, and answer questions. It was born as "the RAG library" before RAG went mainstream.
Mental model difference:
| Aspect | LangChain | LlamaIndex |
|---|---|---|
| Central unit | Composable Runnable with \| (LCEL) | Index (VectorStoreIndex, etc.) + query engine |
| Main strength | General orchestration (RAG, agents, tools, LCEL) | RAG pipelines, indexes, query engines, agents over indexes |
| Document abstraction | Document(page_content=..., metadata=...) | Document(text=..., metadata=...) |
| Retrieval | vectorstore.as_retriever().invoke(query) | index.as_retriever() or index.as_query_engine() |
| Generation | You wire retriever + prompt + LLM in LCEL | as_query_engine() integrates retrieve + prompt + LLM in one object |
| Ecosystem | LangGraph, LangSmith, 100+ integrations | LlamaHub readers, specialized indexes, LlamaParse |
Analogy: LangChain is a box of universal connectors (plugs for everything). LlamaIndex is a semantic search engine factory with a query accelerator (query_engine) that already includes the most common RAG wiring.
LANGCHAIN (M1)                        LLAMAINDEX (this §1)
────────────────                      ─────────────────────────────────
TextLoader → Splitter → Chroma        Document → VectorStoreIndex
→ as_retriever → LCEL chain           → as_query_engine → .query()
You wire each step                    The query engine wires retrieve+prompt+LLM
Version note (2025/2026): since LlamaIndex 0.10, ServiceContext is deprecated; in 0.11 it was removed. Use the global singleton Settings or pass embed_model / llm directly to local constructors. If you see old tutorials with ServiceContext, they are obsolete.
1.2 Bridge table: scratch → LlamaIndex
| What you did by hand (layer ②) | LlamaIndex piece (layer ③) | RAGorbit node (template 09) |
|---|---|---|
| cargar_chunks(ruta) | Document(text=...) per fragment (manual split by \n---\n) | loader + ingest.chunker |
| embed(texto) | Settings.embed_model = OpenAIEmbedding(...) | model.embedding |
| In-memory list + vectors | VectorStoreIndex.from_documents(docs) | store.chroma (conceptually) |
| recuperar() top-3 | index.as_retriever(similarity_top_k=3) | retrieval.vector |
| construir_prompt() + LLM | index.as_query_engine(similarity_top_k=3, text_qa_template=...) | logic.prompt + model.llm |
| main() | query_engine.query(pregunta) | edges of flow.json |
1.3 The Document object
LlamaIndex uses Document with the text field (not page_content like LangChain):
from llama_index.core import Document
doc = Document(
text="POLÍTICA DE VACACIONES §3 — Acumulación y disfrute\nLos empleados...",
metadata={"source": "datos/politicas_rrhh.txt", "seccion": "§3"},
)
- text: the fragment content (equivalent to each string in your scratch chunks list).
- metadata: tags for later filters (M4). In HR you could add {"tipo": "vacaciones"}.
Indexes consume list[Document] and convert them internally into nodes (TextNode) with embeddings.
1.4 Settings — replacement for ServiceContext
In modern LlamaIndex, global configuration lives in Settings:
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.anthropic import Anthropic
# Embeddings — equivalent to OpenAIEmbeddings in LangChain
Settings.embed_model = OpenAIEmbedding(
model="text-embedding-3-small",
# api_key is read from OPENAI_API_KEY
)
# LLM — equivalent to ChatAnthropic in LangChain (RAGorbit default)
Settings.llm = Anthropic(
model="claude-opus-4-8",
temperature=0.2,
)
| Settings attribute | What it controls | Scratch / LangChain equivalent |
|---|---|---|
| Settings.embed_model | Global embedding model | embed() / OpenAIEmbeddings |
| Settings.llm | Global generation model | LLM stub / ChatAnthropic |
| Settings.chunk_size | Maximum chunk size (if you use automatic splitters) | chunk_size of CharacterTextSplitter |
| Settings.chunk_overlap | Overlap between chunks | chunk_overlap of the splitter |
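To make chunk_size and chunk_overlap concrete, here is a minimal sliding-window sketch. It is an assumption-laden illustration: split_with_overlap is a hypothetical helper, it measures size in characters, and it ignores sentence boundaries, whereas the framework splitters typically count tokens and try to respect sentence breaks. Only the windowing mechanism is the same.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

In the HR case the policies are already separated by ---, so the manual split is enough and these two settings only matter if you switch to an automatic splitter.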
Local alternative (HR privacy):
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
Settings.llm = Ollama(model="llama3.1", request_timeout=120.0)
1.5 VectorStoreIndex.from_documents
The VectorStoreIndex is the most used index in LlamaIndex: it converts documents into embeddings and enables semantic search.
from llama_index.core import VectorStoreIndex
# documentos: list[Document] — the 8 HR policy fragments
index = VectorStoreIndex.from_documents(
documentos,
show_progress=True,
)
# Under the hood: embed_documents → store vectors → index ready to query
What .from_documents does internally (offline phase):
documentos (8 Document)
│
├──▶ Settings.embed_model.get_text_embedding_batch([doc.text for doc in docs])
│ → 8 dense vectors
│
└──▶ In-memory vector index (or in Chroma if you use StorageContext — §1.8)
Equivalent to your loop for chunk in chunks: embed(chunk) + store in memory, but with real semantic embeddings.
1.6 as_query_engine — retrieve + prompt + LLM in one
The query engine is LlamaIndex's distinctive piece. In LangChain you wire retriever + prompt + LLM with LCEL; in LlamaIndex:
from llama_index.core import PromptTemplate
# Template equivalent to construir_prompt() from scratch
QA_TEMPLATE = PromptTemplate(
"Eres el asistente de RRHH de la empresa. "
"Responde ÚNICAMENTE basándote en los fragmentos de política proporcionados.\n\n"
"Fragmentos relevantes:\n{context_str}\n\n"
"Pregunta del empleado: {query_str}\n\n"
"Responde en markdown con lenguaje claro y sencillo."
)
query_engine = index.as_query_engine(
similarity_top_k=3, # top-3, like k=3 in recuperar()
text_qa_template=QA_TEMPLATE,
)
response = query_engine.query(
"¿Cuántos días de vacaciones me corresponden si llevo 3 años en la empresa?"
)
print(response.response) # final LLM text
# response.source_nodes # retrieved nodes (for inspection / citations)
| Parameter | Meaning | Scratch equivalent |
|---|---|---|
| similarity_top_k=3 | How many fragments to retrieve | k=3 in recuperar() |
| text_qa_template | Template with {context_str} and {query_str} | construir_prompt() |
| response_mode | "compact", "tree_summarize", etc. | How it condenses long context (default "compact" is enough for HR) |
Important prediction: with real semantic embeddings, §3 ("Después de 3 años… 18 días") usually ranks first — not §4 as in scratch bag-of-words. The mechanism is identical; vector quality changes.
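What a query engine bundles can be sketched in a few lines of plain Python. This is not LlamaIndex's implementation: TinyQueryEngine, ScoredNode, fake_retrieve, and echo_llm are hypothetical names, and joining the fragments with blank lines is an assumption about how the context is assembled. The point is only that retrieve, fill {context_str} / {query_str}, and call the LLM happen inside one query() call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScoredNode:
    text: str
    score: float


@dataclass
class TinyQueryEngine:
    retrieve: Callable[[str, int], list[ScoredNode]]  # (question, k) -> nodes
    llm: Callable[[str], str]                         # prompt -> answer
    template: str                                     # uses {context_str} and {query_str}
    similarity_top_k: int = 3

    def query(self, question: str) -> tuple[str, list[ScoredNode]]:
        nodes = self.retrieve(question, self.similarity_top_k)
        context_str = "\n\n".join(node.text for node in nodes)
        prompt = self.template.format(context_str=context_str, query_str=question)
        return self.llm(prompt), nodes


def fake_retrieve(question: str, k: int) -> list[ScoredNode]:
    """Placeholder retriever: returns fixed, invented nodes already sorted by score."""
    nodes = [
        ScoredNode("fragment A", 0.9),
        ScoredNode("fragment B", 0.7),
        ScoredNode("fragment C", 0.5),
        ScoredNode("fragment D", 0.1),
    ]
    return nodes[:k]


def echo_llm(prompt: str) -> str:
    """Placeholder LLM: returns the prompt so you can inspect what was sent."""
    return prompt


engine = TinyQueryEngine(
    retrieve=fake_retrieve,
    llm=echo_llm,
    template="Context:\n{context_str}\n\nQuestion: {query_str}",
)
answer, sources = engine.query("How many vacation days?")
print(answer)
```

Returning the nodes next to the answer mirrors the idea behind response.source_nodes: the caller can show citations without running retrieval a second time.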
1.7 as_retriever — retrieve only, no generation
If you want to control the prompt yourself (as in LangChain LCEL), use the retriever without a query engine:
retriever = index.as_retriever(similarity_top_k=3)
nodos = retriever.retrieve(
"¿Cuántos días de vacaciones me corresponden si llevo 3 años en la empresa?"
)
# nodos: list[NodeWithScore]
# nodos[0].text → chunk text
# nodos[0].score → similarity score
# nodos[0].metadata → metadata from the original Document
for i, nodo in enumerate(nodos):
    print(f"[{i+1}] score={nodo.score:.4f} | {nodo.text[:80]}...")
| Method | Returns | When to use it |
|---|---|---|
| as_query_engine().query(...) | Response with .response (LLM text) | Full RAG pipeline in one call |
| as_retriever().retrieve(...) | list[NodeWithScore] | Inspect ranking, citations, or wire a custom prompt |
1.8 Chroma integration: ChromaVectorStore + StorageContext
To persist the index to disk (like the store.chroma node in template 09):
import chromadb
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore
# Chroma client — in memory or persistent
client = chromadb.PersistentClient(path="./chroma_hr_policies")
collection = client.get_or_create_collection(
name="hr_policies",
metadata={"hnsw:space": "cosine"}, # cosine metric — see M3 §8
)
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
documentos,
storage_context=storage_context,
)
Separate package: ChromaVectorStore lives in llama-index-vector-stores-chroma, not in the core. Install it explicitly.
LangChain equivalent: Chroma.from_documents(..., collection_name="hr_policies", persist_directory="./chroma_db"). The difference: LlamaIndex wraps Chroma as the index backend; LangChain treats it as an independent VectorStore.
1.9 Full mini-pipeline COMMENTED — HR case
# Requiere: pip install llama-index llama-index-embeddings-openai llama-index-llms-anthropic
# Optional Chroma: pip install llama-index-vector-stores-chroma chromadb
# This file is ILLUSTRATIVE: it does not run in the development environment without network access.
#
# Same pipeline as solucion_scratch.py and solucion_framework.py (LangChain),
# but with LlamaIndex. Test query at the end.
import re
from pathlib import Path
from llama_index.core import Document, VectorStoreIndex, Settings, PromptTemplate
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.anthropic import Anthropic
# ---------------------------------------------------------------------------
# GLOBAL CONFIGURATION (replaces ServiceContext, removed in LlamaIndex 0.11)
# ---------------------------------------------------------------------------
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.llm = Anthropic(model="claude-opus-4-8", temperature=0.2)
# ---------------------------------------------------------------------------
# BLOCK 1: LOAD AND CHUNK (≈ cargar_chunks from scratch)
# ---------------------------------------------------------------------------
ruta = Path("datos/politicas_rrhh.txt")
contenido = ruta.read_text(encoding="utf-8")
fragmentos = [p.strip() for p in re.split(r"\n---\n", contenido) if p.strip()]
# fragmentos: 8 strings, one per policy
documentos = [
    Document(text=texto, metadata={"source": str(ruta), "chunk_id": i})
    for i, texto in enumerate(fragmentos)
]
print(f"Total documents: {len(documentos)}")  # Expected: 8
# ---------------------------------------------------------------------------
# BLOCK 2: VECTOR INDEX (≈ embed + store from scratch)
# ---------------------------------------------------------------------------
index = VectorStoreIndex.from_documents(documentos, show_progress=True)
# ---------------------------------------------------------------------------
# BLOCK 3: RETRIEVER (inspection, ≈ recuperar from scratch)
# ---------------------------------------------------------------------------
retriever = index.as_retriever(similarity_top_k=3)
query = "¿Cuántos días de vacaciones me corresponden si llevo 3 años en la empresa?"
nodos = retriever.retrieve(query)
print("\nTOP-3 RETRIEVED NODES:")
for i, nodo in enumerate(nodos):
    print(f" [{i+1}] score={nodo.score:.4f} | {nodo.text[:80].replace(chr(10), ' ')}...")
# ---------------------------------------------------------------------------
# BLOCK 4: QUERY ENGINE (≈ construir_prompt + LLM from scratch)
# ---------------------------------------------------------------------------
# The prompt stays in Spanish on purpose: employees ask and read answers in Spanish.
qa_template = PromptTemplate(
    "Eres el asistente de RRHH de la empresa. "
    "Responde ÚNICAMENTE basándote en los fragmentos de política proporcionados. "
    "Si la información no está en los fragmentos, dilo explícitamente.\n\n"
    "Fragmentos relevantes:\n{context_str}\n\n"
    "Pregunta del empleado: {query_str}\n\n"
    "Responde en markdown con lenguaje claro y sencillo."
)
query_engine = index.as_query_engine(
    similarity_top_k=3,
    text_qa_template=qa_template,
)
# ---------------------------------------------------------------------------
# BLOCK 5: RUN
# ---------------------------------------------------------------------------
# response = query_engine.query(query)
# print("\nLLM response:")
# print(response.response)
print("\n(requires ANTHROPIC_API_KEY and OPENAI_API_KEY; uncomment the lines above)")
1.10 Block-by-block walkthrough
┌──────────────────────────────────────────────────────────────────┐
│ IMPORTS + Settings │
│ OpenAIEmbedding, Anthropic, Document, VectorStoreIndex │
│ Settings.embed_model / Settings.llm (NO ServiceContext) │
└──────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ BLOCK 1 — LOAD AND CHUNK (≈ cargar_chunks) │
│ read_text → re.split("\n---\n") → list[Document] │
│ 8 Document with metadata chunk_id │
└──────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ BLOCK 2 — INDEX (≈ embed + index) │
│ VectorStoreIndex.from_documents(documentos) │
│ Indexes 8 semantic vectors │
└──────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ BLOCK 3 — RETRIEVER (inspection) (≈ recuperar) │
│ retriever.retrieve(query) → list[NodeWithScore] │
│ Prints scores and previews │
└──────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ BLOCK 4 — QUERY ENGINE (≈ prompt + LLM) │
│ PromptTemplate with {context_str} and {query_str} │
│ as_query_engine(similarity_top_k=3, text_qa_template=...) │
└──────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ BLOCK 5 — RUN │
│ query_engine.query(query) → response.response │
│ response.source_nodes for citations │
└──────────────────────────────────────────────────────────────────┘
1.11 When to choose LlamaIndex vs LangChain
Choose LlamaIndex when:
- The project is primarily RAG (queries over documents, indexes, query engines).
- You want query_engine.query() in one line without wiring LCEL.
- You need specialized indexes (tree, list, index composition) or LlamaHub readers.
- Your team has already standardized on LlamaIndex and does not use LangGraph.
Choose LangChain (or stay with it) when:
- You need LangGraph for stateful agents, checkpoints, and HITL (M6–M7).
- Your stack is RAGorbit / codegen that already generates LCEL.
- You want LangSmith for native tracing.
- You mix RAG with many tools, LCEL structured output, and heterogeneous pipelines.
Avoid mixing both in the same pipeline without a clear reason — you duplicate abstractions (LangChain Document ≠ LlamaIndex Document) and complicate debugging.
LlamaIndex gotchas:
| Gotcha | What happens | Solution |
|---|---|---|
| ServiceContext in old tutorials | ImportError or migration error | Use Settings (since 0.10; removed in 0.11) |
| Document(page_content=...) | Wrong attribute | LlamaIndex uses text=, not page_content |
| Separate integration packages | ModuleNotFoundError for Chroma, Anthropic, etc. | pip install llama-index-vector-stores-chroma llama-index-llms-anthropic |
| response.response vs str(response) | Confusion with the Response type | Use .response for text; .source_nodes for chunks |
| Default prompt in English | Responses in English if you do not customize | Pass a Spanish text_qa_template (as in block 4) |
2. No framework — provider native SDK + Chroma
2.1 The direct answer to "do you need a framework?"
No. The RAG pattern is vector arithmetic + an HTTP call. Frameworks do not add magic to retrieval — they add convention, composition, and less repeated code.
This approach uses only:
| Piece | Library | Role |
|---|---|---|
| Load and chunk | stdlib (pathlib, re) | Same as scratch, but with real embeddings |
| Vector store | chromadb | Persistence + cosine search (M3 §8) |
| Embeddings | sentence-transformers or provider API | Dense semantic vectors (M3 §15) |
| LLM | anthropic or openai SDK | Direct call, no intermediate layer |
| Prompt | Manual f-string | Same as scratch construir_prompt() |
FRAMEWORK (LangChain/LlamaIndex)      NATIVE SDK (this §2)
──────────────────────────────        ──────────────────────────────────
Document, Embeddings, Retriever       chromadb.Collection + query()
Chain / query_engine                  sequential responder() function
5-8 subpackage imports                3-4 libraries with stable APIs
2.2 Bridge table: scratch → native SDK
| What you did by hand (layer ②) | Native SDK piece | Library |
|---|---|---|
| cargar_chunks(ruta) | read_text() + re.split(r"\n---\n", ...) | stdlib |
| embed(texto) | modelo.encode(texto, normalize_embeddings=True) | sentence-transformers |
| In-memory store dict | collection.upsert(ids, documents, embeddings, metadatas) | chromadb |
| similitud_coseno() + sort | collection.query(query_embeddings=..., n_results=3) | chromadb |
| recuperar() | resultados["documents"][0] + resultados["distances"][0] | chromadb |
| construir_prompt() | f-string with numbered chunks | stdlib |
| LLM stub | client.messages.create(model=..., messages=[...]) | anthropic |
2.3 Key APIs, one by one
ChromaDB — your store dict but with an index
import chromadb
client = chromadb.PersistentClient(path="./chroma_hr")
collection = client.get_or_create_collection(
name="hr_policies",
metadata={"hnsw:space": "cosine"},
)
sentence-transformers — your embed() but semantic
from sentence_transformers import SentenceTransformer
modelo = SentenceTransformer("BAAI/bge-base-en-v1.5")
vec = modelo.encode("¿Cuántos días de vacaciones?", normalize_embeddings=True)
# vec: ndarray of 768 floats — not a bag-of-words dict
Index (offline phase)
ids = [f"chunk_{i}" for i in range(len(fragmentos))]
embeddings = modelo.encode(fragmentos, normalize_embeddings=True).tolist()
collection.upsert(
ids=ids,
documents=fragmentos,
embeddings=embeddings,
metadatas=[{"chunk_id": i} for i in range(len(fragmentos))],
)
Retrieve (online phase)
query = "¿Cuántos días de vacaciones me corresponden si llevo 3 años en la empresa?"
query_vec = modelo.encode([query], normalize_embeddings=True).tolist()
resultados = collection.query(
query_embeddings=query_vec,
n_results=3,
include=["documents", "distances", "metadatas"],
)
# resultados["documents"][0] → list[str] top-3
# resultados["distances"][0] → distances (lower = more similar with cosine)
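The "lower = more similar" rule follows from how cosine distance is defined: distance = 1 − cosine similarity, assuming the collection was created with hnsw:space set to cosine as above. The stdlib sketch below shows the relationship on toy 2-D vectors; the helper names are hypothetical and the vectors are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = a·b / (|a| |b|), in the range [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Distance in the range [0, 2]: 0 = same direction, 1 = orthogonal, 2 = opposite."""
    return 1.0 - cosine_similarity(a, b)


def distance_to_similarity(distance: float) -> float:
    """Convert a reported cosine distance back into a similarity score."""
    return 1.0 - distance


same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])   # close to 0.0
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])       # 1.0
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])        # 2.0
print(same_direction, orthogonal, opposite)
```

If you want scores comparable to the similarities printed by recuperar() in scratch, apply distance_to_similarity to each value in resultados["distances"][0].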
Anthropic SDK — direct generation
import anthropic
client = anthropic.Anthropic() # reads ANTHROPIC_API_KEY from the environment
mensaje = client.messages.create(
model="claude-opus-4-8",
max_tokens=1024,
temperature=0.2,
system="Eres el asistente de RRHH. Responde SOLO con los fragmentos dados.",
messages=[{"role": "user", "content": prompt_aumentado}],
)
respuesta = mensaje.content[0].text
OpenAI alternative: openai.OpenAI().chat.completions.create(model="gpt-4o-mini", messages=[...]). The pattern is identical; only the client changes.
2.4 Full mini-pipeline COMMENTED — HR case
# Requiere: pip install chromadb sentence-transformers anthropic
# This file is ILLUSTRATIVE: it does not run in the development environment without network access.
#
# RAG without lang*: stdlib + chromadb + sentence-transformers + anthropic SDK.
# Same HR case, same query as solucion_scratch.py.
import re
from pathlib import Path
import anthropic
import chromadb
from sentence_transformers import SentenceTransformer
# ---------------------------------------------------------------------------
# BLOCK 1: LOAD AND CHUNK (stdlib, identical to scratch)
# ---------------------------------------------------------------------------
def cargar_chunks(ruta: str) -> list[str]:
    contenido = Path(ruta).read_text(encoding="utf-8")
    partes = re.split(r"\n---\n", contenido)
    return [p.strip() for p in partes if p.strip()]
RUTA_DATOS = "datos/politicas_rrhh.txt"
fragmentos = cargar_chunks(RUTA_DATOS)
print(f"Total chunks: {len(fragmentos)}")  # Expected: 8
# ---------------------------------------------------------------------------
# BLOCK 2: EMBEDDINGS + CHROMA (≈ embed + store from scratch)
# ---------------------------------------------------------------------------
modelo = SentenceTransformer("BAAI/bge-base-en-v1.5")
client = chromadb.PersistentClient(path="./chroma_hr_native")
collection = client.get_or_create_collection(
    name="hr_policies",
    metadata={"hnsw:space": "cosine"},
)
ids = [f"chunk_{i}" for i in range(len(fragmentos))]
embeddings = modelo.encode(fragmentos, normalize_embeddings=True).tolist()
# upsert is idempotent: re-running the script overwrites the same ids
collection.upsert(
    ids=ids,
    documents=fragmentos,
    embeddings=embeddings,
    metadatas=[{"chunk_id": i, "source": RUTA_DATOS} for i in range(len(fragmentos))],
)
# ---------------------------------------------------------------------------
# BLOCK 3: RETRIEVE TOP-3 (≈ recuperar from scratch)
# ---------------------------------------------------------------------------
def recuperar(query: str, k: int = 3) -> list[tuple[float, str]]:
    query_vec = modelo.encode([query], normalize_embeddings=True).tolist()
    resultados = collection.query(
        query_embeddings=query_vec,
        n_results=k,
        include=["documents", "distances"],
    )
    # Index [0] selects the results of the first (and only) query
    docs = resultados["documents"][0]
    dists = resultados["distances"][0]
    return list(zip(dists, docs))
# ---------------------------------------------------------------------------
# BLOCK 4: AUGMENTED PROMPT (≈ construir_prompt from scratch)
# ---------------------------------------------------------------------------
def construir_prompt(query: str, resultados: list[tuple[float, str]]) -> str:
    lineas = [f"[{i+1}] {texto}" for i, (_, texto) in enumerate(resultados)]
    contexto = "\n\n".join(lineas)
    # The prompt stays in Spanish on purpose: employees ask and read answers in Spanish.
    return (
        "Eres el asistente de RRHH de la empresa. "
        "Responde ÚNICAMENTE basándote en los fragmentos de política proporcionados. "
        "Si la información no está en los fragmentos, dilo explícitamente.\n\n"
        f"Fragmentos relevantes:\n{contexto}\n\n"
        f"Pregunta del empleado: {query}\n\n"
        "Responde en markdown con lenguaje claro y sencillo."
    )
# ---------------------------------------------------------------------------
# BLOCK 5: LLM + ORCHESTRATION (≈ main from scratch, with a real LLM)
# ---------------------------------------------------------------------------
def responder(query: str, k: int = 3) -> str:
    resultados = recuperar(query, k=k)
    prompt = construir_prompt(query, resultados)
    # Local name: does not clash with the module-level Chroma client
    client = anthropic.Anthropic()
    mensaje = client.messages.create(
        model="claude-opus-4-8",
        max_tokens=1024,
        temperature=0.2,
        messages=[{"role": "user", "content": prompt}],
    )
    return mensaje.content[0].text
# ---------------------------------------------------------------------------
# BLOCK 6: RUN
# ---------------------------------------------------------------------------
QUERY = "¿Cuántos días de vacaciones me corresponden si llevo 3 años en la empresa?"
resultados = recuperar(QUERY, k=3)
print("\nTOP-3 RETRIEVED CHUNKS:")
for i, (dist, texto) in enumerate(resultados, start=1):
    print(f" [{i}] distance={dist:.4f} | {texto[:80].replace(chr(10), ' ')}...")
print("\nAUGMENTED PROMPT:")
print(construir_prompt(QUERY, resultados))
# respuesta = responder(QUERY)
# print("\nLLM response:")
# print(respuesta)
print("\n(requires ANTHROPIC_API_KEY; uncomment the lines above)")
2.5 Block-by-block walkthrough
┌──────────────────────────────────────────────────────────────────┐
│ BLOCK 1 — LOAD AND CHUNK (stdlib) │
│ cargar_chunks() — re.split("\n---\n") → 8 fragments │
└──────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ BLOCK 2 — EMBEDDINGS + CHROMA │
│ SentenceTransformer.encode() → collection.upsert() │
│ Persistence in ./chroma_hr_native │
└──────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ BLOCK 3 — RETRIEVE │
│ encode(query) → collection.query(n_results=3) │
│ Returns (distance, text) — inspectable │
└──────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ BLOCK 4 — AUGMENTED PROMPT │
│ construir_prompt() — f-string with numbered chunks │
│ Same format as scratch and LangChain │
└──────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ BLOCK 5 — LLM (native SDK) │
│ anthropic.Anthropic().messages.create(...) │
│ No LangChain, no LlamaIndex │
└──────────────────────────────────────────────────────────────────┘
2.6 When the native SDK is the most sensible choice
Use it when:
- Small or medium project (one RAG microservice, an internal script).
- You want maximum control of latency, cost, and each HTTP call.
- You need minimal dependencies (security audit, slim container).
- The team does not want to learn framework abstractions — only Python + APIs.
- Privacy: local embeddings (sentence-transformers) + Chroma on-premise + Anthropic/OpenAI only for generation.
Avoid it when:
- The pipeline grows to hybrid retriever + reranker + structured output + agent — you will reimplement what LangChain/LangGraph already composes (M4–M6).
- You need tracing, evaluation, and frequent provider swapping without touching every call.
- Multiple teams must read the same code — frameworks provide shared convention.
Native SDK gotchas:
| Gotcha | What happens | Solution |
|---|---|---|
| Re-index when changing embedding model | Vectors incompatible between models | Same model at ingest and query; if you change it, drop the collection with client.delete_collection("hr_policies") and re-upsert |
| Chroma without normalize_embeddings | Under l2 (Chroma's default) or ip, distances depend on vector norm, so the ranking can drift from true cosine similarity | Always normalize_embeddings=True in .encode(), and set hnsw:space explicitly |
| Prompt without separate system | Instructions mixed with context | Use Anthropic system= parameter or system message in OpenAI |
| collection.query returns nested lists | documents[0] is the results list | First index = the query (only one here) |
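The nested-list gotcha is easiest to see on a concrete shape. The sketch below uses a hand-written dict that imitates a one-query result; the ids, distances, and texts are invented placeholders, and unpack_query_result is a hypothetical helper, not part of chromadb.

```python
def unpack_query_result(result: dict, query_index: int = 0) -> list[tuple[str, float, str]]:
    """Flatten one query's results into (id, distance, document) tuples.

    Each field is a list with one inner list per query, so query_index
    picks the inner list that belongs to the query you care about.
    """
    ids = result["ids"][query_index]
    distances = result["distances"][query_index]
    documents = result["documents"][query_index]
    return list(zip(ids, distances, documents))


# Hand-written placeholder shaped like a one-query result; values are invented.
fake_result = {
    "ids": [["chunk_2", "chunk_3", "chunk_0"]],
    "distances": [[0.21, 0.35, 0.48]],
    "documents": [["text of chunk 2", "text of chunk 3", "text of chunk 0"]],
}

rows = unpack_query_result(fake_result)
for chunk_id, distance, text in rows:
    print(f"{chunk_id} distance={distance:.2f} | {text}")
```

Doing the [0] indexing in one helper keeps the rest of the code free of double subscripts, and makes it obvious what changes if you ever send several queries in one call.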
3. Haystack (deepset) — component pipelines
3.1 What Haystack is
Haystack (by deepset) is an open source framework oriented toward production pipelines for NLP and RAG. Its mental model is a directed acyclic graph (DAG) of components with typed inputs and outputs.
Haystack 2.0 (2024) rewrote the framework from scratch. If you see Haystack 1.x code (Pipeline.add_node, ElasticsearchDocumentStore), it is from another generation — do not mix it with 2.x.
HAYSTACK 2.x — mental model
────────────────────────────
Pipeline
├── add_component("retriever", InMemoryEmbeddingRetriever(...))
├── add_component("prompt_builder", PromptBuilder(template=...))
├── add_component("llm", OpenAIGenerator(...))
├── connect("retriever.documents", "prompt_builder.documents")
└── connect("prompt_builder", "llm")
pipeline.run({...}) → each component receives typed inputs and produces typed outputs
Difference vs LangChain and LlamaIndex:
| Aspect | LangChain | LlamaIndex | Haystack 2.x |
|---|---|---|---|
| Composition | LCEL \| | integrated query_engine | explicit Pipeline + connect |
| Visualization | LangSmith | Notebooks / logs | Pipelines serializable to YAML |
| Evaluation | External (RAGAS, etc.) | External | Native integration with eval frameworks |
| Focus | General + agents | Indexes / query | Declarative production pipelines |
3.2 Bridge table: scratch → Haystack
| What you did by hand (layer ②) | Haystack 2.x piece | Notes |
|---|---|---|
| cargar_chunks(ruta) | Document(content=...) + manual split | Haystack Document uses content, not page_content |
| embed(texto) | SentenceTransformersDocumentEmbedder + SentenceTransformersTextEmbedder | Separate embedder for docs (offline) and query (online) |
| In-memory store | InMemoryDocumentStore | Also ChromaDocumentStore via integration |
| recuperar() | InMemoryEmbeddingRetriever | Connected to the document store with embeddings |
| construir_prompt() | PromptBuilder(template=...), a Jinja2 template | Variables documents, query |
| LLM stub | OpenAIGenerator or AnthropicChatGenerator | Generators take a plain string prompt; ChatGenerators take chat messages |
| main() | pipeline.run({...}) | One dict with inputs per component |
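As a reminder of what the recuperar() row replaces, the following is a minimal pure-Python sketch of top-k retrieval by cosine similarity. The three-dimensional vectors are invented toy values, and cosine is an assumption made for this sketch: check which similarity function your document store is actually configured with.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_embedding, store, k=3):
    """store is a list of (text, embedding) pairs; returns the k most similar texts."""
    scored = [(cosine(query_embedding, emb), text) for text, emb in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy vectors, invented for illustration only
store = [
    ("vacaciones", [0.9, 0.1, 0.0]),
    ("teletrabajo", [0.1, 0.9, 0.0]),
    ("nómina", [0.0, 0.2, 0.9]),
]
mejores = top_k([1.0, 0.0, 0.0], store, k=2)
print(mejores)
```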
3.3 Key APIs, one by one
Document (Haystack)
from haystack import Document
doc = Document(
content="POLÍTICA DE VACACIONES §3 — Acumulación y disfrute\n...",
meta={"source": "politicas_rrhh.txt", "chunk_id": 0},
)
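If you are porting chunks from the LangChain workshop, the field rename is the main thing to watch. The sketch below uses a hypothetical stand-in class (LangChainStyleDoc) and a hypothetical helper (to_haystack_kwargs) so it runs without either framework installed; it only illustrates the page_content → content and metadata → meta mapping.

```python
from dataclasses import dataclass, field


@dataclass
class LangChainStyleDoc:
    """Hypothetical stand-in that mirrors LangChain's field names."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def to_haystack_kwargs(doc: LangChainStyleDoc) -> dict:
    """Map LangChain-style fields to the keyword arguments used by Haystack's Document."""
    return {"content": doc.page_content, "meta": dict(doc.metadata)}


lc_doc = LangChainStyleDoc(
    page_content="POLÍTICA DE VACACIONES §3",
    metadata={"source": "politicas_rrhh.txt", "chunk_id": 0},
)
kwargs = to_haystack_kwargs(lc_doc)
print(kwargs)
# With haystack-ai installed, the next step would be: Document(**kwargs)
```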
Pipeline + add_component + connect
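from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
# The retriever reads from a document store (empty here; section 3.4 fills it)
retriever = InMemoryEmbeddingRetriever(document_store=InMemoryDocumentStore())
# Minimal placeholder template; the full HR template appears in the next subsection
mi_plantilla = "{% for doc in documents %}{{ doc.content }} {% endfor %}Pregunta: {{ query }}"
pipeline = Pipeline()
pipeline.add_component("retriever", retriever)
pipeline.add_component("prompt_builder", PromptBuilder(template=mi_plantilla))
# OpenAIGenerator reads OPENAI_API_KEY from the environment by default
pipeline.add_component("llm", OpenAIGenerator(model="gpt-4o-mini"))
# Explicit connection: retriever output → prompt builder input
pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder", "llm")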
- add_component(name, instance): registers a node in the graph.
- connect(sender, receiver): wires output → input. The name "retriever.documents" specifies which output of the sender you connect.
- pipeline.run(data): runs the graph. data is a dict with inputs per component.
PromptBuilder — Jinja2 template
from haystack.components.builders import PromptBuilder
plantilla = """
Eres el asistente de RRHH. Responde SOLO con los fragmentos dados.
Fragmentos relevantes:
{% for doc in documents %}
[{{ loop.index }}] {{ doc.content }}
{% endfor %}
Pregunta del empleado: {{ query }}
Responde en markdown con lenguaje claro y sencillo.
"""
prompt_builder = PromptBuilder(template=plantilla)
Haystack 2.x also offers ChatPromptBuilder for chat models with system/user messages. For this workshop we use PromptBuilder + OpenAIGenerator because it is the most direct pair to map to the scratch construir_prompt(). In production with Claude/GPT-4o, a common next step is to migrate to ChatPromptBuilder + AnthropicChatGenerator.
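To connect the Jinja2 template back to the scratch construir_prompt(), here is a plain-Python approximation of what the loop expands to. Doc is a hypothetical stand-in that exposes only the .content attribute, the fragment texts are placeholders, and the whitespace will not match Jinja2's output exactly.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in exposing only the .content attribute the template reads."""
    content: str


def construir_prompt(documents, query):
    """Approximate the template: numbered fragments first, then the employee question."""
    lineas = ["Fragmentos relevantes:"]
    for i, doc in enumerate(documents, start=1):  # Jinja2's loop.index is 1-based
        lineas.append(f"[{i}] {doc.content}")
    lineas.append(f"Pregunta del empleado: {query}")
    return "\n".join(lineas)


docs = [Doc("fragmento A"), Doc("fragmento B")]  # placeholder texts
prompt = construir_prompt(docs, "¿Cuántos días de vacaciones me corresponden?")
print(prompt)
```

Seen this way, PromptBuilder is the same string assembly, with the loop and the variables moved into the template.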
3.4 Full mini-pipeline COMMENTED — HR case
# Requires: pip install haystack-ai sentence-transformers
# This file is ILLUSTRATIVE: it does not run in the development environment without network access.
#
# RAG with Haystack 2.x: same HR case as solucion_scratch.py.
# Pipeline: offline indexing → Retriever + PromptBuilder + Generator.
import re
from pathlib import Path
from haystack import Pipeline, Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.embedders import (
SentenceTransformersDocumentEmbedder,
SentenceTransformersTextEmbedder,
)
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.utils import Secret
# ---------------------------------------------------------------------------
# BLOCK 1: LOAD AND CHUNK (≈ cargar_chunks in the scratch version)
# ---------------------------------------------------------------------------
ruta = Path("datos/politicas_rrhh.txt")
contenido = ruta.read_text(encoding="utf-8")
fragmentos = [p.strip() for p in re.split(r"\n---\n", contenido) if p.strip()]
documentos = [
Document(content=texto, meta={"source": str(ruta), "chunk_id": i})
for i, texto in enumerate(fragmentos)
]
print(f"Total de documentos: {len(documentos)}")  # Expected: 8
# ---------------------------------------------------------------------------
# BLOCK 2: DOCUMENT STORE + EMBEDDINGS (≈ embed + store in the scratch version)
# ---------------------------------------------------------------------------
document_store = InMemoryDocumentStore()
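# Worth verifying: bge-base-en-v1.5 is trained mainly on English text; for
# Spanish policies a multilingual embedding model may retrieve better.
doc_embedder = SentenceTransformersDocumentEmbedder(
    model="BAAI/bge-base-en-v1.5",
)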
doc_embedder.warm_up()
# The embedder computes the vectors and attaches them to each Document
docs_con_embeddings = doc_embedder.run(documents=documentos)["documents"]
document_store.write_documents(docs_con_embeddings)
# ---------------------------------------------------------------------------
# BLOCK 3: RAG PIPELINE COMPONENTS
# ---------------------------------------------------------------------------
# Same model as doc_embedder: query and document vectors must share one space
text_embedder = SentenceTransformersTextEmbedder(
    model="BAAI/bge-base-en-v1.5",
)
text_embedder.warm_up()
retriever = InMemoryEmbeddingRetriever(document_store=document_store, top_k=3)
plantilla = """
Eres el asistente de RRHH de la empresa. Responde ÚNICAMENTE basándote en los fragmentos de política proporcionados. Si la información no está en los fragmentos, dilo explícitamente.
Fragmentos relevantes:
{% for doc in documents %}
[{{ loop.index }}] {{ doc.content }}
{% endfor %}
Pregunta del empleado: {{ query }}
Responde en markdown con lenguaje claro y sencillo.
"""
prompt_builder = PromptBuilder(template=plantilla)
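# OpenAIGenerator resolves the key when it is constructed, so OPENAI_API_KEY
# must be set even if you only run the retrieval-only inspection further down.
llm = OpenAIGenerator(
    api_key=Secret.from_env_var("OPENAI_API_KEY"),
    model="gpt-4o-mini",
    generation_kwargs={"temperature": 0.2},
)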
# ---------------------------------------------------------------------------
# BLOCK 4: ASSEMBLE THE PIPELINE (Retriever → PromptBuilder → Generator)
# ---------------------------------------------------------------------------
rag_pipeline = Pipeline()
rag_pipeline.add_component("text_embedder", text_embedder)
rag_pipeline.add_component("retriever", retriever)
rag_pipeline.add_component("prompt_builder", prompt_builder)
rag_pipeline.add_component("llm", llm)
rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
rag_pipeline.connect("retriever.documents", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm")
# ---------------------------------------------------------------------------
# BLOCK 5: RUN
# ---------------------------------------------------------------------------
QUERY = "¿Cuántos días de vacaciones me corresponden si llevo 3 años en la empresa?"
# Retrieval only (inspection, ≈ recuperar in the scratch version):
embedding_result = text_embedder.run(text=QUERY)
docs_recuperados = retriever.run(
query_embedding=embedding_result["embedding"],
)["documents"]
print("\nTOP-3 DOCUMENTOS RECUPERADOS:")
for i, doc in enumerate(docs_recuperados):
    print(f"  [{i+1}] {doc.content[:80].replace(chr(10), ' ')}...")
# Full pipeline (uncomment once OPENAI_API_KEY is set):
# result = rag_pipeline.run({
# "text_embedder": {"text": QUERY},
# "prompt_builder": {"query": QUERY},
# })
# print("\nRespuesta del LLM:")
# print(result["llm"]["replies"][0])
print("\n(requiere OPENAI_API_KEY — descomenta las líneas anteriores)")
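Block 1 of the listing above depends entirely on the "\n---\n" separator, so it is worth checking the split on a small sample before pointing it at the real file. The sketch below wraps the same expression in a hypothetical helper named trocear; the sample text is invented and is not the actual policy file.

```python
import re


def trocear(contenido: str) -> list[str]:
    """Split on lines that contain only '---' and drop empty fragments, as in Block 1."""
    return [p.strip() for p in re.split(r"\n---\n", contenido) if p.strip()]


# Invented sample with an empty section between two separators
muestra = "SECCIÓN A\ntexto\n---\nSECCIÓN B\ntexto\n---\n\n---\nSECCIÓN C\ntexto\n"
fragmentos = trocear(muestra)
print(len(fragmentos))
print(fragmentos[-1])
```

The empty section between two consecutive separators is discarded by the `if p.strip()` filter, which is why the sample yields three fragments rather than four.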
3.5 Block-by-block walkthrough
┌──────────────────────────────────────────────────────────────────┐
│ BLOCK 1 — LOAD AND CHUNK │
│ read_text → re.split → list[Document(content=..., meta=...)] │
└──────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ BLOCK 2 — OFFLINE INDEXING │
│ SentenceTransformersDocumentEmbedder.run(documents) │
│ document_store.write_documents(docs_con_embeddings) │
└──────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ BLOCK 3 — COMPONENTS │
│ TextEmbedder (query) · Retriever (top_k=3) │
│ PromptBuilder (Jinja2) · OpenAIGenerator │
└──────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ BLOCK 4 — PIPELINE │
│ text_embedder → retriever → prompt_builder → llm │
│ explicit connect() between outputs and inputs │
└──────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ BLOCK 5 — RUN │
│ pipeline.run({"text_embedder": {"text": query}, ...}) │
│ result["llm"]["replies"][0] → final answer │
└──────────────────────────────────────────────────────────────────┘
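Block 5 is where most first-run mistakes happen, because the input dict is keyed by component name and the answer is nested two levels deep. The two helpers below (build_run_inputs and first_reply) are hypothetical conveniences, not part of Haystack; they only encode the dict shapes used in the listing of section 3.4.

```python
def build_run_inputs(query: str) -> dict:
    """Build the nested dict for pipeline.run(): component name -> {input name: value}."""
    return {
        "text_embedder": {"text": query},     # the query is embedded for retrieval
        "prompt_builder": {"query": query},   # and also injected into the template
    }


def first_reply(result: dict, llm_name: str = "llm") -> str:
    """Extract the first generated answer, failing loudly if the shape is unexpected."""
    replies = result.get(llm_name, {}).get("replies", [])
    if not replies:
        raise ValueError(f"No replies found under component '{llm_name}'")
    return replies[0]


inputs = build_run_inputs("¿Cuántos días de vacaciones me corresponden?")
fake_result = {"llm": {"replies": ["respuesta de ejemplo"]}}  # stand-in for a real run
print(first_reply(fake_result))
```

Note that the same query string is passed twice: once to the text embedder and once to the prompt builder.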
3.6 When Haystack fits
Use it when:
- You want declarative pipelines serializable to YAML (versioning, CI/CD, deployment).
- The team values components testable in isolation (unit test the retriever without the LLM).
- You need integrated evaluation and a production culture (deepset has years in industrial NLP).
- You build RAG without the LangChain ecosystem — independence similar to native SDK but with structure.
Avoid it when:
- You are already on LangGraph with complex agents — migrating the orchestrator adds little.
- You need to prototype in 10 minutes: Haystack has more boilerplate than LlamaIndex query_engine.
- Your organization standardized on RAGorbit/LangChain: Haystack would be a second framework without reason.
Haystack gotchas:
| Gotcha | What happens | Solution |
|---|---|---|
| Haystack 1.x code on the internet | Incompatible APIs (add_node vs add_component) | Verify it is Haystack 2.x (haystack-ai on pip) |
| Forgetting warm_up() on embedders | Error on first run when an embedder is called on its own, outside a Pipeline | Call .warm_up() after creating embedders |
| Ambiguous connect | Pipeline does not wire documents to prompt | Explicitly connect "retriever.documents" → "prompt_builder.documents" |
| PromptBuilder vs ChatPromptBuilder | Wrong format for chat models | Use ChatPromptBuilder + AnthropicChatGenerator for Claude |
| Two embedders (doc + text) | Confusion about which to use when | Doc embedder = offline (index); Text embedder = online (query) |
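For the first gotcha, one quick check is to ask the environment which distribution is installed. The sketch below uses only the standard library; the helper names are hypothetical. It relies on Haystack 2.x being published as haystack-ai and 1.x as farm-haystack, both of which install under the same haystack import name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def haystack_generation() -> str:
    """Classify the environment by which Haystack distribution is present."""
    v2 = installed_version("haystack-ai")    # Haystack 2.x
    v1 = installed_version("farm-haystack")  # Haystack 1.x
    if v2 and v1:
        return "conflict"  # both share the haystack import namespace
    if v2:
        return "2.x"
    if v1:
        return "1.x"
    return "not installed"


print(haystack_generation())
```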
4. Final comparison table
4.1 LangChain vs LlamaIndex vs Haystack vs native SDK (for RAG)
| Criterion | LangChain | LlamaIndex | Haystack 2.x | Native SDK + Chroma |
|---|---|---|---|---|
| Abstraction | Medium-high (LCEL, Runnables) | High (indexes, query engines) | High (typed Pipeline DAG) | Minimal (your functions) |
| Learning curve | Medium — many subpackages | Medium — index/query engine concept | Medium-high — components + connect | Low — if you already did scratch |
| Fine control | Medium — hidden layers in Runnables | Medium — query engine integrates steps | High — each component is explicit | Maximum |
| Lines for minimal HR RAG | ~50 (see solucion_framework.py) | ~45 with query_engine | ~70 (indexing + pipeline) | ~80 (but no framework) |
| Best for | RAGorbit ecosystem, LangGraph, LCEL, multi-tool | Pure RAG projects, indexes, query engines | Declarative production, YAML, integrated eval | Microservices, control, minimal deps |
| Avoid if | You only need a simple query_engine | You need LangGraph or advanced LCEL | Fast prototype or LangChain stack | Pipeline grows to hybrid + agent + HITL |
| Provider swapping | One line (ChatOpenAI → ChatAnthropic) | Settings.llm = ... | Change Generator component | Rewrite HTTP call |
| Chroma persistence | Chroma.from_documents(persist_directory=...) | ChromaVectorStore + StorageContext | ChromaDocumentStore (integration) | chromadb.PersistentClient direct |
| Tracing / observability | Native LangSmith | Callbacks / external | Native eval integration | You implement (logs, OTel) |
| Dependencies | Many (langchain-*) | Many (llama-index-*) | Moderate (haystack-ai) | Few (chromadb, sentence-transformers, SDK) |
4.2 Decision rule
Do you start from scratch and the course / RAGorbit already uses LangChain?
YES → LangChain (M1 §11) — consistency with codegen and M6+ LangGraph
NO ↓
Is the project ONLY RAG over documents, without complex agents?
YES → Do you want minimal code?
YES → LlamaIndex (query_engine in few lines)
NO → Do you want YAML pipelines and production culture?
YES → Haystack 2.x
NO → Native SDK + Chroma
NO ↓
Do you need stateful agents, HITL, fan-out?
YES → LangGraph (M6–M7) — no alternative in this guide replaces it equally
NO → Reevaluate with the row above
Course golden rule: master one orchestration tool in depth (LangChain in the syllabus) and know the others to choose, not to mix them all in one project.
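The decision rule above can also be written as a small function, which makes the order of the questions explicit. This is only a sketch: choose_stack and its boolean parameters are hypothetical names that paraphrase the questions in the tree.

```python
def choose_stack(
    course_uses_langchain: bool,
    only_rag: bool,
    wants_minimal_code: bool,
    wants_yaml_pipelines: bool,
    needs_stateful_agents: bool,
) -> str:
    """Walk the 4.2 decision tree from top to bottom and return the suggested stack."""
    if course_uses_langchain:
        return "LangChain"
    if only_rag:
        if wants_minimal_code:
            return "LlamaIndex"
        if wants_yaml_pipelines:
            return "Haystack 2.x"
        return "Native SDK + Chroma"
    if needs_stateful_agents:
        return "LangGraph"
    return "Reevaluate"


# Example: a RAG-only project that wants YAML pipelines and is not tied to LangChain
sugerencia = choose_stack(
    course_uses_langchain=False,
    only_rag=True,
    wants_minimal_code=False,
    wants_yaml_pipelines=True,
    needs_stateful_agents=False,
)
print(sugerencia)
```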
4.3 Mental map: four paths to the same destination
          ┌─────────────────────────────────────┐
          │  politicas_rrhh.txt (8 fragments)   │
          └──────────────────┬──────────────────┘
                             │
       ┌──────────────┬──────┴───────┬──────────────┐
       ▼              ▼              ▼              ▼
 ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌───────────┐
 │ LangChain │  │LlamaIndex │  │ Haystack  │  │Native SDK │
 │   LCEL    │  │ query_eng │  │ Pipeline  │  │ functions │
 └─────┬─────┘  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘
       │              │              │              │
       └──────────────┴──────┬───────┴──────────────┘
                             │
                             ▼
          ┌─────────────────────────────────────┐
          │  top-3 chunks on "vacaciones 3      │
          │  años" → augmented prompt → LLM     │
          └──────────────────┬──────────────────┘
                             ▼
           "Tienes derecho a 18 días hábiles..."
Cross-links
- LangChain from scratch (course foundation): M1 §11 — Layer ③ explained
- ChromaDB, FAISS, and sentence-transformers: M3 §15 — Layer ③ explained
- Hybrid retrievers, rerank, and hard filter: M4 — Retrieval and query
- Decision tables (models, stores, frameworks): tecnologias-comparadas.md, especially §1–§3 (models/embeddings/stores) and §5 (ingest)
- RAGorbit node cards: catalogo-nodos.md
- Scratch workshop (layer ②): 01-fundamentos/lab/solucion_scratch.py
- LangChain workshop (layer ③): 01-fundamentos/lab/solucion_framework.py
- HR template: examples/09-hr-policy-assistant/
- Layer convention and environment: HANDOFF.md §3 and §5
RAGorbit course reference document. Read it after M1 §11 and the workshop layer ②; use it when you need to build the same RAG without depending exclusively on LangChain.