Claude Agent Skill · by Wshobson

RAG Implementation

Solid foundation for building RAG systems that actually work in production. Covers the essential stack: vector databases like Pinecone and Chroma, embedding models, retrieval strategies, and reranking.

Install
$ npx skills add https://github.com/wshobson/agents --skill rag-implementation
Source file: SKILL.md (542 lines)
---
name: rag-implementation
description: Build Retrieval-Augmented Generation (RAG) systems for LLM applications with vector databases and semantic search. Use when implementing knowledge-grounded AI, building document Q&A systems, or integrating LLMs with external knowledge bases.
---

# RAG Implementation

Master Retrieval-Augmented Generation (RAG) to build LLM applications that provide accurate, grounded responses using external knowledge sources.

## When to Use This Skill

- Building Q&A systems over proprietary documents
- Creating chatbots with current, factual information
- Implementing semantic search with natural language queries
- Reducing hallucinations with grounded responses
- Enabling LLMs to access domain-specific knowledge
- Building documentation assistants
- Creating research tools with source citation

## Core Components

### 1. Vector Databases

**Purpose**: Store and retrieve document embeddings efficiently

**Options:**

- **Pinecone**: Managed, scalable, serverless
- **Weaviate**: Open-source, hybrid search, GraphQL
- **Milvus**: High performance, on-premise
- **Chroma**: Lightweight, easy to use, local development
- **Qdrant**: Fast, filtered search, Rust-based
- **pgvector**: PostgreSQL extension, SQL integration

### 2. Embeddings

**Purpose**: Convert text to numerical vectors for similarity search

**Models (2026):**

| Model | Dimensions | Best For |
|-------|------------|----------|
| **voyage-3-large** | 1024 | Claude apps (Anthropic recommended) |
| **voyage-code-3** | 1024 | Code search |
| **text-embedding-3-large** | 3072 | OpenAI apps, high accuracy |
| **text-embedding-3-small** | 1536 | OpenAI apps, cost-effective |
| **bge-large-en-v1.5** | 1024 | Open source, local deployment |
| **multilingual-e5-large** | 1024 | Multi-language support |

### 3. Retrieval Strategies

**Approaches:**

- **Dense Retrieval**: Semantic similarity via embeddings
- **Sparse Retrieval**: Keyword matching (BM25, TF-IDF)
- **Hybrid Search**: Combine dense + sparse with weighted fusion
- **Multi-Query**: Generate multiple query variations
- **HyDE**: Generate hypothetical documents for better retrieval

### 4. Reranking

**Purpose**: Improve retrieval quality by reordering results

**Methods:**

- **Cross-Encoders**: BERT-based reranking (ms-marco-MiniLM)
- **Cohere Rerank**: API-based reranking
- **Maximal Marginal Relevance (MMR)**: Diversity + relevance
- **LLM-based**: Use an LLM to score relevance (a sketch follows this list)
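Of the methods above, LLM-based reranking is the only one without a worked example later in this skill. A minimal sketch, reusing the `claude-sonnet-4-6` model from the Quick Start below; the 0-10 integer scoring prompt is an illustrative choice, not part of the original skill:

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.documents import Document

llm = ChatAnthropic(model="claude-sonnet-4-6")

async def llm_rerank(query: str, docs: list[Document], top_k: int = 5) -> list[Document]:
    """Ask the LLM to score each candidate 0-10, then keep the top_k."""
    scored = []
    for doc in docs:
        response = await llm.ainvoke(
            "Rate from 0 to 10 how relevant this passage is to the query. "
            f"Reply with a single integer.\n\nQuery: {query}\n\nPassage: {doc.page_content}"
        )
        try:
            score = int(response.content.strip())
        except ValueError:
            score = 0  # treat unparseable replies as irrelevant
        scored.append((doc, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in scored[:top_k]]
```

This trades latency and cost (one LLM call per candidate) for ranking flexibility; for large candidate sets, a cross-encoder is usually the cheaper option.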
## Quick Start with LangGraph

```python
from langgraph.graph import StateGraph, START, END
from langchain_anthropic import ChatAnthropic
from langchain_voyageai import VoyageAIEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from typing import TypedDict

class RAGState(TypedDict):
    question: str
    context: list[Document]
    answer: str

# Initialize components
llm = ChatAnthropic(model="claude-sonnet-4-6")
embeddings = VoyageAIEmbeddings(model="voyage-3-large")
vectorstore = PineconeVectorStore(index_name="docs", embedding=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# RAG prompt
rag_prompt = ChatPromptTemplate.from_template(
    """Answer based on the context below. If you cannot answer, say so.

Context:
{context}

Question: {question}

Answer:"""
)

async def retrieve(state: RAGState) -> RAGState:
    """Retrieve relevant documents."""
    docs = await retriever.ainvoke(state["question"])
    return {"context": docs}

async def generate(state: RAGState) -> RAGState:
    """Generate answer from context."""
    context_text = "\n\n".join(doc.page_content for doc in state["context"])
    messages = rag_prompt.format_messages(
        context=context_text,
        question=state["question"]
    )
    response = await llm.ainvoke(messages)
    return {"answer": response.content}

# Build RAG graph
builder = StateGraph(RAGState)
builder.add_node("retrieve", retrieve)
builder.add_node("generate", generate)
builder.add_edge(START, "retrieve")
builder.add_edge("retrieve", "generate")
builder.add_edge("generate", END)

rag_chain = builder.compile()

# Use
result = await rag_chain.ainvoke({"question": "What are the main features?"})
print(result["answer"])
```
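The Quick Start assumes the `docs` index is already populated. A minimal indexing sketch under that setup; the file path and metadata are placeholders:

```python
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Placeholder corpus; substitute your own document loader.
raw_docs = [
    Document(page_content=open("docs/guide.md").read(), metadata={"source": "guide.md"}),
]

# Same splitter settings as the chunking section later in this skill.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(raw_docs)

# Embeds each chunk with voyage-3-large and upserts it into the Pinecone index.
await vectorstore.aadd_documents(chunks)
```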
## Advanced RAG Patterns

### Pattern 1: Hybrid Search with RRF

```python
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever

# Sparse retriever (BM25 for keyword matching)
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = 10

# Dense retriever (embeddings for semantic search)
dense_retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

# Combine with Reciprocal Rank Fusion weights
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, dense_retriever],
    weights=[0.3, 0.7]  # 30% keyword, 70% semantic
)
```

### Pattern 2: Multi-Query Retrieval

```python
from langchain.retrievers.multi_query import MultiQueryRetriever

# Generate multiple query perspectives for better recall
multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
    llm=llm
)

# Single query → multiple variations → combined results
results = await multi_query_retriever.ainvoke("What is the main topic?")
```

### Pattern 3: Contextual Compression

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# Compressor extracts only relevant portions
compressor = LLMChainExtractor.from_llm(llm)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 10})
)

# Returns only relevant parts of documents
compressed_docs = await compression_retriever.ainvoke("specific query")
```

### Pattern 4: Parent Document Retriever

```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Small chunks for precise retrieval, large chunks for context
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=50)
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)

# Store for parent documents
docstore = InMemoryStore()

parent_retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter
)

# Add documents (splits children, stores parents)
await parent_retriever.aadd_documents(documents)

# Retrieval returns parent documents with full context
results = await parent_retriever.ainvoke("query")
```

### Pattern 5: HyDE (Hypothetical Document Embeddings)

```python
from langchain_core.prompts import ChatPromptTemplate

class HyDEState(TypedDict):
    question: str
    hypothetical_doc: str
    context: list[Document]
    answer: str

hyde_prompt = ChatPromptTemplate.from_template(
    """Write a detailed passage that would answer this question:

Question: {question}

Passage:"""
)

async def generate_hypothetical(state: HyDEState) -> HyDEState:
    """Generate hypothetical document for better retrieval."""
    messages = hyde_prompt.format_messages(question=state["question"])
    response = await llm.ainvoke(messages)
    return {"hypothetical_doc": response.content}

async def retrieve_with_hyde(state: HyDEState) -> HyDEState:
    """Retrieve using hypothetical document."""
    # Use hypothetical doc for retrieval instead of original query
    docs = await retriever.ainvoke(state["hypothetical_doc"])
    return {"context": docs}

# Build HyDE RAG graph
builder = StateGraph(HyDEState)
builder.add_node("hypothetical", generate_hypothetical)
builder.add_node("retrieve", retrieve_with_hyde)
builder.add_node("generate", generate)
builder.add_edge(START, "hypothetical")
builder.add_edge("hypothetical", "retrieve")
builder.add_edge("retrieve", "generate")
builder.add_edge("generate", END)

hyde_rag = builder.compile()
```

## Document Chunking Strategies

### Recursive Character Text Splitter

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
    separators=["\n\n", "\n", ". ", " ", ""]  # Try in order
)

chunks = splitter.split_documents(documents)
```

### Token-Based Splitting

```python
from langchain_text_splitters import TokenTextSplitter

splitter = TokenTextSplitter(
    chunk_size=512,
    chunk_overlap=50,
    encoding_name="cl100k_base"  # OpenAI tiktoken encoding
)
```

### Semantic Chunking

```python
from langchain_experimental.text_splitter import SemanticChunker

splitter = SemanticChunker(
    embeddings=embeddings,
    breakpoint_threshold_type="percentile",
    breakpoint_threshold_amount=95
)
```

### Markdown Header Splitter

```python
from langchain_text_splitters import MarkdownHeaderTextSplitter

headers_to_split_on = [
    ("#", "Header 1"),
    ("##", "Header 2"),
    ("###", "Header 3"),
]

splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=headers_to_split_on,
    strip_headers=False
)
```
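These splitters also compose. A short sketch, assuming `markdown_text` holds the raw Markdown to index: split on headers first so each chunk inherits section metadata, then cap section length with the recursive splitter:

```python
from langchain_text_splitters import (
    MarkdownHeaderTextSplitter,
    RecursiveCharacterTextSplitter,
)

header_splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "Header 1"), ("##", "Header 2")],
    strip_headers=False,
)
char_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

sections = header_splitter.split_text(markdown_text)  # list[Document] with header metadata
chunks = char_splitter.split_documents(sections)      # header metadata carries over to each chunk
```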
"postgresql+psycopg://user:pass@localhost:5432/vectordb" vectorstore = PGVector(    embeddings=embeddings,    collection_name="documents",    connection=connection_string,)``` ## Retrieval Optimization ### 1. Metadata Filtering ```pythonfrom langchain_core.documents import Document # Add metadata during indexingdocs_with_metadata = []for doc in documents:    doc.metadata.update({        "source": doc.metadata.get("source", "unknown"),        "category": determine_category(doc.page_content),        "date": datetime.now().isoformat()    })    docs_with_metadata.append(doc) # Filter during retrievalresults = await vectorstore.asimilarity_search(    "query",    filter={"category": "technical"},    k=5)``` ### 2. Maximal Marginal Relevance (MMR) ```python# Balance relevance with diversityresults = await vectorstore.amax_marginal_relevance_search(    "query",    k=5,    fetch_k=20,  # Fetch 20, return top 5 diverse    lambda_mult=0.5  # 0=max diversity, 1=max relevance)``` ### 3. Reranking with Cross-Encoder ```pythonfrom sentence_transformers import CrossEncoder reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2') async def retrieve_and_rerank(query: str, k: int = 5) -> list[Document]:    # Get initial results    candidates = await vectorstore.asimilarity_search(query, k=20)     # Rerank    pairs = [[query, doc.page_content] for doc in candidates]    scores = reranker.predict(pairs)     # Sort by score and take top k    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)    return [doc for doc, score in ranked[:k]]``` ### 4. Cohere Rerank ```pythonfrom langchain.retrievers import CohereRerankfrom langchain_cohere import CohereRerank reranker = CohereRerank(model="rerank-english-v3.0", top_n=5) # Wrap retriever with rerankingreranked_retriever = ContextualCompressionRetriever(    base_compressor=reranker,    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 20}))``` ## Prompt Engineering for RAG ### Contextual Prompt with Citations ```pythonrag_prompt = ChatPromptTemplate.from_template(    """Answer the question based on the context below. Include citations using [1], [2], etc.     If you cannot answer based on the context, say "I don't have enough information."     Context:    {context}     Question: {question}     Instructions:    1. Use only information from the context    2. Cite sources with [1], [2] format    3. 
## Prompt Engineering for RAG

### Contextual Prompt with Citations

```python
rag_prompt = ChatPromptTemplate.from_template(
    """Answer the question based on the context below. Include citations using [1], [2], etc.

If you cannot answer based on the context, say "I don't have enough information."

Context:
{context}

Question: {question}

Instructions:
1. Use only information from the context
2. Cite sources with [1], [2] format
3. If uncertain, express uncertainty

Answer (with citations):"""
)
```

### Structured Output for RAG

```python
from pydantic import BaseModel, Field

class RAGResponse(BaseModel):
    answer: str = Field(description="The answer based on context")
    confidence: float = Field(description="Confidence score 0-1")
    sources: list[str] = Field(description="Source document IDs used")
    reasoning: str = Field(description="Brief reasoning for the answer")

# Use with structured output
structured_llm = llm.with_structured_output(RAGResponse)
```

## Evaluation Metrics

```python
from typing import TypedDict

class RAGEvalMetrics(TypedDict):
    retrieval_precision: float  # Relevant docs / retrieved docs
    retrieval_recall: float     # Retrieved relevant / total relevant
    answer_relevance: float     # Answer addresses question
    faithfulness: float         # Answer grounded in context
    context_relevance: float    # Context relevant to question

async def evaluate_rag_system(
    rag_chain,
    test_cases: list[dict]
) -> RAGEvalMetrics:
    """Evaluate RAG system on test cases."""
    metrics = {k: [] for k in RAGEvalMetrics.__annotations__}

    for test in test_cases:
        result = await rag_chain.ainvoke({"question": test["question"]})

        # Retrieval metrics
        retrieved_ids = {doc.metadata["id"] for doc in result["context"]}
        relevant_ids = set(test["relevant_doc_ids"])

        precision = len(retrieved_ids & relevant_ids) / len(retrieved_ids)
        recall = len(retrieved_ids & relevant_ids) / len(relevant_ids)

        metrics["retrieval_precision"].append(precision)
        metrics["retrieval_recall"].append(recall)

        # Use LLM-as-judge for quality metrics (see sketch below)
        quality = await evaluate_answer_quality(
            question=test["question"],
            answer=result["answer"],
            context=result["context"],
            expected=test.get("expected_answer")
        )
        metrics["answer_relevance"].append(quality["relevance"])
        metrics["faithfulness"].append(quality["faithfulness"])
        metrics["context_relevance"].append(quality["context_relevance"])

    return {k: sum(v) / len(v) for k, v in metrics.items()}
```
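The `evaluate_answer_quality` helper is referenced above but not defined in this skill. A minimal sketch of one way to implement it, as LLM-as-judge with structured output, reusing the `llm` from the Quick Start; the rubric wording is an assumption:

```python
from pydantic import BaseModel, Field

class QualityScores(BaseModel):
    relevance: float = Field(description="Answer addresses the question, 0-1")
    faithfulness: float = Field(description="Answer is grounded in the context, 0-1")
    context_relevance: float = Field(description="Context is relevant to the question, 0-1")

judge = llm.with_structured_output(QualityScores)

async def evaluate_answer_quality(question, answer, context, expected=None) -> dict:
    """Score one RAG result with the LLM as judge."""
    context_text = "\n\n".join(doc.page_content for doc in context)
    prompt = (
        "Score this RAG answer on relevance, faithfulness, and context "
        f"relevance, each from 0 to 1.\n\nQuestion: {question}\n\n"
        f"Context:\n{context_text}\n\nAnswer: {answer}"
    )
    if expected:
        prompt += f"\n\nReference answer: {expected}"
    scores = await judge.ainvoke(prompt)
    return scores.model_dump()  # keys match the metrics collected above
```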