npx skills add https://github.com/wshobson/agents --skill rag-implementation

How Rag Implementation fits into a Paperclip company
Rag Implementation drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.
Pre-configured AI company — 18 agents, 18 skills, one-time purchase.
SKILL.md (542 lines)
---
name: rag-implementation
description: Build Retrieval-Augmented Generation (RAG) systems for LLM applications with vector databases and semantic search. Use when implementing knowledge-grounded AI, building document Q&A systems, or integrating LLMs with external knowledge bases.
---

# RAG Implementation

Master Retrieval-Augmented Generation (RAG) to build LLM applications that provide accurate, grounded responses using external knowledge sources.

## When to Use This Skill

- Building Q&A systems over proprietary documents
- Creating chatbots with current, factual information
- Implementing semantic search with natural language queries
- Reducing hallucinations with grounded responses
- Enabling LLMs to access domain-specific knowledge
- Building documentation assistants
- Creating research tools with source citation

## Core Components

### 1. Vector Databases

**Purpose**: Store and retrieve document embeddings efficiently

**Options:**

- **Pinecone**: Managed, scalable, serverless
- **Weaviate**: Open-source, hybrid search, GraphQL
- **Milvus**: High performance, on-premise
- **Chroma**: Lightweight, easy to use, local development
- **Qdrant**: Fast, filtered search, Rust-based
- **pgvector**: PostgreSQL extension, SQL integration

### 2. Embeddings

**Purpose**: Convert text to numerical vectors for similarity search

**Models (2026):**

| Model | Dimensions | Best For |
|-------|------------|----------|
| **voyage-3-large** | 1024 | Claude apps (Anthropic recommended) |
| **voyage-code-3** | 1024 | Code search |
| **text-embedding-3-large** | 3072 | OpenAI apps, high accuracy |
| **text-embedding-3-small** | 1536 | OpenAI apps, cost-effective |
| **bge-large-en-v1.5** | 1024 | Open source, local deployment |
| **multilingual-e5-large** | 1024 | Multi-language support |
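For a quick sanity check of the embedding step, the sketch below (not part of the original skill) embeds two placeholder passages with `voyage-3-large` through the same `langchain_voyageai` integration used in the Quick Start further down, and prints the vector dimensionality:

```python
# Sketch: embedding text with voyage-3-large (matches the Quick Start setup below).
# The sample passages are placeholders; swap in your own content.
from langchain_voyageai import VoyageAIEmbeddings

embeddings = VoyageAIEmbeddings(model="voyage-3-large")

passages = [
    "RAG grounds LLM answers in retrieved documents.",
    "Vector databases store embeddings for similarity search.",
]

doc_vectors = embeddings.embed_documents(passages)      # one vector per passage
query_vector = embeddings.embed_query("What is RAG?")   # query-side embedding

# Expect 2 vectors of 1024 dimensions each, per the table above
print(len(doc_vectors), len(doc_vectors[0]))
```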
### 3. Retrieval Strategies

**Approaches:**

- **Dense Retrieval**: Semantic similarity via embeddings
- **Sparse Retrieval**: Keyword matching (BM25, TF-IDF)
- **Hybrid Search**: Combine dense + sparse with weighted fusion
- **Multi-Query**: Generate multiple query variations
- **HyDE**: Generate hypothetical documents for better retrieval

### 4. Reranking

**Purpose**: Improve retrieval quality by reordering results

**Methods:**

- **Cross-Encoders**: BERT-based reranking (ms-marco-MiniLM)
- **Cohere Rerank**: API-based reranking
- **Maximal Marginal Relevance (MMR)**: Diversity + relevance
- **LLM-based**: Use LLM to score relevance

## Quick Start with LangGraph

```python
from langgraph.graph import StateGraph, START, END
from langchain_anthropic import ChatAnthropic
from langchain_voyageai import VoyageAIEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_text_splitters import RecursiveCharacterTextSplitter
from typing import TypedDict, Annotated


class RAGState(TypedDict):
    question: str
    context: list[Document]
    answer: str


# Initialize components
llm = ChatAnthropic(model="claude-sonnet-4-6")
embeddings = VoyageAIEmbeddings(model="voyage-3-large")
vectorstore = PineconeVectorStore(index_name="docs", embedding=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# RAG prompt
rag_prompt = ChatPromptTemplate.from_template(
    """Answer based on the context below. If you cannot answer, say so.

Context:
{context}

Question: {question}

Answer:"""
)


async def retrieve(state: RAGState) -> RAGState:
    """Retrieve relevant documents."""
    docs = await retriever.ainvoke(state["question"])
    return {"context": docs}


async def generate(state: RAGState) -> RAGState:
    """Generate answer from context."""
    context_text = "\n\n".join(doc.page_content for doc in state["context"])
    messages = rag_prompt.format_messages(
        context=context_text, question=state["question"]
    )
    response = await llm.ainvoke(messages)
    return {"answer": response.content}


# Build RAG graph
builder = StateGraph(RAGState)
builder.add_node("retrieve", retrieve)
builder.add_node("generate", generate)
builder.add_edge(START, "retrieve")
builder.add_edge("retrieve", "generate")
builder.add_edge("generate", END)

rag_chain = builder.compile()

# Use
result = await rag_chain.ainvoke({"question": "What are the main features?"})
print(result["answer"])
```
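The Quick Start assumes the `docs` index is already populated. A minimal ingestion sketch, reusing the `vectorstore` defined above with illustrative file paths and chunk sizes, could look like this:

```python
# Sketch: splitting source files into chunks and adding them to the vector store.
# The file paths are placeholders; chunk sizes are illustrative defaults.
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

documents = [
    Document(page_content=open(path).read(), metadata={"source": path})
    for path in ["docs/intro.md", "docs/features.md"]  # placeholder paths
]

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)

# Embeds each chunk and upserts it into the index behind `vectorstore`
await vectorstore.aadd_documents(chunks)
```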
## Advanced RAG Patterns

### Pattern 1: Hybrid Search with RRF

```python
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever

# Sparse retriever (BM25 for keyword matching)
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = 10

# Dense retriever (embeddings for semantic search)
dense_retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

# Combine with Reciprocal Rank Fusion weights
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, dense_retriever],
    weights=[0.3, 0.7]  # 30% keyword, 70% semantic
)
```

### Pattern 2: Multi-Query Retrieval

```python
from langchain.retrievers.multi_query import MultiQueryRetriever

# Generate multiple query perspectives for better recall
multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
    llm=llm
)

# Single query → multiple variations → combined results
results = await multi_query_retriever.ainvoke("What is the main topic?")
```

### Pattern 3: Contextual Compression

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# Compressor extracts only relevant portions
compressor = LLMChainExtractor.from_llm(llm)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 10})
)

# Returns only relevant parts of documents
compressed_docs = await compression_retriever.ainvoke("specific query")
```

### Pattern 4: Parent Document Retriever

```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Small chunks for precise retrieval, large chunks for context
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=50)
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)

# Store for parent documents
docstore = InMemoryStore()

parent_retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter
)

# Add documents (splits children, stores parents)
await parent_retriever.aadd_documents(documents)

# Retrieval returns parent documents with full context
results = await parent_retriever.ainvoke("query")
```

### Pattern 5: HyDE (Hypothetical Document Embeddings)

```python
from langchain_core.prompts import ChatPromptTemplate


class HyDEState(TypedDict):
    question: str
    hypothetical_doc: str
    context: list[Document]
    answer: str


hyde_prompt = ChatPromptTemplate.from_template(
    """Write a detailed passage that would answer this question:

Question: {question}

Passage:"""
)


async def generate_hypothetical(state: HyDEState) -> HyDEState:
    """Generate hypothetical document for better retrieval."""
    messages = hyde_prompt.format_messages(question=state["question"])
    response = await llm.ainvoke(messages)
    return {"hypothetical_doc": response.content}


async def retrieve_with_hyde(state: HyDEState) -> HyDEState:
    """Retrieve using hypothetical document."""
    # Use hypothetical doc for retrieval instead of original query
    docs = await retriever.ainvoke(state["hypothetical_doc"])
    return {"context": docs}


# Build HyDE RAG graph
builder = StateGraph(HyDEState)
builder.add_node("hypothetical", generate_hypothetical)
builder.add_node("retrieve", retrieve_with_hyde)
builder.add_node("generate", generate)
builder.add_edge(START, "hypothetical")
builder.add_edge("hypothetical", "retrieve")
builder.add_edge("retrieve", "generate")
builder.add_edge("generate", END)

hyde_rag = builder.compile()
```

## Document Chunking Strategies

### Recursive Character Text Splitter

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
    separators=["\n\n", "\n", ". ", " ", ""]  # Try in order
)

chunks = splitter.split_documents(documents)
```

### Token-Based Splitting

```python
from langchain_text_splitters import TokenTextSplitter

splitter = TokenTextSplitter(
    chunk_size=512,
    chunk_overlap=50,
    encoding_name="cl100k_base"  # OpenAI tiktoken encoding
)
```

### Semantic Chunking

```python
from langchain_experimental.text_splitter import SemanticChunker

splitter = SemanticChunker(
    embeddings=embeddings,
    breakpoint_threshold_type="percentile",
    breakpoint_threshold_amount=95
)
```

### Markdown Header Splitter

```python
from langchain_text_splitters import MarkdownHeaderTextSplitter

headers_to_split_on = [
    ("#", "Header 1"),
    ("##", "Header 2"),
    ("###", "Header 3"),
]

splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=headers_to_split_on,
    strip_headers=False
)
```

## Vector Store Configurations

### Pinecone (Serverless)

```python
import os

from pinecone import Pinecone, ServerlessSpec
from langchain_pinecone import PineconeVectorStore

# Initialize Pinecone client
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create index if needed
if "my-index" not in pc.list_indexes().names():
    pc.create_index(
        name="my-index",
        dimension=1024,  # voyage-3-large dimensions
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1")
    )

# Create vector store
index = pc.Index("my-index")
vectorstore = PineconeVectorStore(index=index, embedding=embeddings)
```

### Weaviate

```python
import weaviate
from langchain_weaviate import WeaviateVectorStore

client = weaviate.connect_to_local()  # or connect_to_weaviate_cloud()

vectorstore = WeaviateVectorStore(
    client=client,
    index_name="Documents",
    text_key="content",
    embedding=embeddings
)
```

### Chroma (Local Development)

```python
from langchain_chroma import Chroma

vectorstore = Chroma(
    collection_name="my_collection",
    embedding_function=embeddings,
    persist_directory="./chroma_db"
)
```

### pgvector (PostgreSQL)

```python
from langchain_postgres.vectorstores import PGVector

connection_string = "postgresql+psycopg://user:pass@localhost:5432/vectordb"

vectorstore = PGVector(
    embeddings=embeddings,
    collection_name="documents",
    connection=connection_string,
)
```
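Qdrant appears in the vector database options above but has no configuration snippet in the skill. A minimal sketch, assuming the `langchain-qdrant` integration and a locally running Qdrant instance (collection name and URL are illustrative), could look like this:

```python
# Sketch: Qdrant setup, assuming the langchain-qdrant package and a local Qdrant server.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(url="http://localhost:6333")

# Create the collection if it does not exist yet (1024 dims matches voyage-3-large)
if not client.collection_exists("documents"):
    client.create_collection(
        collection_name="documents",
        vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
    )

vectorstore = QdrantVectorStore(
    client=client,
    collection_name="documents",
    embedding=embeddings,
)
```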
## Retrieval Optimization

### 1. Metadata Filtering

```python
from datetime import datetime

from langchain_core.documents import Document

# Add metadata during indexing
docs_with_metadata = []
for doc in documents:
    doc.metadata.update({
        "source": doc.metadata.get("source", "unknown"),
        "category": determine_category(doc.page_content),  # your own categorization helper
        "date": datetime.now().isoformat()
    })
    docs_with_metadata.append(doc)

# Filter during retrieval
results = await vectorstore.asimilarity_search(
    "query",
    filter={"category": "technical"},
    k=5
)
```

### 2. Maximal Marginal Relevance (MMR)

```python
# Balance relevance with diversity
results = await vectorstore.amax_marginal_relevance_search(
    "query",
    k=5,
    fetch_k=20,  # Fetch 20, return top 5 diverse
    lambda_mult=0.5  # 0=max diversity, 1=max relevance
)
```

### 3. Reranking with Cross-Encoder

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')


async def retrieve_and_rerank(query: str, k: int = 5) -> list[Document]:
    # Get initial results
    candidates = await vectorstore.asimilarity_search(query, k=20)

    # Rerank
    pairs = [[query, doc.page_content] for doc in candidates]
    scores = reranker.predict(pairs)

    # Sort by score and take top k
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return [doc for doc, score in ranked[:k]]
```

### 4. Cohere Rerank

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

reranker = CohereRerank(model="rerank-english-v3.0", top_n=5)

# Wrap retriever with reranking
reranked_retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 20})
)
```

## Prompt Engineering for RAG

### Contextual Prompt with Citations

```python
rag_prompt = ChatPromptTemplate.from_template(
    """Answer the question based on the context below.
Include citations using [1], [2], etc.
If you cannot answer based on the context, say "I don't have enough information."

Context:
{context}

Question: {question}

Instructions:
1. Use only information from the context
2. Cite sources with [1], [2] format
3. If uncertain, express uncertainty

Answer (with citations):"""
)
```
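The citation prompt expects the context to carry source numbers that the model can echo back as [1], [2], and so on. One possible way to produce that context (a sketch; `format_context_with_citations` is an assumed helper, not part of the skill) is:

```python
# Sketch: number retrieved chunks so the model can cite them as [1], [2], ...
from langchain_core.documents import Document


def format_context_with_citations(docs: list[Document]) -> str:
    return "\n\n".join(
        f"[{i}] ({doc.metadata.get('source', 'unknown')})\n{doc.page_content}"
        for i, doc in enumerate(docs, start=1)
    )

# Drop-in replacement for the plain join in the generate node:
# context_text = format_context_with_citations(state["context"])
```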
### Structured Output for RAG

```python
from pydantic import BaseModel, Field


class RAGResponse(BaseModel):
    answer: str = Field(description="The answer based on context")
    confidence: float = Field(description="Confidence score 0-1")
    sources: list[str] = Field(description="Source document IDs used")
    reasoning: str = Field(description="Brief reasoning for the answer")


# Use with structured output
structured_llm = llm.with_structured_output(RAGResponse)
```

## Evaluation Metrics

```python
from typing import TypedDict


class RAGEvalMetrics(TypedDict):
    retrieval_precision: float  # Relevant docs / retrieved docs
    retrieval_recall: float     # Retrieved relevant / total relevant
    answer_relevance: float     # Answer addresses question
    faithfulness: float         # Answer grounded in context
    context_relevance: float    # Context relevant to question


async def evaluate_rag_system(
    rag_chain,
    test_cases: list[dict]
) -> RAGEvalMetrics:
    """Evaluate RAG system on test cases."""
    metrics = {k: [] for k in RAGEvalMetrics.__annotations__}

    for test in test_cases:
        result = await rag_chain.ainvoke({"question": test["question"]})

        # Retrieval metrics
        retrieved_ids = {doc.metadata["id"] for doc in result["context"]}
        relevant_ids = set(test["relevant_doc_ids"])
        precision = len(retrieved_ids & relevant_ids) / len(retrieved_ids)
        recall = len(retrieved_ids & relevant_ids) / len(relevant_ids)
        metrics["retrieval_precision"].append(precision)
        metrics["retrieval_recall"].append(recall)

        # Use LLM-as-judge for quality metrics
        quality = await evaluate_answer_quality(
            question=test["question"],
            answer=result["answer"],
            context=result["context"],
            expected=test.get("expected_answer")
        )
        metrics["answer_relevance"].append(quality["relevance"])
        metrics["faithfulness"].append(quality["faithfulness"])
        metrics["context_relevance"].append(quality["context_relevance"])

    return {k: sum(v) / len(v) for k, v in metrics.items()}
```
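The evaluation code above calls `evaluate_answer_quality`, which the skill does not define. A possible LLM-as-judge implementation, reusing the structured-output pattern from the previous section, might look like the following; the prompt wording and the 0-1 scales are assumptions:

```python
# Sketch: LLM-as-judge scoring helper assumed by evaluate_rag_system above.
from pydantic import BaseModel, Field
from langchain_core.documents import Document


class AnswerQuality(BaseModel):
    relevance: float = Field(description="How well the answer addresses the question, 0-1")
    faithfulness: float = Field(description="How well the answer is grounded in the context, 0-1")
    context_relevance: float = Field(description="How relevant the context is to the question, 0-1")


# Reuses the `llm` from the Quick Start as the judge model
judge = llm.with_structured_output(AnswerQuality)


async def evaluate_answer_quality(
    question: str,
    answer: str,
    context: list[Document],
    expected: str | None = None,
) -> dict:
    """Score a RAG answer with an LLM judge; returns the three 0-1 scores."""
    context_text = "\n\n".join(doc.page_content for doc in context)
    prompt = (
        "Rate the following RAG output.\n\n"
        f"Question: {question}\n\nContext:\n{context_text}\n\nAnswer: {answer}\n"
        + (f"\nReference answer: {expected}\n" if expected else "")
    )
    result = await judge.ainvoke(prompt)
    return result.model_dump()
```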