Name: Agent Memory Systems
Author: Sickn33
Install (terminal, via npx):
$ npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill agent-memory-systems
Source file: SKILL.md (markdown)
---
name: agent-memory-systems
description: "Memory is the cornerstone of intelligent agents. Without it, every
  interaction starts from zero. This skill covers the architecture of agent
  memory: short-term (context window), long-term (vector stores), and the
  cognitive architectures that organize them."
risk: safe
source: vibeship-spawner-skills (Apache 2.0)
date_added: 2026-02-27
---

# Agent Memory Systems

Memory is the cornerstone of intelligent agents. Without it, every interaction
starts from zero. This skill covers the architecture of agent memory: short-term
(context window), long-term (vector stores), and the cognitive architectures
that organize them.

Key insight: Memory isn't just storage; it's retrieval. A million stored facts
mean nothing if you can't find the right one. Chunking, embedding, and retrieval
strategies determine whether your agent remembers or forgets.

The field is fragmented, with inconsistent terminology. We use the CoALA cognitive
architecture framework: semantic memory (facts), episodic memory (experiences),
and procedural memory (how-to knowledge).

## Principles

- Memory quality = retrieval quality, not storage quantity
- Chunk for retrieval, not for storage
- Context isolation is the enemy of memory
- Right memory type for right information
- Decay old memories; not everything should be forever
- Test retrieval accuracy before production
- Background memory formation beats real-time

## Capabilities

- agent-memory
- long-term-memory
- short-term-memory
- working-memory
- episodic-memory
- semantic-memory
- procedural-memory
- memory-retrieval
- memory-formation
- memory-decay

## Scope

- vector-database-operations → data-engineer
- rag-pipeline-architecture → llm-architect
- embedding-model-selection → ml-engineer
- knowledge-graph-design → knowledge-engineer

## Tooling

### Memory frameworks

- LangMem (LangChain) - When: LangGraph agents with persistent
memory. Note: Semantic, episodic, procedural memory types
- MemGPT / Letta - When: Virtual context management, OS-style memory. Note: Hierarchical memory tiers, automatic paging
- Mem0 - When: User memory layer for personalization. Note: Designed for user preferences and history

### Vector stores

- Pinecone - When: Managed, enterprise-scale (billions of vectors). Note: Strong query performance, highest cost
- Qdrant - When: Complex metadata filtering, open-source. Note: Rust-based, excellent filtering
- Weaviate - When: Hybrid search, knowledge graph features. Note: GraphQL interface, good for relationships
- ChromaDB - When: Prototyping, small/medium apps. Note: Developer-friendly, ~20ms p50 at 100K vectors
- pgvector - When: Already using PostgreSQL, simpler setup. Note: Good for <1M vectors, familiar tooling

### Embedding models

- OpenAI text-embedding-3-large - When: Best quality, 3072 dimensions. Note: $0.13/1M tokens
- OpenAI text-embedding-3-small - When: Good balance, 1536 dimensions. Note: $0.02/1M tokens, 6.5x cheaper than -large
- nomic-embed-text-v1.5 - When: Open-source, local deployment. Note: 768 dimensions, good quality
- all-MiniLM-L6-v2 - When: Lightweight, fast local embedding. Note: 384 dimensions, lowest latency of these four

## Patterns

### Memory Type Architecture

Choosing the right memory type for different information

**When to use**: Designing an agent memory system

Three memory types for different purposes (CoALA framework):

1. Semantic memory: facts and knowledge
   - What you know about the world
   - User preferences, domain knowledge
   - Stored in profiles (structured) or collections (unstructured)

2. Episodic memory: experiences and events
   - What happened (timestamped events)
   - Past conversations, task outcomes
   - Used for learning from experience

3. Procedural memory: how to do things
   - Rules, skills, workflows
   - Often implemented as few-shot examples
   - "How did I solve this before?"

#### LangMem Implementation (illustrative API)

```python
import os
from datetime import datetime

# Hypothetical interface for illustration: a `MemoryStore` with .semantic,
# .episodic and .procedural sub-stores is not LangMem's actual API.
from langmem import MemoryStore

# user_id and few_shot_example are assumed to be defined elsewhere;
# the awaits below must run inside an async function.

# Initialize memory store
memory = MemoryStore(
    connection_string=os.environ["POSTGRES_URL"]
)

# Semantic memory: user profile
await memory.semantic.upsert(
    namespace="user_profile",
    key=user_id,
    content={
        "name": "Alice",
        "preferences": ["dark mode", "concise responses"],
        "expertise_level": "developer",
    }
)

# Episodic memory: past interaction
await memory.episodic.add(
    namespace="conversations",
    content={
        "timestamp": datetime.now().isoformat(),
        "summary": "Helped debug authentication issue",
        "outcome": "resolved",
        "key_insights": ["Token expiry was root cause"],
    },
    metadata={"user_id": user_id, "topic": "debugging"}
)

# Procedural memory: learned pattern
await memory.procedural.add(
    namespace="skills",
    content={
        "task_type": "debug_auth",
        "steps": ["Check token expiry", "Verify refresh flow"],
        "example_interaction": few_shot_example,
    }
)
```

#### Memory Retrieval at Runtime

```python
async def prepare_context(user_id, query):
    # Get user profile (semantic)
    profile = await memory.semantic.get(
        namespace="user_profile",
        key=user_id
    )

    # Find relevant past experiences (episodic)
    similar_experiences = await memory.episodic.search(
        namespace="conversations",
        query=query,
        filter={"user_id": user_id},
        limit=3
    )

    # Find relevant skills (procedural)
    relevant_skills = await memory.procedural.search(
        namespace="skills",
        query=query,
        limit=2
    )

    return {
        "profile": profile,
        "past_experiences": similar_experiences,
        "relevant_skills": relevant_skills,
    }
```

### Vector Store Selection Pattern

Choosing the right vector database for your use case

**When to use**: Setting up persistent memory storage

Decision matrix (latency figures are rough, indicative values):

|            | Pinecone     | Qdrant | Weaviate | ChromaDB | pgvector |
|------------|--------------|--------|----------|----------|----------|
| Scale      | Billions     | 100M+  | 100M+    | 1M       | 1M       |
| Managed    | Yes          | Both   | Both     | Self     | Self     |
| Filtering  | Basic        | Best   | Good     | Basic    | SQL      |
| Hybrid     | Sparse-dense | Yes    | Best     | No       | Yes      |
| Cost       | High         | Medium | Medium   | Free     | Free     |
| Latency    | 5ms          | 7ms    | 10ms     | 20ms     | 15ms     |

#### Pinecone (Enterprise Scale)

```python
import os
from datetime import datetime
from uuid import uuid4

from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("agent-memory")

# embedding, query_embedding, user_id, memory_text and namespace are
# assumed to be defined elsewhere.

# Upsert with metadata
index.upsert(
    vectors=[
        {
            "id": f"memory-{uuid4()}",
            "values": embedding,
            "metadata": {
                "user_id": user_id,
                "timestamp": datetime.now().isoformat(),
                "type": "episodic",
                "content": memory_text,
            }
        }
    ],
    namespace=namespace
)

# Query with filter, in the same namespace as the upsert
# (otherwise the default namespace is searched)
results = index.query(
    vector=query_embedding,
    filter={"user_id": user_id, "type": "episodic"},
    top_k=5,
    include_metadata=True,
    namespace=namespace,
)
```

#### Qdrant (Complex Filtering)

```python
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchAny, MatchValue

client = QdrantClient(url="http://localhost:6333")

# Complex filtering with Qdrant: every `must` condition has to match,
# and at least one `should` condition has to match.
# (Recent qdrant-client releases favor query_points over client.search.)
results = client.query_points(
    collection_name="agent_memory",
    query=query_embedding,
    query_filter=Filter(
        must=[
            FieldCondition(key="user_id", match=MatchValue(value=user_id)),
            FieldCondition(key="type", match=MatchValue(value="semantic")),
        ],
        should=[
            FieldCondition(key="topic", match=MatchAny(any=["auth", "security"])),
        ]
    ),
    limit=5
).points
```

#### ChromaDB (Prototyping)

```python
from uuid import uuid4

import chromadb

client = chromadb.PersistentClient(path="./memory_db")
collection = client.get_or_create_collection("agent_memory")

# Simple and fast for prototypes
collection.add(
    ids=[str(uuid4())],
    embeddings=[embedding],
    documents=[memory_text],
    metadatas=[{"user_id": user_id, "type": "episodic"}]
)

results = collection.query(
    query_embeddings=[query_embedding],
    n_results=5,
    where={"user_id": user_id}
)
```

### Chunking Strategy Pattern

Breaking documents into retrievable chunks

**When to use**: Processing documents for memory storage

The chunking dilemma:

- Too large: the vector loses specificity
- Too small: the chunk loses context

Optimal chunk size depends on:

- Document type (code vs prose vs data)
- Query patterns (factual vs exploratory)
- Embedding model (each has a sweet spot)

General guidance: 256-512 tokens for most use cases.

#### Fixed-Size Chunking (Baseline)

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,      # Characters (length_function=len by default)
    chunk_overlap=50,    # Overlap carries context across chunk boundaries
    separators=["\n\n", "\n", ". ", " ", ""]  # Priority order
)

chunks = splitter.split_text(document)
```

#### Semantic Chunking (Better Quality)

```python
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

# Splits where the embedding distance between adjacent sentences
# exceeds the 95th percentile of all such distances
splitter = SemanticChunker(
    embeddings=OpenAIEmbeddings(),
    breakpoint_threshold_type="percentile",
    breakpoint_threshold_amount=95
)

chunks = splitter.split_text(document)
```

#### Structure-Aware Chunking (Documents with Hierarchy)

```python
from langchain_text_splitters import MarkdownHeaderTextSplitter

# Respect document structure
splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[
        ("#", "Header 1"),
        ("##", "Header 2"),
        ("###", "Header 3"),
    ]
)

chunks = splitter.split_text(markdown_doc)
# Each chunk is a Document whose metadata records its enclosing headers
```

#### Contextual Chunking (Anthropic's Approach)

```python
# Add context to each chunk before embedding. Anthropic reported that
# contextual embeddings cut top-20 retrieval failures by about 35%.
# Simplified here: a document summary is prepended; Anthropic's version
# has an LLM write chunk-specific context.

def add_context_to_chunk(chunk, document_summary):
    context_prompt = f'''
    Document summary: {document_summary}

    The following is a chunk from this document:
    {chunk}
    '''
    return context_prompt

# Embed the contextualized chunk, not the raw chunk
# (summary, embed and store are assumed to exist)
for chunk in chunks:
    contextualized = add_context_to_chunk(chunk, summary)
    embedding = embed(contextualized)
    store(chunk, embedding)  # Store original, embed contextualized
```

#### Code-Specific Chunking

```python
from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

# Language-aware splitting
python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON,
    chunk_size=1000,
    chunk_overlap=200
)

# Prefers splitting at class/def boundaries before lines and words
chunks = python_splitter.split_text(python_code)
```

### Background Memory Formation

Processing memories asynchronously for better quality

**When to use**: You want higher recall without slowing interactions

Real-time memory extraction slows conversations and adds
complexity to agent tool calls. Background processing after
conversations yields higher-quality memories.

Pattern: subconscious memory formation

#### LangGraph Background Processing

```python
# Assumed: `llm` is a LangChain chat model, `memory` is the store above, and
# load_conversation, parse_insights and generate_key are your own helpers.

async def background_memory_processor(thread_id: str):
    # Run after conversation ends or goes idle
    conversation = await load_conversation(thread_id)

    # Extract insights without time pressure (f-string interpolates the transcript)
    response = await llm.ainvoke(f'''
        Analyze this conversation and extract:
        1. Key facts learned about the user
        2. User preferences revealed
        3. Tasks completed or pending
        4. Patterns in user behavior

        Be thorough - this runs in background.

        Conversation:
        {conversation}
    ''')
    insights = parse_insights(response.content)  # e.g. one insight per line

    # Store to long-term memory
    for insight in insights:
        await memory.semantic.upsert(
            namespace="user_insights",
            key=generate_key(insight),
            content=insight,
            metadata={"source_thread": thread_id}
        )

# Trigger on conversation end or idle timeout
# (on_conversation_idle is a hypothetical hook, not a LangGraph API)
@on_conversation_idle(timeout_minutes=5)
async def process_conversation(thread_id):
    await background_memory_processor(thread_id)
```

#### Memory Consolidation (Like Sleep)

```python
# Periodically consolidate and deduplicate memories
# (cluster_by_similarity is an assumed helper)

async def consolidate_memories(user_id: str):
    # Get all memories for user
    memories = await memory.semantic.list(
        namespace="user_insights",
        filter={"user_id": user_id}
    )

    # Find similar memories (potential duplicates)
    clusters = cluster_by_similarity(memories, threshold=0.9)

    # Merge similar memories
    for cluster in clusters:
        if len(cluster) > 1:
            response = await llm.ainvoke(f'''
                Consolidate these related memories into one:
                {cluster}

                Preserve all important information.
            ''')
            merged = response.content
            new_key = generate_key(merged)
            await memory.semantic.upsert(
                namespace="user_insights",
                key=new_key,
                content=merged,
                metadata={"user_id": user_id},
            )
            # Delete originals, but never the record just written
            for old in cluster:
                if old.id != new_key:
                    await memory.semantic.delete(old.id)
```

### Memory Decay Pattern

Forgetting old, irrelevant memories

**When to use**: Memory grows large, retrieval slows down

Not all memories should live forever:

- Old preferences may be outdated
- Task details lose relevance
- Conflicting memories confuse retrieval

Implement intelligent decay based on:

- Recency (when was it created/accessed?)
- Frequency (how often is it retrieved?)
- Importance (is it a core fact or detail?)

#### Time-Based Decay

```python
from datetime import datetime, timedelta, timezone

async def decay_old_memories(namespace: str, max_age_days: int):
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)

    # Range operators such as $lt usually require numbers, so
    # last_accessed is assumed to be stored as a Unix timestamp
    old_memories = await memory.episodic.list(
        namespace=namespace,
        filter={"last_accessed": {"$lt": cutoff.timestamp()}}
    )

    for mem in old_memories:
        # Soft delete (mark as archived)
        await memory.episodic.update(
            id=mem.id,
            metadata={"archived": True, "archived_at": datetime.now(timezone.utc).isoformat()}
        )
```

#### Utility-Based Decay (MIRIX Approach)

```python
from datetime import datetime

def calculate_memory_utility(mem):
    '''
    Composite utility score inspired by cognitive science:
    - Recency: When was it last accessed?
    - Frequency: How often is it accessed?
    - Importance: How critical is this information?
    '''
    now = datetime.now()

    # Recency score (exponential decay with 72h half-life)
    hours_since_access = (now - mem.last_accessed).total_seconds() / 3600
    recency_score = 0.5 ** (hours_since_access / 72)

    # Frequency score (saturates at 10 accesses)
    frequency_score = min(mem.access_count / 10, 1.0)

    # Importance (from metadata or heuristic)
    importance = mem.metadata.get("importance", 0.5)

    # Weighted combination
    utility = (
        0.4 * recency_score +
        0.3 * frequency_score +
        0.3 * importance
    )

    return utility

async def prune_low_utility_memories(threshold=0.2):
    all_memories = await memory.list_all()
    for mem in all_memories:
        if calculate_memory_utility(mem) < threshold:
            await memory.archive(mem.id)
```

## Sharp Edges

### Chunking Isolates Information From Its Context

Severity: CRITICAL

Situation: Processing documents for vector storage

Symptoms:
Retrieval finds chunks but they don't make sense alone. Agent
answers miss the big picture. "The function returns X" is retrieved
without knowing which function. References to "this" appear without
knowing what "this" refers to.

Why this breaks:
When we chunk for AI processing, we're breaking connections,
reducing a holistic narrative to isolated fragments that often
miss the big picture.
A chunk about "the configuration" without
context about what system is being configured is nearly useless.

Recommended fix:

#### Contextual Chunking (Anthropic's approach)

```python
# Add document context to each chunk before embedding.
# Anthropic reported ~35% fewer top-20 retrieval failures with contextual embeddings.
# Assumed helpers: summarize, embed, store; `llm` is a LangChain chat model.

def contextualize_chunk(chunk, summary):
    # LLM generates context for chunk
    context = llm.invoke(f'''
        Document summary: {summary}

        Generate a brief context statement for this chunk
        that would help someone understand what it refers to:

        {chunk}
    ''').content

    return f"{context}\n\n{chunk}"

# Summarize the document once, not once per chunk
summary = summarize(full_doc)

# Embed the contextualized version
for chunk in chunks:
    contextualized = contextualize_chunk(chunk, summary)
    embedding = embed(contextualized)
    # Store original chunk, embed contextualized
    store(original=chunk, embedding=embedding)
```

#### Hierarchical Chunking

```python
# Store at multiple granularities (split is an assumed helper)
chunks_small = split(doc, size=256)
chunks_medium = split(doc, size=512)
chunks_large = split(doc, size=1024)

# Retrieve at the appropriate level based on the query
```

### Chunk Size Mismatched to Query Patterns

Severity: HIGH

Situation: Configuring chunking for memory storage

Symptoms:
High-quality documents produce low-quality retrievals. Simple
questions miss relevant information.
Complex questions get
fragments instead of complete answers.

Why this breaks:
Optimal chunk size depends on query patterns:
- Factual queries need small, specific chunks
- Conceptual queries need larger context
- Code needs function-level boundaries

The sweet spot varies by document type and embedding model.
A default of 1000 characters is not tuned for any of these cases.

Recommended fix:

#### Test different sizes

```python
# split_documents and build_index are assumed helpers; index.search is
# assumed to return the text of the top-k chunks.

def evaluate_chunk_size(documents, test_queries, chunk_size):
    chunks = split_documents(documents, size=chunk_size)
    index = build_index(chunks)

    correct_retrievals = 0
    for query, expected_text in test_queries:
        results = index.search(query, k=5)
        # Chunk boundaries move with chunk_size, so look for the expected
        # passage inside the retrieved chunks instead of an exact chunk match
        if any(expected_text in chunk for chunk in results):
            correct_retrievals += 1

    return correct_retrievals / len(test_queries)

# Test multiple sizes
for size in [256, 512, 768, 1024]:
    recall = evaluate_chunk_size(docs, test_queries, size)
    print(f"Size {size}: Recall@5 = {recall:.2%}")
```

#### Size recommendations by content type

```python
CHUNK_SIZES = {
    "documentation": 512,  # Complete concepts
    "code": 1000,          # Function-level
    "conversation": 256,   # Turn-level
    "articles": 768,       # Paragraph-level
}
```

#### Use overlap to prevent boundary issues

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# chunk_size counts characters by default; build the splitter with
# RecursiveCharacterTextSplitter.from_tiktoken_encoder(...) to size in tokens
splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=50,  # ~10% overlap
)
```

### Semantic Search Returns Irrelevant Results

Severity: HIGH

Situation: Querying memory for context

Symptoms:
Agent retrieves memories that seem related but aren't useful.
"Tell me about the user's preferences" returns conversation
about preferences in general, not this user's. High similarity
scores for wrong content.

Why this breaks:
Semantic similarity isn't the same as relevance. "The user
likes Python" and "Python is a programming language" are
semantically similar but very different types of information.
Without metadata filtering, retrieval can only rank by topical similarity.

Recommended fix:

#### Always filter by metadata first

```python
# Don't rely on semantic similarity alone (Pinecone-style index.query)

# Bad: Only semantic search
results = index.query(
    vector=query_embedding,
    top_k=5
)

# Good: Filter then search. Range operators need numbers, so
# created_at is assumed to be stored as a Unix timestamp.
results = index.query(
    vector=query_embedding,
    filter={
        "user_id": current_user.id,
        "type": "preference",
        "created_at": {"$gte": cutoff_timestamp},
    },
    top_k=5
)
```

#### Use hybrid search (semantic + keyword)

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(...)  # connection details elided

# Hybrid search with Reciprocal Rank Fusion via Qdrant's Query API.
# Assumes the collection has a named dense vector "dense" and a named sparse
# vector "sparse"; sparse_embed is a hypothetical keyword encoder returning
# models.SparseVector.
results = client.query_points(
    collection_name="memories",
    prefetch=[
        models.Prefetch(query=semantic_embedding, using="dense", limit=20),
        models.Prefetch(query=sparse_embed(query), using="sparse", limit=20),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),  # Reciprocal Rank Fusion
    limit=5,
).points
```

#### Rerank results with cross-encoder

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

# Initial retrieval (recall-oriented); assumes memory text is stored in
# metadata["content"], as in the Pinecone example
candidates = index.query(
    vector=query_embedding, top_k=20, include_metadata=True
).matches

# Rerank (precision-oriented)
pairs = [(query, c.metadata["content"]) for c in candidates]
scores = reranker.predict(pairs)
reranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
```

### Old Memories Override Current Information

Severity: HIGH

Situation: User preferences or facts change over time

Symptoms:
Agent uses outdated preferences. "User prefers dark mode" from
6 months ago overrides recent "switch to light mode" request.
Agent confidently uses stale data.

Why this breaks:
Vector stores don't have temporal awareness by default. A memory
from a year ago has the same retrieval weight as one from today.
Recent information should generally override old information
for preferences and mutable facts.

Recommended fix:

#### Add temporal scoring

```python
import time

def time_decay_score(created_at_ts, half_life_days=30):
    # Score halves every half_life_days; created_at_ts is a Unix timestamp
    age_days = (time.time() - created_at_ts) / 86400
    return 0.5 ** (age_days / half_life_days)

def retrieve_with_recency(query, user_id):
    # Get candidates (embed is an assumed helper)
    candidates = index.query(
        vector=embed(query),
        filter={"user_id": user_id},
        top_k=20,
        include_metadata=True,
    ).matches

    # Blend similarity with time decay
    scored = [
        (c.score * 0.7 + time_decay_score(c.metadata["created_at"]) * 0.3, c)
        for c in candidates
    ]

    # Re-sort by final score
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:5]]
```

#### Update instead of append for preferences

```python
from datetime import datetime, timezone

async def update_preference(user_id, category, value):
    # Delete old preference(s)
    await memory.delete(
        filter={"user_id": user_id, "type": "preference", "category": category}
    )

    # Store new preference, including the metadata the delete filter relies on
    await memory.upsert(
        id=f"pref-{user_id}-{category}",
        content={"category": category, "value": value},
        metadata={
            "user_id": user_id,
            "type": "preference",
            "category": category,
            "updated_at": datetime.now(timezone.utc).isoformat(),
        }
    )
```

#### Explicit versioning for facts

```python
from datetime import datetime, timezone

await memory.upsert(
    id=f"fact-{fact_id}-v{version}",
    content=new_fact,
    metadata={
        "version": version,
        "supersedes": previous_id,
        "valid_from": datetime.now(timezone.utc).isoformat()
    }
)
```

### Contradictory Memories Retrieved Together

Severity: MEDIUM

Situation: User has changed preferences or provided conflicting info

Symptoms:
Agent retrieves "user prefers dark mode" and "user prefers light
mode" in the same context. Gives inconsistent answers.
Seems confused or forgetful to the user.

Why this breaks:
Without conflict resolution, both old and new information coexist.
Semantic search might return both because they're both about the
same topic (preferences). The agent has no way to know which is current.

Recommended fix:

#### Detect conflicts on storage

```python
# `index` is a hypothetical async vector-store wrapper; embed,
# resolve_conflict and mark_superseded are assumed helpers.

async def store_with_conflict_check(new_memory, user_id):
    # Find potentially conflicting memories
    matches = await index.query(
        vector=embed(new_memory.content),
        filter={"user_id": user_id, "type": new_memory.type},
        top_k=5
    )
    # Keep only very similar matches (most stores take no threshold argument)
    similar = [m for m in matches if m.score >= 0.9]

    for existing in similar:
        if is_contradictory(new_memory.content, existing.content):
            # Ask for resolution
            resolution = await resolve_conflict(new_memory, existing)
            if resolution == "replace":
                await index.delete(existing.id)
            elif resolution == "version":
                await mark_superseded(existing.id, new_memory.id)

    await index.upsert(new_memory)
```

#### Conflict detection heuristic

```python
def is_contradictory(new_content, old_content):
    # Use an LLM (a LangChain chat model) to detect contradiction
    result = llm.invoke(f'''
        Do these two statements contradict each other?

        Statement 1: {old_content}
        Statement 2: {new_content}

        Respond with just YES or NO.
    ''')
    return result.content.strip().upper().startswith("YES")
```

#### Periodic consolidation

```python
# cluster_by_topic, has_conflicts and replace_cluster are assumed helpers
async def consolidate_memories(user_id):
    all_memories = await index.list(filter={"user_id": user_id})
    clusters = cluster_by_topic(all_memories)

    for cluster in clusters:
        if has_conflicts(cluster):
            response = await llm.ainvoke(f'''
                These memories may conflict. Create one consolidated
                memory that represents the current truth:
                {cluster}
            ''')
            await replace_cluster(cluster, response.content)
```

### Retrieved Memories Exceed Context Window

Severity: MEDIUM

Situation: Retrieving too many memories at once

Symptoms:
Token limit errors. Agent truncates important information.
System prompt gets cut off. Retrieved memories compete with
user query for space.

Why this breaks:
Retrieval typically returns top-k results. If k is too high or
chunks are too large, retrieved context overwhelms the window.
Critical information (system prompt, recent messages) gets pushed
out.

Recommended fix:

#### Budget tokens for different memory types

```python
# get_recent_messages, get_user_profile, retrieve_memories, count_tokens and
# build_context are assumed helpers; `limit`/`max_tokens` are token caps.
TOKEN_BUDGET = {
    "system_prompt": 500,
    "user_profile": 200,
    "recent_messages": 2000,
    "retrieved_memories": 1000,
    "current_query": 500,
    "buffer": 300,  # Safety margin
}

def budget_aware_retrieval(query, context_limit=4500):  # Sum of TOKEN_BUDGET
    remaining = (
        context_limit
        - TOKEN_BUDGET["system_prompt"]
        - TOKEN_BUDGET["current_query"]
        - TOKEN_BUDGET["buffer"]
    )

    # Prioritize recent messages
    recent = get_recent_messages(limit=TOKEN_BUDGET["recent_messages"])
    remaining -= count_tokens(recent)

    # Then user profile
    profile = get_user_profile(limit=TOKEN_BUDGET["user_profile"])
    remaining -= count_tokens(profile)

    # Finally retrieved memories, capped by their own budget and what is left
    memories = retrieve_memories(
        query, max_tokens=min(remaining, TOKEN_BUDGET["retrieved_memories"])
    )

    return build_context(profile, recent, memories)
```

#### Dynamic k based on chunk size

```python
def retrieve_with_budget(query, max_tokens=1000):
    avg_chunk_tokens = 150  # From your data
    max_k = max(1, max_tokens // avg_chunk_tokens)

    results = index.query(query, top_k=max_k)

    # Trim if still over budget (keeps rank order, stops at first overflow)
    total_tokens = 0
    filtered = []
    for result in results:
        tokens = count_tokens(result.text)
        if total_tokens + tokens <= max_tokens:
            filtered.append(result)
            total_tokens += tokens
        else:
            break

    return filtered
```

### Query and Document Embeddings From Different Models

Severity: MEDIUM

Situation: Upgrading embedding model or mixing providers

Symptoms:
Retrieval quality suddenly drops. Relevant documents not found.
Random results returned. Works for new documents, fails for old.

Why this breaks:
Embedding models produce different vector spaces. A query embedded
with text-embedding-3 won't match documents embedded with text-embedding-ada-002.
Mixing models creates garbage similarity scores.

Recommended fix:

#### Track embedding model in metadata

```python
await index.upsert(
    id=doc_id,
    vector=embedding,
    metadata={
        "embedding_model": "text-embedding-3-small",
        "embedding_version": "2024-01",
        "content": content
    }
)
```

#### Filter by model version on retrieval

```python
results = index.query(
    vector=query_embedding,
    filter={"embedding_model": current_model},
    top_k=10
)
```

#### Migration strategy for model upgrades

```python
async def migrate_embeddings(old_model, new_model):
    # Get all documents with old model
    old_docs = await index.list(filter={"embedding_model": old_model})

    for doc in old_docs:
        # Re-embed with new model
        new_embedding = await embed(doc.content, model=new_model)

        # Update in place (only possible if both models share a dimension;
        # otherwise re-embed into a new collection)
        await index.update(
            id=doc.id,
            vector=new_embedding,
            metadata={"embedding_model": new_model}
        )
```

#### Use separate collections during migration

- Old collection: production queries
- New collection: re-embedding in progress
- Switch over when complete

## Validation Checks

### In-Memory Store in Production Code

Severity: ERROR

In-memory
stores lose data on restart.

Message: In-memory store detected. Use persistent storage (Postgres, Qdrant, Pinecone) for production.

### Vector Upsert Without Metadata

Severity: WARNING

Vectors should have metadata for filtering

Message: Vector upsert without metadata. Add user_id, type, timestamp for proper filtering.

### Query Without User Filtering

Severity: ERROR

Queries should filter by user to prevent data leakage

Message: Vector query without user filtering. Always filter by user_id to prevent data leakage.

### Hardcoded Chunk Size Without Justification

Severity: INFO

Chunk size should be tested and justified

Message: Hardcoded chunk size. Test different sizes for your content type and measure retrieval accuracy.

### Chunking Without Overlap

Severity: WARNING

Chunk overlap prevents boundary issues

Message: Text splitting without overlap. Add chunk_overlap (10-20%) to prevent boundary issues.

### Semantic Search Without Filters

Severity: WARNING

Pure semantic search often returns irrelevant results

Message: Pure semantic search. Add metadata filters (user, type, time) for better relevance.

### Retrieval Without Result Limit

Severity: WARNING

Unbounded retrieval can overflow context

Message: Retrieval without limit.
Set top_k to prevent context overflow.

### Embeddings Without Model Version Tracking

Severity: WARNING

Track embedding model to handle migrations

Message: Store embedding model version in metadata to handle model migrations.

### Different Models for Document and Query Embedding

Severity: ERROR

Documents and queries must use the same embedding model

Message: Use the same embedding model for indexing and querying.

## Collaboration

### Delegation Triggers

- user needs vector database at scale -> data-engineer (Production vector store operations)
- user needs embedding model optimization -> ml-engineer (Custom embeddings, fine-tuning)
- user needs knowledge graph -> knowledge-engineer (Graph-based memory structures)
- user needs RAG pipeline -> llm-architect (End-to-end retrieval augmented generation)
- user needs multi-agent shared memory -> multi-agent-orchestration (Memory sharing between agents)

## Related Skills

Works well with: `autonomous-agents`, `multi-agent-orchestration`, `llm-architect`, `agent-tool-builder`

## When to Use
- User mentions or implies: agent memory
- User mentions or implies: long-term memory
- User mentions or implies: memory systems
- User mentions or implies: remember across sessions
- User mentions or implies: memory retrieval
- User mentions or implies: episodic memory
- User mentions or implies: semantic memory
- User mentions or implies: vector store
- User mentions or implies: rag
- User mentions or implies: langmem
- User mentions or implies: memgpt
- User mentions or implies: conversation history

## Limitations
- Use this skill only when the task clearly matches the scope described above.
- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
- Stop and ask for clarification if required inputs, permissions, safety
boundaries, or success criteria are missing.
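
Retrieval accuracy is the environment-specific validation most worth automating, in line with the principle "Test retrieval accuracy before production". Below is a minimal, dependency-free sketch of a Recall@k harness. It assumes a retriever that maps a query and k to a ranked list of memory ids. `recall_at_k`, `make_toy_retriever`, and the sample memories are hypothetical, and the bag-of-words cosine scorer is only a stand-in for a real embedding model and vector store.

```python
from collections import Counter
from math import sqrt
from typing import Callable, Sequence

# A retriever maps (query, k) to up to k memory ids, best match first.
Retriever = Callable[[str, int], list[str]]


def recall_at_k(retrieve: Retriever, cases: Sequence[tuple[str, str]], k: int = 5) -> float:
    """Fraction of (query, expected_id) cases whose expected id is in the top-k results."""
    if not cases:
        raise ValueError("need at least one test case")
    hits = sum(1 for query, expected in cases if expected in retrieve(query, k)[:k])
    return hits / len(cases)


def make_toy_retriever(memories: dict[str, str]) -> Retriever:
    """Hypothetical stand-in: bag-of-words cosine similarity, not a real embedding model."""
    vectors = {mem_id: Counter(text.lower().split()) for mem_id, text in memories.items()}

    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(count * b[token] for token, count in a.items())
        norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    def retrieve(query: str, k: int) -> list[str]:
        q = Counter(query.lower().split())
        ranked = sorted(vectors, key=lambda mem_id: cosine(q, vectors[mem_id]), reverse=True)
        return ranked[:k]

    return retrieve


memories = {
    "pref-theme": "user prefers dark mode in the editor",
    "auth-bug": "token expiry was the root cause of the auth failure",
    "python-fact": "python is a programming language",
}
cases = [
    ("what theme does the user prefer", "pref-theme"),
    ("why did auth fail", "auth-bug"),
]
retrieve = make_toy_retriever(memories)
print(f"Recall@1 = {recall_at_k(retrieve, cases, k=1):.0%}")  # prints: Recall@1 = 100%
```

To use the harness on a real system, replace `make_toy_retriever` with a function that embeds the query and calls your vector store. Build `cases` from real user questions. Comparing Recall@k across chunk sizes or embedding models then takes only a short loop over configurations.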