Claude Agent Skill · by Sickn33

Context Window Management

Install the Context Window Management skill for Claude Code from sickn33/antigravity-awesome-skills.

Works with Paperclip

How Context Window Management fits into a Paperclip company.

Context Window Management drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.

Paired pack: SaaS Factory - a pre-configured AI company with 18 agents and 18 skills, one-time purchase ($27, down from $59).
Source file: SKILL.md (315 lines)
---
name: context-window-management
description: Strategies for managing LLM context windows including summarization, trimming, routing, and avoiding context rot
risk: unknown
source: vibeship-spawner-skills (Apache 2.0)
date_added: 2026-02-27
---

# Context Window Management

Strategies for managing LLM context windows including summarization, trimming, routing, and avoiding context rot.

## Capabilities

- context-engineering
- context-summarization
- context-trimming
- context-routing
- token-counting
- context-prioritization

## Prerequisites

- Knowledge: LLM fundamentals, tokenization basics, prompt engineering
- Skills_recommended: prompt-engineering

## Scope

- Does_not_cover: RAG implementation details, model fine-tuning, embedding models
- Boundaries: Focus is context optimization; covers strategies, not specific implementations

## Ecosystem

### Primary_tools

- tiktoken - OpenAI's tokenizer for counting tokens
- LangChain - framework with context management utilities
- Claude API - 200K+ token context with caching support

## Patterns

### Tiered Context Strategy

Apply different strategies depending on context size.

**When to use**: Building any multi-turn conversation system

```typescript
interface ContextTier {
    maxTokens: number;
    strategy: 'full' | 'summarize' | 'rag';
    model: string;
}

const TIERS: ContextTier[] = [
    { maxTokens: 8000, strategy: 'full', model: 'claude-3-haiku' },
    { maxTokens: 32000, strategy: 'full', model: 'claude-3-5-sonnet' },
    { maxTokens: 100000, strategy: 'summarize', model: 'claude-3-5-sonnet' },
    { maxTokens: Infinity, strategy: 'rag', model: 'claude-3-5-sonnet' }
];

async function selectStrategy(messages: Message[]): Promise<ContextTier> {
    const tokens = await countTokens(messages);

    for (const tier of TIERS) {
        if (tokens <= tier.maxTokens) {
            return tier;
        }
    }
    return TIERS[TIERS.length - 1];
}

async function prepareContext(messages: Message[]): Promise<PreparedContext> {
    const tier = await selectStrategy(messages);

    switch (tier.strategy) {
        case 'full':
            return { messages, model: tier.model };
        case 'summarize': {
            const summary = await summarizeOldMessages(messages);
            return { messages: [summary, ...recentMessages(messages)], model: tier.model };
        }
        case 'rag': {
            const relevant = await retrieveRelevant(messages);
            return { messages: [...relevant, ...recentMessages(messages)], model: tier.model };
        }
    }
}
```

### Serial Position Optimization

Place important content at the start and end.

**When to use**: Constructing prompts with significant context

```typescript
// LLMs weight the beginning and end of the context more heavily.
// Structure prompts to leverage this.
function buildOptimalPrompt(components: {
    systemPrompt: string;
    criticalContext: string;
    conversationHistory: Message[];
    currentQuery: string;
}): string {
    // START: system instructions (always first)
    const parts = [components.systemPrompt];

    // CRITICAL CONTEXT: right after system (high primacy)
    if (components.criticalContext) {
        parts.push(`## Key Context\n${components.criticalContext}`);
    }

    // MIDDLE: conversation history (lower weight).
    // Summarize if long, keep recent messages full.
    const history = components.conversationHistory;
    if (history.length > 10) {
        const oldSummary = summarize(history.slice(0, -5));
        const recent = history.slice(-5);
        parts.push(`## Earlier Conversation (Summary)\n${oldSummary}`);
        parts.push(`## Recent Messages\n${formatMessages(recent)}`);
    } else {
        parts.push(`## Conversation\n${formatMessages(history)}`);
    }

    // END: current query (high recency).
    // Restate critical requirements here.
    parts.push(`## Current Request\n${components.currentQuery}`);

    // FINAL: reminder of key constraints
    parts.push(`Remember: ${extractKeyConstraints(components.systemPrompt)}`);

    return parts.join('\n\n');
}
```

### Intelligent Summarization

Summarize by importance, not just recency.

**When to use**: Context exceeds the optimal size

```typescript
interface MessageWithMetadata extends Message {
    importance: number;       // 0-1 score
    hasCriticalInfo: boolean; // user preferences, decisions
    referenced: boolean;      // was this referenced later?
}

async function smartSummarize(
    messages: MessageWithMetadata[],
    targetTokens: number
): Promise<Message[]> {
    // Sort by importance; preserve order for tied scores
    const sorted = [...messages].sort((a, b) =>
        (b.importance + (b.hasCriticalInfo ? 0.5 : 0) + (b.referenced ? 0.3 : 0)) -
        (a.importance + (a.hasCriticalInfo ? 0.5 : 0) + (a.referenced ? 0.3 : 0))
    );

    const keep: Message[] = [];
    const summarizePool: Message[] = [];
    let currentTokens = 0;

    for (const msg of sorted) {
        const msgTokens = await countTokens([msg]);
        if (currentTokens + msgTokens < targetTokens * 0.7) {
            keep.push(msg);
            currentTokens += msgTokens;
        } else {
            summarizePool.push(msg);
        }
    }

    // Summarize the low-importance messages
    if (summarizePool.length > 0) {
        const summary = await llm.complete(`
            Summarize these messages, preserving:
            - Any user preferences or decisions
            - Key facts that might be referenced later
            - The overall flow of conversation

            Messages:
            ${formatMessages(summarizePool)}
        `);

        keep.unshift({ role: 'system', content: `[Earlier context: ${summary}]` });
    }

    // Restore original order
    return keep.sort((a, b) => a.timestamp - b.timestamp);
}
```

### Token Budget Allocation

Allocate the token budget across context components.

**When to use**: Need predictable context management

```typescript
interface TokenBudget {
    system: number;          // system prompt
    criticalContext: number; // user prefs, key info
    history: number;         // conversation history
    query: number;           // current query
    response: number;        // reserved for the response
}

function allocateBudget(totalTokens: number): TokenBudget {
    return {
        system: Math.floor(totalTokens * 0.10),          // 10%
        criticalContext: Math.floor(totalTokens * 0.15), // 15%
        history: Math.floor(totalTokens * 0.40),         // 40%
        query: Math.floor(totalTokens * 0.10),           // 10%
        response: Math.floor(totalTokens * 0.25),        // 25%
    };
}

async function buildWithBudget(
    components: ContextComponents,
    modelMaxTokens: number
): Promise<PreparedContext> {
    const budget = allocateBudget(modelMaxTokens);

    // Truncate/summarize each component to fit its budget
    const prepared = {
        system: truncateToTokens(components.system, budget.system),
        criticalContext: truncateToTokens(
            components.criticalContext, budget.criticalContext
        ),
        history: await summarizeToTokens(components.history, budget.history),
        query: truncateToTokens(components.query, budget.query),
    };

    // Reallocate unused budget
    const used = await countTokens(Object.values(prepared).join('\n'));
    const remaining = modelMaxTokens - used - budget.response;

    if (remaining > 0) {
        // Give the surplus to history (most valuable for conversation)
        prepared.history = await summarizeToTokens(
            components.history,
            budget.history + remaining
        );
    }

    return prepared;
}
```

## Validation Checks

### No Token Counting

Severity: WARNING
Message: Building context without token counting; may exceed model limits.
Fix action: Count tokens before sending; implement budget allocation

### Naive Message Truncation

Severity: WARNING
Message: Truncating messages without summarization; critical context may be lost.
Fix action: Summarize old messages instead of simply removing them

### Hardcoded Token Limit

Severity: INFO
Message: Hardcoded token limit; consider making it configurable per model.
Fix action: Use model-specific limits from configuration

### No Context Management Strategy

Severity: WARNING
Message: LLM calls without a context management strategy.
Fix action: Implement context management: budgets, summarization, or RAG

## Collaboration

### Delegation Triggers

- retrieval|rag|search -> rag-implementation (need a retrieval system)
- memory|persistence|remember -> conversation-memory (need memory storage)
- cache|caching -> prompt-caching (need caching optimization)

### Complete Context System

Skills: context-window-management, rag-implementation, conversation-memory, prompt-caching

Workflow:

```
1. Design context strategy
2. Implement RAG for large corpora
3. Set up memory persistence
4. Add caching for performance
```

## Related Skills

Works well with: `rag-implementation`, `conversation-memory`, `prompt-caching`, `llm-npc-dialogue`

## When to Use

- User mentions or implies: context window
- User mentions or implies: token limit
- User mentions or implies: context management
- User mentions or implies: context engineering
- User mentions or implies: long context
- User mentions or implies: context overflow

## Limitations

- Use this skill only when the task clearly matches the scope described above.
- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.
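The four-step Complete Context System workflow above can be sketched end to end. Everything here is a hypothetical stand-in for the corresponding skill (a keyword-match "RAG", an in-memory "memory store", a naive map "cache"), not a real API from any of them:

```typescript
// Illustrative pipeline for the Complete Context System workflow.
// All functions are hypothetical stand-ins, not APIs from the related skills.
interface Message { role: string; content: string }

// 1. Design context strategy: fix a per-call token budget.
const CONTEXT_BUDGET = 8000;

// 2. RAG: retrieve documents relevant to the query (keyword-match stub).
function retrieveRelevant(query: string, corpus: string[]): string[] {
    return corpus.filter(doc => doc.toLowerCase().includes(query.toLowerCase()));
}

// 3. Memory: persist conversation turns (in-memory stub).
const memory: Message[] = [];
function remember(msg: Message): void { memory.push(msg); }

// 4. Caching: memoize prepared contexts keyed by query (naive stub).
const cache = new Map<string, string>();
function prepareContext(query: string, corpus: string[]): string {
    const hit = cache.get(query);
    if (hit !== undefined) return hit;
    const docs = retrieveRelevant(query, corpus);
    const context = [...docs, ...memory.map(m => `${m.role}: ${m.content}`), query]
        .join('\n')
        .slice(0, CONTEXT_BUDGET * 4); // chars/4 token heuristic (assumption)
    cache.set(query, context);
    return context;
}
```

In a real system each stub would be replaced by the matching skill: a vector store for step 2, durable storage for step 3, and provider-side prompt caching for step 4.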