Name: Autonomous Agents
Author: Sickn33
Works with Paperclip
How Autonomous Agents fits into a Paperclip company.

Autonomous Agents drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.
---
name: autonomous-agents
description: Autonomous agents are AI systems that can independently decompose
  goals, plan actions, execute tools, and self-correct without constant human
  guidance. The challenge isn't making them capable - it's making them reliable.
  Every extra decision multiplies failure probability.
risk: unknown
source: vibeship-spawner-skills (Apache 2.0)
date_added: 2026-02-27
---

# Autonomous Agents

Autonomous agents are AI systems that can independently decompose goals,
plan actions, execute tools, and self-correct without constant human guidance.
The challenge isn't making them capable - it's making them reliable. Every
extra decision multiplies failure probability.

This skill covers agent loops (ReAct, Plan-Execute), goal decomposition,
reflection patterns, and production reliability. Key insight: compounding
error rates kill autonomous agents. A 95% success rate per step drops to
60% by step 10. Build for reliability first, autonomy second.

2025 lesson: The winners are constrained, domain-specific agents with clear
boundaries, not "autonomous everything."
Treat AI outputs as proposals,
not truth.

## Principles

- Reliability over autonomy - every step compounds error probability
- Constrain scope - domain-specific beats general-purpose
- Treat outputs as proposals, not truth
- Build guardrails before expanding capabilities
- Human-in-the-loop for critical decisions is non-negotiable
- Log everything - every action must be auditable
- Fail safely with rollback, not silently with corruption

## Capabilities

- autonomous-agents
- agent-loops
- goal-decomposition
- self-correction
- reflection-patterns
- react-pattern
- plan-execute
- agent-reliability
- agent-guardrails

## Scope

- multi-agent-systems → multi-agent-orchestration
- tool-building → agent-tool-builder
- memory-systems → agent-memory-systems
- workflow-orchestration → workflow-automation

## Tooling

### Frameworks

- LangGraph - When: Production agents with state management. Note: 1.0 released Oct 2025, checkpointing, human-in-loop
- AutoGPT - When: Research/experimentation, open-ended exploration. Note: Needs external guardrails for production
- CrewAI - When: Role-based agent teams. Note: Good for specialized agent collaboration
- Claude Agent SDK - When: Anthropic ecosystem agents. Note: Computer use, tool execution

### Patterns

- ReAct - When: Reasoning + Acting in alternating steps. Note: Foundation for most modern agents
- Plan-Execute - When: Separate planning from execution. Note: Better for complex multi-step tasks
- Reflection - When: Self-evaluation and correction. Note: Evaluator-optimizer loop

## Patterns

### ReAct Agent Loop

Alternating reasoning and action steps

**When to use**: Interactive problem-solving, tool use, exploration

# REACT PATTERN:

"""
The ReAct loop:
1. Thought: Reason about what to do next
2. Action: Choose and execute a tool
3. Observation: Receive result
4. Repeat until goal achieved

Key: Explicit reasoning traces make debugging possible
"""

## Basic ReAct Implementation
"""
from langchain.agents import AgentExecutor, create_react_agent
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Define the ReAct prompt template.
# create_react_agent expects the {tools}, {tool_names}, and
# {agent_scratchpad} variables to appear in the prompt.
react_prompt = PromptTemplate.from_template('''
Answer the question using the following tools:

{tools}

Use the following format:

Question: the input question
Thought: reason about what to do
Action: the tool to use, one of [{tool_names}]
Action Input: input to the tool
Observation: result of the action
... (repeat Thought/Action/Observation as needed)
Thought: I now know the final answer
Final Answer: the answer

Question: {input}
Thought:{agent_scratchpad}
''')

# Create the agent (tools and query are assumed to be defined elsewhere)
agent = create_react_agent(
    llm=ChatOpenAI(model="gpt-4o"),
    tools=tools,
    prompt=react_prompt,
)

# Execute with step limit; max_iterations is an AgentExecutor setting
executor = AgentExecutor(
    agent=agent,
    tools=tools,
    max_iterations=10,  # Prevent runaway loops
)
result = executor.invoke({"input": query})
"""

## LangGraph ReAct (Production)
"""
import os

from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.postgres import PostgresSaver

# Production checkpointer. In recent langgraph-checkpoint-postgres releases,
# from_conn_string is a context manager.
with PostgresSaver.from_conn_string(os.environ["POSTGRES_URL"]) as checkpointer:
    checkpointer.setup()  # Create the checkpoint tables on first use

    agent = create_react_agent(
        model=llm,
        tools=tools,
        checkpointer=checkpointer,  # Durable state
    )

    # Invoke with thread for state persistence
    config = {"configurable": {"thread_id": "user-123"}}
    result = agent.invoke({"messages": [("user", query)]}, config)
"""

### Plan-Execute Pattern

Separate planning phase from execution

**When to use**: Complex multi-step tasks, when full plan visibility matters

# PLAN-EXECUTE PATTERN:

"""
Two-phase approach:
1. Planning: Decompose goal into subtasks
2. Execution: Execute subtasks, potentially re-plan

Advantages:
- Full visibility into plan before execution
- Can validate/modify plan with human
- Cleaner separation of concerns

Disadvantages:
- Less adaptive to mid-task discoveries
- Plan may become stale
"""

## LangGraph Plan-Execute
"""
# NOTE: create_plan_and_execute_agent is treated here as an assumed helper.
# Check that your langgraph version provides it; otherwise build the same
# planner -> executor flow by hand with StateGraph.
from langgraph.prebuilt import create_plan_and_execute_agent

# Planner creates the task list
planner_prompt = '''
For the given objective, create a step-by-step plan.
Each step should be atomic and actionable.
Format: numbered list of steps.
'''

# Executor handles individual steps
executor_prompt = '''
You are executing step {step_number} of the plan.
Previous results: {previous_results}
Current step: {current_step}
Execute this step using available tools.
'''

agent = create_plan_and_execute_agent(
    planner=planner_llm,
    executor=executor_llm,
    tools=tools,
    replan_on_error=True,  # Re-plan if step fails
)

# Human approval of plan
config = {
    "configurable": {
        "thread_id": "task-456",
    },
    "interrupt_before": ["execute"],  # Pause before execution
}

# First call creates plan
plan = agent.invoke({"objective": goal}, config)

# Review plan, then continue
if human_approves(plan):
    result = agent.invoke(None, config)  # Continue from checkpoint
"""

## Decomposition Strategies
"""
# Decomposition-First: Plan everything, then execute
# Best for: Stable tasks, need full plan approval

# Interleaved: Plan one step, execute, repeat
# Best for: Dynamic tasks, learning as you go

def interleaved_execute(goal, max_steps=10):
    state = {"goal": goal, "completed": [], "remaining": [goal]}

    for step in range(max_steps):
        # Plan next action based on current state
        next_action = planner.plan_next(state)

        if next_action == "DONE":
            break

        # Execute and update state
        result = executor.execute(next_action)
        state["completed"].append((next_action, result))

        # Re-evaluate remaining work
        state["remaining"] = planner.reassess(state)

    return state
"""

### Reflection Pattern

Self-evaluation and iterative improvement

**When to use**: Quality matters, complex outputs, creative tasks

# REFLECTION PATTERN:

"""
Self-correction loop:
1. Generate initial output
2. Evaluate against criteria
3. Critique and identify issues
4. Refine based on critique
5. Repeat until satisfactory

Also called: Evaluator-Optimizer, Self-Critique
"""

## Basic Reflection
"""
def reflect_and_improve(task, max_iterations=3):
    # Initial generation
    output = generator.generate(task)

    for i in range(max_iterations):
        # Evaluate output
        critique = evaluator.critique(
            task=task,
            output=output,
            criteria=[
                "Correctness",
                "Completeness",
                "Clarity",
            ]
        )

        if critique["passes_all"]:
            return output

        # Refine based on critique
        output = generator.refine(
            task=task,
            previous_output=output,
            critique=critique["feedback"],
        )

    return output  # Best effort after max iterations
"""

## LangGraph Reflection
"""
from langgraph.graph import END, StateGraph

def build_reflection_graph():
    graph = StateGraph(ReflectionState)

    # Nodes
    graph.add_node("generate", generate_node)
    graph.add_node("reflect", reflect_node)
    graph.add_node("output", output_node)

    # Edges
    graph.set_entry_point("generate")  # Start with generation
    graph.add_edge("generate", "reflect")
    graph.add_conditional_edges(
        "reflect",
        should_continue,
        {
            "continue": "generate",  # Loop back
            "end": "output",
        }
    )
    graph.add_edge("output", END)  # Finish after emitting the output

    return graph.compile()

def should_continue(state):
    if state["iteration"] >= 3:
        return "end"
    if state["score"] >= 0.9:
        return "end"
    return "continue"
"""

## Separate Evaluator (More Robust)
"""
from langchain_openai import ChatOpenAI

# Use different model for evaluation to avoid self-bias
generator = ChatOpenAI(model="gpt-4o")
evaluator = ChatOpenAI(model="gpt-4o-mini")  # Different perspective

# Or use specialized evaluators
from langchain.evaluation import load_evaluator
evaluator = load_evaluator("criteria", criteria="correctness")
"""

### Guardrailed Autonomy

Constrained agents with safety boundaries

**When to use**: Production systems, critical operations

# GUARDRAILED AUTONOMY:

"""
Production agents need multiple safety layers:
1. Input validation
2. Action constraints
3. Output validation
4. Cost limits
5. Human escalation
6. Rollback capability
"""

## Multi-Layer Guardrails
"""
class GuardedAgent:
    def __init__(self, agent, config):
        self.agent = agent
        self.max_cost = config.get("max_cost_usd", 1.0)
        self.max_steps = config.get("max_steps", 10)
        self.allowed_actions = config.get("allowed_actions", [])
        self.require_approval = config.get("require_approval", [])

    async def execute(self, goal):
        total_cost = 0
        steps = 0

        while steps < self.max_steps:
            # Get next action
            action = await self.agent.plan_next(goal)

            # Validate action is allowed
            if action.name not in self.allowed_actions:
                raise ActionNotAllowedError(action.name)

            # Check if approval needed
            if action.name in self.require_approval:
                approved = await self.request_human_approval(action)
                if not approved:
                    return {"status": "rejected", "action": action}

            # Estimate cost
            estimated_cost = self.estimate_cost(action)
            if total_cost + estimated_cost > self.max_cost:
                raise CostLimitExceededError(total_cost)

            # Execute with rollback capability
            checkpoint = await self.save_checkpoint()
            try:
                result = await self.agent.execute(action)
                total_cost += self.actual_cost(action)
                steps += 1
            except Exception:
                await self.rollback_to(checkpoint)
                raise

            if result.is_complete:
                return {"status": "complete", "total_cost": total_cost}

        # Step budget exhausted before the goal was reached
        return {"status": "max_steps_reached", "total_cost": total_cost}
"""

## Least Privilege Principle
"""
# Define minimal permissions per task type
TASK_PERMISSIONS = {
    "research": ["web_search", "read_file"],
    "coding": ["read_file", "write_file", "run_tests"],
    "admin": ["all"],  # Rarely grant this
}

def create_scoped_agent(task_type):
    allowed = TASK_PERMISSIONS.get(task_type, [])
    if "all" in allowed:
        tools = list(ALL_TOOLS)  # "all" is a wildcard, not a tool name
    else:
        tools = [t for t in ALL_TOOLS if t.name in allowed]
    return Agent(tools=tools)
"""

## Cost Control
"""
# Attention compute grows quadratically with context length
# Double context = 4x attention compute

def trim_context(messages, max_tokens=4000):
    # Keep system message and recent messages
    system = messages[0]
    recent = messages[-10:]

    # Summarize middle if needed
    if len(messages) > 11:
        middle = messages[1:-10]
        summary = summarize(middle)
        return [system, summary] + recent

    return messages
"""

### Durable Execution Pattern

Agents that survive failures and resume

**When to use**: Long-running tasks, production systems, multi-day processes

# DURABLE EXECUTION:

"""
Production agents must:
- Survive server restarts
- Resume from exact point of failure
- Handle hours/days of runtime
- Allow human intervention mid-process

LangGraph 1.0 provides this natively.
"""

## LangGraph Checkpointing
"""
import os

from langgraph.checkpoint.postgres import PostgresSaver
from langgraph.graph import StateGraph

# Production checkpointer (not MemorySaver!)
with PostgresSaver.from_conn_string(os.environ["POSTGRES_URL"]) as checkpointer:
    checkpointer.setup()  # Create the checkpoint tables on first use

    # Build graph with checkpointing
    graph = StateGraph(AgentState)
    # ... add nodes and edges ...

    agent = graph.compile(checkpointer=checkpointer)

    # Each invocation saves state
    config = {"configurable": {"thread_id": "long-task-789"}}

    # Start task
    agent.invoke({"goal": complex_goal}, config)

    # If server dies, resume later:
    state = agent.get_state(config)
    if state.next:  # Non-empty while nodes are still pending
        agent.invoke(None, config)  # Continues from checkpoint
"""

## Human-in-the-Loop Interrupts
"""
# Pause at specific nodes
agent = graph.compile(
    checkpointer=checkpointer,
    interrupt_before=["critical_action"],  # Pause before
    interrupt_after=["validation"],        # Pause after
)

# First invocation pauses at interrupt
result = agent.invoke({"goal": goal}, config)

# Human reviews state
state = agent.get_state(config)
if human_approves(state):
    # Continue from pause point
    agent.invoke(None, config)
else:
    # Modify state and continue
    agent.update_state(config, {"approved": False})
    agent.invoke(None, config)
"""

## Time-Travel Debugging
"""
# LangGraph stores full history (most recent snapshot first)
history = list(agent.get_state_history(config))

# Go back to any previous state
past_state = history[5]
agent.update_state(config, past_state.values)

# Replay from that point with modifications
agent.invoke(None, config)
"""

## Sharp Edges

### Error Probability Compounds Exponentially

Severity: CRITICAL

Situation: Building multi-step autonomous
agents

Symptoms:
Agent works in demos but fails in production. Simple tasks succeed,
complex tasks fail mysteriously. Success rate drops dramatically
as task complexity increases. Users lose trust.

Why this breaks:
Each step has its own failure probability, and when failures are
independent the success rates multiply. A 95% success rate
per step sounds great until you realize:
- 5 steps: 77% success (0.95^5)
- 10 steps: 60% success (0.95^10)
- 20 steps: 36% success (0.95^20)

This is the fundamental limit of autonomous agents. Every additional
step multiplies failure probability.

Recommended fix:

## Reduce step count
# Combine steps where possible
# Prefer fewer, more capable steps over many small ones

## Increase per-step reliability
# Use structured outputs (JSON schemas)
# Add validation at each step
# Use better models for critical steps

## Design for failure
class RobustAgent:
    def execute_with_retry(self, step, max_retries=3):
        for attempt in range(max_retries):
            try:
                result = step.execute()
                if self.validate(result):
                    return result
            except Exception as e:
                if attempt == max_retries - 1:
                    raise
                self.log_retry(step, attempt, e)
        # Every attempt ran but none produced a valid result
        raise RuntimeError(f"No valid result after {max_retries} attempts")

## Break into checkpointed segments
# Human review at each segment
# Resume from last good checkpoint

### API Costs Explode with Context Growth

Severity: CRITICAL

Situation: Running agents with growing conversation context

Symptoms:
$47 to close a single support ticket. Thousands in surprise API bills.
Agents getting slower as they run longer. Token counts exceeding
model limits.

Why this breaks:
Transformer attention compute scales quadratically with context length. Double
the context, quadruple the compute.
A long-running agent that
re-sends its full conversation each turn pays for every earlier token
again, so cumulative spend grows roughly quadratically with the number
of turns.

Most agents append to context without trimming. Context grows:
- Turn 1: 500 tokens → $0.01
- Turn 10: 5000 tokens → $0.10
- Turn 50: 25000 tokens → $0.50
- Turn 100: 50000 tokens → $1.00+ per message

Recommended fix:

## Set hard cost limits
class CostLimitedAgent:
    MAX_COST_PER_TASK = 1.00  # USD

    def __init__(self):
        self.total_cost = 0

    def before_call(self, estimated_tokens):
        estimated_cost = self.estimate_cost(estimated_tokens)
        if self.total_cost + estimated_cost > self.MAX_COST_PER_TASK:
            raise CostLimitExceeded(
                f"Would exceed ${self.MAX_COST_PER_TASK} limit"
            )

    def after_call(self, response):
        self.total_cost += self.calculate_actual_cost(response)

## Trim context aggressively
def trim_context(messages, max_tokens=4000):
    # Keep: system prompt + last N messages
    # Summarize: everything in between
    if count_tokens(messages) <= max_tokens:
        return messages

    system = messages[0]
    recent = messages[-5:]
    middle = messages[1:-5]

    if middle:
        summary = summarize(middle)  # Compress history
        return [system, summary] + recent

    return [system] + recent

## Use streaming to track costs in real-time
## Alert at 50% of budget, halt at 90%

### Demo Works But Production Fails

Severity: CRITICAL

Situation: Moving from prototype to production

Symptoms:
Impressive demo to stakeholders. Months of failure in production.
Works for the founder's use case, fails for real users. Edge cases
overwhelm the system.

Why this breaks:
Demos show the happy path with curated inputs.
Production means:
- Unexpected inputs (typos, ambiguity, adversarial)
- Scale (1000 users, not 3)
- Reliability (99.9% uptime, not "usually works")
- Edge cases (the 1% that breaks everything)

The methodology is questionable, but the core problem is real.
The gap between a working demo and a reliable production system
is where projects die.

Recommended fix:

## Test at scale before production
# Run 1000+ test cases, not 10
# Measure P95/P99 success rate, not average
# Include adversarial inputs

## Build observability first
import structlog
logger = structlog.get_logger()

class ObservableAgent:
    def execute(self, task):
        # bind() returns a new logger that attaches task_id to every event
        log = logger.bind(task_id=task.id)
        log.info("task_started")
        try:
            result = self._execute(task)
            log.info("task_completed", result=result)
            return result
        except Exception as e:
            log.error("task_failed", error=str(e))
            raise

## Have escape hatches
# Human takeover when confidence < threshold
# Graceful degradation to simpler behavior
# "I don't know" is a valid response

## Deploy incrementally
# 1% of traffic, then 10%, then 50%
# Monitor error rates at each stage

### Agent Fabricates Data When Stuck

Severity: HIGH

Situation: Agent can't complete task with available information

Symptoms:
Agent invents plausible-looking data. Fake restaurant names on expense
reports. Made-up statistics in reports. Confident answers that are
completely wrong.

Why this breaks:
LLMs are trained to be helpful and produce plausible outputs. When
stuck, they don't say "I can't do this" - they fabricate. Autonomous
agents compound this by acting on fabricated data without human review.

The agent that fabricated expense entries was trying to meet its goal
(complete the expense report).
It "solved" the problem by inventing data.

Recommended fix:

## Validate against ground truth
def validate_expense(expense):
    # Cross-check with external sources
    if expense.restaurant:
        if not verify_restaurant_exists(expense.restaurant):
            raise ValidationError("Restaurant not found")

    # Check for suspicious patterns
    if expense.amount == round(expense.amount, -1):
        flag_for_review("Suspiciously round amount")

## Require evidence
system_prompt = '''
For every factual claim, cite the specific tool output that
supports it. If you cannot find supporting evidence, say
"I could not verify this" rather than guessing.
'''

## Use structured outputs
from pydantic import BaseModel

class VerifiedClaim(BaseModel):
    claim: str
    source: str  # Must reference tool output
    confidence: float

## Detect uncertainty
# Train to output confidence scores
# Flag low-confidence outputs for human review
# Never auto-execute on uncertain data

### Integration Is Where Agents Die

Severity: HIGH

Situation: Connecting agent to external systems

Symptoms:
Works with mock APIs, fails with real ones. Rate limits cause crashes.
Auth tokens expire mid-task. Data format mismatches.
Partial failures
leave systems in inconsistent state.

Why this breaks:
The companies promising "autonomous agents that integrate with your
entire tech stack" haven't built production systems at scale.
Real integrations have:
- Rate limits (429 errors mid-task)
- Auth complexity (OAuth refresh, token expiry)
- Data format variations (API v1 vs v2)
- Partial failures (webhook received, processing failed)
- Eventual consistency (data not immediately available)

Recommended fix:

## Build robust API clients
import asyncio
from datetime import datetime, timedelta

from tenacity import retry, stop_after_attempt, wait_exponential

class RobustAPIClient:
    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=4, max=60)
    )
    async def call(self, endpoint, data):
        response = await self.client.post(endpoint, json=data)
        if response.status_code == 429:
            # Honor the server's Retry-After hint, then let tenacity retry
            retry_after = response.headers.get("Retry-After", 60)
            await asyncio.sleep(int(retry_after))
            raise RateLimitError()
        return response

## Handle auth lifecycle
class TokenManager:
    def __init__(self):
        self.token = None
        self.expires_at = None

    async def get_token(self):
        if self.is_expired():
            self.token = await self.refresh_token()
        return self.token

    def is_expired(self):
        if self.expires_at is None:  # No token fetched yet
            return True
        buffer = timedelta(minutes=5)  # Refresh early
        return datetime.now() > (self.expires_at - buffer)

## Use idempotency keys
# Every external action should be idempotent
# If agent retries, external system handles duplicate

## Design for partial failure
# Each step is independently recoverable
# Checkpoint before external calls
# Rollback capability for each integration

### Agent Takes Dangerous Actions

Severity: HIGH

Situation: Agent with broad permissions

Symptoms:
Agent deletes production data.
Sends emails to wrong recipients.
Makes purchases without approval. Modifies settings it shouldn't.
Actions that can't be undone.

Why this breaks:
Agents optimize for their goal. Without guardrails, they'll take the
shortest path - even if that path is destructive. An agent told to
"clean up the database" might interpret that as "delete everything."

Broad permissions + autonomy + goal optimization = danger.

Recommended fix:

## Least privilege principle
PERMISSIONS = {
    "research_agent": ["read_web", "read_docs"],
    "code_agent": ["read_file", "write_file", "run_tests"],
    "email_agent": ["read_email", "draft_email"],  # NOT send
    "admin_agent": ["all"],  # Rarely used
}

## Separate read/write permissions
# Agent can read anything
# Write requires explicit approval

## Dangerous actions require confirmation
DANGEROUS_ACTIONS = [
    "delete_*",
    "send_email",
    "transfer_money",
    "modify_production",
    "revoke_access",
]

async def execute_action(action):
    if matches_dangerous_pattern(action):
        approval = await request_human_approval(action)
        if not approval:
            return ActionRejected(action)
    return await actually_execute(action)

## Dry-run mode for testing
# Agent describes what it would do
# Human approves the plan
# Then agent executes

## Audit logging for everything
# Every action logged with context
# Who authorized it
# What changed
# How to reverse it

### Agent Runs Out of Context Window

Severity: MEDIUM

Situation: Long-running agent tasks

Symptoms:
Agent forgets earlier instructions. Contradicts itself. Loses track
of the goal. Starts repeating itself. Model errors about token limits.

Why this breaks:
Every message, observation, and thought consumes context. Long tasks
exhaust the window.
When context is truncated:
- System prompt gets dropped
- Early important context lost
- Agent loses coherence

Recommended fix:

## Track context usage
class ContextManager:
    def __init__(self, max_tokens=100000):
        self.max_tokens = max_tokens
        self.messages = []

    def add(self, message):
        self.messages.append(message)
        self.maybe_compact()

    def maybe_compact(self):
        if self.token_count() > self.max_tokens * 0.8:
            self.compact()

    def compact(self):
        # Always keep: system prompt
        system = self.messages[0]

        # Always keep: last N messages
        recent = self.messages[-10:]

        # Summarize: everything else
        middle = self.messages[1:-10]
        if middle:
            summary = summarize_messages(middle)
            self.messages = [system, summary] + recent

## Use external memory
# Don't keep everything in context
# Store in vector DB, retrieve when needed
# See agent-memory-systems skill

## Hierarchical summarization
# Recent: full detail
# Medium: key points
# Old: compressed summary

### Can't Debug What You Can't See

Severity: MEDIUM

Situation: Agent fails mysteriously

Symptoms:
"It just didn't work." No idea why agent failed. Can't reproduce
issues. Users report problems you can't explain. Debugging is
guesswork.

Why this breaks:
Agents make dozens of internal decisions. Without visibility into
each step, you're blind to failure modes.
Production debugging
without traces is impossible.

Recommended fix:

## Structured logging
import structlog

logger = structlog.get_logger()

class TracedAgent:
    def think(self, context):
        # bind() returns a new logger that tags every event with the step
        log = logger.bind(step="think")
        thought = self.llm.generate(context)
        log.info("thought_generated",
            thought=thought,
            tokens=count_tokens(thought)
        )
        return thought

    def act(self, action):
        log = logger.bind(step="act", action=action.name)
        log.info("action_started")
        try:
            result = action.execute()
            log.info("action_completed", result=result)
            return result
        except Exception as e:
            log.error("action_failed", error=str(e))
            raise

## Use LangSmith or similar
from langsmith import traceable

@traceable
def agent_step(state):
    # Automatically traced with inputs/outputs
    next_state = run_step(state)  # run_step: placeholder for your step logic
    return next_state

## Save full traces
# Every step, every decision
# Inputs and outputs
# Latency at each step
# Token usage

## Validation Checks

### Agent Loop Without Step Limit

Severity: ERROR

Autonomous agents must have maximum step limits

Message: Agent loop without step limit. Add max_steps to prevent infinite loops.

### No Cost Tracking or Limits

Severity: ERROR

Agents should track and limit API costs

Message: Agent uses LLM without cost tracking. Add cost limits to prevent runaway spending.

### Agent Without Timeout

Severity: WARNING

Long-running agents need timeouts

Message: Agent invocation without timeout. Add timeout to prevent hung tasks.

### MemorySaver Used in Production

Severity: ERROR

MemorySaver is for development only

Message: MemorySaver is not persistent.
Use PostgresSaver or SqliteSaver for production.

### Long-Running Agent Without Checkpointing

Severity: WARNING

Agents that run multiple steps need checkpointing

Message: Multi-step agent without checkpointing. Add checkpointer for durability.

### Agent Without Thread ID

Severity: WARNING

Checkpointed agents need unique thread IDs

Message: Agent invocation without thread_id. State won't persist correctly.

### Using Agent Output Without Validation

Severity: WARNING

Agent outputs should be validated before use

Message: Agent output used without validation. Validate before acting on results.

### Agent Without Structured Output

Severity: INFO

Structured outputs are more reliable

Message: Consider using structured outputs (Pydantic) for more reliable parsing.

### Agent Without Error Recovery

Severity: WARNING

Agents should handle and recover from errors

Message: Agent call without error handling. Add try/catch or error handler.

### Destructive Actions Without Rollback

Severity: WARNING

Actions that modify state should be reversible

Message: Destructive action without rollback capability.
Save state before modification.

## Collaboration

### Delegation Triggers

- user needs multi-agent coordination -> multi-agent-orchestration (Multiple agents working together)
- user needs to test/evaluate agent -> agent-evaluation (Benchmarking and testing)
- user needs tools for agent -> agent-tool-builder (Tool design and implementation)
- user needs persistent memory -> agent-memory-systems (Long-term memory architecture)
- user needs workflow automation -> workflow-automation (When agent is overkill for the task)
- user needs computer control -> computer-use-agents (GUI automation, screen interaction)

## Related Skills

Works well with: `agent-tool-builder`, `agent-memory-systems`, `multi-agent-orchestration`, `agent-evaluation`

## When to Use
- User mentions or implies: autonomous agent
- User mentions or implies: autogpt
- User mentions or implies: babyagi
- User mentions or implies: self-prompting
- User mentions or implies: goal decomposition
- User mentions or implies: react pattern
- User mentions or implies: agent loop
- User mentions or implies: self-correcting agent
- User mentions or implies: reflection agent
- User mentions or implies: langgraph
- User mentions or implies: agentic ai
- User mentions or implies: agent planning

## Limitations
- Use this skill only when the task clearly matches the scope described above.
- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.
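
## Worked Example: Compounding Step Reliability

The compounding-error figures quoted under Sharp Edges (77%, 60%, and 36% for 5, 10, and 20 steps at 95% per-step success) can be reproduced with a short sketch. It assumes step failures are independent, which real agents only approximate, and the function names `chain_success_probability` and `max_steps_for_target` are illustrative, not part of any library.

```python
def chain_success_probability(per_step: float, steps: int) -> float:
    """Probability that every one of `steps` independent steps succeeds."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


def max_steps_for_target(per_step: float, target: float) -> int:
    """Largest step count whose end-to-end success stays at or above target."""
    if not 0.0 < per_step < 1.0:
        raise ValueError("per_step must be strictly between 0 and 1")
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    steps = 0
    probability = 1.0
    # Add steps one at a time until the next one would drop below the target
    while probability * per_step >= target:
        probability *= per_step
        steps += 1
    return steps


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(f"{n} steps at 95%: {chain_success_probability(0.95, n):.0%}")
    print("Step budget for 80% end-to-end at 95% per step:",
          max_steps_for_target(0.95, 0.80))
```

Under these assumptions, an agent with 95% per-step reliability has a budget of only 4 steps before end-to-end success falls below 80%, which is one way to turn "reduce step count" into a concrete design limit.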