---
name: codex-autoresearch-loop
description: Self-directed iterative research skill for Codex that continuously cycles through modify, verify, retain or discard, and repeat until a measurable goal is reached.
triggers:
  - run autoresearch on my codebase
  - iterate autonomously until tests pass
  - improve my code overnight without interruption
  - set up a modify verify loop
  - keep improving coverage until I say stop
  - eliminate all type errors automatically
  - run codex autoresearch skill
  - continuously optimize this metric
---

# Codex Autoresearch

> Skill by [ara.so](https://ara.so), part of the Daily 2026 Skills collection.

Codex Autoresearch is a Codex skill that runs an autonomous modify→verify→keep/revert loop on your codebase. You describe a measurable goal in one sentence; Codex confirms the plan, then iterates unattended until it is interrupted or an iteration cap is reached. Every improvement stacks in git, and every failure reverts automatically. The skill is inspired by Karpathy's autoresearch concept, generalized beyond ML training to any software metric.

---

## Installation

**Option A (manual copy into your project):**

```bash
git clone https://github.com/leo-lilinxiao/codex-autoresearch.git
mkdir -p your-project/.agents/skills
cp -r codex-autoresearch your-project/.agents/skills/codex-autoresearch
```

**Option B (Codex skill installer):**

```text
$skill-installer install https://github.com/leo-lilinxiao/codex-autoresearch
```

The skill lives at `.agents/skills/codex-autoresearch/` inside your project. No config file is required before first use.

---

## How to Activate

Open Codex in your project directory and prefix your goal with `$codex-autoresearch`:

```text
$codex-autoresearch
I want to get rid of all `any` types in my TypeScript code
```

Codex will:

1. Scan the repo and infer scope, metric, verify command, and guard command.
2. Present a confirmation summary. Reply `go`, or correct anything that looks wrong.
3. Run the loop unattended until you interrupt it or the goal is met.

You never write config. Codex infers everything.


---

## Confirmation Flow

Before the loop starts, Codex always shows what it found and asks you to confirm. Example exchange:

```
Codex: I found 47 `any` occurrences across src/**/*.ts.

       Confirmed:
       - Target: eliminate `any` types in src/**/*.ts
       - Metric: `any` count (current: 47), direction: lower
       - Verify: grep + tsc --noEmit as guard

       Need to confirm:
       - Run until all gone, or cap at N iterations?

       Reply "go" to start, or tell me what to change.

You:   Go, run overnight.

Codex: Starting — baseline: 47. Iterating until interrupted.
```

Up to five confirmation rounds are possible. After that, Codex proceeds.

---

## The Loop (internals)

```
PHASE 0: Probe environment (CPU/GPU/RAM/toolchains), check for session resume
PHASE 1: Read context + lessons file from prior run (if any)

LOOP (forever or N times):
  1. Review current state, git history, results log, lessons
  2. Pick ONE hypothesis (apply perspectives, filter by environment)
     -- or N hypotheses if parallel mode is active
  3. Make ONE atomic change
  4. git commit (before verification)
  5. Run verify command → did the target metric improve?
     Run guard command  → did anything else break?
  6. Improved → keep (extract lesson)
     Worse    → approved rollback strategy (git revert)
     Crashed  → fix or skip
  7. Log the result to results log
  8. Health check (disk, git, verify health)
  9. If 3+ discards → REFINE; 5+ → PIVOT; 2 PIVOTs → web search
  10. Repeat. Never stop. Never ask.
```

The loop runs **unbounded** unless you say `Iterations: N` during confirmation.

---

## Dual-Gate Verification

Two commands serve distinct purposes:

| Gate | Purpose | Failure means |
|------|---------|---------------|
| **Verify** | Did the target metric improve? | Change discarded, reverted |
| **Guard** | Did anything else break? | Change reworked (up to 2 attempts), then reverted |

Guard files are **never modified** by the loop.

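
The keep-or-revert decision and the discard escalation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the skill's actual implementation: `GateResult`, `improved`, `decide`, and `escalation` are hypothetical names, and the thresholds (2 rework attempts, REFINE at 3 discards, PIVOT at 5) are taken from the loop outline and the gate table. The web-search step after two PIVOTs is not modeled.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    """Outcome of one iteration (hypothetical structure for illustration)."""
    metric_before: float
    metric_after: float
    guard_passed: bool


def improved(before: float, after: float, direction: str) -> bool:
    """Return True when the metric moved in the requested direction."""
    if direction == "lower":
        return after < before
    if direction == "higher":
        return after > before
    raise ValueError(f"unknown direction: {direction!r}")


def decide(result: GateResult, direction: str,
           rework_attempts: int, max_rework: int = 2) -> str:
    """Return 'keep', 'rework', or 'revert' for a single change."""
    if not improved(result.metric_before, result.metric_after, direction):
        return "revert"   # verify gate failed: discard the change
    if result.guard_passed:
        return "keep"     # both gates passed
    if rework_attempts < max_rework:
        return "rework"   # guard failed: try to repair the change
    return "revert"       # guard still failing after the allowed reworks


def escalation(consecutive_discards: int) -> str:
    """Map a run of consecutive discards to an escalation step."""
    if consecutive_discards >= 5:
        return "pivot"
    if consecutive_discards >= 3:
        return "refine"
    return "continue"
```

With a baseline of 47 and `direction="lower"`, a change that yields 45 and passes the guard returns `"keep"`; the same change with a failing guard returns `"rework"` until both rework attempts are used, and `"revert"` after that.
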

Example verify + guard pair for a Python coverage run:

```text
Verify: pytest --cov=src --cov-report=term 2>&1 | grep TOTAL | awk '{print $NF}' | tr -d '%'
Guard: python -m mypy src --ignore-missing-imports
```

The verify command should print a single number, so the trailing `%` is stripped from the coverage figure.

Example for TypeScript type cleanup:

```text
Verify: grep -rw "any" src --include="*.ts" | wc -l
Guard: npx tsc --noEmit
```

The `-w` flag matches `any` only as a whole word, so identifiers such as `company` are not counted.

---

## Modes

Codex maps your sentence to one of seven modes automatically, so you do not normally need to pick a mode explicitly.

### `loop`: iterate toward a measurable target (default)

```text
$codex-autoresearch
Improve test coverage in src/ to at least 80%
```

```text
$codex-autoresearch
Reduce bundle size — it's currently 2.3 MB, get it under 1 MB
```

### `plan`: turn a vague goal into a validated loop config

```text
$codex-autoresearch
I want to make our API faster but I don't know where to start
```

Codex will interview you (p95 latency vs throughput? which endpoint?) and produce a ready-to-run loop config.

### `fix`: repair errors until count reaches zero

```text
$codex-autoresearch
pytest is failing, 12 tests broken after the refactor — fix them all
```

### `debug`: evidence-driven root-cause hunting

```text
$codex-autoresearch
Our API returns 503 randomly under load, no idea why
```

Each iteration tests one falsifiable hypothesis. Codex presents evidence, not guesses.

### `security`: read-only STRIDE + OWASP audit

```text
$codex-autoresearch
Is this code secure?
```

### `ship`: readiness verification and release gating

```text
$codex-autoresearch
Ship it
```

### `exec`: one-shot execution with no loop

```text
$codex-autoresearch
Run the benchmark suite and summarize results
```

---

## Inline Configuration (optional)

You can override defaults inline during the confirmation step, with no file edits needed:

| Phrase | Effect |
|--------|--------|
| `Iterations: 20` | Cap the loop at 20 iterations |
| `Parallel: 3` | Test 3 hypotheses concurrently per round |
| `Guard: npm test` | Override the inferred guard command |
| `Verify: <command>` | Override the inferred verify command |
| `Scope: src/api/` | Restrict changes to a subdirectory |

Example during confirmation:

```
You: Go. Iterations: 30, Guard: npm test, Scope: src/api/
```

---

## Cross-Run Learning

At the end of each iteration Codex writes a structured lesson to `.agents/skills/codex-autoresearch/lessons.md`:

```
Iteration 7 — KEPT
Hypothesis: replace explicit `any` with inferred generic in src/utils/mapper.ts
Change: added <T extends Record<string, unknown>> to mapKeys()
Result: any count 31 → 29
Lesson: Generic constraints on utility functions eliminate clusters of `any` downstream.
```

On session resume Codex reads this file first. Each new run benefits from prior runs.

**To resume an interrupted run:**

```text
$codex-autoresearch
Resume
```

Codex re-reads the lessons file, checks git state, re-establishes the baseline, and continues.

---

## Parallel Experiments

Request parallel mode during confirmation or at any time:

```text
You: Go, parallel 4
```

Codex runs four hypotheses concurrently, keeps the best result, and discards the rest. This is useful when the hypothesis space is large.

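
Keeping the best of several concurrent results, as described above, might look like the following sketch. It assumes each hypothesis can be measured by a zero-argument callable that returns the metric value; how Codex actually isolates parallel changes is not specified here, so `pick_best` and the callables are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, Sequence, Tuple

Candidate = Tuple[str, Callable[[], float]]  # (label, function returning the metric)


def pick_best(candidates: Sequence[Candidate], baseline: float,
              direction: str = "lower") -> Optional[Tuple[str, float]]:
    """Measure all candidates concurrently and return the best one.

    Returns (label, metric) for the candidate that beats the baseline by
    the largest margin, or None when no candidate improves on it.
    """
    if direction not in ("lower", "higher"):
        raise ValueError(f"unknown direction: {direction!r}")

    # One worker per candidate; at least one so an empty list is accepted.
    with ThreadPoolExecutor(max_workers=max(1, len(candidates))) as pool:
        futures = [(label, pool.submit(measure)) for label, measure in candidates]
        results = [(label, future.result()) for label, future in futures]

    if direction == "lower":
        winners = [r for r in results if r[1] < baseline]
        return min(winners, key=lambda r: r[1]) if winners else None
    winners = [r for r in results if r[1] > baseline]
    return max(winners, key=lambda r: r[1]) if winners else None
```

For instance, with a baseline of 31 and three candidates measuring 30, 28, and 33 under `direction="lower"`, the call returns the candidate that measured 28; the other two would be discarded.
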

---

## Pivot Protocol

If the loop stalls, escalation happens automatically:

| Trigger | Action |
|---------|--------|
| 3 consecutive discards | **REFINE**: narrow the hypothesis, try smaller atomic changes |
| 5 consecutive discards | **PIVOT**: change strategy entirely |
| 2 PIVOTs | **Web search**: Codex fetches external references to unstick itself |

You are never asked for permission during escalation. The loop continues.

---

## Real Code Examples

### Example 1: TypeScript `any` elimination (Python verify script)

If you want a custom verify script instead of a one-liner:

```python
# scripts/count_any.py
"""Print the number of lines under src/ that use `any` as a whole word."""
import subprocess
import sys

# grep exits with status 1 when nothing matches; that is expected here,
# so the return code is deliberately not checked.
result = subprocess.run(
    ["grep", "-r", "--include=*.ts", r"\bany\b", "src/"],
    capture_output=True,
    text=True,
)
count = len(result.stdout.strip().splitlines())
print(count)
sys.exit(0)  # always exit 0; the number is what matters
```

Tell Codex during confirmation:

```text
Verify: python scripts/count_any.py
Guard: npx tsc --noEmit
```

### Example 2: pytest coverage loop (Python)

```python
# scripts/coverage_pct.py
"""Print total coverage as an integer percentage (0 if it cannot be parsed)."""
import re
import subprocess
import sys

# subprocess.run does not raise when pytest exits non-zero (failing tests),
# so the script still prints a number for the loop to compare.
result = subprocess.run(
    ["pytest", "--cov=src", "--cov-report=term", "-q"],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True,
)
# Use the last percentage on the TOTAL line so extra columns
# (for example branch coverage) do not break the match.
match = re.search(r"^TOTAL\s.*?(\d+)(?:\.\d+)?%\s*$", result.stdout, re.MULTILINE)
print(int(match.group(1)) if match else 0)
sys.exit(0)
```

```text
$codex-autoresearch
Improve test coverage — target 85%

Verify: python scripts/coverage_pct.py
Guard: python -m mypy src
Direction: higher
Target: 85
Iterations: 50
```

### Example 3: bundle size loop (Node.js project)

```bash
#!/usr/bin/env bash
# scripts/bundle_size.sh
# Build quietly, then print the bundle size in KB as a single number.
npm run build --silent >/dev/null 2>&1
du -k dist/bundle.js | awk '{print $1}'
```

```text
$codex-autoresearch
Reduce our JS bundle size, currently ~2300 KB, target under 900 KB

Verify: bash scripts/bundle_size.sh
Guard: npm test
Direction: lower
Target: 900
```

### Example 4: lint warning count (any language)

```bash
#!/usr/bin/env bash
# scripts/lint_count.sh
# Print the total number of ESLint messages (warnings and errors) in src/.
npx eslint src/ --format json 2>/dev/null \
  | python3 -c "import sys,json; d=json.load(sys.stdin); print(sum(len(f['messages']) for f in d))"
```

```text
$codex-autoresearch
Get our ESLint warning count to zero

Verify: bash scripts/lint_count.sh
Direction: lower
Target: 0
```

---

## Unattended Runs

For overnight or long runs, ensure Codex CLI approval settings do not interrupt `git commit` or `git revert` commands. The simplest option is to run in a disposable or sandboxed repo clone:

```bash
git clone . /tmp/autoresearch-sandbox
cd /tmp/autoresearch-sandbox
# launch Codex here with full permissions
```

Results accumulate in git history. Pull the winning commits back to your main repo when done:

```bash
# in your main repo (replace main if the sandbox branch has a different name)
git fetch /tmp/autoresearch-sandbox main
git cherry-pick <winning-commit-sha>
```

---

## Session Artifacts

| File | Contents |
|------|----------|
| `.agents/skills/codex-autoresearch/lessons.md` | Structured lessons from every iteration |
| `.agents/skills/codex-autoresearch/results.log` | Full per-iteration log (metric value, kept/reverted, elapsed) |
| `.agents/skills/codex-autoresearch/session.json` | Current session state for resume |

These files persist across Codex sessions. Delete them to start fresh.

---

## Troubleshooting

**Loop reverts every change:**
- Verify command may be returning a non-numeric value. Test it manually: `bash -c "<your verify command>"` should print a single number.
- Metric direction may be wrong. Confirm `Direction: lower` or `Direction: higher` during setup.

**Guard fires on unrelated files:**
- Narrow scope: `Scope: src/specific-module/`
- Or tell Codex explicitly: `Do not touch tests/` during confirmation.

**Session resume picks up wrong baseline:**
- Delete `session.json` to force a fresh baseline: `rm .agents/skills/codex-autoresearch/session.json`

**Parallel mode produces merge conflicts:**
- Codex handles this internally via the pivot protocol, but if it gets stuck, reduce parallelism: `Parallel: 2`

**Codex asks questions mid-loop:**
- This means a guard crash produced ambiguous output.
  Pre-empt it by specifying `Guard: <command> || true` if guard failures should be non-fatal, or by giving Codex fuller sandbox permissions so it can run git commands freely.

**Loop hits PIVOT but makes no progress:**
- Supply a seed hypothesis during confirmation: `Hint: try tree-shaking unused imports first`
- Or run `plan` mode first to produce a richer hypothesis list before switching to `loop`.

---

## Quick Reference

```text
# Start a loop
$codex-autoresearch
<your goal in one sentence>

# Resume interrupted run
$codex-autoresearch
Resume

# Bounded run
$codex-autoresearch
<goal> — Iterations: 25

# Parallel hypotheses
$codex-autoresearch
<goal> — Parallel: 4

# Force a mode
$codex-autoresearch fix
pytest has 8 failures, repair them

# Read-only audit
$codex-autoresearch security
Audit src/api/ for injection vulnerabilities
```