Name: Codex Autoresearch Loop
Author: Aradotso
Source file: SKILL.md
---
name: codex-autoresearch-loop
description: Self-directed iterative research skill for Codex that continuously cycles through modify, verify, retain or discard, and repeat until a measurable goal is reached.
triggers:
  - run autoresearch on my codebase
  - iterate autonomously until tests pass
  - improve my code overnight without interruption
  - set up a modify verify loop
  - keep improving coverage until I say stop
  - eliminate all type errors automatically
  - run codex autoresearch skill
  - continuously optimize this metric
---

# Codex Autoresearch

> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.

Codex Autoresearch is a Codex skill that runs an autonomous modify→verify→keep/revert loop on your codebase. You describe a measurable goal in one sentence; Codex confirms the plan, then iterates unattended — every improvement stacks in git, every failure reverts automatically — until you interrupt it, the goal is met, or a cap is reached. It is inspired by Karpathy's autoresearch concept and generalizes it beyond ML training to any software metric.

---

## Installation

**Option A — manual copy into your project:**

```bash
git clone https://github.com/leo-lilinxiao/codex-autoresearch.git
mkdir -p your-project/.agents/skills   # cp -r fails if the parent directory does not exist
cp -r codex-autoresearch your-project/.agents/skills/codex-autoresearch
```

**Option B — Codex skill installer:**

```text
$skill-installer install https://github.com/leo-lilinxiao/codex-autoresearch
```

The skill lives at `.agents/skills/codex-autoresearch/` inside your project. No config file is required before first use.

---

## How to Activate

Open Codex in your project directory and prefix your goal with `$codex-autoresearch`:

```text
$codex-autoresearch
I want to get rid of all `any` types in my TypeScript code
```

Codex will:
1. Scan the repo and infer the scope, metric, verify command, and guard command.
2. Present a confirmation summary; reply `go`, or correct anything it got wrong.
3. Run the loop unattended until you interrupt it, the goal is met, or an iteration cap is reached.

You never have to write a config file. Codex infers everything, and you can override any inferred value during confirmation (see Inline Configuration).

---

## Confirmation Flow

Before the loop starts, Codex always shows what it found and asks you to confirm. Example exchange:

```
Codex: I found 47 `any` occurrences across src/**/*.ts.

       Confirmed:
       - Target: eliminate `any` types in src/**/*.ts
       - Metric: `any` count (current: 47), direction: lower
       - Verify: grep count of `any`
       - Guard: tsc --noEmit

       Need to confirm:
       - Run until all gone, or cap at N iterations?

       Reply "go" to start, or tell me what to change.

You:   Go, run overnight.

Codex: Starting — baseline: 47. Iterating until interrupted.
```

Confirmation can take up to five rounds; after that, Codex proceeds.

---

## The Loop (internals)

```
PHASE 0: Probe environment (CPU/GPU/RAM/toolchains), check for a session to resume
PHASE 1: Read context + lessons file from prior run (if any)

LOOP (forever, or N times if capped):
  1. Review current state, git history, results log, lessons
  2. Pick ONE hypothesis (apply perspectives, filter by environment)
     -- or N hypotheses if parallel mode is active
  3. Make ONE atomic change
  4. git commit (before verification)
  5. Run verify command  →  did the target metric improve?
     Run guard command   →  did anything else break?
  6. Improved, guard passes → keep (extract lesson)
     Improved, guard fails  → rework (up to 2 attempts), then revert
     Worse                  → roll back with the approved strategy (git revert)
     Crashed                → fix or skip
  7. Log the result to the results log
  8. Health check (disk, git, verify health)
  9. 3+ consecutive discards → REFINE; 5+ → PIVOT; 2 PIVOTs → web search
 10. Repeat until interrupted, the goal is met, or the cap is hit. Never pause to ask.
```

The loop runs **unbounded** unless you set `Iterations: N` during confirmation.

---

## Dual-Gate Verification

Two commands serve distinct purposes:

| Gate | Question it answers | On failure |
|------|---------------------|------------|
| **Verify** | Did the target metric improve? | Change is discarded and reverted |
| **Guard** | Did anything else break? | Change is reworked (up to 2 attempts), then reverted |

Guard files are **never modified** by the loop.

Example verify + guard pair for a Python coverage run (`tr -d '%'` strips the percent sign so the verify command prints a bare number):

```text
Verify: pytest --cov=src --cov-report=term 2>&1 | grep TOTAL | awk '{print $NF}' | tr -d '%'
Guard:  python -m mypy src --ignore-missing-imports
```

Example for TypeScript type cleanup (`-w` matches whole words only, so identifiers such as `company` are not counted, and `-o` counts every occurrence rather than every matching line):

```text
Verify: grep -rwo --include="*.ts" "any" src | wc -l
Guard:  npx tsc --noEmit
```

---

## Modes

Codex maps your sentence to one of seven modes automatically, so you never need to pick one; you can still force a mode by naming it after the prefix (see Quick Reference).

### `loop` — iterate toward a measurable target (default)

```text
$codex-autoresearch
Improve test coverage in src/ to at least 80%
```

```text
$codex-autoresearch
Reduce bundle size — it's currently 2.3 MB, get it under 1 MB
```

### `plan` — turn a vague goal into a validated loop config

```text
$codex-autoresearch
I want to make our API faster but I don't know where to start
```

Codex interviews you (p95 latency or throughput? which endpoint?) and produces a ready-to-run loop config.

### `fix` — repair errors until count reaches zero

```text
$codex-autoresearch
pytest is failing, 12 tests broken after the refactor — fix them all
```

### `debug` — evidence-driven root-cause hunting

```text
$codex-autoresearch
Our API returns 503 randomly under load, no idea why
```

Each iteration tests one falsifiable hypothesis. Codex presents evidence, not guesses.

### `security` — read-only STRIDE + OWASP audit

```text
$codex-autoresearch
Is this code secure?
```

### `ship` — readiness verification and release gating

```text
$codex-autoresearch
Ship it
```

### `exec` — one-shot execution with no loop

```text
$codex-autoresearch
Run the benchmark suite and summarize results
```

---

## Inline Configuration (optional)

You can override defaults inline during the confirmation step; no file edits are needed:

| Phrase | Effect |
|--------|--------|
| `Iterations: 20` | Cap the loop at 20 iterations |
| `Parallel: 3` | Test 3 hypotheses concurrently per round |
| `Guard: npm test` | Override the inferred guard command |
| `Verify: <command>` | Override the inferred verify command |
| `Scope: src/api/` | Restrict changes to a subdirectory |
| `Direction: lower` / `Direction: higher` | Set which way the metric should move |
| `Target: 85` | Set the metric value that counts as reaching the goal |
| `Hint: <idea>` | Seed the loop with a starting hypothesis |

Example during confirmation:

```
You:   Go. Iterations: 30, Guard: npm test, Scope: src/api/
```

---

## Cross-Run Learning

At the end of each iteration, Codex writes a structured lesson to `.agents/skills/codex-autoresearch/lessons.md`:

```
Iteration 7 — KEPT
Hypothesis: replace explicit `any` with inferred generic in src/utils/mapper.ts
Change: added <T extends Record<string, unknown>> to mapKeys()
Result: any count 31 → 29
Lesson: Generic constraints on utility functions eliminate clusters of `any` downstream.
```

When a session resumes, Codex reads this file first, so each new run builds on the lessons of earlier runs.

**To resume an interrupted run:**

```text
$codex-autoresearch
Resume
```

Codex re-reads the lessons file, checks git state, re-establishes the baseline, and continues.

---

## Parallel Experiments

Request parallel mode during confirmation or at any time:

```text
You:   Go, parallel 4
```

Codex tests four hypotheses concurrently, keeps the best result, and discards the rest. This is useful when the hypothesis space is large.

---

## Pivot Protocol

If the loop stalls, escalation happens automatically:

| Trigger | Action |
|---------|--------|
| 3 consecutive discards | **REFINE**: narrow the hypothesis, try smaller atomic changes |
| 5 consecutive discards | **PIVOT**: change strategy entirely |
| 2 PIVOTs | **Web search**: Codex fetches external references to get unstuck |

You are never asked for permission during escalation; the loop continues.

---

## Real Code Examples

### Example 1 — TypeScript `any` elimination (Python verify script)

If you want a custom verify script instead of a one-liner:

```python
# scripts/count_any.py
import subprocess
import sys

# -r: recurse, -w: whole words only, -o: one output line per match, so this
# counts occurrences rather than matching lines (matches in comments count too).
result = subprocess.run(
    ["grep", "-rwo", "--include=*.ts", "any", "src/"],
    capture_output=True, text=True,
)
count = len(result.stdout.splitlines())
print(count)
sys.exit(0)  # always exit 0 (grep exits 1 when nothing matches); the number is what matters
```

Tell Codex during confirmation:

```text
Verify: python scripts/count_any.py
Guard:  npx tsc --noEmit
```

### Example 2 — pytest coverage loop (Python)

```python
# scripts/coverage_pct.py
import re
import subprocess
import sys

# subprocess.run instead of check_output: check_output raises when any test
# fails, while pytest-cov normally still prints the TOTAL row in that case.
out = subprocess.run(
    ["pytest", "--cov=src", "--cov-report=term", "-q"],
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
).stdout

# The TOTAL row ends with the coverage percentage, e.g. "85%" or "85.23%".
match = re.search(r"^TOTAL\b.*?(\d+(?:\.\d+)?)%", out, re.MULTILINE)
print(match.group(1) if match else 0)
sys.exit(0)
```

```text
$codex-autoresearch
Improve test coverage — target 85%

Verify: python scripts/coverage_pct.py
Guard:  python -m mypy src
Direction: higher
Target: 85
Iterations: 50
```

### Example 3 — bundle size loop (Node.js project)

```bash
#!/usr/bin/env bash
# scripts/bundle_size.sh
set -euo pipefail
# Silence all build output so the script prints only the final number;
# with set -e, a failed build exits non-zero instead of measuring a stale bundle.
npm run build --silent >/dev/null 2>&1
echo $(( $(wc -c < dist/bundle.js) / 1024 ))   # bundle size in KiB (integer)
```

```text
$codex-autoresearch
Reduce our JS bundle size, currently ~2300 KB, target under 900 KB

Verify: bash scripts/bundle_size.sh
Guard:  npm test
Direction: lower
Target: 900
```

### Example 4 — ESLint problem count (JavaScript/TypeScript)

```bash
#!/usr/bin/env bash
# scripts/lint_count.sh
# Counts every ESLint message (errors and warnings) under src/.
# No pipefail: eslint exits 1 whenever it finds problems, which is expected here.
npx eslint src/ --format json 2>/dev/null \
  | python3 -c "import sys,json; d=json.load(sys.stdin); print(sum(len(f['messages']) for f in d))"
```

```text
$codex-autoresearch
Get our ESLint problem count (errors and warnings) to zero

Verify: bash scripts/lint_count.sh
Direction: lower
Target: 0
```

---

## Unattended Runs

For overnight or long runs, make sure Codex CLI approval settings do not interrupt `git commit` or `git revert` commands. The simplest option is to run in a disposable or sandboxed repo clone:

```bash
# git clone copies committed history only: commit .agents/skills/codex-autoresearch
# first, or copy it into the clone afterwards.
git clone . /tmp/autoresearch-sandbox
cd /tmp/autoresearch-sandbox
# launch Codex here with full permissions
```

Results accumulate in git history. Pull the winning commits back into your main repo when done:

```bash
# in your main repo (replace main with the sandbox's branch name if it differs)
git fetch /tmp/autoresearch-sandbox main
git cherry-pick <winning-commit-sha>
```

---

## Session Artifacts

| File | Contents |
|------|----------|
| `.agents/skills/codex-autoresearch/lessons.md` | Structured lessons from every iteration |
| `.agents/skills/codex-autoresearch/results.log` | Full per-iteration log (metric value, kept/reverted, elapsed) |
| `.agents/skills/codex-autoresearch/session.json` | Current session state for resume |

These files persist across Codex sessions. Delete them to start fresh.

---

## Troubleshooting

**Loop reverts every change:**
- The verify command may be printing something other than a bare number (for example `85%`). Test it manually: `bash -c "<your verify command>"` should print a single number.
- The metric direction may be wrong. Confirm `Direction: lower` or `Direction: higher` during setup.

**Guard fails because of unrelated files:**
- Narrow the scope: `Scope: src/specific-module/`
- Or tell Codex explicitly during confirmation: `Do not touch tests/`

**Session resume picks up the wrong baseline:**
- Delete `session.json` to force a fresh baseline: `rm .agents/skills/codex-autoresearch/session.json`

**Parallel mode produces merge conflicts:**
- Codex handles this internally via the pivot protocol, but if it gets stuck, reduce parallelism: `Parallel: 2`

**Codex asks questions mid-loop:**
- A guard crash may have produced ambiguous output. If guard failures should be non-fatal, specify `Guard: <command> || true`; note that this makes the guard always pass, so it no longer protects against regressions.
- Codex CLI approval settings may be pausing on `git` commands. Give Codex fuller sandbox permissions so it can run `git commit` and `git revert` freely (see Unattended Runs).

**Loop hits PIVOT but makes no progress:**
- Supply a seed hypothesis during confirmation: `Hint: try tree-shaking unused imports first`
- Or run `plan` mode first to produce a richer hypothesis list before switching to `loop`.

---

## Quick Reference

```text
# Start a loop
$codex-autoresearch
<your goal in one sentence>

# Resume interrupted run
$codex-autoresearch
Resume

# Bounded run
$codex-autoresearch
<goal> — Iterations: 25

# Parallel hypotheses
$codex-autoresearch
<goal> — Parallel: 4

# Force a mode
$codex-autoresearch fix
pytest has 8 failures, repair them

# Read-only audit
$codex-autoresearch security
Audit src/api/ for injection vulnerabilities
```