Install
Terminal · npx$
npx skills add https://github.com/obra/superpowers --skill test-driven-developmentWorks with Paperclip
How Ai Regression Testing fits into a Paperclip company.
Ai Regression Testing drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.
S
SaaS FactoryPaired
Pre-configured AI company — 18 agents, 18 skills, one-time purchase.
$27$59
Explore packSource file
SKILL.md385 linesExpandCollapse
---name: ai-regression-testingdescription: Regression testing strategies for AI-assisted development. Sandbox-mode API testing without database dependencies, automated bug-check workflows, and patterns to catch AI blind spots where the same model writes and reviews code.origin: ECC--- # AI Regression Testing Testing patterns specifically designed for AI-assisted development, where the same model writes code and reviews it — creating systematic blind spots that only automated tests can catch. ## When to Activate - AI agent (Claude Code, Cursor, Codex) has modified API routes or backend logic- A bug was found and fixed — need to prevent re-introduction- Project has a sandbox/mock mode that can be leveraged for DB-free testing- Running `/bug-check` or similar review commands after code changes- Multiple code paths exist (sandbox vs production, feature flags, etc.) ## The Core Problem When an AI writes code and then reviews its own work, it carries the same assumptions into both steps. This creates a predictable failure pattern: ```AI writes fix → AI reviews fix → AI says "looks correct" → Bug still exists``` **Real-world example** (observed in production): ```Fix 1: Added notification_settings to API response → Forgot to add it to the SELECT query → AI reviewed and missed it (same blind spot) Fix 2: Added it to SELECT query → TypeScript build error (column not in generated types) → AI reviewed Fix 1 but didn't catch the SELECT issue Fix 3: Changed to SELECT * → Fixed production path, forgot sandbox path → AI reviewed and missed it AGAIN (4th occurrence) Fix 4: Test caught it instantly on first run PASS:``` The pattern: **sandbox/production path inconsistency** is the #1 AI-introduced regression. ## Sandbox-Mode API Testing Most projects with AI-friendly architecture have a sandbox/mock mode. This is the key to fast, DB-free API testing. ### Setup (Vitest + Next.js App Router) ```typescript// vitest.config.tsimport { defineConfig } from "vitest/config";import path from "path"; export default defineConfig({ test: { environment: "node", globals: true, include: ["__tests__/**/*.test.ts"], setupFiles: ["__tests__/setup.ts"], }, resolve: { alias: { "@": path.resolve(__dirname, "."), }, },});``` ```typescript// __tests__/setup.ts// Force sandbox mode — no database neededprocess.env.SANDBOX_MODE = "true";process.env.NEXT_PUBLIC_SUPABASE_URL = "";process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY = "";``` ### Test Helper for Next.js API Routes ```typescript// __tests__/helpers.tsimport { NextRequest } from "next/server"; export function createTestRequest( url: string, options?: { method?: string; body?: Record<string, unknown>; headers?: Record<string, string>; sandboxUserId?: string; },): NextRequest { const { method = "GET", body, headers = {}, sandboxUserId } = options || {}; const fullUrl = url.startsWith("http") ? url : `http://localhost:3000${url}`; const reqHeaders: Record<string, string> = { ...headers }; if (sandboxUserId) { reqHeaders["x-sandbox-user-id"] = sandboxUserId; } const init: { method: string; headers: Record<string, string>; body?: string } = { method, headers: reqHeaders, }; if (body) { init.body = JSON.stringify(body); reqHeaders["content-type"] = "application/json"; } return new NextRequest(fullUrl, init);} export async function parseResponse(response: Response) { const json = await response.json(); return { status: response.status, json };}``` ### Writing Regression Tests The key principle: **write tests for bugs that were found, not for code that works**. ```typescript// __tests__/api/user/profile.test.tsimport { describe, it, expect } from "vitest";import { createTestRequest, parseResponse } from "../../helpers";import { GET, PATCH } from "@/app/api/user/profile/route"; // Define the contract — what fields MUST be in the responseconst REQUIRED_FIELDS = [ "id", "email", "full_name", "phone", "role", "created_at", "avatar_url", "notification_settings", // ← Added after bug found it missing]; describe("GET /api/user/profile", () => { it("returns all required fields", async () => { const req = createTestRequest("/api/user/profile"); const res = await GET(req); const { status, json } = await parseResponse(res); expect(status).toBe(200); for (const field of REQUIRED_FIELDS) { expect(json.data).toHaveProperty(field); } }); // Regression test — this exact bug was introduced by AI 4 times it("notification_settings is not undefined (BUG-R1 regression)", async () => { const req = createTestRequest("/api/user/profile"); const res = await GET(req); const { json } = await parseResponse(res); expect("notification_settings" in json.data).toBe(true); const ns = json.data.notification_settings; expect(ns === null || typeof ns === "object").toBe(true); });});``` ### Testing Sandbox/Production Parity The most common AI regression: fixing production path but forgetting sandbox path (or vice versa). ```typescript// Test that sandbox responses match the expected contractdescribe("GET /api/user/messages (conversation list)", () => { it("includes partner_name in sandbox mode", async () => { const req = createTestRequest("/api/user/messages", { sandboxUserId: "user-001", }); const res = await GET(req); const { json } = await parseResponse(res); // This caught a bug where partner_name was added // to production path but not sandbox path if (json.data.length > 0) { for (const conv of json.data) { expect("partner_name" in conv).toBe(true); } } });});``` ## Integrating Tests into Bug-Check Workflow ### Custom Command Definition ```markdown<!-- .claude/commands/bug-check.md --># Bug Check ## Step 1: Automated Tests (mandatory, cannot skip) Run these commands FIRST before any code review: npm run test # Vitest test suite npm run build # TypeScript type check + build - If tests fail → report as highest priority bug- If build fails → report type errors as highest priority- Only proceed to Step 2 if both pass ## Step 2: Code Review (AI review) 1. Sandbox / production path consistency2. API response shape matches frontend expectations3. SELECT clause completeness4. Error handling with rollback5. Optimistic update race conditions ## Step 3: For each bug fixed, propose a regression test``` ### The Workflow ```User: "バグチェックして" (or "/bug-check") │ ├─ Step 1: npm run test │ ├─ FAIL → Bug found mechanically (no AI judgment needed) │ └─ PASS → Continue │ ├─ Step 2: npm run build │ ├─ FAIL → Type error found mechanically │ └─ PASS → Continue │ ├─ Step 3: AI code review (with known blind spots in mind) │ └─ Findings reported │ └─ Step 4: For each fix, write a regression test └─ Next bug-check catches if fix breaks``` ## Common AI Regression Patterns ### Pattern 1: Sandbox/Production Path Mismatch **Frequency**: Most common (observed in 3 out of 4 regressions) ```typescript// FAIL: AI adds field to production path onlyif (isSandboxMode()) { return { data: { id, email, name } }; // Missing new field}// Production pathreturn { data: { id, email, name, notification_settings } }; // PASS: Both paths must return the same shapeif (isSandboxMode()) { return { data: { id, email, name, notification_settings: null } };}return { data: { id, email, name, notification_settings } };``` **Test to catch it**: ```typescriptit("sandbox and production return same fields", async () => { // In test env, sandbox mode is forced ON const res = await GET(createTestRequest("/api/user/profile")); const { json } = await parseResponse(res); for (const field of REQUIRED_FIELDS) { expect(json.data).toHaveProperty(field); }});``` ### Pattern 2: SELECT Clause Omission **Frequency**: Common with Supabase/Prisma when adding new columns ```typescript// FAIL: New column added to response but not to SELECTconst { data } = await supabase .from("users") .select("id, email, name") // notification_settings not here .single(); return { data: { ...data, notification_settings: data.notification_settings } };// → notification_settings is always undefined // PASS: Use SELECT * or explicitly include new columnsconst { data } = await supabase .from("users") .select("*") .single();``` ### Pattern 3: Error State Leakage **Frequency**: Moderate — when adding error handling to existing components ```typescript// FAIL: Error state set but old data not clearedcatch (err) { setError("Failed to load"); // reservations still shows data from previous tab!} // PASS: Clear related state on errorcatch (err) { setReservations([]); // Clear stale data setError("Failed to load");}``` ### Pattern 4: Optimistic Update Without Proper Rollback ```typescript// FAIL: No rollback on failureconst handleRemove = async (id: string) => { setItems(prev => prev.filter(i => i.id !== id)); await fetch(`/api/items/${id}`, { method: "DELETE" }); // If API fails, item is gone from UI but still in DB}; // PASS: Capture previous state and rollback on failureconst handleRemove = async (id: string) => { const prevItems = [...items]; setItems(prev => prev.filter(i => i.id !== id)); try { const res = await fetch(`/api/items/${id}`, { method: "DELETE" }); if (!res.ok) throw new Error("API error"); } catch { setItems(prevItems); // Rollback alert("削除に失敗しました"); }};``` ## Strategy: Test Where Bugs Were Found Don't aim for 100% coverage. Instead: ```Bug found in /api/user/profile → Write test for profile APIBug found in /api/user/messages → Write test for messages APIBug found in /api/user/favorites → Write test for favorites APINo bug in /api/user/notifications → Don't write test (yet)``` **Why this works with AI development:** 1. AI tends to make the **same category of mistake** repeatedly2. Bugs cluster in complex areas (auth, multi-path logic, state management)3. Once tested, that exact regression **cannot happen again**4. Test count grows organically with bug fixes — no wasted effort ## Quick Reference | AI Regression Pattern | Test Strategy | Priority ||---|---|---|| Sandbox/production mismatch | Assert same response shape in sandbox mode | High || SELECT clause omission | Assert all required fields in response | High || Error state leakage | Assert state cleanup on error | Medium || Missing rollback | Assert state restored on API failure | Medium || Type cast masking null | Assert field is not undefined | Medium | ## DO / DON'T **DO:**- Write tests immediately after finding a bug (before fixing it if possible)- Test the API response shape, not the implementation- Run tests as the first step of every bug-check- Keep tests fast (< 1 second total with sandbox mode)- Name tests after the bug they prevent (e.g., "BUG-R1 regression") **DON'T:**- Write tests for code that has never had a bug- Trust AI self-review as a substitute for automated tests- Skip sandbox path testing because "it's just mock data"- Write integration tests when unit tests suffice- Aim for coverage percentage — aim for regression preventionRelated skills
Agent Eval
Install Agent Eval skill for Claude Code from affaan-m/everything-claude-code.
Agent Harness Construction
Install Agent Harness Construction skill for Claude Code from affaan-m/everything-claude-code.
Agent Payment X402
Install Agent Payment X402 skill for Claude Code from affaan-m/everything-claude-code.