Claude Agent Skill · by Aradotso

Agent Browser Automation

Install Agent Browser Automation skill for Claude Code from aradotso/trending-skills.

Install
Terminal · npx
$npx skills add https://github.com/vercel-labs/agent-skills --skill vercel-react-best-practices
Works with Paperclip

How Agent Browser Automation fits into a Paperclip company.

Agent Browser Automation drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.

S
SaaS FactoryPaired

Pre-configured AI company — 18 agents, 18 skills, one-time purchase.

$27$59
Explore pack
Source file
SKILL.md437 lines
Expand
---name: agent-browser-automationdescription: Headless browser automation CLI for AI agents using native Rust binary with Chrome DevTools Protocoltriggers:  - automate browser with agent-browser  - use agent-browser for web scraping  - browser automation CLI for AI  - take screenshot with agent-browser  - click and fill forms with agent-browser  - agent-browser snapshot accessibility tree  - batch browser commands with agent-browser  - install agent-browser for automation--- # agent-browser > Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection. `agent-browser` is a headless browser automation CLI built in Rust, designed for AI agents. It wraps Chrome via the Chrome DevTools Protocol (CDP) and exposes a fast, ergonomic command-line interface for navigation, interaction, accessibility snapshots, screenshots, network interception, and more — with no Node.js or Playwright runtime required. ## Installation ### Recommended (npm global)```bashnpm install -g agent-browseragent-browser install  # Download Chrome for Testing (first time only)``` ### macOS (Homebrew)```bashbrew install agent-browseragent-browser install``` ### Rust / Cargo```bashcargo install agent-browseragent-browser install``` ### Local project dependency```bashnpm install agent-browser# Add to package.json scripts or invoke via npx``` ### Linux (with system dependencies)```bashagent-browser install --with-deps``` ## Quick Start ```bashagent-browser open https://example.comagent-browser snapshot                        # Accessibility tree with @refs (best for AI)agent-browser click @e2                       # Click by ref from snapshotagent-browser fill @e3 "hello@example.com"   # Fill by refagent-browser get text @e1                    # Get text contentagent-browser screenshot page.pngagent-browser close``` ## Core Commands ### Navigation```bashagent-browser open <url>           # Navigate (aliases: goto, navigate)agent-browser get url              # Get current URLagent-browser get title            # Get page titleagent-browser close                # Close browser (aliases: quit, exit)``` ### Accessibility Snapshot (recommended for AI agents)```bashagent-browser snapshot             # Returns accessibility tree with @ref IDsagent-browser snapshot -i          # Interactive / compact mode``` Snapshot output includes `@eN` refs you can use directly:```@e1 [button] "Submit"@e2 [textbox] "Email" value=""@e3 [link] "Sign in"``` Then act on them:```bashagent-browser fill @e2 "user@example.com"agent-browser click @e1``` ### Interaction```bashagent-browser click <sel>                     # Click elementagent-browser dblclick <sel>                  # Double-clickagent-browser fill <sel> <text>               # Clear and fill inputagent-browser type <sel> <text>               # Type into elementagent-browser press <key>                     # Press key (Enter, Tab, Control+a)agent-browser keyboard type <text>            # Type at current focus (real keystrokes)agent-browser keyboard inserttext <text>      # Insert text without key eventsagent-browser hover <sel>                     # Hover elementagent-browser select <sel> <value>            # Select dropdown optionagent-browser check <sel>                     # Check checkboxagent-browser uncheck <sel>                   # Uncheck checkboxagent-browser scroll down 500                 # Scroll (up/down/left/right, optional px)agent-browser scroll down --selector "#feed"  # Scroll within elementagent-browser scrollintoview <sel>            # Scroll element into viewagent-browser drag <src> <target>             # Drag and dropagent-browser upload <sel> /path/file.pdf     # Upload file``` ### Screenshots & PDF```bashagent-browser screenshot                          # Save to temp dir, print pathagent-browser screenshot page.png                 # Save to pathagent-browser screenshot --full page.png          # Full-page screenshotagent-browser screenshot --annotate               # Numbered element labels overlayagent-browser screenshot --screenshot-dir ./shots # Custom output directoryagent-browser screenshot --screenshot-format jpeg --screenshot-quality 80agent-browser pdf output.pdf                      # Save page as PDF``` ### Getting Element Info```bashagent-browser get text <sel>           # Text contentagent-browser get html <sel>           # innerHTMLagent-browser get value <sel>          # Input valueagent-browser get attr <sel> <attr>    # Attribute valueagent-browser get count <sel>          # Count matching elementsagent-browser get box <sel>            # Bounding boxagent-browser get styles <sel>         # Computed stylesagent-browser get cdp-url              # CDP WebSocket URL``` ### State Checks```bashagent-browser is visible <sel>agent-browser is enabled <sel>agent-browser is checked <sel>``` ### Semantic Locators (find)```bashagent-browser find role button click --name "Submit"agent-browser find text "Sign In" clickagent-browser find label "Email" fill "test@example.com"agent-browser find placeholder "Search..." fill "rust"agent-browser find testid "login-btn" clickagent-browser find first ".item" clickagent-browser find nth 2 "a" textagent-browser find role textbox fill "hello" --name "Username"``` **Actions:** `click`, `fill`, `type`, `hover`, `focus`, `check`, `uncheck`, `text` ### Waiting```bashagent-browser wait "#modal"                          # Wait for element visibleagent-browser wait 2000                              # Wait N millisecondsagent-browser wait --text "Welcome back"             # Wait for textagent-browser wait --url "**/dashboard"              # Wait for URL patternagent-browser wait --load networkidle                # Wait for load stateagent-browser wait --fn "window.appReady === true"   # Wait for JS conditionagent-browser wait "#spinner" --state hidden         # Wait for element to disappear``` **Load states:** `load`, `domcontentloaded`, `networkidle` ### JavaScript Eval```bashagent-browser eval "document.title"agent-browser eval "JSON.stringify(window.__STATE__)"agent-browser eval -b "BASE64_ENCODED_JS"echo "return document.body.innerHTML" | agent-browser eval --stdin``` ### Batch Execution (efficient multi-step)```bashecho '[  ["open", "https://example.com"],  ["snapshot", "-i"],  ["fill", "@e2", "user@example.com"],  ["click", "@e1"],  ["screenshot", "result.png"]]' | agent-browser batch --json # Stop on first failureagent-browser batch --bail < commands.json``` ### Tabs & Frames```bashagent-browser tab                    # List tabsagent-browser tab new https://...    # New tab with URLagent-browser tab 2                  # Switch to tab 2agent-browser tab close              # Close current tabagent-browser frame "#my-iframe"     # Switch into iframeagent-browser frame main             # Return to main frame``` ### Cookies & Storage```bashagent-browser cookiesagent-browser cookies set session_id "abc123"agent-browser cookies clear agent-browser storage localagent-browser storage local set theme darkagent-browser storage local clearagent-browser storage session set cart '{"items":[]}'``` ### Network```bashagent-browser network route "**/api/users" --body '{"users":[]}'  # Mock responseagent-browser network route "**/ads/**" --abort                    # Block requestsagent-browser network unroute                                       # Remove all routesagent-browser network requests --filter api                        # View requestsagent-browser network har startagent-browser network har stop recording.har``` ### Browser Settings```bashagent-browser set viewport 1280 800agent-browser set viewport 375 812 2        # With device pixel ratio (retina)agent-browser set device "iPhone 14"agent-browser set geo 37.7749 -122.4194agent-browser set offline onagent-browser set headers '{"X-Custom":"value"}'agent-browser set credentials admin secretagent-browser set media dark``` ### Auth State```bashagent-browser state save ./auth.json    # Save cookies + localStorageagent-browser state load ./auth.json    # Restore auth stateagent-browser state list                # List saved statesagent-browser state show auth.json      # Summary of saved state``` ### Dialogs```bashagent-browser dialog accept             # Accept alert/confirm/promptagent-browser dialog accept "My input"  # Accept prompt with textagent-browser dialog dismiss``` ### Clipboard```bashagent-browser clipboard readagent-browser clipboard write "Hello, World!"agent-browser clipboard copy           # Ctrl+C current selectionagent-browser clipboard paste          # Ctrl+V``` ### Diff & Visual Testing```bashagent-browser diff snapshot                                  # vs last snapshotagent-browser diff snapshot --baseline before.txt            # vs saved fileagent-browser diff snapshot --selector "#main" --compactagent-browser diff screenshot --baseline before.pngagent-browser diff screenshot --baseline b.png -o diff.pngagent-browser diff url https://v1.example.com https://v2.example.comagent-browser diff url https://v1.example.com https://v2.example.com --screenshotagent-browser diff url https://v1.example.com https://v2.example.com --selector "#content"``` ### Debug & Profiling```bashagent-browser trace start trace.zipagent-browser trace stopagent-browser profiler startagent-browser profiler stop profile.jsonagent-browser console                  # View console messagesagent-browser errors                   # View uncaught JS exceptionsagent-browser highlight "#button"      # Visually highlight elementagent-browser inspect                  # Open Chrome DevToolsagent-browser connect 9222             # Connect to existing browser via CDP port``` ## Common Patterns ### Login flow and save session```bash#!/bin/bashagent-browser open https://app.example.com/loginagent-browser fill "#email" "$LOGIN_EMAIL"agent-browser fill "#password" "$LOGIN_PASSWORD"agent-browser click "[type=submit]"agent-browser wait --url "**/dashboard"agent-browser state save ./session.json``` ### AI agent loop with snapshot-driven interaction```bash#!/bin/bashagent-browser open https://app.example.comagent-browser state load ./session.json # Get snapshot, parse @refs, actSNAPSHOT=$(agent-browser snapshot)echo "$SNAPSHOT" # Agent determines @e5 is the search boxagent-browser fill @e5 "quarterly report"agent-browser press Enteragent-browser wait --load networkidleagent-browser snapshotagent-browser screenshot results.png``` ### Batch commands from a script (JSON)```bashcat > commands.json << 'EOF'[  ["open", "https://news.ycombinator.com"],  ["wait", "--load", "networkidle"],  ["get", "title"],  ["snapshot"],  ["screenshot", "hn.png"]]EOF agent-browser batch --json < commands.json``` ### Scrape with mocked network```bashagent-browser open https://api-heavy-app.example.comagent-browser network route "**/api/slow-endpoint" --body '{"data":"mocked"}'agent-browser snapshotagent-browser network unroute``` ### Full-page screenshot with annotations```bashagent-browser open https://example.comagent-browser wait --load networkidleagent-browser screenshot --full --annotate annotated.png``` ### Connect to already-running Chrome```bash# Start Chrome with remote debugginggoogle-chrome --remote-debugging-port=9222 & agent-browser connect 9222agent-browser open https://example.comagent-browser snapshot``` ### Emulate mobile device```bashagent-browser set device "iPhone 14"agent-browser open https://example.comagent-browser screenshot mobile.png``` ### HAR recording for network analysis```bashagent-browser open https://example.comagent-browser network har startagent-browser click "#load-data"agent-browser wait --load networkidleagent-browser network har stop session.har``` ## Selector Reference | Format | Example | Notes ||--------|---------|-------|| `@ref` | `@e1`, `@e12` | From `snapshot` output — preferred for AI || CSS | `#id`, `.class`, `[attr=val]` | Standard CSS selectors || Text | `"Sign In"` | Exact text match || XPath | `//button[@type='submit']` | Full XPath | ## Troubleshooting ### Chrome not found```bashagent-browser install              # Downloads Chrome for Testingagent-browser install --with-deps  # Linux: also installs system libs``` ### Element not found / timing issues```bashagent-browser wait "#my-element"              # Wait for visibility firstagent-browser wait --load networkidle         # Wait for page to settleagent-browser wait --fn "!!document.querySelector('#app')"``` ### Selector issues — use snapshot refs instead```bash# Instead of fragile CSS:agent-browser click ".btn.btn-primary.submit-form" # Use snapshot refs:agent-browser snapshot  # Find @e7 = [button] "Submit"agent-browser click @e7``` ### Debug what's on the page```bashagent-browser screenshot debug.png        # Visual checkagent-browser snapshot                    # Accessibility treeagent-browser console                     # JS console outputagent-browser errors                      # Uncaught exceptionsagent-browser eval "document.readyState"``` ### Auth issues between sessions```bashagent-browser state save ./auth.json   # After successful loginagent-browser state load ./auth.json   # At start of next session``` ### Handling alerts/dialogs```bash# Set up handler BEFORE the action that triggers dialogagent-browser dialog acceptagent-browser click "#delete-button"``` ### Performance — use batch for multi-step workflows```bash# Slow: one process per commandagent-browser open https://example.comagent-browser fill "#q" "search"agent-browser click "#submit" # Fast: single process, multiple commandsecho '[["open","https://example.com"],["fill","#q","search"],["click","#submit"]]' \  | agent-browser batch --json```