Install

$ npx skills add https://github.com/vercel-labs/agent-skills --skill vercel-react-best-practices

Works with Paperclip

How LiteParse fits into a Paperclip company.

LiteParse drops into any Paperclip agent that handles document parsing. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat, with no prompt engineering or tool wiring required.


Source file

SKILL.md (222 lines, Markdown)

---
name: liteparse
description: Use this skill when the user asks to parse an unstructured file (PDF, DOCX, PPTX, XLSX, images, etc.), convert documents across formats, or extract text together with its spatial layout, all locally and without cloud dependencies.
compatibility: Requires Node 18+ and `@llamaindex/liteparse` installed globally via npm (`npm i -g @llamaindex/liteparse`)
license: MIT
metadata:
  author: LlamaIndex
  version: "0.1.0"
---

# LiteParse Skill

Parse unstructured documents (PDF, DOCX, PPTX, XLSX, images, and more) locally with LiteParse: fast, lightweight, and requiring no cloud services or LLM.

## Initial Setup

When this skill is invoked, respond with:

```
I'm ready to use LiteParse to parse files locally. Before we begin, please confirm that:

- `@llamaindex/liteparse` is installed globally (`npm i -g @llamaindex/liteparse`)
- The `lit` CLI command is available in your terminal

If both are set, please provide:

1. One or more files to parse (PDF, DOCX, PPTX, XLSX, images, etc.)
2. Any specific options: output format (json/text), page ranges, OCR preferences, DPI, etc.
3. What you'd like to do with the parsed content.

I will produce the appropriate `lit` CLI command or TypeScript script, and once approved, report the results.
```

Then wait for the user's input.

---

## Step 0: Install LiteParse (if needed)

If LiteParse is not yet installed, install it globally:

```bash
npm i -g @llamaindex/liteparse
```

Verify the installation:

```bash
lit --version
```

For Office document support (DOCX, PPTX, XLSX), LibreOffice is required:

```bash
# macOS
brew install --cask libreoffice

# Ubuntu/Debian
sudo apt-get install libreoffice
```

For image parsing, ImageMagick is required:

```bash
# macOS
brew install imagemagick

# Ubuntu/Debian
sudo apt-get install imagemagick
```

---

## Step 1: Produce the CLI Command or Script

### Parse a Single File

```bash
# Basic text extraction
lit parse document.pdf

# JSON output saved to a file
lit parse document.pdf --format json -o output.json

# Specific page range
lit parse document.pdf --target-pages "1-5,10,15-20"

# Disable OCR (faster, text-only PDFs)
lit parse document.pdf --no-ocr

# Use an external HTTP OCR server for higher accuracy
lit parse document.pdf --ocr-server-url http://localhost:8828/ocr

# Higher DPI for better quality
lit parse document.pdf --dpi 300
```

### Batch Parse a Directory

```bash
lit batch-parse ./input-directory ./output-directory

# Only process PDFs, recursively
lit batch-parse ./input ./output --extension .pdf --recursive
```

### Generate Page Screenshots

Screenshots are useful for LLM agents that need to see a page's visual layout.

```bash
# All pages
lit screenshot document.pdf -o ./screenshots

# Specific pages
lit screenshot document.pdf --pages "1,3,5" -o ./screenshots

# High-DPI PNG
lit screenshot document.pdf --dpi 300 --format png -o ./screenshots

# Page range
lit screenshot document.pdf --pages "1-10" -o ./screenshots
```

---

## Step 2: Key Options Reference

### OCR Options

| Option | Description |
|--------|-------------|
| (default) | Tesseract.js: built in, zero setup |
| `--ocr-language fra` | Set the OCR language (ISO language code, e.g. `fra` for French) |
| `--ocr-server-url <url>` | Use an external HTTP OCR server (EasyOCR, PaddleOCR, custom) |
| `--no-ocr` | Disable OCR entirely |

### Output Options

| Option | Description |
|--------|-------------|
| `--format json` | Structured JSON with bounding boxes |
| `--format text` | Plain text (default) |
| `-o <file>` | Save output to a file |

### Performance / Quality Options

| Option | Description |
|--------|-------------|
| `--dpi <n>` | Rendering DPI (default: 150; use 300 for high quality) |
| `--max-pages <n>` | Limit the number of pages parsed |
| `--target-pages <pages>` | Parse specific pages (e.g. `"1-5,10"`) |
| `--no-precise-bbox` | Disable precise bounding boxes (faster) |
| `--skip-diagonal-text` | Ignore rotated/diagonal text |
| `--preserve-small-text` | Keep very small text that would otherwise be dropped |

---

## Step 3: Using a Config File

For repeated use with consistent options, generate a `liteparse.config.json`:

```json
{
  "ocrLanguage": "en",
  "ocrEnabled": true,
  "maxPages": 1000,
  "dpi": 150,
  "outputFormat": "json",
  "preciseBoundingBox": true,
  "skipDiagonalText": false,
  "preserveVerySmallText": false
}
```

To use an HTTP OCR server instead:

```json
{
  "ocrServerUrl": "http://localhost:8828/ocr",
  "ocrLanguage": "en",
  "outputFormat": "json"
}
```

Pass the config file with `--config`:

```bash
lit parse document.pdf --config liteparse.config.json
```

---

## Step 4: HTTP OCR Server API (Advanced)

If the user wants to plug in a custom OCR backend, the server must implement:

- **Endpoint**: `POST /ocr`
- **Accepts**: `file` (multipart) and `language` (string) parameters
- **Returns**: a JSON body of this shape, where `bbox` holds the box corners `x1, y1, x2, y2`:
```json
{
  "results": [
    { "text": "Hello", "bbox": [x1, y1, x2, y2], "confidence": 0.98 }
  ]
}
```

Ready-to-use wrappers for EasyOCR and PaddleOCR are available in the LiteParse repo.

---

## Supported Input Formats

| Category | Formats |
|----------|---------|
| PDF | `.pdf` |
| Word | `.doc`, `.docx`, `.docm`, `.odt`, `.rtf` |
| PowerPoint | `.ppt`, `.pptx`, `.pptm`, `.odp` |
| Spreadsheets | `.xls`, `.xlsx`, `.xlsm`, `.ods`, `.csv`, `.tsv` |
| Images | `.jpg`, `.jpeg`, `.png`, `.gif`, `.bmp`, `.tiff`, `.webp`, `.svg` |

Office documents require LibreOffice; images require ImageMagick. LiteParse automatically converts these formats to PDF before parsing.
