## Install

```sh
npx skills add https://github.com/vercel-labs/agent-skills --skill vercel-react-best-practices
```

## Works with Paperclip

How the Llmfit Hardware Model Matcher fits into a Paperclip company.
Llmfit Hardware Model Matcher drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.
**SaaS Factory** (paired pack): a pre-configured AI company with 18 agents and 18 skills, one-time purchase. $27 (regularly $59).

## SKILL.md (484 lines)
---
name: llmfit-hardware-model-matcher
description: Terminal tool that detects your hardware and recommends which LLM models will actually run well on your system
triggers:
  - "find LLM models that fit my hardware"
  - "which AI models can I run locally"
  - "recommend models for my GPU RAM"
  - "check if a model will run on my machine"
  - "llmfit model recommendations"
  - "local LLM hardware compatibility"
  - "what LLM fits my system specs"
  - "score models for my computer"
---

# llmfit Hardware Model Matcher

> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.

llmfit detects your system's RAM, CPU, and GPU, then scores hundreds of LLM models across quality, speed, fit, and context dimensions — telling you exactly which models will run well on your hardware. It ships with an interactive TUI and a CLI, and supports multi-GPU setups, MoE architectures, dynamic quantization, and local runtime providers (Ollama, llama.cpp, MLX, Docker Model Runner).

---

## Installation

### macOS / Linux (Homebrew)

```sh
brew install llmfit
```

### Quick install script

```sh
curl -fsSL https://llmfit.axjns.dev/install.sh | sh

# Without sudo, installs to ~/.local/bin
curl -fsSL https://llmfit.axjns.dev/install.sh | sh -s -- --local
```

### Windows (Scoop)

```sh
scoop install llmfit
```

### Docker / Podman

```sh
docker run ghcr.io/alexsjones/llmfit

# With jq for scripting
podman run ghcr.io/alexsjones/llmfit recommend --use-case coding | jq '.models[].name'
```

### From source (Rust)

```sh
git clone https://github.com/AlexsJones/llmfit.git
cd llmfit
cargo build --release
# binary at target/release/llmfit
```

---

## Core Concepts

- **Fit tiers**: `perfect` (runs great), `good` (runs well), `marginal` (runs but tight), `too_tight` (won't run)
- **Scoring dimensions**: quality, speed (tok/s estimate), fit (memory headroom), context capacity
- **Run modes**: GPU, CPU+GPU offload, CPU-only, MoE
- **Quantization**: automatically selects the best quant (e.g. Q4_K_M, Q5_K_S, mlx-4bit) for your hardware
- **Providers**: Ollama, llama.cpp, MLX, Docker Model Runner

---

## Key Commands

### Launch Interactive TUI

```sh
llmfit
```

### CLI Table Output

```sh
llmfit --cli
```

### Show System Hardware Detection

```sh
llmfit system
llmfit --json system   # JSON output
```

### List All Models

```sh
llmfit list
```

### Search Models

```sh
llmfit search "llama 8b"
llmfit search "mistral"
llmfit search "qwen coding"
```

### Fit Analysis

```sh
# All runnable models ranked by fit
llmfit fit

# Only perfect fits, top 5
llmfit fit --perfect -n 5

# JSON output
llmfit --json fit -n 10
```

### Model Detail

```sh
llmfit info "Mistral-7B"
llmfit info "Llama-3.1-70B"
```

### Recommendations

```sh
# Top 5 recommendations (JSON default)
llmfit recommend --json --limit 5

# Filter by use case: general, coding, reasoning, chat, multimodal, embedding
llmfit recommend --json --use-case coding --limit 3
llmfit recommend --json --use-case reasoning --limit 5
```

### Hardware Planning (invert: what hardware do I need?)

```sh
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --quant mlx-4bit
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --target-tps 25 --json
llmfit plan "Qwen/Qwen2.5-Coder-0.5B-Instruct" --context 8192 --json
```

### REST API Server (for cluster scheduling)

```sh
llmfit serve
llmfit serve --host 0.0.0.0 --port 8787
```

---

## Hardware Overrides

When autodetection fails (VMs, broken nvidia-smi, passthrough setups):

```sh
# Override GPU VRAM
llmfit --memory=32G
llmfit --memory=24G --cli
llmfit --memory=24G fit --perfect -n 5
llmfit --memory=24G recommend --json

# Megabytes
llmfit --memory=32000M

# Works with any subcommand
llmfit --memory=16G info "Llama-3.1-70B"
```

Accepted suffixes: `G`/`GB`/`GiB`, `M`/`MB`/`MiB`, `T`/`TB`/`TiB` (case-insensitive).
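The suffix grammar above is easy to mirror in automation. The sketch below is a hypothetical helper (not part of llmfit itself) that normalizes any accepted size string to MiB, which can be handy when scripts want to compare detected VRAM against an override value:

```python
def to_mib(size: str) -> int:
    """Convert an llmfit-style memory size string to MiB.

    Accepts the suffixes documented above (G/GB/GiB, M/MB/MiB,
    T/TB/TiB) case-insensitively. Hypothetical helper for scripts,
    not part of llmfit itself.
    """
    s = size.strip().upper()
    # Check larger units first so "T" wins before "G" or "M".
    for suffix, factor in (("T", 1024 * 1024), ("G", 1024), ("M", 1)):
        if suffix in s:
            return int(s.split(suffix, 1)[0]) * factor
    raise ValueError(f"unrecognized memory size: {size!r}")

print(to_mib("24G"))       # 24576
print(to_mib("32000MiB"))  # 32000
print(to_mib("1TB"))       # 1048576
```

A normalized value can then be fed back as an explicit megabyte override, e.g. `--memory={to_mib('24GiB')}M`.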
### Context Length Cap

```sh
# Estimate memory fit at 4K context
llmfit --max-context 4096 --cli

# With subcommands
llmfit --max-context 8192 fit --perfect -n 5
llmfit --max-context 16384 recommend --json --limit 5

# Environment variable alternative
export OLLAMA_CONTEXT_LENGTH=8192
llmfit recommend --json
```

---

## REST API Reference

Start the server:

```sh
llmfit serve --host 0.0.0.0 --port 8787
```

### Endpoints

```sh
# Health check
curl http://localhost:8787/health

# Node hardware info
curl http://localhost:8787/api/v1/system

# Full model list with filters
curl "http://localhost:8787/api/v1/models?min_fit=marginal&runtime=llamacpp&sort=score&limit=20"

# Top runnable models for this node (key scheduling endpoint)
curl "http://localhost:8787/api/v1/models/top?limit=5&min_fit=good&use_case=coding"

# Search by model name/provider
curl "http://localhost:8787/api/v1/models/Mistral?runtime=any"
```

### Query Parameters for `/models` and `/models/top`

| Param | Values | Description |
|---|---|---|
| `limit` / `n` | integer | Max rows returned |
| `min_fit` | `perfect\|good\|marginal\|too_tight` | Minimum fit tier |
| `perfect` | `true\|false` | Force perfect-only |
| `runtime` | `any\|mlx\|llamacpp` | Filter by runtime |
| `use_case` | `general\|coding\|reasoning\|chat\|multimodal\|embedding` | Use case filter |
| `provider` | string | Substring match on provider |
| `search` | string | Free-text across name/provider/size/use-case |
| `sort` | `score\|tps\|params\|mem\|ctx\|date\|use_case` | Sort column |
| `include_too_tight` | `true\|false` | Include non-runnable models |
| `max_context` | integer | Per-request context cap |

---

## Scripting & Automation Examples

### Bash: Get top coding models as JSON

```bash
#!/bin/bash
# Get top 3 coding models that fit perfectly
llmfit recommend --json --use-case coding --limit 3 | \
  jq -r '.models[] | "\(.name) (\(.score)) - \(.quantization)"'
```

### Bash: Check if a specific model fits

```bash
#!/bin/bash
MODEL="Mistral-7B"
RESULT=$(llmfit info "$MODEL" --json 2>/dev/null)
FIT=$(echo "$RESULT" | jq -r '.fit')
if [[ "$FIT" == "perfect" || "$FIT" == "good" ]]; then
  echo "$MODEL will run well (fit: $FIT)"
else
  echo "$MODEL may not run well (fit: $FIT)"
fi
```

### Bash: Auto-pull top Ollama model

```bash
#!/bin/bash
# Get the top fitting model name and pull it with Ollama
TOP_MODEL=$(llmfit recommend --json --limit 1 | jq -r '.models[0].name')
echo "Pulling: $TOP_MODEL"
ollama pull "$TOP_MODEL"
```

### Python: Query the REST API

```python
import requests

BASE_URL = "http://localhost:8787"

def get_system_info():
    resp = requests.get(f"{BASE_URL}/api/v1/system")
    return resp.json()

def get_top_models(use_case="coding", limit=5, min_fit="good"):
    params = {
        "use_case": use_case,
        "limit": limit,
        "min_fit": min_fit,
        "sort": "score",
    }
    resp = requests.get(f"{BASE_URL}/api/v1/models/top", params=params)
    return resp.json()

def search_models(query, runtime="any"):
    resp = requests.get(
        f"{BASE_URL}/api/v1/models/{query}",
        params={"runtime": runtime},
    )
    return resp.json()

# Example usage
system = get_system_info()
print(f"GPU: {system.get('gpu_name')} | VRAM: {system.get('vram_gb')}GB")

models = get_top_models(use_case="reasoning", limit=3)
for m in models.get("models", []):
    print(f"{m['name']}: score={m['score']}, fit={m['fit']}, quant={m['quantization']}")
```

### Python: Hardware-aware model selector for agents

```python
import subprocess
import json

def get_best_model_for_task(use_case: str, min_fit: str = "good") -> dict:
    """Use llmfit to select the best model for a given task."""
    result = subprocess.run(
        ["llmfit", "recommend", "--json", "--use-case", use_case, "--limit", "1"],
        capture_output=True, text=True,
    )
    data = json.loads(result.stdout)
    models = data.get("models", [])
    return models[0] if models else None

def plan_hardware_requirements(model_name: str, context: int = 4096) -> dict:
    """Get hardware requirements for running a specific model."""
    result = subprocess.run(
        ["llmfit", "plan", model_name, "--context", str(context), "--json"],
        capture_output=True, text=True,
    )
    return json.loads(result.stdout)

# Select best coding model
best = get_best_model_for_task("coding")
if best:
    print(f"Best coding model: {best['name']}")
    print(f"  Quantization: {best['quantization']}")
    print(f"  Estimated tok/s: {best['tps']}")
    print(f"  Memory usage: {best['mem_pct']}%")

# Plan hardware for a specific model
plan = plan_hardware_requirements("Qwen/Qwen3-4B-MLX-4bit", context=8192)
print(f"Min VRAM needed: {plan['hardware']['min_vram_gb']}GB")
print(f"Recommended VRAM: {plan['hardware']['recommended_vram_gb']}GB")
```

### Docker Compose: Node scheduler pattern

```yaml
version: "3.8"
services:
  llmfit-api:
    image: ghcr.io/alexsjones/llmfit
    command: serve --host 0.0.0.0 --port 8787
    ports:
      - "8787:8787"
    environment:
      - OLLAMA_CONTEXT_LENGTH=8192
    devices:
      - /dev/nvidia0:/dev/nvidia0  # pass GPU through
```

---

## TUI Key Reference

| Key | Action |
|---|---|
| `↑`/`↓` or `j`/`k` | Navigate models |
| `/` | Search (name, provider, params, use case) |
| `Esc`/`Enter` | Exit search |
| `Ctrl-U` | Clear search |
| `f` | Cycle fit filter: All → Runnable → Perfect → Good → Marginal |
| `a` | Cycle availability: All → GGUF Avail → Installed |
| `s` | Cycle sort: Score → Params → Mem% → Ctx → Date → Use Case |
| `t` | Cycle color theme (auto-saved) |
| `v` | Visual mode (multi-select for comparison) |
| `V` | Select mode (column-based filtering) |
| `p` | Plan mode (what hardware needed for this model?) |
| `P` | Provider filter popup |
| `U` | Use-case filter popup |
| `C` | Capability filter popup |
| `m` | Mark model for comparison |
| `c` | Compare view (marked vs selected) |
| `d` | Download model (via detected runtime) |
| `r` | Refresh installed models from runtimes |
| `Enter` | Toggle detail view |
| `g`/`G` | Jump to top/bottom |
| `q` | Quit |

### Themes

`t` cycles: Default → Dracula → Solarized → Nord → Monokai → Gruvbox

Theme saved to `~/.config/llmfit/theme`

---

## GPU Detection Details

| GPU Vendor | Detection Method |
|---|---|
| NVIDIA | `nvidia-smi` (multi-GPU, aggregates VRAM) |
| AMD | `rocm-smi` |
| Intel Arc | sysfs (discrete) / `lspci` (integrated) |
| Apple Silicon | `system_profiler` (unified memory = VRAM) |
| Ascend | `npu-smi` |

---

## Common Patterns

### "What can I run on my 16GB M2 Mac?"

```sh
llmfit fit --perfect -n 10
# or interactively
llmfit
# press 'f' to filter to Perfect fit
```

### "I have a 3090 (24GB VRAM), what coding models fit?"

```sh
llmfit recommend --json --use-case coding | jq '.models[]'
# or with manual override if detection fails
llmfit --memory=24G recommend --json --use-case coding
```

### "Can Llama 70B run on my machine?"

```sh
llmfit info "Llama-3.1-70B"
# Plan what hardware you'd need
llmfit plan "Llama-3.1-70B" --context 4096 --json
```

### "Show me only models already installed in Ollama"

```sh
llmfit
# press 'a' to cycle to Installed filter
# or
llmfit fit -n 20   # run, press 'i' in TUI for installed-first
```

### "Script: find best model and start Ollama"

```bash
MODEL=$(llmfit recommend --json --limit 1 | jq -r '.models[0].name')
ollama serve &
ollama run "$MODEL"
```

### "API: poll node capabilities for cluster scheduler"

```bash
# Check node, get top 3 good+ models for reasoning
curl -s "http://node1:8787/api/v1/models/top?limit=3&min_fit=good&use_case=reasoning" | \
  jq '.models[].name'
```

---

## Troubleshooting

**GPU not detected / wrong VRAM reported**

```sh
# Verify detection
llmfit system

# Manual override
llmfit --memory=24G --cli
```
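The manual override can also be automated in scripts. The sketch below is a hypothetical wrapper (the helper names are not part of llmfit): it reads total VRAM from `nvidia-smi` when the tool is available and otherwise falls back to a conservative default for the `--memory` flag.

```python
import shutil
import subprocess

def detected_vram_mib():
    """Total VRAM in MiB via nvidia-smi, or None when unavailable."""
    if not shutil.which("nvidia-smi"):
        return None
    proc = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    if proc.returncode != 0:
        return None
    # With multiple GPUs this prints one line per device; take the first.
    return int(proc.stdout.splitlines()[0].strip())

def memory_flag(vram_mib, default="8G"):
    """Build the llmfit --memory override from detected VRAM (MiB)."""
    return f"--memory={vram_mib}M" if vram_mib else f"--memory={default}"

# e.g. subprocess.run(["llmfit", memory_flag(detected_vram_mib()),
#                      "fit", "--perfect", "-n", "5"])
```

Note that llmfit's own NVIDIA detection aggregates VRAM across GPUs; this sketch only looks at the first device.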
**`nvidia-smi` not found but you have an NVIDIA GPU**

```sh
# Install CUDA toolkit or nvidia-utils, then retry
# Or override manually:
llmfit --memory=8G fit --perfect
```

**Models show as too_tight but you have enough RAM**

```sh
# llmfit may be using context-inflated estimates; cap context
llmfit --max-context 2048 fit --perfect -n 10
```

**REST API: test endpoints**

```sh
# Spawn server and run validation suite
python3 scripts/test_api.py --spawn

# Test already-running server
python3 scripts/test_api.py --base-url http://127.0.0.1:8787
```

**Apple Silicon: VRAM shows as system RAM (expected)**

```sh
# This is correct — Apple Silicon uses unified memory
# llmfit accounts for this automatically
llmfit system   # should show backend: Metal
```

**Context length environment variable**

```sh
export OLLAMA_CONTEXT_LENGTH=4096
llmfit recommend --json   # uses 4096 as context cap
```

## Related skills
- **Agency Agents Ai Specialists**: Install the Agency Agents Ai Specialists skill for Claude Code from aradotso/trending-skills.
- **Agent Browser Automation**: Install the Agent Browser Automation skill for Claude Code from aradotso/trending-skills.
- **Antigravity Manager**: Install the Antigravity Manager skill for Claude Code from aradotso/trending-skills.