## Install

```sh
npx skills add https://github.com/vercel-labs/agent-skills --skill vercel-react-best-practices
```

## Works with Paperclip

How the Llmfit Hardware Model Matcher fits into a Paperclip company.
Llmfit Hardware Model Matcher drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.
**SaaS Factory** (paired pack): a pre-configured AI company with 18 agents and 18 skills, one-time purchase. $27 (regularly $59).

## SKILL.md (484 lines)
---
name: llmfit-hardware-model-matcher
description: Terminal tool that detects your hardware and recommends which LLM models will actually run well on your system
triggers:
  - "find LLM models that fit my hardware"
  - "which AI models can I run locally"
  - "recommend models for my GPU RAM"
  - "check if a model will run on my machine"
  - "llmfit model recommendations"
  - "local LLM hardware compatibility"
  - "what LLM fits my system specs"
  - "score models for my computer"
---

# llmfit Hardware Model Matcher

> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.

llmfit detects your system's RAM, CPU, and GPU, then scores hundreds of LLM models across quality, speed, fit, and context dimensions — telling you exactly which models will run well on your hardware. It ships with an interactive TUI and a CLI, and supports multi-GPU setups, MoE architectures, dynamic quantization, and local runtime providers (Ollama, llama.cpp, MLX, Docker Model Runner).

---

## Installation

### macOS / Linux (Homebrew)

```sh
brew install llmfit
```

### Quick install script

```sh
curl -fsSL https://llmfit.axjns.dev/install.sh | sh

# Without sudo, installs to ~/.local/bin
curl -fsSL https://llmfit.axjns.dev/install.sh | sh -s -- --local
```

### Windows (Scoop)

```sh
scoop install llmfit
```

### Docker / Podman

```sh
docker run ghcr.io/alexsjones/llmfit

# With jq for scripting
podman run ghcr.io/alexsjones/llmfit recommend --use-case coding | jq '.models[].name'
```

### From source (Rust)

```sh
git clone https://github.com/AlexsJones/llmfit.git
cd llmfit
cargo build --release
# binary at target/release/llmfit
```

---

## Core Concepts

- **Fit tiers**: `perfect` (runs great), `good` (runs well), `marginal` (runs but tight), `too_tight` (won't run)
- **Scoring dimensions**: quality, speed (tok/s estimate), fit (memory headroom), context capacity
- **Run modes**: GPU, CPU+GPU offload, CPU-only, MoE
- **Quantization**: automatically selects the best quant (e.g. Q4_K_M, Q5_K_S, mlx-4bit) for your hardware
- **Providers**: Ollama, llama.cpp, MLX, Docker Model Runner

---

## Key Commands

### Launch Interactive TUI

```sh
llmfit
```

### CLI Table Output

```sh
llmfit --cli
```

### Show System Hardware Detection

```sh
llmfit system
llmfit --json system   # JSON output
```

### List All Models

```sh
llmfit list
```

### Search Models

```sh
llmfit search "llama 8b"
llmfit search "mistral"
llmfit search "qwen coding"
```

### Fit Analysis

```sh
# All runnable models ranked by fit
llmfit fit

# Only perfect fits, top 5
llmfit fit --perfect -n 5

# JSON output
llmfit --json fit -n 10
```

### Model Detail

```sh
llmfit info "Mistral-7B"
llmfit info "Llama-3.1-70B"
```

### Recommendations

```sh
# Top 5 recommendations (JSON default)
llmfit recommend --json --limit 5

# Filter by use case: general, coding, reasoning, chat, multimodal, embedding
llmfit recommend --json --use-case coding --limit 3
llmfit recommend --json --use-case reasoning --limit 5
```

### Hardware Planning (invert: what hardware do I need?)

```sh
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --quant mlx-4bit
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --target-tps 25 --json
llmfit plan "Qwen/Qwen2.5-Coder-0.5B-Instruct" --context 8192 --json
```

### REST API Server (for cluster scheduling)

```sh
llmfit serve
llmfit serve --host 0.0.0.0 --port 8787
```

---

## Hardware Overrides

When autodetection fails (VMs, broken nvidia-smi, passthrough setups):

```sh
# Override GPU VRAM
llmfit --memory=32G
llmfit --memory=24G --cli
llmfit --memory=24G fit --perfect -n 5
llmfit --memory=24G recommend --json

# Megabytes
llmfit --memory=32000M

# Works with any subcommand
llmfit --memory=16G info "Llama-3.1-70B"
```

Accepted suffixes: `G`/`GB`/`GiB`, `M`/`MB`/`MiB`, `T`/`TB`/`TiB` (case-insensitive).
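The suffix grammar above is easy to mirror in automation. The sketch below is a hypothetical helper (not part of llmfit itself) that normalizes any accepted size string to MiB, which can be handy when scripts want to compare detected VRAM against an override value:

```python
def to_mib(size: str) -> int:
    """Convert an llmfit-style memory size string to MiB.

    Accepts the suffixes documented above (G/GB/GiB, M/MB/MiB,
    T/TB/TiB) case-insensitively. Hypothetical helper for scripts,
    not part of llmfit itself.
    """
    s = size.strip().upper()
    # Check larger units first so "T" wins before "G" or "M".
    for suffix, factor in (("T", 1024 * 1024), ("G", 1024), ("M", 1)):
        if suffix in s:
            return int(s.split(suffix, 1)[0]) * factor
    raise ValueError(f"unrecognized memory size: {size!r}")

print(to_mib("24G"))       # 24576
print(to_mib("32000MiB"))  # 32000
print(to_mib("1TB"))       # 1048576
```

A normalized value can then be fed back as an explicit megabyte override, e.g. `--memory={to_mib('24GiB')}M`.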
### Context Length Cap

```sh
# Estimate memory fit at 4K context
llmfit --max-context 4096 --cli

# With subcommands
llmfit --max-context 8192 fit --perfect -n 5
llmfit --max-context 16384 recommend --json --limit 5

# Environment variable alternative
export OLLAMA_CONTEXT_LENGTH=8192
llmfit recommend --json
```

---

## REST API Reference

Start the server:

```sh
llmfit serve --host 0.0.0.0 --port 8787
```

### Endpoints

```sh
# Health check
curl http://localhost:8787/health

# Node hardware info
curl http://localhost:8787/api/v1/system

# Full model list with filters
curl "http://localhost:8787/api/v1/models?min_fit=marginal&runtime=llamacpp&sort=score&limit=20"

# Top runnable models for this node (key scheduling endpoint)
curl "http://localhost:8787/api/v1/models/top?limit=5&min_fit=good&use_case=coding"

# Search by model name/provider
curl "http://localhost:8787/api/v1/models/Mistral?runtime=any"
```

### Query Parameters for `/models` and `/models/top`

| Param | Values | Description |
|---|---|---|
| `limit` / `n` | integer | Max rows returned |
| `min_fit` | `perfect\|good\|marginal\|too_tight` | Minimum fit tier |
| `perfect` | `true\|false` | Force perfect-only |
| `runtime` | `any\|mlx\|llamacpp` | Filter by runtime |
| `use_case` | `general\|coding\|reasoning\|chat\|multimodal\|embedding` | Use case filter |
| `provider` | string | Substring match on provider |
| `search` | string | Free-text across name/provider/size/use-case |
| `sort` | `score\|tps\|params\|mem\|ctx\|date\|use_case` | Sort column |
| `include_too_tight` | `true\|false` | Include non-runnable models |
| `max_context` | integer | Per-request context cap |

---

## Scripting & Automation Examples

### Bash: Get top coding models as JSON

```bash
#!/bin/bash
# Get top 3 coding models that fit perfectly
llmfit recommend --json --use-case coding --limit 3 | \
  jq -r '.models[] | "\(.name) (\(.score)) - \(.quantization)"'
```

### Bash: Check if a specific model fits

```bash
#!/bin/bash
MODEL="Mistral-7B"
RESULT=$(llmfit info "$MODEL" --json 2>/dev/null)
FIT=$(echo "$RESULT" | jq -r '.fit')
if [[ "$FIT" == "perfect" || "$FIT" == "good" ]]; then
  echo "$MODEL will run well (fit: $FIT)"
else
  echo "$MODEL may not run well (fit: $FIT)"
fi
```

### Bash: Auto-pull top Ollama model

```bash
#!/bin/bash
# Get the top fitting model name and pull it with Ollama
TOP_MODEL=$(llmfit recommend --json --limit 1 | jq -r '.models[0].name')
echo "Pulling: $TOP_MODEL"
ollama pull "$TOP_MODEL"
```

### Python: Query the REST API

```python
import requests

BASE_URL = "http://localhost:8787"

def get_system_info():
    resp = requests.get(f"{BASE_URL}/api/v1/system")
    return resp.json()

def get_top_models(use_case="coding", limit=5, min_fit="good"):
    params = {
        "use_case": use_case,
        "limit": limit,
        "min_fit": min_fit,
        "sort": "score",
    }
    resp = requests.get(f"{BASE_URL}/api/v1/models/top", params=params)
    return resp.json()

def search_models(query, runtime="any"):
    resp = requests.get(
        f"{BASE_URL}/api/v1/models/{query}",
        params={"runtime": runtime},
    )
    return resp.json()

# Example usage
system = get_system_info()
print(f"GPU: {system.get('gpu_name')} | VRAM: {system.get('vram_gb')}GB")

models = get_top_models(use_case="reasoning", limit=3)
for m in models.get("models", []):
    print(f"{m['name']}: score={m['score']}, fit={m['fit']}, quant={m['quantization']}")
```

### Python: Hardware-aware model selector for agents

```python
import subprocess
import json

def get_best_model_for_task(use_case: str, min_fit: str = "good") -> dict:
    """Use llmfit to select the best model for a given task."""
    result = subprocess.run(
        ["llmfit", "recommend", "--json", "--use-case", use_case, "--limit", "1"],
        capture_output=True, text=True,
    )
    data = json.loads(result.stdout)
    models = data.get("models", [])
    return models[0] if models else None

def plan_hardware_requirements(model_name: str, context: int = 4096) -> dict:
    """Get hardware requirements for running a specific model."""
    result = subprocess.run(
        ["llmfit", "plan", model_name, "--context", str(context), "--json"],
        capture_output=True, text=True,
    )
    return json.loads(result.stdout)

# Select best coding model
best = get_best_model_for_task("coding")
if best:
    print(f"Best coding model: {best['name']}")
    print(f"  Quantization: {best['quantization']}")
    print(f"  Estimated tok/s: {best['tps']}")
    print(f"  Memory usage: {best['mem_pct']}%")

# Plan hardware for a specific model
plan = plan_hardware_requirements("Qwen/Qwen3-4B-MLX-4bit", context=8192)
print(f"Min VRAM needed: {plan['hardware']['min_vram_gb']}GB")
print(f"Recommended VRAM: {plan['hardware']['recommended_vram_gb']}GB")
```

### Docker Compose: Node scheduler pattern

```yaml
version: "3.8"
services:
  llmfit-api:
    image: ghcr.io/alexsjones/llmfit
    command: serve --host 0.0.0.0 --port 8787
    ports:
      - "8787:8787"
    environment:
      - OLLAMA_CONTEXT_LENGTH=8192
    devices:
      - /dev/nvidia0:/dev/nvidia0  # pass GPU through
```

---

## TUI Key Reference

| Key | Action |
|---|---|
| `↑`/`↓` or `j`/`k` | Navigate models |
| `/` | Search (name, provider, params, use case) |
| `Esc`/`Enter` | Exit search |
| `Ctrl-U` | Clear search |
| `f` | Cycle fit filter: All → Runnable → Perfect → Good → Marginal |
| `a` | Cycle availability: All → GGUF Avail → Installed |
| `s` | Cycle sort: Score → Params → Mem% → Ctx → Date → Use Case |
| `t` | Cycle color theme (auto-saved) |
| `v` | Visual mode (multi-select for comparison) |
| `V` | Select mode (column-based filtering) |
| `p` | Plan mode (what hardware needed for this model?) |
| `P` | Provider filter popup |
| `U` | Use-case filter popup |
| `C` | Capability filter popup |
| `m` | Mark model for comparison |
| `c` | Compare view (marked vs selected) |
| `d` | Download model (via detected runtime) |
| `r` | Refresh installed models from runtimes |
| `Enter` | Toggle detail view |
| `g`/`G` | Jump to top/bottom |
| `q` | Quit |

### Themes

`t` cycles: Default → Dracula → Solarized → Nord → Monokai → Gruvbox

Theme saved to `~/.config/llmfit/theme`

---

## GPU Detection Details

| GPU Vendor | Detection Method |
|---|---|
| NVIDIA | `nvidia-smi` (multi-GPU, aggregates VRAM) |
| AMD | `rocm-smi` |
| Intel Arc | sysfs (discrete) / `lspci` (integrated) |
| Apple Silicon | `system_profiler` (unified memory = VRAM) |
| Ascend | `npu-smi` |

---

## Common Patterns

### "What can I run on my 16GB M2 Mac?"

```sh
llmfit fit --perfect -n 10
# or interactively
llmfit
# press 'f' to filter to Perfect fit
```

### "I have a 3090 (24GB VRAM), what coding models fit?"

```sh
llmfit recommend --json --use-case coding | jq '.models[]'
# or with manual override if detection fails
llmfit --memory=24G recommend --json --use-case coding
```

### "Can Llama 70B run on my machine?"

```sh
llmfit info "Llama-3.1-70B"
# Plan what hardware you'd need
llmfit plan "Llama-3.1-70B" --context 4096 --json
```

### "Show me only models already installed in Ollama"

```sh
llmfit
# press 'a' to cycle to Installed filter
# or
llmfit fit -n 20   # run, press 'i' in TUI for installed-first
```

### "Script: find best model and start Ollama"

```bash
MODEL=$(llmfit recommend --json --limit 1 | jq -r '.models[0].name')
ollama serve &
ollama run "$MODEL"
```

### "API: poll node capabilities for cluster scheduler"

```bash
# Check node, get top 3 good+ models for reasoning
curl -s "http://node1:8787/api/v1/models/top?limit=3&min_fit=good&use_case=reasoning" | \
  jq '.models[].name'
```

---

## Troubleshooting

**GPU not detected / wrong VRAM reported**

```sh
# Verify detection
llmfit system

# Manual override
llmfit --memory=24G --cli
```
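The manual override can also be automated in scripts. The sketch below is a hypothetical wrapper (the helper names are not part of llmfit): it reads total VRAM from `nvidia-smi` when the tool is available and otherwise falls back to a conservative default for the `--memory` flag.

```python
import shutil
import subprocess

def detected_vram_mib():
    """Total VRAM in MiB via nvidia-smi, or None when unavailable."""
    if not shutil.which("nvidia-smi"):
        return None
    proc = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    if proc.returncode != 0:
        return None
    # With multiple GPUs this prints one line per device; take the first.
    return int(proc.stdout.splitlines()[0].strip())

def memory_flag(vram_mib, default="8G"):
    """Build the llmfit --memory override from detected VRAM (MiB)."""
    return f"--memory={vram_mib}M" if vram_mib else f"--memory={default}"

# e.g. subprocess.run(["llmfit", memory_flag(detected_vram_mib()),
#                      "fit", "--perfect", "-n", "5"])
```

Note that llmfit's own NVIDIA detection aggregates VRAM across GPUs; this sketch only looks at the first device.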
**`nvidia-smi` not found but you have an NVIDIA GPU**

```sh
# Install CUDA toolkit or nvidia-utils, then retry
# Or override manually:
llmfit --memory=8G fit --perfect
```

**Models show as too_tight but you have enough RAM**

```sh
# llmfit may be using context-inflated estimates; cap context
llmfit --max-context 2048 fit --perfect -n 10
```

**REST API: test endpoints**

```sh
# Spawn server and run validation suite
python3 scripts/test_api.py --spawn

# Test already-running server
python3 scripts/test_api.py --base-url http://127.0.0.1:8787
```

**Apple Silicon: VRAM shows as system RAM (expected)**

```sh
# This is correct — Apple Silicon uses unified memory
# llmfit accounts for this automatically
llmfit system   # should show backend: Metal
```

**Context length environment variable**

```sh
export OLLAMA_CONTEXT_LENGTH=4096
llmfit recommend --json   # uses 4096 as context cap
```

## Related skills
- **Agency Agents Ai Specialists**: Install the Agency Agents Ai Specialists skill for Claude Code from aradotso/trending-skills.
- **Agent Browser Automation**: Install the Agent Browser Automation skill for Claude Code from aradotso/trending-skills.
- **Antigravity Manager**: Install the Antigravity Manager skill for Claude Code from aradotso/trending-skills.