Llmfit Hardware Model Matcher

Name: Llmfit Hardware Model Matcher
Author: aradotso

aradotso/trending-skills

1.4k installs
66 repo stars
Updated July 9, 2026
aradotso/trending-skills

llmfit-hardware-model-matcher is an agent skill that terminal tool that detects your hardware and recommends which llm models will actually run well on your system.

About

llmfit-hardware-model-matcher is an agent skill from aradotso/trending-skills that terminal tool that detects your hardware and recommends which llm models will actually run well on your system. # llmfit Hardware Model Matcher > Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection. llmfit detects your system's RAM, CPU, and GPU then scores hundreds of LLM models across quality, speed, fit, and context dimensions — telling you exactly which models will run well on your hardware. It ships with an interactive TUI and a CLI, supp Developers invoke llmfit-hardware-model-matcher during idea/discover work for ai & agent building tasks. The skill documents triggers, prerequisites, and step-by-step workflows grounded in SKILL.md. Compatible with Claude Code, Cursor, and Codex agent runtimes that load marketplace skills.

llmfit Hardware Model Matcher
Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.
curl -fsSL https://llmfit.axjns.dev/install.sh | sh
Without sudo, installs to ~/.local/bin
curl -fsSL https://llmfit.axjns.dev/install.sh | sh -s -- --local

Llmfit Hardware Model Matcher by the numbers

1,368 all-time installs (skills.sh)
+7 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #842 of 16,659 AI & Agent Building skills by installs in the Skillselion catalog
Security screen: CRITICAL risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

At a glance

llmfit-hardware-model-matcher capabilities & compatibility

Capabilities: llmfit hardware model matcher · skill by [ara.so](https://ara.so) — daily 2026 s · curl fssl https://llmfit.axjns.dev/install.sh | · without sudo, installs to ~/.local/bin
Use cases: orchestration

From the docs

What llmfit-hardware-model-matcher says it does

> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.

SKILL.md

curl -fsSL https://llmfit.axjns.dev/install.sh | sh

SKILL.md

curl -fsSL https://llmfit.axjns.dev/install.sh | sh -s -- --local

SKILL.md

npx skills add https://github.com/aradotso/trending-skills --skill llmfit-hardware-model-matcher

Add your badge

Show developers this skill is listed on Skillselion. Paste this into your README.

[![Listed on Skillselion](https://skillselion.com/badge/skills/aradotso/trending-skills/llmfit-hardware-model-matcher.svg)](https://skillselion.com/skills/aradotso/trending-skills/llmfit-hardware-model-matcher)

Installs	1.4k
repo stars	★ 66
Security audit	0 / 3 scanners passed
Last updated	July 9, 2026
Repository	aradotso/trending-skills ↗

What it does

Terminal tool that detects your hardware and recommends which LLM models will actually run well on your system

Who is it for?

Developers working on ai & agent building during idea tasks.

Skip if: Tasks outside AI & Agent Building scope described in SKILL.md.

When should I use this skill?

Terminal tool that detects your hardware and recommends which LLM models will actually run well on your system

What you get

Completed ai & agent building workflow aligned with SKILL.md steps.

Ranked local LLM model shortlist
Hardware compatibility scores

By the numbers

Scores hundreds of LLM models against detected hardware
Reads RAM, CPU, and GPU specs from the host system

Files

SKILL.mdMarkdownGitHub ↗

llmfit Hardware Model Matcher

Skill by ara.so — Daily 2026 Skills collection.

llmfit detects your system's RAM, CPU, and GPU then scores hundreds of LLM models across quality, speed, fit, and context dimensions — telling you exactly which models will run well on your hardware. It ships with an interactive TUI and a CLI, supports multi-GPU, MoE architectures, dynamic quantization, and local runtime providers (Ollama, llama.cpp, MLX, Docker Model Runner).

---

Installation

macOS / Linux (Homebrew)

brew install llmfit

Quick install script

curl -fsSL https://llmfit.axjns.dev/install.sh | sh

# Without sudo, installs to ~/.local/bin
curl -fsSL https://llmfit.axjns.dev/install.sh | sh -s -- --local

Windows (Scoop)

scoop install llmfit

Docker / Podman

docker run ghcr.io/alexsjones/llmfit

# With jq for scripting
podman run ghcr.io/alexsjones/llmfit recommend --use-case coding | jq '.models[].name'

From source (Rust)

git clone https://github.com/AlexsJones/llmfit.git
cd llmfit
cargo build --release
# binary at target/release/llmfit

---

Core Concepts

Fit tiers: perfect (runs great), good (runs well), marginal (runs but tight), too_tight (won't run)
Scoring dimensions: quality, speed (tok/s estimate), fit (memory headroom), context capacity
Run modes: GPU, CPU+GPU offload, CPU-only, MoE
Quantization: automatically selects best quant (e.g. Q4_K_M, Q5_K_S, mlx-4bit) for your hardware
Providers: Ollama, llama.cpp, MLX, Docker Model Runner

---

Key Commands

Launch Interactive TUI

llmfit

CLI Table Output

llmfit --cli

Show System Hardware Detection

llmfit system
llmfit --json system   # JSON output

List All Models

llmfit list

Search Models

llmfit search "llama 8b"
llmfit search "mistral"
llmfit search "qwen coding"

Fit Analysis

# All runnable models ranked by fit
llmfit fit

# Only perfect fits, top 5
llmfit fit --perfect -n 5

# JSON output
llmfit --json fit -n 10

Model Detail

llmfit info "Mistral-7B"
llmfit info "Llama-3.1-70B"

Recommendations

# Top 5 recommendations (JSON default)
llmfit recommend --json --limit 5

# Filter by use case: general, coding, reasoning, chat, multimodal, embedding
llmfit recommend --json --use-case coding --limit 3
llmfit recommend --json --use-case reasoning --limit 5

Hardware Planning (invert: what hardware do I need?)

llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --quant mlx-4bit
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --target-tps 25 --json
llmfit plan "Qwen/Qwen2.5-Coder-0.5B-Instruct" --context 8192 --json

REST API Server (for cluster scheduling)

llmfit serve
llmfit serve --host 0.0.0.0 --port 8787

---

Hardware Overrides

When autodetection fails (VMs, broken nvidia-smi, passthrough setups):

# Override GPU VRAM
llmfit --memory=32G
llmfit --memory=24G --cli
llmfit --memory=24G fit --perfect -n 5
llmfit --memory=24G recommend --json

# Megabytes
llmfit --memory=32000M

# Works with any subcommand
llmfit --memory=16G info "Llama-3.1-70B"

Accepted suffixes: G/GB/GiB, M/MB/MiB, T/TB/TiB (case-insensitive).

Context Length Cap

# Estimate memory fit at 4K context
llmfit --max-context 4096 --cli

# With subcommands
llmfit --max-context 8192 fit --perfect -n 5
llmfit --max-context 16384 recommend --json --limit 5

# Environment variable alternative
export OLLAMA_CONTEXT_LENGTH=8192
llmfit recommend --json

---

REST API Reference

Start the server:

llmfit serve --host 0.0.0.0 --port 8787

Endpoints

# Health check
curl http://localhost:8787/health

# Node hardware info
curl http://localhost:8787/api/v1/system

# Full model list with filters
curl "http://localhost:8787/api/v1/models?min_fit=marginal&runtime=llamacpp&sort=score&limit=20"

# Top runnable models for this node (key scheduling endpoint)
curl "http://localhost:8787/api/v1/models/top?limit=5&min_fit=good&use_case=coding"

# Search by model name/provider
curl "http://localhost:8787/api/v1/models/Mistral?runtime=any"

Query Parameters for `/models` and `/models/top`

Param	Values	Description
`limit` / `n`	integer	Max rows returned
`min_fit`	`perfect\	good\
`perfect`	`true\	false`
`runtime`	`any\	mlx\
`use_case`	`general\	coding\
`provider`	string	Substring match on provider
`search`	string	Free-text across name/provider/size/use-case
`sort`	`score\	tps\
`include_too_tight`	`true\	false`
`max_context`	integer	Per-request context cap

---

Scripting & Automation Examples

Bash: Get top coding models as JSON

#!/bin/bash
# Get top 3 coding models that fit perfectly
llmfit recommend --json --use-case coding --limit 3 | \
  jq -r '.models[] | "\(.name) (\(.score)) - \(.quantization)"'

Bash: Check if a specific model fits

#!/bin/bash
MODEL="Mistral-7B"
RESULT=$(llmfit info "$MODEL" --json 2>/dev/null)
FIT=$(echo "$RESULT" | jq -r '.fit')
if [[ "$FIT" == "perfect" || "$FIT" == "good" ]]; then
  echo "$MODEL will run well (fit: $FIT)"
else
  echo "$MODEL may not run well (fit: $FIT)"
fi

Bash: Auto-pull top Ollama model

#!/bin/bash
# Get the top fitting model name and pull it with Ollama
TOP_MODEL=$(llmfit recommend --json --limit 1 | jq -r '.models[0].name')
echo "Pulling: $TOP_MODEL"
ollama pull "$TOP_MODEL"

Python: Query the REST API

import requests

BASE_URL = "http://localhost:8787"

def get_system_info():
    resp = requests.get(f"{BASE_URL}/api/v1/system")
    return resp.json()

def get_top_models(use_case="coding", limit=5, min_fit="good"):
    params = {
        "use_case": use_case,
        "limit": limit,
        "min_fit": min_fit,
        "sort": "score"
    }
    resp = requests.get(f"{BASE_URL}/api/v1/models/top", params=params)
    return resp.json()

def search_models(query, runtime="any"):
    resp = requests.get(
        f"{BASE_URL}/api/v1/models/{query}",
        params={"runtime": runtime}
    )
    return resp.json()

# Example usage
system = get_system_info()
print(f"GPU: {system.get('gpu_name')} | VRAM: {system.get('vram_gb')}GB")

models = get_top_models(use_case="reasoning", limit=3)
for m in models.get("models", []):
    print(f"{m['name']}: score={m['score']}, fit={m['fit']}, quant={m['quantization']}")

Python: Hardware-aware model selector for agents

import subprocess
import json

def get_best_model_for_task(use_case: str, min_fit: str = "good") -> dict:
    """Use llmfit to select the best model for a given task."""
    result = subprocess.run(
        ["llmfit", "recommend", "--json", "--use-case", use_case, "--limit", "1"],
        capture_output=True,
        text=True
    )
    data = json.loads(result.stdout)
    models = data.get("models", [])
    return models[0] if models else None

def plan_hardware_requirements(model_name: str, context: int = 4096) -> dict:
    """Get hardware requirements for running a specific model."""
    result = subprocess.run(
        ["llmfit", "plan", model_name, "--context", str(context), "--json"],
        capture_output=True,
        text=True
    )
    return json.loads(result.stdout)

# Select best coding model
best = get_best_model_for_task("coding")
if best:
    print(f"Best coding model: {best['name']}")
    print(f"  Quantization: {best['quantization']}")
    print(f"  Estimated tok/s: {best['tps']}")
    print(f"  Memory usage: {best['mem_pct']}%")

# Plan hardware for a specific model
plan = plan_hardware_requirements("Qwen/Qwen3-4B-MLX-4bit", context=8192)
print(f"Min VRAM needed: {plan['hardware']['min_vram_gb']}GB")
print(f"Recommended VRAM: {plan['hardware']['recommended_vram_gb']}GB")

Docker Compose: Node scheduler pattern

version: "3.8"
services:
  llmfit-api:
    image: ghcr.io/alexsjones/llmfit
    command: serve --host 0.0.0.0 --port 8787
    ports:
      - "8787:8787"
    environment:
      - OLLAMA_CONTEXT_LENGTH=8192
    devices:
      - /dev/nvidia0:/dev/nvidia0  # pass GPU through

---

TUI Key Reference

Key	Action
`↑`/`↓` or `j`/`k`	Navigate models
`/`	Search (name, provider, params, use case)
`Esc`/`Enter`	Exit search
`Ctrl-U`	Clear search
`f`	Cycle fit filter: All → Runnable → Perfect → Good → Marginal
`a`	Cycle availability: All → GGUF Avail → Installed
`s`	Cycle sort: Score → Params → Mem% → Ctx → Date → Use Case
`t`	Cycle color theme (auto-saved)
`v`	Visual mode (multi-select for comparison)
`V`	Select mode (column-based filtering)
`p`	Plan mode (what hardware needed for this model?)
`P`	Provider filter popup
`U`	Use-case filter popup
`C`	Capability filter popup
`m`	Mark model for comparison
`c`	Compare view (marked vs selected)
`d`	Download model (via detected runtime)
`r`	Refresh installed models from runtimes
`Enter`	Toggle detail view
`g`/`G`	Jump to top/bottom
`q`	Quit

Themes

t cycles: Default → Dracula → Solarized → Nord → Monokai → Gruvbox Theme saved to ~/.config/llmfit/theme

---

GPU Detection Details

GPU Vendor	Detection Method
NVIDIA	`nvidia-smi` (multi-GPU, aggregates VRAM)
AMD	`rocm-smi`
Intel Arc	sysfs (discrete) / `lspci` (integrated)
Apple Silicon	`system_profiler` (unified memory = VRAM)
Ascend	`npu-smi`

---

Common Patterns

"What can I run on my 16GB M2 Mac?"

llmfit fit --perfect -n 10
# or interactively
llmfit
# press 'f' to filter to Perfect fit

"I have a 3090 (24GB VRAM), what coding models fit?"

llmfit recommend --json --use-case coding | jq '.models[]'
# or with manual override if detection fails
llmfit --memory=24G recommend --json --use-case coding

"Can Llama 70B run on my machine?"

llmfit info "Llama-3.1-70B"
# Plan what hardware you'd need
llmfit plan "Llama-3.1-70B" --context 4096 --json

"Show me only models already installed in Ollama"

llmfit
# press 'a' to cycle to Installed filter
# or
llmfit fit -n 20  # run, press 'i' in TUI for installed-first

"Script: find best model and start Ollama"

MODEL=$(llmfit recommend --json --limit 1 | jq -r '.models[0].name')
ollama serve &
ollama run "$MODEL"

"API: poll node capabilities for cluster scheduler"

# Check node, get top 3 good+ models for reasoning
curl -s "http://node1:8787/api/v1/models/top?limit=3&min_fit=good&use_case=reasoning" | \
  jq '.models[].name'

---

Troubleshooting

GPU not detected / wrong VRAM reported

# Verify detection
llmfit system

# Manual override
llmfit --memory=24G --cli

`nvidia-smi` not found but you have an NVIDIA GPU

# Install CUDA toolkit or nvidia-utils, then retry
# Or override manually:
llmfit --memory=8G fit --perfect

Models show as too_tight but you have enough RAM

# llmfit may be using context-inflated estimates; cap context
llmfit --max-context 2048 fit --perfect -n 10

REST API: test endpoints

# Spawn server and run validation suite
python3 scripts/test_api.py --spawn

# Test already-running server
python3 scripts/test_api.py --base-url http://127.0.0.1:8787

Apple Silicon: VRAM shows as system RAM (expected)

# This is correct — Apple Silicon uses unified memory
# llmfit accounts for this automatically
llmfit system  # should show backend: Metal

Context length environment variable

export OLLAMA_CONTEXT_LENGTH=4096
llmfit recommend --json  # uses 4096 as context cap

Related skills

Setup Matt Pocock SkillsScaffold the per-repo configuration that Matt Pocock’s engineering agent skills rely on so they understand the issue tracker, triage labels, and domain documentation la462k185k

Lark Skill MakerQuickly turn any Lark/Feishu OpenAPI call or multi-step workflow into a reusable agent skill with its own SKILL.md.379k15.8k

CavemanSlash token usage by roughly 75% while keeping every technical detail intact when working with Claude Code, Cursor or similar agents.378k92.5k

Lark AppsConnect Claude, Cursor or custom agents directly to Lark (Feishu) for messaging, document automation, approval workflows and enterprise data access.375k

Running Claude Code Via Litellm CopilotRun Claude Code at a fraction of the cost by routing requests through LiteLLM to the GitHub Copilot Chat API.270k72

Codex PetGenerate a complete Codex Pet spritesheet and metadata from one reference image without needing an OpenAI key or Codex Pro.246k8

How it compares

Pick llmfit-hardware-model-matcher when local inference fit matters; use cloud API skills when hardware limits are irrelevant.

FAQ

What does llmfit-hardware-model-matcher do?

Terminal tool that detects your hardware and recommends which LLM models will actually run well on your system

When should I use llmfit-hardware-model-matcher?

During idea discover work for ai & agent building.

Is llmfit-hardware-model-matcher safe to install?

Review the Security Audits panel on this listing before production use.

AI & Agent Buildingagents

About

Llmfit Hardware Model Matcher by the numbers

llmfit-hardware-model-matcher capabilities & compatibility

What llmfit-hardware-model-matcher says it does

Add your badge

What it does

Who is it for?

When should I use this skill?

What you get

By the numbers

Files

llmfit Hardware Model Matcher

Installation

macOS / Linux (Homebrew)

Quick install script

Windows (Scoop)

Docker / Podman

From source (Rust)

Core Concepts

Key Commands

Launch Interactive TUI

CLI Table Output

Show System Hardware Detection

List All Models

Search Models

Fit Analysis

Model Detail

Recommendations

Hardware Planning (invert: what hardware do I need?)

REST API Server (for cluster scheduling)

Hardware Overrides

Context Length Cap

REST API Reference

Endpoints

Query Parameters for /models and /models/top

Scripting & Automation Examples

Bash: Get top coding models as JSON

Bash: Check if a specific model fits

Bash: Auto-pull top Ollama model

Python: Query the REST API

Python: Hardware-aware model selector for agents

Docker Compose: Node scheduler pattern

TUI Key Reference

Themes

GPU Detection Details

Common Patterns

"What can I run on my 16GB M2 Mac?"

"I have a 3090 (24GB VRAM), what coding models fit?"

"Can Llama 70B run on my machine?"

"Show me only models already installed in Ollama"

"Script: find best model and start Ollama"

"API: poll node capabilities for cluster scheduler"

Troubleshooting

Related skills

How it compares

FAQ

What does llmfit-hardware-model-matcher do?

When should I use llmfit-hardware-model-matcher?

Is llmfit-hardware-model-matcher safe to install?

This week in AI coding

Query Parameters for `/models` and `/models/top`