
Mac Code Local Ai Agent
Install and run mac-code so a local 35B-class model on Apple Silicon routes prompts to search, shell, and file tools without a paid cloud agent.
Overview
mac-code-local-ai-agent is a journey-wide agent skill that sets up a free local coding agent on Apple Silicon with llama.cpp or MLX and tool routing—usable whenever a solo builder needs on-device agent assistance before
Install
npx skills add https://github.com/aradotso/trending-skills --skill mac-code-local-ai-agentWhat is this skill?
- LLM-as-router classifies prompts into search, shell, or chat and dispatches matching tools
- Supports llama.cpp (reported ~30 tok/s on 35B MoE IQ2_M) and MLX (64K context, persistent KV cache, R2 sync)
- 35B MoE on 16 GB RAM via IQ2_M; optional MoE Expert Sniper path for full Q4 at lower tok/s
- Built-in tools: DuckDuckGo search, shell execution, and file read/write
- Documented triggers include out-of-RAM inference troubleshooting on Mac
- Documents 35B MoE near 30 tok/s via llama.cpp IQ2_M on 16 GB RAM
- MLX path advertises 64K context with persistent KV cache
- MoE Expert Sniper path cites ~1.54 tok/s with ~1.42 GB RAM for full Q4 35B
Adoption & trust: 751 installs on skills.sh; 31 GitHub stars; 0/3 security scanners passed (skills.sh audits).
What problem does it solve?
You want Claude Code–style agent workflows on a Mac without monthly API bills, but local 35B models, RAM limits, and backend choice feel opaque.
Who is it for?
Indie developers on M-series Macs with 16 GB+ who accept setup time in exchange for local inference and built-in tool routing.
Skip if: Teams needing guaranteed SLA, multi-user seat management, or non-Apple hardware without porting the stack themselves.
When should I use this skill?
User asks to set up mac-code, run a local LLM coding agent on Mac, use llama.cpp or MLX agents with tools, or fix out-of-RAM local inference.
What do I get? / Deliverables
You finish a working mac-code CLI that routes tasks through a local LLM to search, shell, and file tools at documented throughput and context targets for your RAM budget.
- Working mac-code CLI with chosen backend and model weights
- Configured tool routing for search, shell, and file operations
Recommended Skills
Journey fit
Useful at every journey phase - explore requirements and options before committing to a direction.
Where it fits
Install mac-code with llama.cpp IQ2_M so daily edits stay on a 35B MoE model inside 16 GB RAM.
Spike a feature branch using shell and file tools without sending proprietary code to a hosted agent.
Run DuckDuckGo-backed search locally while exploring competitor repos before you commit to a stack.
Triage small fixes on a laptop-only environment during incidents when cloud APIs are unavailable.
Use chat-only routing for read-only code review passes before opening a PR from the same local agent.
How it compares
Use for a local llama.cpp/MLX agent CLI instead of defaulting to hosted Claude Code or Cursor cloud for every coding session.
Common Questions / FAQ
Who is mac-code-local-ai-agent for?
Solo builders on Apple Silicon who want a zero-subscription coding agent with web search, shell, and file tools driven by local models.
When should I use mac-code-local-ai-agent?
Use in Build when wiring agent-tooling; in Validate when prototyping without API keys; in Operate when iterating on a laptop-only workflow; and in Idea when researching privately with local search tools.
Is mac-code-local-ai-agent safe to install?
Treat shell and filesystem tools as high risk—review the Security Audits panel on this page and constrain agent permissions before running on production repos or secrets.
SKILL.md
READMESKILL.md - Mac Code Local Ai Agent
# mac-code — Free Local AI Agent on Apple Silicon > Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection. Run a 35B reasoning model locally on your Mac for $0/month. mac-code is a CLI AI coding agent (Claude Code alternative) that routes tasks — web search, shell commands, file edits, chat — through a local LLM. Supports llama.cpp (30 tok/s) and MLX (64K context, persistent KV cache) backends on Apple Silicon. --- ## What It Does - **LLM-as-router**: The model classifies every prompt as `search`, `shell`, or `chat` and routes accordingly - **35B MoE at 30 tok/s** via llama.cpp + IQ2_M quantization (fits in 16 GB RAM) - **35B full Q4 on 16 GB** via custom MoE Expert Sniper (1.54 tok/s, only 1.42 GB RAM used) - **9B at 64K context** via quantized KV cache (`q4_0` keys/values) - **MLX backend** adds persistent KV cache save/load, context compression, R2 sync - **Tools**: DuckDuckGo search, shell execution, file read/write --- ## Installation ### Prerequisites ```bash brew install llama.cpp pip3 install rich ddgs huggingface-hub mlx-lm --break-system-packages ``` ### Clone the repo ```bash git clone https://github.com/walter-grace/mac-code cd mac-code ``` ### Download models **35B MoE — fast daily driver (10.6 GB, fits in 16 GB RAM):** ```bash mkdir -p ~/models python3 -c " from huggingface_hub import hf_hub_download hf_hub_download( 'unsloth/Qwen3.5-35B-A3B-GGUF', 'Qwen3.5-35B-A3B-UD-IQ2_M.gguf', local_dir='$HOME/models/' ) " ``` **9B — 64K context, long documents (5.3 GB):** ```bash python3 -c " from huggingface_hub import hf_hub_download hf_hub_download( 'unsloth/Qwen3.5-9B-GGUF', 'Qwen3.5-9B-Q4_K_M.gguf', local_dir='$HOME/models/' ) " ``` --- ## Starting the Backend ### Option A: llama.cpp + 35B MoE (recommended, 30 tok/s) ```bash llama-server \ --model ~/models/Qwen3.5-35B-A3B-UD-IQ2_M.gguf \ --port 8000 --host 127.0.0.1 \ --flash-attn on --ctx-size 12288 \ --cache-type-k q4_0 --cache-type-v q4_0 \ --n-gpu-layers 99 --reasoning off -np 1 -t 4 ``` ### Option B: llama.cpp + 9B (64K context) ```bash llama-server \ --model ~/models/Qwen3.5-9B-Q4_K_M.gguf \ --port 8000 --host 127.0.0.1 \ --flash-attn on --ctx-size 65536 \ --cache-type-k q4_0 --cache-type-v q4_0 \ --n-gpu-layers 99 --reasoning off -t 4 ``` ### Option C: MLX backend (persistent context, 9B) ```bash # Starts server on port 8000, downloads model on first run python3 mlx/mlx_engine.py ``` ### Start the agent (all options) ```bash python3 agent.py ``` --- ## Agent CLI Commands Inside the agent REPL, type `/` for all commands: | Command | Action | |---|---| | `/agent` | Agent mode with tools (default) | | `/raw` | Direct streaming, no tools | | `/model 9b` | Switch to 9B model (64K context) | | `/model 35b` | Switch to 35B MoE | | `/search <query>` | Quick DuckDuckGo search | | `/bench` | Run speed benchmark | | `/stats` | Session statistics | | `/cost` | Show cost savings vs cloud | | `/good` / `/bad` | Grade the last response | | `/improve` | View response grading stats | | `/clear` | Reset conversation | | `/quit` | Exit | ### Example prompts ``` > find all Python files modified in the last 7 days → routes to "shell", generates: find . -name "*.py" -mtime -7 > who won the NBA finals → routes to "search", queries DuckDuckGo, summarizes > explain how attention works → routes to "chat", streams directly ``` --- ## MLX Backend — Persistent KV Cache API The MLX engine exposes a REST API