
Ako4all
Run a repeatable agentic loop that profiles, correctness-checks, and iteratively speeds up a CUDA/Triton/TileLang GPU kernel against a PyTorch reference.
Install
npx skills add https://github.com/TongmingLAIC/AKO4ALL --skill SKILL.md
What is this skill?
- Drives an agentic optimization loop aimed at maximum GPU kernel speedup
- Supports CUDA, Triton, TileLang, C++, and Python kernel entry points
- Handles workspace bootstrap, ncu profiling, correctness checking, and git commits per iteration
- Benchmarks optimized kernels against a PyTorch reference implementation
- Responds to AKO / AKO4ALL / AKO4X and “make this kernel faster” style requests
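The loop described in the bullets above (check correctness against a reference, benchmark, keep the change only if it is both correct and faster, log every iteration) can be sketched roughly as follows. This is a minimal, framework-agnostic illustration and not the skill's actual implementation: plain Python functions stand in for the GPU kernel and the PyTorch reference, and every name here (`reference`, `candidate_wrong`, `candidate_ok`, `is_correct`, `median_seconds`, `optimize`) is hypothetical. Profiling with ncu and per-iteration git commits are left out.

```python
import statistics
import time


def reference(xs):
    """Stand-in for the PyTorch reference (assumption): elementwise x*x + 1."""
    return [x * x + 1.0 for x in xs]


def candidate_wrong(xs):
    """Hypothetical candidate that drops the +1, so it must be rejected."""
    return [x * x for x in xs]


def candidate_ok(xs):
    """Hypothetical candidate that matches the reference."""
    return [x * x + 1.0 for x in xs]


def is_correct(out, expected, rtol=1e-5, atol=1e-8):
    """Elementwise closeness check: |out - expected| <= atol + rtol * |expected|."""
    return len(out) == len(expected) and all(
        abs(a - b) <= atol + rtol * abs(b) for a, b in zip(out, expected)
    )


def median_seconds(fn, xs, warmup=2, repeats=5):
    """Median wall-clock time of fn(xs) after a few warmup calls."""
    for _ in range(warmup):
        fn(xs)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(xs)
        samples.append(time.perf_counter() - start)
    # Guard against a zero reading on very coarse timers.
    return max(statistics.median(samples), 1e-12)


def optimize(reference_fn, candidates, xs):
    """Try each (name, fn) candidate; accept it only if correct and faster."""
    expected = reference_fn(xs)
    baseline = median_seconds(reference_fn, xs)
    best_name, best_speedup = "reference", 1.0
    log = []
    for iteration, (name, fn) in enumerate(candidates, start=1):
        correct = is_correct(fn(xs), expected)
        # Incorrect candidates are never timed; their speedup is recorded as 0.
        speedup = baseline / median_seconds(fn, xs) if correct else 0.0
        accepted = correct and speedup > best_speedup
        if accepted:
            best_name, best_speedup = name, speedup
        log.append({
            "iteration": iteration,
            "name": name,
            "correct": correct,
            "speedup": speedup,
            "accepted": accepted,
        })
    return best_name, log


if __name__ == "__main__":
    data = [float(i) for i in range(1000)]
    best, history = optimize(
        reference,
        [("candidate_wrong", candidate_wrong), ("candidate_ok", candidate_ok)],
        data,
    )
    for entry in history:
        print(entry)
    print("best:", best)
```

Checking correctness before timing means an incorrect candidate can never be accepted, however fast it runs. On a real GPU the timing helper would also need device synchronization, which this sketch omits.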
Adoption & trust: 268 GitHub stars.
Journey fit
Primary fit
The canonical shelf is Ship, because the skill’s purpose is measurable speedup and benchmarking before you treat the kernel as production-ready. Perf is the right subphase, since it covers ncu profiling, iteration logging, and chasing speedup against a reference implementation.
SKILL.md
SKILL.md - Ako4all
Drive an agentic loop that iteratively optimizes a GPU kernel for maximum speedup. Use this skill whenever the user wants to optimize / speed up / benchmark a GPU kernel (CUDA, Triton, TileLang, C++, Python), mentions AKO / AKO4ALL / AKO4X / agentic kernel optimization, asks to "make this kernel faster", or has a kernel they want measured against a PyTorch reference. The skill handles setup, profiling (ncu), correctness checking, iteration logging, and git commits. Bootstraps a workspace in any directory the user points at.
# ako4all
{
  "name": "ako4all",
  "description": "Drive an agentic loop that iteratively optimizes a GPU kernel for maximum speedup. Use this skill whenever the user wants to optimize / speed up / benchmark a GPU kernel (CUDA, Triton, TileLang, C++, Python), mentions AKO / AKO4ALL / AKO4X / agentic kernel optimization, asks to \"make this kernel faster\", or has a kernel they want measured against a PyTorch reference. The skill handles setup, profiling (ncu), correctness checking, iteration logging, and git commits. Bootstraps a workspace in any directory the user points at."
}