
CUDA Kernels
Write, optimize, and integrate custom CUDA kernels for GPU-accelerated inference or training when PyTorch defaults are too slow for production LLM or ML workloads.
npx skills add https://github.com/huggingface/kernels --skill cuda-kernels

| Installs | 139 |
|---|---|
| Repository | huggingface/kernels |