Cpu Gpu Performance

Name: Cpu Gpu Performance
Author: athola

athola/claude-night-market

Establish CPU and GPU baselines and throttle plan before builds, training, or retries that pin hardware for more than a minute.

Install

npx skills add https://github.com/athola/claude-night-market --skill cpu-gpu-performance

What is this skill?

Five required TodoWrite items: baseline, scope, instrument, throttle, and log
Session-start discipline alongside token-conservation hub dependency
Step flow: establish baseline → narrow scope → instrument → throttle/sequence → log decisions
Explicit trigger: any build, train, or test likely to pin CPU/GPU over one minute
Progressive loading with alwaysApply false—invoked when resource risk appears

Adoption & trust: 1 installs on skills.sh; 304 GitHub stars; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).

Recommended Skills

Azure Kubernetesmicrosoft/azure-skills

Azure Kubernetes skill supplies fix patterns for AKS Automatic compatibility—adding resource requests, dropping capabili…204k installs·1.2k stars

Github Actions Docsxixu-me/skills

github-actions-docs is a research-oriented agent skill that stops generic CI/CD guesses when the user’s question is real…187k installs·61 stars

Deploy To Vercelvercel-labs/agent-skills

Deploy to Vercel is an agent skill from Vercel Labs that walks solo builders through shipping a project to Vercel with d…67.1k installs·27.7k stars

Vercel Cli With Tokensvercel-labs/agent-skills

Vercel CLI with Tokens is a Vercel Labs agent skill that teaches coding agents how to deploy and manage projects when in…47.2k installs·27.7k stars

Turborepovercel/turborepo

Turborepo Skill is procedural guidance for Vercel’s Turborepo build system on JavaScript and TypeScript monorepos. It ta…36k installs·30.5k stars

Docker Expertsickn33/antigravity-awesome-skills

Docker Expert is an agent skill that acts as a hands-on containerization consultant for solo and indie builders shipping…18.7k installs·40.1k stars

Journey fit

Primary fit

OperateInfrastructure & cost

Canonical shelf is Operate/infra because the skill governs live machine capacity, but it is invoked at session start and before heavy work in any phase. Infra subphase fits baseline measurement, throttling, and sequencing on shared dev boxes or cloud GPUs—not app feature code.

Common Questions / FAQ

Is Cpu Gpu Performance safe to install?

skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

SKILL.md

READMESKILL.md - Cpu Gpu Performance

## Table of Contents

- [When to Use](#when-to-use)
- [Required TodoWrite Items](#required-todowrite-items)
- [Step 1: Establish Current Baseline](#step-1-establish-current-baseline)
- [Step 2: Narrow the Scope](#step-2-narrow-the-scope)
- [Step 3: Instrument Before You Optimize](#step-3-instrument-before-you-optimize)
- [Step 4: Throttle and Sequence Work](#step-4-throttle-and-sequence-work)
- [Step 5: Log Decisions and Next Steps](#step-5-log-decisions-and-next-steps)
- [Output Expectations](#output-expectations)


# CPU/GPU Performance Discipline

## When To Use
- At the beginning of every session (auto-load alongside `token-conservation`).
- Whenever you plan to build, train, or test anything that could pin CPU cores
  or GPUs for more than a minute.
- Before retrying a failing command that previously consumed significant resources.

## When NOT To Use

- Simple operations with no resource impact
- Quick single-file operations

## Required TodoWrite Items
1. `cpu-gpu-performance:baseline`
2. `cpu-gpu-performance:scope`
3. `cpu-gpu-performance:instrument`
4. `cpu-gpu-performance:throttle`
5. `cpu-gpu-performance:log`

## Step 1: Establish Current Baseline
- Capture current utilization:
  - `uptime`
  - `ps -eo pcpu,cmd | head`
  - `nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv`

  Note which hosts/GPUs are already busy.
- Record any CI/cluster budgets (time quotas, GPU hours) before launching work.
- Set a per-task CPU minute / GPU minute budget that respects those limits.

## Step 2: Narrow the Scope
- Avoid running "whole world" jobs after a small fix. Prefer diff-based
  or tag-based selective testing:
  - `pytest -k`
  - Bazel target patterns
  - `cargo test <module>`
- Batch low-level fixes so you can validate multiple changes with a single targeted command.
- For GPU jobs, favor unit-scale smoke inputs or lower epoch counts before
  scheduling the full training/eval sweep.

## Step 3: Instrument Before You Optimize
- Pick the right profiler/monitor:
  - CPU work:
    - `perf`
    - `intel vtune`
    - `cargo flamegraph`
    - language-specific profilers
  - GPU work:
    - `nvidia-smi dmon`
    - `nsys`
    - `nvprof`
    - DLProf
    - framework timeline tracers
- Capture kernel/ops timelines, memory footprints, and data pipeline latency
  so you have evidence when throttling or parallelizing.
- Record hot paths and I/O bottlenecks in notes so future reruns can jump straight to the culprit.

## Step 4: Throttle and Sequence Work
- Use `nice`, `ionice`, or Kubernetes/Slurm quotas to prevent starvation of shared nodes.
- Chain heavy tasks with guardrails:
  - Rerun only the failed test/module
  - Then (optionally) escalate to the next-wider shard
  - Reserve the full suite for the final gate
- Stagger GPU kernels (smaller batch sizes or gradient accumulation) when memory
  pressure risks eviction; prefer checkpoint/restore over restarts.

## Step 5: Log Decisions and Next Steps

Conclude by documenting the commands that were run and their resource cost
(duration, CPU%, GPU%), confirming whether they remained within the per-task
budget. If a full suite or long training run was necessary, justify why selective
or staged approaches were not feasible. Capture any follow-up tasks, such as
adding a new test marker or profiling documentation, to simplify future sessions.

## Output Expectations
- Brief summary covering:
  - baseline metrics
  - scope chosen
  - instrumentation captured
  - throttling tactics
  - follow-up items
- Concrete example(s) of what ran (e.g.):
  - "reran `pytest tests/test_orders.py -k test_refund` instead of `pytest -m slow`"
  - "profiled `nvidia-smi dmon` ou

What is this skill?

Five required TodoWrite items: baseline, scope, instrument, throttle, and log

Session-start discipline alongside token-conservation hub dependency

Step flow: establish baseline → narrow scope → instrument → throttle/sequence → log decisions

Explicit trigger: any build, train, or test likely to pin CPU/GPU over one minute

Progressive loading with alwaysApply false—invoked when resource risk appears

Adoption & trust: 1 installs on skills.sh; 304 GitHub stars; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).

Journey fit

Primary fit

OperateInfrastructure & cost

SKILL.md

READMESKILL.md - Cpu Gpu Performance

## Table of Contents

- [When to Use](#when-to-use)
- [Required TodoWrite Items](#required-todowrite-items)
- [Step 1: Establish Current Baseline](#step-1-establish-current-baseline)
- [Step 2: Narrow the Scope](#step-2-narrow-the-scope)
- [Step 3: Instrument Before You Optimize](#step-3-instrument-before-you-optimize)
- [Step 4: Throttle and Sequence Work](#step-4-throttle-and-sequence-work)
- [Step 5: Log Decisions and Next Steps](#step-5-log-decisions-and-next-steps)
- [Output Expectations](#output-expectations)


# CPU/GPU Performance Discipline

## When To Use
- At the beginning of every session (auto-load alongside `token-conservation`).
- Whenever you plan to build, train, or test anything that could pin CPU cores
  or GPUs for more than a minute.
- Before retrying a failing command that previously consumed significant resources.

## When NOT To Use

- Simple operations with no resource impact
- Quick single-file operations

## Required TodoWrite Items
1. `cpu-gpu-performance:baseline`
2. `cpu-gpu-performance:scope`
3. `cpu-gpu-performance:instrument`
4. `cpu-gpu-performance:throttle`
5. `cpu-gpu-performance:log`

## Step 1: Establish Current Baseline
- Capture current utilization:
  - `uptime`
  - `ps -eo pcpu,cmd | head`
  - `nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv`

  Note which hosts/GPUs are already busy.
- Record any CI/cluster budgets (time quotas, GPU hours) before launching work.
- Set a per-task CPU minute / GPU minute budget that respects those limits.

## Step 2: Narrow the Scope
- Avoid running "whole world" jobs after a small fix. Prefer diff-based
  or tag-based selective testing:
  - `pytest -k`
  - Bazel target patterns
  - `cargo test <module>`
- Batch low-level fixes so you can validate multiple changes with a single targeted command.
- For GPU jobs, favor unit-scale smoke inputs or lower epoch counts before
  scheduling the full training/eval sweep.

## Step 3: Instrument Before You Optimize
- Pick the right profiler/monitor:
  - CPU work:
    - `perf`
    - `intel vtune`
    - `cargo flamegraph`
    - language-specific profilers
  - GPU work:
    - `nvidia-smi dmon`
    - `nsys`
    - `nvprof`
    - DLProf
    - framework timeline tracers
- Capture kernel/ops timelines, memory footprints, and data pipeline latency
  so you have evidence when throttling or parallelizing.
- Record hot paths and I/O bottlenecks in notes so future reruns can jump straight to the culprit.

## Step 4: Throttle and Sequence Work
- Use `nice`, `ionice`, or Kubernetes/Slurm quotas to prevent starvation of shared nodes.
- Chain heavy tasks with guardrails:
  - Rerun only the failed test/module
  - Then (optionally) escalate to the next-wider shard
  - Reserve the full suite for the final gate
- Stagger GPU kernels (smaller batch sizes or gradient accumulation) when memory
  pressure risks eviction; prefer checkpoint/restore over restarts.

## Step 5: Log Decisions and Next Steps

Conclude by documenting the commands that were run and their resource cost
(duration, CPU%, GPU%), confirming whether they remained within the per-task
budget. If a full suite or long training run was necessary, justify why selective
or staged approaches were not feasible. Capture any follow-up tasks, such as
adding a new test marker or profiling documentation, to simplify future sessions.

## Output Expectations
- Brief summary covering:
  - baseline metrics
  - scope chosen
  - instrumentation captured
  - throttling tactics
  - follow-up items
- Concrete example(s) of what ran (e.g.):
  - "reran `pytest tests/test_orders.py -k test_refund` instead of `pytest -m slow`"
  - "profiled `nvidia-smi dmon` ou

Install

What is this skill?

Recommended Skills

Journey fit

Is Cpu Gpu Performance safe to install?

SKILL.md

This week for builders

Install

What is this skill?

Recommended Skills

Journey fit

Is Cpu Gpu Performance safe to install?

SKILL.md