
M10 Performance
Profile and tune Rust services so release builds stay fast without guessing at hot paths.
Overview
m10-performance is an agent skill most often used in Ship (also Build/backend) that teaches profiling-first Rust optimization with criterion, flamegraph, and allocation-aware patterns.
Install
npx skills add https://github.com/zhanghandong/rust-skills --skill m10-performance
What is this skill?
- Profiling-first stack with CPU, memory, benchmark, and cache commands in one section: flamegraph, cargo-instruments/heaptrack, valgrind cachegrind, and criterion via `cargo bench`
- Side-by-side criterion functions to compare parse_v1 vs parse_v2 on the same repeated input
- Avoid unnecessary allocations with `Cow<str>` when transforms may be no-ops
- Reuse `Vec` buffers with `clear()` inside loops instead of allocating per iteration
- Paired BAD/GOOD Rust snippets for common allocation-avoidance and buffer-reuse mistakes
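You can check the `Cow` and `clear()` patterns in the list above directly. This is a minimal sketch. `uppercase_cow`, the `process` step that copies bytes into the buffer, and the `reuse_buffer` driver are hypothetical names for illustration and are not taken from the skill itself.

```rust
use std::borrow::Cow;

/// Hypothetical helper: uppercases `s`, borrowing the input when
/// uppercasing would leave every character unchanged.
fn uppercase_cow(s: &str) -> Cow<'_, str> {
    // Compare each char with its own uppercase mapping. Unlike `is_uppercase`,
    // this also treats spaces, digits, and punctuation as no-ops.
    if s.chars().all(|c| c.to_uppercase().eq(std::iter::once(c))) {
        Cow::Borrowed(s)
    } else {
        Cow::Owned(s.to_uppercase())
    }
}

/// Hypothetical per-item work: appends the item's bytes to the caller's buffer.
fn process(buffer: &mut Vec<u8>, item: &str) {
    buffer.extend_from_slice(item.as_bytes());
}

/// Processes every item with one shared buffer and returns
/// (capacity before the loop, capacity after the loop).
fn reuse_buffer(items: &[&str]) -> (usize, usize) {
    let mut buffer = Vec::with_capacity(64);
    let before = buffer.capacity();
    for item in items {
        // clear() drops the contents but keeps the allocation for the next item.
        buffer.clear();
        process(&mut buffer, item);
    }
    (before, buffer.capacity())
}
```

With short items, the capacity before and after the loop matches. That shows the loop reused one allocation instead of creating a new one per iteration.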
Adoption & trust: 858 installs on skills.sh; 1.2k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You suspect your Rust code is slow or allocation-heavy but you are changing code without benchmarks or profiler evidence.
Who is it for?
Indie builders with an existing Rust binary or service who need a structured perf pass before release.
Skip if: Greenfield architecture choices with no code to profile yet, or teams that only need high-level Rust style advice without tooling.
When should I use this skill?
Optimizing Rust performance, setting up criterion or flamegraph, or applying allocation reuse patterns on hot paths.
What do I get? / Deliverables
You run targeted CPU/memory benchmarks and apply documented allocation reuse and `Cow` patterns so improvements are measurable before merge.
- Criterion benchmark functions comparing candidate implementations
- Profiler-backed list of hotspots and applied optimization patterns in code
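A criterion comparison only means something if both candidates return the same results. This is a minimal sketch built on assumed names. `parse_v1` and `parse_v2` are hypothetical comma-separated integer parsers: the first allocates a `String` per field, and the second parses borrowed slices. `first_mismatch` is a hypothetical check you run on the benchmark inputs before timing them.

```rust
use std::num::ParseIntError;

/// Hypothetical baseline: allocates an owned String per field before parsing.
fn parse_v1(input: &str) -> Result<Vec<i64>, ParseIntError> {
    let fields: Vec<String> = input
        .split(',')
        .map(|f| f.trim().to_string())
        .filter(|f| !f.is_empty())
        .collect();
    fields.iter().map(|f| f.parse::<i64>()).collect()
}

/// Hypothetical candidate: parses borrowed &str slices, no per-field String.
fn parse_v2(input: &str) -> Result<Vec<i64>, ParseIntError> {
    input
        .split(',')
        .map(str::trim)
        .filter(|f| !f.is_empty())
        .map(str::parse::<i64>)
        .collect()
}

/// Returns the first input on which the two candidates disagree, if any
/// (errors are compared too, so both must reject the same inputs).
fn first_mismatch<'a>(inputs: &[&'a str]) -> Option<&'a str> {
    inputs
        .iter()
        .copied()
        .find(|input| parse_v1(input) != parse_v2(input))
}
```

When `first_mismatch` returns `None` on representative inputs, you can put both functions into the criterion bench, and the faster one becomes a safe swap.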
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
The canonical shelf is Ship → perf because the guide centers on benchmarking, flamegraphs, and release profiling before you call performance good enough to ship. The perf subphase matches the criterion benches, cachegrind/heap tooling, and allocation patterns aimed at measurable latency and memory wins.
Where it fits
Compare two parser implementations with criterion before wiring the faster one into your API handler.
Run flamegraph on a release binary to find CPU hotspots before a production deploy.
Re-benchmark after an incident-driven fix to confirm latency is back within budget.
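For a quick local check after a fix, a rough std-only timing loop can compare the median of repeated calls against a latency budget. This is a sketch, not a replacement for criterion's statistical analysis. `median_latency`, `within_budget`, and any budget value you pass are hypothetical.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `f` `iters` times and returns the middle sample of
/// the sorted per-call timings, which one-off stalls skew less than the mean.
fn median_latency<F: FnMut()>(iters: usize, mut f: F) -> Duration {
    assert!(iters > 0, "need at least one sample");
    let mut samples: Vec<Duration> = Vec::with_capacity(iters);
    for _ in 0..iters {
        let start = Instant::now();
        f();
        samples.push(start.elapsed());
    }
    samples.sort_unstable();
    samples[samples.len() / 2]
}

/// Hypothetical check: true when the median latency of `f` is at or under `budget`.
fn within_budget<F: FnMut()>(iters: usize, budget: Duration, f: F) -> bool {
    median_latency(iters, f) <= budget
}
```

Inside the closure, wrap inputs and results in `std::hint::black_box` so the optimizer cannot drop the work being timed. Run the check against a release build.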
How it compares
Use as a measurement-backed Rust perf playbook instead of random `#[inline]` tweaks in chat.
Common Questions / FAQ
Who is m10-performance for?
Solo and indie developers shipping Rust CLIs, APIs, or backend services who want agent-guided profiling and optimization snippets.
When should I use m10-performance?
During Build when backend hot paths need review, and during Ship/perf when you are benching release builds, comparing implementations with criterion, or chasing allocation churn before launch.
Is m10-performance safe to install?
Review the Security Audits panel on this Prism page and inspect the skill bundle in your repo before letting an agent run `cargo install` or profiling commands on your machine.
SKILL.md
# Rust Performance Optimization Guide

## Profiling First

### Tools

```bash
# CPU profiling
cargo install flamegraph
cargo flamegraph --bin myapp

# Memory profiling
cargo install cargo-instruments   # macOS
heaptrack ./target/release/myapp  # Linux

# Benchmarking
cargo bench  # with criterion

# Cache analysis
valgrind --tool=cachegrind ./target/release/myapp
```

### Criterion Benchmarks

```rust
// benches/parse.rs: this bench target needs `harness = false` in Cargo.toml.
use std::hint::black_box;
use criterion::{criterion_group, criterion_main, Criterion};

// parse_v1 and parse_v2 are the candidate implementations under comparison.
fn benchmark_parse(c: &mut Criterion) {
    let input = "test data".repeat(1000);

    // black_box keeps the compiler from optimizing around the input.
    c.bench_function("parse_v1", |b| {
        b.iter(|| parse_v1(black_box(&input)))
    });
    c.bench_function("parse_v2", |b| {
        b.iter(|| parse_v2(black_box(&input)))
    });
}

criterion_group!(benches, benchmark_parse);
criterion_main!(benches);
```

---

## Common Optimizations

### 1. Avoid Unnecessary Allocations

```rust
// BAD: allocates on every call, even when the input is already uppercase
fn to_uppercase(s: &str) -> String {
    s.to_uppercase()
}

// GOOD: return Cow, allocate only if uppercasing changes something
use std::borrow::Cow;

fn to_uppercase(s: &str) -> Cow<'_, str> {
    // Borrow when every char already maps to itself. `c.is_uppercase()` alone
    // would needlessly allocate for spaces, digits, and punctuation.
    if s.chars().all(|c| c.to_uppercase().eq(std::iter::once(c))) {
        Cow::Borrowed(s)
    } else {
        Cow::Owned(s.to_uppercase())
    }
}
```

### 2. Reuse Allocations

```rust
// BAD: creates a new Vec (and regrows it) on each iteration
for item in items {
    let mut buffer = Vec::new();
    process(&mut buffer, item);
}

// GOOD: reuse buffer; clear() drops the contents but keeps the capacity
let mut buffer = Vec::new();
for item in items {
    buffer.clear();
    process(&mut buffer, item);
}
```

### 3. Use Appropriate Collections

| Need | Collection | Notes |
|------|------------|-------|
| Sequential access | `Vec<T>` | Best cache locality |
| Random access by key | `HashMap<K, V>` | O(1) average lookup |
| Ordered keys | `BTreeMap<K, V>` | O(log n) lookup |
| Small sets (<20) | `Vec<T>` + linear search | Lower overhead |
| FIFO queue | `VecDeque<T>` | Amortized O(1) push, O(1) pop at both ends |

### 4. Pre-allocate Capacity

```rust
// BAD: repeated reallocations as the Vec grows
let mut v = Vec::new();
for i in 0..10000 {
    v.push(i);
}

// GOOD: single allocation
let mut v = Vec::with_capacity(10000);
for i in 0..10000 {
    v.push(i);
}
```

---

## String Optimization

### Avoid String Concatenation in Loops

```rust
// BAD: format! builds a brand-new String each iteration and copies everything
// accumulated so far, so total copying grows as O(n²).
// (`String + &str` is not this bug: it appends into the left-hand buffer.)
let mut result = String::new();
for s in &strings {
    result = format!("{result}{s}");
}

// GOOD: O(n) with push_str, which appends in place
let mut result = String::new();
for s in &strings {
    result.push_str(s);
}

// BETTER: pre-calculate capacity
let total_len: usize = strings.iter().map(|s| s.len()).sum();
let mut result = String::with_capacity(total_len);
for s in &strings {
    result.push_str(s);
}

// BEST: use join for simple cases
let result = strings.join("");
```

### Use &str When Possible

```rust
// BAD: forces callers that only have a &str to allocate a String
fn greet(name: String) {
    println!("Hello, {}", name);
}

// GOOD: borrows, no allocation
fn greet(name: &str) {
    println!("Hello, {}", name);
}

// Works with both:
greet("world");                 // &str
greet(&String::from("world"));  // &String coerces to &str
```

---

## Iterator Optimization

### Use Iterators Over Indexing

```rust
// BAD: each vec[i] is bounds-checked unless the optimizer can prove it in range
let mut sum = 0;
for i in 0..vec.len() {
    sum += vec[i];
}

// GOOD: no bounds checking
let sum: i32 = vec.iter().sum();

// GOOD: when index needed
for (i, item) in vec.iter().enumerate() {
    // ...
}
```

### Lazy Evaluation

```rust
// Iterators are lazy - computation happens at collect
let result: Vec<_> = data
    .iter()
    .filter(|x| x.is_valid())
    .map(|x| x.process())
    .take(10) // stop after 10 items
    .collect();
```

### Avoid Collecting When Not Needed

```rust
// BAD: unnecessary intermediate allocation
let filtered: Vec<_> = items.iter().filter(|x| x.valid).collect();
let count = filtered.len();

// GOOD: no allocation
let count = items.iter().filter(|x| x.valid).count();
```

---

## Parallelism with Rayon

```rust
use rayon::prelude::*;

// Sequential (i64: this sum of squares overflows i32)
let sum: i64 = (0..1_000_000i64).map(|x| x * x).sum();

// Parallel (automatic