M10 Performance

Name: M10 Performance
Author: actionbook

actionbook/rust-skills

1.5k installs
1.3k repo stars
Updated May 24, 2026

m10-performance is an agent skill for Rust performance optimization using profiling, benchmarks, and measured design choices.

About

The m10-performance skill guides Rust performance optimization with a measure-first mindset using flamegraphs, perf, and criterion benchmarks to find real bottlenecks. It maps goals to design choices such as pre-allocation with with_capacity, contiguous Vec layouts, rayon parallelism, zero-copy Cow references, and smallvec for inline data. The skill asks whether optimization is worth added complexity and prioritizes algorithmic wins over cache or allocation tweaks. Thinking prompts cover measurement, priority ordering from algorithm through cache effects, and tradeoffs between memory, CPU, latency, and throughput. It traces decisions up to domain constraints and down to concrete Rust patterns. Triggers include performance, benchmark, profiling, flamegraph, criterion, SIMD, and allocation keywords. Use when developers profile Rust code and choose optimization strategies grounded in measurement.

Measure-first workflow with flamegraph, perf, and criterion before optimizing.
Decision table maps goals to pre-allocation, rayon, Cow, and smallvec patterns.
Prioritizes algorithmic gains over allocation and cache micro-optimizations.
Prompts for complexity versus speed and memory versus CPU tradeoffs.
Not user-invocable; triggered by performance and benchmark keywords.

M10 Performance by the numbers

1,494 all-time installs (skills.sh)
+52 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #7 of 129 Rust skills by installs in the Skillselion catalog
Security screen: LOW risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

At a glance

m10-performance capabilities & compatibility

Capabilities: profile-first bottleneck identification · allocation and cache optimization patterns · rayon parallelism guidance · zero-copy `Cow` reference patterns · complexity versus speed tradeoff prompts
Use cases: refactoring · testing · debugging

From the docs

What m10-performance says it does

What's the bottleneck, and is optimization worth it?

SKILL.md

Have you measured? (Don't guess)

SKILL.md

npx skills add https://github.com/actionbook/rust-skills --skill m10-performance


Security audit	3 / 3 scanners passed

How do I find and fix Rust performance bottlenecks without guessing or premature micro-optimization?

Optimize Rust performance using profiling, benchmarks, and allocation or cache-aware design choices before micro-optimizing.

Who is it for?

Developers optimizing Rust services or libraries who need benchmark-driven performance decisions.

Skip if: you work in non-Rust languages, or you are doing feature development with no performance measurement needs.

When should I use this skill?

User asks to optimize Rust performance, run criterion benchmarks, or analyze flamegraphs.

What you get

Profiled hotspots with chosen optimization patterns such as pre-allocation, parallelism, or zero-copy references.

Benchmark results
Bottleneck analysis
Targeted optimization plan

Files

SKILL.md · Markdown · GitHub ↗

Performance Optimization

Layer 2: Design Choices

Core Question

What's the bottleneck, and is optimization worth it?

Before optimizing:

Have you measured? (Don't guess)
What's the acceptable performance?
Will optimization add complexity?

---

Performance Decision → Implementation

Goal	Design Choice	Implementation
Reduce allocations	Pre-allocate, reuse	`with_capacity`, object pools
Improve cache	Contiguous data	`Vec`, `SmallVec`
Parallelize	Data parallelism	`rayon`, threads
Avoid copies	Zero-copy	References, `Cow<T>`
Reduce indirection	Inline data	`smallvec`, arrays

---

Thinking Prompt

Before optimizing:

1. Have you measured?

Profile first → flamegraph, perf
Benchmark → criterion, cargo bench
Identify actual hotspots

2. What's the priority?

Algorithm (10x-1000x improvement)
Data structure (2x-10x)
Allocation (2x-5x)
Cache (1.5x-3x)

3. What's the trade-off?

Complexity vs speed
Memory vs CPU
Latency vs throughput
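
The priority ordering above puts algorithm choice ahead of allocation and cache tuning. As a minimal sketch of why, here are two ways to detect duplicates in a slice. The function names are hypothetical and not part of the skill; the first does a quadratic number of comparisons, the second does one hash-set insertion per element.

```rust
use std::collections::HashSet;

/// Quadratic: compares every pair, O(n^2) comparisons in the worst case.
fn has_duplicates_quadratic(items: &[u64]) -> bool {
    for (i, a) in items.iter().enumerate() {
        // Only look at elements after position i to avoid re-checking pairs.
        for b in &items[i + 1..] {
            if a == b {
                return true;
            }
        }
    }
    false
}

/// Linear on average: one hash-set insertion per element.
fn has_duplicates_hashed(items: &[u64]) -> bool {
    let mut seen = HashSet::with_capacity(items.len());
    // `insert` returns false when the value was already present.
    items.iter().any(|x| !seen.insert(*x))
}
```

Both functions return the same answers; which one is faster for a given input size is still something to confirm with a benchmark, since for very small slices the quadratic loop can win.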

---

Trace Up ↑

To domain constraints (Layer 3):

"How fast does this need to be?"
    ↑ Ask: What's the performance SLA?
    ↑ Check: domain-* (latency requirements)
    ↑ Check: Business requirements (acceptable response time)

Question	Trace To	Ask
Latency requirements	domain-*	What's acceptable response time?
Throughput needs	domain-*	How many requests per second?
Memory constraints	domain-*	What's the memory budget?

---

Trace Down ↓

To implementation (Layer 1):

"Need to reduce allocations"
    ↓ m01-ownership: Use references, avoid clone
    ↓ m02-resource: Pre-allocate with_capacity

"Need to parallelize"
    ↓ m07-concurrency: Choose rayon or threads
    ↓ m07-concurrency: Consider async for I/O-bound

"Need cache efficiency"
    ↓ Data layout: Prefer Vec over HashMap when possible
    ↓ Access patterns: Sequential over random access

---

Quick Reference

Tool	Purpose
`cargo bench`	Micro-benchmarks
`criterion`	Statistical benchmarks
`perf` / `flamegraph`	CPU profiling
`heaptrack`	Allocation tracking
`valgrind` / `cachegrind`	Cache analysis

Optimization Priority

1. Algorithm choice     (10x - 1000x)
2. Data structure       (2x - 10x)
3. Allocation reduction (2x - 5x)
4. Cache optimization   (1.5x - 3x)
5. SIMD/Parallelism     (2x - 8x)

Common Techniques

Technique	When	How
Pre-allocation	Known size	`Vec::with_capacity(n)`
Avoid cloning	Hot paths	Use references or `Cow<T>`
Batch operations	Many small ops	Collect then process
SmallVec	Usually small	`smallvec::SmallVec<[T; N]>`
Inline buffers	Fixed-size data	Arrays over Vec

---

Common Mistakes

Mistake	Why Wrong	Better
Optimize without profiling	Wrong target	Profile first
Benchmark in debug mode	Meaningless	Always `--release`
Use LinkedList	Cache unfriendly	`Vec` or `VecDeque`
Hidden `.clone()`	Unnecessary allocs	Use references
Premature optimization	Wasted effort	Make it work first

---

Anti-Patterns

Anti-Pattern	Why Bad	Better
Clone to avoid lifetimes	Performance cost	Proper ownership
Box everything	Indirection cost	Stack when possible
HashMap for small sets	Overhead	Vec with linear search
String concat in loop	O(n^2) when each step builds a new String	`String::with_capacity` + `push_str`, or `concat` / `join`

---

Related Skills

When	See
Reducing clones	m01-ownership
Concurrency options	m07-concurrency
Smart pointer choice	m02-resource
Domain requirements	domain-*

Rust Performance Optimization Guide

Profiling First

Tools

# CPU profiling
cargo install flamegraph
cargo flamegraph --bin myapp

# Memory profiling
cargo install cargo-instruments  # macOS
heaptrack ./target/release/myapp  # Linux

# Benchmarking
cargo bench  # with criterion

# Cache analysis
valgrind --tool=cachegrind ./target/release/myapp

Criterion Benchmarks

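use std::hint::black_box;

use criterion::{criterion_group, criterion_main, Criterion};

// `parse_v1` and `parse_v2` are the two implementations being compared;
// bring them into scope from your own crate.
fn benchmark_parse(c: &mut Criterion) {
    let input = "test data".repeat(1000);

    c.bench_function("parse_v1", |b| {
        // black_box keeps the optimizer from specializing on a known input
        b.iter(|| parse_v1(black_box(&input)))
    });

    c.bench_function("parse_v2", |b| {
        b.iter(|| parse_v2(black_box(&input)))
    });
}

criterion_group!(benches, benchmark_parse);
criterion_main!(benches);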
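
When criterion is not set up yet, a rough timing loop built on the standard library can serve as a first sanity check. The sketch below is an assumption-laden stand-in, not a replacement for criterion: it does no warm-up, outlier rejection, or statistics. `mean_time` and `count_words` are hypothetical names introduced for illustration.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `iters` times and returns the mean wall-clock time per call.
fn mean_time<T>(iters: u32, mut f: impl FnMut() -> T) -> Duration {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iters {
        // black_box stops the optimizer from discarding the unused result.
        black_box(f());
    }
    start.elapsed() / iters
}

/// Hypothetical workload standing in for the code under test.
fn count_words(input: &str) -> usize {
    input.split_whitespace().count()
}
```

A call such as `mean_time(1_000, || count_words(&input))` gives a per-call estimate. Run it from a `--release` build, since debug-mode timings say little about optimized code.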

---

Common Optimizations

1. Avoid Unnecessary Allocations

// BAD: allocates on every call
fn to_uppercase(s: &str) -> String {
    s.to_uppercase()
}

// GOOD: return Cow, allocate only if needed
// (the check is conservative: digits and spaces are not "uppercase", so "ABC 1" still allocates)
use std::borrow::Cow;

fn to_uppercase(s: &str) -> Cow<'_, str> {
    if s.chars().all(|c| c.is_uppercase()) {
        Cow::Borrowed(s)
    } else {
        Cow::Owned(s.to_uppercase())
    }
}

2. Reuse Allocations

// BAD: creates new Vec each iteration
for item in items {
    let mut buffer = Vec::new();
    process(&mut buffer, item);
}

// GOOD: reuse buffer
let mut buffer = Vec::new();
for item in items {
    buffer.clear();
    process(&mut buffer, item);
}
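
The reuse pattern works because `clear` resets the length but keeps the allocation. A minimal self-contained sketch, assuming a hypothetical `process` that writes the decimal form of each item into the buffer:

```rust
use std::io::Write;

/// Hypothetical per-item work: writes the decimal digits of `item` into `buffer`.
fn process(buffer: &mut Vec<u8>, item: u32) {
    // Writing into a Vec<u8> cannot fail, so the unwrap is safe here.
    write!(buffer, "{}", item).unwrap();
}

/// Processes all items with one reused buffer and returns the total bytes produced.
fn total_bytes(items: &[u32]) -> usize {
    let mut buffer = Vec::new();
    let mut total = 0;
    for &item in items {
        buffer.clear(); // length becomes 0, capacity is kept
        process(&mut buffer, item);
        total += buffer.len();
    }
    total
}
```

After the first few iterations the buffer has grown to fit the largest item seen so far, and later iterations write into it without allocating.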

3. Use Appropriate Collections

Need	Collection	Notes
Sequential access	`Vec<T>`	Best cache locality
Random access by key	`HashMap<K, V>`	O(1) average lookup
Ordered keys	`BTreeMap<K, V>`	O(log n) lookup
Small sets (<20)	`Vec<T>` + linear search	Lower overhead
FIFO queue	`VecDeque<T>`	Amortized O(1) push/pop at both ends
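
The "small sets" row can be sketched as a tiny association list. `SmallMap` is a hypothetical type written for illustration, not something the skill or the standard library provides; it trades hashing for a linear scan over contiguous memory.

```rust
/// A tiny association list: linear search over a contiguous Vec.
struct SmallMap<K, V> {
    entries: Vec<(K, V)>,
}

impl<K: PartialEq, V> SmallMap<K, V> {
    fn new() -> Self {
        Self { entries: Vec::new() }
    }

    /// Inserts or replaces a value; returns the previous value if the key existed.
    fn insert(&mut self, key: K, value: V) -> Option<V> {
        for (k, v) in &mut self.entries {
            if *k == key {
                return Some(std::mem::replace(v, value));
            }
        }
        self.entries.push((key, value));
        None
    }

    /// Looks a key up with a linear scan.
    fn get(&self, key: &K) -> Option<&V> {
        self.entries.iter().find(|(k, _)| k == key).map(|(_, v)| v)
    }
}
```

Where the crossover with `HashMap` lies depends on key type and access pattern, so the "<20" threshold in the table is best treated as a starting point to benchmark.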

4. Pre-allocate Capacity

// BAD: many reallocations
let mut v = Vec::new();
for i in 0..10000 {
    v.push(i);
}

// GOOD: single allocation
let mut v = Vec::with_capacity(10000);
for i in 0..10000 {
    v.push(i);
}
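
One way to see the difference is to count how often the capacity changes while pushing, which serves as a proxy for reallocations. The helper name below is hypothetical.

```rust
/// Pushes `0..n` into `v` and counts how often the capacity changed,
/// a proxy for how many times the Vec had to reallocate.
fn count_capacity_changes(mut v: Vec<u32>, n: u32) -> usize {
    let mut changes = 0;
    let mut last = v.capacity();
    for i in 0..n {
        v.push(i);
        if v.capacity() != last {
            changes += 1;
            last = v.capacity();
        }
    }
    changes
}
```

Starting from `Vec::with_capacity(10_000)` the count stays at zero for 10,000 pushes, while starting from `Vec::new()` the capacity grows several times along the way.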

---

String Optimization

Avoid String Concatenation in Loops

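// BAD: format! builds a new String every iteration and copies everything so far (O(n²) bytes copied)
let mut result = String::new();
for s in strings {
    result = format!("{}{}", result, s);
}

// GOOD: appends in place, amortized O(n); `result = result + &s` behaves the same way
let mut result = String::new();
for s in strings {
    result.push_str(&s);
}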

// BETTER: pre-calculate capacity
let total_len: usize = strings.iter().map(|s| s.len()).sum();
let mut result = String::with_capacity(total_len);
for s in strings {
    result.push_str(&s);
}

// BEST: use join for simple cases
let result = strings.join("");

Use &str When Possible

// BAD: takes ownership, so a caller holding a &str must allocate a String
fn greet(name: String) {
    println!("Hello, {}", name);
}

// GOOD: borrows, no allocation
fn greet(name: &str) {
    println!("Hello, {}", name);
}

// Works with both:
greet("world");                    // &str
greet(&String::from("world"));     // &String coerces to &str

---

Iterator Optimization

Use Iterators Over Indexing

// BAD: indexing may incur a bounds check per access (the optimizer often, but not always, removes it)
let mut sum = 0;
for i in 0..vec.len() {
    sum += vec[i];
}

// GOOD: no index, so no bounds check to remove
let sum: i32 = vec.iter().sum();

// GOOD: when index needed
for (i, item) in vec.iter().enumerate() {
    // ...
}
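
The same idea extends to walking two slices together. In this sketch (function names are hypothetical), `zip` replaces paired indexing, so there is no index that could go out of range:

```rust
/// Indexed version: each `a[i]` / `b[i]` may carry a bounds check,
/// and the function panics if `b` is shorter than `a`.
fn dot_indexed(a: &[f64], b: &[f64]) -> f64 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum += a[i] * b[i];
    }
    sum
}

/// Iterator version: `zip` stops at the shorter slice, so no index
/// is involved and mismatched lengths cannot panic.
fn dot_zipped(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

Note the behavioral difference on mismatched lengths: the zipped version silently truncates, which may or may not be what the caller wants.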

Lazy Evaluation

// Iterators are lazy - computation happens at collect
let result: Vec<_> = data
    .iter()
    .filter(|x| x.is_valid())
    .map(|x| x.process())
    .take(10)  // stop after 10 items
    .collect();
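
To make the laziness visible, this sketch counts how many elements the `filter` step actually inspects before `take` cuts the pipeline short. The function name and the counter are illustrative additions.

```rust
use std::cell::Cell;

/// Returns the squares of the first `n` even numbers in `data`, plus how many
/// elements the `filter` step inspected to produce them.
fn first_even_squares(data: &[u32], n: usize) -> (Vec<u32>, usize) {
    let inspected = Cell::new(0usize);
    let result: Vec<u32> = data
        .iter()
        .filter(|x| {
            inspected.set(inspected.get() + 1);
            **x % 2 == 0
        })
        .map(|x| x * x)
        .take(n) // once n items are produced, nothing upstream runs again
        .collect();
    (result, inspected.get())
}
```

With the numbers 1 through 100 as input and `n = 3`, only the first six elements are inspected, not all one hundred.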

Avoid Collecting When Not Needed

// BAD: unnecessary intermediate allocation
let filtered: Vec<_> = items.iter().filter(|x| x.valid).collect();
let count = filtered.len();

// GOOD: no allocation
let count = items.iter().filter(|x| x.valid).count();

---

Parallelism with Rayon

use rayon::prelude::*;

// Sequential (u64: the sum of squares up to 1,000,000 overflows i32)
let sum: u64 = (0..1_000_000u64).map(|x| x * x).sum();

// Parallel (automatic work stealing)
let sum: u64 = (0..1_000_000u64).into_par_iter().map(|x| x * x).sum();

// Parallel with custom chunk size
let results: Vec<_> = data
    .par_chunks(1000)
    .map(|chunk| process_chunk(chunk))
    .collect();
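
For comparison, the "threads" option from the decision table can be sketched with the standard library alone. This is not how rayon works internally (there is no work stealing here); it is a plain fork-join over fixed chunks using `std::thread::scope`, with a hypothetical function name.

```rust
use std::thread;

/// Sums the squares of `data` by splitting it into roughly equal chunks,
/// one scoped thread per chunk. `workers` is clamped to at least 1.
fn parallel_sum_of_squares(data: &[u64], workers: usize) -> u64 {
    if data.is_empty() {
        return 0;
    }
    let workers = workers.max(1);
    // Ceiling division so every element lands in some chunk.
    let chunk_len = (data.len() + workers - 1) / workers;
    thread::scope(|scope| {
        // Collect the handles first so all threads start before any join.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().map(|x| x * x).sum::<u64>()))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum::<u64>()
    })
}
```

Spawning a thread per call has a fixed cost, so this only pays off for large inputs. A thread pool such as rayon's amortizes that cost, which is one reason to measure both before choosing.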

---

Memory Layout

Use Appropriate Integer Sizes

// If values are small, use smaller types
struct Item {
    count: u8,      // 0-255, not u64
    flags: u8,      // small enum
    id: u32,        // if 4 billion is enough
}

Pack Structs Efficiently

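// Rust's default layout may reorder fields to reduce padding, so declaration
// order matters mainly for #[repr(C)] types (FFI, on-disk formats).

// BAD: 24 bytes due to padding
#[repr(C)]
struct Bad {
    a: u8,   // 1 byte + 7 padding
    b: u64,  // 8 bytes
    c: u8,   // 1 byte + 7 padding
}

// GOOD: 16 bytes. Avoid #[repr(packed)] unless required: it removes padding
// but makes field access unaligned.
#[repr(C)]
struct Good {
    b: u64,  // 8 bytes
    a: u8,   // 1 byte
    c: u8,   // 1 byte + 6 padding
}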
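
Layout claims are easy to check with `std::mem::size_of`. The sketch below uses hypothetical struct names and compares a declaration-ordered `#[repr(C)]` struct, a hand-ordered one, and the same fields under Rust's default representation, which the compiler is free to reorder.

```rust
use std::mem::size_of;

/// `#[repr(C)]` keeps declaration order, so padding follows each `u8`.
#[allow(dead_code)]
#[repr(C)]
struct PaddedC {
    a: u8,
    b: u64,
    c: u8,
}

/// Same fields, largest first: the two `u8`s share one padded tail.
#[allow(dead_code)]
#[repr(C)]
struct OrderedC {
    b: u64,
    a: u8,
    c: u8,
}

/// Default `repr(Rust)`: the compiler may reorder the fields itself.
#[allow(dead_code)]
struct DefaultRepr {
    a: u8,
    b: u64,
    c: u8,
}

/// Returns the sizes in bytes of (PaddedC, OrderedC, DefaultRepr).
fn layout_sizes() -> (usize, usize, usize) {
    (size_of::<PaddedC>(), size_of::<OrderedC>(), size_of::<DefaultRepr>())
}
```

On a typical 64-bit target this tends to report 24, 16, and 16 bytes, but the exact numbers depend on the target's alignment rules, so treat them as something to verify rather than assume.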

Box Large Values

// Large enum variants waste space
enum Message {
    Quit,
    Data([u8; 10000]),  // all variants are 10000+ bytes
}

// Better: box the large variant
enum Message {
    Quit,
    Data(Box<[u8; 10000]>),  // variants are pointer-sized
}
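
The effect of boxing the large variant can be checked the same way. The enum names here are hypothetical, chosen so both versions can live in one file:

```rust
use std::mem::size_of;

/// Every value is as large as the biggest variant, even `Quit`.
#[allow(dead_code)]
enum InlineMessage {
    Quit,
    Data([u8; 10_000]),
}

/// The payload lives on the heap; the enum itself holds only a pointer.
#[allow(dead_code)]
enum BoxedMessage {
    Quit,
    Data(Box<[u8; 10_000]>),
}

/// Returns the sizes in bytes of (InlineMessage, BoxedMessage).
fn message_sizes() -> (usize, usize) {
    (size_of::<InlineMessage>(), size_of::<BoxedMessage>())
}
```

The tradeoff is an extra heap allocation and pointer hop whenever a `Data` value is created or read, which is worth it mainly when the small variants dominate or the enum is moved around often.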

---

Async Performance

Avoid Blocking in Async

use std::time::Duration;

// BAD: blocks the executor
async fn bad() {
    std::thread::sleep(Duration::from_secs(1));  // blocking!
    std::fs::read_to_string("file.txt").unwrap();  // blocking!
}

// GOOD: use async versions
async fn good() {
    tokio::time::sleep(Duration::from_secs(1)).await;
    tokio::fs::read_to_string("file.txt").await.unwrap();
}

// For CPU work: spawn_blocking
async fn compute() -> i32 {
    tokio::task::spawn_blocking(|| {
        heavy_computation()
    }).await.unwrap()
}

Buffer Async I/O

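use tokio::fs::File;
use tokio::io::{AsyncBufReadExt, AsyncReadExt, BufReader};

// BAD: many small reads (each one-byte read goes through the runtime's file I/O path)
async fn bad(mut file: File) {
    let mut byte = [0u8];
    while file.read(&mut byte).await.unwrap() > 0 {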
        process(byte[0]);
    }
}

// GOOD: buffered reading
async fn good(file: File) {
    let reader = BufReader::new(file);
    let mut lines = reader.lines();
    while let Some(line) = lines.next_line().await.unwrap() {
        process(&line);
    }
}

---

Release Build Optimization

Cargo.toml Settings

[profile.release]
lto = true           # Link-time optimization
codegen-units = 1    # Single codegen unit (slower compile, faster code)
panic = "abort"      # Smaller binary, no unwinding
strip = true         # Strip symbols (turn off while profiling: flamegraphs need symbol names)

[profile.release-fast]
inherits = "release"
opt-level = 3        # Maximum optimization (already the default for release)

[profile.release-small]
inherits = "release"
opt-level = "s"      # Optimize for size

Compile-Time Assertions

// Zero runtime cost
const _: () = assert!(std::mem::size_of::<MyStruct>() <= 64);

---

Checklist

Before optimizing:

[ ] Profile to find actual bottlenecks
[ ] Have benchmarks to measure improvement
[ ] Consider if optimization is worth complexity

Common wins:

[ ] Reduce allocations (Cow, reuse buffers)
[ ] Use appropriate collections
[ ] Pre-allocate with_capacity
[ ] Use iterators instead of indexing
[ ] Enable LTO for release builds
[ ] Use rayon for parallel workloads

Related skills

Rust Async Patterns: Generate correct, production-grade async Rust code using Tokio, async-trait, channels, streams and proper error handling. (16.3k · 38.3k)

Rust Best Practices: Get concise, high-signal Rust style guidance that an agent can apply directly to code suggestions and reviews. (14.5k · 100)

Tauri V2: Get expert guidance when creating cross-platform desktop and mobile apps using web technologies and Rust. (6.5k · 14)

Rust Testing: Generate comprehensive Rust tests following TDD with unit, integration, async, property-based, and mocked tests. (5.8k · 234k)

Rust Patterns: Generate and review idiomatic Rust code that follows ownership rules, error handling conventions, and concurrency best practices. (5.4k · 234k)

Rust Skills: Ensure every line of Rust they generate follows battle-tested idioms for ownership, error handling, performance and API design. (3.6k · 367)

Forks & variants (1)

M10 Performance has 1 known copy in the catalog, with 911 installs. It canonicalizes to this original listing.

zhanghandong - 911 installs

How it compares

Pick this over general Rust coding skills when the task is profiling-driven performance decisions rather than language syntax or API design.

FAQ

What does m10-performance recommend first?

Measure with profiling and benchmarks to identify actual bottlenecks before choosing optimization patterns.

When should I use m10-performance?

When optimizing Rust code for speed, allocations, cache behavior, or parallel execution with criterion or flamegraphs.

Is m10-performance safe to install?

Review the Security Audits panel on this page before installing in production.

Rust · backend · testing
