Worker Benchmarks

Name: Worker Benchmarks
Author: ruvnet

ruvnet/ruflo

Measure agentic-flow worker latency, throughput, and concurrency against documented p95 targets before you scale agent workloads.

Install

npx skills add https://github.com/ruvnet/ruflo --skill worker-benchmarks

What is this skill?

Full suite via `npx agentic-flow workers benchmark` plus `--type` filters for trigger-detection, registry, agent-selecti
Per-type SLOs documented (e.g. trigger-detection p95 < 5ms, registry p95 < 10ms, agent-selection p95 < 1ms, cache p95 <
Collects latency histograms, throughput, registry CRUD breakdown, selection confidence scores, and cache hit/eviction st
Targets the agentic-flow worker system (12 trigger keywords in trigger-detection benchmark)
Capabilities: performance_testing, metrics_collection, optimization_recommendations

Adoption & trust: 635 installs on skills.sh; 58.5k GitHub stars; 3/3 security scanners passed (skills.sh audits).

Journey fit

Primary fit

Performance benchmark suites belong on the Ship shelf because solo builders run them to prove agent pipelines meet latency budgets before release. Perf is the canonical subphase for scripted benchmark types (trigger detection, registry, agent selection, cache, concurrent workers) with explicit p95 targets.

Common Questions / FAQ

Is Worker Benchmarks safe to install?

skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

SKILL.md

# Worker Benchmarks Skill

Run comprehensive performance benchmarks for the agentic-flow worker system.

## Quick Start

```bash
# Run full benchmark suite
npx agentic-flow workers benchmark

# Run specific benchmark
npx agentic-flow workers benchmark --type trigger-detection
npx agentic-flow workers benchmark --type registry
npx agentic-flow workers benchmark --type agent-selection
npx agentic-flow workers benchmark --type concurrent
```

## Benchmark Types

### 1. Trigger Detection (`trigger-detection`)
Tests keyword detection speed across 12 worker triggers.
- **Target**: p95 < 5ms
- **Iterations**: 1000
- **Metrics**: latency, throughput, histogram

### 2. Worker Registry (`registry`)
Tests CRUD operations on worker entries.
- **Target**: p95 < 10ms
- **Iterations**: 500 each of create, get, and update (1,500 operations total)
- **Metrics**: per-operation latency breakdown

### 3. Agent Selection (`agent-selection`)
Tests performance-based agent selection.
- **Target**: p95 < 1ms
- **Iterations**: 1000
- **Metrics**: selection confidence, agent scores

### 4. Model Cache (`cache`)
Tests model caching performance.
- **Target**: p95 < 0.5ms
- **Metrics**: hit rate, cache size, eviction stats

### 5. Concurrent Workers (`concurrent`)
Tests parallel worker creation and updates.
- **Target**: < 1000ms for 10 workers
- **Metrics**: per-worker latency, memory usage

### 6. Memory Key Generation (`memory-keys`)
Tests memory pattern key generation.
- **Target**: p95 < 0.1ms
- **Iterations**: 5000
- **Metrics**: unique patterns, throughput
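
Most of the targets above are p95 bounds, so it can help to see what such a check involves. The following is a minimal sketch, assuming the nearest-rank percentile definition; `percentile` and `meetsTarget` are hypothetical helper names and are not part of agentic-flow, whose own percentile method is not documented here.

```typescript
// Hypothetical helpers for illustration; not part of the agentic-flow API.

// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// A benchmark passes when its p95 latency is strictly below the target,
// matching the "p95 < N ms" wording of the targets.
function meetsTarget(samplesMs: number[], targetP95Ms: number): boolean {
  return percentile(samplesMs, 95) < targetP95Ms;
}

// 100 synthetic samples: 0.01ms, 0.02ms, ..., 1.00ms
const samples = Array.from({ length: 100 }, (_, i) => (i + 1) / 100);

console.log(percentile(samples, 95)); // 0.95
console.log(meetsTarget(samples, 1)); // true
console.log(meetsTarget(samples, 0.5)); // false
```

Nearest-rank is used here because it always returns an observed sample; interpolating definitions can give slightly different values on small sample sets.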

## Output Format

```
═══════════════════════════════════════════════════════════
📈 BENCHMARK RESULTS
═══════════════════════════════════════════════════════════

✅ Trigger Detection
   Operation: detect
   Count: 1,000
   Avg: 0.045ms | p95: 0.120ms (target: 5ms)
   Throughput: 22,222 ops/s
   Memory Δ: 0.12MB

✅ Worker Registry
   Operation: crud
   Count: 1,500
   Avg: 1.234ms | p95: 3.456ms (target: 10ms)
   Throughput: 810 ops/s
   Memory Δ: 2.34MB

───────────────────────────────────────────────────────────
📊 SUMMARY
───────────────────────────────────────────────────────────
Total Tests: 6
Passed: 6 | Failed: 0
Avg Latency: 0.567ms
Total Duration: 2345ms
Peak Memory: 8.90MB
═══════════════════════════════════════════════════════════
```
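
The throughput figures in the sample report are consistent with dividing 1000 ms by the average latency (1000 / 0.045 ≈ 22,222 and 1000 / 1.234 ≈ 810). The sketch below parses a latency line in the format shown and derives throughput that way. Both `parseLatencyLine` and `opsPerSecond` are hypothetical helpers, and whether the tool itself computes throughput like this is an assumption.

```typescript
// Hypothetical helpers for illustration; not part of the agentic-flow API.

interface ParsedLatency {
  avgMs: number;
  p95Ms: number;
  targetMs: number;
}

// Extracts the three numbers from a report line such as
// "Avg: 0.045ms | p95: 0.120ms (target: 5ms)". Returns null on no match.
function parseLatencyLine(line: string): ParsedLatency | null {
  const m = line.match(
    /Avg:\s*([\d.]+)ms\s*\|\s*p95:\s*([\d.]+)ms\s*\(target:\s*([\d.]+)ms\)/,
  );
  if (!m) return null;
  return { avgMs: Number(m[1]), p95Ms: Number(m[2]), targetMs: Number(m[3]) };
}

// Sequential throughput implied by the mean latency, rounded down.
function opsPerSecond(avgMs: number): number {
  return Math.floor(1000 / avgMs);
}

const parsed = parseLatencyLine("   Avg: 0.045ms | p95: 0.120ms (target: 5ms)");
if (parsed) {
  console.log(opsPerSecond(parsed.avgMs)); // 22222
  console.log(parsed.p95Ms < parsed.targetMs); // true
}
console.log(opsPerSecond(1.234)); // 810
```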

## Integration with Settings

Benchmark thresholds are configured in `.claude/settings.json`:

```json
{
  "performance": {
    "benchmarkThresholds": {
      "triggerDetection": { "p95Ms": 5 },
      "workerRegistry": { "p95Ms": 10 },
      "agentSelection": { "p95Ms": 1 },
      "memoryKeyGeneration": { "p95Ms": 0.1 },
      "concurrentWorkers": { "totalMs": 1000 }
    }
  }
}
```
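
As an illustration of how these thresholds might be applied, the sketch below compares measured values against a `benchmarkThresholds` object of the shape shown above. `failingBenchmarks` and the `measured` values are hypothetical (the 1200 ms concurrent figure is made up to show a failure), and this is not how agentic-flow itself is documented to evaluate thresholds.

```typescript
// Hypothetical helper for illustration; not part of the agentic-flow API.

type Limits = { p95Ms?: number; totalMs?: number };

// Returns "<benchmark>.<metric>" for every measured value that is not
// strictly below its configured threshold.
function failingBenchmarks(
  thresholds: Record<string, Limits>,
  measured: Record<string, Limits>,
): string[] {
  const failures: string[] = [];
  for (const [name, limit] of Object.entries(thresholds)) {
    const result = measured[name];
    if (!result) continue; // benchmark was not run; nothing to compare
    for (const metric of ["p95Ms", "totalMs"] as const) {
      const max = limit[metric];
      const actual = result[metric];
      if (max !== undefined && actual !== undefined && actual >= max) {
        failures.push(`${name}.${metric}`);
      }
    }
  }
  return failures;
}

// Same shape as the "benchmarkThresholds" block in the settings file.
const thresholds: Record<string, Limits> = {
  triggerDetection: { p95Ms: 5 },
  workerRegistry: { p95Ms: 10 },
  agentSelection: { p95Ms: 1 },
  memoryKeyGeneration: { p95Ms: 0.1 },
  concurrentWorkers: { totalMs: 1000 },
};

// Illustrative measurements only.
const measured: Record<string, Limits> = {
  triggerDetection: { p95Ms: 0.12 },
  workerRegistry: { p95Ms: 3.456 },
  concurrentWorkers: { totalMs: 1200 },
};

console.log(failingBenchmarks(thresholds, measured)); // [ 'concurrentWorkers.totalMs' ]
```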

## Programmatic Usage

```typescript
import { workerBenchmarks, runBenchmarks } from 'agentic-flow/workers/worker-benchmarks';

// Run full suite
const suite = await runBenchmarks();
console.log(suite.summary);

// Run individual benchmarks
const triggerResult = await workerBenchmarks.benchmarkTriggerDetection(1000);
const registryResult = await workerBenchmarks.benchmarkRegistryOperations(500);
```

## Performance Optimization Tips

1. **Model Cache**: Enable with `CLAUDE_FLOW_MODEL_CACHE_MB=512`
2. **Parallel Workers**: Enable with `CLAUDE_FLOW_WORKER_PARALLEL=true`
3. **Warning Suppression**: Enable with `CLAUDE_FLOW_SUPPRESS_WARNINGS=true`
4. **SQLite WAL Mode**: Enabled automatically to improve concurrent performance
