
Domain ML
Apply Rust-specific ML domain constraints—memory, GPU, ONNX, batching—when designing inference services and data pipelines as a solo builder.
Overview
domain-ml is an agent skill most often used in Build (also Validate, Ship, Operate) that translates ML domain constraints into Rust backend design choices.
Install
npx skills add https://github.com/actionbook/rust-skills --skill domain-ml
What is this skill?
- Layer 3 domain constraints mapping ML rules to Rust design implications
- Covers zero-copy tensors, GPU batching, ONNX portability, and reproducibility
- References candle, tch-rs, burn, ndarray, tract, and polars-style data paths
- Traces constraints down to performance and concurrency modules in the Rust skills stack
- Marked user-invocable: false—reference discipline for agent context, not a standalone ritual
- Documented as Layer 3: Domain Constraints in the Rust skills stack
- Critical constraint table covers memory efficiency, GPU utilization, model portability, and reproducibility
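The reproducibility constraint named in the list above can be illustrated with a seeded generator. This is a minimal sketch under stated assumptions: `SplitMix64` and `init_weights` are hypothetical helpers written here to keep the example dependency-free, not part of the skill; a real project would more likely seed a generator from an established crate.

```rust
/// Minimal SplitMix64 generator (hypothetical helper for this sketch).
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    /// Advances the state and returns the next 64-bit value.
    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f32 in [0, 1), built from the top 24 bits.
    fn next_f32(&mut self) -> f32 {
        (self.next_u64() >> 40) as f32 / (1u32 << 24) as f32
    }
}

/// Initializes `n` weights in [-0.5, 0.5) deterministically from `seed`.
fn init_weights(seed: u64, n: usize) -> Vec<f32> {
    let mut rng = SplitMix64::new(seed);
    (0..n).map(|_| rng.next_f32() - 0.5).collect()
}
```

Calling `init_weights(42, 8)` twice yields identical vectors, so recording the seed next to the model version is enough to replay an initialization.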
Adoption & trust: 725 installs on skills.sh; 1.2k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You are implementing ML in Rust but your agent keeps copying huge tensors, ignoring GPU batching, or inventing non-portable model formats.
Who is it for?
Indie hackers building Rust inference APIs, edge ML tools, or data pipelines where memory bandwidth and ONNX deployment matter.
Skip if: Pure Python training workflows, beginners seeking step-by-step notebook tutorials, or frontend-only products with no Rust ML surface.
When should I use this skill?
Use when building ML/AI apps in Rust; keywords include machine learning, tensor, inference, neural network, ndarray, tch-rs, burn, candle, ONNX.
What do I get? / Deliverables
After the skill is in context, designs favor zero-copy data paths, batched GPU inference, ONNX portability, and reproducible numeric handling aligned with the Rust skills stack.
- Architecture decisions aligned with ML domain constraint table
- Explicit Rust implications for memory, GPU, ONNX, and reproducibility
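To make the zero-copy deliverable concrete, here is a small sketch using only the standard library. The `Matrix` type and `row_sums` function are hypothetical names introduced for illustration; crates such as ndarray provide richer view types, but the idea of borrowing slices out of one owned buffer is the same.

```rust
/// A row-major matrix that owns one flat buffer (hypothetical type for this sketch).
struct Matrix {
    data: Vec<f32>,
    cols: usize,
}

impl Matrix {
    fn new(data: Vec<f32>, cols: usize) -> Self {
        assert!(cols > 0 && data.len() % cols == 0, "data must fill whole rows");
        Self { data, cols }
    }

    /// Borrowed view of one row: no allocation, no copy.
    fn row(&self, i: usize) -> &[f32] {
        &self.data[i * self.cols..(i + 1) * self.cols]
    }

    /// Iterates over row views without copying the buffer.
    fn rows(&self) -> impl Iterator<Item = &[f32]> + '_ {
        self.data.chunks_exact(self.cols)
    }

    /// In-place scaling: mutates the buffer instead of building a new one.
    fn scale_in_place(&mut self, factor: f32) {
        for x in &mut self.data {
            *x *= factor;
        }
    }
}

/// Sums each row by reading through borrowed views.
fn row_sums(m: &Matrix) -> Vec<f32> {
    m.rows().map(|r| r.iter().sum::<f32>()).collect()
}
```

Because `row` returns a slice into `data`, its pointer equals the buffer's own pointer for row 0; a design that returned `Vec<f32>` per row would allocate and copy on every call.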
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
ML system design in Rust is anchored in Build backend work where models, tensors, and serving paths are implemented. Backend is the shelf for inference APIs, batched GPU paths, and ndarray/candle/tch-rs architecture choices—not frontend UI polish.
Where it fits
- Validate: Decide whether a Rust ONNX path is feasible before committing to a full inference microservice.
- Build: Structure batched candle inference with streaming polars features instead of per-request tensor clones.
- Ship: Load-test GPU batch sizes and memory bandwidth assumptions before launch.
- Operate: Tune production inference workers for determinism and model version pinning after a silent numeric drift incident.
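The batching and load-testing points above can be sketched without any ML crate. Everything here is an assumption made for illustration: `Batch`, `stack`, `run_model`, and `batch_predict` are hypothetical names, and `run_model` is a stand-in that returns each row's mean rather than a real forward pass. The sketch shows the collect-then-process shape and counts model calls, which is the number a batch-size load test would vary.

```rust
/// One stacked batch: `rows` inputs of `cols` features in a single contiguous buffer.
struct Batch {
    data: Vec<f32>,
    rows: usize,
    cols: usize,
}

/// Stacks equally sized inputs into one buffer; returns `None` on empty or ragged input.
fn stack(inputs: &[Vec<f32>]) -> Option<Batch> {
    let cols = inputs.first()?.len();
    if cols == 0 || inputs.iter().any(|v| v.len() != cols) {
        return None;
    }
    let mut data = Vec::with_capacity(inputs.len() * cols);
    for v in inputs {
        data.extend_from_slice(v);
    }
    Some(Batch { data, rows: inputs.len(), cols })
}

/// Stand-in for a model forward pass (hypothetical): one output per row, the row mean.
fn run_model(batch: &Batch) -> Vec<f32> {
    debug_assert_eq!(batch.data.len(), batch.rows * batch.cols);
    batch
        .data
        .chunks_exact(batch.cols)
        .map(|r| r.iter().sum::<f32>() / batch.cols as f32)
        .collect()
}

/// Collect-then-process: one model call per chunk of `batch_size` inputs.
/// Returns the outputs and the number of model calls made.
fn batch_predict(inputs: &[Vec<f32>], batch_size: usize) -> Option<(Vec<f32>, usize)> {
    let mut outputs = Vec::with_capacity(inputs.len());
    let mut calls = 0;
    // `chunks` panics on 0, so clamp the batch size to at least 1.
    for chunk in inputs.chunks(batch_size.max(1)) {
        let batch = stack(chunk)?;
        outputs.extend(run_model(&batch));
        calls += 1;
    }
    Some((outputs, calls))
}
```

With ten inputs and a batch size of four, this makes three model calls instead of ten; on a GPU backend each avoided call also avoids a kernel launch and a host-to-device transfer.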
How it compares
Domain constraint reference for Rust ML architecture—not a one-shot model training generator or a hosted inference MCP.
Common Questions / FAQ
Who is domain-ml for?
Solo builders and small teams writing Rust ML or inference services who want agents to respect tensor memory, GPU batching, and ONNX constraints by default.
When should I use domain-ml?
In Validate when scoping a Rust inference prototype; in Build when choosing candle/tch-rs/ndarray patterns; in Ship when perf-testing batches; in Operate when tuning production inference memory.
Is domain-ml safe to install?
It is documentation-style guidance marked `user-invocable: false`. Still, review the Security Audits panel on this page before bundling it into automated agent policies.
SKILL.md
# Machine Learning Domain

> **Layer 3: Domain Constraints**

## Domain Constraints → Design Implications

| Domain Rule | Design Constraint | Rust Implication |
|-------------|-------------------|------------------|
| Large data | Efficient memory | Zero-copy, streaming |
| GPU acceleration | CUDA/Metal support | candle, tch-rs |
| Model portability | Standard formats | ONNX |
| Batch processing | Throughput over latency | Batched inference |
| Numerical precision | Float handling | ndarray, careful f32/f64 |
| Reproducibility | Deterministic | Seeded random, versioning |

---

## Critical Constraints

### Memory Efficiency

```
RULE: Avoid copying large tensors
WHY: Memory bandwidth is the bottleneck
RUST: References, views, in-place ops
```

### GPU Utilization

```
RULE: Batch operations for GPU efficiency
WHY: Every kernel launch carries fixed GPU overhead
RUST: Batch sizes, async data loading
```

### Model Portability

```
RULE: Use standard model formats
WHY: Train in Python, deploy in Rust
RUST: ONNX via tract or candle
```

---

## Trace Down ↓

From constraints to design (Layer 2):

```
"Need efficient data pipelines"
    ↓ m10-performance: Streaming, batching
    ↓ polars: Lazy evaluation

"Need GPU inference"
    ↓ m07-concurrency: Async data loading
    ↓ candle/tch-rs: CUDA backend

"Need model loading"
    ↓ m12-lifecycle: Lazy init, caching
    ↓ tract: ONNX runtime
```

---

## Use Case → Framework

| Use Case | Recommended | Why |
|----------|-------------|-----|
| Inference only | tract (ONNX) | Lightweight, portable |
| Training + inference | candle, burn | Pure Rust, GPU |
| PyTorch models | tch-rs | Direct bindings |
| Data pipelines | polars | Fast, lazy eval |

## Key Crates

| Purpose | Crate |
|---------|-------|
| Tensors | ndarray |
| ONNX inference | tract |
| ML framework | candle, burn |
| PyTorch bindings | tch-rs |
| Data processing | polars |
| Embeddings | fastembed |

## Design Patterns

| Pattern | Purpose | Implementation |
|---------|---------|----------------|
| Model loading | Once, reuse | `OnceLock<Model>` |
| Batching | Throughput | Collect then process |
| Streaming | Large data | Iterator-based |
| GPU async | Parallelism | Data loading parallel to compute |

## Code Pattern: Inference Server

```rust
use std::sync::OnceLock;
use tract_onnx::prelude::*;

// Concrete type of an optimized, runnable tract model.
type Model = SimplePlan<TypedFact, Box<dyn TypedOp>, Graph<TypedFact, Box<dyn TypedOp>>>;

static MODEL: OnceLock<Model> = OnceLock::new();

// Loads the model on first use and reuses it for every later request.
fn get_model() -> &'static Model {
    MODEL.get_or_init(|| {
        tract_onnx::onnx()
            .model_for_path("model.onnx")
            .expect("failed to read model.onnx")
            .into_optimized()
            .expect("failed to optimize model")
            .into_runnable()
            .expect("failed to build runnable plan")
    })
}

async fn predict(input: Vec<f32>) -> anyhow::Result<Vec<f32>> {
    let model = get_model();
    // Move the Vec into a (1, n) tensor instead of copying it.
    let len = input.len();
    let tensor: Tensor = tract_ndarray::Array2::from_shape_vec((1, len), input)?.into();
    let result = model.run(tvec!(tensor.into()))?;
    Ok(result[0].to_array_view::<f32>()?.iter().copied().collect())
}
```

## Code Pattern: Batched Inference

```rust
// Reuses `get_model()` from the inference server pattern above.
// Assumes every input has the same length and that the ONNX model
// accepts a variable batch dimension.
async fn batch_predict(inputs: Vec<Vec<f32>>, batch_size: usize) -> anyhow::Result<Vec<Vec<f32>>> {
    let model = get_model();
    let mut results = Vec::with_capacity(inputs.len());
    // `chunks` panics on 0, so clamp the batch size to at least 1.
    for batch in inputs.chunks(batch_size.max(1)) {
        // Stack inputs into one (batch, features) tensor
        let features = batch[0].len();
        let flat: Vec<f32> = batch.iter().flatten().copied().collect();
        let tensor: Tensor =
            tract_ndarray::Array2::from_shape_vec((batch.len(), features), flat)?.into();
        // Run inference on the whole batch
        let output = model.run(tvec!(tensor.into()))?;
        // Unstack results: one output row per input
        let view = output[0].to_array_view::<f32>()?;
        results.extend(view.outer_iter().map(|row| row.iter().copied().collect::<Vec<f32>>()));
    }
    Ok(results)
}
```

---

## Common Mistakes

| Mistake | Domain Violation | Fix |
|---------|-----------------|-----|
| Clone tensors | Memory waste | Use views |
| Single inference | GPU underutilized | Batch processing |
|