
Domain ML
Apply Rust-specific ML design rules—zero-copy tensors, GPU batching, and ONNX deployment—when shipping inference or training-adjacent services.
Overview
domain-ml is an agent skill most often used in Build (also Ship perf, Operate infra) that encodes Rust ML domain constraints for efficient tensors, GPU inference, and portable ONNX deployment.
Install
npx skills add https://github.com/zhanghandong/rust-skills --skill domain-ml
What is this skill?
- Domain-constraint table maps ML rules to Rust design choices (memory, GPU, ONNX)
- Critical rules: avoid tensor copies, batch for GPU efficiency, standard formats for Python→Rust deploy
- Trace-down links to performance, concurrency, and polars-style streaming patterns
- Covers ndarray, candle, tch-rs, tract ONNX, and reproducibility (seeded RNG, versioning)
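The "avoid tensor copies" rule can be shown without any ML crate. The sketch below is dependency-free, and `Tensor2`, `row`, `rows`, `scale_in_place`, and `row_sum` are hypothetical names standing in for what ndarray offers through borrowed views and in-place mapping. Reads hand out `&[f32]` slices into a single buffer, and updates mutate that buffer, so no tensor is ever cloned.

```rust
/// Row-major 2-D tensor over one flat buffer (hypothetical stand-in for ndarray's Array2).
struct Tensor2 {
    data: Vec<f32>,
    rows: usize,
    cols: usize,
}

impl Tensor2 {
    fn new(data: Vec<f32>, rows: usize, cols: usize) -> Self {
        assert_eq!(data.len(), rows * cols, "buffer length must equal rows * cols");
        Self { data, rows, cols }
    }

    fn shape(&self) -> (usize, usize) {
        (self.rows, self.cols)
    }

    /// Borrow one row as a slice into the shared buffer: no allocation, no copy.
    fn row(&self, r: usize) -> &[f32] {
        &self.data[r * self.cols..(r + 1) * self.cols]
    }

    /// Iterate over all rows as borrowed slices of the same buffer.
    fn rows(&self) -> std::slice::ChunksExact<'_, f32> {
        self.data.chunks_exact(self.cols)
    }

    /// Update the existing buffer instead of allocating a new tensor.
    fn scale_in_place(&mut self, factor: f32) {
        for x in self.data.iter_mut() {
            *x *= factor;
        }
    }
}

/// Accepts any borrowed row, so callers never need to clone the tensor.
fn row_sum(row: &[f32]) -> f32 {
    row.iter().sum()
}
```

In ndarray itself, the rough equivalents are `Array2::row`, which returns a view, and `mapv_inplace`.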
Adoption & trust: 537 installs on skills.sh; 1.2k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You are building ML in Rust but lack a concise map from ML operational rules to crate choices and memory-safe patterns.
Who is it for?
Indie builders adding on-device or server-side Rust inference after training models in Python or exporting ONNX.
Skip if: Teams who only need a one-off Python notebook with no Rust deploy path, or beginners who have not chosen Rust yet.
When should I use this skill?
Building ML/AI apps in Rust—machine learning, inference, neural networks, ndarray, tch-rs, burn, candle.
What do I get? / Deliverables
You get constraint-driven design guidance—batching, zero-copy, ONNX paths—that aligns your Rust service with how models actually run in production.
- Architecture-aligned constraint checklist
- Crate and pattern choices for memory and GPU efficiency
Journey fit
Spans multiple journey phases: a primary shelf plus the alternate fits below.
Canonical shelf is Build because the skill encodes implementation constraints for Rust ML backends rather than ideation or ops runbooks. Backend fits tensor pipelines, inference servers, and numerical services that sit behind your product API.
Where it fits
Pick tract versus candle before wiring an embedding API into your SaaS.
Tune batch sizes and async loaders so GPU kernels amortize launch overhead.
Lock seeded RNG and model versioning so production predictions stay reproducible after redeploys.
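The batch-size and async-loader item above amounts to overlapping data loading with compute. The following standard-library sketch illustrates the idea under several assumptions and is not the skill's own method. A std thread and a bounded `sync_channel` stand in for an async runtime. `load_batch` stands in for decoding or preprocessing, `compute` stands in for a batched GPU call, and `pipelined` is a hypothetical name.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Runs `compute` on each batch while a background thread prepares the next ones.
/// `prefetch` bounds how many ready batches may wait in memory (hypothetical helper).
fn pipelined<L, C>(num_batches: usize, prefetch: usize, load_batch: L, mut compute: C) -> Vec<f32>
where
    L: Fn(usize) -> Vec<f32> + Send + 'static,
    C: FnMut(&[f32]) -> f32,
{
    // Bounded channel: the loader blocks once `prefetch` batches are queued,
    // which caps memory use while keeping the compute side fed.
    let (tx, rx) = sync_channel::<Vec<f32>>(prefetch);
    let loader = thread::spawn(move || {
        for i in 0..num_batches {
            if tx.send(load_batch(i)).is_err() {
                break; // consumer hung up
            }
        }
        // `tx` is dropped here, which ends the receiving loop below.
    });

    // The consumer (e.g. the thread issuing GPU work) drains batches in arrival order.
    let outputs: Vec<f32> = rx.iter().map(|batch| compute(&batch)).collect();
    loader.join().expect("loader thread panicked");
    outputs
}
```

The channel bound is the main design knob. A larger `prefetch` hides more loading jitter but keeps more batches resident in memory.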
How it compares
Domain constraint reference for Rust ML—not a full training tutorial or a generic prompt library.
Common Questions / FAQ
Who is domain-ml for?
Solo and indie developers shipping Rust backends or CLIs that run tensors, embeddings, or ONNX models with serious memory and GPU constraints.
When should I use domain-ml?
During Build when scaffolding inference crates; during Ship perf work when tuning batch sizes; during Operate when hardening reproducible prediction pipelines.
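For the Operate case, reproducibility mostly comes down to pinning two things: the random stream and the model artifact. The sketch below is hypothetical. It implements SplitMix64 inline so the algorithm cannot drift with a dependency upgrade, and it tags each run with `RunInfo`, an illustrative struct that is not part of any crate, carrying the model version and seed. With the `rand` crate you would typically seed through `SeedableRng::seed_from_u64`. Note that `rand` documents `StdRng` as not guaranteed to be reproducible across versions, so prefer a named generator if you need bit-exact output across upgrades.

```rust
/// SplitMix64: a tiny, fully specified PRNG. Because the algorithm lives in this
/// file, the same seed yields the same stream on every build and redeploy.
struct SplitMix64(u64);

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self(seed)
    }

    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f32 in [0, 1) built from the top 24 bits of the next output.
    fn next_f32(&mut self) -> f32 {
        (self.next_u64() >> 40) as f32 / (1u64 << 24) as f32
    }
}

/// Hypothetical run metadata, logged next to each prediction batch so results
/// can be traced back to an exact model artifact and seed.
#[derive(Debug, Clone, PartialEq)]
struct RunInfo {
    model_version: &'static str,
    seed: u64,
}

impl RunInfo {
    fn tag(&self) -> String {
        format!("{}@seed={}", self.model_version, self.seed)
    }
}

/// Illustrative dropout-style mask: the same RunInfo always produces the same mask.
fn sample_mask(info: &RunInfo, len: usize, keep_prob: f32) -> Vec<bool> {
    let mut rng = SplitMix64::new(info.seed);
    (0..len).map(|_| rng.next_f32() < keep_prob).collect()
}
```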
Is domain-ml safe to install?
It is procedural documentation with no inherent shell or network hooks; review the Security Audits panel on this Prism page before trusting any third-party skill bundle.
SKILL.md
# Machine Learning Domain

> **Layer 3: Domain Constraints**

## Domain Constraints → Design Implications

| Domain Rule | Design Constraint | Rust Implication |
|-------------|-------------------|------------------|
| Large data | Efficient memory | Zero-copy, streaming |
| GPU acceleration | CUDA/Metal support | candle, tch-rs |
| Model portability | Standard formats | ONNX |
| Batch processing | Throughput over latency | Batched inference |
| Numerical precision | Float handling | ndarray, careful f32/f64 |
| Reproducibility | Deterministic | Seeded random, versioning |

---

## Critical Constraints

### Memory Efficiency

```
RULE: Avoid copying large tensors
WHY: Memory bandwidth is the bottleneck
RUST: References, views, in-place ops
```

### GPU Utilization

```
RULE: Batch operations for GPU efficiency
WHY: Every kernel launch carries fixed overhead
RUST: Batch sizes, async data loading
```

### Model Portability

```
RULE: Use standard model formats
WHY: Train in Python, deploy in Rust
RUST: ONNX via tract or candle
```

---

## Trace Down ↓

From constraints to design (Layer 2):

```
"Need efficient data pipelines"
    ↓ m10-performance: Streaming, batching
    ↓ polars: Lazy evaluation

"Need GPU inference"
    ↓ m07-concurrency: Async data loading
    ↓ candle/tch-rs: CUDA backend

"Need model loading"
    ↓ m12-lifecycle: Lazy init, caching
    ↓ tract: ONNX runtime
```

---

## Use Case → Framework

| Use Case | Recommended | Why |
|----------|-------------|-----|
| Inference only | tract (ONNX) | Lightweight, portable |
| Training + inference | candle, burn | Pure Rust, GPU |
| PyTorch models | tch-rs | Direct bindings |
| Data pipelines | polars | Fast, lazy eval |

## Key Crates

| Purpose | Crate |
|---------|-------|
| Tensors | ndarray |
| ONNX inference | tract |
| ML framework | candle, burn |
| PyTorch bindings | tch-rs |
| Data processing | polars |
| Embeddings | fastembed |

## Design Patterns

| Pattern | Purpose | Implementation |
|---------|---------|----------------|
| Model loading | Once, reuse | `OnceLock<Model>` |
| Batching | Throughput | Collect then process |
| Streaming | Large data | Iterator-based |
| GPU async | Parallelism | Data loading parallel to compute |

## Code Pattern: Inference Server

```rust
use std::sync::OnceLock;
use tract_onnx::prelude::*;

// Plan type returned by `into_runnable()`; adjust if your tract version differs.
type Plan = SimplePlan<TypedFact, Box<dyn TypedOp>, Graph<TypedFact, Box<dyn TypedOp>>>;

// Loaded once on first use, then shared by every request.
static MODEL: OnceLock<Plan> = OnceLock::new();

fn get_model() -> &'static Plan {
    MODEL.get_or_init(|| {
        tract_onnx::onnx()
            .model_for_path("model.onnx")
            .expect("failed to read model.onnx")
            .into_optimized()
            .expect("failed to optimize model")
            .into_runnable()
            .expect("failed to build runnable plan")
    })
}

async fn predict(input: Vec<f32>) -> anyhow::Result<Vec<f32>> {
    let model = get_model();
    // Reshape the flat input into a [1, n] batch; `from_shape_vec` takes ownership, no copy.
    let n = input.len();
    let input: Tensor = tract_ndarray::Array2::from_shape_vec((1, n), input)?.into();
    // `run` is synchronous; in an async server consider moving it to a blocking thread.
    let result = model.run(tvec!(input.into()))?;
    Ok(result[0].to_array_view::<f32>()?.iter().copied().collect())
}
```

## Code Pattern: Batched Inference

```rust
// Reuses `get_model()` from above. Assumes the model has a dynamic batch
// dimension and that every input row has the same length.
async fn batch_predict(inputs: &[Vec<f32>], batch_size: usize) -> anyhow::Result<Vec<Vec<f32>>> {
    let model = get_model();
    let mut results = Vec::with_capacity(inputs.len());
    for batch in inputs.chunks(batch_size) {
        // Stack the batch into one [batch, features] tensor
        let features = batch[0].len();
        let flat: Vec<f32> = batch.iter().flatten().copied().collect();
        let batch_tensor: Tensor =
            tract_ndarray::Array2::from_shape_vec((batch.len(), features), flat)?.into();
        // Run inference once for the whole batch
        let output = model.run(tvec!(batch_tensor.into()))?;
        // Unstack: one output row per input row
        let view = output[0].to_array_view::<f32>()?;
        results.extend(view.outer_iter().map(|row| row.iter().copied().collect::<Vec<f32>>()));
    }
    Ok(results)
}
```

---

## Common Mistakes

| Mistake | Domain Violation | Fix |
|---------|-----------------|-----|
| Clone tensors | Memory waste | Use views |
| Single inference | GPU underutilized | Batch processing |
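The Design Patterns table lists Streaming as iterator-based and Batching as collect-then-process. The dependency-free sketch below combines the two. `Batches`, `batches`, and `fake_model` are hypothetical names. `fake_model` merely sums each sample; a real service would instead issue one batched tract or candle call per batch. Only one batch is held in memory at a time, so the source iterator can be a file reader or network stream of any length.

```rust
/// Groups any (possibly unbounded) iterator of samples into batches of
/// `batch_size`, holding only the current batch in memory.
struct Batches<I: Iterator> {
    inner: I,
    batch_size: usize,
}

impl<I: Iterator> Iterator for Batches<I> {
    type Item = Vec<I::Item>;

    fn next(&mut self) -> Option<Self::Item> {
        // Pull up to `batch_size` items lazily from the underlying stream.
        let batch: Vec<I::Item> = self.inner.by_ref().take(self.batch_size).collect();
        if batch.is_empty() {
            None
        } else {
            Some(batch)
        }
    }
}

fn batches<I: IntoIterator>(iter: I, batch_size: usize) -> Batches<I::IntoIter> {
    assert!(batch_size > 0, "batch_size must be positive");
    Batches { inner: iter.into_iter(), batch_size }
}

/// Stand-in for a batched model call (hypothetical): one "launch" per batch.
fn fake_model(batch: &[Vec<f32>]) -> Vec<f32> {
    batch.iter().map(|x| x.iter().sum::<f32>()).collect()
}
```

Usage: feed a lazily generated or streamed source into `batches(source, n)` and call the model once per yielded `Vec`. The last batch may be shorter than `n`.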