
Weights And Biases
Version datasets and model checkpoints in Weights & Biases Artifacts and Model Registry so solo ML builders can reproduce runs and promote models without scattered files.
Overview
weights-and-biases is an agent skill most often used in Build (also Ship, Grow) that documents W&B Artifacts and Model Registry workflows for versioned datasets and model checkpoints.
Install
npx skills add https://github.com/orchestra-research/ai-research-skills --skill weights-and-biases
What is this skill?
- End-to-end Artifacts guide: creation, usage, Model Registry, versioning, and lineage
- Dataset and model artifact patterns with metadata blocks and add_file, add_dir, add_reference
- Automatic versioning (v0, v1, v2) plus aliases such as latest, best, and production
- Lineage tracking tying runs to datasets and checkpoints for reproducibility
- Best-practices section for team-wide artifact collaboration and deduplicated storage
- Table of contents covers 6 major sections including Model Registry and Best Practices
Adoption & trust: 1 install on skills.sh; 9.4k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
Your training outputs live in ad-hoc folders and you cannot trace which dataset version or checkpoint a given experiment or deploy used.
Who is it for?
Solo builders fine-tuning models or running repeatable evals who already use or plan to use W&B for experiment tracking.
Skip if: your project has no ML training pipeline, or your team is committed to a non-W&B MLOps platform and has no appetite for an SDK migration.
When should I use this skill?
You need W&B Artifacts, Model Registry, dataset or checkpoint versioning, or lineage tracking in training code.
What do I get? / Deliverables
Your agent implements wandb Artifact logging with metadata, lineage, and registry aliases so runs and promoted models stay reproducible across environments.
- Artifact logging code for datasets and/or models with metadata
- Registry-oriented promotion pattern using aliases such as latest or production
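The metadata attached to a dataset artifact can be computed from the file rather than typed by hand. The following is a minimal sketch, assuming a UTF-8 CSV file with a header row; the helper name `build_dataset_metadata` and the field names (`rows`, `columns`, `size_bytes`, `sha256`) are hypothetical choices for illustration, not a schema defined by the skill or by W&B.

```python
import csv
import hashlib
from pathlib import Path


def build_dataset_metadata(csv_path):
    """Summarize a CSV file as a dict suitable for an artifact's metadata.

    Hypothetical helper: the field names are illustrative, not a W&B schema.
    """
    path = Path(csv_path)
    raw = path.read_bytes()

    # newline="" lets the csv module handle quoted line breaks itself.
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])          # first row is assumed to be the header
        row_count = sum(1 for _ in reader)  # remaining rows are data rows

    return {
        "rows": row_count,
        "columns": header,
        "size_bytes": len(raw),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }
```

The returned dict can be passed as the `metadata` argument when constructing a `wandb.Artifact`, in the same way the skill's dataset examples pass a literal dict.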
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
W&B wiring happens while you integrate experiment tracking and artifact storage into training code, which is the canonical Build integrations shelf, though you keep using artifacts through Operate and Grow measurement. Integrations fits because the skill is API- and SDK-oriented guidance for wandb.init, log_artifact, lineage, and registry aliases, not generic product management docs.
Where it fits
Add dataset Artifacts around a new fine-tuning script so every run references v2 of your cleaned JSONL.
Save torch checkpoints as model Artifacts with resolution metadata for downstream inference services.
Promote the best alias to production in Model Registry before shipping an API that loads a fixed artifact version.
Compare evaluation Artifacts across weeks to see which data mix improved offline metrics.
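Loading a fixed artifact version, as in the promotion scenario above, can be sketched by resolving an alias once and recording the concrete version it currently points to. This is a minimal sketch, assuming `run` behaves like a W&B run (its `use_artifact` method returns an object with a `version` attribute, as in the skill's examples); the helper name `pin_artifact` is hypothetical.

```python
def pin_artifact(run, name, alias="production"):
    """Resolve `name:alias` once and return a pinned `name:vN` reference.

    Hypothetical helper. `run` is expected to behave like a W&B run:
    `run.use_artifact(ref)` returns an object exposing `.version`.
    """
    artifact = run.use_artifact(f"{name}:{alias}")
    # An alias can be reassigned later; the version string (for example "v2")
    # identifies one specific version of the artifact.
    return f"{name}:{artifact.version}"
```

With a real run this might be called as `pin_artifact(wandb.init(project="my-project"), "model")`, and the returned string stored in the serving configuration so the API keeps loading the same version even if the alias moves.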
How it compares
This is a W&B Artifacts and registry integration guide, not a NeMo infra resiliency skill and not a generic pytest workflow.
Common Questions / FAQ
Who is weights-and-biases for?
Indie ML and LLM builders who want agent help implementing W&B artifact versioning, lineage, and model promotion patterns in Python training code.
When should I use weights-and-biases?
Use it during Build when integrating wandb into training; during Ship when packaging checkpoints for release; and during Grow when comparing artifact-tagged eval results across versions.
Is weights-and-biases safe to install?
The skill teaches cloud-backed logging and file uploads—store API keys outside the repo and review the Security Audits panel on this Prism page before granting network or secrets access to the agent.
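One way to keep the API key out of the repository is to read it from the environment and fail early when it is missing. The sketch below assumes the key is supplied through the `WANDB_API_KEY` environment variable, which the wandb client reads; the helper `require_wandb_api_key` is a hypothetical convenience and not part of wandb.

```python
import os


def require_wandb_api_key(env=None):
    """Return the W&B API key from the environment, or fail early.

    Hypothetical helper: only the variable name WANDB_API_KEY comes from wandb.
    """
    env = os.environ if env is None else env
    key = env.get("WANDB_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "WANDB_API_KEY is not set; export it in the shell or load it from "
            "a secrets manager instead of committing it to the repo."
        )
    return key
```

Calling this at the top of a training script surfaces a missing key before any data is processed or uploaded.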
SKILL.md
# Artifacts & Model Registry Guide

Complete guide to data versioning and model management with W&B Artifacts.

## Table of Contents

- What are Artifacts
- Creating Artifacts
- Using Artifacts
- Model Registry
- Versioning & Lineage
- Best Practices

## What are Artifacts

Artifacts are versioned datasets, models, or files tracked with lineage.

**Key Features:**

- Automatic versioning (v0, v1, v2...)
- Lineage tracking (which runs produced/used artifacts)
- Efficient storage (deduplication)
- Collaboration (team-wide access)
- Aliases (latest, best, production)

**Common Use Cases:**

- Dataset versioning
- Model checkpoints
- Preprocessed data
- Evaluation results
- Configuration files

## Creating Artifacts

### Basic Dataset Artifact

```python
import wandb

run = wandb.init(project="my-project")

# Create artifact
dataset = wandb.Artifact(
    name='training-data',
    type='dataset',
    description='ImageNet training split with augmentations',
    metadata={
        'size': '1.2M images',
        'format': 'JPEG',
        'resolution': '224x224'
    }
)

# Add files
dataset.add_file('data/train.csv')  # Single file
dataset.add_dir('data/images')  # Entire directory
dataset.add_reference('s3://bucket/data')  # Cloud reference

# Log artifact
run.log_artifact(dataset)
wandb.finish()
```

### Model Artifact

```python
import torch
import wandb

run = wandb.init(project="my-project")

# Train model
model = train_model()

# Save model
torch.save(model.state_dict(), 'model.pth')

# Create model artifact
model_artifact = wandb.Artifact(
    name='resnet50-classifier',
    type='model',
    description='ResNet50 trained on ImageNet',
    metadata={
        'architecture': 'ResNet50',
        'accuracy': 0.95,
        'loss': 0.15,
        'epochs': 50,
        'framework': 'PyTorch'
    }
)

# Add model file
model_artifact.add_file('model.pth')

# Add config
model_artifact.add_file('config.yaml')

# Log with aliases
run.log_artifact(model_artifact, aliases=['latest', 'best'])
wandb.finish()
```

### Preprocessed Data Artifact

```python
import pandas as pd
import wandb

run = wandb.init(project="nlp-project")

# Preprocess data
df = pd.read_csv('raw_data.csv')
df_processed = preprocess(df)
df_processed.to_csv('processed_data.csv', index=False)

# Create artifact
processed_data = wandb.Artifact(
    name='processed-text-data',
    type='dataset',
    metadata={
        'rows': len(df_processed),
        'columns': list(df_processed.columns),
        'preprocessing_steps': ['lowercase', 'remove_stopwords', 'tokenize']
    }
)
processed_data.add_file('processed_data.csv')

# Log artifact
run.log_artifact(processed_data)
```

## Using Artifacts

### Download and Use

```python
import wandb

run = wandb.init(project="my-project")

# Download artifact
artifact = run.use_artifact('training-data:latest')
artifact_dir = artifact.download()

# Use files
import pandas as pd
df = pd.read_csv(f'{artifact_dir}/train.csv')

# Train with artifact data
model = train_model(df)
```

### Use Specific Version

```python
# Use specific version
artifact_v2 = run.use_artifact('training-data:v2')

# Use alias
artifact_best = run.use_artifact('model:best')
artifact_prod = run.use_artifact('model:production')

# Use from another project
artifact = run.use_artifact('team/other-project/model:latest')
```

### Check Artifact Metadata

```python
artifact = run.use_artifact('training-data:latest')

# Access metadata
print(artifact.metadata)
print(f"Size: {artifact.metadata['size']}")

# Access version info
print(f"Version: {artifact.version}")
print(f"Created at: {artifact.created_at}")
print(f"Digest: {artifact.digest}")
```

## Model Registry

Link models to a central registry for governance and deployment.

### Create Model Registry

```python
# In W&B UI:
# 1. Go to "Registry" tab
# 2. Create new registry: "production-models"
# 3. Define stages: development, staging, production
```

### Link Model to Registry

```python
import wandb
run =