
Audiocraft Audio Generation
Fine-tune and run Meta AudioCraft MusicGen pipelines with custom audio datasets resampled to 32 kHz mono WAV and metadata JSON.
Overview
Audiocraft-audio-generation is an agent skill for the Build phase that documents AudioCraft MusicGen fine-tuning and 32 kHz mono dataset preparation with torchaudio.
Install
npx skills add https://github.com/orchestra-research/ai-research-skills --skill audiocraft-audio-generation
What is this skill?
- Custom dataset preparation: audio/ folder plus metadata.json with path and description fields
- Resample arbitrary inputs to 32 kHz and collapse stereo to mono via torchaudio
- MusicGen fine-tuning workflow grounded in AudioCraft directory conventions
- Python snippets for loading metadata, saving processed 0001.wav-style files, and recording duration
- Advanced usage guide beyond one-shot inference prompts
- Target training sample rate 32 kHz for processed WAV outputs
- Dataset layout uses zero-padded filenames such as 0001.wav under audio/
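The input metadata shape and the zero-padded naming described above can be sketched with the standard library alone. This is a minimal illustration, not the skill's own code: `plan_outputs` is a hypothetical helper, and it assumes the metadata file is a JSON list of objects with `path` and `description` fields, numbered from 1 to match the 0001.wav layout.

```python
import json


def plan_outputs(metadata_file):
    """Map each input entry to a zero-padded output path under audio/."""
    with open(metadata_file) as f:
        entries = json.load(f)

    plan = []
    # Number from 1 so the first file is audio/0001.wav.
    for idx, item in enumerate(entries, start=1):
        # Assumption: every entry must carry both fields.
        missing = {"path", "description"} - set(item)
        if missing:
            raise ValueError(f"entry {idx} is missing {sorted(missing)}")
        plan.append((item["path"], f"audio/{idx:04d}.wav"))
    return plan
```

Running this before any audio processing surfaces malformed entries early, without loading a single file.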
Adoption & trust: 1 install on skills.sh; 9.4k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have raw audio and captions but no standardized MusicGen fine-tuning dataset, so training jobs fail on sample rate, channels, or metadata shape.
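A pre-flight check for the sample-rate and channel problems mentioned above might look like the sketch below. It is an illustration under stated assumptions: `check_wav` is a hypothetical helper, and it relies on Python's built-in `wave` module, which reads integer-PCM WAV headers only. Float-encoded WAV files would need a different reader.

```python
import wave


def check_wav(path, expected_rate=32000, expected_channels=1):
    """Return a list of header problems for a PCM WAV file (empty if none)."""
    problems = []
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != expected_channels:
        problems.append(f"{channels} channels, expected {expected_channels}")
    return problems
```

An empty list means the header already matches the 32 kHz mono target; anything else identifies a file that still needs resampling or downmixing.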
Who is it for?
Indie builders prototyping generative audio features who want copy-paste dataset prep aligned with MusicGen training expectations.
Skip if: Teams needing only real-time inference without fine-tuning, or builders avoiding Python/GPU training entirely.
When should I use this skill?
Fine-tuning MusicGen or preparing AudioCraft-compatible audio corpora from custom metadata.
What do I get? / Deliverables
You produce an output_dir with processed WAVs and metadata.json ready for AudioCraft fine-tuning scripts.
- Processed audio/ WAV set at 32 kHz mono
- metadata.json with relative paths, descriptions, and durations
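One way to sanity-check these deliverables before starting a fine-tune job is sketched below. `summarize_metadata` is a hypothetical helper, not part of AudioCraft or the skill. It assumes metadata.json is a list of entries with `path` (relative to the output directory) and `duration` (in seconds).

```python
import json
from pathlib import Path


def summarize_metadata(output_dir):
    """Check processed metadata and return (entry count, total seconds)."""
    output_dir = Path(output_dir)
    with open(output_dir / "metadata.json") as f:
        entries = json.load(f)

    total_seconds = 0.0
    for item in entries:
        rel_path = Path(item["path"])
        # Paths are expected to be relative to output_dir.
        if rel_path.is_absolute():
            raise ValueError(f"absolute path in metadata: {rel_path}")
        if not (output_dir / rel_path).is_file():
            raise FileNotFoundError(f"missing audio file: {rel_path}")
        total_seconds += float(item["duration"])
    return len(entries), total_seconds
```

The total duration gives a quick estimate of how much training audio the dataset actually contains.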
Journey fit
Audio model fine-tuning and dataset prep happen while building the generative feature backend, before you ship inference endpoints or product UI. The Backend subphase covers ML training scripts, torchaudio preprocessing, and dataset layout, not storefront launch or marketing.
How it compares
Training-data and fine-tune workflow for AudioCraft—not a hosted music API integration or a single-line inference demo.
Common Questions / FAQ
Who is audiocraft-audio-generation for?
Solo developers and small teams using coding agents to prepare custom MusicGen datasets and follow AudioCraft advanced fine-tuning patterns in Python.
When should I use audiocraft-audio-generation?
During Build backend work when converting labeled audio into 32 kHz mono WAVs and metadata.json before launching a fine-tune job or validating a generative audio prototype.
Is audiocraft-audio-generation safe to install?
Check the Security Audits panel on this Prism page before installing; the skill references local Python and audio processing—review any script execution permissions you grant your agent.
SKILL.md
# AudioCraft Advanced Usage Guide

## Fine-tuning MusicGen

### Custom dataset preparation

```python
import json
from pathlib import Path

import torchaudio


def prepare_dataset(audio_dir, output_dir, metadata_file):
    """
    Prepare dataset for MusicGen fine-tuning.

    Directory structure:
    output_dir/
    ├── audio/
    │   ├── 0001.wav
    │   ├── 0002.wav
    │   └── ...
    └── metadata.json
    """
    output_dir = Path(output_dir)
    audio_output = output_dir / "audio"
    audio_output.mkdir(parents=True, exist_ok=True)

    # Load metadata (a list of entries: {"path": "...", "description": "..."})
    with open(metadata_file) as f:
        metadata = json.load(f)

    processed = []

    # Number from 1 so filenames match the layout in the docstring
    for idx, item in enumerate(metadata, start=1):
        audio_path = Path(audio_dir) / item["path"]

        # Load and resample to 32kHz
        wav, sr = torchaudio.load(str(audio_path))
        if sr != 32000:
            resampler = torchaudio.transforms.Resample(sr, 32000)
            wav = resampler(wav)

        # Convert to mono if stereo
        if wav.shape[0] > 1:
            wav = wav.mean(dim=0, keepdim=True)

        # Save processed audio
        output_path = audio_output / f"{idx:04d}.wav"
        torchaudio.save(str(output_path), wav, sample_rate=32000)

        processed.append({
            "path": str(output_path.relative_to(output_dir)),
            "description": item["description"],
            "duration": wav.shape[1] / 32000
        })

    # Save processed metadata
    with open(output_dir / "metadata.json", "w") as f:
        json.dump(processed, f, indent=2)

    print(f"Processed {len(processed)} samples")
    return processed
```

### Fine-tuning with dora

```bash
# AudioCraft uses dora for experiment management
# Install dora
pip install dora-search

# Clone AudioCraft
git clone https://github.com/facebookresearch/audiocraft.git
cd audiocraft

# Create config for fine-tuning
cat > config/solver/musicgen/finetune.yaml << 'EOF'
defaults:
  - musicgen/musicgen_base
  - /model: lm/musicgen_lm
  - /conditioner: cond_base

solver: musicgen
autocast: true
autocast_dtype: float16

optim:
  epochs: 100
  batch_size: 4
  lr: 1e-4
  ema: 0.999
  optimizer: adamw

dataset:
  batch_size: 4
  num_workers: 4
  train:
    - dset: your_dataset
      root: /path/to/dataset
  valid:
    - dset: your_dataset
      root: /path/to/dataset

checkpoint:
  save_every: 10
  keep_every_states: null
EOF

# Run fine-tuning
dora run solver=musicgen/finetune
```

### LoRA fine-tuning

```python
from peft import LoraConfig, get_peft_model
from audiocraft.models import MusicGen
import torch

# Load base model
model = MusicGen.get_pretrained('facebook/musicgen-small')

# Get the language model component
lm = model.lm

# Configure LoRA
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj", "k_proj", "out_proj"],
    lora_dropout=0.05,
    bias="none"
)

# Apply LoRA
lm = get_peft_model(lm, lora_config)
lm.print_trainable_parameters()
```

## Multi-GPU Training

### DataParallel

```python
import torch
import torch.nn as nn
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained('facebook/musicgen-small')

# Wrap LM with DataParallel
if torch.cuda.device_count() > 1:
    model.lm = nn.DataParallel(model.lm)

model.to("cuda")
```

### DistributedDataParallel

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from audiocraft.models import MusicGen


def setup(rank, world_size):
    # One process per GPU, using the NCCL backend
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)


def train(rank, world_size):
    setup(rank, world_size)

    model = MusicGen.get_pretrained('facebook/musicgen-small')
    model.lm = model.lm.to(rank)
    model.lm = DDP(model.lm, device_ids=[rank])

    # Training loop
    # ...

    dist.destroy_process_group()
```

## Custom Conditioning

### Adding new conditioners

```python
from audiocraft.modules.conditioners import BaseConditioner
import t