
AudioCraft Audio Generation
Fine-tune Meta's AudioCraft MusicGen on a custom WAV dataset, with 32 kHz mono preprocessing and a metadata manifest. Aimed at solo builders adding AI music to apps or demos.
Overview
AudioCraft Audio Generation is an agent skill for the Build phase. It prepares MusicGen fine-tuning datasets with torchaudio (resampled to 32 kHz, mono) and walks through fine-tuning for custom AI music generation.
Install
npx skills add https://github.com/davila7/claude-code-templates --skill audiocraft-audio-generation
What is this skill?
- Step-by-step MusicGen fine-tuning dataset layout (audio/ + metadata.json)
- Loads audio with torchaudio, resamples to 32 kHz, collapses stereo to mono
- Writes numbered WAV outputs and duration fields for training manifests
- Python + pathlib workflow suited to local GPU or notebook environments
- Advanced AudioCraft usage beyond one-shot text-to-music prompts
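The numbered file names and duration fields mentioned above can be sketched without any audio library. This is a minimal illustration, not the skill's own code: the helper name `manifest_entry` is hypothetical, and it assumes the clip has already been resampled to 32 kHz so that duration is simply the sample count divided by the sample rate.

```python
from pathlib import PurePosixPath

TARGET_SR = 32_000  # sample rate named in the skill description


def manifest_entry(idx, description, num_samples, sample_rate=TARGET_SR):
    """Build one manifest record for a processed clip (hypothetical helper).

    idx          -- 1-based position of the clip, used for the numbered file name
    description  -- text description paired with the clip
    num_samples  -- number of samples per channel after resampling
    """
    # Zero-padded, numbered file name inside the audio/ folder
    rel_path = PurePosixPath("audio") / f"{idx:04d}.wav"
    return {
        "path": rel_path.as_posix(),
        "description": description,
        # Duration in seconds = samples / samples-per-second
        "duration": num_samples / sample_rate,
    }


entry = manifest_entry(1, "lo-fi piano loop", num_samples=64_000)
```

Using `PurePosixPath` keeps the stored paths forward-slashed on every platform, which is one reasonable choice for a manifest that may move between machines.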
Adoption & trust: 510 installs on skills.sh; 27.8k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You want branded or domain-specific music from MusicGen but only have raw audio files and no consistent training manifest or resampling pipeline.
Who is it for?
Indie builders fine-tuning MusicGen on a small labeled clip set before wiring audio generation into a product or agent workflow.
Skip if: you only need a single prompt-to-mp3 call to a hosted API, with no custom training or local GPU setup.
When should I use this skill?
You are fine-tuning MusicGen or preparing a custom AudioCraft training set from labeled audio and metadata.
What do I get? / Deliverables
You get a documented dataset folder with processed WAVs and metadata JSON ready to plug into AudioCraft MusicGen fine-tuning in your repo.
- Processed audio/ folder with 32 kHz mono WAV files
- metadata.json manifest with paths, descriptions, and durations
- Repeatable dataset prep script pattern for fine-tuning runs
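Before starting a fine-tuning run, it can help to sanity-check the deliverables listed above. The sketch below is an assumption-laden illustration rather than part of the skill: `check_manifest` is a hypothetical helper, and it assumes `metadata.json` is a JSON list of objects with `path`, `description`, and `duration` keys, with paths relative to the dataset folder.

```python
import json
from pathlib import Path

# Keys the manifest is described as carrying (paths, descriptions, durations)
REQUIRED_KEYS = {"path", "description", "duration"}


def check_manifest(dataset_dir):
    """Return a list of human-readable problems found in dataset_dir/metadata.json."""
    dataset_dir = Path(dataset_dir)
    with open(dataset_dir / "metadata.json") as f:
        entries = json.load(f)

    problems = []
    for i, entry in enumerate(entries):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems.append(f"entry {i}: missing keys {sorted(missing)}")
            continue  # remaining checks need all three keys
        if not (dataset_dir / entry["path"]).is_file():
            problems.append(f"entry {i}: file not found: {entry['path']}")
        if not str(entry["description"]).strip():
            problems.append(f"entry {i}: empty description")
        if not entry["duration"] > 0:
            problems.append(f"entry {i}: non-positive duration")
    return problems
```

An empty return value means every entry has the three fields, points at an existing file, and has a usable description and duration. The check does not open the audio itself, so sample rate and channel count still need to be verified separately.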
Journey fit
Audio model fine-tuning and dataset prep are core product-build work before you ship generative audio features. Training pipelines, torchaudio resampling, and on-disk dataset layout are backend/ML engineering, not frontend or launch copy.
How it compares
Use this skill when you need dataset prep and fine-tuning depth; use a hosted generative-audio API skill when you only need inference without training.
Common Questions / FAQ
Who is audiocraft-audio-generation for?
Solo and indie builders shipping AI-generated music who already use Python and want MusicGen fine-tuning on custom audio rather than default model output only.
When should I use audiocraft-audio-generation?
During Build, when you are creating training data for MusicGen: resampling clips, building metadata.json, and following the AudioCraft advanced fine-tuning steps before integrating or shipping.
Is audiocraft-audio-generation safe to install?
Review the Security Audits panel on this Prism page and inspect the skill source in your repo. The skill implies local filesystem access and Python ML dependencies, but no remote secrets by default.
SKILL.md
# AudioCraft Advanced Usage Guide

## Fine-tuning MusicGen

### Custom dataset preparation

```python
import json
from pathlib import Path

import torchaudio


def prepare_dataset(audio_dir, output_dir, metadata_file):
    """
    Prepare dataset for MusicGen fine-tuning.

    Directory structure:
    output_dir/
    ├── audio/
    │   ├── 0001.wav
    │   ├── 0002.wav
    │   └── ...
    └── metadata.json
    """
    output_dir = Path(output_dir)
    audio_output = output_dir / "audio"
    audio_output.mkdir(parents=True, exist_ok=True)

    # Load metadata: a JSON list of {"path": "...", "description": "..."} objects
    with open(metadata_file) as f:
        metadata = json.load(f)

    processed = []

    # Number outputs from 1 so file names match the layout in the docstring
    for idx, item in enumerate(metadata, start=1):
        audio_path = Path(audio_dir) / item["path"]

        # Load and resample to 32kHz
        wav, sr = torchaudio.load(str(audio_path))
        if sr != 32000:
            resampler = torchaudio.transforms.Resample(sr, 32000)
            wav = resampler(wav)

        # Convert to mono if stereo
        if wav.shape[0] > 1:
            wav = wav.mean(dim=0, keepdim=True)

        # Save processed audio
        output_path = audio_output / f"{idx:04d}.wav"
        torchaudio.save(str(output_path), wav, sample_rate=32000)

        processed.append({
            "path": str(output_path.relative_to(output_dir)),
            "description": item["description"],
            "duration": wav.shape[1] / 32000
        })

    # Save processed metadata
    with open(output_dir / "metadata.json", "w") as f:
        json.dump(processed, f, indent=2)

    print(f"Processed {len(processed)} samples")
    return processed
```

### Fine-tuning with dora

```bash
# AudioCraft uses dora for experiment management
# Install dora
pip install dora-search

# Clone AudioCraft
git clone https://github.com/facebookresearch/audiocraft.git
cd audiocraft

# Create config for fine-tuning
cat > config/solver/musicgen/finetune.yaml << 'EOF'
defaults:
  - musicgen/musicgen_base
  - /model: lm/musicgen_lm
  - /conditioner: cond_base

solver: musicgen
autocast: true
autocast_dtype: float16

optim:
  epochs: 100
  batch_size: 4
  lr: 1e-4
  ema: 0.999
  optimizer: adamw

dataset:
  batch_size: 4
  num_workers: 4
  train:
    - dset: your_dataset
      root: /path/to/dataset
  valid:
    - dset: your_dataset
      root: /path/to/dataset

checkpoint:
  save_every: 10
  keep_every_states: null
EOF

# Run fine-tuning
dora run solver=musicgen/finetune
```

### LoRA fine-tuning

```python
from peft import LoraConfig, get_peft_model
from audiocraft.models import MusicGen
import torch

# Load base model
model = MusicGen.get_pretrained('facebook/musicgen-small')

# Get the language model component
lm = model.lm

# Configure LoRA
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj", "k_proj", "out_proj"],
    lora_dropout=0.05,
    bias="none"
)

# Apply LoRA
lm = get_peft_model(lm, lora_config)
lm.print_trainable_parameters()
```

## Multi-GPU Training

### DataParallel

```python
import torch
import torch.nn as nn
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained('facebook/musicgen-small')

# Wrap LM with DataParallel
if torch.cuda.device_count() > 1:
    model.lm = nn.DataParallel(model.lm)

model.to("cuda")
```

### DistributedDataParallel

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from audiocraft.models import MusicGen


def setup(rank, world_size):
    # One process per GPU; NCCL is the backend for CUDA devices
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)


def train(rank, world_size):
    setup(rank, world_size)

    model = MusicGen.get_pretrained('facebook/musicgen-small')
    model.lm = model.lm.to(rank)
    model.lm = DDP(model.lm, device_ids=[rank])

    # Training loop
    # ...

    dist.destroy_process_group()
```

## Custom Conditioning

### Adding new conditioners

```python
from audiocraft.modules.conditioners import BaseConditioner
import t