
Stable Diffusion Image Generation
Generate and fine-tune Stable Diffusion images in Python with custom diffusers pipelines instead of one-off prompts in chat.
Overview
Stable Diffusion Image Generation is an agent skill most often used in the Build phase (and also in Validate for prototype visuals and in Launch for content). It teaches custom diffusers pipelines and denoising loops for local txt2img generation.
Install
npx skills add https://github.com/davila7/claude-code-templates --skill stable-diffusion-image-generation
What is this skill?
- Build Stable Diffusion pipelines from individual UNet, VAE, CLIP text encoder, tokenizer, and scheduler components
- Custom denoising loop with configurable steps, guidance scale, height, and width
- Examples disable the safety checker for local/dev pipelines when explicitly configured
- Python + PyTorch + diffusers + transformers stack with pretrained stable-diffusion-v1-5 paths
- Supports advanced control beyond default txt2img wrappers
- Custom denoising example defaults to 50 steps and guidance_scale 7.5 at 512×512
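To make those defaults concrete, here is a minimal sketch in plain Python (no PyTorch required) of two pieces of arithmetic the custom loop relies on: the latent shape for a given image size, and the classifier-free guidance blend. The helper names `latent_shape` and `apply_guidance` are hypothetical and not part of diffusers; the 4 latent channels and the factor of 8 are taken from the `torch.randn` call in the SKILL.md excerpt further down.

```python
def latent_shape(height: int = 512, width: int = 512, batch: int = 1) -> tuple:
    """Shape of the initial noise tensor, assuming height and width are multiples of 8."""
    return (batch, 4, height // 8, width // 8)


def apply_guidance(uncond: float, cond: float, guidance_scale: float = 7.5) -> float:
    """Classifier-free guidance on one scalar: move from the unconditional
    prediction toward the conditional one, scaled by guidance_scale."""
    return uncond + guidance_scale * (cond - uncond)


print(latent_shape())            # (1, 4, 64, 64)
print(apply_guidance(0.0, 1.0))  # 7.5
```

In the real loop the same blend is applied elementwise to the two halves of the UNet output rather than to single floats.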
Adoption & trust: 1.2k installs on skills.sh; 27.8k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need consistent AI-generated images in your product or content workflow but only have fragile chat prompts instead of versioned Python pipelines.
Who is it for?
Solo builders shipping content products, game/UI mock assets, or automated thumbnail pipelines who already run Python and want Stable Diffusion under their control.
Skip if: Teams that only need a hosted image API with no local GPU, or beginners who want one-click image buttons without PyTorch setup.
When should I use this skill?
User asks for Stable Diffusion scripts, diffusers custom pipelines, or programmatic image generation in Python.
What do I get? / Deliverables
You get runnable diffusers assembly and custom_generate patterns you can drop into scripts, notebooks, or backend jobs with explicit step and guidance controls.
- Assembled StableDiffusionPipeline from components
- custom_generate denoising function
- Configurable height, width, steps, and guidance parameters
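One way to keep those parameters explicit and versionable is a small settings record, sketched below. `GenerationSettings` is a hypothetical helper, not something the skill ships; its field names mirror the `custom_generate` signature shown in the SKILL.md excerpt, so the record can be unpacked straight into that function.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical settings record; fields mirror custom_generate's parameters."""
    prompt: str
    num_steps: int = 50
    guidance_scale: float = 7.5
    height: int = 512
    width: int = 512

    def __post_init__(self):
        # The loop divides height and width by 8 to size the latents.
        if self.num_steps < 1:
            raise ValueError("num_steps must be at least 1")
        if self.height % 8 or self.width % 8:
            raise ValueError("height and width must be multiples of 8")

    def to_json(self) -> str:
        # Sorted keys keep the serialized record stable across runs.
        return json.dumps(asdict(self), sort_keys=True)


settings = GenerationSettings(prompt="a lighthouse at dusk", width=768)
print(settings.to_json())
# With the skill's function available: image = custom_generate(**asdict(settings))
```

Committing the JSON next to the generated asset records which parameters produced it.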
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Canonical shelf is Build because the skill centers on assembling models, schedulers, and denoising code—not on market research or launch distribution. Integrations fits best: Hugging Face checkpoints, UNet/VAE/text encoder wiring, and optional safety-checker toggles are third-party ML stack work.
Where it fits
Render hero and feature mock images for a landing page before paying a designer.
Embed a custom_generate function in a backend job that produces user-specific avatars.
Batch social post visuals with fixed seeds and dimensions for a launch week.
Automate blog header images from article titles in a content pipeline.
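For the fixed-seed batching case, a minimal sketch of deriving a reproducible seed from each title might look like this. `seed_from_text` and the `namespace` value are hypothetical. The `custom_generate` function in the SKILL.md excerpt takes no seed argument, so wiring the seed in (for example via a `torch.Generator` passed to `torch.randn`) is an assumed adaptation, not something the skill documents.

```python
import hashlib


def seed_from_text(text: str, namespace: str = "launch-week") -> int:
    """Derive a stable 32-bit seed from a title so reruns start from the same noise."""
    digest = hashlib.sha256(f"{namespace}:{text}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


titles = ["Shipping faster with agents", "Our pricing, explained"]
jobs = [
    {"prompt": title, "seed": seed_from_text(title), "height": 512, "width": 512}
    for title in titles
]

for job in jobs:
    print(job)
    # Assumed adaptation, with PyTorch installed:
    # generator = torch.Generator(device="cuda").manual_seed(job["seed"])
    # latents = torch.randn(shape, generator=generator, device="cuda")
```

Hashing the title rather than using a counter means that adding or reordering posts does not change the seed of existing ones.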
How it compares
Skill package for self-hosted diffusers code—not a hosted Midjourney integration or a single REST image MCP.
Common Questions / FAQ
Who is stable-diffusion-image-generation for?
Indie developers and creators who generate images programmatically with Stable Diffusion v1.5-style checkpoints via Python, diffusers, and transformers.
When should I use stable-diffusion-image-generation?
During Build when wiring ML asset pipelines; during Validate when prototyping landing hero images; during Grow when batching social or blog visuals—whenever you need scripted generation rather than manual UI exports.
Is stable-diffusion-image-generation safe to install?
Review the Security Audits panel on this Prism page before installing; the skill documents disabling safety_checker for dev pipelines, so treat outputs and model sources as your responsibility.
SKILL.md
# Stable Diffusion Advanced Usage Guide

## Custom Pipelines

### Building from components

```python
from diffusers import (
    UNet2DConditionModel,
    AutoencoderKL,
    DDPMScheduler,
    StableDiffusionPipeline
)
from transformers import CLIPTextModel, CLIPTokenizer
import torch

# Load components individually
unet = UNet2DConditionModel.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    subfolder="unet"
)
vae = AutoencoderKL.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    subfolder="vae"
)
text_encoder = CLIPTextModel.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    subfolder="text_encoder"
)
tokenizer = CLIPTokenizer.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    subfolder="tokenizer"
)
scheduler = DDPMScheduler.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    subfolder="scheduler"
)

# Assemble pipeline (safety checker explicitly disabled)
pipe = StableDiffusionPipeline(
    unet=unet,
    vae=vae,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    scheduler=scheduler,
    safety_checker=None,
    feature_extractor=None,
    requires_safety_checker=False
)
```

### Custom denoising loop

```python
from diffusers import DDIMScheduler, AutoencoderKL, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer
from PIL import Image  # needed for Image.fromarray at the end
import torch

def custom_generate(
    prompt: str,
    num_steps: int = 50,
    guidance_scale: float = 7.5,
    height: int = 512,
    width: int = 512
):
    # Load components ("sd-model" is a placeholder checkpoint path)
    tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
    text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
    unet = UNet2DConditionModel.from_pretrained("sd-model", subfolder="unet")
    vae = AutoencoderKL.from_pretrained("sd-model", subfolder="vae")
    scheduler = DDIMScheduler.from_pretrained("sd-model", subfolder="scheduler")

    device = "cuda"
    text_encoder.to(device)
    unet.to(device)
    vae.to(device)

    # Encode prompt
    text_input = tokenizer(
        prompt,
        padding="max_length",
        max_length=77,
        truncation=True,
        return_tensors="pt"
    )
    text_embeddings = text_encoder(text_input.input_ids.to(device))[0]

    # Unconditional embeddings for classifier-free guidance
    uncond_input = tokenizer(
        "",
        padding="max_length",
        max_length=77,
        return_tensors="pt"
    )
    uncond_embeddings = text_encoder(uncond_input.input_ids.to(device))[0]

    # Concatenate for batch processing
    text_embeddings = torch.cat([uncond_embeddings, text_embeddings])

    # Initialize latents (the VAE downsamples by a factor of 8)
    latents = torch.randn(
        (1, 4, height // 8, width // 8),
        device=device
    )
    latents = latents * scheduler.init_noise_sigma

    # Denoising loop
    scheduler.set_timesteps(num_steps)
    for t in scheduler.timesteps:
        # Duplicate latents: one copy per unconditional/conditional embedding
        latent_model_input = torch.cat([latents] * 2)
        latent_model_input = scheduler.scale_model_input(latent_model_input, t)

        # Predict noise
        with torch.no_grad():
            noise_pred = unet(
                latent_model_input,
                t,
                encoder_hidden_states=text_embeddings
            ).sample

        # Classifier-free guidance
        noise_pred_uncond, noise_pred_cond = noise_pred.chunk(2)
        noise_pred = noise_pred_uncond + guidance_scale * (
            noise_pred_cond - noise_pred_uncond
        )

        # Update latents
        latents = scheduler.step(noise_pred, t, latents).prev_sample

    # Decode latents
    latents = latents / vae.config.scaling_factor
    with torch.no_grad():
        image = vae.decode(latents).sample

    # Convert to PIL
    image = (image / 2 + 0.5).clamp(0, 1)
    image = image.cpu().permute(0, 2, 3, 1).numpy()
    image = (image * 255).round().astype("uint8")[0]

    return Image.fromarray(image)
```

## IP-Adapter

Use image prompts alongside text:

```python
from diffusers import StableDiffusionPi