
Stable Diffusion Image Generation
Wire Stable Diffusion image generation into agent workflows or apps using custom Hugging Face diffusers pipelines and denoising loops.
Install
npx skills add https://github.com/orchestra-research/ai-research-skills --skill stable-diffusion-image-generation
What is this skill?
- Load UNet, VAE, CLIP text encoder, tokenizer, and scheduler as separate pretrained components
- Assemble a StableDiffusionPipeline with optional safety_checker disabled for controlled setups
- Implement a custom denoising loop with configurable steps, guidance_scale, height, and width
- Use DDIMScheduler and manual torch-based inference for advanced tuning
- Python-first patterns using diffusers and transformers
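The height, width, and guidance_scale parameters listed above drive two small pieces of arithmetic in the denoising loop, sketched here in plain Python. The helper names `latent_shape` and `apply_guidance` are hypothetical and not part of diffusers. The defaults of 4 latent channels and a VAE downscale factor of 8 are taken from the latent initialization in the skill's own loop and may differ for other models.

```python
from typing import List, Tuple


def latent_shape(
    height: int,
    width: int,
    batch_size: int = 1,
    latent_channels: int = 4,   # assumption: SD 1.x latent channel count
    vae_scale_factor: int = 8,  # assumption: SD 1.x VAE downscale factor
) -> Tuple[int, int, int, int]:
    """Return the (batch, channels, h, w) shape of the initial noise latents."""
    if height % vae_scale_factor or width % vae_scale_factor:
        raise ValueError(
            f"height and width must be divisible by {vae_scale_factor}"
        )
    return (
        batch_size,
        latent_channels,
        height // vae_scale_factor,
        width // vae_scale_factor,
    )


def apply_guidance(
    noise_uncond: List[float],
    noise_cond: List[float],
    guidance_scale: float,
) -> List[float]:
    """Classifier-free guidance: move from the unconditional prediction
    toward the conditional one, scaled by guidance_scale."""
    return [
        u + guidance_scale * (c - u)
        for u, c in zip(noise_uncond, noise_cond)
    ]
```

With `guidance_scale=1.0` the result equals the conditional prediction. Larger values push the result further along the direction from the unconditional to the conditional prediction.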
Adoption & trust: 1 install on skills.sh; 9.4k GitHub stars; 2/3 security scanners passed (skills.sh audits).
Journey fit
Primary fit
Image generation is implemented as a product or agent integration during the build phase, not as launch or growth distribution work. The guide focuses on assembling diffusers components and custom generation loops, which is classic third-party ML integration work.
Common Questions / FAQ
Is Stable Diffusion Image Generation safe to install?
skills.sh reports 2 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
# Stable Diffusion Advanced Usage Guide

## Custom Pipelines

### Building from components

```python
from diffusers import (
    UNet2DConditionModel,
    AutoencoderKL,
    DDPMScheduler,
    StableDiffusionPipeline
)
from transformers import CLIPTextModel, CLIPTokenizer
import torch

# Load components individually
unet = UNet2DConditionModel.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    subfolder="unet"
)
vae = AutoencoderKL.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    subfolder="vae"
)
text_encoder = CLIPTextModel.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    subfolder="text_encoder"
)
tokenizer = CLIPTokenizer.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    subfolder="tokenizer"
)
scheduler = DDPMScheduler.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    subfolder="scheduler"
)

# Assemble pipeline
pipe = StableDiffusionPipeline(
    unet=unet,
    vae=vae,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    scheduler=scheduler,
    safety_checker=None,
    feature_extractor=None,
    requires_safety_checker=False
)
```

### Custom denoising loop

```python
from diffusers import DDIMScheduler, AutoencoderKL, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer
from PIL import Image  # needed for Image.fromarray below
import torch

def custom_generate(
    prompt: str,
    num_steps: int = 50,
    guidance_scale: float = 7.5,
    height: int = 512,
    width: int = 512
):
    # Load components ("sd-model" is a placeholder for a model ID or local path)
    tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
    text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
    unet = UNet2DConditionModel.from_pretrained("sd-model", subfolder="unet")
    vae = AutoencoderKL.from_pretrained("sd-model", subfolder="vae")
    scheduler = DDIMScheduler.from_pretrained("sd-model", subfolder="scheduler")

    device = "cuda"
    text_encoder.to(device)
    unet.to(device)
    vae.to(device)

    # Encode prompt
    text_input = tokenizer(
        prompt,
        padding="max_length",
        max_length=77,
        truncation=True,
        return_tensors="pt"
    )
    # Unconditional (empty prompt) input for classifier-free guidance
    uncond_input = tokenizer(
        "",
        padding="max_length",
        max_length=77,
        return_tensors="pt"
    )
    # No gradients are needed at inference time
    with torch.no_grad():
        text_embeddings = text_encoder(text_input.input_ids.to(device))[0]
        uncond_embeddings = text_encoder(uncond_input.input_ids.to(device))[0]

    # Concatenate for batch processing: [unconditional, conditional]
    text_embeddings = torch.cat([uncond_embeddings, text_embeddings])

    # Set timesteps first: some schedulers derive init_noise_sigma from them
    scheduler.set_timesteps(num_steps)

    # Initialize latents
    latents = torch.randn(
        (1, 4, height // 8, width // 8),
        device=device
    )
    latents = latents * scheduler.init_noise_sigma

    # Denoising loop
    for t in scheduler.timesteps:
        # Duplicate latents so one UNet pass covers both embeddings
        latent_model_input = torch.cat([latents] * 2)
        latent_model_input = scheduler.scale_model_input(latent_model_input, t)

        # Predict noise
        with torch.no_grad():
            noise_pred = unet(
                latent_model_input,
                t,
                encoder_hidden_states=text_embeddings
            ).sample

        # Classifier-free guidance
        noise_pred_uncond, noise_pred_cond = noise_pred.chunk(2)
        noise_pred = noise_pred_uncond + guidance_scale * (
            noise_pred_cond - noise_pred_uncond
        )

        # Update latents
        latents = scheduler.step(noise_pred, t, latents).prev_sample

    # Decode latents
    latents = latents / vae.config.scaling_factor
    with torch.no_grad():
        image = vae.decode(latents).sample

    # Convert to PIL: map [-1, 1] to [0, 255] and move channels last
    image = (image / 2 + 0.5).clamp(0, 1)
    image = image.cpu().permute(0, 2, 3, 1).numpy()
    image = (image * 255).round().astype("uint8")[0]

    return Image.fromarray(image)
```

## IP-Adapter

Use image prompts alongside text:

```python
from diffusers import StableDiffusionPi