
Runbook Generator
Generate deployment, incident, and database maintenance runbook skeletons with rollback triggers and quarterly verification steps.
Overview
runbook-generator is an agent skill that produces deployment, incident, and database maintenance runbook skeletons for production services. It is most often used in Operate (infra), and also fits Ship (launch) and Operate (errors).
Install
npx skills add https://github.com/alirezarezvani/claude-skills --skill runbook-generator
What is this skill?
- Deployment runbook: pre-checks, deploy steps, smoke tests, rollback triggers
- Incident response: 5-minute triage, diagnosis, mitigation, postmortem actions
- Database maintenance: backup verification, migration lock-risk, vacuum/reindex routines
- Staleness detection hooks for Vercel, Helm, Terraform, and CI workflow files
- Python CLI emits dated runbook markdown with owner and environment metadata
- 5-step quarterly validation checklist
- Incident triage phase scoped to first 5 minutes
- 4 staleness-tracking config families (deploy, CI, schema, runtime env)
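The staleness idea above can be sketched as a small check that flags tracked config files modified after the runbook was last verified. This is only an illustration, not the skill's own hook: `stale_files` and `TRACKED_PATTERNS` are hypothetical names, and every glob other than `vercel.json`, `.github/workflows/*`, and `.gitlab-ci.yml` is an assumed repository layout. It uses file modification times for simplicity; since a fresh git checkout resets those, a commit-date lookup would likely be more reliable in practice.

```python
from __future__ import annotations

from datetime import date, datetime
from pathlib import Path

# Hypothetical glob patterns, grouped by the four config families.
# Only vercel.json, .github/workflows/* and .gitlab-ci.yml come from the skill;
# the other paths are assumptions about a typical repository layout.
TRACKED_PATTERNS = {
    "deploy": ["vercel.json", "charts/**/*.yaml", "**/*.tf"],
    "ci": [".github/workflows/*", ".gitlab-ci.yml"],
    "schema": ["migrations/**/*"],
    "runtime_env": [".env.example"],
}


def stale_files(repo_root: Path, last_verified: date) -> dict[str, list[str]]:
    """Return tracked files modified after the runbook's last-verified date."""
    stale: dict[str, list[str]] = {}
    for family, patterns in TRACKED_PATTERNS.items():
        for pattern in patterns:
            for path in sorted(repo_root.glob(pattern)):
                if not path.is_file():
                    continue
                modified = datetime.fromtimestamp(path.stat().st_mtime).date()
                if modified > last_verified:
                    relative = str(path.relative_to(repo_root))
                    stale.setdefault(family, []).append(relative)
    return stale
```

An empty result suggests the runbook still matches the tracked files; any non-empty family is a prompt to re-read the matching runbook section.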
Adoption & trust: 526 installs on skills.sh; 17.5k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You can deploy from memory but lack written rollback paths, escalation owners, and verified steps when something breaks at 2 a.m.
Who is it for?
Solo SaaS or API founders documenting first serious production operations on Vercel, Kubernetes, or scripted CI.
Skip if: your product has no production environment yet, or your team already maintains enterprise SRE playbooks with dedicated tooling.
When should I use this skill?
Preparing production deployment documentation, incident response playbooks, or database maintenance procedures for a named service.
What do I get? / Deliverables
You get structured runbook markdown and checklists linked to your real config and CI files, ready for staging validation and quarterly refresh.
- Markdown runbook with deploy, incident, or DB maintenance sections
- Last-verified date and escalation placeholders ready for staging drill
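Because the generated runbook records a `- Last verified: YYYY-MM-DD` line, the quarterly refresh can be checked mechanically. The sketch below is one possible way to do that, not part of the skill: `needs_reverification` is a hypothetical helper, and treating "quarterly" as 90 days is an assumption.

```python
from __future__ import annotations

import re
from datetime import date, timedelta

# Assumption: "quarterly" is approximated as 90 days.
MAX_AGE = timedelta(days=90)

# Matches the metadata line the generator emits, e.g. "- Last verified: 2024-01-01".
LAST_VERIFIED = re.compile(r"^- Last verified: (\d{4}-\d{2}-\d{2})\s*$", re.MULTILINE)


def needs_reverification(runbook_markdown: str, today: date) -> bool:
    """Return True when the runbook has no verification date or it is too old."""
    match = LAST_VERIFIED.search(runbook_markdown)
    if match is None:
        return True  # no date recorded: treat the runbook as unverified
    verified_on = date.fromisoformat(match.group(1))
    return today - verified_on > MAX_AGE
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the check deterministic in CI.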
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Runbooks are living production artifacts; the Operate phase infra shelf is where solo builders first institutionalize how services are deployed and recovered. Templates tie to deployment config, CI pipelines, and env schema—core infrastructure operations—not marketing or ideation.
Where it fits
Draft pre-deploy checks and smoke tests before flipping production traffic.
Document Helm or Vercel deploy sequence with explicit rollback triggers.
Capture triage and mitigation steps after the first sev-2 outage.
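An "explicit rollback trigger" is easiest to act on when it is written as a metric and a threshold rather than as a judgment call. The following is a minimal sketch under stated assumptions: the `RollbackTrigger` class, the metric names, and the threshold values are all hypothetical placeholders, and the skill itself does not prescribe any of them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RollbackTrigger:
    """A named metric with the value above which a rollback should start."""

    name: str
    threshold: float

    def fired(self, observed: float) -> bool:
        return observed > self.threshold


# Hypothetical triggers; replace names and thresholds with your service's own.
TRIGGERS = [
    RollbackTrigger("error_rate", 0.05),
    RollbackTrigger("p95_latency_ms", 1500.0),
]


def fired_triggers(metrics: dict[str, float]) -> list[str]:
    """Return the names of triggers whose observed metric exceeds its threshold."""
    return [
        trigger.name
        for trigger in TRIGGERS
        if trigger.name in metrics and trigger.fired(metrics[trigger.name])
    ]
```

During the post-deploy observation window, a non-empty result from `fired_triggers` would be the signal to begin the runbook's Rollback section.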
How it compares
Opinionated markdown runbook scaffolds—not a hosted incident platform or passive README.
Common Questions / FAQ
Who is runbook-generator for?
Indie builders wearing devops and support hats who need deploy and incident docs that stay aligned with actual pipeline and infra files.
When should I use runbook-generator?
At Ship (launch) before first prod deploy; in Operate (infra) when adding services; in Operate (errors) after incidents to capture mitigation and postmortem steps.
Is runbook-generator safe to install?
It includes a local Python generator; review SKILL.md and the Security Audits panel on this page before running scripts against production credentials or paths.
SKILL.md - Runbook Generator
# Runbook Templates

## Deployment Runbook Template

- Pre-deployment checks
- Deploy steps with expected output
- Smoke tests
- Rollback plan with explicit triggers
- Escalation and communication notes

## Incident Response Template

- Triage phase (first 5 minutes)
- Diagnosis phase (logs, metrics, recent deploys)
- Mitigation phase (containment and restoration)
- Resolution and postmortem actions

## Database Maintenance Template

- Backup and restore verification
- Migration sequencing and lock-risk notes
- Vacuum/reindex routines
- Verification queries and performance checks

## Staleness Detection Template

Track referenced config files and update runbooks whenever these change:

- deployment config (`vercel.json`, Helm charts, Terraform)
- CI pipelines (`.github/workflows/*`, `.gitlab-ci.yml`)
- data schema/migration definitions
- service runtime/env configuration

## Quarterly Validation Checklist

1. Execute commands in staging.
2. Validate expected outputs.
3. Test rollback paths.
4. Confirm contact/escalation ownership.
5. Update `Last verified` date.

#!/usr/bin/env python3
"""Generate an operational runbook skeleton for a service."""

from __future__ import annotations

import argparse
from datetime import date
from pathlib import Path


def build_runbook(service: str, owner: str, environment: str) -> str:
    today = date.today().isoformat()
    return f"""# Runbook - {service}

- Service: {service}
- Owner: {owner}
- Environment: {environment}
- Last verified: {today}

## Overview

Describe the service purpose, dependencies, and critical user impact.

## Preconditions

- Access to deployment platform
- Access to logs/metrics
- Access to secret/config manager

## Start Procedure

1. Pull latest config/secrets.
2. Start service process.
3. Confirm process is healthy.

```bash
# Example
# systemctl start {service}
```

## Stop Procedure

1. Drain traffic if applicable.
2. Stop service process.
3. Confirm no active workers remain.

```bash
# Example
# systemctl stop {service}
```

## Health Checks

- HTTP health endpoint
- Dependency connectivity checks
- Error-rate and latency checks

```bash
# Example
# curl -sf https://{service}.example.com/health
```

## Deployment Checklist

1. Verify CI status and artifact integrity.
2. Apply migrations (if required) in safe order.
3. Deploy service revision.
4. Run smoke checks.
5. Observe metrics for 10-15 minutes.

## Rollback

1. Identify last known good release.
2. Re-deploy previous version.
3. Re-run health checks.
4. Communicate rollback status to stakeholders.

```bash
# Example
# deployctl rollback --service {service}
```

## Incident Response

1. Classify severity.
2. Contain user impact.
3. Triage likely failing component.
4. Escalate if SLA risk is high.

## Escalation

- L1: On-call engineer
- L2: Service owner ({owner})
- L3: Platform/Engineering leadership

## Post-Incident

1. Write timeline and root cause.
2. Define corrective actions with owners.
3. Update this runbook with missing steps.
"""


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Generate a markdown runbook skeleton.")
    parser.add_argument("service", help="Service name")
    parser.add_argument("--owner", default="platform-team", help="Service owner label")
    parser.add_argument("--environment", default="production", help="Primary environment")
    parser.add_argument("--output", help="Optional output path (prints to stdout if omitted)")
    return parser.parse_args()


def main() -> int:
    args = parse_args()
    markdown = build_runbook(args.service, owner=args.owner, environment=args.environment)

    if args.output:
        path = Path(args.output)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(markdown, encoding="utf-8")
        print(f"Wrote runbook skeleton to {path}")
    else:
        print(markdown)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())

---
name: "runbook-generator"
descr