
Geored SRE Skill

geored-sre-skill is a Claude Code plugin for the Operate phase that guides systematic, read-only Kubernetes incident investigation and root cause analysis.

by geored · github.com/geored/sre-skill

Run structured Kubernetes incident triage in Claude Code when pods crash, degrade, or fail health checks instead of guessing from a single log line.

Install

Add it to Claude Code

Install the plugin in Claude Code. One command, paste-ready.

Install the plugin
/plugin install geored-sre-skill@geored/sre-skill
Agent API

Built to be called by your agent

Skillselion is itself an MCP server. Your agent can pull this entry and a paste-ready install config straight from the API - no copy-paste.

Retrieve this entry with skillselion.get_details("plugin:geored/sre-skill") and the paste-ready config with skillselion.get_install_config("plugin:geored/sre-skill").

About

What it does

geored-sre-skill is a Claude Code plugin that teaches a systematic, read-only Kubernetes incident investigation flow aligned with SRE practice. Solo builders and small teams shipping on Kubernetes install it when an outage or flaky rollout calls for fast, evidence-based answers instead of ad-hoc kubectl spelunking. The skill emphasizes correlation across logs, events, and metrics, walks through common pod and network failure modes, and pushes toward root cause before suggesting fixes. It fits anyone who owns cluster health but does not want the agent to mutate production blindly. Use it during active incidents, post-mortem prep, or when guiding an agent through a structured five-phase checklist. Complexity is intermediate: you should already know basic kubectl and pod lifecycle concepts. The outcome is a documented chain of evidence, a likely cause, and next remediation steps you can execute or ticket.

Highlights

  • Five-phase investigation methodology that correlates logs, events, and metrics using read-only kubectl commands
  • Common failure pattern library for pod, network, and resource incidents
  • Multi-source triage distinguishing symptoms from underlying causes
  • Actionable remediation recommendations without inventing cluster state
  • Structured SRE workflow for Kubernetes service degradation
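
The read-only constraint in the highlights above can be illustrated with a small guard. This is a minimal sketch, not the plugin's implementation: the verb allowlist and the function names is_read_only and evidence_commands are assumptions made for illustration, and the plugin's actual phases and commands are not documented on this page. The kubectl subcommands and flags used are standard ones.

```python
import shlex

# Assumed allowlist of kubectl verbs that only read cluster state.
# The plugin's real allowlist is not documented; this is illustrative.
READ_ONLY_VERBS = {"get", "describe", "logs", "top", "events"}


def is_read_only(command: str) -> bool:
    """Return True if the command is `kubectl <verb> ...` with a read-only verb.

    Simplification: the verb must be the first argument after `kubectl`,
    so commands with leading global flags are rejected rather than parsed.
    """
    argv = shlex.split(command)
    return len(argv) >= 2 and argv[0] == "kubectl" and argv[1] in READ_ONLY_VERBS


def evidence_commands(namespace: str, pod: str) -> list[str]:
    """Build commands that gather status, logs, events, and metrics for one pod."""
    ns = shlex.quote(namespace)
    name = shlex.quote(pod)
    return [
        f"kubectl get pod {name} -n {ns} -o wide",
        f"kubectl describe pod {name} -n {ns}",
        f"kubectl logs {name} -n {ns} --previous --tail=200",
        f"kubectl get events -n {ns} --sort-by=.lastTimestamp",
        f"kubectl top pod {name} -n {ns}",  # needs metrics-server in the cluster
    ]
```

Rejecting anything outside the allowlist is deliberately conservative: a valid read-only command with unusual flag ordering is refused, which is the safer failure mode during an incident.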

Why builders use it

When a cluster misbehaves, solo builders waste time jumping between kubectl commands without a consistent way to tie symptoms to root cause.

After invoking the skill, you get a structured incident narrative, correlated evidence, and prioritized remediation ideas grounded in what you actually observed.

At a glance

  • Type - Plugin in DevOps.
  • Adoption - 0 installs, 1 star, 0 votes.

FAQ

Who is geored-sre-skill for?

Builders who operate their own Kubernetes workloads and want Claude Code to investigate outages using logs, events, and metrics in a fixed multi-phase workflow.

When should I use geored-sre-skill?

Use it when pods fail health checks, crash loop, cannot pull images, hit OOM, or show network issues and you need root cause analysis before changing the cluster.
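
As a rough illustration of how those symptoms show up in pod status, the sketch below maps a pod object, as parsed from `kubectl get pod <name> -o json`, to symptom labels. The reason strings (CrashLoopBackOff, ImagePullBackOff, ErrImagePull, OOMKilled) are standard Kubernetes values; the function name classify_pod, the label wording, and the "not ready" heuristic are assumptions for illustration and are not taken from the plugin.

```python
# Waiting reasons reported in a pod's status.containerStatuses.
WAITING_REASONS = {
    "CrashLoopBackOff": "crash loop",
    "ImagePullBackOff": "image pull failure",
    "ErrImagePull": "image pull failure",
}


def classify_pod(pod: dict) -> list:
    """Map a parsed pod object to a de-duplicated list of symptom labels."""
    findings = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        state = cs.get("state", {})
        waiting = state.get("waiting") or {}
        label = WAITING_REASONS.get(waiting.get("reason"))
        if label:
            findings.append(label)
        # OOMKilled can appear on the current or the previous termination.
        for key in ("state", "lastState"):
            terminated = cs.get(key, {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                findings.append("out of memory")
        # Assumption: running but not ready suggests a failing readiness probe.
        if "running" in state and not cs.get("ready", False):
            findings.append("not ready")
    # Preserve first-seen order while dropping duplicates.
    return list(dict.fromkeys(findings))
```

These labels are symptoms, not root causes. A crash loop paired with an earlier OOMKilled termination, for example, points the investigation at memory limits rather than at the restart itself.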

How do I add geored-sre-skill to my agent?

Install the geored/sre-skill Claude Code plugin from the registry, ensure kubectl context points at the target cluster, then invoke the skill with a concise incident summary and namespace scope.
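
The context check mentioned above can be scripted before starting an investigation. This is a sketch under stated assumptions: the function name context_matches and the context names in the usage note are hypothetical, and it relies only on the standard `kubectl config current-context` command.

```python
import subprocess
from typing import Callable, List


def _kubectl(args: List[str]) -> str:
    """Run kubectl with the given arguments and return its stdout."""
    result = subprocess.run(
        ["kubectl", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def context_matches(expected: str, run: Callable[[List[str]], str] = _kubectl) -> bool:
    """Return True if the active kubectl context is the expected one.

    `run` is injectable so the check can be exercised without a cluster.
    """
    return run(["config", "current-context"]).strip() == expected
```

For example, context_matches("staging-cluster") returns False when kubectl is still pointed at a different cluster, which is a cue to switch contexts before handing the incident summary to the skill.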
