
Debugging And Error Recovery
Stop feature churn and run a ordered triage process when tests, builds, or runtime behavior break unexpectedly.
Install
npx skills add https://github.com/addyosmani/agent-skills --skill debugging-and-error-recoveryWhat is this skill?
- Stop-the-line rule: halt features, preserve evidence, diagnose, fix root cause, guard, then resume after verification
- Structured triage checklist starting with reliable reproduction before deeper diagnosis
- Applies to test failures, build breaks, runtime mismatches, bug reports, log/console errors, and regressions
- Explicit anti-pattern: do not push past failing tests or broken builds to the next feature
- Root-cause focus instead of guess-and-patch loops
Adoption & trust: 4.2k installs on skills.sh; 49.1k GitHub stars; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).
Recommended Skills
Journey fit
Canonical shelf is ship/testing because the skill foregrounds failing tests and broken builds immediately after change—the stop-the-line moment before release. Testing is where reproducible failures and verification gates belong; the checklist resumes work only after guards pass.
Common Questions / FAQ
Is Debugging And Error Recovery safe to install?
skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
READMESKILL.md - Debugging And Error Recovery
# Debugging and Error Recovery ## Overview Systematic debugging with structured triage. When something breaks, stop adding features, preserve evidence, and follow a structured process to find and fix the root cause. Guessing wastes time. The triage checklist works for test failures, build errors, runtime bugs, and production incidents. ## When to Use - Tests fail after a code change - The build breaks - Runtime behavior doesn't match expectations - A bug report arrives - An error appears in logs or console - Something worked before and stopped working ## The Stop-the-Line Rule When anything unexpected happens: ``` 1. STOP adding features or making changes 2. PRESERVE evidence (error output, logs, repro steps) 3. DIAGNOSE using the triage checklist 4. FIX the root cause 5. GUARD against recurrence 6. RESUME only after verification passes ``` **Don't push past a failing test or broken build to work on the next feature.** Errors compound. A bug in Step 3 that goes unfixed makes Steps 4-10 wrong. ## The Triage Checklist Work through these steps in order. Do not skip steps. ### Step 1: Reproduce Make the failure happen reliably. If you can't reproduce it, you can't fix it with confidence. ``` Can you reproduce the failure? ├── YES → Proceed to Step 2 └── NO ├── Gather more context (logs, environment details) ├── Try reproducing in a minimal environment └── If truly non-reproducible, document conditions and monitor ``` **When a bug is non-reproducible:** ``` Cannot reproduce on demand: ├── Timing-dependent? │ ├── Add timestamps to logs around the suspected area │ ├── Try with artificial delays (setTimeout, sleep) to widen race windows │ └── Run under load or concurrency to increase collision probability ├── Environment-dependent? │ ├── Compare Node/browser versions, OS, environment variables │ ├── Check for differences in data (empty vs populated database) │ └── Try reproducing in CI where the environment is clean ├── State-dependent? │ ├── Check for leaked state between tests or requests │ ├── Look for global variables, singletons, or shared caches │ └── Run the failing scenario in isolation vs after other operations └── Truly random? ├── Add defensive logging at the suspected location ├── Set up an alert for the specific error signature └── Document the conditions observed and revisit when it recurs ``` For test failures: ```bash # Run the specific failing test npm test -- --grep "test name" # Run with verbose output npm test -- --verbose # Run in isolation (rules out test pollution) npm test -- --testPathPattern="specific-file" --runInBand ``` ### Step 2: Localize Narrow down WHERE the failure happens: ``` Which layer is failing? ├── UI/Frontend → Check console, DOM, network tab ├── API/Backend → Check server logs, request/response ├── Database → Check queries, schema, data integrity ├── Build tooling → Check config, dependencies, environment ├── External service → Check connectivity, API changes, rate limits └── Test itself → Check if the test is correct (false negative) ``` **Use bisection for regression bugs:** ```bash # Find which commit introduced the bug git bisect start git bisect bad # Current commit is broken git bisect good <known-good-sha> # This commit worked # Git will checkout midpoint commits; run your test at each git bisect run npm test -- --grep "failing test" ``` ### Step 3: Reduce Create the minimal failing case: - Remove unrelated code/config until only the bug remains - Simplify the input to the smallest example that triggers the failure - Strip the test to the bare minimum that reproduces the issue A