
openai/codex
7 skills · 2.6k installs · 628k stars · GitHub
Install
npx skills add https://github.com/openai/codex

Skills in this repo
1. Babysit Pr
Babysit-pr is a Codex-oriented agent skill for solo and indie builders who want hands-off oversight of an open GitHub pull request. After you open or point at a PR, the skill continuously polls review comments, CI checks and workflow runs, and mergeability, rather than treating a single idle snapshot as done while jobs are still pending. It diagnoses failures, retries flaky runs up to three times, and, when appropriate, fixes branch-related problems and pushes the fixes so the PR moves toward merge. Terminal outcomes are merged, closed, or a clear escalation when permissions, infrastructure, repeated flakes, or ambiguous situations need a human. Optional milestones like all-green and review-clean are progress states, not reasons to stop watching: late reviewer feedback still gets surfaced promptly. Install it when you ask your agent to monitor, watch, or babysit a PR through review and CI gates.
1.5k installs

2. Test Tui
Test TUI is a focused agent skill for contributors and solo maintainers working on OpenAI Codex’s terminal interface. It explains how to spin up the TUI interactively, raise observability to `RUST_LOG=trace`, and route logs to a chosen directory via `-c log_dir` so repro steps are shareable. The input sequencing note matters for anyone automating prompts: deliver message text and Enter as separate writes to mirror real terminal behavior and avoid flaky tests. The `just codex` entry point ties verification to the repo’s standard task runner, reducing drift between local hacks and CI-adjacent workflows. Reach for this skill when you have changed TUI rendering, key handling, or session flow and need a repeatable manual smoke pass before you call the change shippable.
1.2k installs

3. Code Breaking Changes
Code-breaking-changes is a Codex-oriented agent skill that directs exhaustive breaking-change analysis on external integration surfaces: HTTP app-server APIs, command-line parameters, configuration loading behavior, and resuming sessions from existing rollouts. Solo builders shipping APIs, CLIs, or agent backends use it when a dependency bump or refactor might change contracts without obvious compile errors. The skill’s posture is breadth-first: do not treat one discovered issue as sufficient, but enumerate every plausible way consumers could break. It fits the Ship phase as a structured review pass before tagging a release or merging a risky PR, and it remains useful in Operate when you validate an upgrade against production integration patterns. It does not implement fixes; it drives systematic discovery so you can patch, document, or semver appropriately.
1 install

4. Code Review
Codex code-review is a repository-local orchestrator for solo builders and tiny teams who want a structured final pass on a pull request without manually invoking a dozen review playbooks. Instead of one shallow lint comment, it spawns subagents, one per sibling code-review-* skill, and aggregates unlimited, numbered Markdown findings that each cite a concrete file and line. That design suits Ship when you are hardening a feature branch, preparing a launch candidate, or closing a security-sensitive change where missed issues are costly. The skill encodes GitHub workflow edges: add a code-reviewed label when the reviewer owns the PR, and stay silent on GitHub threads unless asked, which keeps noisy bots off your repo while still giving you a dense report inside the agent session. It is intermediate in complexity because you must maintain the code-review-* skill family and pass paths correctly to subagents. Use it after implementation and tests, as a gate before merge, and again in Operate when reviewing hotfix PRs under time pressure.
1 install

5. Code Review Change Size
Code Review Change Size is a compact OpenAI Codex skill that sets explicit line-count guardrails so code review, especially with an agent, stays accurate and actionable. Solo builders shipping through GitHub or similar flows install it to prevent thousand-line dumps that models skim or misread. Unless the work is mechanical refactoring, total changed lines should not exceed eight hundred; when logic is intricate, the bar drops to five hundred. If a diff is larger, the skill instructs the agent to explain whether the work can be split into reviewable stages and to name the smallest coherent slice to merge first, grounded in the real patch graph and affected call sites. It pairs naturally with Codex-driven review workflows and team norms for incremental delivery.
1 install

6. Code Review Context
Code Review Context documents how OpenAI Codex assembles the message history and injected fragments sent to the model during code review inference. It is aimed at people extending or auditing Codex itself: keep context incremental, avoid changes that bust caches, cap every injected piece, and implement fragments as typed structs in `core/context`. Items approaching or exceeding about one thousand tokens need extra manual review; nothing may exceed ten thousand tokens. Solo builders installing this from a skills catalog should treat it as a reference for agent-context design during review automation, not as a substitute for human or linter-driven review of their own app repo. It is advanced in complexity and specific to the Codex repo.
1 install

7. Code Review Testing
Code Review Testing is a Codex-repo agent skill that tells contributors how to prove agent changes safely. Solo builders and small teams hacking on Codex itself use it whenever they alter decision paths, tools, or user-visible agent behavior. The guidance is explicit: favor integration tests under `core/suite` that spin up a real test instance through `test_codex`, because unit tests alone miss cross-component failures in an agent stack. When logic shifts materially, you must list the major behavioral changes and cover them in integration tests. If unit coverage is still needed, isolate it in `*_tests.rs` files and keep production modules free of test-only hooks. The skill also nudges you to search for existing helpers so new tests stay readable. It spans active Build work and Ship verification, but catalogs best on the testing shelf where release confidence is decided.
1 install
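
The line-count guardrails described for Code Review Change Size can be pictured as a small check. This is only an illustrative sketch, not the skill's actual implementation: the function name `check_change_size`, the `SizeVerdict` type, and the idea that "changed lines" is a single precomputed integer are all assumptions made here for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds quoted in the Code Review Change Size description.
DEFAULT_LIMIT = 800    # ordinary, non-mechanical changes
INTRICATE_LIMIT = 500  # changes with intricate logic


@dataclass
class SizeVerdict:
    limit: Optional[int]  # None means no limit applies (mechanical refactor)
    within_limit: bool


def check_change_size(changed_lines: int, *, intricate: bool = False,
                      mechanical: bool = False) -> SizeVerdict:
    """Hypothetical helper: compare a diff's changed-line count to the guardrail."""
    if mechanical:
        # The description exempts mechanical refactoring from the cap.
        return SizeVerdict(limit=None, within_limit=True)
    limit = INTRICATE_LIMIT if intricate else DEFAULT_LIMIT
    return SizeVerdict(limit=limit, within_limit=changed_lines <= limit)
```

When `within_limit` is false, the skill as described would have the agent propose a staged split and name the smallest coherent slice to merge first; that step needs judgment about the actual patch and is not modeled here.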
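
The token caps mentioned for Code Review Context (extra review around one thousand tokens, a hard ceiling of ten thousand) can be sketched as a three-way classification. The description says real fragments are typed structs in `core/context`; the Python below is purely illustrative, and `classify_fragment`, `FragmentStatus`, and the 0.9 "approaching" margin are assumptions introduced for this example.

```python
from enum import Enum

REVIEW_THRESHOLD = 1_000  # "about one thousand tokens" per the description
HARD_CAP = 10_000         # "nothing may exceed ten thousand tokens"


class FragmentStatus(Enum):
    OK = "ok"
    NEEDS_MANUAL_REVIEW = "needs_manual_review"
    REJECTED = "rejected"


def classify_fragment(token_count: int, review_margin: float = 0.9) -> FragmentStatus:
    """Hypothetical check for one injected context fragment.

    review_margin is an assumed way to model "approaching" the threshold:
    with 0.9, fragments of 900 tokens or more are flagged for review.
    """
    if token_count > HARD_CAP:
        return FragmentStatus.REJECTED
    if token_count >= REVIEW_THRESHOLD * review_margin:
        return FragmentStatus.NEEDS_MANUAL_REVIEW
    return FragmentStatus.OK
```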
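
The watch loop described for Babysit Pr (keep polling, retry flaky runs up to three times, stop only on merged, closed, or escalation) can be outlined as a small state machine. This is a sketch under stated assumptions, not the skill's code: the `Snapshot` type, its string states, and the `babysit` function are hypothetical, and real polling of GitHub is replaced by an iterable of pre-built snapshots.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

MAX_FLAKY_RETRIES = 3  # "retries flaky runs up to three times"


@dataclass
class Snapshot:
    state: str  # assumed values: "open", "merged", "closed"
    ci: str     # assumed values: "pending", "passed", "flaky_failure", "blocked"


def babysit(snapshots: Iterable[Snapshot], rerun: Callable[[], None]) -> str:
    """Hypothetical loop: returns "merged", "closed", "escalate", or "still_watching"."""
    retries = 0
    for snap in snapshots:
        if snap.state in ("merged", "closed"):
            return snap.state  # terminal outcomes
        if snap.ci == "blocked":
            return "escalate"  # e.g. permissions or infrastructure need a human
        if snap.ci == "flaky_failure":
            if retries >= MAX_FLAKY_RETRIES:
                return "escalate"  # repeated flakes
            retries += 1
            rerun()
        # "pending" and "passed" are progress states, so keep watching.
    return "still_watching"  # snapshots ran out without a terminal outcome
```

Note that an all-green snapshot does not end the loop, which mirrors the point that milestones are progress states rather than reasons to stop watching.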