
Ace Step
Generate, inpaint, or extend stereo music tracks with StepFun ACE Step through the RunComfy CLI when you need cheap, tag-driven audio for demos, apps, or ads.
Overview
ACE Step is an agent skill for the Build phase that generates and edits music via the RunComfy CLI using StepFun-AI’s ACE Step text-to-audio, inpaint, and outpaint endpoints.
Install
npx skills add https://github.com/agentspace-so/runcomfy-agent-skills --skill ace-step
What is this skill?
- Four CLI endpoints: ACE Step text-to-audio, ACE Step 1.5 (50+ language lyrics), audio-inpaint, and audio-outpaint
- Tag-driven prompts (genre, mood, instruments) plus multilingual lyrics with section markers
- Stereo output from 5 seconds up to 4 minutes per generation call
- Pricing of roughly $0.0002–0.0003 per second of audio (~27× cheaper than ElevenLabs Music, according to the skill's own copy)
- Explicit trigger phrases for ace-step, inpaint/outpaint, and cheap open-weights music generation
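The per-second pricing in the list above makes it straightforward to estimate a call's cost before running it. The following is a minimal sketch, assuming billing is linear in output duration at the quoted rates ($0.0002/s for the base model, $0.0003/s for ACE Step 1.5) and that durations are whole seconds. `estimate_cost` is a hypothetical helper, not part of the RunComfy CLI.

```bash
# Hypothetical helper: estimate the cost of one generation call in USD.
# Assumes linear per-second billing at the rates quoted in this listing.
estimate_cost() {
  local rate="$1"      # USD per second of audio, e.g. 0.0002
  local seconds="$2"   # requested duration in whole seconds

  # The listing gives 5 s to 4 min (240 s) as the per-call range.
  if [ "$seconds" -lt 5 ] || [ "$seconds" -gt 240 ]; then
    echo "duration must be between 5 and 240 seconds" >&2
    return 1
  fi

  # awk handles the floating-point multiplication; LC_ALL=C keeps "." as the decimal mark.
  LC_ALL=C awk -v r="$rate" -v s="$seconds" 'BEGIN { printf "%.4f\n", r * s }'
}
```

For example, `estimate_cost 0.0002 240` prints the estimate for a full four-minute track on the base model.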
Adoption & trust: 142k installs on skills.sh; 15 GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need custom music or surgical edits to a track, but premium per-second APIs or local model hosting would blow the budget for a solo ship cycle.
Who is it for?
Indie builders who already use RunComfy and want predictable, tag-based music generation plus inpaint/outpaint fixes without switching tools.
Skip if: your team needs guaranteed chart-ready vocal production quality on every take. Route to the ai-music skill’s ElevenLabs path or a human mix instead.
When should I use this skill?
User says ace step, ace-step, acestep, ACE music, cheap AI music, inpaint audio, extend music, audio outpaint, music with tags, or asks to generate or edit music with ACE Step on RunComfy.
What do I get? / Deliverables
You get generated or patched stereo audio files with documented `runcomfy` invocations, ranging from short loops to four-minute cuts at open-weights pricing.
- Generated stereo audio file(s) from text-to-audio or ACE Step 1.5
- Inpainted or outpainted audio segments with documented time ranges
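When an agent assembles the `--input` JSON for a text-to-audio call from free-form tags, quotes and backslashes in the tag string need escaping. The sketch below shows one way to do that. It assumes only the `tags` field that appears in the SKILL.md example. `tags_payload` is a hypothetical helper, and the example tags are illustrative.

```bash
# Hypothetical helper: build the --input JSON for a tag-driven request.
# Only the "tags" field from the SKILL.md example is emitted; other fields are out of scope here.
tags_payload() {
  local escaped
  # Escape backslashes first, then double quotes, so the string is valid inside JSON quotes.
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"tags": "%s"}\n' "$escaped"
}
```

The result can be passed to the documented command as `--input "$(tags_payload 'lofi, piano, calm')"`.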
Journey fit
ACE Step is wired as an agent-callable RunComfy CLI integration—the canonical shelf is Build → Integrations where external media APIs are invoked from the coding agent. All four endpoints are reached via `runcomfy run` Bash tooling, not a standalone creative workflow, so integrations is the correct subphase.
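Because every endpoint goes through the same `runcomfy run` command, choosing one comes down to choosing an endpoint ID. A minimal sketch of that lookup follows. The endpoint IDs are the four listed in SKILL.md, while the task keywords and the `ace_endpoint` function name are hypothetical.

```bash
# Hypothetical helper: map a task keyword to one of the four ACE Step endpoint IDs.
# The keywords on the left are illustrative; the IDs on the right come from SKILL.md.
ace_endpoint() {
  case "$1" in
    generate)     echo "acestep-ai/ace-step/text-to-audio" ;;      # base model, cheapest
    multilingual) echo "acestep-ai/ace-step-1.5/text-to-audio" ;;  # ACE Step 1.5
    inpaint)      echo "acestep-ai/ace-step/audio-inpaint" ;;      # regenerate a time range
    outpaint)     echo "acestep-ai/ace-step/audio-outpaint" ;;     # extend before or after
    *)
      echo "unknown task: $1" >&2
      return 1
      ;;
  esac
}
```

A call would then look like `runcomfy run "$(ace_endpoint generate)" --input '{"tags": "..."}' --output-dir ./out`.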
How it compares
A RunComfy CLI integration for one music model family, not the broader ai-music router that picks ElevenLabs versus ACE Step by intent.
Common Questions / FAQ
Who is ace-step for?
Solo and indie builders using Claude Code, Cursor, or similar agents who want StepFun ACE Step music generation and edits through the documented RunComfy CLI commands.
When should I use ace-step?
Use it during Build when wiring media into a product or content pipeline—game loops, demo soundtracks, tagged instrumentals, multilingual lyric drafts—or when you must inpaint a bad section or outpaint a short draft into a longer cut before launch assets ship.
Is ace-step safe to install?
It grants Bash access to `runcomfy` and bills your RunComfy account per second of audio; review the Security Audits panel on this Prism page and your API spend limits before enabling it in autonomous agents.
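For autonomous agents, one possible safeguard is a local pre-flight check that refuses a call whose estimated cost exceeds a per-call budget. This is a sketch under the assumption of linear per-second billing. `within_budget` is a hypothetical helper and does not replace account-level spend limits on RunComfy.

```bash
# Hypothetical pre-flight check for agent-driven calls.
# Returns 0 when rate * seconds fits within the budget, 1 otherwise.
within_budget() {
  local rate="$1"      # USD per second of audio
  local seconds="$2"   # requested duration in seconds
  local budget="$3"    # maximum USD allowed for this call

  # awk exits 0 when the comparison holds and 1 when it does not.
  LC_ALL=C awk -v r="$rate" -v s="$seconds" -v b="$budget" \
    'BEGIN { exit !(r * s <= b) }'
}
```

An agent wrapper could run `within_budget 0.0003 240 0.10 || exit 1` before invoking `runcomfy run`.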
Workflow Chain
Then invoke: ai-music
SKILL.md
# ACE Step: Pro Pack on RunComfy

Tag-driven music generation, inpainting, and outpainting with StepFun-AI's **ACE Step** open-weights model. Four CLI-reachable endpoints, $0.0002–0.0003 per second of audio, up to 4 minutes per call.

[runcomfy.com](https://www.runcomfy.com/?utm_source=skills.sh&utm_medium=skill&utm_campaign=ace-step) · [ACE Step base](https://www.runcomfy.com/models/acestep-ai/ace-step/text-to-audio?utm_source=skills.sh&utm_medium=skill&utm_campaign=ace-step) · [ACE Step 1.5](https://www.runcomfy.com/models/acestep-ai/ace-step-1.5/text-to-audio?utm_source=skills.sh&utm_medium=skill&utm_campaign=ace-step) · [CLI docs](https://docs.runcomfy.com/cli/introduction?utm_source=skills.sh&utm_medium=skill&utm_campaign=ace-step)

## Install this skill

```bash
npx skills add agentspace-so/runcomfy-agent-skills --skill ace-step -g
```

## Powered by the RunComfy CLI

**Step 1: install** (one of, see the `runcomfy-cli` skill for details):

```bash
npm i -g @runcomfy/cli            # global install
npx -y @runcomfy/cli --version    # zero-install
```

**Step 2: sign in** (or set `RUNCOMFY_TOKEN` env var in CI / containers):

```bash
runcomfy login
```

**Step 3: generate**:

```bash
runcomfy run acestep-ai/ace-step/text-to-audio \
  --input '{"tags": "..."}' \
  --output-dir ./out
```

CLI deep dive: [`runcomfy-cli`](https://www.skills.sh/agentspace-so/runcomfy-agent-skills/runcomfy-cli) skill.

---

## Pick the right endpoint

Listed newest first.

**ACE Step 1.5 (text-to-audio)**: `acestep-ai/ace-step-1.5/text-to-audio`

> Latest ACE Step generation. **50+ language vocal support**, refined structured-lyric handling, otherwise same shape as base. Slightly higher cost ($0.0003/s vs $0.0002/s).
>
> Pick for: multilingual lyrics, hero-quality vocal tracks, vocal songs that need clean section structure.
>
> Avoid for: cost-sensitive batches where the base model is good enough.

**ACE Step (text-to-audio)**: `acestep-ai/ace-step/text-to-audio` *(default: cheap & fast)*

> Original ACE Step. Tag-driven composition, optional lyrics, 5–240 s stereo. $0.0002/s, ~27× cheaper than ElevenLabs Music.
>
> Pick for: high-volume drafts, background music, jingles, game loops, cost-sensitive iteration.
>
> Avoid for: maximally polished commercial vocal hooks; try **ACE Step 1.5** or **ElevenLabs Music** for those.

**ACE Step (audio-inpaint)**: `acestep-ai/ace-step/audio-inpaint`

> Regenerate a **time range** inside an existing track (not mask-based; uses `start_time` / `end_time` in seconds, each anchored to track start or end).
>
> Pick for: fix a bad chorus in the middle, swap the bridge, replace a 20 s section without re-rendering the whole song.
>
> Avoid for: edits that aren't time-bounded; those don't fit the schema.

**ACE Step (audio-outpaint)**: `acestep-ai/ace-step/audio-outpaint`

> Extend an existing track **bidirectionally**: add intro before, outro after, or both.
>
> Pick for: