
Automate This
Turn a screen recording of a repetitive manual workflow into analyzed steps and working automation scripts using tools already on the machine.
Overview
Automate This is an agent skill most often used in Build (also Operate, Grow) that analyzes screen recordings and produces working automation scripts from extracted frames and optional narration.
Install
npx skills add https://github.com/github/awesome-copilot --skill automate-this
What is this skill?
- Silent prerequisite checks for ffmpeg (required) and Whisper (optional for narrated audio)
- Frame extraction plus optional audio transcription to reconstruct step-by-step workflows from screen recordings
- Proposes automation at multiple complexity levels using locally installed tools
- Phase 1 pipeline: extracts visual frames and audio from the video file (typically under ~/Desktop/) before any scripting
- Graceful fallback to visual-only analysis when Whisper is declined or unavailable
- Staged flow: prerequisite checks run first, then Phase 1 extracts content from the recording
- ffmpeg required; Whisper optional when recordings include narration
Adoption & trust: 2k installs on skills.sh; 34.6k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You repeat the same clicks, exports, and file chores often enough that they are worth recording, but you lack the time to reverse-engineer and script the workflow yourself.
Who is it for?
Builders who can record their screen once and want ffmpeg-based extraction plus optional Whisper transcription before script generation.
Skip if: you need fully unattended RPA on locked-down machines without ffmpeg, or the process cannot be shown safely on screen.
When should I use this skill?
User provides a screen recording of a manual or tedious process and wants automation scripts derived from video (frames and optional audio).
What do I get? / Deliverables
You get a reconstructed procedure and one or more automation options aligned to local tooling, ready for you to review and run—not a vague “you could automate this” suggestion.
- Reconstructed step-by-step workflow
- Automation script options at multiple complexity levels
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Converting observed work into scripts is agent-assisted tooling that accelerates product and ops automation during build and iteration. The skill is about extracting workflow from video and generating scripts—agent tooling and automation craft, not shipping checks or growth campaigns.
Where it fits
Record a one-off data export from an admin UI and generate a script that replays the export nightly.
Capture a weekly log-download and rename ritual to replace it with a reviewed shell or Python job.
Automate a repetitive social or asset prep sequence you demonstrated in a screen capture.
Script a manual bridge between two apps shown in a recording when no official API exists.
How it compares
Video-to-script workflow skill using local ffmpeg/Whisper—not a hosted RPA recorder or generic “write me a bash script” with no visual grounding.
Common Questions / FAQ
Who is automate-this for?
Solo operators and indie builders who capture repetitive desktop or terminal work on video and want an agent to propose concrete automations.
When should I use automate-this?
In Build when scripting internal tooling; in Operate when replacing manual maintenance rituals; in Grow when automating recurring content or data chores—whenever you have a recording and ffmpeg available.
Is automate-this safe to install?
Generated scripts can move files and run shell commands; review the Security Audits panel on this page and dry-run automation in a sandbox before production use.
SKILL.md
# Automate This

Analyze a screen recording of a manual process and build working automation for it. The user records themselves doing something repetitive or tedious, hands you the video file, and you figure out what they're doing, why, and how to script it away.

## Prerequisites Check

Before analyzing any recording, verify the required tools are available. Run these checks silently and only surface problems:

```bash
command -v ffmpeg >/dev/null 2>&1 && ffmpeg -version 2>/dev/null | head -1 || echo "NO_FFMPEG"
command -v whisper >/dev/null 2>&1 || command -v whisper-cpp >/dev/null 2>&1 || echo "NO_WHISPER"
```

- **ffmpeg is required.** If missing, tell the user: `brew install ffmpeg` (macOS) or the equivalent for their OS.
- **Whisper is optional.** Only needed if the recording has narration. If missing AND the recording has an audio track, suggest: `pip install openai-whisper` or `brew install whisper-cpp`. If the user declines, proceed with visual analysis only.

## Phase 1: Extract Content from the Recording

Given a video file path (typically on `~/Desktop/`), extract both visual frames and audio:

### Frame Extraction

Extract frames at one frame every 2 seconds. This balances coverage with context window limits.

```bash
WORK_DIR=$(mktemp -d "${TMPDIR:-/tmp}/automate-this-XXXXXX")
chmod 700 "$WORK_DIR"
mkdir -p "$WORK_DIR/frames"
ffmpeg -y -i "<VIDEO_PATH>" -vf "fps=0.5" -q:v 2 -loglevel warning "$WORK_DIR/frames/frame_%04d.jpg"
ls "$WORK_DIR/frames/" | wc -l
```

Use `$WORK_DIR` for all subsequent temp file paths in the session. The per-run directory with mode 0700 ensures extracted frames are only readable by the current user.

If the recording is longer than 5 minutes (more than 150 frames), increase the interval to one frame every 4 seconds to stay within context limits. Tell the user you're sampling less frequently for longer recordings.
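
The interval rule above can be sketched as a small helper. This is a minimal illustration, not part of the skill itself: `choose_fps` is a hypothetical function name, and the 300-second threshold is simply the 5-minute limit stated above. It takes a duration in seconds (as `ffprobe` would report it) and prints the value to pass to ffmpeg's `fps` filter.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not defined by the skill): choose the fps filter value
# from the recording duration in seconds.
choose_fps() {
  local duration_s="${1%.*}"   # drop the fractional part, e.g. 301.2 -> 301
  if [ "${duration_s:-0}" -gt 300 ] 2>/dev/null; then
    echo "0.25"   # over 5 minutes: one frame every 4 seconds
  else
    echo "0.5"    # default: one frame every 2 seconds
  fi
}

# Usage sketch (commented out so the helper can be sourced without ffmpeg):
# duration=$(ffprobe -v error -show_entries format=duration \
#   -of default=noprint_wrappers=1:nokey=1 "<VIDEO_PATH>")
# ffmpeg -y -i "<VIDEO_PATH>" -vf "fps=$(choose_fps "$duration")" -q:v 2 \
#   -loglevel warning "$WORK_DIR/frames/frame_%04d.jpg"
```

Truncating the duration to whole seconds keeps the comparison in plain shell integer arithmetic. A non-numeric duration falls back to the default 2-second interval.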

### Audio Extraction and Transcription

Check if the video has an audio track:

```bash
ffprobe -i "<VIDEO_PATH>" -show_streams -select_streams a -loglevel error | head -5
```

If audio exists:

```bash
ffmpeg -y -i "<VIDEO_PATH>" -ac 1 -ar 16000 -loglevel warning "$WORK_DIR/audio.wav"

# Use whichever whisper binary is available
if command -v whisper >/dev/null 2>&1; then
  whisper "$WORK_DIR/audio.wav" --model small --language en --output_format txt --output_dir "$WORK_DIR/"
  cat "$WORK_DIR/audio.txt"
elif command -v whisper-cpp >/dev/null 2>&1; then
  whisper-cpp -m "$(brew --prefix 2>/dev/null)/share/whisper-cpp/models/ggml-small.bin" -l en -f "$WORK_DIR/audio.wav" -otxt -of "$WORK_DIR/audio"
  cat "$WORK_DIR/audio.txt"
else
  echo "NO_WHISPER"
fi
```

If neither whisper binary is available and the recording has audio, inform the user they're missing narration context and ask if they want to install Whisper (`pip install openai-whisper` or `brew install whisper-cpp`) or proceed with visual-only analysis.

## Phase 2: Reconstruct the Process

Analyze the extracted frames (and transcript, if available) to build a structured understanding of what the user did. Work through the frames sequentially and identify:

1. **Applications used**: Which apps appear in the recording? (browser, terminal, Finder, mail client, spreadsheet, IDE, etc.)
2. **Sequence of actions**: What did the user do, in order? Click-by-click, step-by-step.
3. **Data flow**: What information moved between steps? (copied text, downloaded files, form inputs, etc.)
4. **Decision points**: Were there moments where the user paused, checked something, or made a choice?
5. **Repetition patterns**: Did the user do the same thing multiple times with different inputs?
6. **Pain