
Axiom Vision
Implement Apple Vision features in an iOS app (segmentation, pose, OCR, barcodes, and document scanning) without guessing at APIs or chasing coordinate bugs.
Overview
axiom-vision is an agent skill for the Build phase that routes any Apple Vision-framework computer vision work through curated implementation and diagnostic references.
Install
npx skills add https://github.com/charleswiltgen/axiom --skill axiom-vision
What is this skill?
- Mandated entry point for any Vision-framework computer vision work in this repo
- Quick-reference routing table maps symptoms (subject lift, OCR, barcodes, iOS 26+ structured docs) to vision-framework,
- Covers subject segmentation, hand/body pose, text recognition, barcode/QR, document scanning, and DataScannerViewControl
- Dedicated diagnostics for missed subjects, landmark gaps, low confidence, UI freezes, and coordinate conversion bugs
- iOS 26+ notes for structured document extraction and Visual Intelligence integration
- Quick-reference table with 15+ symptom/task rows routing to vision-framework, vision-ref, and vision-diag docs
Adoption & trust: 916 installs on skills.sh; 956 GitHub stars; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).
What problem does it solve?
You are adding or fixing on-device vision in an iOS app but lack a single, enforced guide for APIs, iOS-version features, and the usual detection, OCR, and coordinate bugs.
Who is it for?
Solo builders shipping native iOS apps with camera or gallery pipelines that need Vision segmentation, pose, OCR, or barcode features.
Skip if: Android or cross-platform CV without Vision, server-side-only image ML, or teams that will not adopt the repo’s mandatory Vision skill gate.
When should I use this skill?
Implementing ANY computer vision feature—image analysis, pose detection, person segmentation, subject lifting, text recognition, barcode scanning—or debugging related Vision issues.
What do I get? / Deliverables
Your agent follows the Vision quick-reference table to the right skill files and applies documented fixes for segmentation, pose, OCR, barcodes, and performance issues.
- Vision-backed feature implementation aligned to repo reference docs
- Diagnostic resolution path for failed detection, OCR, or coordinate issues
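One coordinate issue the skill lists is the origin mismatch: Vision reports bounding boxes in normalized coordinates with a lower-left origin, while UIKit views use points with a top-left origin. The following is a minimal sketch of that flip, assuming the image fills the view exactly (no aspect-fit letterboxing and no rotation). `viewRect(fromNormalized:in:)` is a hypothetical helper name chosen for illustration; it is not taken from the skill's reference files.

```swift
import Foundation

/// Converts a Vision-style normalized rect (origin at lower-left, values in 0...1)
/// into a rect in view points (origin at top-left).
/// Assumption: the analyzed image is drawn to fill `viewSize` exactly.
func viewRect(fromNormalized box: CGRect, in viewSize: CGSize) -> CGRect {
    // Scale the normalized width and height up to view points.
    let width = box.width * viewSize.width
    let height = box.height * viewSize.height
    // X needs no flip; both systems grow to the right.
    let x = box.minX * viewSize.width
    // Y is flipped: the box's top edge (maxY in lower-left space)
    // becomes its distance from the top of the view.
    let y = (1 - box.maxY) * viewSize.height
    return CGRect(x: x, y: y, width: width, height: height)
}

// Hypothetical usage: a box covering the centre of a 200 x 100 point view.
let overlay = viewRect(
    fromNormalized: CGRect(x: 0.25, y: 0.25, width: 0.5, height: 0.5),
    in: CGSize(width: 200, height: 100)
)
print(overlay)
```

If the preview uses aspect-fit or aspect-fill scaling, or the frame is rotated relative to the view, the scale and offset of the displayed image would have to be applied as well; the sketch leaves that out.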
Journey fit
Computer vision ships as in-app product code on Apple platforms, so the canonical shelf is Build when you are wiring UI and on-device ML pipelines. Vision work touches SwiftUI/UIKit surfaces, camera frames, and overlay geometry—classic frontend/mobile implementation alongside framework integration.
How it compares
Use as a structured Vision playbook instead of ad-hoc Stack Overflow snippets for each CV symptom.
Common Questions / FAQ
Who is axiom-vision for?
Indie iOS developers and agents implementing or debugging Apple Vision features in Swift apps, from OCR to pose and subject lifting.
When should I use axiom-vision?
During Build whenever you implement image analysis, pose detection, segmentation, subject lifting, text recognition, barcode scanning, document scanning, or DataScannerViewController—or when diagnosing missed detections or coordinate bugs.
Is axiom-vision safe to install?
It is documentation-only procedural guidance under MIT license; review the Security Audits panel on this Prism page before trusting any third-party skill in your workflow.
SKILL.md - Axiom Vision
# Computer Vision

**You MUST use this skill for ANY computer vision work using the Vision framework.**

## Quick Reference

| Symptom / Task | Reference |
|----------------|-----------|
| Subject segmentation, lifting | See `skills/vision-framework.md` |
| Hand/body pose detection | See `skills/vision-framework.md` |
| Text recognition (OCR) | See `skills/vision-framework.md` |
| Barcode/QR code detection | See `skills/vision-framework.md` |
| Document scanning | See `skills/vision-framework.md` |
| DataScannerViewController | See `skills/vision-framework.md` |
| Structured document extraction (iOS 26+) | See `skills/vision-framework.md` |
| Isolate object excluding hand | See `skills/vision-framework.md` |
| Vision framework API reference | See `skills/vision-ref.md` |
| Visual Intelligence integration (iOS 26+) | See `skills/vision-ref.md` |
| Subject not detected | See `skills/vision-diag.md` |
| Hand/body pose missing landmarks | See `skills/vision-diag.md` |
| Low confidence observations | See `skills/vision-diag.md` |
| UI freezing during processing | See `skills/vision-diag.md` |
| Coordinate conversion bugs | See `skills/vision-diag.md` |
| Text not recognized / wrong chars | See `skills/vision-diag.md` |
| Barcode not detected | See `skills/vision-diag.md` |
| DataScanner blank / no items | See `skills/vision-diag.md` |
| Document edges not detected | See `skills/vision-diag.md` |

## Decision Tree

```dot
digraph vision {
    start [label="Computer vision task" shape=ellipse];
    what [label="What do you need?" shape=diamond];

    start -> what;
    what -> "skills/vision-framework.md" [label="implement feature"];
    what -> "skills/vision-ref.md" [label="API reference"];
    what -> "skills/vision-ref.md" [label="Visual Intelligence"];
    what -> "skills/vision-diag.md" [label="something broken"];
}
```

1. Implementing (pose, segmentation, OCR, barcodes, documents, live scanning)? → `skills/vision-framework.md`
2. Visual Intelligence system integration (camera feature, iOS 26+)? → `skills/vision-ref.md` (Visual Intelligence section)
3. Need API reference / code examples? → `skills/vision-ref.md`
4. Debugging issues (detection failures, confidence, coordinates)? → `skills/vision-diag.md`

## Critical Patterns

**Implementation** (`skills/vision-framework.md`):
- Decision tree for choosing the right Vision API
- Subject segmentation with VisionKit
- Isolating objects while excluding hands (combining APIs)
- Hand/body pose detection (21/19 landmarks)
- Text recognition (fast vs accurate modes)
- Barcode detection with symbology selection
- Document scanning and structured extraction (iOS 26+)
- Live scanning with DataScannerViewController
- CoreImage HDR compositing

**Diagnostics** (`skills/vision-diag.md`):
- Subject detection failures (edge of frame, lighting)
- Landmark tracking issues (confidence thresholds)
- Performance optimization (frame skipping, downscaling)
- Coordinate conversion (lower-left vs top-left origin)
- Text recognition failures (language, contrast)
- Barcode detection issues (symbology, size, glare)
- DataScanner troubleshooting (availability, data types)

## Anti-Rationalization

| Thought | Reality |
|---------|---------|
| "Vision framework is just a request/handler pattern" | Vision has coordinate conversion, confidence thresholds, and performance gotchas. vision-framework.md covers them. |
| "I'll handle text recognition without the skill" | VNRecognizeTextRequest has fast/accurate modes and language-specific settings. vision-framework.md has the patterns. |
| "Subject segmentation is straightforward" | Instance masks have HDR compositing and hand-exclusion patterns. vision-framework.md covers complex scenarios. |
| "Visual Intelligence is just the camera API" | Visual Intelli