Axiom Vision

Name: Axiom Vision
Author: charleswiltgen

charleswiltgen/axiom·MIT

Implement Apple Vision features in an iOS app (segmentation, pose, OCR, barcodes, and document scanning) without guessing at APIs or chasing coordinate bugs.

Overview

axiom-vision is an agent skill for the Build phase that routes any Apple Vision-framework computer vision work through curated implementation and diagnostic references.

Install

npx skills add https://github.com/charleswiltgen/axiom --skill axiom-vision

What is this skill?

Mandated entry point for any Vision-framework computer vision work in this repo
Quick-reference routing table with 15+ symptom/task rows (subject lift, OCR, barcodes, iOS 26+ structured docs) that route to the vision-framework, vision-ref, and vision-diag docs
Covers subject segmentation, hand/body pose, text recognition, barcode/QR, document scanning, and DataScannerViewController
Dedicated diagnostics for missed subjects, landmark gaps, low confidence, UI freezes, and coordinate conversion bugs
iOS 26+ notes for structured document extraction and Visual Intelligence integration

Compatible agents: Claude Code, Cursor, Codex, Windsurf

Adoption & trust: 916 installs on skills.sh; 956 GitHub stars; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).

What problem does it solve?

You are adding or fixing on-device vision in an iOS app but lack a single, enforced guide for APIs, iOS-version features, and the usual detection, OCR, and coordinate bugs.

Who is it for?

Solo builders shipping native iOS apps with camera or gallery pipelines that need Vision segmentation, pose, OCR, or barcode features.

Skip if: Android or cross-platform CV without Vision, server-side-only image ML, or teams that will not adopt the repo’s mandatory Vision skill gate.

When should I use this skill?

Implementing ANY computer vision feature—image analysis, pose detection, person segmentation, subject lifting, text recognition, barcode scanning—or debugging related Vision issues.

What do I get? / Deliverables

Your agent follows the Vision quick-reference table to the right skill files and applies documented fixes for segmentation, pose, OCR, barcodes, and performance issues.

Vision-backed feature implementation aligned to repo reference docs
Diagnostic resolution path for failed detection, OCR, or coordinate issues

Journey fit

Primary fit

Build · UI/UX & frontend

Computer vision ships as in-app product code on Apple platforms, so the canonical shelf is Build when you are wiring UI and on-device ML pipelines. Vision work touches SwiftUI/UIKit surfaces, camera frames, and overlay geometry—classic frontend/mobile implementation alongside framework integration.

Also useful

Ship · Testing & QA

Operate · Error tracking

How it compares

Use as a structured Vision playbook instead of ad-hoc Stack Overflow snippets for each CV symptom.

Common Questions / FAQ

Who is axiom-vision for?

Indie iOS developers and agents implementing or debugging Apple Vision features in Swift apps, from OCR to pose and subject lifting.

When should I use axiom-vision?

During Build whenever you implement image analysis, pose detection, segmentation, subject lifting, text recognition, barcode scanning, document scanning, or DataScannerViewController—or when diagnosing missed detections or coordinate bugs.

Is axiom-vision safe to install?

It is documentation-only procedural guidance under MIT license; review the Security Audits panel on this Prism page before trusting any third-party skill in your workflow.

SKILL.md

# Computer Vision

**You MUST use this skill for ANY computer vision work using the Vision framework.**

## Quick Reference

| Symptom / Task | Reference |
|----------------|-----------|
| Subject segmentation, lifting | See `skills/vision-framework.md` |
| Hand/body pose detection | See `skills/vision-framework.md` |
| Text recognition (OCR) | See `skills/vision-framework.md` |
| Barcode/QR code detection | See `skills/vision-framework.md` |
| Document scanning | See `skills/vision-framework.md` |
| DataScannerViewController | See `skills/vision-framework.md` |
| Structured document extraction (iOS 26+) | See `skills/vision-framework.md` |
| Isolate object excluding hand | See `skills/vision-framework.md` |
| Vision framework API reference | See `skills/vision-ref.md` |
| Visual Intelligence integration (iOS 26+) | See `skills/vision-ref.md` |
| Subject not detected | See `skills/vision-diag.md` |
| Hand/body pose missing landmarks | See `skills/vision-diag.md` |
| Low confidence observations | See `skills/vision-diag.md` |
| UI freezing during processing | See `skills/vision-diag.md` |
| Coordinate conversion bugs | See `skills/vision-diag.md` |
| Text not recognized / wrong chars | See `skills/vision-diag.md` |
| Barcode not detected | See `skills/vision-diag.md` |
| DataScanner blank / no items | See `skills/vision-diag.md` |
| Document edges not detected | See `skills/vision-diag.md` |

## Decision Tree

```dot
digraph vision {
    start [label="Computer vision task" shape=ellipse];
    what [label="What do you need?" shape=diamond];

    start -> what;
    what -> "skills/vision-framework.md" [label="implement feature"];
    what -> "skills/vision-ref.md" [label="API reference"];
    what -> "skills/vision-ref.md" [label="Visual Intelligence"];
    what -> "skills/vision-diag.md" [label="something broken"];
}
```

1. Implementing (pose, segmentation, OCR, barcodes, documents, live scanning)? → `skills/vision-framework.md`
2. Visual Intelligence system integration (camera feature, iOS 26+)? → `skills/vision-ref.md` (Visual Intelligence section)
3. Need API reference / code examples? → `skills/vision-ref.md`
4. Debugging issues (detection failures, confidence, coordinates)? → `skills/vision-diag.md`
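
The routing above can be pictured as a small lookup. The sketch below is only an illustration: the `VisionNeed` enum and `referenceFile(for:)` function are hypothetical names that do not exist in the skill, and only the three file paths come from the decision tree.

```swift
// Hypothetical illustration of the decision tree.
// Only the three file paths are taken from the skill itself.
enum VisionNeed {
    case implementFeature    // pose, segmentation, OCR, barcodes, documents, live scanning
    case visualIntelligence  // system integration, iOS 26+
    case apiReference        // API reference / code examples
    case debugging           // detection failures, confidence, coordinates
}

func referenceFile(for need: VisionNeed) -> String {
    switch need {
    case .implementFeature:
        return "skills/vision-framework.md"
    case .visualIntelligence, .apiReference:
        // Both branches land in the same reference document.
        return "skills/vision-ref.md"
    case .debugging:
        return "skills/vision-diag.md"
    }
}

print(referenceFile(for: .debugging))
```

Using an exhaustive `switch` means that a new kind of need cannot be added without also deciding where it routes.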

## Critical Patterns

**Implementation** (`skills/vision-framework.md`):
- Decision tree for choosing the right Vision API
- Subject segmentation with VisionKit
- Isolating objects while excluding hands (combining APIs)
- Hand/body pose detection (21 hand landmarks, 19 body landmarks)
- Text recognition (fast vs accurate modes)
- Barcode detection with symbology selection
- Document scanning and structured extraction (iOS 26+)
- Live scanning with DataScannerViewController
- CoreImage HDR compositing
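
As one concrete instance of the patterns listed above, the fast vs accurate choice for text recognition might look like the following minimal sketch. It is not taken from `skills/vision-framework.md`. The helper names `makeTextRequest` and `recognizeText`, the default language `"en-US"`, and the choice to enable language correction only in accurate mode are assumptions made for illustration. It also assumes an Apple platform with a recent SDK in which `results` is typed as `[VNRecognizedTextObservation]?`.

```swift
import Vision
import CoreGraphics

/// Hypothetical helper: builds a text-recognition request.
/// `accurate` trades speed for recognition quality.
func makeTextRequest(accurate: Bool,
                     languages: [String] = ["en-US"]) -> VNRecognizeTextRequest {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = accurate ? .accurate : .fast
    request.recognitionLanguages = languages
    // Assumption: language correction is only worth its cost in accurate mode.
    request.usesLanguageCorrection = accurate
    return request
}

/// Hypothetical helper: runs OCR on a CGImage and returns the best
/// candidate string for each detected text region.
func recognizeText(in image: CGImage, accurate: Bool) throws -> [String] {
    let request = makeTextRequest(accurate: accurate)
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    // perform(_:) is synchronous, so call this off the main thread.
    try handler.perform([request])
    let observations = request.results ?? []
    return observations.compactMap { $0.topCandidates(1).first?.string }
}
```

Fast mode is the usual candidate for live camera frames, while accurate mode suits still images where latency matters less.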

**Diagnostics** (`skills/vision-diag.md`):
- Subject detection failures (edge of frame, lighting)
- Landmark tracking issues (confidence thresholds)
- Performance optimization (frame skipping, downscaling)
- Coordinate conversion (lower-left vs top-left origin)
- Text recognition failures (language, contrast)
- Barcode detection issues (symbology, size, glare)
- DataScanner troubleshooting (availability, data types)
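
The coordinate conversion item is a frequent source of overlay bugs: Vision reports normalized rectangles with a lower-left origin, while UIKit views use a top-left origin. The sketch below shows the flip under simplifying assumptions: the image fills the view exactly (no aspect-fit letterboxing) and needs no orientation correction. `viewRect(fromNormalized:in:)` is a hypothetical helper name, not an API from Vision or from the skill.

```swift
import Foundation

/// Converts a normalized rect (origin lower-left, values in 0...1)
/// into a rect with a top-left origin, scaled to `viewSize`.
func viewRect(fromNormalized normalized: CGRect, in viewSize: CGSize) -> CGRect {
    let width = normalized.width * viewSize.width
    let height = normalized.height * viewSize.height
    let x = normalized.minX * viewSize.width
    // Flip Y: in lower-left space the rect's top edge is maxY,
    // so its distance from the top of the view is (1 - maxY).
    let y = (1 - normalized.maxY) * viewSize.height
    return CGRect(x: x, y: y, width: width, height: height)
}

// A box in the lower half of the image ends up in the lower half of the view.
let box = CGRect(x: 0.1, y: 0.2, width: 0.5, height: 0.25)
let converted = viewRect(fromNormalized: box, in: CGSize(width: 400, height: 800))
print(converted)
```

A typical symptom of forgetting the flip is an overlay mirrored vertically: boxes for objects near the bottom of the image appear near the top of the view.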

## Anti-Rationalization

| Thought | Reality |
|---------|---------|
| "Vision framework is just a request/handler pattern" | Vision has coordinate conversion, confidence thresholds, and performance gotchas. vision-framework.md covers them. |
| "I'll handle text recognition without the skill" | VNRecognizeTextRequest has fast/accurate modes and language-specific settings. vision-framework.md has the patterns. |
| "Subject segmentation is straightforward" | Instance masks have HDR compositing and hand-exclusion patterns. vision-framework.md covers complex scenarios. |
| "Visual Intelligence is just the camera API" | Visual Intelli
