
Vision Framework
Add on-device OCR, barcode scanning, face detection, or Core ML vision inference to a Swift iOS app without guessing between modern Vision APIs and legacy VNRequest patterns.
Install
npx skills add https://github.com/dpearson2699/swift-ios-skills --skill vision-framework
What is this skill?
- Covers the modern Swift-native Vision API (iOS 18+) and legacy VNRequest patterns, targeting iOS 26+ and Swift 6.3
- Text recognition (OCR), face detection, barcode detection, segmentation, object tracking, and body pose requests
- Document scanning patterns for iOS 26+ plus VisionKit DataScannerViewController for live camera scanning
- VNCoreMLRequest integration for custom on-device model inference
- Separate reference docs for vision-requests and VisionKit scanner integration
Adoption & trust: 1.7k installs on skills.sh; 713 GitHub stars; 3/3 security scanners passed (skills.sh audits).
Journey fit
Computer vision features are implemented while building the mobile product surface, alongside SwiftUI views and camera flows. Vision and VisionKit patterns map directly to in-app UI and client-side image/video processing rather than backend or ops work.
Common Questions / FAQ
Is Vision Framework safe to install?
skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
# Vision Framework

Detect text, faces, barcodes, objects, and body poses in images and video using on-device computer vision. Patterns target iOS 26+ with Swift 6.3, backward-compatible where noted.

See [references/vision-requests.md](references/vision-requests.md) for complete code patterns and [references/visionkit-scanner.md](references/visionkit-scanner.md) for DataScannerViewController integration.

## Contents

- [Two API Generations](#two-api-generations)
- [Request Pattern (Modern API)](#request-pattern-modern-api)
- [Text Recognition (OCR)](#text-recognition-ocr)
- [Face Detection](#face-detection)
- [Barcode Detection](#barcode-detection)
- [Document Scanning (iOS 26+)](#document-scanning-ios-26)
- [Image Segmentation](#image-segmentation)
- [Object Tracking](#object-tracking)
- [Other Request Types](#other-request-types)
- [Core ML Integration](#core-ml-integration)
- [VisionKit: DataScannerViewController](#visionkit-datascannerviewcontroller)
- [Common Mistakes](#common-mistakes)
- [Review Checklist](#review-checklist)
- [References](#references)

## Two API Generations

Vision has two distinct API layers. Prefer the modern API for new code.

| Aspect | Modern (iOS 18+) | Legacy |
|---|---|---|
| Pattern | `let result = try await request.perform(on: image)` | `VNImageRequestHandler` + completion handler |
| Request types | Swift types: structs and classes (`RecognizeTextRequest`, `DetectFaceRectanglesRequest`) | ObjC classes (`VNRecognizeTextRequest`, `VNDetectFaceRectanglesRequest`) |
| Concurrency | Native async/await | Completion handlers or synchronous `perform` |
| Observations | Typed return values | Cast `results` from `[Any]` |
| Availability | iOS 18+ / macOS 15+ | iOS 11+ |

The modern API uses the `ImageProcessingRequest` protocol. Each request type has a `perform(on:orientation:)` method that accepts `CGImage`, `CIImage`, `CVPixelBuffer`, `CMSampleBuffer`, `Data`, or `URL`.
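
As a rough illustration of the input flexibility described above, the following sketch runs OCR directly on an image file and passes the orientation explicitly. It is not taken from the skill's reference files: the helper name `recognizeText(at:orientation:)` is hypothetical, and the sketch assumes the `URL` overload of `perform(on:orientation:)` accepts a `CGImagePropertyOrientation` value.

```swift
import Foundation
import ImageIO
import Vision

/// Hypothetical helper (not part of Vision): runs OCR on an image file.
/// Assumes `perform(on:orientation:)` has a `URL` overload, as listed above.
@available(iOS 18.0, macOS 15.0, *)
func recognizeText(
    at url: URL,
    orientation: CGImagePropertyOrientation
) async throws -> [String] {
    let request = RecognizeTextRequest()

    // The same request type could also be performed on a CGImage, CIImage,
    // CVPixelBuffer, CMSampleBuffer, or Data value instead of a URL.
    let observations = try await request.perform(on: url, orientation: orientation)

    // Keep only the best candidate string for each detected text region.
    return observations.compactMap { $0.topCandidates(1).first?.string }
}
```

A call site might look like `let lines = try await recognizeText(at: photoURL, orientation: .right)`, where `photoURL` is a placeholder for a file URL supplied by the app.
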
Most requests are structs; stateful requests for video tracking (e.g., `TrackObjectRequest`, `TrackRectangleRequest`, `DetectTrajectoriesRequest`) are final classes.

## Request Pattern (Modern API)

All modern Vision requests follow the same pattern: create a request struct, call `perform(on:)`, and handle the typed result.

```swift
import Vision

func recognizeText(in image: CGImage) async throws -> [String] {
    // Configure the request; it is a struct, so it must be declared with `var`.
    var request = RecognizeTextRequest()
    request.recognitionLevel = .accurate
    request.recognitionLanguages = [Locale.Language(identifier: "en-US")]

    // The result is already typed; no casting is needed.
    let observations = try await request.perform(on: image)
    return observations.compactMap { observation in
        observation.topCandidates(1).first?.string
    }
}
```

### Legacy Pattern (Pre-iOS 18)

Use `VNImageRequestHandler` with completion-based requests when targeting older deployment versions.

```swift
import Vision

func recognizeTextLegacy(in image: CGImage) throws -> [String] {
    var recognized: [String] = []

    // Legacy results are untyped, so cast them to the expected observation type.
    let request = VNRecognizeTextRequest { request, error in
        guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
        recognized = observations.compactMap { $0.topCandidates(1).first?.string }
    }
    request.recognitionLevel = .accurate

    // `perform(_:)` is synchronous: the completion handler has run by the time it returns.
    let handler = VNImageRequestHandler(cgImage: image)
    try handler.perform([request])
    return recognized
}
```

## Text Recognition (OCR)

### Modern: RecognizeTextRequest (iOS 18+)

```swift
var request = RecognizeTextRequest()
request.