
Senior Computer Vision
Ship object-detection or segmentation pipelines with YOLO, Faster R-CNN, or SAM and export to ONNX/TensorRT for production inference.
Overview
Senior Computer Vision is an agent skill for the Build phase that designs, trains, optimizes, and deploys object-detection and image-segmentation systems with mainstream PyTorch-ecosystem tooling.
Install
npx skills add https://github.com/alirezarezvani/claude-skills --skill senior-computer-vision
What is this skill?
- Three documented workflows: object-detection pipeline, model optimization/deployment, and custom dataset preparation
- Architecture coverage: CNNs, Vision Transformers, YOLO/Faster R-CNN/DETR, Mask R-CNN/SAM
- CLI helpers: vision_model_trainer.py (task/arch flags) and inference_optimizer.py (ONNX benchmark path)
- Production path: PyTorch, torchvision, Ultralytics, Detectron2, MMDetection, ONNX, TensorRT
- Architecture selection guide plus reference documentation for picking detectors vs segmenters
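As a rough illustration of what the architecture selection guide covers, the requirement-to-architecture pairs from the "Select Detection Architecture" table in the bundled SKILL.md can be written as a small lookup. This is only a sketch: the dictionary keys and the `recommend` function are hypothetical names chosen for this example, not part of the skill's scripts.

```python
# Lookup mirroring the "Select Detection Architecture" table in SKILL.md.
# The keys and the function name are hypothetical, chosen for illustration.
RECOMMENDATIONS: dict[str, list[str]] = {
    "real_time": ["YOLOv8/v11", "RT-DETR"],
    "high_accuracy": ["Faster R-CNN", "DINO"],
    "small_objects": ["YOLO + SAHI", "Faster R-CNN + FPN"],
    "edge": ["YOLOv8n", "MobileNetV3-SSD"],
    "transformer": ["DETR", "DINO", "RT-DETR"],
}


def recommend(requirement: str) -> list[str]:
    """Return the architectures the table lists for one requirement."""
    # Normalize "Small objects" / "small-objects" to "small_objects".
    key = requirement.strip().lower().replace(" ", "_").replace("-", "_")
    try:
        return list(RECOMMENDATIONS[key])
    except KeyError:
        raise ValueError(
            f"unknown requirement {requirement!r}; "
            f"expected one of {sorted(RECOMMENDATIONS)}"
        ) from None


if __name__ == "__main__":
    for need in ("real_time", "edge"):
        print(need, "->", ", ".join(recommend(need)))
```

A flat lookup keeps the table's content visible; real projects usually weigh several requirements at once, which this sketch does not attempt.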
Adoption & trust: 843 installs on skills.sh; 17.5k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need a reliable detection or segmentation stack but get lost choosing architectures, preparing data, and getting models out of notebooks into ONNX or TensorRT.
Who is it for?
Indie builders adding vision features (inventory scanning, moderation, robotics perception) who want YOLO-to-ONNX discipline in one guided workflow.
Skip if: your team only needs a hosted vision API with zero custom training, or your project has no labeled imagery and no plan to collect labels.
When should I use this skill?
Building detection pipelines, training custom models, optimizing inference, or deploying vision systems with CNN/ViT, YOLO, R-CNN, DETR, SAM, ONNX, or TensorRT.
What do I get? / Deliverables
You leave with aligned architecture choices, runnable training and optimization commands, and a deployment-oriented path instead of ad-hoc notebook experiments.
- Training configuration for detection or segmentation architectures
- Optimization benchmark path toward ONNX (and TensorRT-oriented deployment notes)
- Dataset preparation pipeline with augmentation guidance
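The dataset-preparation deliverable involves a train/val/test split; the SKILL.md example passes `--split 0.8 0.1 0.1` to the bundled script. How that script splits internally is not shown on this page, so the following is only a minimal sketch of a reproducible split using the standard library. The function name `split_dataset`, the `seed` parameter, and the file names are assumptions made for this example.

```python
import random


def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split items into train/val/test lists (hypothetical helper).

    Items are sorted before shuffling so the result depends only on the
    seed, not on the order in which the filesystem listed the files.
    """
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be three values that sum to 1.0")

    shuffled = sorted(items)
    random.Random(seed).shuffle(shuffled)

    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        # Whatever is left after train and val goes to test.
        "test": shuffled[n_train + n_val:],
    }


if __name__ == "__main__":
    # Hypothetical file names, used only to exercise the function.
    names = [f"img_{i:03d}.jpg" for i in range(10)]
    parts = split_dataset(names)
    print({name: len(files) for name, files in parts.items()})
    # sizes: train 8, val 1, test 1
```

Fixing the seed keeps the validation and test sets stable across reruns, so benchmark numbers stay comparable between experiments.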
Journey fit
The canonical shelf is Build because the skill centers on training configs, dataset prep, and deployment artifacts rather than ideation or post-launch ops alone. Backend fits because the work is model training, inference optimization, and serving-oriented exports, not UI or pure agent prompt tooling.
How it compares
Use this skill package for end-to-end CV engineering. A generic “write PyTorch code” chat session lacks its deployment and framework-selection guardrails.
Common Questions / FAQ
Who is senior-computer-vision for?
Solo and indie developers shipping visual AI in products—SaaS, agents, or APIs—who need detection, segmentation, training, and inference optimization in one place.
When should I use senior-computer-vision?
Use it during Build when you are standing up detection pipelines, training custom YOLO or R-CNN-family models, benchmarking ONNX exports, or preparing datasets before launch.
Is senior-computer-vision safe to install?
Before running the skill in production repos, review the Security Audits panel on this Prism page, and treat the bundled training scripts like any ML tooling that runs shell commands and touches the filesystem.
SKILL.md
# Senior Computer Vision Engineer

Production computer vision engineering skill for object detection, image segmentation, and visual AI system deployment.

## Table of Contents

- [Quick Start](#quick-start)
- [Core Expertise](#core-expertise)
- [Tech Stack](#tech-stack)
- [Workflow 1: Object Detection Pipeline](#workflow-1-object-detection-pipeline)
- [Workflow 2: Model Optimization and Deployment](#workflow-2-model-optimization-and-deployment)
- [Workflow 3: Custom Dataset Preparation](#workflow-3-custom-dataset-preparation)
- [Architecture Selection Guide](#architecture-selection-guide)
- [Reference Documentation](#reference-documentation)

## Quick Start

```bash
# Generate training configuration for YOLO or Faster R-CNN
python scripts/vision_model_trainer.py models/ --task detection --arch yolov8

# Analyze model for optimization opportunities (quantization, pruning)
python scripts/inference_optimizer.py model.pt --target onnx --benchmark

# Build dataset pipeline with augmentations
python scripts/dataset_pipeline_builder.py images/ --format coco --augment
```

## Core Expertise

This skill provides guidance on:

- **Object Detection**: YOLO family (v5-v11), Faster R-CNN, DETR, RT-DETR
- **Instance Segmentation**: Mask R-CNN, YOLACT, SOLOv2
- **Semantic Segmentation**: DeepLabV3+, SegFormer, SAM (Segment Anything)
- **Image Classification**: ResNet, EfficientNet, Vision Transformers (ViT, DeiT)
- **Video Analysis**: Object tracking (ByteTrack, SORT), action recognition
- **3D Vision**: Depth estimation, point cloud processing, NeRF
- **Production Deployment**: ONNX, TensorRT, OpenVINO, CoreML

## Tech Stack

| Category | Technologies |
|----------|--------------|
| Frameworks | PyTorch, torchvision, timm |
| Detection | Ultralytics (YOLO), Detectron2, MMDetection |
| Segmentation | segment-anything, mmsegmentation |
| Optimization | ONNX, TensorRT, OpenVINO, torch.compile |
| Image Processing | OpenCV, Pillow, albumentations |
| Annotation | CVAT, Label Studio, Roboflow |
| Experiment Tracking | MLflow, Weights & Biases |
| Serving | Triton Inference Server, TorchServe |

## Workflow 1: Object Detection Pipeline

Use this workflow when building an object detection system from scratch.

### Step 1: Define Detection Requirements

Analyze the detection task requirements:

```
Detection Requirements Analysis:
- Target objects: [list specific classes to detect]
- Real-time requirement: [yes/no, target FPS]
- Accuracy priority: [speed vs accuracy trade-off]
- Deployment target: [cloud GPU, edge device, mobile]
- Dataset size: [number of images, annotations per class]
```

### Step 2: Select Detection Architecture

Choose architecture based on requirements:

| Requirement | Recommended Architecture | Why |
|-------------|-------------------------|-----|
| Real-time (>30 FPS) | YOLOv8/v11, RT-DETR | Single-stage, optimized for speed |
| High accuracy | Faster R-CNN, DINO | Two-stage, better localization |
| Small objects | YOLO + SAHI, Faster R-CNN + FPN | Multi-scale detection |
| Edge deployment | YOLOv8n, MobileNetV3-SSD | Lightweight architectures |
| Transformer-based | DETR, DINO, RT-DETR | End-to-end, no NMS required |

### Step 3: Prepare Dataset

Convert annotations to required format:

```bash
# COCO format (recommended)
python scripts/dataset_pipeline_builder.py data/images/ \
    --annotations data/labels/ \
    --format coco \
    --split 0.8 0.1 0.1 \
    --output data/coco/

# Verify dataset
python -c "from pycocotoo