
OCR Super Surya
Give your agent a named OCR workflow around the Surya stack when ingesting scans, PDFs, or screenshots into text pipelines.
Overview
OCR Super Surya is an agent skill for the Build phase that supports Surya-oriented OCR so solo builders can extract text from images and documents in agent workflows.
Install
npx skills add https://github.com/aktsmm/agent-skills --skill ocr-super-surya
What is this skill?
- Skill slug ocr-super-surya signals Surya-based OCR for agent-driven document workflows
- Suited to turning images and scanned pages into machine-readable text in dev pipelines
- Pairs with content and knowledge-base builds that need local or scripted OCR steps
- Licensed under CC BY-NC-SA 4.0, with an explicit AI/ML training restriction in the upstream README
Adoption & trust: 506 installs on skills.sh; 17 GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have image or scan inputs but no repeatable OCR step your coding agent can invoke while building document features.
Who is it for?
Indie builders adding OCR to personal tools, research notebooks, or internal doc pipelines where Surya is the chosen engine.
Skip if: Teams needing guaranteed commercial licensing, turnkey cloud OCR with SLAs, or skills with full procedural docs already visible in Prism.
When should I use this skill?
Building pipelines that need OCR on images or scans before downstream text processing.
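A pipeline like this usually needs one entry point that accepts either an image or a PDF. The sketch below is one possible way to route inputs by file extension. The names `ocr_image` and `ocr_pdf` come from the helper's usage docstring, but their return types are assumptions here (a string for an image, a list of per-page strings for a PDF), and `fake_ocr_image` / `fake_ocr_pdf` are hypothetical stand-ins so the example runs without Surya installed.

```python
from pathlib import Path
from typing import Callable, List

# Extensions treated as single images (an illustrative, non-exhaustive set).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp", ".webp"}


def extract_text(
    path: str,
    ocr_image: Callable[[str], str],
    ocr_pdf: Callable[[str], List[str]],
) -> List[str]:
    """Return one text string per page for an image or PDF input."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return list(ocr_pdf(path))
    if suffix in IMAGE_SUFFIXES:
        return [ocr_image(path)]
    raise ValueError(f"Unsupported input type: {suffix or path}")


# Hypothetical stand-ins for the real OCR calls (assumption, not the skill's code).
def fake_ocr_image(path: str) -> str:
    return f"text from {Path(path).name}"


def fake_ocr_pdf(path: str) -> List[str]:
    return [f"page {i} of {Path(path).name}" for i in (1, 2)]


pages = extract_text("scan.PNG", fake_ocr_image, fake_ocr_pdf)
```

Passing the OCR functions in as arguments keeps the routing logic testable on its own; in a real project the two callables would be whatever the installed helper actually exposes.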
What do I get? / Deliverables
You can run a documented Surya OCR path so extracted text feeds downstream parsing, RAG, or validation in your project.
- Extracted plain or structured text from inputs
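To illustrate how extracted text might feed a downstream RAG step, here is a minimal sketch that normalizes whitespace and splits OCR output into overlapping character windows. The function name `chunk_text` and its parameters are hypothetical and not part of the skill; real pipelines often chunk by tokens or sentences instead.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 500, overlap: int = 50) -> List[str]:
    """Split text into windows of at most max_chars that overlap by `overlap` characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must satisfy 0 <= overlap < max_chars")

    # OCR output often carries stray newlines and runs of spaces; collapse them.
    normalized = " ".join(text.split())

    chunks: List[str] = []
    step = max_chars - overlap
    for start in range(0, len(normalized), step):
        chunks.append(normalized[start:start + max_chars])
        # Stop once a window reaches the end, so no redundant tail chunk is emitted.
        if start + max_chars >= len(normalized):
            break
    return chunks


example_chunks = chunk_text("abcdefghij", max_chars=4, overlap=1)
```

The overlap keeps a little shared context between neighbouring chunks, which can help retrieval when a sentence straddles a boundary.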
Journey fit
Build is the primary shelf because OCR is applied while constructing document ingestion, RAG, or automation features. Agent-tooling captures skills that equip the coding agent with specialized document perception capabilities rather than generic UI work.
How it compares
Skill-packaged OCR guidance, not a hosted document API marketplace entry.
Common Questions / FAQ
Who is ocr-super-surya for?
Solo builders and agents automating text extraction from scans and images during product development.
When should I use ocr-super-surya?
In Build when implementing ingestion, CLI tools, or agent actions that must OCR images before search or LLM processing.
Is ocr-super-surya safe to install?
Check the Security Audits panel on this page and read the upstream CC BY-NC-SA license plus AI-training restrictions before relying on it commercially.
SKILL.md
SKILL.md - OCR Super Surya
# Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

## English

Copyright (c) 2025-2026 yamapan (aktsmm)

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

You are free to:

- **Share**: copy and redistribute the material in any medium or format
- **Adapt**: remix, transform, and build upon the material

Under the following terms:

- **Attribution**: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- **NonCommercial**: You may not use the material for commercial purposes. *(Please contact the author if you wish to use this material for commercial purposes.)*
- **ShareAlike**: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

**AI/ML Training Restriction**: Use of this content for AI/ML training, data mining, or other analytical purposes is prohibited without explicit permission.
Full license text: https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode

---

## 日本語

Copyright (c) 2025-2026 yamapan (aktsmm)

この作品はクリエイティブ・コモンズ 表示-非営利-継承 4.0 国際ライセンスの下に提供されています。

あなたは以下の条件に従う限り、自由に:

- **共有**: どのようなメディアやフォーマットでも資料を複製・再配布できます
- **翻案**: 資料をリミックス、変形、および加工することができます

以下の条件に従ってください:

- **表示**: あなたは適切なクレジットを表示し、ライセンスへのリンクを提供し、変更があったらその旨を示さなければなりません。これらは合理的であればどのような方法で行っても構いませんが、許諾者があなたやあなたの利用行為を支持していると示唆するような方法は除きます。
- **非営利**: あなたは営利目的でこの資料を利用してはなりません。(※商用利用をご希望の場合は、別途ご連絡ください。)
- **継承**: もしあなたがこの資料をリミックス、変形、または加工した場合、あなたはあなたの貢献部分を元の作品と同じライセンスの下で配布しなければなりません。

追加的な制約は課せません: あなたは、このライセンスが他の者に許諾することを法的に制限するような法的条項や技術的手段を適用してはなりません。

**AI/MLトレーニング制限**: 本コンテンツをAI/MLモデルのトレーニング、データマイニング、その他の解析目的での使用は明示的な許可なく禁止されています。

ライセンス全文: https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode.ja

---

## Special Permission for Microsoft Employees / Microsoft 社員向け特別許諾

### English

Microsoft Corporation employees are granted permission to use, copy, modify, and distribute this material for any purpose within the scope of their employment duties at Microsoft, including internal business use and customer-facing activities, without the NonCommercial restriction of this license.

This special permission applies only to work performed as part of official Microsoft business activities.

### 日本語

Microsoft Corporation の社員は、Microsoft での業務の範疇において、本資料を社内業務および顧客対応を含むあらゆる目的で使用、複製、改変、配布することが許諾されます。この場合、本ライセンスの「非営利」制限は適用されません。

この特別許諾は、Microsoft の公式な業務活動の一環として行われる作業にのみ適用されます。

---

## Disclaimer / 免責事項

### English

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
### 日本語

本ソフトウェアは「現状のまま」で提供され、明示または黙示を問わず、商品性、特定目的への適合性、および権利非侵害についての保証を含むがこれに限定されない、いかなる種類の保証も伴いません。作者または著作権者は、契約行為、不法行為、またはそれ以外であろうと、ソフトウェアに起因または関連し、あるいはソフトウェアの使用またはその他の扱いによって生じる一切の請求、損害、その他の責任について責任を負いません。

#!/usr/bin/env python3
"""
OCR Helper - Surya OCR wrapper for common tasks.

Usage:
    from ocr_helper import ocr_image, ocr_pdf

    # Single image
    text = ocr_image("screenshot.png")

    # PDF (all pages)
    results = ocr_pdf("document.pdf")

    # With verbose logging
    text = ocr_image("image.png", verbose=True)
"""

import os
import logging
from pathlib import Path
from typing import Optional

# Configure logging
logge