
Site Crawlability
Audit and fix how search and AI crawlers discover, traverse, and index your site structure, links, and budgets.
Overview
site-crawlability is an agent skill, most often used in the Launch phase (also Operate: iterate and Grow: content), that improves technical crawl paths, crawl budgets, and AI bot access for indexable sites.
Install
npx skills add https://github.com/kostja94/marketing-skills --skill site-crawlability
What is this skill?
- Scope spans redirect chains, broken 4xx links, hierarchy depth, orphan pages, and pagination versus infinite scroll
- Covers crawl budget waste reduction on duplicates, redirects, and low-value URLs
- AI crawler guidance for SSR-critical content, URL hygiene, and GPTBot or ClaudeBot reachability
- Initial assessment workflow with optional 1–2 sentence framing on first use
- Points to internal-links skill when the primary gap is link graph design rather than crawl mechanics
- Metadata version 1.2.1
- 7 scoped technical SEO crawlability work areas in the skill outline
Adoption & trust: 776 installs on skills.sh; 586 GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
Important pages never get indexed because bots hit redirect chains, orphans, infinite scroll traps, or blocked AI crawlers.
Who is it for?
Solo founders on JS-heavy SaaS, content, or ecommerce sites who see partial indexing, orphan URLs, or AI crawler gaps.
Skip if: your team only needs keyword copy or on-page meta tweaks without changing site structure, robots, or rendering.
When should I use this skill?
Use it when the user asks about crawlability, crawl budget, orphan pages, internal link structure, infinite scroll or pagination SEO, masonry SEO, AI crawler optimization, GPTBot or ClaudeBot crawlability, or content that is not being indexed.
What do I get? / Deliverables
You leave with prioritized crawlability fixes—structure, links, robots policy, and pagination choices—so critical URLs are reachable and indexation waste drops.
- Prioritized crawlability remediation plan
- Robots, architecture, and pagination recommendations with orphan fixes
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Crawlability work belongs on the Launch shelf because indexing and AI bot access determine whether organic discovery works at all. The Technical SEO subphase covers robots rules, architecture depth, orphans, and pagination choices that gate rankings.
Where it fits
Validate robots rules and click depth before the public launch sitemap goes live.
Diagnose sudden non-indexation after a SPA routing or pagination change.
Prevent new blog or catalog templates from creating orphan listing pages.
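The click-depth and orphan checks mentioned above can be sketched as a breadth-first search over the internal link graph. This is a minimal illustration, not the skill's own implementation: the `links` map, the `known` URL set (for example, URLs taken from a sitemap), and the function names are all hypothetical, and "orphan" here means "not reachable from the homepage by following links", which includes pages with no incoming links.

```python
from collections import deque


def click_depths(links, home):
    """Return {url: minimum number of clicks from home} using BFS."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def audit(links, home, known_urls, max_depth=3):
    """Split known URLs into unreachable ones and ones deeper than max_depth."""
    depths = click_depths(links, home)
    orphans = sorted(u for u in known_urls if u not in depths)
    too_deep = sorted(u for u, d in depths.items() if d > max_depth)
    return orphans, too_deep


# Hypothetical link graph: page -> pages it links to.
links = {
    "/": ["/blog", "/pricing"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/blog/post-2"],
    "/blog/post-2": ["/blog/post-3"],
}
# Hypothetical URL inventory, e.g. from a sitemap.
known = {
    "/", "/blog", "/pricing",
    "/blog/post-1", "/blog/post-2", "/blog/post-3",
    "/landing/old-campaign",
}

orphans, too_deep = audit(links, "/", known)
print("orphans:", orphans)
print("too deep:", too_deep)
```

In this toy graph, `/landing/old-campaign` is reported as an orphan and `/blog/post-3` sits four clicks from the homepage, so it exceeds the assumed `max_depth` of 3.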
How it compares
Use for technical crawl and index paths instead of ad-hoc Lighthouse passes that ignore orphan pages and crawl budget.
Common Questions / FAQ
Who is site-crawlability for?
Indie builders and small teams responsible for their own SEO stack who must fix discovery issues without a dedicated technical SEO hire.
When should I use site-crawlability?
At Launch before scaling content, in Operate when index coverage drops after deploys, or in Grow when expanding templates that risk orphans or scroll-only listings.
Is site-crawlability safe to install?
Review the Security Audits panel on this Prism page and treat robots or redirect recommendations as production changes that need staging verification.
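One way to do that staging verification for robots changes is to parse the candidate robots.txt and confirm which crawlers can still fetch a key URL. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt content, the `example.com` URL, and the `blocked_agents` helper are illustrative assumptions, and the standard-library parser may not match every crawler's own matching rules.

```python
from urllib.robotparser import RobotFileParser


def blocked_agents(robots_txt, url, agents):
    """Return the user agents that this robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text, no network access
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical staging robots.txt that accidentally blocks one AI crawler.
STAGING_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

AGENTS = ["Googlebot", "GPTBot", "ClaudeBot"]

blocked = blocked_agents(STAGING_ROBOTS, "https://example.com/pricing", AGENTS)
print("blocked on /pricing:", blocked)
```

Running a check like this against the staging file before deploy makes an unintended `Disallow` visible without waiting for index coverage to drop.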
Workflow Chain
Then invoke: internal links
SKILL.md - Site Crawlability
# SEO Technical: Crawlability

Guides crawlability improvements: robots, X-Robots-Tag, site structure, and internal linking.

**When invoking**: On **first use**, if helpful, open with 1–2 sentences on what this skill covers and why it matters, then provide the main output. On **subsequent use** or when the user asks to skip, go directly to the main output.

## Scope (Technical SEO)

- **Redirect chains & loops**: Fix multi-hop redirects; point directly to final URL
- **Broken links (4xx)**: Fix broken internal/external links; 301 or remove
- **Site architecture**: Logical hierarchy; pages within 3–4 clicks from homepage
- **Orphan pages**: Add internal links to pages with no incoming links
- **Pagination**: Prefer pagination over infinite scroll for crawlability
- **Crawl budget**: Reduce waste on duplicates, redirects, low-value URLs (see below)
- **AI crawler optimization**: SSR for critical content; URL management; reduce 404/redirect waste (see below)

## Initial Assessment

**Check for project context first:** If `.claude/project-context.md` or `.cursor/project-context.md` exists, read it for site structure.

Identify:

1. **Site structure**: Flat vs. deep hierarchy
2. **Framework**: Next.js, static, SPA, etc.
3. **Key paths**: Sitemap, robots.txt, API, static assets

## Best Practices

### Redirect Chains & Loops

- Fix multi-hop redirects; point directly to final URL
- Loops: URLs redirecting back to themselves; break the cycle

### Broken Links (4xx)

- Fix broken internal/external links; 301 or remove
- Audit regularly; update or remove broken links

### Site Architecture

| Principle | Guideline |
|-----------|-----------|
| **Depth** | Important pages within 3–4 clicks from homepage |
| **Orphan pages** | Add internal links to pages with no incoming links; see **internal-links** for link strategy |
| **Hierarchy** | Logical structure; hub pages link to content |

### Pagination vs Infinite Scroll

**Problem**: With infinite scroll, crawlers cannot emulate user behavior (scroll, click "Load more"); content loaded after initial page load is not discoverable. Same applies to masonry + infinite scroll, lazy-loaded lists, and similar patterns.

**Solution**: Prefer pagination for key content. If keeping infinite scroll, make it search-friendly per [Google's recommendations](https://developers.google.com/search/blog/2014/02/infinite-scroll-search-friendly):

| Requirement | Practice |
|-------------|----------|
| **Component pages** | Chunk content into paginated pages accessible without JavaScript |
| **Full URLs** | Each page has unique URL (e.g. `?page=1`, `?lastid=567`); avoid `#1` |
| **No overlap** | Each item listed once in series; no duplication across pages |
| **Direct access** | URL works in new tab; no cookie/history dependency |
| **pushState/replaceState** | Update URL as user scrolls; enables back/forward, shareable links |
| **404 for out-of-bounds** | `?page=999` returns 404 when only 998 pages exist |

**Reference**: [Infinite scroll search-friendly recommendations](https://developers.google.com/search/blog/2014/02/infinite-scroll-search-friendly) (Google Search Central, 2014)

### Pagination (Traditional)

- Reference links to next/previous pages; `rel="prev"` / `rel="next"` where applicable
- Avoid dynamic-only loading; ensure links in HTML

### Crawl Budget

Crawl budget is the number of URLs Googlebot will crawl on your site in a given period. Large sites (10,000+ pages) may waste up to 30% of crawl