
GitHub Research
Turn deep-research paper outputs into structured GitHub discovery—keywords, repo URLs, and search strategies—before you commit to a stack or agent architecture.
Overview
github-research is an agent skill that turns deep-research artifacts into phased GitHub discovery with keywords, URLs, and search strategies. It is most often used in the Idea phase, and also fits Validate (scope) and Build (integrations).
Install
npx skills add https://github.com/lingzhi227/agent-research-skills --skill github-research
What is this skill?
- Intake phase extracts 5–20 repo URLs and 10–30 keywords from paper_db.jsonl and optional code_repos.md
- Search strategy matrix: broad topic by stars, paper title best-match, method names, and author profiles
- Keyword tiers: title phrases, paper tags, synthesis method names, and prolific author GitHub hunts
- Edge-case handling when code_repos.md or paper_db.jsonl is missing
- Maps which papers mention which repositories for traceable discovery
- 5–20 GitHub URLs expected from intake
- 10–30 search keywords of varying specificity
- 4-row search strategy matrix in discovery phase
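The four strategies named above (broad topic, paper title, method name, author) map naturally onto GitHub repository search queries. Here is a minimal sketch, assuming the public `https://api.github.com/search/repositories` endpoint with its `q`, `sort`, and `order` parameters. The helper name `build_search_urls` and the sample paper title, method, and author are hypothetical, and no request is actually sent:

```python
from urllib.parse import urlencode

# GitHub's public repository search endpoint (REST API).
SEARCH_ENDPOINT = "https://api.github.com/search/repositories"


def build_search_urls(topic, paper_title, method, author):
    """Return one search URL per strategy (hypothetical helper).

    sort=None leaves GitHub's default best-match ordering in place.
    """
    strategies = [
        ("broad_topic", topic, "stars"),
        ("paper_title", f'"{paper_title}"', None),
        ("method_name", f"{method} implementation", "stars"),
        ("author", f"{author} {topic}", "updated"),
    ]
    urls = {}
    for name, query, sort in strategies:
        params = {"q": query}
        if sort is not None:
            params["sort"] = sort
            params["order"] = "desc"
        urls[name] = f"{SEARCH_ENDPOINT}?{urlencode(params)}"
    return urls


if __name__ == "__main__":
    # Placeholder inputs; real values would come from the intake phase.
    for name, url in build_search_urls(
        topic="multi-agent LLM framework",
        paper_title="Example Paper Title",
        method="tree search",
        author="Jane Doe",
    ).items():
        print(f"{name}: {url}")
```

Building the URLs separately from sending them keeps the query set easy to review and deduplicate before any rate-limited calls are made.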
Adoption & trust: 725 installs on skills.sh; 114 GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You finished a deep literature review but still lack a disciplined pass that links papers to real repos and ranked search terms.
Who is it for?
Indie builders composing multi-phase agent research workflows who already have or plan deep-research JSONL and synthesis outputs.
Skip if: you only need a one-off list of starred repos without paper provenance, or production deployment runbooks unrelated to research intake.
When should I use this skill?
User has deep-research output (paper_db.jsonl and related phases) and needs phased GitHub discovery, keyword extraction, or repo mapping—not casual star browsing.
What do I get? / Deliverables
You leave intake and discovery with mapped GitHub URLs, a keyword set, and strategy choices you can feed into prototyping or agent tooling decisions.
- Structured keyword lists and search strategy choices
- Paper-to-repository mapping and discovered GitHub URLs
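A paper-to-repository mapping of this kind could be produced by scanning each `paper_db.jsonl` record for GitHub links. The sketch below is illustrative only. The record schema is not documented on this page, so the `id` field, the fallback to line numbers, and the function name `map_papers_to_repos` are all assumptions:

```python
import json
import re
from collections import defaultdict

# Matches https://github.com/<owner>/<repo>; owner and repo are captured.
GITHUB_URL = re.compile(
    r"https?://github\.com/([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)"
)


def map_papers_to_repos(jsonl_lines):
    """Map each repo (owner/name, lowercased) to the papers that mention it.

    Assumption: each line is one JSON object. The "id" key is hypothetical,
    so the 1-based line number is used when it is absent.
    """
    repo_to_papers = defaultdict(list)
    for line_no, line in enumerate(jsonl_lines, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        paper_id = str(record.get("id", f"line-{line_no}"))
        # Search the serialized record so URLs in any field are found.
        text = json.dumps(record)
        for owner, repo in GITHUB_URL.findall(text):
            repo = repo.rstrip(".")  # drop sentence-ending periods
            if repo.endswith(".git"):
                repo = repo[:-4]
            repo_id = f"{owner}/{repo}".lower()
            if paper_id not in repo_to_papers[repo_id]:
                repo_to_papers[repo_id].append(paper_id)
    return dict(repo_to_papers)


if __name__ == "__main__":
    sample = [
        '{"id": "p1", "abstract": "Code at https://github.com/Foo/Bar."}',
        '{"title": "T", "url": "https://github.com/foo/bar.git"}',
    ]
    print(map_papers_to_repos(sample))
```

Lowercasing `owner/name` means the same repository cited with different capitalization collapses into one entry, while the list of paper identifiers preserves provenance.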
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
The canonical shelf is Idea research because intake explicitly seeds discovery from paper_db.jsonl and synthesis artifacts produced upstream. The Research subphase fits the phased intake → discovery → evaluation workflow, which aims at mapping the open-source landscape rather than shipping code.
Where it fits
After deep-research emits paper_db.jsonl, run intake to harvest repo URLs and tiered keywords before picking a problem to build.
Use the discovery matrix to compare star-sorted landscape repos versus paper-title best-match hits for a niche method.
Narrow prototype scope to two GitHub baselines that papers explicitly reference instead of guessing framework popularity.
Choose fork-or-wrap targets from mapped repos when wiring an agent tool to an existing OSS implementation.
How it compares
Use as a structured research workflow after deep-research, not as a generic GitHub CLI or star-ranking bookmark tool.
Common Questions / FAQ
Who is github-research for?
Solo builders and small teams building research agents who need repeatable GitHub discovery seeded from academic or synthesis outputs.
When should I use github-research?
In Idea research after deep-research runs, again in Validate when scoping which OSS baselines to prototype, and in Build integrations when picking reference implementations.
Is github-research safe to install?
Check the Security Audits panel on this page. The skill involves GitHub API access and file reads on research directories, so review tokens and local data paths before automating.
SKILL.md
# GitHub Research - Phase Guide

Detailed methodology reference for the github-research skill.

## Phase 1: Intake - Detailed Guide

### Purpose

Extract structured information from deep-research output to seed GitHub discovery.

### Input Requirements

- Deep-research output directory containing:
  - `paper_db.jsonl` (required)
  - `phase4_code/code_repos.md` (optional but valuable)
  - `phase5_synthesis/synthesis.md` (optional)
  - `phase6_report/report.md` (optional)

### Keyword Extraction Strategy

- **Primary keywords**: 2-3 word technical phrases extracted from paper titles
- **Secondary keywords**: From paper tags in paper_db.jsonl
- **Tertiary keywords**: Method names, algorithm names, architecture names from synthesis
- **Author-based**: Search for prolific authors' GitHub profiles

### Expected Output

- 5-20 GitHub URLs directly from papers
- 10-30 search keywords of varying specificity
- Clear mapping: which papers mention which repos

### Edge Cases

- No code_repos.md: rely entirely on paper_db.jsonl keywords
- No paper_db.jsonl: ask user for manual topic keywords
- Non-English papers: extract English technical terms only

---

## Phase 2: Discovery - Detailed Guide

### Search Strategy Matrix

| Strategy | Query Pattern | Sort | When to Use |
|----------|--------------|------|-------------|
| Broad topic | "multi-agent LLM framework" | stars | Always, to establish the landscape |
| Paper title | "{exact paper title}" | best-match | For each key paper |
| Method name | "{algorithm name} implementation" | stars | For specific techniques |
| Author search | "{author name}" + topic | updated | For prolific researchers |
| Code pattern | "class {ClassName}" | - | For specific implementations |
| Language-specific | topic + language:python | stars | When language matters |
| Awesome list | "awesome-{topic}" | stars | To find curated lists |

### Rate Limiting

- GitHub search API: 30 requests/minute when authenticated, 10 requests/minute unauthenticated; code search requires authentication and is limited to 10 requests/minute
- Papers With Code API: ~60 requests/minute
- Always set GITHUB_TOKEN for higher limits (5000 req/hr on the core REST API)

### Deduplication

- Primary key: `repo_id` (owner/name, case-insensitive)
- When merging duplicates: keep record with more populated fields; merge paper_ids lists

### Target Numbers

- Aim for 50-200 unique repos before filtering
- Use at least 5 different search queries
- Check Papers With Code for all papers with arxiv_ids

---

## Phase 3: Filtering - Detailed Guide

### Scoring Deep Dive

**Activity Score** (0-1):

- Days since last push: <30d -> 0.9-1.0, 30-90d -> 0.6-0.8, 90-365d -> 0.3-0.5, >365d -> 0.0-0.2
- Frequency weight: pushed_at recency matters most

**Quality Score** (0-1):

- Stars (log-scaled, 30% weight): log(stars+1) normalized across set
- Forks (log-scaled, 20%): log(forks+1) normalized
- Has license (15%): any recognized license = 1.0
- Not archived (20%): archived repos get 0
- Has README (15%): non-empty readme_excerpt = 1.0

**Relevance Score** (0-1, manually assigned):

- 0.9-1.0: Direct implementation of a paper in the literature review
- 0.7-0.89: Closely related technique or framework
- 0.5-0.69: Related but tangential (e.g., general ML framework used by papers)
- 0.3-0.49: Loosely related (e.g., same domain, different approach)
- 0.0-0.29: Unlikely useful

**Composite**: relevance x 0.4 + quality x 0.35 + activity x 0.25

### Selection Criteria

- Always include: repos directly linked to papers
- Prefer: repos with tests, documentation, active maintenance
- Diversity: ensure mix of approaches, not just top-starred
- Minimum: 15 repos; Maximum: 30 repos

---

## Phase 4: Deep Dive - Detailed Guide

### What "Deep Dive" Means

This is NOT a README scan. You must:

1. Clone the repo (shallow)
2. Read the directory structure
3. Open and read key source files (model definitions, training loops, core algorithms)
4. Trace the execution flow from entry point to core logic
5. Evaluate code quality, documentation, test coverage

### Per-Repo Analysis Template

```mar