Citation Management

Name: Citation Management
Author: lingzhi227

lingzhi227/agent-research-skills

Automatically find missing citations in LaTeX manuscripts by scanning claims and querying Semantic Scholar for BibTeX candidates.

Overview

citation-management is an agent skill, most often used in Build (also Idea → research and Grow → content), that finds missing LaTeX citations by querying Semantic Scholar and writing candidate BibTeX entries.

Install

npx skills add https://github.com/lingzhi227/agent-research-skills --skill citation-management

What is this skill?

Scans LaTeX for under-cited factual sentences using claim heuristics
Calls Semantic Scholar search API and outputs candidate BibTeX entries
Stdlib-only Python harvest script with --dry-run, --max-rounds, and --verbose flags
CLI: python harvest_citations.py --tex main.tex --bib references.bib --output candidates.bib
Queries the Semantic Scholar Graph API v1 paper search endpoint
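The search step can be sketched offline. The snippet below builds a request URL for the Semantic Scholar Graph API v1 paper search endpoint with the same `fields` list the script requests, then pulls the result list out of the `data` key of a response. The helper names and the sample response body are made up for illustration; real responses carry more fields per paper.

```python
import json
import urllib.parse

S2_API = "https://api.semanticscholar.org/graph/v1/paper/search"
FIELDS = "title,authors,year,venue,externalIds,citationCount,abstract"


def build_search_url(query: str, limit: int = 3) -> str:
    """Build a paper-search URL; urlencode escapes spaces and commas."""
    params = urllib.parse.urlencode({"query": query, "limit": limit, "fields": FIELDS})
    return f"{S2_API}?{params}"


def parse_results(body: str) -> list:
    """Return the list of papers from a search response (empty if there is none)."""
    return json.loads(body).get("data", [])


# Illustrative response body, not real API output.
sample_body = json.dumps({
    "total": 1,
    "offset": 0,
    "data": [{"paperId": "example-id", "title": "An Example Paper", "year": 2020}],
})

url = build_search_url("transformer attention machine translation")
papers = parse_results(sample_body)
print(url)
print([p["title"] for p in papers])
```

Sending the request is then a matter of passing this URL to `urllib.request.urlopen`, optionally with an `x-api-key` header, as the script does.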

Compatible agents: Claude Code, Codex, Cursor, and any other compatible agent

Adoption & trust: 864 installs on skills.sh; 114 GitHub stars; 0/3 security scanners passed (skills.sh audits).

What problem does it solve?

Your LaTeX draft makes factual claims without \cite commands, and manual literature search is slowing publication.

Who is it for?

Solo academics and indie researchers maintaining main.tex plus references.bib who want automated citation gap detection.

Skip if: you don't write in LaTeX/BibTeX, or you need guaranteed peer-review-grade citation verification without human review.

When should I use this skill?

You have a .tex manuscript and a references.bib file and need to find citations for factual sentences that lack \cite.
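The detection heuristic boils down to: split the text into sentences, skip short ones and ones that already contain `\cite`, and flag the rest if they match a claim pattern. The condensed sketch below uses only two of the script's `CLAIM_PATTERNS` and a made-up LaTeX snippet; the function name `uncited_claims` is hypothetical.

```python
import re

# Two of the script's CLAIM_PATTERNS; the full list is longer.
PATTERNS = [
    r"(?:has been shown|have been shown|is known|are known)",
    r"(?:widely used|commonly used|popular|well-known|established)",
]


def uncited_claims(tex: str) -> list:
    """Return sentences that match a claim pattern but contain no \\cite."""
    sentences = re.split(r"(?<=[.!?])\s+", tex)
    flagged = []
    for sent in (s.strip() for s in sentences):
        if len(sent) < 30 or "\\cite" in sent:
            continue  # too short to be a claim, or already cited
        if any(re.search(p, sent, re.IGNORECASE) for p in PATTERNS):
            flagged.append(sent)
    return flagged


tex = (
    "Dropout is widely used to regularize deep networks. "
    "Batch normalization has been shown to speed up training \\cite{ioffe2015}. "
    "We train for ten epochs."
)
for s in uncited_claims(tex):
    print(s)
```

Only the first sentence is flagged: the second already carries a citation, and the third is under the 30-character threshold.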

What do I get? / Deliverables

You get a candidates.bib (or dry-run report) of Semantic Scholar–matched references mapped to under-cited sentences in your .tex file.

candidates.bib with proposed BibTeX entries
Dry-run or verbose logs of under-cited claim matches
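The page does not show how the script formats its BibTeX output, so the following is only one plausible approach: a hypothetical `paper_to_bibtex` helper that turns a single search hit (using the `title`, `authors`, `year`, `venue`, and `externalIds` fields the script requests) into an entry, skipping keys already defined in references.bib. The key scheme and the sample paper are assumptions for illustration.

```python
import re
from typing import Optional


def make_key(paper: dict) -> str:
    """Hypothetical key scheme: first author's last name + year + first title word."""
    names = [a.get("name") or "" for a in paper.get("authors") or []]
    parts = names[0].split() if names else []
    last = parts[-1] if parts else "anon"
    title_words = (paper.get("title") or "").split()
    first_word = title_words[0] if title_words else "untitled"
    raw = f"{last}{paper.get('year') or ''}{first_word}"
    return re.sub(r"[^A-Za-z0-9]", "", raw).lower()


def paper_to_bibtex(paper: dict, existing_keys: set) -> Optional[str]:
    """Render one search hit as a @misc entry, or None if its key is already taken."""
    key = make_key(paper)
    if key in existing_keys:
        return None  # key already defined in references.bib
    names = [a.get("name") for a in paper.get("authors") or [] if a.get("name")]
    fields = {
        "title": paper.get("title"),
        "author": " and ".join(names),
        "year": paper.get("year"),
        "howpublished": paper.get("venue"),
        "doi": (paper.get("externalIds") or {}).get("DOI"),
    }
    # Emit only non-empty fields; values are brace-delimited.
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items() if value)
    return f"@misc{{{key},\n{body}\n}}"


# Made-up search hit in the shape the Graph API returns for these fields.
paper = {
    "title": "An Example Paper on Citation Mining",
    "authors": [{"name": "Jane Doe"}, {"name": "John Roe"}],
    "year": 2021,
    "venue": "Example Conference",
    "externalIds": {},
}
print(paper_to_bibtex(paper, existing_keys=set()))
```

`@misc` is used because a search hit alone does not say whether the venue is a journal or a conference; each candidate still needs a human check and, usually, a more specific entry type.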

Recommended Skills

Lark Doc (larksuite/cli)

lark-doc is an agent skill for Feishu cloud documents, knowledge-base wiki pages, and Docx v2 workflows through the `lar… (211k installs · 13.7k stars)

Lark Wiki (larksuite/cli)

Operates Lark wiki spaces and nodes via lark-cli, emphasizing URL resolution, bot limitations on departments, and safe s… (209k installs · 13.7k stars)

Opensource Guide Coach (xixu-me/skills)

Open Source Guide Coach distills GitHub's official Open Source Guides into actionable coaching for starting projects, at… (200k installs · 61 stars)

Readme I18n (xixu-me/skills)

README i18n skill standardizes multilingual README language selectors, placing a canonical README-I18N block after the ti… (200k installs · 61 stars)

Doc Coauthoring (anthropics/skills)

Doc Co-Authoring is an agent skill that walks solo builders through collaborative creation of substantial documentation… (54.6k installs · 148k stars)

Obsidian Markdown (kepano/obsidian-skills)

obsidian-markdown is an agent skill for solo builders who keep specs, research, and runbooks in Obsidian vaults. It teac… (41k installs · 34.9k stars)

Journey fit

Spans multiple journey phases: a primary shelf plus the alternate fits below.

Primary fit

Academic and technical writing most often happens while documenting research outputs during Build, although citation hygiene also supports Validate and Launch content. Docs is the primary shelf because the workflow centers on .tex manuscripts, references.bib, and BibTeX candidate output, not app runtime code.

Also useful

Idea → Opportunity & market research

Also useful

Grow → Content & marketing

Where it fits

Example use

Idea → Opportunity & market research

Map which claims in an outline need sources before you commit to a full experiment write-up.

Example use

Build → Docs & content

Run harvest_citations.py on main.tex to populate candidates.bib before submitting to arXiv.
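The proposed entries in candidates.bib still need human review before submission. One way to fold the accepted ones into references.bib without clobbering existing keys is sketched below; `merge_bib` and `split_entries` are hypothetical helpers, not part of the skill, and the splitting assumes each entry begins with `@` at the start of a line.

```python
import re


def split_entries(bib: str) -> dict:
    """Map each BibTeX key to its entry text (entries must start with '@' at line start)."""
    entries = {}
    for chunk in re.split(r"(?m)^(?=@)", bib):
        m = re.match(r"@\w+\{([^,]+),", chunk)
        if m:
            entries[m.group(1).strip()] = chunk.strip()
    return entries


def merge_bib(references: str, candidates: str, accepted: set) -> str:
    """Append accepted candidate entries whose keys are not already in references."""
    existing = split_entries(references)
    new = [text for key, text in split_entries(candidates).items()
           if key in accepted and key not in existing]
    return "\n\n".join([references.strip(), *new]) + "\n"


refs = "@article{a2020,\n  title = {A}\n}\n"
cands = (
    "@misc{b2021,\n  title = {B}\n}\n\n"
    "@misc{a2020,\n  title = {A dup}\n}\n\n"
    "@misc{c2022,\n  title = {C}\n}\n"
)
print(merge_bib(refs, cands, accepted={"b2021", "a2020"}))
```

Here `b2021` is appended, `a2020` is skipped because references.bib already defines it, and `c2022` is left out because it was not accepted.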

Example use

Grow → Content & marketing

Refresh citations on a technical blog post exported to LaTeX for SEO-heavy evergreen content.

How it compares

A focused LaTeX citation harvester, not a full Zotero integration or generic web research MCP.

Common Questions / FAQ

Who is citation-management for?

Solo builders and researchers drafting LaTeX papers who want agent-assisted bibliography expansion from Semantic Scholar.

When should I use citation-management?

During Build → docs while drafting papers; in Idea → research when surveying literature gaps; in Grow → content when refreshing cited long-form posts.

Is citation-management safe to install?

The script makes network calls to the Semantic Scholar API; review the Security Audits panel on this page and inspect harvest_citations.py before running it on confidential drafts.
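Because each search query is built from words in your own sentences, fragments of a confidential draft are sent to Semantic Scholar. The sketch below mirrors the query construction in `find_uncited_claims` (the same `COMMON_WORDS` list, words longer than two letters, at most eight words) so you can preview what a sentence would reveal; `query_for` is a hypothetical name and the sample sentence is made up.

```python
import re

# Same stopword list as the script's COMMON_WORDS.
COMMON_WORDS = {
    "a", "an", "the", "of", "in", "on", "at", "to", "for", "and", "or",
    "is", "are", "was", "were", "be", "been", "with", "from", "by", "as",
    "we", "our", "this", "that", "these", "those", "it", "its",
}


def query_for(sentence: str, max_words: int = 8) -> str:
    """Keep up to max_words alphabetic, non-stopword words longer than 2 characters."""
    words = re.findall(r"[A-Za-z]+", sentence)
    content = [w for w in words if w.lower() not in COMMON_WORDS and len(w) > 2]
    return " ".join(content[:max_words])


sentence = ("Our unreleased model outperforms prior methods on the internal "
            "ACME benchmark by a wide margin.")
print(query_for(sentence))
```

In this example, the words "unreleased", "internal", and "ACME" would all leave your machine as part of the query.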

SKILL.md


#!/usr/bin/env python3
"""Harvest missing citations for a LaTeX paper.

Scans .tex for under-cited claims (sentences with factual assertions but no \\cite),
generates search queries, calls Semantic Scholar API, and outputs candidate BibTeX entries.

Self-contained: uses only stdlib.

Usage:
    python harvest_citations.py --tex main.tex --bib references.bib --output candidates.bib
    python harvest_citations.py --tex main.tex --bib references.bib --max-rounds 10 --dry-run
    python harvest_citations.py --tex main.tex --bib references.bib --output candidates.bib --verbose
"""

import argparse
import json
import os
import re
import sys
import time
import urllib.error
import urllib.parse
import urllib.request

S2_API = "https://api.semanticscholar.org/graph/v1/paper/search"

CLAIM_PATTERNS = [
    r"(?:has been shown|have been shown|was shown|were shown|is known|are known)",
    r"(?:recent(?:ly)?|prior|previous) (?:work|studies?|research|methods?|approaches?)",
    r"(?:state[- ]of[- ]the[- ]art|SOTA|benchmark)",
    r"(?:outperform|surpass|exceed|achieve|obtain|report|demonstrate|propose|introduce)",
    r"(?:widely used|commonly used|popular|well-known|established)",
    r"(?:inspired by|motivated by|based on|building on|following)",
    r"(?:\d+\.?\d*)\s*\\?%",  # Percentages (LaTeX writes them as \%) that likely need citation
]

COMMON_WORDS = {
    "a", "an", "the", "of", "in", "on", "at", "to", "for", "and", "or",
    "is", "are", "was", "were", "be", "been", "with", "from", "by", "as",
    "we", "our", "this", "that", "these", "those", "it", "its",
}


def extract_existing_keys(bib_content: str) -> set[str]:
    """Extract all BibTeX keys from .bib file."""
    return set(re.findall(r"@\w+\{([^,]+),", bib_content))


def extract_cited_keys(tex_content: str) -> set[str]:
    """Extract all cited keys from .tex file."""
    keys = set()
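    # Also accept starred variants and optional arguments, e.g. \citep[e.g.][]{key}
    for match in re.findall(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]+)\}", tex_content):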
        for key in match.split(","):
            keys.add(key.strip())
    return keys


def find_uncited_claims(tex_content: str) -> list[dict]:
    """Find sentences with factual claims that lack citations."""
    # Remove comments: an unescaped % starts a comment, while \% is a literal percent sign
    text = re.sub(r"(?<!\\)%.*$", "", tex_content, flags=re.MULTILINE)
    # Remove math environments
    text = re.sub(r"\$\$.*?\$\$", "", text, flags=re.DOTALL)
    text = re.sub(r"\$.*?\$", "", text)
    # Remove \begin{...} and \end{...} markers but keep the enclosed text
    text = re.sub(r"\\(?:begin|end)\{[^}]+\}", "", text)

    sentences = re.split(r"(?<=[.!?])\s+", text)
    claims = []

    for sent in sentences:
        sent = sent.strip()
        if not sent or len(sent) < 30:
            continue
        # Skip if already has a citation
        if re.search(r"\\cite", sent):
            continue
        # Check for claim patterns
        for pattern in CLAIM_PATTERNS:
            if re.search(pattern, sent, re.IGNORECASE):
                # Extract key terms for search query
                words = re.findall(r"[A-Za-z]+", sent)
                content_words = [w for w in words if w.lower() not in COMMON_WORDS and len(w) > 2]
                query = " ".join(content_words[:8])
                claims.append({
                    "sentence": sent[:200],
                    "pattern": pattern,
                    "query": query,
                })
                break

    return claims


def search_semantic_scholar(query: str, limit: int = 3, api_key: str = "") -> list[dict]:
    """Search Semantic Scholar for papers matching the query."""
    params = urllib.parse.urlencode({
        "query": query,
        "limit": limit,
        "fields": "title,authors,year,venue,externalIds,citationCount,abstract",
    })
    url = f"{S2_API}?{params}"
    headers = {"User-Agent": "SkillScript/1.0"}
    if api_key:
        headers["x-api-key"] = api_key

    try:
        req = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(req, timeout=15) as resp:
            data = json.loads(resp.read().decode("utf-8"))
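        return data.get("data", [])
    except (OSError, ValueError) as exc:
        # Assumed error handling (the published excerpt is cut off mid-line here):
        # URLError, HTTPError, and timeouts are OSError subclasses; bad JSON raises ValueError.
        print(f"Semantic Scholar request failed: {exc}", file=sys.stderr)
        return []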
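The key-extraction helpers in the script also support a quick consistency check before harvesting: cited keys with no .bib entry, and .bib entries that are never cited. The sketch below re-implements them standalone (`bib_keys` and `cited_keys` are hypothetical names, not part of the skill); its `\cite` regex also tolerates starred commands and natbib-style optional arguments such as `\citep[e.g.][]{key}`. The sample .tex and .bib strings are made up.

```python
import re


def bib_keys(bib: str) -> set:
    """Keys defined in a .bib file (same regex idea as extract_existing_keys)."""
    return {k.strip() for k in re.findall(r"@\w+\{([^,]+),", bib)}


def cited_keys(tex: str) -> set:
    """Keys used by \\cite-family commands, tolerating optional [...] arguments."""
    keys = set()
    for group in re.findall(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]+)\}", tex):
        keys.update(k.strip() for k in group.split(","))
    return keys


tex = r"As noted in \citep[e.g.][]{smith2019, lee2020}, see also \cite{kim2021}."
bib = (
    "@article{smith2019,\n  title = {S}\n}\n"
    "@inproceedings{lee2020,\n  title = {L}\n}\n"
    "@misc{old2010,\n  title = {O}\n}\n"
)

print("cited but undefined:", sorted(cited_keys(tex) - bib_keys(bib)))
print("defined but never cited:", sorted(bib_keys(bib) - cited_keys(tex)))
```

A cited-but-undefined key (here `kim2021`) is a gap that harvesting alone will not fix, since the script searches by claim text rather than by key.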