Scholar Evaluation

Name: Scholar Evaluation
Author: k-dense-ai

k-dense-ai/scientific-agent-skills

959 installs
32k repo stars
Updated July 29, 2026

scholar-evaluation is a rubric-based agent skill that runs structured ScholarEval assessments for developers and researchers who need systematic quality scoring of research ideas, drafts, or agent-generated scholarly outputs.

About

scholar-evaluation implements the ScholarEval evaluation framework from k-dense-ai/scientific-agent-skills for systematic scholarly quality review. The skill supplies dimension-specific rubrics—starting with Problem Formulation and Research Questions—each scored on a 5-point scale where Excellent (5) demands specific measurable questions, significant literature gaps, appropriate scope, clear novelty, and compelling significance justification. Developers reach for scholar-evaluation when vetting research proposals, paper drafts, or LLM-generated academic content before peer review or publication. Agents apply the documented quality indicators and rubric tiers to produce structured, comparable evaluation scores across scholarly dimensions.

5-point rubric for each ScholarEval dimension with explicit quality indicators
Evaluates Problem Formulation, Research Questions, Methodology, and Scholarly Contribution
Provides Excellent (5) through Poor (1) tier definitions for consistent scoring
Enables systematic side-by-side comparison of multiple research proposals or outputs
Delivers severity-bucketed feedback that feeds directly into revision or rejection gates

Scholar Evaluation by the numbers

959 all-time installs (skills.sh)
+44 installs in the week ending Jul 29, 2026 (Skillselion tracking)
Ranked #1,108 of 16,570 AI & Agent Building skills by installs in the Skillselion catalog
Security screen: LOW risk (skills.sh audit)
Data as of Jul 29, 2026 (Skillselion catalog sync)

npx skills add https://github.com/k-dense-ai/scientific-agent-skills --skill scholar-evaluation

Installs	959
repo stars	★ 32k
Security audit	3 / 3 scanners passed
Last updated	July 29, 2026
Repository	k-dense-ai/scientific-agent-skills ↗

How do you evaluate research quality with rubrics?

Run structured, rubric-based evaluations of research ideas, paper drafts, or agent-generated scholarly outputs.

Who is it for?

Researchers and ML engineers validating agent-generated papers, grant drafts, or research proposals with repeatable ScholarEval rubric scoring.

Skip if: You only need grammar editing, citation formatting, or non-academic marketing copy without research-quality assessment.

When should I use this skill?

Use it when the user asks to evaluate a research idea, paper draft, or scholarly agent output, or to apply ScholarEval rubrics.

What you get

Structured ScholarEval dimension scores, rubric-tier ratings, and documented quality-indicator feedback for scholarly drafts.

Dimension rubric scores
Quality indicator assessments

By the numbers

Uses 5-point rubric scale with Excellent rated at 5
Covers ScholarEval Dimension 1: Problem Formulation and Research Questions

Files

SKILL.md

Scholar Evaluation

Overview

Apply the ScholarEval framework to systematically evaluate scholarly and research work. This skill provides structured evaluation methodology based on peer-reviewed research assessment criteria, enabling comprehensive analysis of academic papers, research proposals, literature reviews, and scholarly writing across multiple quality dimensions.

When to Use This Skill

Use this skill when:

Evaluating research papers for quality and rigor
Assessing literature review comprehensiveness and quality
Reviewing research methodology design
Scoring data analysis approaches
Evaluating scholarly writing and presentation
Providing structured feedback on academic work
Benchmarking research quality against established criteria
Assessing publication readiness for target venues
Providing quantitative evaluation to complement qualitative peer review

Visual Enhancement with Scientific Schematics

When creating documents with this skill, always consider adding scientific diagrams and schematics to enhance visual communication.

If your document does not already contain schematics or diagrams:

Use the scientific-schematics skill to generate AI-powered publication-quality diagrams
Simply describe your desired diagram in natural language
Nano Banana Pro will automatically generate, review, and refine the schematic

For new documents: Scientific schematics should be generated by default to visually represent key concepts, workflows, architectures, or relationships described in the text.

How to generate schematics:

python scripts/generate_schematic.py "your diagram description" -o figures/output.png

The AI will automatically:

Create publication-quality images with proper formatting
Review and refine through multiple iterations
Ensure accessibility (colorblind-friendly, high contrast)
Save outputs in the figures/ directory

When to add schematics:

Evaluation framework diagrams
Quality assessment criteria decision trees
Scholarly workflow visualizations
Assessment methodology flowcharts
Scoring rubric visualizations
Evaluation process diagrams
Any complex concept that benefits from visualization

For detailed guidance on creating schematics, refer to the scientific-schematics skill documentation.

---

Evaluation Workflow

Step 1: Initial Assessment and Scope Definition

Begin by identifying the type of scholarly work being evaluated and the evaluation scope:

Work Types:

Full research paper (empirical, theoretical, or review)
Research proposal or protocol
Literature review (systematic, narrative, or scoping)
Thesis or dissertation chapter
Conference abstract or short paper

Evaluation Scope:

Comprehensive (all dimensions)
Targeted (specific aspects like methodology or writing)
Comparative (benchmarking against other work)

Ask the user to clarify if the scope is ambiguous.

Step 2: Dimension-Based Evaluation

Systematically evaluate the work across the ScholarEval dimensions. For each applicable dimension, assess quality, identify strengths and weaknesses, and provide scores where appropriate.

Refer to references/evaluation_framework.md for detailed criteria and rubrics for each dimension.

Core Evaluation Dimensions:

1. Problem Formulation & Research Questions

Clarity and specificity of research questions
Theoretical or practical significance
Feasibility and scope appropriateness
Novelty and contribution potential

2. Literature Review

Comprehensiveness of coverage
Critical synthesis vs. mere summarization
Identification of research gaps
Currency and relevance of sources
Proper contextualization

3. Methodology & Research Design

Appropriateness for research questions
Rigor and validity
Reproducibility and transparency
Ethical considerations
Limitations acknowledgment

4. Data Collection & Sources

Quality and appropriateness of data
Sample size and representativeness
Data collection procedures
Source credibility and reliability

5. Analysis & Interpretation

Appropriateness of analytical methods
Rigor of analysis
Logical coherence
Alternative explanations considered
Results-claims alignment

6. Results & Findings

Clarity of presentation
Statistical or qualitative rigor
Visualization quality
Interpretation accuracy
Implications discussion

7. Scholarly Writing & Presentation

Clarity and organization
Academic tone and style
Grammar and mechanics
Logical flow
Accessibility to target audience

8. Citations & References

Citation completeness
Source quality and appropriateness
Citation accuracy
Balance of perspectives
Adherence to citation standards

Step 3: Scoring and Rating

For each evaluated dimension, provide:

Qualitative Assessment:

Key strengths (2-3 specific points)
Areas for improvement (2-3 specific points)
Critical issues (if any)

Quantitative Scoring (Optional): Use a 5-point scale where applicable:

5: Excellent - Exemplary quality, publishable in top venues
4: Good - Strong quality with minor improvements needed
3: Adequate - Acceptable quality with notable areas for improvement
2: Needs Improvement - Significant revisions required
1: Poor - Fundamental issues requiring major revision

To calculate aggregate scores programmatically, use scripts/calculate_scores.py.
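
The per-dimension output described in this step can be held in a small record. What follows is a minimal sketch, assuming a hypothetical `DimensionAssessment` class that is not part of the skill's scripts; the labels are taken from the 5-point scale above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Labels taken from the 5-point scale in Step 3.
SCORE_LABELS = {
    5: "Excellent",
    4: "Good",
    3: "Adequate",
    2: "Needs Improvement",
    1: "Poor",
}


@dataclass
class DimensionAssessment:
    """Hypothetical container for one dimension's evaluation (illustrative only)."""

    dimension: str
    strengths: List[str] = field(default_factory=list)
    improvements: List[str] = field(default_factory=list)
    critical_issues: List[str] = field(default_factory=list)
    score: Optional[int] = None  # quantitative scoring is optional

    def __post_init__(self) -> None:
        # Reject scores outside the documented 1-5 range.
        if self.score is not None and self.score not in SCORE_LABELS:
            raise ValueError(f"score must be an integer from 1 to 5, got {self.score!r}")

    @property
    def label(self) -> str:
        """Return the rubric label, or 'Not scored' when no score was given."""
        return SCORE_LABELS.get(self.score, "Not scored")


assessment = DimensionAssessment(
    dimension="Methodology & Research Design",
    strengths=["Procedures are detailed enough for replication"],
    improvements=["Justify the sample size"],
    score=4,
)
print(assessment.label)  # Good
```

Leaving `score` unset keeps the qualitative fields usable on their own, which matches the optional status of quantitative scoring.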

Step 4: Synthesize Overall Assessment

Provide an integrated evaluation summary:

1. Overall Quality Assessment - Holistic judgment of the work's scholarly merit
2. Major Strengths - 3-5 key strengths across dimensions
3. Critical Weaknesses - 3-5 primary areas requiring attention
4. Priority Recommendations - Ranked list of improvements by impact
5. Publication Readiness (if applicable) - Assessment of suitability for target venues
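
One possible way to rank recommendations by impact is to multiply each dimension's shortfall from the top score by that dimension's weight. The sketch below assumes that heuristic; it is not defined by ScholarEval, and `rank_priorities` is a hypothetical helper. The weights are a subset of the suggested defaults in the framework's scoring guidelines, and only `literature_review` is a key name visible in `scripts/calculate_scores.py`; the other keys are assumed.

```python
from typing import Dict, List, Tuple

MAX_SCORE = 5  # top of the 5-point scale


def rank_priorities(
    scores: Dict[str, float], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Order dimensions by weight * (MAX_SCORE - score), largest gap first."""
    gaps = {
        name: weights[name] * (MAX_SCORE - score)
        for name, score in scores.items()
    }
    # Sort by descending gap; ties fall back to alphabetical order.
    return sorted(gaps.items(), key=lambda item: (-item[1], item[0]))


# Assumed key names, except literature_review.
weights = {"methodology": 0.20, "literature_review": 0.15, "writing": 0.10}
scores = {"methodology": 3, "literature_review": 4, "writing": 2}

for name, gap in rank_priorities(scores, weights):
    print(f"{name}: {gap:.2f}")
```

A heavily weighted dimension with a moderate score can outrank a lightly weighted one with a low score, so the ordering is only a starting point for the evaluator's judgment.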

Step 5: Provide Actionable Feedback

Transform evaluation findings into constructive, actionable feedback:

Feedback Structure:

Specific - Reference exact sections, paragraphs, or page numbers
Actionable - Provide concrete suggestions for improvement
Prioritized - Rank recommendations by importance and feasibility
Balanced - Acknowledge strengths while addressing weaknesses
Evidence-based - Ground feedback in evaluation criteria

Feedback Format Options:

Structured report with dimension-by-dimension analysis
Annotated comments mapped to specific document sections
Executive summary with key findings and recommendations
Comparative analysis against benchmark standards

Step 6: Contextual Considerations

Adjust evaluation approach based on:

Stage of Development:

Early draft: Focus on conceptual and structural issues
Advanced draft: Focus on refinement and polish
Final submission: Comprehensive quality check

Purpose and Venue:

Journal article: High standards for rigor and contribution
Conference paper: Balance novelty with presentation clarity
Student work: Educational feedback with developmental focus
Grant proposal: Emphasis on feasibility and impact

Discipline-Specific Norms:

STEM fields: Emphasis on reproducibility and statistical rigor
Social sciences: Balance quantitative and qualitative standards
Humanities: Focus on argumentation and scholarly interpretation

Resources

references/evaluation_framework.md

Detailed evaluation criteria, rubrics, and quality indicators for each ScholarEval dimension. Load this reference when conducting evaluations to access specific assessment guidelines and scoring rubrics.

Search patterns for quick access:

"Problem Formulation criteria"
"Literature Review rubric"
"Methodology assessment"
"Data quality indicators"
"Analysis rigor standards"
"Writing quality checklist"

scripts/calculate_scores.py

Python script for calculating aggregate evaluation scores from dimension-level ratings. Supports weighted averaging, threshold analysis, and score visualization.

Usage:

python scripts/calculate_scores.py --scores <dimension_scores.json> --output <report.txt>

Best Practices

1. Maintain Objectivity - Base evaluations on established criteria, not personal preferences
2. Be Comprehensive - Evaluate all applicable dimensions systematically
3. Provide Evidence - Support assessments with specific examples from the work
4. Stay Constructive - Frame weaknesses as opportunities for improvement
5. Consider Context - Adjust expectations based on work stage and purpose
6. Document Rationale - Explain the reasoning behind assessments and scores
7. Encourage Strengths - Explicitly acknowledge what the work does well
8. Prioritize Feedback - Focus on high-impact improvements first

Example Evaluation Workflow

User Request: "Evaluate this research paper on machine learning for drug discovery"

Response Process:

1. Identify work type (empirical research paper) and scope (comprehensive evaluation)
2. Load references/evaluation_framework.md for detailed criteria
3. Systematically assess each dimension:

Problem formulation: Clear research question about ML model performance
Literature review: Comprehensive coverage of recent ML and drug discovery work
Methodology: Appropriate deep learning architecture with validation procedures
[Continue through all dimensions...]

4. Calculate dimension scores and overall assessment
5. Synthesize findings into structured report highlighting:

Strong methodology and reproducible code
Needs more diverse dataset evaluation
Writing could improve clarity in results section

6. Provide prioritized recommendations with specific suggestions

Integration with Scientific Writer

This skill integrates seamlessly with the scientific writer workflow:

After Paper Generation:

Use Scholar Evaluation as an alternative or complement to peer review
Generate SCHOLAR_EVALUATION.md alongside PEER_REVIEW.md
Provide quantitative scores to track improvement across revisions

During Revision:

Re-evaluate specific dimensions after addressing feedback
Track score improvements over multiple versions
Identify persistent weaknesses requiring attention
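
Tracking scores across versions comes down to comparing two sets of dimension ratings. The following is a rough sketch with hypothetical helper names; the cut-off of 3.0 for a "persistent weakness" is an assumption chosen for illustration, not a value from the framework.

```python
from typing import Dict, List


def score_deltas(previous: Dict[str, float], current: Dict[str, float]) -> Dict[str, float]:
    """Return current - previous for every dimension scored in both versions."""
    shared = previous.keys() & current.keys()
    return {name: current[name] - previous[name] for name in sorted(shared)}


def persistent_weaknesses(
    previous: Dict[str, float], current: Dict[str, float], threshold: float = 3.0
) -> List[str]:
    """List dimensions that scored below `threshold` in both versions."""
    shared = previous.keys() & current.keys()
    return sorted(
        name for name in shared
        if previous[name] < threshold and current[name] < threshold
    )


# "methodology" is an assumed key name.
v1 = {"problem_formulation": 3, "literature_review": 2, "methodology": 4}
v2 = {"problem_formulation": 4, "literature_review": 2, "methodology": 4}

print(score_deltas(v1, v2))
print(persistent_weaknesses(v1, v2))
```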

Publication Preparation:

Assess readiness for target journal/conference
Identify gaps before submission
Benchmark against publication standards

Notes

Evaluation rigor should match the work's purpose and stage
Some dimensions may not apply to all work types (e.g., data collection for purely theoretical papers)
Cultural and disciplinary differences in scholarly norms should be considered
This framework complements, not replaces, domain-specific expertise
Use in combination with peer-review skill for comprehensive assessment
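
When a dimension does not apply, one reasonable option is to drop it and rescale the remaining weights so they still sum to 1. This is a sketch of that idea, not necessarily how `scripts/calculate_scores.py` handles missing dimensions; `renormalize` is a hypothetical helper and the `methodology` and `data_collection` key names are assumed.

```python
from typing import Dict, Iterable


def renormalize(weights: Dict[str, float], applicable: Iterable[str]) -> Dict[str, float]:
    """Keep only applicable dimensions and rescale their weights to sum to 1."""
    kept = {name: weights[name] for name in applicable}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("at least one applicable dimension must have positive weight")
    return {name: weight / total for name, weight in kept.items()}


weights = {
    "problem_formulation": 0.15,
    "literature_review": 0.15,
    "methodology": 0.20,      # assumed key name
    "data_collection": 0.10,  # assumed key name
}

# A purely theoretical paper: data collection is left out.
adjusted = renormalize(weights, ["problem_formulation", "literature_review", "methodology"])
print(adjusted)
```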

Citation

This skill is based on the ScholarEval framework introduced in:

Moussa, H. N., Da Silva, P. Q., Adu-Ampratwum, D., East, A., Lu, Z., Puccetti, N., Xue, M., Sun, H., Majumder, B. P., & Kumar, S. (2025). _ScholarEval: Research Idea Evaluation Grounded in Literature_. arXiv preprint arXiv:2510.16234. https://arxiv.org/abs/2510.16234

Abstract: ScholarEval is a retrieval augmented evaluation framework that assesses research ideas based on two fundamental criteria: soundness (the empirical validity of proposed methods based on existing literature) and contribution (the degree of advancement made by the idea across different dimensions relative to prior research). The framework achieves significantly higher coverage of expert-annotated evaluation points and is consistently preferred over baseline systems in terms of evaluation actionability, depth, and evidence support.

ScholarEval Evaluation Framework

Overview

This document provides detailed evaluation criteria, rubrics, and quality indicators for each dimension of the ScholarEval framework. Use these standards when conducting systematic evaluations of scholarly work.

---

Dimension 1: Problem Formulation & Research Questions

Quality Indicators

Excellent (5):

Research question is specific, measurable, and clearly articulated
Problem addresses significant gap in literature with high impact potential
Scope is appropriate and feasible within constraints
Novel contribution is clearly differentiated from existing work
Theoretical or practical significance is compellingly justified

Good (4):

Research question is clear with minor ambiguities
Problem is relevant with moderate impact potential
Scope is generally appropriate with minor feasibility concerns
Contribution is identifiable though not groundbreaking
Significance is adequately justified

Adequate (3):

Research question is present but lacks specificity
Problem relevance is unclear or incremental
Scope may be too broad or narrow
Contribution is unclear or overlaps heavily with existing work
Significance justification is weak

Needs Improvement (2):

Research question is vague or poorly defined
Problem lacks clear relevance or significance
Scope is inappropriate or infeasible
Contribution is not articulated
No clear justification for significance

Poor (1):

No clear research question
Problem is trivial or irrelevant
Scope is fundamentally flawed
No identifiable contribution
No significance justification

Assessment Checklist

[ ] Is the research question clearly stated?
[ ] Can the question be answered with the proposed approach?
[ ] Is the problem significant to the field?
[ ] Is the scope feasible within resource constraints?
[ ] Is the novelty/contribution clearly articulated?
[ ] Are key assumptions explicitly stated?
[ ] Are success criteria or expected outcomes defined?

---

Dimension 2: Literature Review

Quality Indicators

Excellent (5):

Comprehensive coverage of relevant literature across key areas
Critical synthesis identifying patterns, contradictions, and gaps
Literature is current (majority from last 3-5 years for rapidly evolving fields)
Sources are authoritative and peer-reviewed
Clear positioning of current work within scholarly conversation
Identifies genuine research gaps that the work addresses

Good (4):

Good coverage with minor gaps in key areas
Mostly synthesis with some description
Literature is mostly current with some older foundational works
Sources are generally authoritative
Work positioning is present but could be stronger
Research gaps are identified but may not be critical

Adequate (3):

Partial coverage with notable gaps
More descriptive summarization than synthesis
Literature mix of current and dated sources
Mix of authoritative and less rigorous sources
Weak positioning within existing literature
Research gaps are vague or questionable

Needs Improvement (2):

Minimal coverage with major gaps
Purely descriptive without synthesis
Literature is largely outdated
Sources lack authority or rigor
Little to no positioning of current work
No clear research gaps identified

Poor (1):

Inadequate or absent literature review
No synthesis
Outdated or inappropriate sources
No engagement with scholarly conversation
No gap identification

Assessment Checklist

[ ] Does review cover all major relevant areas?
[ ] Is literature synthesized rather than just summarized?
[ ] Are sources current and authoritative?
[ ] Are contrasting viewpoints presented?
[ ] Are research gaps clearly identified?
[ ] Is the current work positioned within existing literature?
[ ] Is citation balance appropriate (not over-relying on few authors)?
[ ] Are seminal/foundational works included?

Common Issues

Insufficient coverage: Missing key papers or research streams
Descriptive listing: Summarizing papers sequentially without synthesis
Outdated sources: Relying on literature more than 5-10 years old
Cherry-picking: Only citing work that supports hypothesis
Poor organization: Lack of thematic or conceptual structure
Weak gap identification: Gaps are trivial or not actually gaps
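
The currency criteria above can be checked roughly by counting how many references fall inside a recent window. A minimal sketch, assuming publication years have already been extracted from the reference list; `recent_fraction` and the sample years are hypothetical.

```python
from typing import Iterable


def recent_fraction(pub_years: Iterable[int], current_year: int, window: int = 5) -> float:
    """Fraction of references published within the last `window` years (inclusive)."""
    years = list(pub_years)
    if not years:
        raise ValueError("no references supplied")
    cutoff = current_year - window
    recent = sum(1 for year in years if year >= cutoff)
    return recent / len(years)


# Made-up publication years for illustration.
years = [2024, 2023, 2021, 2015, 2009, 2022]
share = recent_fraction(years, current_year=2025, window=5)
print(f"{share:.0%} of references fall within the last 5 years")
```

The window should follow the field: the rubric ties the 3-5 year expectation to rapidly evolving fields, and older foundational works remain legitimate.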

---

Dimension 3: Methodology & Research Design

Quality Indicators

Excellent (5):

Research design perfectly aligned with research questions
Methods are rigorous, valid, and reliable
Procedures are detailed enough for replication
Controls, randomization, or triangulation appropriate
Potential biases acknowledged and mitigated
Ethical considerations addressed comprehensively
Limitations are explicitly discussed

Good (4):

Design is appropriate with minor alignment issues
Methods are sound with small validity concerns
Procedures are mostly replicable
Some controls or validation present
Major biases addressed
Ethical considerations mentioned
Some limitations discussed

Adequate (3):

Design partially appropriate for questions
Methods have notable validity concerns
Procedures lack detail for full replication
Limited controls or validation
Bias mitigation is minimal
Ethics addressed superficially
Limitations minimally discussed

Needs Improvement (2):

Design poorly aligned with research questions
Methods have serious validity issues
Procedures too vague to replicate
No controls or validation
Biases not addressed
Ethical concerns not addressed
No limitation discussion

Poor (1):

Inappropriate or absent methodology
Methods fundamentally flawed
Not replicable
No validity considerations
No ethical considerations
No acknowledgment of limitations

Assessment Checklist

[ ] Is methodology appropriate for research questions?
[ ] Are procedures described in sufficient detail?
[ ] Can the study be replicated from the description?
[ ] Are validity and reliability addressed?
[ ] Are potential biases identified and mitigated?
[ ] Are ethical considerations discussed?
[ ] Are limitations acknowledged?
[ ] Is sample size justified (for quantitative work)?
[ ] Are qualitative methods rigorous (if applicable)?

Design-Specific Considerations

Quantitative Studies:

Sample size with power analysis
Control groups and randomization
Measurement validity and reliability
Statistical assumptions checking
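
As a quick plausibility check on a reported sample size, the standard normal-approximation formula for a two-sided, two-sample comparison of means gives a per-group n from a standardized effect size. The sketch below uses only the Python standard library; `n_per_group` is a hypothetical helper, and an exact t-based calculation typically returns a slightly larger n.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2,
    where d is the standardized effect size (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


print(n_per_group(0.5))  # 63
```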

Qualitative Studies:

Sampling strategy and saturation
Data collection procedures
Coding and analysis framework
Trustworthiness criteria (credibility, transferability, etc.)

Mixed Methods:

Integration rationale
Sequencing justification
Data convergence strategy

---

Dimension 4: Data Collection & Sources

Quality Indicators

Excellent (5):

Data sources are highly credible and appropriate
Sample size is sufficient and well-justified
Data collection procedures are rigorous and systematic
Data quality controls are in place
Sampling strategy ensures representativeness
Missing data is minimal and handled appropriately

Good (4):

Data sources are credible with minor concerns
Sample size is adequate
Collection procedures are systematic
Some quality controls present
Sampling is reasonable
Missing data is addressed

Adequate (3):

Data sources are acceptable but not optimal
Sample size is marginal
Collection procedures lack some rigor
Limited quality controls
Sampling may have bias concerns
Missing data handling is basic

Needs Improvement (2):

Data sources have credibility issues
Sample size is insufficient
Collection procedures are ad hoc
No quality controls
Sampling is clearly biased
Missing data not addressed

Poor (1):

Data sources are inappropriate or unreliable
Sample size is inadequate
Collection is unsystematic
No quality considerations
Sampling is fundamentally flawed
Excessive missing data

Assessment Checklist

[ ] Are data sources credible and appropriate?
[ ] Is sample size sufficient for conclusions?
[ ] Is sampling strategy clearly described?
[ ] Is the sample representative of target population?
[ ] Are data collection procedures systematic?
[ ] Are data quality controls described?
[ ] Is missing data addressed?
[ ] Are any potential data biases discussed?

---

Dimension 5: Analysis & Interpretation

Quality Indicators

Excellent (5):

Analytical methods perfectly suited to data and questions
Analysis is rigorous with appropriate techniques
Results interpretation is logical and well-supported
Alternative explanations are considered
Claims are proportionate to evidence
Assumptions are validated
Analysis is transparent and reproducible

Good (4):

Methods are appropriate with minor issues
Analysis is sound
Interpretation is mostly logical
Some alternatives considered
Claims generally match evidence
Key assumptions checked
Analysis is mostly transparent

Adequate (3):

Methods are acceptable but not optimal
Analysis has some technical issues
Interpretation has logical gaps
Alternatives not thoroughly explored
Some claims exceed evidence
Assumptions not fully validated
Analysis transparency is limited

Needs Improvement (2):

Methods are questionable for data/questions
Analysis has significant technical flaws
Interpretation is poorly supported
No alternative explanations
Claims significantly exceed evidence
Assumptions not checked
Analysis is not transparent

Poor (1):

Methods are inappropriate
Analysis is fundamentally flawed
Interpretation is illogical
No consideration of alternatives
Claims unsupported by evidence
No assumption validation
Analysis is opaque

Assessment Checklist

[ ] Are analytical methods appropriate?
[ ] Are statistical tests/qualitative methods properly applied?
[ ] Are assumptions tested?
[ ] Is interpretation logical and well-supported?
[ ] Are alternative explanations considered?
[ ] Do claims align with evidence strength?
[ ] Is analysis reproducible from description?
[ ] Are uncertainties acknowledged?

Quantitative Analysis

Appropriate statistical tests
Assumptions checked (normality, homogeneity, etc.)
Effect sizes reported
Confidence intervals provided
Multiple testing corrections (if applicable)
Model diagnostics performed

Qualitative Analysis

Coding framework is clear
Inter-rater reliability (if applicable)
Saturation discussed
Negative cases examined
Member checking or validation
Clear audit trail
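
Inter-rater reliability for a coding framework is often reported as Cohen's kappa, which corrects raw agreement between two coders for agreement expected by chance. A minimal sketch for two raters follows; the theme labels are made up for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who coded the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same code by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal rate, summed over codes.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical code")
    return (observed - expected) / (1 - expected)


coder_1 = ["theme1", "theme1", "theme2", "theme2", "theme1", "theme2"]
coder_2 = ["theme1", "theme2", "theme2", "theme2", "theme1", "theme2"]
print(round(cohens_kappa(coder_1, coder_2), 3))
```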

---

Dimension 6: Results & Findings

Quality Indicators

Excellent (5):

Results are clearly and comprehensively presented
Visualizations are effective and appropriate
Statistical or qualitative rigor is evident
Key findings are highlighted effectively
Results directly address research questions
Patterns and relationships are clearly shown
Negative and null results are reported

Good (4):

Results are clear with minor presentation issues
Visualizations are generally effective
Rigor is present
Main findings are identifiable
Results mostly address questions
Patterns are shown
Some negative results included

Adequate (3):

Results presentation is adequate but could be clearer
Visualizations are basic or have issues
Rigor is questionable in places
Findings are present but not emphasized
Partial alignment with questions
Patterns are unclear
Negative results may be omitted

Needs Improvement (2):

Results presentation is unclear or confusing
Visualizations are poor or misleading
Lack of rigor
Findings are difficult to identify
Weak alignment with questions
No clear patterns
Only positive results shown

Poor (1):

Results are poorly presented or absent
Visualizations are inappropriate or missing
No evidence of rigor
Findings are unclear
Results don't address questions
No identifiable patterns
Results appear selective

Assessment Checklist

[ ] Are results clearly presented?
[ ] Do results directly address research questions?
[ ] Are visualizations appropriate and effective?
[ ] Are key findings highlighted?
[ ] Are negative/null results reported?
[ ] Is appropriate precision reported (p-values, CIs, effect sizes)?
[ ] Are qualitative findings supported by data excerpts?
[ ] Is the work free of evidence of selective reporting?

Presentation Quality

Tables:

Clear labels and captions
Appropriate precision
Organized logically
Not overly complex

Figures:

Clear axes and legends
Appropriate chart type
Professional appearance
Accessible (color-blind friendly)

Text:

Highlights key findings
Avoids redundancy with tables/figures
Uses appropriate statistical language

---

Dimension 7: Scholarly Writing & Presentation

Quality Indicators

Excellent (5):

Writing is clear, concise, and precise
Organization is logical with excellent flow
Academic tone is appropriate and consistent
Grammar and mechanics are flawless
Technical terms are used correctly
Accessible to target audience
Abstract/summary is comprehensive and accurate

Good (4):

Writing is clear with minor awkwardness
Organization is logical with good flow
Tone is mostly appropriate
Few grammar/mechanical errors
Technical terms mostly correct
Generally accessible
Abstract is adequate

Adequate (3):

Writing is understandable but has clarity issues
Organization has some logical gaps
Tone inconsistencies
Noticeable grammar/mechanical errors
Some technical term misuse
Accessibility issues for target audience
Abstract is incomplete or vague

Needs Improvement (2):

Writing is often unclear or verbose
Poor organization and flow
Tone is inappropriate
Frequent grammar/mechanical errors
Technical terminology problems
Not accessible to target audience
Abstract is poor or missing

Poor (1):

Writing is unclear and difficult to follow
No clear organization
Tone is inappropriate
Pervasive grammar/mechanical errors
Incorrect technical terminology
Inaccessible
No adequate abstract

Assessment Checklist

[ ] Is writing clear and concise?
[ ] Is organization logical?
[ ] Is tone appropriate for academic writing?
[ ] Are grammar and mechanics correct?
[ ] Are technical terms used appropriately?
[ ] Is jargon explained when necessary?
[ ] Does abstract accurately summarize the work?
[ ] Are transitions between sections smooth?
[ ] Is the target audience clear?

Common Writing Issues

Wordiness: Unnecessarily complex or lengthy prose
Passive voice overuse: Reduces clarity and directness
Paragraph structure: Lack of topic sentences or coherence
Redundancy: Repeating information unnecessarily
Logical flow: Poor transitions between ideas
Precision: Vague or ambiguous language
Accessibility: Too technical or not technical enough

---

Dimension 8: Citations & References

Quality Indicators

Excellent (5):

All claims are appropriately cited
Sources are authoritative and current
Citations are accurate and complete
Diverse perspectives are represented
Citation format is consistent and correct
Balance between self-citation and others
Primary sources used appropriately

Good (4):

Most claims are cited
Sources are generally authoritative
Few citation errors
Reasonable diversity of sources
Format is mostly consistent
Citation balance is good
Mix of primary and secondary sources

Adequate (3):

Some claims lack citations
Source quality is mixed
Several citation errors
Limited source diversity
Format inconsistencies
Citation balance issues
Over-reliance on secondary sources

Needs Improvement (2):

Many claims uncited
Sources are questionable
Numerous citation errors
Narrow source base
Format is inconsistent
Excessive self-citation or narrow citing
Inappropriate sources (e.g., only secondary)

Poor (1):

Inadequate citations
Unreliable sources
Pervasive citation errors
Minimal source diversity
No consistent format
Severe citation imbalance
Inappropriate source types

Assessment Checklist

[ ] Are all factual claims cited?
[ ] Are citations to primary sources when appropriate?
[ ] Are sources authoritative and peer-reviewed?
[ ] Is there balance in perspectives cited?
[ ] Are citations accurate (authors, dates, pages)?
[ ] Is citation format consistent?
[ ] Are self-citations appropriate (typically <20%)?
[ ] Are sources current (for time-sensitive topics)?
[ ] Are classic/seminal works included where relevant?

Citation Quality Assessment

Source Types (in order of preference for most academic work):

1. Peer-reviewed journal articles
2. Academic books from reputable publishers
3. Conference proceedings (field-dependent)
4. Technical reports from reputable institutions
5. Dissertations/theses
6. Preprints (with caution, field-dependent)
7. Grey literature (limited use)
8. Websites (rarely appropriate, except for factual data)

Red Flags:

Wikipedia as a primary source
Excessive self-citation (>30%)
Only citing papers that support hypothesis
Outdated sources when current ones exist
Missing key papers in the field
Citing abstracts only when full papers are available
Inconsistent or incorrect citation format
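
The self-citation figures in this section (typically under 20%, red flag above 30%) can be applied as a simple check. A sketch with a hypothetical helper; the "review" label for the 20-30% band is an assumption, since the text does not classify that range.

```python
def self_citation_status(self_cited: int, total: int) -> str:
    """Classify a self-citation share against the thresholds in this section."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= self_cited <= total:
        raise ValueError("self_cited must be between 0 and total")
    share = self_cited / total
    if share > 0.30:
        return "red flag"
    if share < 0.20:
        return "typical"
    return "review"  # assumed label for the 20-30% band


print(self_citation_status(4, 40))   # typical
print(self_citation_status(10, 40))  # review
print(self_citation_status(14, 40))  # red flag
```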

---

Cross-Cutting Considerations

Reproducibility

Assess across dimensions:

Are methods detailed enough to replicate?
Are data and code available (or availability explained)?
Are analysis steps transparent?
Are materials/instruments specified?

Ethics

Consider:

IRB approval (for human subjects)
Informed consent
Privacy and confidentiality
Conflicts of interest
Research integrity
Data sharing ethics

Bias and Limitations

Evaluate whether:

Potential biases are acknowledged
Limitations are discussed honestly
Boundary conditions are specified
Generalizability is appropriately claimed

Impact and Significance

Consider:

Theoretical contribution
Practical implications
Policy relevance
Methodological innovation
Field advancement

---

Scoring Guidelines

Dimension Weighting (Suggested, Adjust by Context)

Problem Formulation: 15%
Literature Review: 15%
Methodology: 20%
Data Collection: 10%
Analysis: 15%
Results: 10%
Writing: 10%
Citations: 5%
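
As a worked example of how these weights combine, the sketch below computes a weighted overall score and renormalizes when only some dimensions were scored. It is an illustration under stated assumptions: `weighted_overall` is a hypothetical name, and dividing by the weight of the scored dimensions is one reasonable way to handle missing dimensions, not necessarily the only one.

```python
from typing import Dict

# Suggested weights from the list above (they sum to 1.0).
WEIGHTS: Dict[str, float] = {
    "problem_formulation": 0.15,
    "literature_review": 0.15,
    "methodology": 0.20,
    "data_collection": 0.10,
    "analysis": 0.15,
    "results": 0.10,
    "writing": 0.10,
    "citations": 0.05,
}


def weighted_overall(scores: Dict[str, float],
                     weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted mean over the dimensions that were actually scored."""
    used = {dim: s for dim, s in scores.items() if weights.get(dim, 0.0) > 0}
    total_weight = sum(weights[dim] for dim in used)
    if total_weight == 0:
        return 0.0
    # Dividing by the weight in use keeps the result on the 1-5 scale
    # even when some dimensions are skipped.
    return sum(s * weights[dim] for dim, s in used.items()) / total_weight


if __name__ == "__main__":
    full = {
        "problem_formulation": 4.5, "literature_review": 4.0,
        "methodology": 3.5, "data_collection": 4.0, "analysis": 3.5,
        "results": 4.0, "writing": 4.5, "citations": 4.0,
    }
    print(round(weighted_overall(full), 2))      # 3.95
    partial = {"methodology": 4.0, "analysis": 3.0}
    print(round(weighted_overall(partial), 2))   # 3.57
```

In the partial case the two scored dimensions carry weights 0.20 and 0.15, so the result is (4.0 × 0.20 + 3.0 × 0.15) / 0.35 ≈ 3.57, which stays between the two input scores as a weighted mean should.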

Overall Assessment Thresholds

Exceptional (4.5-5.0): Ready for top-tier publication
Strong (4.0-4.4): Publication-ready with minor revisions
Good (3.5-3.9): Major revisions required, promising work
Acceptable (3.0-3.4): Significant revisions needed
Weak (2.0-2.9): Fundamental issues, major rework required
Poor (<2.0): Not suitable for publication without complete revision
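
The bands above are written to one decimal place, so a computed average such as 4.45 falls between "Strong" and "Exceptional" if the ranges are read literally. One way to avoid that gap, sketched here as an assumption and not as the framework's prescribed rule, is to classify by lower bound only. `BANDS` and `quality_band` are hypothetical names.

```python
from typing import List, Tuple

# Lower bound of each band, highest first; anything below 2.0 is "Poor".
BANDS: List[Tuple[float, str]] = [
    (4.5, "Exceptional"),
    (4.0, "Strong"),
    (3.5, "Good"),
    (3.0, "Acceptable"),
    (2.0, "Weak"),
]


def quality_band(score: float) -> str:
    """Return the first band whose lower bound the score reaches."""
    for lower, label in BANDS:
        if score >= lower:
            return label
    return "Poor"


if __name__ == "__main__":
    for value in (4.5, 4.45, 3.95, 2.0, 1.9):
        print(value, quality_band(value))
```

With this rule 4.45 is classified as "Strong" and 3.95 as "Good", so every score on the scale maps to exactly one band.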

Contextual Adjustments

Adjust standards based on:

Stage: Proposal < Draft < Final submission
Venue: Student thesis < Conference < Journal < Top-tier journal
Type: Theoretical < Empirical < Meta-analysis
Field: Standards vary by discipline
Purpose: Educational < Professional < Publication

---

Using This Framework

1. Read the work thoroughly before beginning evaluation
2. Score each dimension using the 5-point scale
3. Document evidence for each score with specific examples
4. Consider context and adjust expectations appropriately
5. Synthesize findings across dimensions
6. Provide actionable feedback prioritized by impact
7. Balance criticism with recognition of strengths

This framework is a guide, not a rigid checklist. Professional judgment should always be applied in context.

#!/usr/bin/env python3
"""
ScholarEval Score Calculator

Calculate aggregate evaluation scores from dimension-level ratings.
Supports weighted averaging, threshold analysis, and score visualization.

Usage:
    python calculate_scores.py --scores <dimension_scores.json> --output <report.txt>
    python calculate_scores.py --scores <dimension_scores.json> --weights <weights.json>
    python calculate_scores.py --interactive

Author: ScholarEval Framework
License: MIT
"""

import json
import argparse
import sys
from typing import Dict, List, Optional
from pathlib import Path


# Default dimension weights (total = 100%)
DEFAULT_WEIGHTS = {
    "problem_formulation": 0.15,
    "literature_review": 0.15,
    "methodology": 0.20,
    "data_collection": 0.10,
    "analysis": 0.15,
    "results": 0.10,
    "writing": 0.10,
    "citations": 0.05
}

# Quality level definitions
QUALITY_LEVELS = {
    (4.5, 5.0): ("Exceptional", "Ready for top-tier publication"),
    (4.0, 4.4): ("Strong", "Publication-ready with minor revisions"),
    (3.5, 3.9): ("Good", "Major revisions required, promising work"),
    (3.0, 3.4): ("Acceptable", "Significant revisions needed"),
    (2.0, 2.9): ("Weak", "Fundamental issues, major rework required"),
    (0.0, 1.9): ("Poor", "Not suitable without complete revision")
}


def load_scores(filepath: Path) -> Dict[str, float]:
    """Load dimension scores from JSON file."""
    try:
        with open(filepath, 'r') as f:
            scores = json.load(f)

        # Validate scores
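        for dim, score in scores.items():
            # Reject non-numeric values (including booleans) before the range check
            if isinstance(score, bool) or not isinstance(score, (int, float)):
                raise ValueError(f"Score for {dim} must be a number, got {score!r}")
            if not 1 <= score <= 5:
                raise ValueError(f"Score for {dim} must be between 1 and 5, got {score}")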

        return scores
    except FileNotFoundError:
        print(f"Error: File not found: {filepath}")
        sys.exit(1)
    except json.JSONDecodeError:
        print(f"Error: Invalid JSON in {filepath}")
        sys.exit(1)
    except ValueError as e:
        print(f"Error: {e}")
        sys.exit(1)


def load_weights(filepath: Optional[Path] = None) -> Dict[str, float]:
    """Load dimension weights from JSON file or return defaults."""
    if filepath is None:
        return DEFAULT_WEIGHTS

    try:
        with open(filepath, 'r') as f:
            weights = json.load(f)

        # Validate weights sum to 1.0
        total = sum(weights.values())
        if not 0.99 <= total <= 1.01:  # Allow small floating point errors
            raise ValueError(f"Weights must sum to 1.0, got {total}")

        return weights
    except FileNotFoundError:
        print(f"Error: File not found: {filepath}")
        sys.exit(1)
    except json.JSONDecodeError:
        print(f"Error: Invalid JSON in {filepath}")
        sys.exit(1)
    except ValueError as e:
        print(f"Error: {e}")
        sys.exit(1)


def calculate_weighted_average(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Calculate weighted average score."""
    total_score = 0.0
    total_weight = 0.0

    for dimension, score in scores.items():
        # Handle dimension name variations (e.g., "problem_formulation" vs "problem-formulation")
        dim_key = dimension.replace('-', '_').lower()
        weight = weights.get(dim_key, 0.0)

        total_score += score * weight
        total_weight += weight

    # Normalize by the weight actually used, so skipped dimensions
    # do not inflate or deflate the average
    if total_weight > 0:
        return total_score / total_weight
    return 0.0


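def get_quality_level(score: float) -> tuple:
    """Get quality level description for a given score."""
    # Match on the lower bound only, highest band first, so scores that fall
    # between the one-decimal bands (e.g. 4.45) still resolve to a level.
    for (low, _high), (level, description) in sorted(QUALITY_LEVELS.items(), reverse=True):
        if score >= low:
            return level, description
    return "Unknown", "Score out of expected range"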


def generate_bar_chart(scores: Dict[str, float], max_width: int = 50) -> str:
    """Generate ASCII bar chart of dimension scores."""
    lines = []
    max_name_len = max((len(name) for name in scores), default=0)  # tolerate empty input

    for dimension, score in sorted(scores.items(), key=lambda x: x[1], reverse=True):
        bar_length = int((score / 5.0) * max_width)
        bar = '█' * bar_length
        padding = ' ' * (max_name_len - len(dimension))
        lines.append(f"  {dimension}{padding} │ {bar} {score:.2f}")

    return '\n'.join(lines)


def identify_strengths_weaknesses(scores: Dict[str, float]) -> tuple:
    """Identify top strengths and areas for improvement."""
    sorted_scores = sorted(scores.items(), key=lambda x: x[1], reverse=True)

    strengths = [dim for dim, score in sorted_scores[:3] if score >= 4.0]
    weaknesses = [dim for dim, score in sorted_scores[-3:] if score < 3.5]

    return strengths, weaknesses


def generate_report(scores: Dict[str, float], weights: Dict[str, float],
                   output_file: Optional[Path] = None) -> str:
    """Generate comprehensive evaluation report."""
    overall_score = calculate_weighted_average(scores, weights)
    quality_level, quality_desc = get_quality_level(overall_score)
    strengths, weaknesses = identify_strengths_weaknesses(scores)

    report_lines = [
        "="*70,
        "SCHOLAREVAL SCORE REPORT",
        "="*70,
        "",
        f"Overall Score: {overall_score:.2f} / 5.00",
        f"Quality Level: {quality_level}",
        f"Assessment: {quality_desc}",
        "",
        "="*70,
        "DIMENSION SCORES",
        "="*70,
        "",
        generate_bar_chart(scores),
        "",
        "="*70,
        "DETAILED BREAKDOWN",
        "="*70,
        ""
    ]

    # Add detailed scores with weights
    for dimension, score in sorted(scores.items()):
        dim_key = dimension.replace('-', '_').lower()
        weight = weights.get(dim_key, 0.0)
        weighted_contribution = score * weight
        percentage = weight * 100

        report_lines.append(
            f"  {dimension:25s} {score:.2f}/5.00  "
            f"(weight: {percentage:4.1f}%, contribution: {weighted_contribution:.3f})"
        )

    report_lines.extend([
        "",
        "="*70,
        "ASSESSMENT SUMMARY",
        "="*70,
        ""
    ])

    if strengths:
        report_lines.append("Top Strengths:")
        for dim in strengths:
            report_lines.append(f"  • {dim}: {scores[dim]:.2f}/5.00")
        report_lines.append("")

    if weaknesses:
        report_lines.append("Areas for Improvement:")
        for dim in weaknesses:
            report_lines.append(f"  • {dim}: {scores[dim]:.2f}/5.00")
        report_lines.append("")

    # Add recommendations based on score
    report_lines.extend([
        "="*70,
        "RECOMMENDATIONS",
        "="*70,
        ""
    ])

    if overall_score >= 4.5:
        report_lines.append("  Excellent work! Ready for submission to top-tier venues.")
    elif overall_score >= 4.0:
        report_lines.append("  Strong work. Address minor issues identified in weaknesses.")
    elif overall_score >= 3.5:
        report_lines.append("  Good foundation. Focus on major revisions in weak dimensions.")
    elif overall_score >= 3.0:
        report_lines.append("  Significant revisions needed. Prioritize weakest dimensions.")
    elif overall_score >= 2.0:
        report_lines.append("  Major rework required. Consider restructuring approach.")
    else:
        report_lines.append("  Fundamental revision needed across multiple dimensions.")

    report_lines.append("")
    report_lines.append("="*70)

    report = '\n'.join(report_lines)

    # Write to file if specified
    if output_file:
        try:
            # The report contains non-ASCII characters (bars, bullets), so force UTF-8
            with open(output_file, 'w', encoding='utf-8') as f:
                f.write(report)
            print(f"\nReport saved to: {output_file}")
        except IOError as e:
            print(f"Error writing to {output_file}: {e}")

    return report


def interactive_mode():
    """Run interactive score entry mode."""
    print("ScholarEval Interactive Score Calculator")
    print("="*50)
    print("\nEnter scores for each dimension (1-5):")
    print("(Press Enter to skip a dimension)\n")

    scores = {}
    dimensions = [
        "problem_formulation",
        "literature_review",
        "methodology",
        "data_collection",
        "analysis",
        "results",
        "writing",
        "citations"
    ]

    for dim in dimensions:
        while True:
            dim_display = dim.replace('_', ' ').title()
            user_input = input(f"{dim_display}: ").strip()

            if not user_input:
                break

            try:
                score = float(user_input)
                if 1 <= score <= 5:
                    scores[dim] = score
                    break
                else:
                    print("  Score must be between 1 and 5")
            except ValueError:
                print("  Invalid input. Please enter a number between 1 and 5")

    if not scores:
        print("\nNo scores entered. Exiting.")
        return

    print("\n" + "="*50)
    print("SCORES ENTERED:")
    for dim, score in scores.items():
        print(f"  {dim.replace('_', ' ').title()}: {score}")

    print("\nCalculating overall assessment...\n")

    report = generate_report(scores, DEFAULT_WEIGHTS)
    print(report)

    # Ask if user wants to save
    save = input("\nSave report to file? (y/n): ").strip().lower()
    if save == 'y':
        filename = input("Enter filename [scholareval_report.txt]: ").strip()
        if not filename:
            filename = "scholareval_report.txt"
        generate_report(scores, DEFAULT_WEIGHTS, Path(filename))


def main():
    parser = argparse.ArgumentParser(
        description="Calculate aggregate ScholarEval scores from dimension ratings",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Calculate from JSON file
  python calculate_scores.py --scores my_scores.json

  # Calculate with custom weights
  python calculate_scores.py --scores my_scores.json --weights custom_weights.json

  # Save report to file
  python calculate_scores.py --scores my_scores.json --output report.txt

  # Interactive mode
  python calculate_scores.py --interactive

Score JSON Format:
  {
    "problem_formulation": 4.5,
    "literature_review": 4.0,
    "methodology": 3.5,
    "data_collection": 4.0,
    "analysis": 3.5,
    "results": 4.0,
    "writing": 4.5,
    "citations": 4.0
  }

Weights JSON Format:
  {
    "problem_formulation": 0.15,
    "literature_review": 0.15,
    "methodology": 0.20,
    "data_collection": 0.10,
    "analysis": 0.15,
    "results": 0.10,
    "writing": 0.10,
    "citations": 0.05
  }
        """
    )

    parser.add_argument('--scores', type=Path, help='Path to JSON file with dimension scores')
    parser.add_argument('--weights', type=Path, help='Path to JSON file with dimension weights (optional)')
    parser.add_argument('--output', type=Path, help='Path to output report file (optional)')
    parser.add_argument('--interactive', '-i', action='store_true', help='Run in interactive mode')

    args = parser.parse_args()

    # Interactive mode
    if args.interactive:
        interactive_mode()
        return

    # File mode
    if not args.scores:
        parser.print_help()
        print("\nError: --scores is required (or use --interactive)")
        sys.exit(1)

    scores = load_scores(args.scores)
    weights = load_weights(args.weights)

    report = generate_report(scores, weights, args.output)

    # Print to stdout if no output file specified
    if not args.output:
        print(report)


if __name__ == '__main__':
    main()

#!/usr/bin/env python3
"""
AI-powered scientific schematic generation using Nano Banana 2.

This script uses a smart iterative refinement approach:
1. Generate initial image with Nano Banana 2
2. AI quality review using Gemini 3.1 Pro Preview for scientific critique
3. Only regenerate if quality is below threshold for document type
4. Repeat until quality meets standards (max iterations)

Requirements:
    - OPENROUTER_API_KEY environment variable
    - requests library

Usage:
    python generate_schematic_ai.py "Create a flowchart showing CONSORT participant flow" -o flowchart.png
    python generate_schematic_ai.py "Neural network architecture diagram" -o architecture.png --iterations 2
    python generate_schematic_ai.py "Simple block diagram" -o diagram.png --doc-type poster
"""

import argparse
import base64
import json
import os
import sys
import time
from pathlib import Path
from typing import Optional, Dict, Any, List, Tuple

try:
    import requests
except ImportError:
    print("Error: requests library not found. Install with: pip install requests")
    sys.exit(1)

# Optionally load a .env file (only if python-dotenv is installed)
def _load_env_file():
    """Load .env file from current directory or script directory only."""
    try:
        from dotenv import load_dotenv
    except ImportError:
        return False

    for candidate in [Path.cwd() / ".env", Path(__file__).resolve().parent / ".env"]:
        if candidate.exists():
            load_dotenv(dotenv_path=candidate, override=False)
            return True

    return False


class ScientificSchematicGenerator:
    """Generate scientific schematics using AI with smart iterative refinement.
    
    Uses Gemini 3.1 Pro Preview for quality review to determine if regeneration is needed.
    Multiple passes only occur if the generated schematic doesn't meet the
    quality threshold for the target document type.
    """
    
    # Quality thresholds by document type (score out of 10)
    # Higher thresholds for more formal publications
    QUALITY_THRESHOLDS = {
        "journal": 8.5,      # Nature, Science, etc. - highest standards
        "conference": 8.0,   # Conference papers - high standards
        "poster": 7.0,       # Academic posters - good quality
        "presentation": 6.5, # Slides/talks - clear but less formal
        "report": 7.5,       # Technical reports - professional
        "grant": 8.0,        # Grant proposals - must be compelling
        "thesis": 8.0,       # Dissertations - formal academic
        "preprint": 7.5,     # arXiv, etc. - good quality
        "default": 7.5,      # Default threshold
    }
    
    # Scientific diagram best practices prompt template
    SCIENTIFIC_DIAGRAM_GUIDELINES = """
Create a high-quality scientific diagram with these requirements:

VISUAL QUALITY:
- Clean white or light background (no textures or gradients)
- High contrast for readability and printing
- Professional, publication-ready appearance
- Sharp, clear lines and text
- Adequate spacing between elements to prevent crowding

TYPOGRAPHY:
- Clear, readable sans-serif fonts (Arial, Helvetica style)
- Minimum 10pt font size for all labels
- Consistent font sizes throughout
- All text horizontal or clearly readable
- No overlapping text

SCIENTIFIC STANDARDS:
- Accurate representation of concepts
- Clear labels for all components
- Include scale bars, legends, or axes where appropriate
- Use standard scientific notation and symbols
- Include units where applicable

ACCESSIBILITY:
- Colorblind-friendly color palette (use Okabe-Ito colors if using color)
- High contrast between elements
- Redundant encoding (shapes + colors, not just colors)
- Works well in grayscale

LAYOUT:
- Logical flow (left-to-right or top-to-bottom)
- Clear visual hierarchy
- Balanced composition
- Appropriate use of whitespace
- No clutter or unnecessary decorative elements

IMPORTANT - NO FIGURE NUMBERS:
- Do NOT include "Figure 1:", "Fig. 1", or any figure numbering in the image
- Do NOT add captions or titles like "Figure: ..." at the top or bottom
- Figure numbers and captions are added separately in the document/LaTeX
- The diagram should contain only the visual content itself
"""
    
    def __init__(self, api_key: Optional[str] = None, verbose: bool = False):
        """
        Initialize the generator.
        
        Args:
            api_key: OpenRouter API key (or use OPENROUTER_API_KEY env var)
            verbose: Print detailed progress information
        """
        # Priority: 1) explicit api_key param, 2) environment variable, 3) .env file
        self.api_key = api_key or os.getenv("OPENROUTER_API_KEY")
        
        # If not found in environment, try loading from .env file
        if not self.api_key:
            _load_env_file()
            self.api_key = os.getenv("OPENROUTER_API_KEY")
        
        if not self.api_key:
            raise ValueError(
                "OPENROUTER_API_KEY not found. Please either:\n"
                "  1. Set the OPENROUTER_API_KEY environment variable\n"
                "  2. Add OPENROUTER_API_KEY to your .env file\n"
                "  3. Pass api_key parameter to the constructor\n"
                "Get your API key from: https://openrouter.ai/keys"
            )
        
        self.verbose = verbose
        self._last_error = None  # Track last error for better reporting
        self.base_url = "https://openrouter.ai/api/v1"
        # Nano Banana 2 - Google's advanced image generation model
        # https://openrouter.ai/google/gemini-3-pro-image-preview
        self.image_model = "google/gemini-3.1-flash-image-preview"
        # Gemini 3.1 Pro Preview for quality review - excellent vision and reasoning
        self.review_model = "google/gemini-3.1-pro-preview"
        
    def _log(self, message: str):
        """Log message if verbose mode is enabled."""
        if self.verbose:
            print(f"[{time.strftime('%H:%M:%S')}] {message}")
    
    def _make_request(self, model: str, messages: List[Dict[str, Any]], 
                     modalities: Optional[List[str]] = None) -> Dict[str, Any]:
        """
        Make a request to OpenRouter API.
        
        Args:
            model: Model identifier
            messages: List of message dictionaries
            modalities: Optional list of modalities (e.g., ["image", "text"])
            
        Returns:
            API response as dictionary
        """
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
            "HTTP-Referer": "https://github.com/scientific-writer",
            "X-Title": "Scientific Schematic Generator"
        }
        
        payload = {
            "model": model,
            "messages": messages
        }
        
        if modalities:
            payload["modalities"] = modalities
        
        self._log(f"Making request to {model}...")
        
        try:
            response = requests.post(
                f"{self.base_url}/chat/completions",
                headers=headers,
                json=payload,
                timeout=120
            )
            
            # Try to get response body even on error
            try:
                response_json = response.json()
            except json.JSONDecodeError:
                response_json = {"raw_text": response.text[:500]}
            
            # Check for HTTP errors but include response body in error message
            if response.status_code != 200:
                error_detail = response_json.get("error", response_json)
                self._log(f"HTTP {response.status_code}: {error_detail}")
                raise RuntimeError(f"API request failed (HTTP {response.status_code}): {error_detail}")
            
            return response_json
        except requests.exceptions.Timeout:
            raise RuntimeError("API request timed out after 120 seconds")
        except requests.exceptions.RequestException as e:
            raise RuntimeError(f"API request failed: {str(e)}")
    
    def _extract_image_from_response(self, response: Dict[str, Any]) -> Optional[bytes]:
        """
        Extract base64-encoded image from API response.
        
        For Nano Banana 2, images are returned in the 'images' field of the message,
        not in the 'content' field.
        
        Args:
            response: API response dictionary
            
        Returns:
            Image bytes or None if not found
        """
        try:
            choices = response.get("choices", [])
            if not choices:
                self._log("No choices in response")
                return None
            
            message = choices[0].get("message", {})
            
            # IMPORTANT: Nano Banana 2 returns images in the 'images' field
            images = message.get("images", [])
            if images and len(images) > 0:
                self._log(f"Found {len(images)} image(s) in 'images' field")
                
                # Get first image
                first_image = images[0]
                if isinstance(first_image, dict):
                    # Extract image_url
                    if first_image.get("type") == "image_url":
                        url = first_image.get("image_url", {})
                        if isinstance(url, dict):
                            url = url.get("url", "")
                        
                        if url and url.startswith("data:image"):
                            # Extract base64 data after comma
                            if "," in url:
                                base64_str = url.split(",", 1)[1]
                                # Clean whitespace
                                base64_str = base64_str.replace('\n', '').replace('\r', '').replace(' ', '')
                                self._log(f"Extracted base64 data (length: {len(base64_str)})")
                                return base64.b64decode(base64_str)
            
            # Fallback: check content field (for other models or future changes)
            content = message.get("content", "")
            
            if self.verbose:
                self._log(f"Content type: {type(content)}, length: {len(str(content))}")
            
            # Handle string content
            if isinstance(content, str) and "data:image" in content:
                import re
                match = re.search(r'data:image/[^;]+;base64,([A-Za-z0-9+/=\n\r]+)', content, re.DOTALL)
                if match:
                    base64_str = match.group(1).replace('\n', '').replace('\r', '').replace(' ', '')
                    self._log(f"Found image in content field (length: {len(base64_str)})")
                    return base64.b64decode(base64_str)
            
            # Handle list content
            if isinstance(content, list):
                for i, block in enumerate(content):
                    if isinstance(block, dict) and block.get("type") == "image_url":
                        url = block.get("image_url", {})
                        if isinstance(url, dict):
                            url = url.get("url", "")
                        if url and url.startswith("data:image") and "," in url:
                            base64_str = url.split(",", 1)[1].replace('\n', '').replace('\r', '').replace(' ', '')
                            self._log(f"Found image in content block {i}")
                            return base64.b64decode(base64_str)
            
            self._log("No image data found in response")
            return None
            
        except Exception as e:
            self._log(f"Error extracting image: {str(e)}")
            import traceback
            if self.verbose:
                traceback.print_exc()
            return None
    
    def _image_to_base64(self, image_path: str) -> str:
        """
        Convert image file to base64 data URL.
        
        Args:
            image_path: Path to image file
            
        Returns:
            Base64 data URL string
        """
        with open(image_path, "rb") as f:
            image_data = f.read()
        
        # Determine image type from extension
        ext = Path(image_path).suffix.lower()
        mime_type = {
            ".png": "image/png",
            ".jpg": "image/jpeg",
            ".jpeg": "image/jpeg",
            ".gif": "image/gif",
            ".webp": "image/webp"
        }.get(ext, "image/png")
        
        base64_data = base64.b64encode(image_data).decode("utf-8")
        return f"data:{mime_type};base64,{base64_data}"
    
    def generate_image(self, prompt: str) -> Optional[bytes]:
        """
        Generate an image using Nano Banana 2.
        
        Args:
            prompt: Description of the diagram to generate
            
        Returns:
            Image bytes or None if generation failed
        """
        self._last_error = None  # Reset error
        
        messages = [
            {
                "role": "user",
                "content": prompt
            }
        ]
        
        try:
            response = self._make_request(
                model=self.image_model,
                messages=messages,
                modalities=["image", "text"]
            )
            
            # Debug: print response structure if verbose
            if self.verbose:
                self._log(f"Response keys: {response.keys()}")
                if "error" in response:
                    self._log(f"API Error: {response['error']}")
                if "choices" in response and response["choices"]:
                    msg = response["choices"][0].get("message", {})
                    self._log(f"Message keys: {msg.keys()}")
                    # Show content preview without printing huge base64 data
                    content = msg.get("content", "")
                    if isinstance(content, str):
                        preview = content[:200] + "..." if len(content) > 200 else content
                        self._log(f"Content preview: {preview}")
                    elif isinstance(content, list):
                        self._log(f"Content is list with {len(content)} items")
                        for i, item in enumerate(content[:3]):
                            if isinstance(item, dict):
                                self._log(f"  Item {i}: type={item.get('type')}")
            
            # Check for API errors in response
            if "error" in response:
                error_msg = response["error"]
                if isinstance(error_msg, dict):
                    error_msg = error_msg.get("message", str(error_msg))
                self._last_error = f"API Error: {error_msg}"
                print(f"✗ {self._last_error}")
                return None
            
            image_data = self._extract_image_from_response(response)
            if image_data:
                self._log(f"✓ Generated image ({len(image_data)} bytes)")
            else:
                self._last_error = "No image data in API response - model may not support image generation"
                self._log(f"✗ {self._last_error}")
                # Additional debug info when image extraction fails
                if self.verbose and "choices" in response:
                    msg = response["choices"][0].get("message", {})
                    self._log(f"Full message structure: {json.dumps({k: type(v).__name__ for k, v in msg.items()})}")
            
            return image_data
        except RuntimeError as e:
            self._last_error = str(e)
            self._log(f"✗ Generation failed: {self._last_error}")
            return None
        except Exception as e:
            self._last_error = f"Unexpected error: {str(e)}"
            self._log(f"✗ Generation failed: {self._last_error}")
            import traceback
            if self.verbose:
                traceback.print_exc()
            return None
    
    def review_image(self, image_path: str, original_prompt: str, 
                    iteration: int, doc_type: str = "default",
                    max_iterations: int = 2) -> Tuple[str, float, bool]:
        """
        Review generated image using Gemini 3.1 Pro Preview for quality analysis.
        
        Uses Gemini 3.1 Pro Preview's superior vision and reasoning capabilities to
        evaluate the schematic quality and determine if regeneration is needed.
        
        Args:
            image_path: Path to the generated image
            original_prompt: Original user prompt
            iteration: Current iteration number
            doc_type: Document type (journal, poster, presentation, etc.)
            max_iterations: Maximum iterations allowed
            
        Returns:
            Tuple of (critique text, quality score 0-10, needs_improvement bool)
        """
        # Use Gemini 3.1 Pro Preview for review - excellent vision and analysis
        image_data_url = self._image_to_base64(image_path)
        
        # Get quality threshold for this document type
        threshold = self.QUALITY_THRESHOLDS.get(doc_type.lower(), 
                                                 self.QUALITY_THRESHOLDS["default"])
        
        review_prompt = f"""You are an expert reviewer evaluating a scientific diagram for publication quality.

ORIGINAL REQUEST: {original_prompt}

DOCUMENT TYPE: {doc_type} (quality threshold: {threshold}/10)
ITERATION: {iteration}/{max_iterations}

Carefully evaluate this diagram on these criteria:

1. **Scientific Accuracy** (0-2 points)
   - Correct representation of concepts
   - Proper notation and symbols
   - Accurate relationships shown

2. **Clarity and Readability** (0-2 points)
   - Easy to understand at a glance
   - Clear visual hierarchy
   - No ambiguous elements

3. **Label Quality** (0-2 points)
   - All important elements labeled
   - Labels are readable (appropriate font size)
   - Consistent labeling style

4. **Layout and Composition** (0-2 points)
   - Logical flow (top-to-bottom or left-to-right)
   - Balanced use of space
   - No overlapping elements

5. **Professional Appearance** (0-2 points)
   - Publication-ready quality
   - Clean, crisp lines and shapes
   - Appropriate colors/contrast

RESPOND IN THIS EXACT FORMAT:
SCORE: [total score 0-10]

STRENGTHS:
- [strength 1]
- [strength 2]

ISSUES:
- [issue 1 if any]
- [issue 2 if any]

VERDICT: [ACCEPTABLE or NEEDS_IMPROVEMENT]

If score >= {threshold}, the diagram is ACCEPTABLE for {doc_type} publication.
If score < {threshold}, mark as NEEDS_IMPROVEMENT with specific suggestions."""

        messages = [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": review_prompt
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": image_data_url
                        }
                    }
                ]
            }
        ]
        
        try:
            # Use Gemini 3.1 Pro Preview for high-quality review
            response = self._make_request(
                model=self.review_model,
                messages=messages
            )
            
            # Extract text response
            choices = response.get("choices", [])
            if not choices:
                # Return the same 3-tuple as the other paths so callers can unpack it
                return "Image generated successfully", 8.0, False
            
            message = choices[0].get("message", {})
            # Treat a null "content" field the same as a missing one
            content = message.get("content") or ""
            
            # Check reasoning field (Nano Banana 2 puts analysis here)
            reasoning = message.get("reasoning", "")
            if reasoning and not content:
                content = reasoning
            
            if isinstance(content, list):
                # Extract text from content blocks
                text_parts = []
                for block in content:
                    if isinstance(block, dict) and block.get("type") == "text":
                        text_parts.append(block.get("text", ""))
                content = "\n".join(text_parts)
            
            # Try to extract score
            score = 7.5  # Default score if extraction fails
            import re
            
            # Look for SCORE: X or SCORE: X/10 format
            score_match = re.search(r'SCORE:\s*(\d+(?:\.\d+)?)', content, re.IGNORECASE)
            if score_match:
                score = float(score_match.group(1))
            else:
                # Fallback: look for any score pattern
                score_match = re.search(r'(?:score|rating|quality)[:\s]+(\d+(?:\.\d+)?)\s*(?:/\s*10)?', content, re.IGNORECASE)
                if score_match:
                    score = float(score_match.group(1))
            
            # Determine if improvement is needed based on verdict or score
            needs_improvement = False
            if "NEEDS_IMPROVEMENT" in content.upper():
                needs_improvement = True
            elif score < threshold:
                needs_improvement = True
            
            self._log(f"✓ Review complete (Score: {score}/10, Threshold: {threshold}/10)")
            self._log(f"  Verdict: {'Needs improvement' if needs_improvement else 'Acceptable'}")
            
            return (content if content else "Image generated successfully", 
                    score, 
                    needs_improvement)
        except Exception as e:
            self._log(f"Review skipped: {str(e)}")
            # Don't fail the whole process if review fails - assume acceptable
            return "Image generated successfully (review skipped)", 7.5, False
    
    def improve_prompt(self, original_prompt: str, critique: str, 
                      iteration: int) -> str:
        """
        Improve the generation prompt based on critique.
        
        Args:
            original_prompt: Original user prompt
            critique: Review critique from previous iteration
            iteration: Current iteration number
            
        Returns:
            Improved prompt for next generation
        """
        improved_prompt = f"""{self.SCIENTIFIC_DIAGRAM_GUIDELINES}

USER REQUEST: {original_prompt}

ITERATION {iteration}: Based on previous feedback, address these specific improvements:
{critique}

Generate an improved version that addresses all the critique points while maintaining scientific accuracy and professional quality."""
        
        return improved_prompt
    
    def generate_iterative(self, user_prompt: str, output_path: str,
                          iterations: int = 2, 
                          doc_type: str = "default") -> Dict[str, Any]:
        """
        Generate scientific schematic with smart iterative refinement.
        
        Only regenerates if the quality score is below the threshold for the
        specified document type. This saves API calls and time when the first
        generation is already good enough.
        
        Args:
            user_prompt: User's description of desired diagram
            output_path: Path to save final image
            iterations: Maximum refinement iterations (default: 2, max: 2)
            doc_type: Document type for quality threshold (journal, poster, etc.)
            
        Returns:
            Dictionary with generation results and metadata
        """
        output_path = Path(output_path)
        output_dir = output_path.parent
        output_dir.mkdir(parents=True, exist_ok=True)
        
        base_name = output_path.stem
        extension = output_path.suffix or ".png"
        
        # Get quality threshold for this document type
        threshold = self.QUALITY_THRESHOLDS.get(doc_type.lower(), 
                                                 self.QUALITY_THRESHOLDS["default"])
        
        results = {
            "user_prompt": user_prompt,
            "doc_type": doc_type,
            "quality_threshold": threshold,
            "iterations": [],
            "final_image": None,
            "final_score": 0.0,
            "success": False,
            "early_stop": False,
            "early_stop_reason": None
        }
        
        current_prompt = f"""{self.SCIENTIFIC_DIAGRAM_GUIDELINES}

USER REQUEST: {user_prompt}

Generate a publication-quality scientific diagram that meets all the guidelines above."""
        
        print(f"\n{'='*60}")
        print(f"Generating Scientific Schematic")
        print(f"{'='*60}")
        print(f"Description: {user_prompt}")
        print(f"Document Type: {doc_type}")
        print(f"Quality Threshold: {threshold}/10")
        print(f"Max Iterations: {iterations}")
        print(f"Output: {output_path}")
        print(f"{'='*60}\n")
        
        for i in range(1, iterations + 1):
            print(f"\n[Iteration {i}/{iterations}]")
            print("-" * 40)
            
            # Generate image
            print(f"Generating image...")
            image_data = self.generate_image(current_prompt)
            
            if not image_data:
                error_msg = getattr(self, '_last_error', 'Image generation failed - no image data returned')
                print(f"✗ Generation failed: {error_msg}")
                results["iterations"].append({
                    "iteration": i,
                    "success": False,
                    "error": error_msg
                })
                continue
            
            # Save iteration image
            iter_path = output_dir / f"{base_name}_v{i}{extension}"
            with open(iter_path, "wb") as f:
                f.write(image_data)
            print(f"✓ Saved: {iter_path}")
            
            # Review image using Gemini 3.1 Pro Preview
            print(f"Reviewing image with Gemini 3.1 Pro Preview...")
            critique, score, needs_improvement = self.review_image(
                str(iter_path), user_prompt, i, doc_type, iterations
            )
            print(f"✓ Score: {score}/10 (threshold: {threshold}/10)")
            
            # Save iteration results
            iteration_result = {
                "iteration": i,
                "image_path": str(iter_path),
                "prompt": current_prompt,
                "critique": critique,
                "score": score,
                "needs_improvement": needs_improvement,
                "success": True
            }
            results["iterations"].append(iteration_result)
            
            # Check if quality is acceptable - STOP EARLY if so
            if not needs_improvement:
                print(f"\n✓ Quality meets {doc_type} threshold ({score} >= {threshold})")
                print(f"  No further iterations needed!")
                results["final_image"] = str(iter_path)
                results["final_score"] = score
                results["success"] = True
                results["early_stop"] = True
                results["early_stop_reason"] = f"Quality score {score} meets threshold {threshold} for {doc_type}"
                break
            
            # If this is the last iteration, we're done regardless
            if i == iterations:
                print(f"\n⚠ Maximum iterations reached")
                results["final_image"] = str(iter_path)
                results["final_score"] = score
                results["success"] = True
                break
            
            # Quality below threshold - improve prompt for next iteration
            print(f"\n⚠ Quality below threshold ({score} < {threshold})")
            print(f"Improving prompt based on feedback...")
            current_prompt = self.improve_prompt(user_prompt, critique, i + 1)
        
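        # If a later attempt failed to generate, fall back to the best earlier image
        if not results["success"]:
            completed = [r for r in results["iterations"] if r.get("success")]
            if completed:
                best = max(completed, key=lambda r: r["score"])
                results["final_image"] = best["image_path"]
                results["final_score"] = best["score"]
                results["success"] = True
        
        # Copy final version to output path
        if results["success"] and results["final_image"]: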
            final_iter_path = Path(results["final_image"])
            if final_iter_path != output_path:
                import shutil
                shutil.copy(final_iter_path, output_path)
                print(f"\n✓ Final image: {output_path}")
        
        # Save review log
        log_path = output_dir / f"{base_name}_review_log.json"
        with open(log_path, "w") as f:
            json.dump(results, f, indent=2)
        print(f"✓ Review log: {log_path}")
        
        print(f"\n{'='*60}")
        print(f"Generation Complete!")
        print(f"Final Score: {results['final_score']}/10")
        if results["early_stop"]:
            print(f"Iterations Used: {len([r for r in results['iterations'] if r.get('success')])}/{iterations} (early stop)")
        print(f"{'='*60}\n")
        
        return results


def main():
    """Command-line interface."""
    parser = argparse.ArgumentParser(
        description="Generate scientific schematics using AI with smart iterative refinement",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  # Generate a flowchart for a journal paper
  python generate_schematic_ai.py "CONSORT participant flow diagram" -o flowchart.png --doc-type journal
  
  # Generate neural network architecture for presentation (lower threshold)
  python generate_schematic_ai.py "Transformer encoder-decoder architecture" -o transformer.png --doc-type presentation
  
  # Generate with custom max iterations for poster
  python generate_schematic_ai.py "Biological signaling pathway" -o pathway.png --iterations 2 --doc-type poster
  
  # Verbose output
  python generate_schematic_ai.py "Circuit diagram" -o circuit.png -v

Document Types (quality thresholds):
  journal      8.5/10  - Nature, Science, peer-reviewed journals
  conference   8.0/10  - Conference papers
  thesis       8.0/10  - Dissertations, theses
  grant        8.0/10  - Grant proposals
  preprint     7.5/10  - arXiv, bioRxiv, etc.
  report       7.5/10  - Technical reports
  poster       7.0/10  - Academic posters
  presentation 6.5/10  - Slides, talks
  default      7.5/10  - General purpose

Note: Multiple iterations only occur if quality is BELOW the threshold.
      If the first generation meets the threshold, no extra API calls are made.

Environment:
  OPENROUTER_API_KEY    OpenRouter API key (required)
        """
    )
    
    parser.add_argument("prompt", help="Description of the diagram to generate")
    parser.add_argument("-o", "--output", required=True, 
                       help="Output image path (e.g., diagram.png)")
    parser.add_argument("--iterations", type=int, default=2,
                       help="Maximum refinement iterations (default: 2, max: 2)")
    parser.add_argument("--doc-type", default="default",
                       choices=["journal", "conference", "poster", "presentation", 
                               "report", "grant", "thesis", "preprint", "default"],
                       help="Document type for quality threshold (default: default)")
    parser.add_argument("--api-key", help="OpenRouter API key (or set OPENROUTER_API_KEY)")
    parser.add_argument("-v", "--verbose", action="store_true",
                       help="Verbose output")
    
    args = parser.parse_args()
    
    # Check for API key
    api_key = args.api_key or os.getenv("OPENROUTER_API_KEY")
    if not api_key:
        print("Error: OPENROUTER_API_KEY environment variable not set")
        print("\nSet it with:")
        print("  export OPENROUTER_API_KEY='your_api_key'")
        print("\nOr provide via --api-key flag")
        sys.exit(1)
    
    # Validate iterations - enforce max of 2
    if args.iterations < 1 or args.iterations > 2:
        print("Error: Iterations must be between 1 and 2")
        sys.exit(1)
    
    try:
        generator = ScientificSchematicGenerator(api_key=api_key, verbose=args.verbose)
        results = generator.generate_iterative(
            user_prompt=args.prompt,
            output_path=args.output,
            iterations=args.iterations,
            doc_type=args.doc_type
        )
        
        if results["success"]:
            print(f"\n✓ Success! Image saved to: {args.output}")
            if results.get("early_stop"):
                print(f"  (Completed in {len([r for r in results['iterations'] if r.get('success')])} iteration(s) - quality threshold met)")
            sys.exit(0)
        else:
            print(f"\n✗ Generation failed. Check review log for details.")
            sys.exit(1)
    except Exception as e:
        print(f"\n✗ Error: {str(e)}")
        sys.exit(1)


if __name__ == "__main__":
    main()
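
The score and verdict handling inside `review_image` can be isolated into a small pure function, which makes the decision rule easy to check without calling the API. The following is a minimal sketch, not the author's code: `parse_review` and `ReviewResult` are hypothetical names, the regex and the 7.5 fallback mirror the values used in the script above, and clamping the score to the 0-10 range is an added assumption.

```python
import re
from dataclasses import dataclass

# Hypothetical helper names; the pattern and the 7.5 fallback mirror review_image.
SCORE_RE = re.compile(r"SCORE:\s*(\d+(?:\.\d+)?)", re.IGNORECASE)


@dataclass
class ReviewResult:
    score: float
    needs_improvement: bool


def parse_review(text: str, threshold: float, default_score: float = 7.5) -> ReviewResult:
    """Extract the score and decide whether another iteration is needed."""
    text = text or ""
    match = SCORE_RE.search(text)
    score = float(match.group(1)) if match else default_score
    # Assumption (not in the original script): clamp to the rubric's 0-10 range.
    score = max(0.0, min(10.0, score))
    # An explicit NEEDS_IMPROVEMENT verdict wins; otherwise compare to the threshold.
    needs_improvement = "NEEDS_IMPROVEMENT" in text.upper() or score < threshold
    return ReviewResult(score, needs_improvement)


if __name__ == "__main__":
    reply = "SCORE: 8\n\nSTRENGTHS:\n- Clear labels\n\nVERDICT: ACCEPTABLE"
    print(parse_review(reply, threshold=7.5))  # default document type
    print(parse_review(reply, threshold=8.5))  # journal threshold
```

The same reviewer reply can pass for one document type and fail for another, because the threshold comparison applies even when the reviewer's verdict says ACCEPTABLE.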

#!/usr/bin/env python3
"""
Scientific schematic generation using Nano Banana 2.

Generate any scientific diagram by describing it in natural language.
Nano Banana 2 handles everything automatically with smart iterative refinement.

Smart iteration: Only regenerates if quality is below threshold for your document type.
Quality review: Uses Gemini 3.1 Pro Preview for professional scientific evaluation.

Usage:
    # Generate for journal paper (highest quality threshold)
    python generate_schematic.py "CONSORT flowchart" -o flowchart.png --doc-type journal
    
    # Generate for presentation (lower threshold, faster)
    python generate_schematic.py "Transformer architecture" -o transformer.png --doc-type presentation
    
    # Generate for poster
    python generate_schematic.py "MAPK signaling pathway" -o pathway.png --doc-type poster
"""

import argparse
import os
import subprocess
import sys
from pathlib import Path


def main():
    """Command-line interface."""
    parser = argparse.ArgumentParser(
        description="Generate scientific schematics using AI with smart iterative refinement",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
How it works:
  Simply describe your diagram in natural language
  Nano Banana 2 generates it automatically with:
  - Smart iteration (only regenerates if quality is below threshold)
  - Quality review by Gemini 3.1 Pro Preview
  - Document-type aware quality thresholds
  - Publication-ready output

Document Types (quality thresholds):
  journal      8.5/10  - Nature, Science, peer-reviewed journals
  conference   8.0/10  - Conference papers
  thesis       8.0/10  - Dissertations, theses
  grant        8.0/10  - Grant proposals
  preprint     7.5/10  - arXiv, bioRxiv, etc.
  report       7.5/10  - Technical reports
  poster       7.0/10  - Academic posters
  presentation 6.5/10  - Slides, talks
  default      7.5/10  - General purpose

Examples:
  # Generate for journal paper (strict quality)
  python generate_schematic.py "CONSORT participant flow" -o flowchart.png --doc-type journal
  
  # Generate for poster (moderate quality)
  python generate_schematic.py "Transformer architecture" -o arch.png --doc-type poster
  
  # Generate for slides (faster, lower threshold)
  python generate_schematic.py "System diagram" -o system.png --doc-type presentation
  
  # Custom max iterations
  python generate_schematic.py "Complex pathway" -o pathway.png --iterations 2
  
  # Verbose output
  python generate_schematic.py "Circuit diagram" -o circuit.png -v

Environment Variables:
  OPENROUTER_API_KEY    Required for AI generation
        """
    )
    
    parser.add_argument("prompt", 
                       help="Description of the diagram to generate")
    parser.add_argument("-o", "--output", required=True,
                       help="Output file path")
    parser.add_argument("--doc-type", default="default",
                       choices=["journal", "conference", "poster", "presentation",
                               "report", "grant", "thesis", "preprint", "default"],
                       help="Document type for quality threshold (default: default)")
    parser.add_argument("--iterations", type=int, default=2,
                       help="Maximum refinement iterations (default: 2, max: 2)")
    parser.add_argument("--api-key", 
                       help="OpenRouter API key (or use OPENROUTER_API_KEY env var)")
    parser.add_argument("-v", "--verbose", action="store_true",
                       help="Verbose output")
    
    args = parser.parse_args()
    
    # Check for API key
    api_key = args.api_key or os.getenv("OPENROUTER_API_KEY")
    if not api_key:
        print("Error: OPENROUTER_API_KEY environment variable not set")
        print("\nFor AI generation, you need an OpenRouter API key.")
        print("Get one at: https://openrouter.ai/keys")
        print("\nSet it with:")
        print("  export OPENROUTER_API_KEY='your_api_key'")
        print("\nOr use --api-key flag")
        sys.exit(1)
    
    # Find AI generation script
    script_dir = Path(__file__).parent
    ai_script = script_dir / "generate_schematic_ai.py"
    
    if not ai_script.exists():
        print(f"Error: AI generation script not found: {ai_script}")
        sys.exit(1)
    
    # Build command
    cmd = [sys.executable, str(ai_script), args.prompt, "-o", args.output]
    
    if args.doc_type != "default":
        cmd.extend(["--doc-type", args.doc_type])
    
    # Enforce max 2 iterations
    iterations = min(args.iterations, 2)
    if iterations != 2:
        cmd.extend(["--iterations", str(iterations)])
    
    if args.verbose:
        cmd.append("-v")
    
    # Execute — pass API key via environment to avoid exposure in process listings
    try:
        env = os.environ.copy()
        if api_key:
            env["OPENROUTER_API_KEY"] = api_key
        result = subprocess.run(cmd, check=False, env=env)
        sys.exit(result.returncode)
    except Exception as e:
        print(f"Error executing AI generation: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()
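
`generate_iterative` writes a `<name>_review_log.json` file next to the output image. A short sketch of reading that log back is shown here. The key names (`iterations`, `score`, `success`, `final_score`, `quality_threshold`, `early_stop`) are taken from the `results` dictionary in the script, while `summarize_review_log` is a hypothetical helper that is not part of the skill.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def summarize_review_log(log_path: str) -> Dict[str, Any]:
    """Summarize a review log written by generate_iterative (hypothetical helper)."""
    results = json.loads(Path(log_path).read_text(encoding="utf-8"))
    iterations = results.get("iterations", [])
    # Only iterations that produced an image carry a score.
    scored = [it for it in iterations if it.get("success")]
    return {
        "attempts": len(iterations),
        "scores": [it["score"] for it in scored],
        "final_score": results.get("final_score", 0.0),
        "threshold": results.get("quality_threshold"),
        "early_stop": bool(results.get("early_stop")),
    }


if __name__ == "__main__":
    # Made-up sample log, used only to exercise the helper.
    sample = {
        "quality_threshold": 8.5,
        "iterations": [
            {"iteration": 1, "success": True, "score": 7.0},
            {"iteration": 2, "success": True, "score": 9.0},
        ],
        "final_score": 9.0,
        "early_stop": True,
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "diagram_review_log.json"
        path.write_text(json.dumps(sample), encoding="utf-8")
        print(summarize_review_log(str(path)))
```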

Related skills

Setup Matt Pocock Skills - Scaffold the per-repo configuration that Matt Pocock’s engineering agent skills rely on so they understand the issue tracker, triage labels, and domain documentation la - 462k - 185k

Lark Skill Maker - Quickly turn any Lark/Feishu OpenAPI call or multi-step workflow into a reusable agent skill with its own SKILL.md. - 379k - 15.8k

Caveman - Slash token usage by roughly 75% while keeping every technical detail intact when working with Claude Code, Cursor or similar agents. - 378k - 92.5k

Lark Apps - Connect Claude, Cursor or custom agents directly to Lark (Feishu) for messaging, document automation, approval workflows and enterprise data access. - 375k

Running Claude Code Via Litellm Copilot - Run Claude Code at a fraction of the cost by routing requests through LiteLLM to the GitHub Copilot Chat API. - 270k - 72

Codex Pet - Generate a complete Codex Pet spritesheet and metadata from one reference image without needing an OpenAI key or Codex Pro. - 246k - 8

Forks & variants (1)

Scholar Evaluation has 1 known copy in the catalog, totaling 115 installs. It canonicalizes to this original listing.

k-dense-ai - 115 installs

FAQ

What does scholar-evaluation score?

scholar-evaluation scores scholarly work across ScholarEval dimensions such as Problem Formulation and Research Questions. Each dimension uses rubric tiers up to Excellent (5) with explicit quality indicators for measurability, gap significance, scope, and novelty.
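
As a rough illustration of how 1-5 rubric scores might be combined into one number, here is a minimal sketch. The dimension names and the Poor (1) to Excellent (5) scale come from this page, but the weights, the `overall_score` function, and the weighted-mean formula are illustrative assumptions and not the skill's actual scoring logic.

```python
from typing import Dict

# Illustrative weights only (assumption): the skill's real weighting is not shown here.
EXAMPLE_WEIGHTS: Dict[str, float] = {
    "Problem Formulation": 0.3,
    "Research Questions": 0.2,
    "Methodology": 0.3,
    "Scholarly Contribution": 0.2,
}


def overall_score(scores: Dict[str, int],
                  weights: Dict[str, float] = EXAMPLE_WEIGHTS) -> float:
    """Weighted mean of 1-5 rubric scores over the dimensions that were scored."""
    for name, value in scores.items():
        if name not in weights:
            raise KeyError(f"Unknown dimension: {name}")
        if not 1 <= value <= 5:
            raise ValueError(f"{name}: score must be between 1 (Poor) and 5 (Excellent)")
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("No scored dimensions")
    # Normalizing by total_weight lets a partial evaluation stay on the 1-5 scale.
    return sum(weights[name] * value for name, value in scores.items()) / total_weight


if __name__ == "__main__":
    # Made-up scores for a single draft.
    draft = {
        "Problem Formulation": 4,
        "Research Questions": 3,
        "Methodology": 5,
        "Scholarly Contribution": 4,
    }
    print(round(overall_score(draft), 2))
```

Normalizing by the weights of the scored dimensions keeps results comparable when two proposals are evaluated on different subsets of dimensions.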

Can scholar-evaluation assess agent-generated papers?

scholar-evaluation supports structured evaluation of research ideas, paper drafts, and agent-generated scholarly outputs. Agents apply the documented rubrics and quality indicators to produce comparable dimension scores before human peer review.

Is Scholar Evaluation safe to install?

skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

AI & Agent Building, research, agents

