
Exploratory Data Analysis
Generate a structured EDA report for a local dataset file before modeling, cleaning, or dashboard work.
Overview
Exploratory Data Analysis is an agent skill that produces a structured markdown EDA report for a named dataset file. It is most often used in Build, and also fits Validate and Operate.
Install
npx skills add https://github.com/davila7/claude-code-templates --skill exploratory-data-analysis
What is this skill?
- Markdown report template with Executive Summary, Basic Information, and File Type Details sections
- File-type identification with format description, typical content, and Python library hints
- Data structure block covering overview, dimensions, and column data types
- Quality assessment for completeness, validity, and integrity (missing values, range, corruption)
- Statistical summary placeholders for numerical, categorical, and distribution views
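As a rough illustration of the structure and completeness items above, the following sketch profiles a CSV using only the Python standard library. It is not the skill's own implementation: the function names `profile_csv` and `infer_type`, the set of strings treated as missing, and the crude type inference are all assumptions made for this example, and the sample rows are invented.

```python
import csv
import io

# Assumption: these strings count as "missing"; adjust for your data.
MISSING_TOKENS = {"", "na", "n/a", "nan", "null", "none"}


def infer_type(values):
    """Crude type guess over non-missing values: int, float, or str."""
    if not values:
        return "empty"
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for v in values:
                caster(v)
            return name
        except ValueError:
            continue
    return "str"


def profile_csv(handle):
    """Return dimensions, per-column type guess, and missing counts."""
    reader = csv.DictReader(handle)
    columns = reader.fieldnames or []
    present = {c: [] for c in columns}
    missing = {c: 0 for c in columns}
    n_rows = 0
    for row in reader:
        n_rows += 1
        for c in columns:
            raw = (row.get(c) or "").strip()
            if raw.lower() in MISSING_TOKENS:
                missing[c] += 1
            else:
                present[c].append(raw)
    return {
        "rows": n_rows,
        "columns": len(columns),
        "dtypes": {c: infer_type(present[c]) for c in columns},
        "missing": missing,
    }


# Invented sample data, used only to exercise the sketch.
sample = io.StringIO("id,price,city\n1,9.5,Oslo\n2,,Bergen\n3,4,NA\n")
report = profile_csv(sample)
```

The resulting dictionary maps loosely onto the Dimensions, Data Types, and Missing Values fields of the report. For Parquet or other binary formats a dedicated reader would be needed instead of `csv`.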
Adoption & trust: 1.4k installs on skills.sh; 27.8k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have a dataset file but no consistent picture of its schema, quality, or whether it is safe to use in models or dashboards.
Who is it for?
Solo builders onboarding messy CSV/Parquet exports, client drops, or internal logs before committing to a feature or model.
Skip if: your team already has automated Great Expectations/dbt tests and only needs one-line health checks without a narrative report.
When should I use this skill?
Use it when a user shares or names a dataset file and wants its structure, quality, and statistics documented before further analysis.
What do I get? / Deliverables
You get a filled or partially filled EDA report covering metadata, structure, quality assessment, and statistical summaries ready for cleaning, modeling, or stakeholder review.
- Markdown EDA report with executive summary, structure, quality, and stats sections
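One way to picture a "partially filled" report is to substitute only the placeholders whose values are known and leave the rest untouched. The sketch below does that for a few of the Basic Information fields from the SKILL.md template. The `KeepMissing` helper, the `basic_info` function, and the shortened template excerpt are assumptions for illustration, not the skill's actual code.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


class KeepMissing(dict):
    """Leave unknown placeholders in place, e.g. '{COVERAGE}'."""

    def __missing__(self, key):
        return "{" + key + "}"


def basic_info(path):
    """Collect values for some Basic Information placeholders."""
    p = Path(path)
    stat = p.stat()
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "FILENAME": p.name,
        "FILEPATH": str(p.resolve()),
        "FILE_SIZE_BYTES": stat.st_size,
        "MODIFIED_DATE": modified.isoformat(timespec="seconds"),
        "EXTENSION": p.suffix.lstrip("."),
    }


# Shortened excerpt of the report template, for illustration only.
TEMPLATE = (
    "# Exploratory Data Analysis Report: {FILENAME}\n"
    "- **File Size:** {FILE_SIZE_HUMAN} ({FILE_SIZE_BYTES} bytes)\n"
    "- **Extension:** `.{EXTENSION}`\n"
    "- **Missing Values:** {MISSING_VALUES}\n"
)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical demo file; any local dataset path would do.
    demo = Path(tmp) / "export.csv"
    demo.write_bytes(b"id,price\n1,9.5\n")
    filled = TEMPLATE.format_map(KeepMissing(basic_info(demo)))
```

Here `{FILE_SIZE_HUMAN}` and `{MISSING_VALUES}` stay as literal placeholders because no value was supplied. Note that `str.format_map` treats every brace as a field, so this approach only suits templates whose braces are all placeholders.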
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
The canonical shelf is Build, because the skill produces analysis artifacts from raw files during product and data work. Docs also fits the deliverable: a written EDA report with structure, quality, and statistical sections, not a deployed pipeline.
Where it fits
Profile a sample export to decide if your MVP metric is measurable from the vendor file.
Document dimensions and dtypes before writing ingestion code for a new upload feature.
Attach an EDA report to a PR so reviewers see coverage and validity risks.
Re-run EDA on a broken nightly dump to compare missing-value spikes to last week.
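The last scenario, comparing missing-value spikes between two dumps, could be sketched as follows. This is a minimal stdlib example under stated assumptions: the function names `missing_rates` and `missing_spikes`, the 10-point threshold, and the two tiny CSV samples are all hypothetical and not part of the skill.

```python
import csv
import io


def missing_rates(handle):
    """Fraction of blank cells per column in a CSV stream."""
    reader = csv.DictReader(handle)
    columns = reader.fieldnames or []
    blanks = {c: 0 for c in columns}
    n_rows = 0
    for row in reader:
        n_rows += 1
        for c in columns:
            if not (row.get(c) or "").strip():
                blanks[c] += 1
    return {c: (blanks[c] / n_rows if n_rows else 0.0) for c in columns}


def missing_spikes(previous, current, threshold=0.10):
    """Columns whose missing rate rose by more than `threshold`."""
    return {
        col: (previous.get(col, 0.0), rate)
        for col, rate in current.items()
        if rate - previous.get(col, 0.0) > threshold
    }


# Invented data standing in for last week's and tonight's dumps.
last_week = io.StringIO("id,email\n1,a@x.io\n2,b@x.io\n3,c@x.io\n4,d@x.io\n")
tonight = io.StringIO("id,email\n1,a@x.io\n2,\n3,\n4,d@x.io\n")

spikes = missing_spikes(missing_rates(last_week), missing_rates(tonight))
```

With these samples only `email` is flagged, since its missing rate moves from 0.0 to 0.5 while `id` stays complete. The threshold is arbitrary and would need tuning per dataset.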
How it compares
A structured EDA documentation template, not a live notebook kernel or hosted BI connector.
Common Questions / FAQ
Who is exploratory-data-analysis for?
Indie builders and small data-minded developers who need a repeatable written pass over a local file before analytics or ML work.
When should I use exploratory-data-analysis?
During Validate when scoping whether data supports your idea; during Build when integrating a new file into your app or pipeline; during Operate when investigating a production export that looks wrong.
Is exploratory-data-analysis safe to install?
Review the Security Audits panel on this Prism page and only point the skill at files you are allowed to read on your machine.
SKILL.md - Exploratory Data Analysis
# Exploratory Data Analysis Report: {FILENAME}

**Generated:** {TIMESTAMP}

---

## Executive Summary

This report provides a comprehensive exploratory data analysis of the file `{FILENAME}`. The analysis includes file type identification, format-specific metadata extraction, data quality assessment, and recommendations for downstream analysis.

---

## Basic Information

- **Filename:** `{FILENAME}`
- **Full Path:** `{FILEPATH}`
- **File Size:** {FILE_SIZE_HUMAN} ({FILE_SIZE_BYTES} bytes)
- **Last Modified:** {MODIFIED_DATE}
- **Extension:** `.{EXTENSION}`
- **Format Category:** {CATEGORY}

---

## File Type Details

### Format Description
{FORMAT_DESCRIPTION}

### Typical Data Content
{TYPICAL_DATA}

### Common Use Cases
{USE_CASES}

### Python Libraries for Reading
{PYTHON_LIBRARIES}

---

## Data Structure Analysis

### Overview
{DATA_STRUCTURE_OVERVIEW}

### Dimensions
{DIMENSIONS}

### Data Types
{DATA_TYPES}

---

## Quality Assessment

### Completeness
- **Missing Values:** {MISSING_VALUES}
- **Data Coverage:** {COVERAGE}

### Validity
- **Range Check:** {RANGE_CHECK}
- **Format Compliance:** {FORMAT_COMPLIANCE}
- **Consistency:** {CONSISTENCY}

### Integrity
- **Checksum/Validation:** {VALIDATION}
- **File Corruption Check:** {CORRUPTION_CHECK}

---

## Statistical Summary

### Numerical Variables
{NUMERICAL_STATS}

### Categorical Variables
{CATEGORICAL_STATS}

### Distributions
{DISTRIBUTIONS}

---

## Data Characteristics

### Temporal Properties (if applicable)
- **Time Range:** {TIME_RANGE}
- **Sampling Rate:** {SAMPLING_RATE}
- **Missing Time Points:** {MISSING_TIMEPOINTS}

### Spatial Properties (if applicable)
- **Dimensions:** {SPATIAL_DIMENSIONS}
- **Resolution:** {SPATIAL_RESOLUTION}
- **Coordinate System:** {COORDINATE_SYSTEM}

### Experimental Metadata (if applicable)
- **Instrument:** {INSTRUMENT}
- **Method:** {METHOD}
- **Sample Info:** {SAMPLE_INFO}

---

## Key Findings

1. **Data Volume:** {DATA_VOLUME_FINDING}
2. **Data Quality:** {DATA_QUALITY_FINDING}
3. **Notable Patterns:** {PATTERNS_FINDING}
4. **Potential Issues:** {ISSUES_FINDING}

---

## Visualizations

### Distribution Plots
{DISTRIBUTION_PLOTS}

### Correlation Analysis
{CORRELATION_PLOTS}

### Time Series (if applicable)
{TIMESERIES_PLOTS}

---

## Recommendations for Further Analysis

### Immediate Actions
1. {RECOMMENDATION_1}
2. {RECOMMENDATION_2}
3. {RECOMMENDATION_3}

### Preprocessing Steps
- {PREPROCESSING_1}
- {PREPROCESSING_2}
- {PREPROCESSING_3}

### Analytical Approaches
{ANALYTICAL_APPROACHES}

### Tools and Methods
- **Recommended Software:** {RECOMMENDED_SOFTWARE}
- **Statistical Methods:** {STATISTICAL_METHODS}
- **Visualization Tools:** {VIZ_TOOLS}

---

## Data Processing Workflow

```
{WORKFLOW_DIAGRAM}
```

---

## Potential Challenges

1. **Challenge:** {CHALLENGE_1}
   - **Mitigation:** {MITIGATION_1}
2. **Challenge:** {CHALLENGE_2}
   - **Mitigation:** {MITIGATION_2}

---

## References and Resources

### Format Specification
- {FORMAT_SPEC_LINK}

### Python Libraries Documentation
- {LIBRARY_DOCS}

### Related Analysis Examples
- {EXAMPLE_LINKS}

---

## Appendix

### Complete File Metadata
```json
{COMPLETE_METADATA}
```

### Analysis Parameters
```json
{ANALYSIS_PARAMETERS}
```

### Software Versions
- Python: {PYTHON_VERSION}
- Key Libraries: {LIBRARY_VERSIONS}

---

*This report was automatically generated by the exploratory-data-analysis skill.*
*For questions or issues, refer to the skill documentation.*

# Bioinformatics and Genomics File Formats Reference

This reference covers file formats used in genomics, transcriptomics, sequence analysis, and related bioinformatics applications.

## Sequence Data Formats

### .fasta / .fa / .fna - FASTA Format
**Description:** Text-based format for nucleotide or protein sequences
**Typical Data:** DNA, RNA, or protein sequences with headers
**Use Cases:** Sequence storage, BLAST searches, alignments
**Python Libraries:**
- `Biopython`: `SeqIO.parse('file.fasta', 'fa