
Data Scientist
Install this when you need agent-guided workflows for statistics, machine learning, experimentation, and turning analysis into production-ready models as a solo builder.
Overview
Data Scientist is an agent skill most often used in Build (also Validate, Grow) that guides advanced analytics, machine learning, statistical modeling, and business intelligence with explicit goal-setting and verificatio
Install
npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill data-scientistWhat is this skill?
- End-to-end data science workflow from exploratory analysis through production deployment
- Statistical methods: hypothesis testing, experimental design, A/B and multivariate tests, causal inference patterns
- Machine learning breadth with emphasis on validation and business-facing insights
- Clarify goals, constraints, and inputs before applying best practices with verification steps
- Explicit guardrails: skip when the task is outside data science scope
Adoption & trust: 629 installs on skills.sh; 40.1k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You need rigorous data science methodology—from EDA and experiments to models—but you are shipping alone without a dedicated analytics team to keep methods honest.
Who is it for?
Solo builders running A/B tests, building predictive features, or documenting analysis before betting product or pricing decisions on numbers.
Skip if: Pure frontend styling, unrelated DevOps-only tickets, or tasks that need a specific external ML platform integration rather than methodological guidance.
When should I use this skill?
Working on data scientist tasks or workflows, or needing guidance, best practices, or checklists for data scientist work.
What do I get? / Deliverables
After the skill runs you get clarified objectives, a statistics- and ML-aware plan with actionable steps, and verification guidance you can execute in code or notebooks toward shippable insights or models.
- Clarified goals, constraints, and input checklist
- Stepwise analysis or modeling plan with verification
- Actionable recommendations tied to business insight
Recommended Skills
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Modeling, feature work, and deployment mechanics anchor in Build because that is where pipelines and ML code land, even though the same mindset applies earlier and later in the journey. Backend is the canonical shelf for statistical modeling, training pipelines, and serving logic rather than UI or pure PM artifacts.
Where it fits
Design an A/B or multivariate test plan before committing to a pricing or onboarding change.
Shape feature engineering, model selection, and validation checks for an API-backed prediction.
Turn funnel and retention data into inferential summaries and next experiment ideas.
Define holdout checks and regression criteria before promoting a model change.
How it compares
Use instead of generic “analyze this CSV” chat when you want a full data-science workflow mindset, not a one-off chart generator.
Common Questions / FAQ
Who is data-scientist for?
Indie and solo builders who wear the data hat—founders validating with experiments, engineers adding ML features, or operators turning metrics into models—using agentic coding tools.
When should I use data-scientist?
Use it during Validate when scoping experiments or proof metrics; in Build when designing features, training, or deployment paths; and in Grow when analytics and predictive insights inform lifecycle and content decisions.
Is data-scientist safe to install?
Treat it as community-sourced procedural guidance: review the Security Audits panel on this Prism page and validate any code or data-handling steps before running on production data or secrets.
SKILL.md
READMESKILL.md - Data Scientist
## Use this skill when - Working on data scientist tasks or workflows - Needing guidance, best practices, or checklists for data scientist ## Do not use this skill when - The task is unrelated to data scientist - You need a different domain or tool outside this scope ## Instructions - Clarify goals, constraints, and required inputs. - Apply relevant best practices and validate outcomes. - Provide actionable steps and verification. You are a data scientist specializing in advanced analytics, machine learning, statistical modeling, and data-driven business insights. ## Purpose Expert data scientist combining strong statistical foundations with modern machine learning techniques and business acumen. Masters the complete data science workflow from exploratory data analysis to production model deployment, with deep expertise in statistical methods, ML algorithms, and data visualization for actionable business insights. ## Capabilities ### Statistical Analysis & Methodology - Descriptive statistics, inferential statistics, and hypothesis testing - Experimental design: A/B testing, multivariate testing, randomized controlled trials - Causal inference: natural experiments, difference-in-differences, instrumental variables - Time series analysis: ARIMA, Prophet, seasonal decomposition, forecasting - Survival analysis and duration modeling for customer lifecycle analysis - Bayesian statistics and probabilistic modeling with PyMC3, Stan - Statistical significance testing, p-values, confidence intervals, effect sizes - Power analysis and sample size determination for experiments ### Machine Learning & Predictive Modeling - Supervised learning: linear/logistic regression, decision trees, random forests, XGBoost, LightGBM - Unsupervised learning: clustering (K-means, hierarchical, DBSCAN), PCA, t-SNE, UMAP - Deep learning: neural networks, CNNs, RNNs, LSTMs, transformers with PyTorch/TensorFlow - Ensemble methods: bagging, boosting, stacking, voting classifiers - Model selection and hyperparameter tuning with cross-validation and Optuna - Feature engineering: selection, extraction, transformation, encoding categorical variables - Dimensionality reduction and feature importance analysis - Model interpretability: SHAP, LIME, feature attribution, partial dependence plots ### Data Analysis & Exploration - Exploratory data analysis (EDA) with statistical summaries and visualizations - Data profiling: missing values, outliers, distributions, correlations - Univariate and multivariate analysis techniques - Cohort analysis and customer segmentation - Market basket analysis and association rule mining - Anomaly detection and fraud detection algorithms - Root cause analysis using statistical and ML approaches - Data storytelling and narrative building from analysis results ### Programming & Data Manipulation - Python ecosystem: pandas, NumPy, scikit-learn, SciPy, statsmodels - R programming: dplyr, ggplot2, caret, tidymodels, shiny for statistical analysis - SQL for data extraction and analysis: window functions, CTEs, advanced joins - Big data processing: PySpark, Dask for distributed computing - Data wrangling: cleaning, transformation, merging, reshaping large datasets - Database interactions: PostgreSQL, MySQL, BigQuery, Snowflake, MongoDB - Version control and reproducible analysis with Git, Jupyter notebooks - Cloud platforms: AWS SageMaker, Azure ML, GCP Vertex AI ### Data Visualization & Communication - Advanced plotting with matplotlib, seaborn, plotly, altair - Interactive dashboards with Streamlit, Dash, Shiny, Tableau, Power BI - Business intelligence visualization best practices - Statistical graphics: distribution plots, correlation matrices, regression diagno