
Statsmodels
Fit and interpret statsmodels discrete choice models—logit, multinomial, count, and ordinal—for surveys, experiments, and pricing hypotheses during product decisions.
Overview
Statsmodels is an agent skill most often used in Validate (also in Grow analytics and Build backend work) that documents discrete choice models in statsmodels (binary, multinomial, count, and ordinal) with fit and interpretation p…
Install
npx skills add https://github.com/k-dense-ai/scientific-agent-skills --skill statsmodels
What is this skill?
- Covers the binary logit, multinomial, ordinal, and count discrete choice families and their MLE assumptions
- Includes a Logit fit pattern with sm.add_constant, summary output, odds ratios, and 95% CIs
- Documents when to use logistic regression for yes/no outcomes and probability estimation
- Shows average marginal effects (AME) via get_margeff, expressed as changes in predicted probability
- Structured reference for discrete outcomes under i.i.d. errors, not a generic pandas EDA skill
Adoption & trust: 592 installs on skills.sh; 27.6k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You have categorical or count outcome data and need the right statsmodels model, plus odds ratios and marginal effects you can defend in a scope or metrics review.
Who is it for?
Indie SaaS founders and solo data-curious builders validating surveys, A/B buckets, churn, or tier choice with statsmodels in Python.
Skip if: you need deep learning classification or time-series forecasting outside a discrete choice framing, or your team only needs descriptive charts without inferential models.
When should I use this skill?
You need discrete choice modeling guidance in statsmodels for binary, multinomial, ordinal, or count outcomes including fit, summary, odds ratios, and marginal effects.
What do I get? / Deliverables
You obtain fitted discrete choice models, summary tables, and interpretable odds ratios or AME outputs ready for decision memos or downstream code.
- Fitted discrete choice model results object
- Regression summary with coefficients and standard errors
- Odds ratios and confidence intervals or marginal effects tables
Recommended Skills
Journey fit
Spans multiple journey phases: a primary shelf plus the alternate fits below.
Validate is the primary shelf because choice models answer whether patterns in binary or categorical outcomes support shipping a scoped bet. Scope fits when you translate raw responses or usage buckets into odds ratios, marginal effects, and confidence intervals that bound what to build next.
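When translating odds ratios into decisions, it helps to convert them back to probabilities, because an odds ratio is not a probability ratio. The small worked helper below is a sketch. It assumes you pick a baseline probability, which is a hypothetical input, and it applies the identity new odds = baseline odds × OR.

```python
def prob_after_odds_ratio(p_baseline: float, odds_ratio: float) -> float:
    """Probability implied by multiplying the baseline odds by an odds ratio."""
    odds = p_baseline / (1 - p_baseline)  # convert probability to odds
    new_odds = odds * odds_ratio          # apply the multiplicative effect
    return new_odds / (1 + new_odds)      # convert odds back to probability


# OR = 2 at a 20% baseline: odds 0.25 -> 0.5, so probability 0.2 -> 1/3 (not 0.4)
print(prob_after_odds_ratio(0.2, 2.0))
```

The same odds ratio moves the probability by different amounts at different baselines. That is why marginal effects are often easier to report than odds ratios.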
Where it fits
Fit a logit on waitlist yes/no against feature interest covariates before locking MVP scope.
Model tier selection as multinomial outcomes to see which price anchors shift choice probabilities.
Estimate churn as a binary logit and report odds ratios for lifecycle campaigns.
Embed a fitted Logit scorer behind an internal admin API for experiment readouts.
How it compares
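Use it instead of generic ML classifier prompts when you need classical MLE discrete choice models with odds ratios and standard errors, rather than scikit-learn's prediction-oriented defaults (its LogisticRegression applies L2 regularization unless configured otherwise).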
Common Questions / FAQ
Who is statsmodels for?
Solo and indie builders who analyze binary, multinomial, ordinal, or count outcomes in Python and want agent-guided statsmodels syntax and interpretation.
When should I use statsmodels?
Use it in Validate when scoping experiments and pricing, in Grow when explaining segment conversion or retention drivers, and in Build when implementing logistic or count models in an API or batch job.
Is statsmodels safe to install?
It is reference and code-generation oriented; review the Security Audits panel on this Prism page and avoid piping untrusted data paths into automated shell execution.
SKILL.md
# Discrete Choice Models Reference

This document provides comprehensive guidance on discrete choice models in statsmodels, including binary, multinomial, count, and ordinal models.

## Overview

Discrete choice models handle outcomes that are:

- **Binary**: 0/1, success/failure
- **Multinomial**: Multiple unordered categories
- **Ordinal**: Ordered categories
- **Count**: Non-negative integers

All models use maximum likelihood estimation and assume i.i.d. errors.

## Binary Models

### Logit (Logistic Regression)

Uses the logistic distribution for binary outcomes.

**When to use:**
- Binary classification (yes/no, success/failure)
- Probability estimation for binary outcomes
- Interpretable odds ratios

**Model**: P(Y=1|X) = 1 / (1 + exp(-Xβ))

```python
import statsmodels.api as sm
from statsmodels.discrete.discrete_model import Logit

# Prepare data: add an intercept column to the regressors
X = sm.add_constant(X_data)

# Fit model
model = Logit(y, X)
results = model.fit()
print(results.summary())
```

**Interpretation:**

```python
import numpy as np

# Odds ratios
odds_ratios = np.exp(results.params)
print("Odds ratios:", odds_ratios)
# For a 1-unit increase in X, the odds multiply by exp(β)
# OR > 1: increases odds of success
# OR < 1: decreases odds of success
# OR = 1: no effect

# Confidence intervals for odds ratios (exponentiated coefficient CIs)
odds_ci = np.exp(results.conf_int())
print("Odds ratio 95% CI:")
print(odds_ci)
```

**Marginal effects:**

```python
# Average marginal effects (AME): effects averaged over all observations
marginal_effects = results.get_margeff(at='overall')
print(marginal_effects.summary())

# Marginal effects at means (MEM): effects evaluated at the regressor means
marginal_effects_mem = results.get_margeff(at='mean', method='dydx')

# Marginal effects at representative values: atexog keys are zero-indexed
# column positions in X (column 0 is the constant added by add_constant)
marginal_effects_custom = results.get_margeff(at='mean', atexog={1: 1, 2: 5})
```

**Predictions:**

```python
# Predicted probabilities
probs = results.predict(X)

# Binary predictions (0.5 threshold)
predictions = (probs > 0.5).astype(int)

# Custom threshold
threshold = 0.3
predictions_custom = (probs > threshold).astype(int)

# For new data: has_constant='add' forces the intercept column even when
# X_new_data has a single row (where every column looks constant)
X_new = sm.add_constant(X_new_data, has_constant='add')
new_probs = results.predict(X_new)
```

**Model evaluation:**

```python
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

# Classification report
print(classification_report(y, predictions))

# Confusion matrix
print(confusion_matrix(y, predictions))

# AUC-ROC (computed from probabilities, not thresholded predictions)
auc = roc_auc_score(y, probs)
print(f"AUC: {auc:.4f}")

# Pseudo R-squared
print(f"McFadden's Pseudo R²: {results.prsquared:.4f}")
```

### Probit

Uses the normal distribution for binary outcomes.

**When to use:**
- Binary outcomes
- When a normal error distribution is preferred
- Field convention (econometrics often uses probit)

**Model**: P(Y=1|X) = Φ(Xβ), where Φ is the standard normal CDF

```python
from statsmodels.discrete.discrete_model import Probit

model = Probit(y, X)
results = model.fit()
print(results.summary())
```

**Comparison with Logit:**
- Probit and Logit usually give similar results
- Probit: based on the normal distribution (thinner tails)
- Logit: slightly heavier tails, easier interpretation (odds ratios)
- Coefficients are not directly comparable (different error scales)

```python
# Marginal effects are comparable across the two models
logit_results = sm.Logit(y, X).fit()
probit_results = sm.Probit(y, X).fit()
logit_me = logit_results.get_margeff().margeff
probit_me = probit_results.get_margeff().margeff
print("Logit marginal effects:", logit_me)
print("Probit marginal effects:", probit_me)
```

## Multinomial Models

### MNLogit (Multinomial Logit)

For unordered categorical outcomes with 3+ categories.

**When to use:**
- Multiple unordered categories (e.g., transportation mode, brand choice)
- No natural ordering among categories
- Need probabilities for each category

**Model**: P(Y=j|X) = exp(Xβⱼ) / Σₖ exp(Xβₖ)

```python
from statsmodels.discrete.discrete_model import MNLogit

# y should be integers 0, 1, 2, ... for categories
model = MNLogit(y, X)
results = model.fit()
print(results.summary())
```

**Interpretatio