Statsmodels

Name: Statsmodels
Author: k-dense-ai

k-dense-ai/scientific-agent-skills

967 installs
32k repo stars
Updated July 29, 2026

statsmodels is a K-Dense scientific-agent skill that fits and interprets OLS, GLM, discrete choice, and time-series models with statsmodels 0.14.6 diagnostics for developers needing rigorous econometric inference.

About

statsmodels is a K-Dense scientific-agent-skills module (metadata version 1.1, BSD-3-Clause) for Python statistical modeling with statsmodels 0.14.6. It covers OLS and WLS regression, generalized linear models, binary and multinomial logit, Poisson and count models including hurdle specifications, ordinal outcomes, and ARIMA/SARIMAX time series with AIC/BIC model comparison. Examples emphasize sm.add_constant for intercepts, results.summary() tables, Breusch-Pagan heteroskedasticity tests, residual plots, and publication-ready inference. Install via uv pip install statsmodels==0.14.6 on Python 3.9+. Reach for statsmodels when you need explicit model classes with residual diagnostics and coefficient inference—use the separate statistical-analysis skill instead for guided test selection with APA reporting.

Comprehensive reference for binary Logit, Multinomial, Ordinal, and Count models in statsmodels
Includes ready-to-run Python patterns for model fitting, odds ratios, marginal effects, and confidence intervals
Focuses on maximum-likelihood estimation and i.i.d. error assumptions common to all discrete choice models
Delivers interpretable statistical outputs (odds ratios, marginal effects, p-values) for agent review
4 distinct model families covered with usage triggers and interpretation guides

Statsmodels by the numbers

967 all-time installs (skills.sh)
+45 installs in the week ending Jul 29, 2026 (Skillselion tracking)
Ranked #302 of 2,065 Data Science & ML skills by installs in the Skillselion catalog
Security screen: LOW risk (skills.sh audit)
Data as of Jul 29, 2026 (Skillselion catalog sync)

npx skills add https://github.com/k-dense-ai/scientific-agent-skills --skill statsmodels

Installs	967
repo stars	★ 32k
Security audit	3 / 3 scanners passed
Last updated	July 29, 2026
Repository	k-dense-ai/scientific-agent-skills ↗

How do you run discrete choice models in statsmodels?

Run, interpret, and explain discrete choice, logistic, multinomial, and count regression models inside agentic data-science workflows.

Who is it for?

Data scientists and econometrics developers who need statsmodels model classes with residuals, inference, and assumption diagnostics in agent workflows.

Skip if: You only need automatic test selection with APA-formatted reporting; use statistical-analysis instead.

When should I use this skill?

User asks to fit OLS, logistic, multinomial, count, ordinal, or ARIMA models with statsmodels diagnostics and coefficient tables.

What you get

Fitted model summaries, coefficient tables, diagnostic test results, and forecast outputs

model summary tables
diagnostic test output
forecast series

By the numbers

Targets statsmodels 0.14.6 released Dec 5, 2025
Skill metadata version 1.1

Files

SKILL.md (Markdown)

Statsmodels: Statistical Modeling and Econometrics

Overview

Statsmodels is Python's premier library for statistical modeling, providing tools for estimation, inference, and diagnostics across a wide range of statistical methods. Apply this skill for rigorous statistical analysis, from simple linear regression to complex time series models and econometric analyses.

Current Compatibility

Examples target statsmodels 0.14.6, released Dec 5, 2025. For reproducible environments, pin the primary package:

uv pip install statsmodels==0.14.6

Use statsmodels.api and statsmodels.formula.api for stable high-level imports, and direct module imports when examples require newer or specialized classes such as HurdleCountModel.

When to Use This Skill

This skill should be used when:

Fitting regression models (OLS, WLS, GLS, quantile regression)
Performing generalized linear modeling (logistic, Poisson, Gamma, etc.)
Analyzing discrete outcomes (binary, multinomial, count, ordinal)
Conducting time series analysis (ARIMA, SARIMAX, VAR, forecasting)
Running statistical tests and diagnostics
Testing model assumptions (heteroskedasticity, autocorrelation, normality)
Detecting outliers and influential observations
Comparing models (AIC/BIC, likelihood ratio tests)
Estimating causal effects
Producing publication-ready statistical tables and inference

Quick Start Guide

Linear Regression (OLS)

import statsmodels.api as sm
import numpy as np
import pandas as pd

# Prepare data - ALWAYS add constant for intercept
X = sm.add_constant(X_data)

# Fit OLS model
model = sm.OLS(y, X)
results = model.fit()

# View comprehensive results
print(results.summary())

# Key results
print(f"R-squared: {results.rsquared:.4f}")
print(f"Coefficients:\n{results.params}")
print(f"P-values:\n{results.pvalues}")

# Predictions with confidence intervals
# X_new must have the same columns as X, including the constant
predictions = results.get_prediction(X_new)
pred_summary = predictions.summary_frame()
print(pred_summary)  # includes mean, CI, prediction intervals

# Diagnostics
from statsmodels.stats.diagnostic import het_breuschpagan
bp_test = het_breuschpagan(results.resid, X)
print(f"Breusch-Pagan p-value: {bp_test[1]:.4f}")

# Visualize residuals
import matplotlib.pyplot as plt
plt.scatter(results.fittedvalues, results.resid)
plt.axhline(y=0, color='r', linestyle='--')
plt.xlabel('Fitted values')
plt.ylabel('Residuals')
plt.show()

Logistic Regression (Binary Outcomes)

import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.discrete_model import Logit

# Add constant
X = sm.add_constant(X_data)

# Fit logit model
model = Logit(y_binary, X)
results = model.fit()

print(results.summary())

# Odds ratios
odds_ratios = np.exp(results.params)
print("Odds ratios:\n", odds_ratios)

# Predicted probabilities
probs = results.predict(X)

# Binary predictions (0.5 threshold)
predictions = (probs > 0.5).astype(int)

# Model evaluation
from sklearn.metrics import classification_report, roc_auc_score

print(classification_report(y_binary, predictions))
print(f"AUC: {roc_auc_score(y_binary, probs):.4f}")

# Marginal effects
marginal = results.get_margeff()
print(marginal.summary())

Time Series (ARIMA)

import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Check stationarity
from statsmodels.tsa.stattools import adfuller

adf_result = adfuller(y_series)
print(f"ADF p-value: {adf_result[1]:.4f}")

if adf_result[1] > 0.05:
    # Series is non-stationary, difference it
    y_for_acf = y_series.diff().dropna()
    d = 1
else:
    y_for_acf = y_series.dropna()
    d = 0

# Plot ACF/PACF to identify p, q
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
plot_acf(y_for_acf, lags=40, ax=ax1)
plot_pacf(y_for_acf, lags=40, ax=ax2)
plt.show()

# Fit ARIMA(p,d,q)
model = ARIMA(y_series, order=(1, d, 1))
results = model.fit()

print(results.summary())

# Forecast
forecast = results.forecast(steps=10)
forecast_obj = results.get_forecast(steps=10)
forecast_df = forecast_obj.summary_frame()

print(forecast_df)  # includes mean and confidence intervals

# Residual diagnostics
results.plot_diagnostics(figsize=(12, 8))
plt.show()

Generalized Linear Models (GLM)

import numpy as np
import statsmodels.api as sm

# Poisson regression for count data
X = sm.add_constant(X_data)
model = sm.GLM(y_counts, X, family=sm.families.Poisson())
results = model.fit()

print(results.summary())

# Rate ratios (for Poisson with log link)
rate_ratios = np.exp(results.params)
print("Rate ratios:\n", rate_ratios)

# Check overdispersion
overdispersion = results.pearson_chi2 / results.df_resid
print(f"Overdispersion: {overdispersion:.2f}")

if overdispersion > 1.5:
    # Use Negative Binomial instead
    from statsmodels.discrete.discrete_model import NegativeBinomial
    nb_model = NegativeBinomial(y_counts, X)
    nb_results = nb_model.fit()
    print(nb_results.summary())

Core Statistical Modeling Capabilities

1. Linear Regression Models

Comprehensive suite of linear models for continuous outcomes with various error structures.

Available models:

OLS: Standard linear regression with i.i.d. errors
WLS: Weighted least squares for heteroskedastic errors
GLS: Generalized least squares for arbitrary covariance structure
GLSAR: GLS with autoregressive errors for time series
Quantile Regression: Conditional quantiles (robust to outliers)
Mixed Effects: Hierarchical/multilevel models with random effects
Recursive/Rolling: Time-varying parameter estimation

Key features:

Comprehensive diagnostic tests
Robust standard errors (HC, HAC, cluster-robust)
Influence statistics (Cook's distance, leverage, DFFITS)
Hypothesis testing (F-tests, Wald tests)
Model comparison (AIC, BIC, likelihood ratio tests)
Prediction with confidence and prediction intervals

When to use: Continuous outcome variable, want inference on coefficients, need diagnostics

Reference: See references/linear_models.md for detailed guidance on model selection, diagnostics, and best practices.

2. Generalized Linear Models (GLM)

Flexible framework extending linear models to non-normal distributions.

Distribution families:

Binomial: Binary outcomes or proportions (logistic regression)
Poisson: Count data
Negative Binomial: Overdispersed counts
Gamma: Positive continuous, right-skewed data
Inverse Gaussian: Positive continuous with specific variance structure
Gaussian: Equivalent to OLS
Tweedie: Flexible family for semi-continuous data

Link functions:

Logit, Probit, Log, Identity, Inverse, Sqrt, CLogLog, Power
Choose based on interpretation needs and model fit

Key features:

Maximum likelihood estimation via IRLS
Deviance and Pearson residuals
Goodness-of-fit statistics
Pseudo R-squared measures
Robust standard errors

When to use: Non-normal outcomes, need flexible variance and link specifications

Reference: See references/glm.md for family selection, link functions, interpretation, and diagnostics.

3. Discrete Choice Models

Models for categorical and count outcomes.

Binary models:

Logit: Logistic regression (odds ratios)
Probit: Probit regression (normal distribution)

Multinomial models:

MNLogit: Unordered categories (3+ levels)
Conditional Logit: Choice models with alternative-specific variables
Ordered Model: Ordinal outcomes (ordered categories)

Count models:

Poisson: Standard count model
Negative Binomial: Overdispersed counts
Zero-Inflated: Excess zeros (ZIP, ZINB)
Hurdle Models: Two-stage models for zero-heavy data

Key features:

Maximum likelihood estimation
Marginal effects at means or average marginal effects
Model comparison via AIC/BIC
Predicted probabilities and classification
Goodness-of-fit tests

When to use: Binary, categorical, or count outcomes

Reference: See references/discrete_choice.md for model selection, interpretation, and evaluation.

4. Time Series Analysis

Comprehensive time series modeling and forecasting capabilities.

Univariate models:

AutoReg (AR): Autoregressive models
ARIMA: Autoregressive integrated moving average
SARIMAX: Seasonal ARIMA with exogenous variables
Exponential Smoothing: Simple, Holt, Holt-Winters
ETS: Innovations state space models

Multivariate models:

VAR: Vector autoregression
VARMAX: VAR with MA and exogenous variables
Dynamic Factor Models: Extract common factors
VECM: Vector error correction models (cointegration)

Advanced models:

State Space: Kalman filtering, custom specifications
Regime Switching: Markov switching models
ARDL: Autoregressive distributed lag

Key features:

ACF/PACF analysis for model identification
Stationarity tests (ADF, KPSS)
Forecasting with prediction intervals
Residual diagnostics (Ljung-Box, heteroskedasticity)
Granger causality testing
Impulse response functions (IRF)
Forecast error variance decomposition (FEVD)

When to use: Time-ordered data, forecasting, understanding temporal dynamics

Reference: See references/time_series.md for model selection, diagnostics, and forecasting methods.

5. Statistical Tests and Diagnostics

Extensive testing and diagnostic capabilities for model validation.

Residual diagnostics:

Autocorrelation tests (Ljung-Box, Durbin-Watson, Breusch-Godfrey)
Heteroskedasticity tests (Breusch-Pagan, White, ARCH)
Normality tests (Jarque-Bera, Omnibus, Anderson-Darling, Lilliefors)
Specification tests (RESET, Harvey-Collier)

Influence and outliers:

Leverage (hat values)
Cook's distance
DFFITS and DFBETAs
Studentized residuals
Influence plots

Hypothesis testing:

t-tests (one-sample, two-sample, paired)
Proportion tests
Chi-square tests
Non-parametric tests (Mann-Whitney, Wilcoxon, Kruskal-Wallis)
ANOVA (one-way, two-way, repeated measures)

Multiple comparisons:

Tukey's HSD
Bonferroni correction
False Discovery Rate (FDR)

Effect sizes and power:

Cohen's d, eta-squared
Power analysis for t-tests, proportions
Sample size calculations

Robust inference:

Heteroskedasticity-consistent SEs (HC0-HC3)
HAC standard errors (Newey-West)
Cluster-robust standard errors

When to use: Validating assumptions, detecting problems, ensuring robust inference

Reference: See references/stats_diagnostics.md for comprehensive testing and diagnostic procedures.
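Two of the items above, sample size calculation and multiple-comparison correction, can be sketched in a few lines. The effect size and the p-values below are made-up inputs for illustration.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests
from statsmodels.stats.power import TTestIndPower

# Sample size per group for a two-sample t-test
# (assumed inputs: Cohen's d = 0.5, alpha = 0.05, power = 0.80)
n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"Required n per group: {n_per_group:.1f}")

# Hypothetical raw p-values from five tests
p_values = np.array([0.001, 0.012, 0.025, 0.045, 0.2])

# Bonferroni controls the family-wise error rate
reject_bonf, p_bonf, _, _ = multipletests(p_values, alpha=0.05, method="bonferroni")

# Benjamini-Hochberg controls the false discovery rate (less conservative)
reject_fdr, p_fdr, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")

print("Bonferroni rejections:", reject_bonf)
print("FDR (BH) rejections:  ", reject_fdr)
```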

Formula API (R-style)

Statsmodels supports R-style formulas for intuitive model specification:

import statsmodels.formula.api as smf

# OLS with formula
results = smf.ols('y ~ x1 + x2 + x1:x2', data=df).fit()

# Categorical variables (automatic dummy coding)
results = smf.ols('y ~ x1 + C(category)', data=df).fit()

# Interactions
results = smf.ols('y ~ x1 * x2', data=df).fit()  # x1 + x2 + x1:x2

# Polynomial terms
results = smf.ols('y ~ x + I(x**2)', data=df).fit()

# Logit
results = smf.logit('y ~ x1 + x2 + C(group)', data=df).fit()

# Poisson
results = smf.poisson('count ~ x1 + x2', data=df).fit()

# ARIMA (not available via formula, use regular API)

Model Selection and Comparison

Information Criteria

# Compare models using AIC/BIC
models = {
    'Model 1': model1_results,
    'Model 2': model2_results,
    'Model 3': model3_results
}

comparison = pd.DataFrame({
    'AIC': {name: res.aic for name, res in models.items()},
    'BIC': {name: res.bic for name, res in models.items()},
    'Log-Likelihood': {name: res.llf for name, res in models.items()}
})

print(comparison.sort_values('AIC'))
# Lower AIC/BIC indicates better model

Likelihood Ratio Test (Nested Models)

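# For nested models (the reduced model is a special case of the full model)
# full_model and reduced_model are fitted results objects
from scipy import stats

lr_stat = 2 * (full_model.llf - reduced_model.llf)
df_diff = full_model.df_model - reduced_model.df_model  # number of restrictions
p_value = stats.chi2.sf(lr_stat, df_diff)  # survival function = 1 - cdf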

print(f"LR statistic: {lr_stat:.4f}")
print(f"p-value: {p_value:.4f}")

if p_value < 0.05:
    print("Full model significantly better")
else:
    print("Reduced model preferred (parsimony)")

Cross-Validation

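import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error

# X is a DataFrame that already includes the constant; y is a Series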

kf = KFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = []

for train_idx, val_idx in kf.split(X):
    X_train, X_val = X.iloc[train_idx], X.iloc[val_idx]
    y_train, y_val = y.iloc[train_idx], y.iloc[val_idx]

    # Fit model
    model = sm.OLS(y_train, X_train).fit()

    # Predict
    y_pred = model.predict(X_val)

    # Score
    rmse = np.sqrt(mean_squared_error(y_val, y_pred))
    cv_scores.append(rmse)

print(f"CV RMSE: {np.mean(cv_scores):.4f} ± {np.std(cv_scores):.4f}")

Best Practices

Data Preparation

1. Always add constant: Use sm.add_constant() unless excluding intercept
2. Check for missing values: Handle or impute before fitting
3. Scale if needed: Improves optimizer convergence and makes coefficients easier to compare
4. Encode categoricals: Use formula API or manual dummy coding

Model Building

1. Start simple: Begin with basic model, add complexity as needed
2. Check assumptions: Test residuals, heteroskedasticity, autocorrelation
3. Use appropriate model: Match model to outcome type (binary→Logit, count→Poisson)
4. Consider alternatives: If assumptions violated, use robust methods or different model

Inference

1. Report effect sizes: Not just p-values
2. Use robust SEs: When heteroskedasticity or clustering present
3. Multiple comparisons: Correct when testing many hypotheses
4. Confidence intervals: Always report alongside point estimates

Model Evaluation

1. Check residuals: Plot residuals vs fitted, Q-Q plot
2. Influence diagnostics: Identify and investigate influential observations
3. Out-of-sample validation: Test on holdout set or cross-validate
4. Compare models: Use AIC/BIC for non-nested, LR test for nested

Reporting

1. Comprehensive summary: Use .summary() for detailed output
2. Document decisions: Note transformations, excluded observations
3. Interpret carefully: Account for link functions (e.g., exp(β) for log link)
4. Visualize: Plot predictions, confidence intervals, diagnostics

Common Workflows

Workflow 1: Linear Regression Analysis

1. Explore data (plots, descriptives)
2. Fit initial OLS model
3. Check residual diagnostics
4. Test for heteroskedasticity, autocorrelation
5. Check for multicollinearity (VIF)
6. Identify influential observations
7. Refit with robust SEs if needed
8. Interpret coefficients and inference
9. Validate on holdout or via CV

Workflow 2: Binary Classification

1. Fit logistic regression (Logit)
2. Check for convergence issues
3. Interpret odds ratios
4. Calculate marginal effects
5. Evaluate classification performance (AUC, confusion matrix)
6. Check for influential observations
7. Compare with alternative models (Probit)
8. Validate predictions on test set

Workflow 3: Count Data Analysis

1. Fit Poisson regression
2. Check for overdispersion
3. If overdispersed, fit Negative Binomial
4. Check for excess zeros (consider ZIP/ZINB)
5. Interpret rate ratios
6. Assess goodness of fit
7. Compare models via AIC
8. Validate predictions

Workflow 4: Time Series Forecasting

1. Plot series, check for trend/seasonality
2. Test for stationarity (ADF, KPSS)
3. Difference if non-stationary
4. Identify p, q from ACF/PACF
5. Fit ARIMA or SARIMAX
6. Check residual diagnostics (Ljung-Box)
7. Generate forecasts with confidence intervals
8. Evaluate forecast accuracy on test set

Reference Documentation

This skill includes comprehensive reference files for detailed guidance:

references/linear_models.md

Detailed coverage of linear regression models including:

OLS, WLS, GLS, GLSAR, Quantile Regression
Mixed effects models
Recursive and rolling regression
Comprehensive diagnostics (heteroskedasticity, autocorrelation, multicollinearity)
Influence statistics and outlier detection
Robust standard errors (HC, HAC, cluster)
Hypothesis testing and model comparison

references/glm.md

Complete guide to generalized linear models:

All distribution families (Binomial, Poisson, Gamma, etc.)
Link functions and when to use each
Model fitting and interpretation
Pseudo R-squared and goodness of fit
Diagnostics and residual analysis
Applications (logistic, Poisson, Gamma regression)

references/discrete_choice.md

Comprehensive guide to discrete outcome models:

Binary models (Logit, Probit)
Multinomial models (MNLogit, Conditional Logit)
Count models (Poisson, Negative Binomial, Zero-Inflated, Hurdle)
Ordinal models
Marginal effects and interpretation
Model diagnostics and comparison

references/time_series.md

In-depth time series analysis guidance:

Univariate models (AR, ARIMA, SARIMAX, Exponential Smoothing)
Multivariate models (VAR, VARMAX, Dynamic Factor)
State space models
Stationarity testing and diagnostics
Forecasting methods and evaluation
Granger causality, IRF, FEVD

references/stats_diagnostics.md

Comprehensive statistical testing and diagnostics:

Residual diagnostics (autocorrelation, heteroskedasticity, normality)
Influence and outlier detection
Hypothesis tests (parametric and non-parametric)
ANOVA and post-hoc tests
Multiple comparisons correction
Robust covariance matrices
Power analysis and effect sizes

When to reference:

Need detailed parameter explanations
Choosing between similar models
Troubleshooting convergence or diagnostic issues
Understanding specific test statistics
Looking for code examples for advanced features

Search patterns:

# Find information about specific models
rg "Quantile Regression" references/

# Find diagnostic tests
rg "Breusch-Pagan" references/stats_diagnostics.md

# Find time series guidance
rg "SARIMAX" references/time_series.md

Common Pitfalls to Avoid

1. Forgetting constant term: Always use sm.add_constant() unless no intercept desired
2. Ignoring assumptions: Check residuals, heteroskedasticity, autocorrelation
3. Wrong model for outcome type: Binary→Logit/Probit, Count→Poisson/NB, not OLS
4. Not checking convergence: Look for optimization warnings
5. Misinterpreting coefficients: Remember link functions (log, logit, etc.)
6. Using Poisson with overdispersion: Check dispersion, use Negative Binomial if needed
7. Not using robust SEs: When heteroskedasticity or clustering present
8. Overfitting: Too many parameters relative to sample size
9. Data leakage: Fitting on test data or using future information
10. Not validating predictions: Always check out-of-sample performance
11. Comparing non-nested models: Use AIC/BIC, not LR test
12. Ignoring influential observations: Check Cook's distance and leverage
13. Multiple testing: Correct p-values when testing many hypotheses
14. Not differencing time series: Fitting ARIMA with d=0 on non-stationary data
15. Confusing prediction vs confidence intervals: Prediction intervals are wider

Getting Help

For detailed documentation and examples:

Official docs: https://www.statsmodels.org/stable/
User guide: https://www.statsmodels.org/stable/user-guide.html
Examples: https://www.statsmodels.org/stable/examples/index.html
API reference: https://www.statsmodels.org/stable/api.html

Discrete Choice Models Reference

This document provides comprehensive guidance on discrete choice models in statsmodels, including binary, multinomial, count, and ordinal models.

Overview

Discrete choice models handle outcomes that are:

Binary: 0/1, success/failure
Multinomial: Multiple unordered categories
Ordinal: Ordered categories
Count: Non-negative integers

All models are estimated by maximum likelihood and assume independent observations given the regressors.

Binary Models

Logit (Logistic Regression)

Uses logistic distribution for binary outcomes.

When to use:

Binary classification (yes/no, success/failure)
Probability estimation for binary outcomes
Interpretable odds ratios

Model: P(Y=1|X) = 1 / (1 + exp(-Xβ))

import statsmodels.api as sm
from statsmodels.discrete.discrete_model import Logit

# Prepare data
X = sm.add_constant(X_data)

# Fit model
model = Logit(y, X)
results = model.fit()

print(results.summary())

Interpretation:

import numpy as np

# Odds ratios
odds_ratios = np.exp(results.params)
print("Odds ratios:", odds_ratios)

# For 1-unit increase in X, odds multiply by exp(β)
# OR > 1: increases odds of success
# OR < 1: decreases odds of success
# OR = 1: no effect

# Confidence intervals for odds ratios
odds_ci = np.exp(results.conf_int())
print("Odds ratio 95% CI:")
print(odds_ci)

Marginal effects:

# Average marginal effects (AME): averaged over all observations (the default)
marginal_effects = results.get_margeff(at='overall')
print(marginal_effects.summary())

# Marginal effects at means (MEM): evaluated at the mean of each regressor
marginal_effects_mem = results.get_margeff(at='mean', method='dydx')

# Marginal effects at representative values
# atexog keys are zero-indexed column positions in X (column 0 is the constant)
marginal_effects_custom = results.get_margeff(at='mean',
                                              atexog={1: 1, 2: 5})

Predictions:

# Predicted probabilities
probs = results.predict(X)

# Binary predictions (0.5 threshold)
predictions = (probs > 0.5).astype(int)

# Custom threshold
threshold = 0.3
predictions_custom = (probs > threshold).astype(int)

# For new data
X_new = sm.add_constant(X_new_data)
new_probs = results.predict(X_new)

Model evaluation:

from sklearn.metrics import (classification_report, confusion_matrix,
                             roc_auc_score, roc_curve)

# Classification report
print(classification_report(y, predictions))

# Confusion matrix
print(confusion_matrix(y, predictions))

# AUC-ROC
auc = roc_auc_score(y, probs)
print(f"AUC: {auc:.4f}")

# Pseudo R-squared
print(f"McFadden's Pseudo R²: {results.prsquared:.4f}")

Probit

Uses normal distribution for binary outcomes.

When to use:

Binary outcomes
Prefer normal distribution assumption
Field convention (econometrics often uses probit)

Model: P(Y=1|X) = Φ(Xβ), where Φ is standard normal CDF

from statsmodels.discrete.discrete_model import Probit

model = Probit(y, X)
results = model.fit()

print(results.summary())

Comparison with Logit:

Probit and Logit usually give similar results
Probit: symmetric, based on normal distribution
Logit: slightly heavier tails, easier interpretation (odds ratios)
Coefficients not directly comparable (scale difference)

# Marginal effects are comparable
logit_me = logit_results.get_margeff().margeff
probit_me = probit_results.get_margeff().margeff

print("Logit marginal effects:", logit_me)
print("Probit marginal effects:", probit_me)

Multinomial Models

MNLogit (Multinomial Logit)

For unordered categorical outcomes with 3+ categories.

When to use:

Multiple unordered categories (e.g., transportation mode, brand choice)
No natural ordering among categories
Need probabilities for each category

Model: P(Y=j|X) = exp(Xβⱼ) / Σₖ exp(Xβₖ)

from statsmodels.discrete.discrete_model import MNLogit

# y should be integers 0, 1, 2, ... for categories
model = MNLogit(y, X)
results = model.fit()

print(results.summary())

Interpretation:

# One category is reference (usually category 0)
# Coefficients represent log-odds relative to reference

# For category j vs reference:
# exp(β_j) = odds ratio of category j vs reference

# Predicted probabilities for each category
probs = results.predict(X)  # Shape: (n_samples, n_categories)

# Most likely category (np.asarray handles the DataFrame returned for pandas input)
predicted_categories = np.asarray(probs).argmax(axis=1)
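The role of the reference category can be made concrete with plain NumPy. This sketch uses made-up coefficients and fixes the reference category's linear predictor at zero, which is the normalization described in the comments above.

```python
import numpy as np

# Three observations: constant plus one regressor (hypothetical values)
X = np.array([[1.0, 0.5],
              [1.0, -1.0],
              [1.0, 2.0]])

# Hypothetical coefficients: one column per non-reference category (1 and 2)
beta = np.array([[0.2, -0.4],
                 [0.8, 0.3]])

# Category 0 is the reference, so its linear predictor is fixed at 0
scores = np.column_stack([np.zeros(len(X)), X @ beta])

# Softmax, shifted by the row maximum for numerical stability
shifted = scores - scores.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
print(probs)

# Odds of category 1 vs the reference equal exp(X @ beta_1)
print(probs[:, 1] / probs[:, 0])
print(np.exp(X @ beta[:, 0]))
```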

Relative risk ratios:

# Exponentiate coefficients for relative risk ratios
import numpy as np

# For MNLogit, results.params is 2-D: one row per regressor and
# one column per non-reference category
rrr = np.exp(results.params)
print("Coefficients (log relative risk):")
print(results.params)
print("Relative risk ratios:")
print(rrr)

Conditional Logit

For choice models where alternatives have characteristics.

When to use:

Alternative-specific regressors (vary across choices)
Panel data with choices
Discrete choice experiments

from statsmodels.discrete.conditional_models import ConditionalLogit

# Data structure: long format with choice indicator
model = ConditionalLogit(y_choice, X_alternatives, groups=individual_id)
results = model.fit()

Count Models

Poisson

Standard model for count data.

When to use:

Count outcomes (events, occurrences)
Rare events
Mean ≈ variance

Model: P(Y=k|X) = exp(-λ) λᵏ / k!, where log(λ) = Xβ

from statsmodels.discrete.discrete_model import Poisson

model = Poisson(y_counts, X)
results = model.fit()

print(results.summary())

Interpretation:

# Rate ratios (incident rate ratios)
rate_ratios = np.exp(results.params)
print("Rate ratios:", rate_ratios)

# For 1-unit increase in X, expected count multiplies by exp(β)

Check overdispersion:

# Mean and variance should be similar for Poisson
print(f"Mean: {y_counts.mean():.2f}")
print(f"Variance: {y_counts.var():.2f}")

# Rough unconditional check: overdispersion if variance >> mean
# Rule of thumb: variance/mean > 1.5 suggests overdispersion
overdispersion_ratio = y_counts.var() / y_counts.mean()
print(f"Variance/Mean: {overdispersion_ratio:.2f}")

# Conditional check: Pearson chi-square per residual degree of freedom
mu = results.predict(X)
pearson_ratio = ((y_counts - mu) ** 2 / mu).sum() / results.df_resid
print(f"Pearson chi2 / df_resid: {pearson_ratio:.2f}")

if pearson_ratio > 1.5:
    print("Consider Negative Binomial model")

With offset (for rates):

# When modeling rates with varying exposure
# log(λ) = log(exposure) + Xβ

model = Poisson(y_counts, X, offset=np.log(exposure))
results = model.fit()

Negative Binomial

For overdispersed count data (variance > mean).

When to use:

Count data with overdispersion
Excess variance not explained by Poisson
Heterogeneity in counts

Model: Adds dispersion parameter α to account for overdispersion

from statsmodels.discrete.discrete_model import NegativeBinomial

model = NegativeBinomial(y_counts, X)
results = model.fit()

print(results.summary())
# alpha is the last estimated parameter (works for NumPy and pandas inputs)
alpha = np.asarray(results.params)[-1]
print(f"Dispersion parameter alpha: {alpha:.4f}")

Compare with Poisson:

# Fit both models
poisson_results = Poisson(y_counts, X).fit()
nb_results = NegativeBinomial(y_counts, X).fit()

# AIC comparison (lower is better)
print(f"Poisson AIC: {poisson_results.aic:.2f}")
print(f"Negative Binomial AIC: {nb_results.aic:.2f}")

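# Likelihood ratio test of Poisson (alpha = 0) against Negative Binomial
# alpha = 0 lies on the boundary of the parameter space, so the
# chi-square(1) p-value below is conservative
from scipy import stats
lr_stat = 2 * (nb_results.llf - poisson_results.llf)
lr_pval = stats.chi2.sf(lr_stat, df=1)  # 1 extra parameter (alpha)
print(f"LR test p-value: {lr_pval:.4f}")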

if lr_pval < 0.05:
    print("Negative Binomial significantly better")

Zero-Inflated Models

For count data with excess zeros.

When to use:

More zeros than expected from Poisson/NB
Two processes: one for zeros, one for counts
Examples: number of doctor visits, insurance claims

Models:

ZeroInflatedPoisson (ZIP)
ZeroInflatedNegativeBinomialP (ZINB)

from statsmodels.discrete.count_model import (ZeroInflatedPoisson,
                                               ZeroInflatedNegativeBinomialP)

# ZIP model
zip_model = ZeroInflatedPoisson(y_counts, X, exog_infl=X_inflation)
zip_results = zip_model.fit()

# ZINB model (for overdispersion + excess zeros)
zinb_model = ZeroInflatedNegativeBinomialP(y_counts, X, exog_infl=X_inflation)
zinb_results = zinb_model.fit()

print(zip_results.summary())

Two parts of the model:

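# 1. Inflation model: P(Y=0 due to inflation)
# 2. Count model: distribution of counts

# Predicted probabilities of inflation
# 'prob-main' is the probability of the main count process, so the
# probability of an excess zero is its complement
inflation_probs = 1 - zip_results.predict(X, exog_infl=X_inflation,
                                          which='prob-main')

# Predicted counts (overall mean, including the zero-inflation part)
predicted_counts = zip_results.predict(X, exog_infl=X_inflation, which='mean')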

Hurdle Models

Two-stage model: whether any counts, then how many.

When to use:

Excess zeros
Different processes for zero vs positive counts
Zeros structurally different from positive values

from statsmodels.discrete.truncated_model import HurdleCountModel

# Poisson-Poisson hurdle model; use `zerodist` for the zero hurdle process
model = HurdleCountModel(y_counts, X,
                         dist='poisson',
                         zerodist='poisson')
results = model.fit()

print(results.summary())

Ordinal Models

Ordered Logit/Probit

For ordered categorical outcomes.

When to use:

Ordered categories (e.g., low/medium/high, ratings 1-5)
Natural ordering matters
Want to respect ordinal structure

Model: Cumulative probability model with cutpoints
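For the logit case, the cumulative model can be written as P(Y ≤ j | X) = logistic(c_j − Xβ), with category probabilities obtained as differences of adjacent cumulative probabilities. The sketch below computes them directly; the function name and the numbers are hypothetical.

```python
import numpy as np
from scipy.special import expit  # logistic CDF


def ordered_logit_probs(xb, cutpoints):
    """Category probabilities for an ordered logit.

    xb        : linear predictor X @ beta, shape (n,)
    cutpoints : increasing thresholds c_1 < ... < c_{J-1}
    """
    # Pad with -inf and +inf so the first and last categories are covered
    cuts = np.concatenate(([-np.inf], cutpoints, [np.inf]))
    # Cumulative probabilities P(Y <= j) = logistic(c_j - xb)
    cdf = expit(cuts[None, :] - xb[:, None])
    # Differences of adjacent cumulative probabilities
    return np.diff(cdf, axis=1)


xb = np.array([-1.0, 0.0, 2.0])      # hypothetical linear predictors
cutpoints = np.array([-0.5, 1.0])    # two cutpoints -> three categories
probs = ordered_logit_probs(xb, cutpoints)
print(probs)
```

A larger linear predictor shifts probability mass toward the higher categories, which is why a single coefficient vector describes the whole ordering.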

from statsmodels.miscmodels.ordinal_model import OrderedModel

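# y should be ordered integers (0, 1, 2, ...) or an ordered pandas Categorical
# Do not add a constant: the cutpoints act as intercepts, and OrderedModel
# raises an error if the regressors contain a constant column
model = OrderedModel(y_ordered, X_data, distr='logit')  # or 'probit'
results = model.fit(method='bfgs')

print(results.summary())

Interpretation:

import numpy as np

n_cut = len(np.unique(y_ordered)) - 1  # number of cutpoints
params = np.asarray(results.params)

# Coefficients come first, threshold parameters last
coefficients = params[:-n_cut]
print("Coefficients:", coefficients)

# Threshold parameters are stored transformed (first cutpoint, then log
# increments); convert them back. The result is padded with -inf and +inf
cutpoints = results.model.transform_threshold_params(params)
print("Cutpoints:", cutpoints)

# Predicted probabilities for each category
probs = np.asarray(results.predict(X_data))  # Shape: (n_samples, n_categories)

# Most likely category
predicted_categories = probs.argmax(axis=1)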

Proportional odds assumption:

# Test if coefficients are same across cutpoints
# (Brant test - implement manually or check residuals)

# Check: model each cutpoint separately and compare coefficients

Model Diagnostics

Goodness of Fit

# Pseudo R-squared (McFadden)
print(f"Pseudo R²: {results.prsquared:.4f}")

# AIC/BIC for model comparison
print(f"AIC: {results.aic:.2f}")
print(f"BIC: {results.bic:.2f}")

# Log-likelihood
print(f"Log-likelihood: {results.llf:.2f}")

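# Likelihood ratio test vs null (intercept-only) model
# (discrete-model results also expose this as results.llr and results.llr_pvalue)
from scipy import stats
lr_stat = 2 * (results.llf - results.llnull)
lr_pval = stats.chi2.sf(lr_stat, results.df_model)  # sf is more accurate than 1 - cdf for tiny p-values
print(f"LR test p-value: {lr_pval:.4g}")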

Classification Metrics (Binary)

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Predictions
probs = results.predict(X)
predictions = (probs > 0.5).astype(int)

# Metrics
print(f"Accuracy: {accuracy_score(y, predictions):.4f}")
print(f"Precision: {precision_score(y, predictions):.4f}")
print(f"Recall: {recall_score(y, predictions):.4f}")
print(f"F1: {f1_score(y, predictions):.4f}")
print(f"AUC: {roc_auc_score(y, probs):.4f}")

Classification Metrics (Multinomial)

from sklearn.metrics import accuracy_score, classification_report, log_loss

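# Predicted categories
# (predict may return a DataFrame for pandas input, so convert before argmax)
probs = np.asarray(results.predict(X))  # Shape: (n_samples, n_categories)
predictions = probs.argmax(axis=1)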

# Accuracy
accuracy = accuracy_score(y, predictions)
print(f"Accuracy: {accuracy:.4f}")

# Classification report
print(classification_report(y, predictions))

# Log loss
logloss = log_loss(y, probs)
print(f"Log Loss: {logloss:.4f}")

Count Model Diagnostics

# Observed vs predicted frequencies
# Rounding predicted means hides the predicted spread, so instead sum the
# predicted probability of each count over all observations
observed = pd.Series(y_counts).value_counts().sort_index()
pred_probs = np.asarray(results.predict(X, which='prob'))  # (n_obs, max(y) + 1)
predicted_freq = pd.Series(pred_probs.sum(axis=0))

# Compare distributions on a shared index of count values
import matplotlib.pyplot as plt
freq = pd.DataFrame({'Observed': observed, 'Predicted': predicted_freq}).fillna(0)
ax = freq.plot(kind='bar', alpha=0.7)
ax.set_xlabel('Count')
ax.set_ylabel('Frequency')
plt.show()

# Rootogram (better visualization): compares sqrt(observed) with sqrt(expected) frequencies
# Custom rootogram implementation needed
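One possible shape for such a custom implementation is sketched below for a Poisson fit. It only computes the numbers behind a hanging rootogram; `rootogram_table` is a hypothetical helper, not a statsmodels function, and the demo data are made up.

```python
import numpy as np
from scipy import stats

def rootogram_table(y, mu, max_count=None):
    """Observed and expected frequencies for a Poisson fit.

    y: observed counts; mu: fitted means, one per observation.
    """
    y = np.asarray(y, dtype=int)
    mu = np.asarray(mu, dtype=float)
    if max_count is None:
        max_count = int(y.max())
    support = np.arange(max_count + 1)
    observed = np.bincount(y, minlength=max_count + 1)[: max_count + 1]
    # Expected frequency of count k = sum over observations of P(Y_i = k)
    expected = stats.poisson.pmf(support[None, :], mu[:, None]).sum(axis=0)
    # Hanging rootogram: bars of height sqrt(observed) hang from sqrt(expected),
    # so the bar bottom is the gap below; values far from 0 indicate misfit
    gap = np.sqrt(expected) - np.sqrt(observed)
    return support, observed, expected, gap

y_demo = np.array([0, 0, 1, 1, 1, 2, 3])
mu_demo = np.full(y_demo.shape, y_demo.mean())  # intercept-only Poisson fit
support, observed, expected, gap = rootogram_table(y_demo, mu_demo)
print(support, observed, np.round(expected, 2), np.round(gap, 2))
```

To draw it, plot `np.sqrt(expected)` as a line and bars from `gap` up to `np.sqrt(expected)`. The square-root scale keeps discrepancies in the sparse upper tail visible.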

Influence and Outliers

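# Pearson residuals for a Poisson model: (y - mu) / sqrt(mu), because Var(Y) = mu
# (for other models, use results.resid_pearson where available)
mu = results.predict(X)
std_resid = (y - mu) / np.sqrt(mu)

# Check for outliers (|std_resid| > 2)
outliers = np.where(np.abs(std_resid) > 2)[0]
print(f"Number of outliers: {len(outliers)}")

# Leverage (hat values) - for logit/probit
from statsmodels.stats.outliers_influence import MLEInfluence
leverage = MLEInfluence(results).hat_matrix_diag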

Hypothesis Testing

# Single parameter test (automatic in summary)

# Multiple parameters: Wald test
# Test H0: β₁ = β₂ = 0 (one row per restriction, one column per parameter;
# this R assumes a model with 4 parameters including the constant)
R = [[0, 1, 0, 0], [0, 0, 1, 0]]
wald_test = results.wald_test(R, scalar=True)
print(wald_test)

# Likelihood ratio test for nested models
from scipy import stats

model_reduced = Logit(y, X_reduced).fit()
model_full = Logit(y, X_full).fit()

lr_stat = 2 * (model_full.llf - model_reduced.llf)
df_diff = model_full.df_model - model_reduced.df_model  # number of restrictions
lr_pval = stats.chi2.sf(lr_stat, df_diff)
print(f"LR test p-value: {lr_pval:.4f}")

Model Selection and Comparison

# Fit multiple models
models = {
    'Logit': Logit(y, X).fit(),
    'Probit': Probit(y, X).fit(),
    # Add more models
}

# Compare AIC/BIC
comparison = pd.DataFrame({
    'AIC': {name: model.aic for name, model in models.items()},
    'BIC': {name: model.bic for name, model in models.items()},
    'Pseudo R²': {name: model.prsquared for name, model in models.items()}
})
print(comparison.sort_values('AIC'))

# Cross-validation for predictive performance
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

# Use sklearn wrapper or manual CV

Formula API

Use R-style formulas for easier specification.

import statsmodels.formula.api as smf

# Logit with formula
formula = 'y ~ x1 + x2 + C(category) + x1:x2'
results = smf.logit(formula, data=df).fit()

# MNLogit with formula
results = smf.mnlogit(formula, data=df).fit()

# Poisson with formula
results = smf.poisson(formula, data=df).fit()

# Negative Binomial with formula
results = smf.negativebinomial(formula, data=df).fit()

Common Applications

Binary Classification (Marketing Response)

# Predict customer purchase probability
X = sm.add_constant(customer_features)
model = Logit(purchased, X)
results = model.fit()

# Targeting: select top 20% likely to purchase
probs = results.predict(X)
top_20_pct_idx = np.argsort(probs)[-int(0.2*len(probs)):]

Multinomial Choice (Transportation Mode)

# Predict transportation mode choice
model = MNLogit(mode_choice, X)
results = model.fit()

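# Predicted mode for new commuter
# has_constant='add' is needed for a single row, where every column looks constant
# and add_constant would otherwise skip adding the intercept
new_commuter = sm.add_constant(new_features, has_constant='add')
mode_probs = np.asarray(results.predict(new_commuter))
predicted_mode = mode_probs.argmax(axis=1)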

Count Data (Number of Doctor Visits)

# Model healthcare utilization
model = NegativeBinomial(num_visits, X)
results = model.fit()

# Expected visits for new patient
expected_visits = results.predict(new_patient_X)

Zero-Inflated (Insurance Claims)

# Many people have zero claims
# Zero-inflation: some never claim
# Count process: those who might claim

zip_model = ZeroInflatedPoisson(claims, X_count, exog_infl=X_inflation)
results = zip_model.fit()

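# P(never file claim) = probability of the zero-inflation ("always zero") process
# which='prob-main' is the probability of the count process, so take the complement
# (which='prob-zero' is P(claims = 0), which also includes zeros from the count process)
never_claim_prob = 1 - results.predict(X_count, exog_infl=X_inflation, which='prob-main')

# Expected claims
expected_claims = results.predict(X_count, exog_infl=X_inflation, which='mean')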

Best Practices

1. Check data type: Ensure response matches model (binary, counts, categories)
2. Add constant: Always use sm.add_constant() unless no intercept desired
3. Scale continuous predictors: For better convergence and interpretation
4. Check convergence: Look for convergence warnings
5. Use formula API: For categorical variables and interactions
6. Marginal effects: Report marginal effects, not just coefficients
7. Model comparison: Use AIC/BIC and cross-validation
8. Validate: Holdout set or cross-validation for predictive models
9. Check overdispersion: For count models, test Poisson assumption
10. Consider alternatives: Zero-inflation, hurdle models for excess zeros

Common Pitfalls

1. Forgetting constant: No intercept term
2. Perfect separation: Logit/probit may not converge
3. Using Poisson with overdispersion: Check and use Negative Binomial
4. Misinterpreting coefficients: Remember they're on log-odds/log scale
5. Not checking convergence: Optimization may fail silently
6. Wrong distribution: Match model to data type (binary/count/categorical)
7. Ignoring excess zeros: Use ZIP/ZINB when appropriate
8. Not validating predictions: Always check out-of-sample performance
9. Comparing non-nested models: Use AIC/BIC, not likelihood ratio test
10. Ordinal as nominal: Use OrderedModel for ordered categories

Generalized Linear Models (GLM) Reference

This document provides comprehensive guidance on generalized linear models in statsmodels, including families, link functions, and applications.

Overview

GLMs extend linear regression to non-normal response distributions through three components:

1. Distribution family: Specifies the conditional distribution of the response
2. Link function: Maps the mean of the response to the scale of the linear predictor (its inverse maps the linear predictor back to the mean)
3. Variance function: Relates variance to the mean

General form: g(μ) = Xβ, where g is the link function and μ = E(Y|X)
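To make the roles of link and variance function concrete, here is a minimal NumPy sketch of iteratively reweighted least squares (IRLS) for a Poisson GLM with log link. It is an illustration on simulated data, not the statsmodels implementation; `irls_poisson` and all variable names are hypothetical.

```python
import numpy as np

def irls_poisson(X, y, n_iter=25, tol=1e-10):
    """Fit a Poisson GLM with log link by IRLS (illustrative sketch)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta          # linear predictor
        mu = np.exp(eta)        # inverse link: mu = g^{-1}(eta)
        # Working response and weights; for the log link g'(mu) = 1/mu
        # and the Poisson variance function is V(mu) = mu
        z = eta + (y - mu) / mu
        w = mu
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

rng = np.random.default_rng(1)
x = rng.normal(size=500)
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(0.5 + 0.3 * x))

beta_hat = irls_poisson(X, y)
score = X.T @ (y - np.exp(X @ beta_hat))  # should be ~0 at the MLE
print(beta_hat, score)
```

Each iteration is a weighted least-squares fit of the working response on X, which is why a GLM fit returns the same kind of coefficient table as a linear regression.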

When to Use GLM

Binary outcomes: Logistic regression (Binomial family with logit link)
Count data: Poisson or Negative Binomial regression
Positive continuous data: Gamma or Inverse Gaussian
Non-normal distributions: When OLS assumptions violated
Link functions: Need non-linear relationship between predictors and response scale

Distribution Families

Binomial Family

For binary outcomes (0/1) or proportions (k/n).

When to use:

Binary classification
Success/failure outcomes
Proportions or rates

Common links:

Logit (default): log(μ/(1-μ))
Probit: Φ⁻¹(μ)
Log: log(μ)

import statsmodels.api as sm
import statsmodels.formula.api as smf

# Binary logistic regression
model = sm.GLM(y, X, family=sm.families.Binomial())
results = model.fit()

# Formula API
results = smf.glm('success ~ x1 + x2', data=df,
                  family=sm.families.Binomial()).fit()

# Access predictions (probabilities)
probs = results.predict(X_new)

# Classification (0.5 threshold)
predictions = (probs > 0.5).astype(int)

Interpretation:

import numpy as np

# Odds ratios (for logit link)
odds_ratios = np.exp(results.params)
print("Odds ratios:", odds_ratios)

# For 1-unit increase in x, odds multiply by exp(beta)
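A short worked example may help with the odds-ratio reading. The coefficient and baseline probability below are hypothetical numbers, not output from any fitted model.

```python
import math

beta = 0.7                      # hypothetical logit coefficient
odds_ratio = math.exp(beta)     # about 2.01: odds roughly double per unit of x

p0 = 0.20                       # hypothetical baseline probability
odds0 = p0 / (1 - p0)           # 0.25
odds1 = odds0 * odds_ratio      # odds after a 1-unit increase in x
p1 = odds1 / (1 + odds1)        # back on the probability scale
print(round(odds_ratio, 3), round(p1, 3))
```

The odds double, but the probability moves from 0.20 to about 0.33 rather than to 0.40. The change in probability depends on the baseline, which is one reason to report marginal effects alongside odds ratios.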

Poisson Family

For count data (non-negative integers).

When to use:

Count outcomes (number of events)
Rare events
Rate modeling (with offset)

Common links:

Log (default): log(μ)
Identity: μ
Sqrt: √μ

# Poisson regression
model = sm.GLM(y, X, family=sm.families.Poisson())
results = model.fit()

import numpy as np

# With exposure/offset for rates
# If modeling rate = counts/exposure, log(exposure) enters with its coefficient fixed at 1
model = sm.GLM(y, X, family=sm.families.Poisson(),
               offset=np.log(exposure))
results = model.fit()

# Interpretation: exp(beta) = multiplicative effect on expected count
rate_ratios = np.exp(results.params)
print("Rate ratios:", rate_ratios)

Overdispersion check:

# Deviance / df should be ~1 for Poisson
overdispersion = results.deviance / results.df_resid
print(f"Overdispersion: {overdispersion}")

# If >> 1, consider Negative Binomial
if overdispersion > 1.5:
    print("Consider Negative Binomial model for overdispersion")

Negative Binomial Family

For overdispersed count data.

When to use:

Count data with variance > mean
Excess zeros or large variance
Poisson model shows overdispersion

# Negative Binomial GLM with fixed alpha
model = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=1.0))
results = model.fit()

# Use the discrete count model when alpha should be estimated
from statsmodels.discrete.discrete_model import NegativeBinomial
nb_model = NegativeBinomial(y, X)
nb_results = nb_model.fit()

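# alpha is the last parameter; np.asarray avoids label-based lookup when params is a pandas Series
print(f"Dispersion parameter alpha: {np.asarray(nb_results.params)[-1]}")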

Gaussian Family

Equivalent to OLS but fit via IRLS (Iteratively Reweighted Least Squares).

When to use:

Want GLM framework for consistency
Need robust standard errors
Comparing with other GLMs

Common links:

Identity (default): μ
Log: log(μ)
Inverse: 1/μ

# Gaussian GLM (equivalent to OLS)
model = sm.GLM(y, X, family=sm.families.Gaussian())
results = model.fit()

# Verify equivalence with OLS
ols_results = sm.OLS(y, X).fit()
print("Parameters close:", np.allclose(results.params, ols_results.params))

Gamma Family

For positive continuous data, often right-skewed.

When to use:

Positive outcomes (insurance claims, survival times)
Right-skewed distributions
Variance proportional to mean²

Common links:

Inverse (default): 1/μ
Log: log(μ)
Identity: μ

# Gamma regression (common for cost data)
model = sm.GLM(y, X, family=sm.families.Gamma())
results = model.fit()

# Log link often preferred for interpretation
model = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Log()))
results = model.fit()

# With log link, exp(beta) = multiplicative effect
import numpy as np
effects = np.exp(results.params)

Inverse Gaussian Family

For positive continuous data with specific variance structure.

When to use:

Positive skewed outcomes
Variance proportional to mean³
Alternative to Gamma

Common links:

Inverse squared (default): 1/μ²
Log: log(μ)

model = sm.GLM(y, X, family=sm.families.InverseGaussian())
results = model.fit()

Tweedie Family

Flexible family covering multiple distributions.

When to use:

Insurance claims (mixture of zeros and continuous)
Semi-continuous data
Need flexible variance function

Special cases (power parameter p):

p=0: Normal
p=1: Poisson
p=2: Gamma
p=3: Inverse Gaussian
1<p<2: Compound Poisson-Gamma (common for insurance)

# Tweedie with power=1.5
model = sm.GLM(y, X, family=sm.families.Tweedie(link=sm.families.links.Log(),
                                                 var_power=1.5))
results = model.fit()

Link Functions

Link functions connect the linear predictor to the mean of the response.

Available Links

from statsmodels.genmod import families

# Identity: g(μ) = μ
link = families.links.Identity()

# Log: g(μ) = log(μ)
link = families.links.Log()

# Logit: g(μ) = log(μ/(1-μ))
link = families.links.Logit()

# Probit: g(μ) = Φ⁻¹(μ)
link = families.links.Probit()

# Complementary log-log: g(μ) = log(-log(1-μ))
link = families.links.CLogLog()

# Inverse: g(μ) = 1/μ
link = families.links.InversePower()

# Inverse squared: g(μ) = 1/μ²
link = families.links.InverseSquared()

# Square root: g(μ) = √μ
link = families.links.Sqrt()

# Power: g(μ) = μ^p
link = families.links.Power(power=2)

Choosing Link Functions

Canonical links (default for each family):

Binomial → Logit
Poisson → Log
Gamma → Inverse
Gaussian → Identity
Inverse Gaussian → Inverse squared

When to use non-canonical:

Log link with Binomial: Risk ratios instead of odds ratios
Identity link: Direct additive effects (when sensible)
Probit vs Logit: Similar results, preference based on field
CLogLog: Asymmetric relationship, common in survival analysis

# Example: Risk ratios with log-binomial model
model = sm.GLM(y, X, family=sm.families.Binomial(link=sm.families.links.Log()))
results = model.fit()

# exp(beta) now gives risk ratios, not odds ratios
risk_ratios = np.exp(results.params)

Model Fitting and Results

Basic Workflow

import statsmodels.api as sm

# Add constant
X = sm.add_constant(X_data)

# Specify family and link
family = sm.families.Poisson(link=sm.families.links.Log())

# Fit model using IRLS
model = sm.GLM(y, X, family=family)
results = model.fit()

# Summary
print(results.summary())

Results Attributes

# Parameters and inference
results.params              # Coefficients
results.bse                 # Standard errors
results.tvalues            # Z-statistics
results.pvalues            # P-values
results.conf_int()         # Confidence intervals

# Predictions
results.fittedvalues       # Fitted values (μ)
results.predict(X_new)     # Predictions for new data

# Model fit statistics
results.aic                # Akaike Information Criterion
results.bic                # Bayesian Information Criterion
results.deviance           # Deviance
results.null_deviance      # Null model deviance
results.pearson_chi2       # Pearson chi-squared statistic
results.df_resid           # Residual degrees of freedom
results.llf                # Log-likelihood

# Residuals
results.resid_response     # Response residuals (y - μ)
results.resid_pearson      # Pearson residuals
results.resid_deviance     # Deviance residuals
results.resid_anscombe     # Anscombe residuals
results.resid_working      # Working residuals

Pseudo R-squared

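# Deviance-based pseudo R-squared
# (equals McFadden's 1 - llf/llnull for ungrouped binary data; otherwise the two can differ.
#  statsmodels also provides results.pseudo_rsquared(kind='mcf') and kind='cs')
pseudo_r2 = 1 - (results.deviance / results.null_deviance)
print(f"Pseudo R²: {pseudo_r2:.4f}")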

# Adjusted pseudo R-squared
n = len(y)
k = len(results.params)
adj_pseudo_r2 = 1 - ((n-1)/(n-k)) * (results.deviance / results.null_deviance)
print(f"Adjusted Pseudo R²: {adj_pseudo_r2:.4f}")

Diagnostics

Goodness of Fit

# Deviance is approximately χ² with df_resid degrees of freedom for grouped or count data
# with reasonably large expected values; the approximation is unreliable for ungrouped 0/1 outcomes
from scipy import stats

deviance_pval = stats.chi2.sf(results.deviance, results.df_resid)
print(f"Deviance test p-value: {deviance_pval}")

# Pearson chi-squared test
pearson_pval = stats.chi2.sf(results.pearson_chi2, results.df_resid)
print(f"Pearson chi² test p-value: {pearson_pval}")

# Check for overdispersion/underdispersion
dispersion = results.pearson_chi2 / results.df_resid
print(f"Dispersion: {dispersion}")
# Should be ~1; >1 suggests overdispersion, <1 underdispersion

Residual Analysis

import matplotlib.pyplot as plt

# Deviance residuals vs fitted
plt.figure(figsize=(10, 6))
plt.scatter(results.fittedvalues, results.resid_deviance, alpha=0.5)
plt.xlabel('Fitted values')
plt.ylabel('Deviance residuals')
plt.axhline(y=0, color='r', linestyle='--')
plt.title('Deviance Residuals vs Fitted')
plt.show()

# Q-Q plot of deviance residuals
from statsmodels.graphics.gofplots import qqplot
qqplot(results.resid_deviance, line='s')
plt.title('Q-Q Plot of Deviance Residuals')
plt.show()

# For binary outcomes: binned residual plot
if isinstance(results.model.family, sm.families.Binomial):
    # Group observations by fitted probability and plot the average residual per group
    # (custom implementation needed)
    pass
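The binned residual computation left open above could look like the following NumPy sketch. `binned_residuals` is a hypothetical helper, the ±2 standard-error band is an approximation under the fitted model, and the demo uses simulated data where the "fitted" probabilities are the true ones.

```python
import numpy as np

def binned_residuals(y, probs, n_bins=10):
    """Average fitted probability and average residual within quantile bins."""
    y = np.asarray(y, dtype=float)
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)
    bins = np.array_split(order, n_bins)  # roughly equal-sized groups
    mean_fitted = np.array([probs[idx].mean() for idx in bins])
    mean_resid = np.array([(y[idx] - probs[idx]).mean() for idx in bins])
    # Approximate +/- 2 standard errors of the mean residual in each bin
    band = np.array([2 * np.sqrt((probs[idx] * (1 - probs[idx])).sum()) / len(idx)
                     for idx in bins])
    return mean_fitted, mean_resid, band

rng = np.random.default_rng(2)
p_true = rng.uniform(0.05, 0.95, size=2000)
y_sim = rng.binomial(1, p_true)
mean_fitted, mean_resid, band = binned_residuals(y_sim, p_true)
print(np.round(mean_resid, 3), np.round(band, 3))
```

Plot `mean_resid` against `mean_fitted` with `±band` as reference lines. For a well-specified model most points should fall inside the band with no visible trend.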

Influence and Outliers

from statsmodels.stats.outliers_influence import GLMInfluence

influence = GLMInfluence(results)

# Leverage
leverage = influence.hat_matrix_diag

# Cook's distance
cooks_d = influence.cooks_distance[0]

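# DFFITS-type measure: scaled change in fitted values when an observation is dropped
# (GLMInfluence does not expose the OLS-style dffits attribute)
dffits = influence.d_fittedvalues_scaled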

# Find influential observations
influential = np.where(cooks_d > 4/len(y))[0]
print(f"Influential observations: {influential}")

Hypothesis Testing

# Wald test for single parameter (automatically in summary)

# Likelihood ratio test for nested models
# Fit reduced model
model_reduced = sm.GLM(y, X_reduced, family=family).fit()
model_full = sm.GLM(y, X_full, family=family).fit()

# LR statistic
from scipy import stats

lr_stat = 2 * (model_full.llf - model_reduced.llf)
df_diff = model_full.df_model - model_reduced.df_model  # number of restrictions
lr_pval = stats.chi2.sf(lr_stat, df_diff)
print(f"LR test p-value: {lr_pval}")

# Wald test for multiple parameters
# Test beta_1 = beta_2 = 0 (this R assumes a model with 4 parameters including the constant)
R = [[0, 1, 0, 0], [0, 0, 1, 0]]
wald_test = results.wald_test(R, scalar=True)
print(wald_test)

Robust Standard Errors

# Heteroscedasticity-robust (sandwich estimator)
# For GLM, request the robust covariance when fitting
results_robust = model.fit(cov_type='HC0')

# Cluster-robust
results_cluster = model.fit(cov_type='cluster',
                            cov_kwds={'groups': cluster_ids})

# Compare standard errors
print("Regular SE:", results.bse)
print("Robust SE:", results_robust.bse)

Model Comparison

# AIC/BIC for non-nested models
models = [model1_results, model2_results, model3_results]
for i, res in enumerate(models, 1):
    print(f"Model {i}: AIC={res.aic:.2f}, BIC={res.bic:.2f}")

# Likelihood ratio test for nested models (as shown above)

# Cross-validation for predictive performance (binary outcome scored with log loss)
from sklearn.model_selection import KFold
from sklearn.metrics import log_loss

X_arr, y_arr = np.asarray(X), np.asarray(y)  # positional indexing below needs arrays
kf = KFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = []

for train_idx, val_idx in kf.split(X_arr):
    X_train, X_val = X_arr[train_idx], X_arr[val_idx]
    y_train, y_val = y_arr[train_idx], y_arr[val_idx]

    model_cv = sm.GLM(y_train, X_train, family=family).fit()
    pred_probs = model_cv.predict(X_val)

    # labels guards against a fold that happens to contain a single class
    score = log_loss(y_val, pred_probs, labels=[0, 1])
    cv_scores.append(score)

print(f"CV Log Loss: {np.mean(cv_scores):.4f} ± {np.std(cv_scores):.4f}")

Prediction

# Point predictions
predictions = results.predict(X_new)

# For classification: get probabilities and convert
if isinstance(family, sm.families.Binomial):
    probs = predictions
    class_predictions = (probs > 0.5).astype(int)

# For counts: predictions are expected counts
if isinstance(family, sm.families.Poisson):
    expected_counts = predictions

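# Bootstrap percentile intervals for the predicted mean
# (these reflect parameter uncertainty only: they are confidence intervals for the mean,
#  not prediction intervals for new observations)
n_boot = 1000
X_arr, y_arr = np.asarray(X), np.asarray(y)  # positional indexing below needs arrays
rng = np.random.default_rng(0)
boot_preds = np.zeros((n_boot, len(X_new)))

for i in range(n_boot):
    # Bootstrap resample of rows, with replacement
    boot_idx = rng.integers(0, len(y_arr), size=len(y_arr))
    X_boot, y_boot = X_arr[boot_idx], y_arr[boot_idx]

    # Fit and predict
    boot_model = sm.GLM(y_boot, X_boot, family=family).fit()
    boot_preds[i] = boot_model.predict(X_new)

# 95% percentile confidence intervals for the predicted mean
pred_lower = np.percentile(boot_preds, 2.5, axis=0)
pred_upper = np.percentile(boot_preds, 97.5, axis=0)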

Common Applications

Logistic Regression (Binary Classification)

import statsmodels.api as sm

# Fit logistic regression
X = sm.add_constant(X_data)
model = sm.GLM(y, X, family=sm.families.Binomial())
results = model.fit()

# Odds ratios
odds_ratios = np.exp(results.params)
odds_ci = np.exp(results.conf_int())

# Classification metrics
from sklearn.metrics import classification_report, roc_auc_score

probs = results.predict(X)
predictions = (probs > 0.5).astype(int)

print(classification_report(y, predictions))
print(f"AUC: {roc_auc_score(y, probs):.4f}")

# ROC curve
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

fpr, tpr, thresholds = roc_curve(y, probs)
plt.plot(fpr, tpr)
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.show()

Poisson Regression (Count Data)

# Fit Poisson model
X = sm.add_constant(X_data)
model = sm.GLM(y_counts, X, family=sm.families.Poisson())
results = model.fit()

# Rate ratios
rate_ratios = np.exp(results.params)
print("Rate ratios:", rate_ratios)

# Check overdispersion
dispersion = results.pearson_chi2 / results.df_resid
if dispersion > 1.5:
    print(f"Overdispersion detected ({dispersion:.2f}). Consider Negative Binomial.")

Gamma Regression (Cost/Duration Data)

# Fit Gamma model with log link
X = sm.add_constant(X_data)
model = sm.GLM(y_cost, X,
               family=sm.families.Gamma(link=sm.families.links.Log()))
results = model.fit()

# Multiplicative effects
effects = np.exp(results.params)
print("Multiplicative effects on mean:", effects)

Best Practices

1. Check distribution assumptions: Plot histograms and Q-Q plots of response
2. Verify link function: Use canonical links unless there's a reason not to
3. Examine residuals: Deviance residuals should be approximately normal
4. Test for overdispersion: Especially for Poisson models
5. Use offsets appropriately: For rate modeling with varying exposure
6. Consider robust SEs: When variance assumptions questionable
7. Compare models: Use AIC/BIC for non-nested, LR test for nested
8. Interpret on original scale: Transform coefficients (e.g., exp for log link)
9. Check influential observations: Use Cook's distance
10. Validate predictions: Use cross-validation or holdout set

Common Pitfalls

1. Forgetting to add constant: No intercept term
2. Using wrong family: Check distribution of response
3. Ignoring overdispersion: Use Negative Binomial instead of Poisson
4. Misinterpreting coefficients: Remember link function transformation
5. Not checking convergence: IRLS may not converge; check warnings
6. Complete separation in logistic: Some categories perfectly predict outcome
7. Using identity link with bounded outcomes: May predict outside valid range
8. Comparing models with different samples: Use same observations
9. Forgetting offset in rate models: Must use log(exposure) as offset
10. Not considering alternatives: Mixed models, zero-inflation for complex data

Linear Regression Models Reference

This document provides detailed guidance on linear regression models in statsmodels, including OLS, GLS, WLS, quantile regression, and specialized variants.

Core Model Classes

OLS (Ordinary Least Squares)

Assumes independent, identically distributed errors (Σ=I). Best for standard regression with homoscedastic errors.

When to use:

Standard regression analysis
Errors are independent and have constant variance
No autocorrelation or heteroscedasticity
Most common starting point

Basic usage:

import statsmodels.api as sm
import numpy as np

# Prepare data - ALWAYS add constant for intercept
X = sm.add_constant(X_data)  # Adds column of 1s for intercept

# Fit model
model = sm.OLS(y, X)
results = model.fit()

# View results
print(results.summary())

Key results attributes:

results.params           # Coefficients
results.bse              # Standard errors
results.tvalues          # T-statistics
results.pvalues          # P-values
results.rsquared         # R-squared
results.rsquared_adj     # Adjusted R-squared
results.fittedvalues     # Fitted values (predictions on training data)
results.resid            # Residuals
results.conf_int()       # Confidence intervals for parameters

Prediction with confidence/prediction intervals:

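# For in-sample predictions
pred = results.get_prediction(X)
pred_summary = pred.summary_frame()
print(pred_summary)  # Columns: mean, mean_se, mean_ci_lower/upper, obs_ci_lower/upper

# For out-of-sample predictions
# has_constant='add' forces the intercept column even when a column of X_new_data
# is constant (for example, when predicting for a single new row)
X_new = sm.add_constant(X_new_data, has_constant='add')
pred_new = results.get_prediction(X_new)
pred_summary = pred_new.summary_frame()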

# Access intervals
mean_ci_lower = pred_summary["mean_ci_lower"]
mean_ci_upper = pred_summary["mean_ci_upper"]
obs_ci_lower = pred_summary["obs_ci_lower"]  # Prediction intervals
obs_ci_upper = pred_summary["obs_ci_upper"]

Formula API (R-style):

import statsmodels.formula.api as smf

# Automatic handling of categorical variables and interactions
formula = 'y ~ x1 + x2 + C(category) + x1:x2'
results = smf.ols(formula, data=df).fit()

WLS (Weighted Least Squares)

Handles heteroscedastic errors (diagonal Σ) where variance differs across observations.

When to use:

Known heteroscedasticity (non-constant error variance)
Different observations have different reliability
Weights are known or can be estimated

Usage:

# If you know the weights (inverse variance)
weights = 1 / error_variance
model = sm.WLS(y, X, weights=weights)
results = model.fit()

# Common weight patterns:
# - 1/variance: when variance is known
# - n_i: sample size for grouped data
# - 1/x: when variance proportional to x
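The effect of the weights can be seen by computing WLS by hand. The sketch below uses NumPy only, on simulated data whose error variance is proportional to x; all names and numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(1, 10, size=n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 0.5 * x + rng.normal(scale=np.sqrt(x))  # Var(error) proportional to x
w = 1.0 / x                                       # weights = inverse variance (up to scale)

# Closed form: (X' W X)^{-1} X' W y
beta_wls = np.linalg.solve((X.T * w) @ X, (X.T * w) @ y)

# Same estimate from OLS after scaling each row by sqrt(w)
sw = np.sqrt(w)
beta_scaled, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
print(beta_wls, beta_scaled)
```

WLS is OLS on data rescaled so that every row has the same error variance. Only the relative size of the weights matters for the coefficients.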

Feasible WLS (estimating weights):

# Step 1: Fit OLS
ols_results = sm.OLS(y, X).fit()

# Step 2: Model squared residuals to estimate variance
abs_resid = np.abs(ols_results.resid)
variance_model = sm.OLS(np.log(abs_resid**2), X).fit()

# Step 3: Use estimated variance as weights
weights = 1 / np.exp(variance_model.fittedvalues)
wls_results = sm.WLS(y, X, weights=weights).fit()

GLS (Generalized Least Squares)

Handles arbitrary covariance structure (Σ). Superclass for other regression methods.

When to use:

Known covariance structure
Correlated errors
More general than WLS

Usage:

# Specify covariance structure
# Sigma should be (n x n) covariance matrix
model = sm.GLS(y, X, sigma=Sigma)
results = model.fit()
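As an illustration of what an (n x n) Sigma can look like, the sketch below builds an AR(1) correlation matrix and computes the GLS estimate two ways in NumPy. The value of `rho`, the simulated data, and all names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
n, rho = 100, 0.6
idx = np.arange(n)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :])  # AR(1): corr = rho^|i - j|

X = np.column_stack([np.ones(n), rng.normal(size=n)])
L = np.linalg.cholesky(Sigma)                        # Sigma = L L'
y = X @ np.array([1.0, 2.0]) + L @ rng.normal(size=n)

# Direct formula: (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y
Sigma_inv = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)

# Whitening: premultiply by L^{-1}, then run OLS on the transformed data
X_w = np.linalg.solve(L, X)
y_w = np.linalg.solve(L, y)
beta_whitened, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
print(beta_gls, beta_whitened)
```

Both routes give the same coefficients. The whitening view explains why WLS is the special case where Sigma is diagonal.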

GLSAR (GLS with Autoregressive Errors)

Feasible generalized least squares with AR(p) errors for time series data.

When to use:

Time series regression with autocorrelated errors
Need to account for serial correlation
Violations of error independence

Usage:

# AR(1) errors
model = sm.GLSAR(y, X, rho=1)  # rho=1 for AR(1), rho=2 for AR(2), etc.
results = model.iterative_fit()  # Iteratively estimates AR parameters

print(results.summary())
print(f"Estimated rho: {results.model.rho}")

RLS (Recursive Least Squares)

Sequential parameter estimation, useful for adaptive or online learning.

When to use:

Parameters change over time
Online/streaming data
Want to see parameter evolution

Usage:

from statsmodels.regression.recursive_ls import RecursiveLS

model = RecursiveLS(y, X)
results = model.fit()

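# Access time-varying parameters
# recursive_coefficients is a container; .filtered holds the estimates, shape (n_params, n_obs)
params_over_time = results.recursive_coefficients.filtered
cusum = results.cusum  # CUSUM statistic for structural breaks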

Rolling Regressions

Compute estimates across moving windows for time-varying parameter detection.

When to use:

Parameters vary over time
Want to detect structural changes
Time series with evolving relationships

Usage:

from statsmodels.regression.rolling import RollingOLS, RollingWLS

# Rolling OLS with 60-period window
rolling_model = RollingOLS(y, X, window=60)
rolling_results = rolling_model.fit()

# Extract time-varying parameters
rolling_params = rolling_results.params  # DataFrame with parameters over time
rolling_rsquared = rolling_results.rsquared

# Plot parameter evolution
import matplotlib.pyplot as plt
rolling_params.plot()
plt.title('Time-Varying Coefficients')
plt.show()

Quantile Regression

Analyzes conditional quantiles rather than conditional mean.

When to use:

Interest in quantiles (median, 90th percentile, etc.)
Robust to outliers (median regression)
Distributional effects across quantiles
Heterogeneous effects

Usage:

from statsmodels.regression.quantile_regression import QuantReg

# Median regression (50th percentile)
model = QuantReg(y, X)
results_median = model.fit(q=0.5)

# Multiple quantiles
quantiles = [0.1, 0.25, 0.5, 0.75, 0.9]
results_dict = {}
for q in quantiles:
    results_dict[q] = model.fit(q=q)

# Plot quantile-varying effects
import pandas as pd
import matplotlib.pyplot as plt
coef_dict = {q: res.params for q, res in results_dict.items()}
coef_df = pd.DataFrame(coef_dict).T  # rows: quantiles, columns: coefficients
coef_df.plot()
plt.xlabel('Quantile')
plt.ylabel('Coefficient')
plt.show()
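Quantile regression replaces the squared-error loss of OLS with the check (pinball) loss. The NumPy sketch below, with a hypothetical helper `pinball_loss` and made-up data, shows that minimizing this loss over a constant recovers the sample quantile and is unaffected by the size of the outlier.

```python
import numpy as np

def pinball_loss(y, pred, q):
    """Mean check loss: q * err for under-prediction, (1 - q) * |err| for over-prediction."""
    err = np.asarray(y, dtype=float) - pred
    return np.mean(np.where(err >= 0, q * err, (q - 1) * err))

y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # one large outlier
candidates = np.linspace(0, 100, 1001)

best_median = candidates[np.argmin([pinball_loss(y, c, 0.5) for c in candidates])]
best_q90 = candidates[np.argmin([pinball_loss(y, c, 0.9) for c in candidates])]
print(best_median, best_q90, y.mean())
```

The q=0.5 minimizer sits at the median (3), while the mean (22) is pulled toward the outlier. QuantReg applies the same loss with a linear predictor in place of the constant.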

Mixed Effects Models

For hierarchical/nested data with random effects.

When to use:

Clustered/grouped data (students in schools, patients in hospitals)
Repeated measures
Need random effects to account for grouping

Usage:

from statsmodels.regression.mixed_linear_model import MixedLM

# Random intercept model
model = MixedLM(y, X, groups=group_ids)
results = model.fit()

# Random intercept and slope
model = MixedLM(y, X, groups=group_ids, exog_re=X_random)
results = model.fit()

print(results.summary())

Diagnostics and Model Assessment

Residual Analysis

# Basic residual plots
import matplotlib.pyplot as plt

# Residuals vs fitted
plt.scatter(results.fittedvalues, results.resid)
plt.xlabel('Fitted values')
plt.ylabel('Residuals')
plt.axhline(y=0, color='r', linestyle='--')
plt.title('Residuals vs Fitted')
plt.show()

# Q-Q plot for normality
from statsmodels.graphics.gofplots import qqplot
qqplot(results.resid, line='s')
plt.show()

# Histogram of residuals
plt.hist(results.resid, bins=30, edgecolor='black')
plt.xlabel('Residuals')
plt.ylabel('Frequency')
plt.title('Distribution of Residuals')
plt.show()

Specification Tests

from statsmodels.stats.diagnostic import het_breuschpagan, het_white
from statsmodels.stats.stattools import durbin_watson, jarque_bera

# Heteroscedasticity tests
lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(results.resid, X)
print(f"Breusch-Pagan test p-value: {lm_pval}")

# White test
white_test = het_white(results.resid, X)
print(f"White test p-value: {white_test[1]}")

# Autocorrelation
dw_stat = durbin_watson(results.resid)
print(f"Durbin-Watson statistic: {dw_stat}")
# DW ~ 2 indicates no autocorrelation
# DW < 2 suggests positive autocorrelation
# DW > 2 suggests negative autocorrelation

# Normality test
jb_stat, jb_pval, skew, kurtosis = jarque_bera(results.resid)
print(f"Jarque-Bera test p-value: {jb_pval}")
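The Durbin-Watson interpretation above follows directly from its formula, the sum of squared successive differences divided by the sum of squared residuals. A small NumPy sketch with made-up residual sequences (`durbin_watson_stat` is a hypothetical re-implementation for illustration):

```python
import numpy as np

def durbin_watson_stat(resid):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

dw_alternating = durbin_watson_stat([1, -1, 1, -1])  # sign flips: negative autocorrelation
dw_trending = durbin_watson_stat([1, 2, 3, 4])       # smooth drift: positive autocorrelation
print(dw_alternating, dw_trending)
```

Residuals that flip sign every step give a value above 2, and slowly drifting residuals give a value near 0, matching the rules of thumb in the comments.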

Multicollinearity

from statsmodels.stats.outliers_influence import variance_inflation_factor

# Calculate VIF for each variable
vif_data = pd.DataFrame()
vif_data["Variable"] = X.columns
vif_data["VIF"] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]

print(vif_data)
# VIF > 10 indicates problematic multicollinearity
# VIF > 5 suggests moderate multicollinearity

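# Condition number (from summary)
print(f"Condition number: {results.condition_number}")
# The value depends on the units of the predictors; statsmodels' summary flags values above 1000
# Rules of thumb of 20 (moderate) and 30 (serious) apply to a design matrix with columns scaled to unit length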
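The VIF for a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. The NumPy sketch below re-derives it on simulated data with one nearly collinear pair; `vif_column` and the data are illustrative assumptions, not the statsmodels implementation.

```python
import numpy as np

def vif_column(X, j):
    """VIF of column j: 1 / (1 - R^2) from regressing it on the other columns."""
    target = X[:, j]
    others = np.delete(X, j, axis=1)  # includes the constant column
    coef, *_ = np.linalg.lstsq(others, target, rcond=None)
    resid = target - others @ coef
    r_squared = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
    return 1.0 / (1.0 - r_squared)

rng = np.random.default_rng(5)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)              # unrelated to the others
X = np.column_stack([np.ones(n), x1, x2, x3])

names = ["const", "x1", "x2", "x3"]
vifs = {names[j]: vif_column(X, j) for j in range(1, X.shape[1])}
print(vifs)
```

The constant column is skipped because its VIF is not meaningful. The collinear pair shows very large values while the unrelated predictor stays close to 1.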

Influence Statistics

# get_influence() returns an OLSInfluence instance
influence = results.get_influence()

# Leverage (hat values)
leverage = influence.hat_matrix_diag
# High leverage: > 2*p/n (p = number of estimated parameters including the intercept, n = observations)

# Cook's distance
cooks_d = influence.cooks_distance[0]
# Influential if Cook's D > 4/n

# DFFITS
dffits = influence.dffits[0]
# Influential if |DFFITS| > 2*sqrt(p/n)

# Create influence plot
import matplotlib.pyplot as plt
from statsmodels.graphics.regressionplots import influence_plot
fig, ax = plt.subplots(figsize=(12, 8))
influence_plot(results, ax=ax)
plt.show()
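Leverage is the diagonal of the hat matrix H = X(X'X)⁻¹X'. The NumPy sketch below computes it through a QR decomposition on simulated data with one deliberately extreme row; the data and names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 50, 3                       # p counts the intercept
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
X[0, 1] = 8.0                      # make the first observation extreme in one predictor

# Thin QR: H = Q Q', so diag(H) is the row-wise sum of squares of Q
Q, _ = np.linalg.qr(X)
leverage = np.sum(Q ** 2, axis=1)

high_leverage = np.where(leverage > 2 * p / n)[0]
print(leverage.sum(), high_leverage)
```

The leverages always sum to p, so the average is p/n, and the 2*p/n cutoff flags points with at least twice the average leverage.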

Hypothesis Testing

# Test single coefficient
# H0: beta_i = 0 (automatically in summary)

# Test multiple restrictions using F-test
# Example: Test beta_1 = beta_2 = 0
R = [[0, 1, 0, 0], [0, 0, 1, 0]]  # Restriction matrix
f_test = results.f_test(R)
print(f_test)

# Formula-based hypothesis testing
f_test = results.f_test("x1 = x2 = 0")
print(f_test)

# Test linear combination: beta_1 + beta_2 = 1
r_matrix = [[0, 1, 1, 0]]
q_matrix = [1]  # RHS value
f_test = results.f_test((r_matrix, q_matrix))
print(f_test)

# Wald test (equivalent to F-test for linear restrictions)
wald_test = results.wald_test(R)
print(wald_test)

Model Comparison

# Compare nested OLS models with an F-test (anova_lm);
# a likelihood ratio version is available via model_full.compare_lr_test(model_restricted)
from statsmodels.stats.anova import anova_lm

# Fit restricted and unrestricted models
model_restricted = sm.OLS(y, X_restricted).fit()
model_full = sm.OLS(y, X_full).fit()

# ANOVA table for model comparison
anova_results = anova_lm(model_restricted, model_full)
print(anova_results)

# AIC/BIC for non-nested model comparison
print(f"Model 1 AIC: {model1.aic}, BIC: {model1.bic}")
print(f"Model 2 AIC: {model2.aic}, BIC: {model2.bic}")
# Lower AIC/BIC indicates better model

Robust Standard Errors

Handle heteroscedasticity or clustering without reweighting.

# Heteroscedasticity-robust (HC) standard errors
results_hc = results.get_robustcov_results(cov_type='HC0')  # White's
results_hc1 = results.get_robustcov_results(cov_type='HC1')
results_hc2 = results.get_robustcov_results(cov_type='HC2')
results_hc3 = results.get_robustcov_results(cov_type='HC3')  # Most conservative

# Newey-West HAC (Heteroscedasticity and Autocorrelation Consistent)
results_hac = results.get_robustcov_results(cov_type='HAC', maxlags=4)

# Cluster-robust standard errors
results_cluster = results.get_robustcov_results(cov_type='cluster',
                                                groups=cluster_ids)

# View robust results
print(results_hc3.summary())

Best Practices

1. Always add constant: Use sm.add_constant() unless you specifically want to exclude the intercept
2. Check assumptions: Run diagnostic tests (heteroscedasticity, autocorrelation, normality)
3. Use formula API for categorical variables: smf.ols() handles categorical variables automatically
4. Robust standard errors: Use when heteroscedasticity detected but model specification is correct
5. Model selection: Use AIC/BIC for non-nested models, F-test/likelihood ratio for nested models
6. Outliers and influence: Always check Cook's distance and leverage
7. Multicollinearity: Check VIF and condition number before interpretation
8. Time series: Use GLSAR or robust HAC standard errors for autocorrelated errors
9. Grouped data: Consider mixed effects models or cluster-robust standard errors
10. Quantile regression: Use for robust estimation or when interested in distributional effects

Common Pitfalls

1. Forgetting to add constant: Results in no-intercept model
2. Ignoring heteroscedasticity: Use WLS or robust standard errors
3. Using OLS with autocorrelated errors: Use GLSAR or HAC standard errors
4. Over-interpreting with multicollinearity: Check VIF first
5. Not checking residuals: Always plot residuals vs fitted values
6. Using t-SNE/PCA residuals: Residuals should be from original space
7. Confusing prediction vs confidence intervals: Prediction intervals are wider
8. Not handling categorical variables properly: Use formula API or manual dummy coding
9. Comparing models with different sample sizes: Ensure same observations used
10. Ignoring influential observations: Check Cook's distance and DFFITS

Statistical Tests and Diagnostics Reference

This document provides comprehensive guidance on statistical tests, diagnostics, and tools available in statsmodels.

Overview

Statsmodels provides extensive statistical testing capabilities:

Residual diagnostics and specification tests
Hypothesis testing (parametric and non-parametric)
Goodness-of-fit tests
Multiple comparisons and post-hoc tests
Power and sample size calculations
Robust covariance matrices
Influence and outlier detection

Residual Diagnostics

Autocorrelation Tests

Ljung-Box Test: Tests for autocorrelation in residuals

from statsmodels.stats.diagnostic import acorr_ljungbox

# Test residuals for autocorrelation
lb_test = acorr_ljungbox(residuals, lags=10, return_df=True)
print(lb_test)

# H0: No autocorrelation up to lag k
# If p-value < 0.05, reject H0 (autocorrelation present)

Durbin-Watson Test: Tests for first-order autocorrelation

from statsmodels.stats.stattools import durbin_watson

dw_stat = durbin_watson(residuals)
print(f"Durbin-Watson: {dw_stat:.4f}")

# DW ≈ 2: no autocorrelation
# DW < 2: positive autocorrelation
# DW > 2: negative autocorrelation
# Exact critical values depend on n and k
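
The Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by their sum of squares, which is approximately 2(1 - ρ) for first-order autocorrelation ρ. A minimal NumPy sketch of that formula on simulated residuals follows. The helper name durbin_watson_by_hand and the simulated series are illustrative assumptions.

```python
import numpy as np

def durbin_watson_by_hand(resid):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(7)
n = 1000

# Uncorrelated residuals: DW should be near 2
white = rng.normal(size=n)

# AR(1) residuals with rho = 0.8: DW should be near 2 * (1 - 0.8) = 0.4
shocks = rng.normal(size=n)
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.8 * ar1[t - 1] + shocks[t]

dw_white = durbin_watson_by_hand(white)
dw_ar1 = durbin_watson_by_hand(ar1)
print(f"White noise DW: {dw_white:.3f}")
print(f"AR(1) DW:       {dw_ar1:.3f}")
```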

Breusch-Godfrey Test: More general test for autocorrelation

from statsmodels.stats.diagnostic import acorr_breusch_godfrey

bg_test = acorr_breusch_godfrey(results, nlags=5)
lm_stat, lm_pval, f_stat, f_pval = bg_test

print(f"LM statistic: {lm_stat:.4f}, p-value: {lm_pval:.4f}")
# H0: No autocorrelation up to lag k

Heteroskedasticity Tests

Breusch-Pagan Test: Tests for heteroskedasticity

from statsmodels.stats.diagnostic import het_breuschpagan

bp_test = het_breuschpagan(residuals, exog)
lm_stat, lm_pval, f_stat, f_pval = bp_test

print(f"Breusch-Pagan test p-value: {lm_pval:.4f}")
# H0: Homoskedasticity (constant variance)
# If p-value < 0.05, reject H0 (heteroskedasticity present)

White Test: More general test for heteroskedasticity

from statsmodels.stats.diagnostic import het_white

white_test = het_white(residuals, exog)
lm_stat, lm_pval, f_stat, f_pval = white_test

print(f"White test p-value: {lm_pval:.4f}")
# H0: Homoskedasticity

ARCH Test: Tests for autoregressive conditional heteroskedasticity

from statsmodels.stats.diagnostic import het_arch

arch_test = het_arch(residuals, nlags=5)
lm_stat, lm_pval, f_stat, f_pval = arch_test

print(f"ARCH test p-value: {lm_pval:.4f}")
# H0: No ARCH effects
# If significant, consider GARCH model

Normality Tests

Jarque-Bera Test: Tests for normality using skewness and kurtosis

from statsmodels.stats.stattools import jarque_bera

jb_stat, jb_pval, skew, kurtosis = jarque_bera(residuals)

print(f"Jarque-Bera statistic: {jb_stat:.4f}")
print(f"p-value: {jb_pval:.4f}")
print(f"Skewness: {skew:.4f}")
print(f"Kurtosis: {kurtosis:.4f}")

# H0: Residuals are normally distributed
# Normal: skewness ≈ 0, kurtosis ≈ 3

Omnibus Test: Another normality test (also based on skewness/kurtosis)

from statsmodels.stats.stattools import omni_normtest

omni_stat, omni_pval = omni_normtest(residuals)
print(f"Omnibus test p-value: {omni_pval:.4f}")
# H0: Normality

Anderson-Darling Test: Distribution fit test

from statsmodels.stats.diagnostic import normal_ad

ad_stat, ad_pval = normal_ad(residuals)
print(f"Anderson-Darling test p-value: {ad_pval:.4f}")

Lilliefors Test: Modified Kolmogorov-Smirnov test

from statsmodels.stats.diagnostic import lilliefors

lf_stat, lf_pval = lilliefors(residuals, dist='norm')
print(f"Lilliefors test p-value: {lf_pval:.4f}")

Linearity and Specification Tests

Ramsey RESET Test: Tests for functional form misspecification

from statsmodels.stats.diagnostic import linear_reset

# linear_reset returns a results object with .statistic and .pvalue
reset_test = linear_reset(results, power=2, use_f=True)

print(f"RESET test p-value: {float(reset_test.pvalue):.4f}")
# H0: Model is correctly specified (linear)
# If rejected, may need polynomial terms or transformations

Harvey-Collier Test: Tests for linearity

from statsmodels.stats.diagnostic import linear_harvey_collier

hc_stat, hc_pval = linear_harvey_collier(results)
print(f"Harvey-Collier test p-value: {hc_pval:.4f}")
# H0: Linear specification is correct

Multicollinearity Detection

Variance Inflation Factor (VIF):

from statsmodels.stats.outliers_influence import variance_inflation_factor
import pandas as pd

# Calculate VIF for each variable
vif_data = pd.DataFrame()
vif_data["Variable"] = X.columns
vif_data["VIF"] = [variance_inflation_factor(X.values, i)
                   for i in range(X.shape[1])]

print(vif_data.sort_values('VIF', ascending=False))

# Interpretation:
# VIF = 1: No correlation with other predictors
# VIF > 5: Moderate multicollinearity
# VIF > 10: Serious multicollinearity problem
# VIF > 20: Severe multicollinearity (consider removing variable)

Condition Number: From regression results

print(f"Condition number: {results.condition_number:.2f}")

# Interpretation:
# < 10: No multicollinearity concern
# 10-30: Moderate multicollinearity
# > 30: Strong multicollinearity
# > 100: Severe multicollinearity

Influence and Outlier Detection

Leverage

High leverage points have extreme predictor values.

from statsmodels.stats.outliers_influence import OLSInfluence

influence = results.get_influence()

# Hat values (leverage)
leverage = influence.hat_matrix_diag

# Rule of thumb: leverage > 2*p/n or 3*p/n is high
# p = number of parameters, n = sample size
threshold = 2 * len(results.params) / len(y)
high_leverage = np.where(leverage > threshold)[0]

print(f"High leverage observations: {high_leverage}")

Cook's Distance

Measures overall influence of each observation.

# Cook's distance
cooks_d = influence.cooks_distance[0]

# Rule of thumb: Cook's D > 4/n is influential
threshold = 4 / len(y)
influential = np.where(cooks_d > threshold)[0]

print(f"Influential observations (Cook's D): {influential}")

# Plot
import matplotlib.pyplot as plt
plt.stem(range(len(cooks_d)), cooks_d)
plt.axhline(y=threshold, color='r', linestyle='--', label=f'Threshold (4/n)')
plt.xlabel('Observation')
plt.ylabel("Cook's Distance")
plt.legend()
plt.show()

DFFITS

Measures influence on fitted value.

# DFFITS
dffits = influence.dffits[0]

# Rule of thumb: |DFFITS| > 2*sqrt(p/n) is influential
p = len(results.params)
n = len(y)
threshold = 2 * np.sqrt(p / n)

influential_dffits = np.where(np.abs(dffits) > threshold)[0]
print(f"Influential observations (DFFITS): {influential_dffits}")

DFBETAs

Measures influence on each coefficient.

# DFBETAs (one for each parameter)
dfbetas = influence.dfbetas

# Rule of thumb: |DFBETA| > 2/sqrt(n)
threshold = 2 / np.sqrt(n)

for i, param_name in enumerate(results.params.index):
    influential = np.where(np.abs(dfbetas[:, i]) > threshold)[0]
    if len(influential) > 0:
        print(f"Influential for {param_name}: {influential}")

Influence Plot

from statsmodels.graphics.regressionplots import influence_plot

fig, ax = plt.subplots(figsize=(12, 8))
influence_plot(results, ax=ax, criterion='cooks')
plt.show()

# Combines leverage, residuals, and Cook's distance
# Large bubbles = high Cook's distance
# Far from x=0 = high leverage
# Far from y=0 = large residual

Studentized Residuals

# Studentized residuals (outliers)
student_resid = influence.resid_studentized_internal

# Externally studentized residuals (variance estimated with that observation left out)
student_resid_external = influence.resid_studentized_external

# Outliers: |studentized residual| > 3 (or > 2.5)
outliers = np.where(np.abs(student_resid_external) > 3)[0]
print(f"Outliers: {outliers}")

Hypothesis Testing

t-tests

One-sample t-test: Test if mean equals specific value

from scipy import stats

# H0: population mean = mu_0
t_stat, p_value = stats.ttest_1samp(data, popmean=mu_0)

print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")

Two-sample t-test: Compare means of two groups

# H0: mean1 = mean2 (equal variances)
t_stat, p_value = stats.ttest_ind(group1, group2)

# Welch's t-test (unequal variances)
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=False)

print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")

Paired t-test: Compare paired observations

# H0: mean difference = 0
t_stat, p_value = stats.ttest_rel(before, after)

print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")

Proportion Tests

One-proportion test:

from statsmodels.stats.proportion import proportions_ztest

# H0: proportion = p0
count = 45  # successes
nobs = 100  # total observations
p0 = 0.5    # hypothesized proportion

z_stat, p_value = proportions_ztest(count, nobs, value=p0)

print(f"z-statistic: {z_stat:.4f}")
print(f"p-value: {p_value:.4f}")

Two-proportion test:

# H0: proportion1 = proportion2
counts = [45, 60]
nobs = [100, 120]

z_stat, p_value = proportions_ztest(counts, nobs)
print(f"z-statistic: {z_stat:.4f}")
print(f"p-value: {p_value:.4f}")

Chi-square Tests

Chi-square test of independence:

from scipy.stats import chi2_contingency

# Contingency table
contingency_table = pd.crosstab(variable1, variable2)

chi2, p_value, dof, expected = chi2_contingency(contingency_table)

print(f"Chi-square statistic: {chi2:.4f}")
print(f"p-value: {p_value:.4f}")
print(f"Degrees of freedom: {dof}")

# H0: Variables are independent

Chi-square goodness-of-fit:

from scipy.stats import chisquare

# Observed frequencies
observed = [20, 30, 25, 25]

# Expected frequencies (equal by default)
expected = [25, 25, 25, 25]

chi2, p_value = chisquare(observed, expected)

print(f"Chi-square statistic: {chi2:.4f}")
print(f"p-value: {p_value:.4f}")

# H0: Data follow the expected distribution
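
The statistic reported by chisquare is the sum over categories of (observed - expected)² / expected, with degrees of freedom equal to the number of categories minus one. The arithmetic for the frequencies above can be checked by hand in plain Python:

```python
observed = [20, 30, 25, 25]
expected = [25, 25, 25, 25]

# Each term is (O - E)^2 / E: 25/25, 25/25, 0, 0
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2_stat = sum(terms)
dof = len(observed) - 1

print(terms)            # [1.0, 1.0, 0.0, 0.0]
print(chi2_stat, dof)   # 2.0 3
```

With a statistic of 2.0 on 3 degrees of freedom the p-value is roughly 0.57, so H0 is not rejected for these counts.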

Non-parametric Tests

Mann-Whitney U test (independent samples):

from scipy.stats import mannwhitneyu

# H0: Distributions are equal
u_stat, p_value = mannwhitneyu(group1, group2, alternative='two-sided')

print(f"U statistic: {u_stat:.4f}")
print(f"p-value: {p_value:.4f}")

Wilcoxon signed-rank test (paired samples):

from scipy.stats import wilcoxon

# H0: Median difference = 0
w_stat, p_value = wilcoxon(before, after)

print(f"W statistic: {w_stat:.4f}")
print(f"p-value: {p_value:.4f}")

Kruskal-Wallis H test (>2 groups):

from scipy.stats import kruskal

# H0: All groups have same distribution
h_stat, p_value = kruskal(group1, group2, group3)

print(f"H statistic: {h_stat:.4f}")
print(f"p-value: {p_value:.4f}")

Sign test:

from statsmodels.stats.descriptivestats import sign_test

# H0: Median = mu0
result = sign_test(data, mu0=0)
print(result)

ANOVA

One-way ANOVA:

from scipy.stats import f_oneway

# H0: All group means are equal
f_stat, p_value = f_oneway(group1, group2, group3)

print(f"F-statistic: {f_stat:.4f}")
print(f"p-value: {p_value:.4f}")

Two-way ANOVA (with statsmodels):

from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Fit model
model = ols('response ~ C(factor1) + C(factor2) + C(factor1):C(factor2)',
            data=df).fit()

# ANOVA table
anova_table = anova_lm(model, typ=2)
print(anova_table)

Repeated measures ANOVA:

from statsmodels.stats.anova import AnovaRM

# Requires long-format data
aovrm = AnovaRM(df, depvar='score', subject='subject_id', within=['time'])
results = aovrm.fit()

print(results.summary())

Multiple Comparisons

Post-hoc Tests

Tukey's HSD (Honest Significant Difference):

from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Perform Tukey HSD test
tukey = pairwise_tukeyhsd(data, groups, alpha=0.05)

print(tukey.summary())

# Plot confidence intervals
tukey.plot_simultaneous()
plt.show()

Bonferroni correction:

from statsmodels.stats.multitest import multipletests

# P-values from multiple tests
p_values = [0.01, 0.03, 0.04, 0.15, 0.001]

# Apply correction
reject, pvals_corrected, alphac_sidak, alphac_bonf = multipletests(
    p_values,
    alpha=0.05,
    method='bonferroni'
)

print("Rejected:", reject)
print("Corrected p-values:", pvals_corrected)

False Discovery Rate (FDR):

# FDR correction (less conservative than Bonferroni)
reject, pvals_corrected, alphac_sidak, alphac_bonf = multipletests(
    p_values,
    alpha=0.05,
    method='fdr_bh'  # Benjamini-Hochberg
)

print("Rejected:", reject)
print("Corrected p-values:", pvals_corrected)
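
To see what the two corrections do, the sketch below reimplements them in plain Python for the same p_values. The helper names bonferroni and benjamini_hochberg are hypothetical and exist only for illustration; in practice multipletests is the function to call.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """BH adjusted p-values: p * m / rank, made monotone from the largest down."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

p_values = [0.01, 0.03, 0.04, 0.15, 0.001]
bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)

for p, b, h in zip(p_values, bonf, bh):
    print(f"raw={p:.3f}  bonferroni={b:.3f}  fdr_bh={h:.3f}")
```

The BH-adjusted values are never larger than the Bonferroni ones, which is why FDR control rejects at least as many hypotheses.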

Robust Covariance Matrices

Heteroskedasticity-Consistent (HC) Standard Errors

# After fitting OLS
results = sm.OLS(y, X).fit()

# HC0 (White's heteroskedasticity-consistent SEs)
results_hc0 = results.get_robustcov_results(cov_type='HC0')

# HC1 (degrees of freedom adjustment)
results_hc1 = results.get_robustcov_results(cov_type='HC1')

# HC2 (leverage adjustment)
results_hc2 = results.get_robustcov_results(cov_type='HC2')

# HC3 (most conservative, recommended for small samples)
results_hc3 = results.get_robustcov_results(cov_type='HC3')

print("Standard OLS SEs:", results.bse)
print("Robust HC3 SEs:", results_hc3.bse)

HAC (Heteroskedasticity and Autocorrelation Consistent)

Newey-West standard errors:

# For time series with autocorrelation and heteroskedasticity
results_hac = results.get_robustcov_results(cov_type='HAC', maxlags=4)

print("HAC (Newey-West) SEs:", results_hac.bse)
print(results_hac.summary())

Cluster-Robust Standard Errors

# For clustered/grouped data
results_cluster = results.get_robustcov_results(
    cov_type='cluster',
    groups=cluster_ids
)

print("Cluster-robust SEs:", results_cluster.bse)

Descriptive Statistics

Basic descriptive statistics:

from statsmodels.stats.api import DescrStatsW

# Comprehensive descriptive stats
desc = DescrStatsW(data)

print("Mean:", desc.mean)
print("Std Dev:", desc.std)
print("Variance:", desc.var)
print("Confidence interval:", desc.tconfint_mean())

# Quantiles
print("Median:", desc.quantile(0.5))
print("IQR:", desc.quantile([0.25, 0.75]))

Weighted statistics:

# With weights
desc_weighted = DescrStatsW(data, weights=weights)

print("Weighted mean:", desc_weighted.mean)
print("Weighted std:", desc_weighted.std)

Compare two groups:

from statsmodels.stats.weightstats import CompareMeans

# Create comparison object
cm = CompareMeans(DescrStatsW(group1), DescrStatsW(group2))

# t-test
print("t-test:", cm.ttest_ind())

# Confidence interval for difference
print("CI for difference:", cm.tconfint_diff())

# Test for equal variances (Levene's test from SciPy)
from scipy.stats import levene
print("Equal variance test:", levene(group1, group2))

Power Analysis and Sample Size

Power for t-test:

from statsmodels.stats.power import tt_ind_solve_power

# Solve for sample size
effect_size = 0.5  # Cohen's d
alpha = 0.05
power = 0.8

n = tt_ind_solve_power(effect_size=effect_size,
                        alpha=alpha,
                        power=power,
                        alternative='two-sided')

print(f"Required sample size per group: {n:.0f}")

# Solve for power given n
power = tt_ind_solve_power(effect_size=0.5,
                           nobs1=50,
                           alpha=0.05,
                           alternative='two-sided')

print(f"Power: {power:.4f}")

Power for proportion test:

from statsmodels.stats.power import zt_ind_solve_power

# For proportion tests (z-test)
effect_size = 0.3  # Standardized effect size (e.g., Cohen's h), not the raw difference in proportions
alpha = 0.05
power = 0.8

n = zt_ind_solve_power(effect_size=effect_size,
                        alpha=alpha,
                        power=power,
                        alternative='two-sided')

print(f"Required sample size per group: {n:.0f}")
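
The effect size for a two-proportion z-test is usually Cohen's h, h = 2·asin(√p1) - 2·asin(√p2), which statsmodels exposes as proportion_effectsize in statsmodels.stats.proportion. The standard-library sketch below computes h by hand and applies the common normal approximation n ≈ 2·((z_{1-α/2} + z_{power}) / h)² per group. The helper names and the example proportions are assumptions for illustration, and the approximation ignores the negligible far-tail term, so it should land close to, not exactly on, the solver's answer.

```python
import math
from statistics import NormalDist

def cohens_h(p1, p2):
    """Arcsine-transformed difference between two proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate per-group n for a two-sided, equal-sized two-sample z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return 2 * ((z_alpha + z_power) / effect_size) ** 2

h = cohens_h(0.65, 0.50)          # hypothetical proportions
n_from_h = n_per_group(h)
n_for_03 = n_per_group(0.3)       # same effect size as the snippet above

print(f"Cohen's h: {h:.4f}")
print(f"Approx. n per group for h: {math.ceil(n_from_h)}")
print(f"Approx. n per group for effect size 0.3: {math.ceil(n_for_03)}")
```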

Power curves:

from statsmodels.stats.power import TTestIndPower
import matplotlib.pyplot as plt

# Create power analysis object
analysis = TTestIndPower()

# Plot power curves for different sample sizes
sample_sizes = range(10, 200, 10)
effect_sizes = [0.2, 0.5, 0.8]  # Small, medium, large

fig, ax = plt.subplots(figsize=(10, 6))

for es in effect_sizes:
    power = [analysis.solve_power(effect_size=es, nobs1=n, alpha=0.05)
             for n in sample_sizes]
    ax.plot(sample_sizes, power, label=f'Effect size = {es}')

ax.axhline(y=0.8, color='r', linestyle='--', label='Power = 0.8')
ax.set_xlabel('Sample size per group')
ax.set_ylabel('Power')
ax.set_title('Power Curves for Two-Sample t-test')
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()

Effect Sizes

Cohen's d (standardized mean difference):

def cohens_d(group1, group2):
    """Calculate Cohen's d for independent samples"""
    n1, n2 = len(group1), len(group2)
    var1, var2 = np.var(group1, ddof=1), np.var(group2, ddof=1)

    # Pooled standard deviation
    pooled_std = np.sqrt(((n1-1)*var1 + (n2-1)*var2) / (n1+n2-2))

    # Cohen's d
    d = (np.mean(group1) - np.mean(group2)) / pooled_std

    return d

d = cohens_d(group1, group2)
print(f"Cohen's d: {d:.4f}")

# Interpretation:
# |d| < 0.2: negligible
# |d| ~ 0.2: small
# |d| ~ 0.5: medium
# |d| ~ 0.8: large

Eta-squared (for ANOVA):

# From ANOVA table
# η² = SS_between / SS_total

def eta_squared(anova_table):
    # .iloc[0] selects the first effect row by position
    return anova_table['sum_sq'].iloc[0] / anova_table['sum_sq'].sum()

# After running ANOVA
eta_sq = eta_squared(anova_table)
print(f"Eta-squared: {eta_sq:.4f}")

# Interpretation:
# 0.01: small effect
# 0.06: medium effect
# 0.14: large effect

Contingency Tables and Association

McNemar's test (paired binary data):

from statsmodels.stats.contingency_tables import mcnemar

# 2x2 contingency table
table = [[a, b],
         [c, d]]

result = mcnemar(table, exact=True)  # or exact=False for large samples
print(f"p-value: {result.pvalue:.4f}")

# H0: Marginal probabilities are equal

Cochran-Mantel-Haenszel test:

from statsmodels.stats.contingency_tables import StratifiedTable

# For stratified 2x2 tables
strat_table = StratifiedTable(tables_list)
result = strat_table.test_null_odds()

print(f"p-value: {result.pvalue:.4f}")

Treatment Effects and Causal Inference

Propensity score matching:

import statsmodels.api as sm

# Estimate propensity scores with a logit model of treatment on covariates
# (X should include a constant, e.g. via sm.add_constant)
ps_model = sm.Logit(treatment, X).fit()
propensity_scores = ps_model.predict(X)

# Use for matching or weighting
# (manual implementation of matching needed)

Difference-in-differences:

# DiD formula: outcome ~ treatment * post
model = ols('outcome ~ treatment + post + treatment:post', data=df).fit()

# DiD estimate is the interaction coefficient
did_estimate = model.params['treatment:post']
print(f"DiD estimate: {did_estimate:.4f}")
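
A self-contained sketch of the difference-in-differences regression on simulated data, assuming a known treatment effect of 2.0 so the interaction coefficient can be checked against it. The data-generating values and the name df_demo are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Simulated 2x2 design (assumed for illustration)
rng = np.random.default_rng(11)
n = 2000
treatment = rng.integers(0, 2, size=n)   # 1 = treated group
post = rng.integers(0, 2, size=n)        # 1 = after the intervention
true_effect = 2.0

outcome = (1.0 + 0.5 * treatment + 1.5 * post
           + true_effect * treatment * post
           + rng.normal(size=n))
df_demo = pd.DataFrame({"outcome": outcome,
                        "treatment": treatment,
                        "post": post})

# treatment * post expands to treatment + post + treatment:post
did_model = ols("outcome ~ treatment * post", data=df_demo).fit()
did_estimate = did_model.params["treatment:post"]
ci_low, ci_high = did_model.conf_int().loc["treatment:post"]

print(f"DiD estimate: {did_estimate:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

The estimate is only causal under the parallel-trends assumption, which this simulation satisfies by construction.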

Best Practices

1. Always check assumptions: Test before interpreting results
2. Report effect sizes: Not just p-values
3. Use appropriate tests: Match test to data type and distribution
4. Correct for multiple comparisons: When conducting many tests
5. Check sample size: Ensure adequate power
6. Visual inspection: Plot data before testing
7. Report confidence intervals: Along with point estimates
8. Consider alternatives: Non-parametric when assumptions violated
9. Robust standard errors: Use when heteroskedasticity/autocorrelation present
10. Document decisions: Note which tests used and why

Common Pitfalls

1. Not checking test assumptions: May invalidate results
2. Multiple testing without correction: Inflated Type I error
3. Using parametric tests on non-normal data: Consider non-parametric
4. Ignoring heteroskedasticity: Use robust SEs
5. Confusing statistical and practical significance: Check effect sizes
6. Not reporting confidence intervals: Only p-values insufficient
7. Using wrong test: Match test to research question
8. Insufficient power: Risk of Type II error (false negatives)
9. p-hacking: Testing many specifications until significant
10. Overinterpreting p-values: Remember limitations of NHST

Time Series Analysis Reference

This document provides comprehensive guidance on time series models in statsmodels, including ARIMA, state space models, VAR, exponential smoothing, and forecasting methods.

Overview

Statsmodels offers extensive time series capabilities:

Univariate models: AR, ARIMA, SARIMAX, Exponential Smoothing
Multivariate models: VAR, VARMAX, Dynamic Factor Models
State space framework: Custom models, Kalman filtering
Diagnostic tools: ACF, PACF, stationarity tests, residual analysis
Forecasting: Point forecasts and prediction intervals

Univariate Time Series Models

AutoReg (AR Model)

Autoregressive model: current value depends on past values.

When to use:

Univariate time series
Past values predict future
Stationary series

Model: yₜ = c + φ₁yₜ₋₁ + φ₂yₜ₋₂ + ... + φₚyₜ₋ₚ + εₜ

from statsmodels.tsa.ar_model import AutoReg
import pandas as pd

# Fit AR(p) model
model = AutoReg(y, lags=5)  # AR(5)
results = model.fit()

print(results.summary())

With exogenous regressors:

# AR with exogenous variables (ARX)
model = AutoReg(y, lags=5, exog=X_exog)
results = model.fit()

Seasonal AR:

# Seasonal dummies (e.g., monthly data with yearly seasonality)
# period is required unless y has a pandas index with a frequency
model = AutoReg(y, lags=12, seasonal=True, period=12)
results = model.fit()

ARIMA (Autoregressive Integrated Moving Average)

Combines AR, differencing (I), and MA components.

When to use:

Non-stationary time series (needs differencing)
Past values and errors predict future
Flexible model for many time series

Model: ARIMA(p,d,q)

p: AR order (lags)
d: differencing order (to achieve stationarity)
q: MA order (lagged forecast errors)

from statsmodels.tsa.arima.model import ARIMA

# Fit ARIMA(p,d,q)
model = ARIMA(y, order=(1, 1, 1))  # ARIMA(1,1,1)
results = model.fit()

print(results.summary())

Choosing p, d, q:

1. Determine d (differencing order):

from statsmodels.tsa.stattools import adfuller

# ADF test for stationarity
def check_stationarity(series):
    result = adfuller(series)
    print(f"ADF Statistic: {result[0]:.4f}")
    print(f"p-value: {result[1]:.4f}")
    if result[1] <= 0.05:
        print("Series is stationary")
        return True
    else:
        print("Series is non-stationary, needs differencing")
        return False

# Test original series
if not check_stationarity(y):
    # Difference once
    y_diff = y.diff().dropna()
    if not check_stationarity(y_diff):
        # Difference again
        y_diff2 = y_diff.diff().dropna()
        check_stationarity(y_diff2)

2. Determine p and q (ACF/PACF):

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt

# After differencing to stationarity
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))

# ACF: helps determine q (MA order)
plot_acf(y_stationary, lags=40, ax=ax1)
ax1.set_title('Autocorrelation Function (ACF)')

# PACF: helps determine p (AR order)
plot_pacf(y_stationary, lags=40, ax=ax2)
ax2.set_title('Partial Autocorrelation Function (PACF)')

plt.tight_layout()
plt.show()

# Rules of thumb:
# - PACF cuts off at lag p → AR(p)
# - ACF cuts off at lag q → MA(q)
# - Both decay → ARMA(p,q)

3. Model selection (AIC/BIC):

# Grid search for best (p,q) given d
import numpy as np

d = 1  # differencing order found in step 1 (assumed here for illustration)

best_aic = np.inf
best_order = None

for p in range(5):
    for q in range(5):
        try:
            model = ARIMA(y, order=(p, d, q))
            results = model.fit()
            if results.aic < best_aic:
                best_aic = results.aic
                best_order = (p, d, q)
        except (ValueError, np.linalg.LinAlgError):
            continue

print(f"Best order: {best_order} with AIC: {best_aic:.2f}")

SARIMAX (Seasonal ARIMA with Exogenous Variables)

Extends ARIMA with seasonality and exogenous regressors.

When to use:

Seasonal patterns (monthly, quarterly data)
External variables influence series
Most flexible univariate model

Model: SARIMAX(p,d,q)(P,D,Q,s)

(p,d,q): Non-seasonal ARIMA
(P,D,Q,s): Seasonal ARIMA with period s

from statsmodels.tsa.statespace.sarimax import SARIMAX

# Seasonal ARIMA for monthly data (s=12)
model = SARIMAX(y,
                order=(1, 1, 1),           # (p,d,q)
                seasonal_order=(1, 1, 1, 12))  # (P,D,Q,s)
results = model.fit()

print(results.summary())

With exogenous variables:

# SARIMAX with external predictors
model = SARIMAX(y,
                exog=X_exog,
                order=(1, 1, 1),
                seasonal_order=(1, 1, 1, 12))
results = model.fit()

Example: Monthly sales with trend and seasonality

# Typical for monthly data: (p,d,q)(P,D,Q,12)
# Start with (1,1,1)(1,1,1,12) or (0,1,1)(0,1,1,12)

model = SARIMAX(monthly_sales,
                order=(0, 1, 1),
                seasonal_order=(0, 1, 1, 12),
                enforce_stationarity=False,
                enforce_invertibility=False)
results = model.fit()

Exponential Smoothing

Weighted averages of past observations with exponentially decreasing weights.

When to use:

Simple, interpretable forecasts
Trend and/or seasonality present
No need for explicit model specification

Types:

Simple Exponential Smoothing: no trend, no seasonality
Holt's method: with trend
Holt-Winters: with trend and seasonality

from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Simple exponential smoothing
model = ExponentialSmoothing(y, trend=None, seasonal=None)
results = model.fit()

# Holt's method (with trend)
model = ExponentialSmoothing(y, trend='add', seasonal=None)
results = model.fit()

# Holt-Winters (trend + seasonality)
model = ExponentialSmoothing(y,
                            trend='add',           # 'add' or 'mul'
                            seasonal='add',        # 'add' or 'mul'
                            seasonal_periods=12)   # e.g., 12 for monthly
results = model.fit()

print(results.summary())

Additive vs Multiplicative:

# Additive: constant seasonal variation
# yₜ = Level + Trend + Seasonal + Error

# Multiplicative: proportional seasonal variation
# yₜ = Level × Trend × Seasonal × Error

# Choose based on data:
# - Additive: seasonal variation constant over time
# - Multiplicative: seasonal variation increases with level

Innovations state space (ETS):

from statsmodels.tsa.exponential_smoothing.ets import ETSModel

# More robust, state space formulation
model = ETSModel(y,
                error='add',           # 'add' or 'mul'
                trend='add',           # 'add', 'mul', or None
                seasonal='add',        # 'add', 'mul', or None
                seasonal_periods=12)
results = model.fit()

Multivariate Time Series

VAR (Vector Autoregression)

System of equations where each variable depends on past values of all variables.

When to use:

Multiple interrelated time series
Bidirectional relationships
Granger causality testing

Model: Each variable is AR on all variables:

y₁ₜ = c₁ + φ₁₁y₁ₜ₋₁ + φ₁₂y₂ₜ₋₁ + ... + ε₁ₜ
y₂ₜ = c₂ + φ₂₁y₁ₜ₋₁ + φ₂₂y₂ₜ₋₁ + ... + ε₂ₜ

from statsmodels.tsa.api import VAR
import pandas as pd

# Data should be DataFrame with multiple columns
# Each column is a time series
df_multivariate = pd.DataFrame({'series1': y1, 'series2': y2, 'series3': y3})

# Fit VAR
model = VAR(df_multivariate)

# Select lag order using AIC/BIC
lag_order_results = model.select_order(maxlags=15)
print(lag_order_results.summary())

# Fit, letting AIC choose the lag order up to maxlags
results = model.fit(maxlags=5, ic='aic')
print(results.summary())

Granger causality testing:

# Test if series1 Granger-causes series2
from statsmodels.tsa.stattools import grangercausalitytests

# Expects two columns; tests whether the second column Granger-causes the first
test_data = df_multivariate[['series2', 'series1']]

# Test up to max_lag
# Stored under its own name so the fitted VAR `results` stays available below
max_lag = 5
gc_results = grangercausalitytests(test_data, max_lag, verbose=True)

# P-values for each lag
for lag in range(1, max_lag + 1):
    p_value = gc_results[lag][0]['ssr_ftest'][1]
    print(f"Lag {lag}: p-value = {p_value:.4f}")

Impulse Response Functions (IRF):

# Trace effect of shock through system
irf = results.irf(10)  # 10 periods ahead

# Plot IRFs
irf.plot(orth=True)  # Orthogonalized (Cholesky decomposition)
plt.show()

# Cumulative effects
irf.plot_cum_effects(orth=True)
plt.show()

Forecast Error Variance Decomposition:

# Contribution of each variable to forecast error variance
fevd = results.fevd(10)  # 10 periods ahead
fevd.plot()
plt.show()

VARMAX (VAR with Moving Average and Exogenous Variables)

Extends VAR with MA component and external regressors.

When to use:

VAR inadequate (MA component needed)
External variables affect system
More flexible multivariate model

from statsmodels.tsa.statespace.varmax import VARMAX

# VARMAX(p, q) with exogenous variables
model = VARMAX(df_multivariate,
               order=(1, 1),        # (p, q)
               exog=X_exog)
results = model.fit()

print(results.summary())

State Space Models

Flexible framework for custom time series models.

When to use:

Custom model specification
Unobserved components
Kalman filtering/smoothing
Missing data

from statsmodels.tsa.statespace.mlemodel import MLEModel

# Extend MLEModel for custom state space models
# Example: Local level model (random walk + noise)

Dynamic Factor Models:

from statsmodels.tsa.statespace.dynamic_factor import DynamicFactor

# Extract common factors from multiple time series
model = DynamicFactor(df_multivariate,
                      k_factors=2,          # Number of factors
                      factor_order=2)       # AR order of factors
results = model.fit()

# Estimated factors
factors = results.factors.filtered

Forecasting

Point Forecasts

# ARIMA forecasting
model = ARIMA(y, order=(1, 1, 1))
results = model.fit()

# Forecast h steps ahead
h = 10
forecast = results.forecast(steps=h)

# With exogenous variables (SARIMAX)
model = SARIMAX(y, exog=X, order=(1, 1, 1))
results = model.fit()

# Need future exogenous values
forecast = results.forecast(steps=h, exog=X_future)

Prediction Intervals

# Get forecast with confidence intervals
forecast_obj = results.get_forecast(steps=h)
forecast_df = forecast_obj.summary_frame()

print(forecast_df)
# Contains: mean, mean_se, mean_ci_lower, mean_ci_upper

# Extract components
forecast_mean = forecast_df['mean']
forecast_ci_lower = forecast_df['mean_ci_lower']
forecast_ci_upper = forecast_df['mean_ci_upper']

# Plot
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
plt.plot(y.index, y, label='Historical')
plt.plot(forecast_df.index, forecast_mean, label='Forecast', color='red')
plt.fill_between(forecast_df.index,
                 forecast_ci_lower,
                 forecast_ci_upper,
                 alpha=0.3, color='red', label='95% CI')
plt.legend()
plt.title('Forecast with Prediction Intervals')
plt.show()

Dynamic vs Static Forecasts

# Static (one-step-ahead, using actual values)
static_forecast = results.get_prediction(start=split_point, end=len(y)-1)

# Dynamic (multi-step, using predicted values)
dynamic_forecast = results.get_prediction(start=split_point,
                                          end=len(y)-1,
                                          dynamic=True)

# Plot comparison
fig, ax = plt.subplots(figsize=(12, 6))
y.plot(ax=ax, label='Actual')
static_forecast.predicted_mean.plot(ax=ax, label='Static forecast')
dynamic_forecast.predicted_mean.plot(ax=ax, label='Dynamic forecast')
ax.legend()
plt.show()

Diagnostic Tests

Stationarity Tests

from statsmodels.tsa.stattools import adfuller, kpss

# Augmented Dickey-Fuller (ADF) test
# H0: unit root (non-stationary)
adf_result = adfuller(y, autolag='AIC')
print(f"ADF Statistic: {adf_result[0]:.4f}")
print(f"p-value: {adf_result[1]:.4f}")
if adf_result[1] <= 0.05:
    print("Reject H0: Series is stationary")
else:
    print("Fail to reject H0: Series is non-stationary")

# KPSS test
# H0: stationary (opposite of ADF)
kpss_result = kpss(y, regression='c', nlags='auto')
print(f"KPSS Statistic: {kpss_result[0]:.4f}")
print(f"p-value: {kpss_result[1]:.4f}")
if kpss_result[1] <= 0.05:
    print("Reject H0: Series is non-stationary")
else:
    print("Fail to reject H0: Series is stationary")

Residual Diagnostics

# Ljung-Box test for autocorrelation in residuals
from statsmodels.stats.diagnostic import acorr_ljungbox

lb_test = acorr_ljungbox(results.resid, lags=10, return_df=True)
print(lb_test)
# P-values > 0.05 indicate no significant autocorrelation (good)

# Plot residual diagnostics
results.plot_diagnostics(figsize=(12, 8))
plt.show()

# Components:
# 1. Standardized residuals over time
# 2. Histogram + KDE of residuals
# 3. Q-Q plot for normality
# 4. Correlogram (ACF of residuals)

Heteroskedasticity Tests

from statsmodels.stats.diagnostic import het_arch

# ARCH test for heteroskedasticity
arch_test = het_arch(results.resid, nlags=10)
print(f"ARCH test statistic: {arch_test[0]:.4f}")
print(f"p-value: {arch_test[1]:.4f}")

# If significant, consider a GARCH model (provided by the separate arch package, not statsmodels)

Seasonal Decomposition

from statsmodels.tsa.seasonal import seasonal_decompose

# Decompose into trend, seasonal, residual
decomposition = seasonal_decompose(y,
                                   model='additive',  # or 'multiplicative'
                                   period=12)         # seasonal period

# Plot components
fig = decomposition.plot()
fig.set_size_inches(12, 8)
plt.show()

# Access components
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid

# STL decomposition (more robust)
from statsmodels.tsa.seasonal import STL

stl = STL(y, seasonal=13)  # seasonal must be odd; pass period= if y's index has no frequency
stl_result = stl.fit()

fig = stl_result.plot()
plt.show()

# Multiple seasonalities, e.g., hourly data with daily and weekly patterns
from statsmodels.tsa.seasonal import MSTL

mstl_result = MSTL(hourly_series, periods=(24, 24 * 7)).fit()
fig = mstl_result.plot()
plt.show()

Model Evaluation

In-Sample Metrics

# From results object
print(f"AIC: {results.aic:.2f}")
print(f"BIC: {results.bic:.2f}")
print(f"Log-likelihood: {results.llf:.2f}")

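# MSE on training data
# Note: with differencing (d > 0), the first d fitted values are start-up
# artifacts and can inflate these metrics; consider dropping them
import numpy as np
from sklearn.metrics import mean_squared_error

mse = mean_squared_error(y, results.fittedvalues)
rmse = np.sqrt(mse)
print(f"RMSE: {rmse:.4f}")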

# MAE
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y, results.fittedvalues)
print(f"MAE: {mae:.4f}")

Out-of-Sample Evaluation

# Train-test split for time series (no shuffle!)
train_size = int(0.8 * len(y))
y_train = y[:train_size]
y_test = y[train_size:]

# Fit on training data
model = ARIMA(y_train, order=(1, 1, 1))
results = model.fit()

# Forecast test period
forecast = results.forecast(steps=len(y_test))

# Metrics
from sklearn.metrics import mean_squared_error, mean_absolute_error

rmse = np.sqrt(mean_squared_error(y_test, forecast))
mae = mean_absolute_error(y_test, forecast)
# MAPE is undefined when y_test contains zeros
mape = np.mean(np.abs((y_test - forecast) / y_test)) * 100

print(f"Test RMSE: {rmse:.4f}")
print(f"Test MAE: {mae:.4f}")
print(f"Test MAPE: {mape:.2f}%")
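
The three metrics above can be wrapped in one helper that compares values by position and skips zero actuals when computing MAPE. This is a sketch under stated assumptions: `forecast_errors` and its returned keys are hypothetical names, not part of statsmodels or scikit-learn.

```python
import numpy as np


def forecast_errors(actual, predicted):
    """Return RMSE, MAE and MAPE, comparing values by position."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if actual.shape != predicted.shape:
        raise ValueError("actual and predicted must have the same shape")

    err = actual - predicted

    # MAPE divides by the actual value, so zero actuals are excluded
    nonzero = actual != 0
    if nonzero.any():
        mape = np.mean(np.abs(err[nonzero] / actual[nonzero])) * 100
    else:
        mape = np.nan

    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        "mape": float(mape),
        "n_mape": int(nonzero.sum()),  # observations that entered MAPE
    }


metrics = forecast_errors([2.0, 4.0, 0.0], [1.0, 5.0, 1.0])
print(metrics)
```

Converting to NumPy arrays first avoids silent index alignment when the actuals and forecasts are pandas objects with different indexes.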

Rolling Forecast

# More realistic evaluation: rolling one-step-ahead forecasts
forecasts = []

for t in range(len(y_test)):
    # Refit on all observations available up to this point
    y_current = y.iloc[:train_size + t]
    model = ARIMA(y_current, order=(1, 1, 1))
    fit = model.fit()

    # One-step forecast (.iloc[0] selects by position, not index label)
    fc = fit.forecast(steps=1).iloc[0]
    forecasts.append(fc)

forecasts = np.array(forecasts)

rmse = np.sqrt(mean_squared_error(y_test, forecasts))
print(f"Rolling forecast RMSE: {rmse:.4f}")

Cross-Validation

# Time series cross-validation (expanding window)
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)
rmse_scores = []

for train_idx, test_idx in tscv.split(y):
    y_train_cv = y.iloc[train_idx]
    y_test_cv = y.iloc[test_idx]

    model = ARIMA(y_train_cv, order=(1, 1, 1))
    results = model.fit()

    forecast = results.forecast(steps=len(test_idx))
    rmse = np.sqrt(mean_squared_error(y_test_cv, forecast))
    rmse_scores.append(rmse)

print(f"CV RMSE: {np.mean(rmse_scores):.4f} ± {np.std(rmse_scores):.4f}")

Advanced Topics

ARDL (Autoregressive Distributed Lag)

Bridges univariate and multivariate time series.

from statsmodels.tsa.ardl import ARDL

# ARDL(p, q) model
# y depends on its own lags and lags of X
model = ARDL(y, lags=2, exog=X, order=2)
results = model.fit()

Error Correction Models

For cointegrated series.

from statsmodels.tsa.vector_ar.vecm import coint_johansen

# Test for cointegration
johansen_test = coint_johansen(df_multivariate, det_order=0, k_ar_diff=1)

# Fit VECM if cointegrated
from statsmodels.tsa.vector_ar.vecm import VECM

model = VECM(df_multivariate, k_ar_diff=1, coint_rank=1)
results = model.fit()

Regime Switching Models

For structural breaks and regime changes.

from statsmodels.tsa.regime_switching.markov_regression import MarkovRegression

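# Markov switching regression with two regimes (switching intercept)
# For regime-switching AR dynamics, use MarkovAutoregression with order=p
model = MarkovRegression(y, k_regimes=2)
results = model.fit()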

# Smoothed probabilities of regimes
regime_probs = results.smoothed_marginal_probabilities

Best Practices

1. Check stationarity: Difference if needed, verify with ADF/KPSS tests
2. Plot data: Always visualize before modeling
3. Identify seasonality: Use appropriate seasonal models (SARIMAX, Holt-Winters)
4. Model selection: Use AIC/BIC and out-of-sample validation
5. Residual diagnostics: Check for autocorrelation, normality, heteroskedasticity
6. Forecast evaluation: Use rolling forecasts and proper time series CV
7. Avoid overfitting: Prefer simpler models, use information criteria
8. Document assumptions: Note any data transformations (log, differencing)
9. Prediction intervals: Always provide uncertainty estimates
10. Refit regularly: Update models as new data arrives

Common Pitfalls

1. Not checking stationarity: Fitting ARIMA to non-stationary data without adequate differencing
2. Data leakage: Using future data in transformations
3. Wrong seasonal period: Use S=4 for quarterly data, S=12 for monthly data
4. Overfitting: Too many parameters relative to the amount of data
5. Ignoring residual autocorrelation: It signals that the model is inadequate
6. Using inappropriate metrics: MAPE breaks down when the series contains zeros or negative values
7. Not handling missing data: Gaps affect model estimation
8. Extrapolating exogenous variables: SARIMAX forecasts need future X values
9. Confusing static and dynamic forecasts: Dynamic forecasts are the more realistic test of multi-step performance
10. Not validating forecasts: Always check out-of-sample performance

How it compares

Choose statsmodels over high-level ML libraries when you need classical inference, residual diagnostics, and publication coefficient tables.

FAQ

Which statsmodels version does the skill target?

statsmodels skill examples target statsmodels 0.14.6 on Python 3.9+, installed via uv pip install statsmodels==0.14.6. Imports use statsmodels.api and statsmodels.formula.api for stable high-level access.

When should you use statsmodels vs statistical-analysis?

statsmodels is for explicit model classes—OLS, GLM, discrete choice, ARIMA—with detailed residuals and inference. Use the separate statistical-analysis skill when you need guided test selection with APA-formatted reporting.

Is Statsmodels safe to install?

skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

Data Science & ML, research, automation