Regression with R and Python: Description, Prediction, and Causal Inference

Forthcoming from Springer, 2026

This forthcoming textbook introduces regression analysis through three connected goals: description, prediction, and causal inference. It combines statistical foundations with practical implementation in both R and Python, and is designed for readers who want to connect regression theory with modern empirical practice.

The book is under a publishing agreement with Springer and is expected to be published in 2026.

Companion Code and Resources

The book has a companion GitHub repository with chapter workbooks and reproducible code examples:

GitHub repository: Regression_with_R_and_Python

The repository includes chapter-level RMarkdown/HTML workbooks, runnable R examples, Python appendix code, and dependency notes for readers who want to reproduce examples or use the book in teaching.

Book Structure

The book is organized around the progression from descriptive regression to predictive modeling and causal inference:

  1. Introduction: description, prediction, causality, and how these goals differ in empirical work.
  2. Covariation in data: Pearson correlation, Spearman rank correlation, and simple linear regression.
  3. Basic probability theory and statistical inference: probability, random variables, estimators, hypothesis testing, and confidence intervals.
  4. Correlation and inference about a population: sampling, population inference, and dependence beyond simple correlation.
  5. The simple linear regression model: coefficient interpretation, uncertainty, finite-sample inference, and asymptotic inference.
  6. Multiple linear regression: the general model, regression anatomy, partial regression plots, R-squared, F-tests, uncertainty in the estimated conditional expectation, and residual analysis.
  7. Nonlinear functional form: qualitative regressors, collinearity, logarithms, interactions, and an applied example on weather and air pollution.
  8. Regression analysis with dependent error terms: clustered data, panel data, fixed effects, and presentation of regression results.
  9. Binary dependent variables: logistic regression, maximum likelihood estimation, inference, and comparison with the linear probability model.
  10. Prediction: overfitting, train-test splits, cross-validation, high-dimensional predictors, tree-based regression, and a housing-price application.
  11. Nonparametric regression methods: regressograms and regression discontinuity as a nonparametric application.
  12. Time series analysis: components of macroeconomic time series, stationarity, and autoregressive regression.
  13. Causal analyses: randomized experiments, quasi-experiments, Rubin’s causal model, difference-in-differences, instrumental variables, and regression discontinuity.
  14. Key concepts: a synthesis of the central empirical and statistical ideas used throughout the book.
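
As a rough illustration of how the first two goals in this outline differ, the sketch below fits a simple linear regression twice on simulated data: once on the full sample to describe the association, and once on a training subset to check predictions on held-out observations. It is not taken from the book or its repository. The simulated data, the 150/50 split, and the helper names `fit_ols` and `mse` are all assumptions made for this example.

```python
import numpy as np


def fit_ols(x, y):
    """Return (intercept, slope) from a least-squares fit of y on x."""
    X = np.column_stack([np.ones_like(x), x])  # add a constant column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]


def mse(x, y, intercept, slope):
    """Mean squared error of the fitted line on the data (x, y)."""
    return float(np.mean((y - (intercept + slope * x)) ** 2))


# Simulated data (an assumption for illustration): y = 1 + 2x + noise.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

# Description: summarize the association using the full sample.
intercept, slope = fit_ols(x, y)

# Prediction: fit on the first 150 observations, evaluate on the last 50.
train = np.arange(n) < 150
test = ~train
b0, b1 = fit_ols(x[train], y[train])
test_mse = mse(x[test], y[test], b0, b1)

print(f"full-sample fit: intercept={intercept:.3f}, slope={slope:.3f}")
print(f"held-out MSE: {test_mse:.3f}")
```

The same fitted line serves both purposes here, but the questions differ: the descriptive fit is judged by how well it summarizes the sample, while the predictive fit is judged by its error on data it has not seen. Causal interpretation of the slope would need further assumptions that this sketch does not address.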

Appendices

The appendices provide supporting material for computation, mathematics, and econometric foundations:

  • Appendix A: The R Programming Language covers R fundamentals, probability distributions, tidyverse tools, and API-based data access.
  • Appendix B: The Python Programming Language covers Python fundamentals, probability distributions, the Python data ecosystem, and API-based data access.
  • Appendix C: Mathematical appendix covers matrix algebra, probability theory, large-sample theory, and optimization basics.
  • Appendix D: Ordinary Least Squares (OLS) develops OLS algebra, inference with experimental data, stochastic covariates, and dependent error terms.
  • Appendix E: Generalized Least Squares (GLS) covers GLS for cross-sectional data, panel and clustered data, and weighting under stratified probability sampling.
  • Appendix F: Instrumental variables and 2SLS covers endogeneity, instruments, first-stage relevance, reduced forms, two-stage least squares, inference, and weak instruments.

Full book materials are not hosted on this website due to copyright restrictions.

Recommended citation: Johansson, P., & Sun, J. (forthcoming 2026). "Regression with R and Python: Description, Prediction, and Causal Inference." Springer.