Saturday, September 20, 2025

Book Review: Statistics Every Programmer Needs

I recently read Statistics Every Programmer Needs by Gary Sutton. I am probably a good target audience for the book: I was a software developer who transitioned into data science some 10 years ago, then into machine learning with neural networks and transformers, and more recently into Generative AI with Large Language Models. Along the way, I have read numerous books on statistics in an effort to pick up what I didn't know (being largely self-taught, there is plenty I didn't and still don't know). I think this book stands out not only as a thorough and practical introduction to statistics, but also for its coverage of areas one would normally consider peripheral to statistics yet still useful in practical data science scenarios, such as Linear Programming and PERT/CPM.

The book takes a very hands-on approach to each area, starting with business problems often faced by programmers and outlining how statistical techniques pertinent to that area can be used to address them. It starts with foundational concepts but goes on to cover advanced material across statistics, machine learning, optimization, and project management. The book is organized into the following 14 chapters.

The Foundation (Chapter 1) begins by laying a solid groundwork. Readers are introduced to core statistical concepts, both descriptive (mean, mode, median) and inferential (confidence intervals, p-values), ensuring they grasp the basics before progressing. The inclusion of regression, optimization, simulation, and machine learning in the foundational chapter sets the tone for the book’s broad scope.

Probability and Counting Principles (Chapter 2) covers continuous and discrete variables and how they differ, permutations, combinations, and key probability functions (PDF, PMF, CDF). What I found interesting was how permutations and combinations are derived from basic probability concepts.
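
As a quick refresher (my own sketch, not code from the book), Python's standard library makes these counts easy to verify:

```python
import math

# Permutations: ordered arrangements of k items chosen from n.
# P(n, k) = n! / (n - k)!
print(math.perm(5, 2))  # 20 ordered pairs from 5 items

# Combinations: unordered selections, so divide out the k! orderings.
# C(n, k) = n! / (k! * (n - k)!)
print(math.comb(5, 2))  # 10 unordered pairs from 5 items
```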

Probability Distributions (Chapter 3) covers the essential probability distributions—Gaussian, Binomial, Uniform, and Poisson. This chapter also covers conditional probability and Bayes’ rule with various practical applications.
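
The book's worked examples are its own, but the mechanics of Bayes' rule are simple enough to sketch. Here is a minimal illustration with a classic screening-test setup (the numbers are mine, not the book's):

```python
# Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)
# Hypothetical screening test: 1% prevalence, 99% sensitivity,
# 5% false-positive rate (illustrative numbers, not from the book).
prior = 0.01           # P(disease)
sensitivity = 0.99     # P(positive | disease)
false_positive = 0.05  # P(positive | no disease)

# Total probability of a positive result, by the law of total probability.
p_positive = sensitivity * prior + false_positive * (1 - prior)

# Posterior probability of disease given a positive test.
posterior = sensitivity * prior / p_positive
print(round(posterior, 3))  # ~0.167: most positives are still false alarms
```

The counterintuitive result (a positive test still means only a ~17% chance of disease) is exactly the kind of practical insight conditional probability delivers.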

Chapters 4 & 5 cover Linear and Logistic Regression respectively. Bonus material here (which I didn't expect to see) includes discussions of data normalization, residual analysis, and multicollinearity. Model evaluation is covered in depth for both varieties of models, along with the popular metrics used to evaluate them.

Chapter 6 covers Decision Trees and Random Forests, the next major category of traditional ML models. The book has a solid introduction to both, including how to interpret feature importance and use Gini impurity measures. I had hoped for some coverage of Gradient Boosted Trees since we were already discussing trees, but maybe that will come in the next edition.

Time Series Analysis (Chapter 7) is tackled with impressive depth; usually I would expect this subject to have its own book. The author nonetheless does a good job of providing a genuinely useful introduction to Time Series, covering forecasting, ARIMA models, exponential smoothing, stationarity testing (including the Augmented Dickey-Fuller test), trends, and seasonality. The chapter's coverage of ACF/PACF plots and the different exponential smoothing models (SES, DES, Holt-Winters) is thorough, making it a valuable reference for people working with temporal data and autoregressive models.
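
To give a flavor of the smoothing models the chapter covers, here is a from-scratch sketch of simple exponential smoothing (SES) — my own minimal implementation, assuming the standard recurrence s_t = α·x_t + (1−α)·s_{t−1}, not the book's code:

```python
def ses(series, alpha):
    """Simple exponential smoothing: each smoothed value is a weighted
    average of the newest observation and the previous smoothed value."""
    smoothed = [series[0]]  # initialize with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

demand = [10, 12, 13, 12, 15, 16]
print(ses(demand, alpha=0.5))
```

The last smoothed value doubles as the one-step-ahead forecast; DES and Holt-Winters extend the same idea with trend and seasonal terms.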

Chapter 8 covers Optimization using Linear Programming, an area I would expect to see covered in a book on Operations Research rather than Statistics. But the coverage is practical and complete, focusing on modeling business problems as optimization problems and solving them with the Linear Programming routines in scipy.optimize.
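
As a taste of the style, here is the kind of model such a chapter works through, sketched with scipy.optimize.linprog (a toy product-mix problem of my own, not one of the book's examples):

```python
from scipy.optimize import linprog

# Toy product-mix problem (my own numbers): maximize profit 20x + 30y
# subject to resource constraints x + 2y <= 40 and 3x + y <= 30.
# linprog minimizes, so negate the objective coefficients.
c = [-20, -30]
A_ub = [[1, 2], [3, 1]]
b_ub = [40, 30]

result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(result.x)     # optimal production quantities
print(-result.fun)  # maximum profit
```

The translation step — business constraints into `A_ub`/`b_ub` rows — is where most of the modeling effort goes, and it is the part the chapter focuses on.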

Chapter 9 covers Simulation using Monte Carlo techniques. As before, not something I would have expected in a Statistics book, but definitely a useful tool to have in one's Data Science toolbox. As with the other chapters, multiple business scenarios are described and modeled with probability distributions, and Monte Carlo simulations are run on them to elicit useful insights.
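
The pattern in these chapters — model uncertain inputs as distributions, sample many times, summarize the outcomes — fits in a few lines of pure Python. A toy scenario of my own (not one from the book):

```python
import random

random.seed(42)  # reproducible runs

def simulate_day():
    """One simulated day: units sold follow a triangular distribution,
    unit cost is normally distributed, selling price is fixed at $10."""
    units = random.triangular(80, 120, 100)  # low, high, mode
    unit_cost = random.gauss(6.0, 0.5)
    return units * 10.0 - units * unit_cost

trials = [simulate_day() for _ in range(10_000)]

# Estimate the probability that a day clears $420 in profit.
p_over_420 = sum(t > 420 for t in trials) / len(trials)
print(round(p_over_420, 3))
```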

Decision Methods and Markov Analysis (Chapters 10 & 11) cover Decision-making frameworks (maximax, maximin, minimax regret, expected value decision trees) and Markov analysis (transition probabilities, equilibrium, and absorbing states). Taken together, they could serve as a gateway for deeper explorations into Bayesian Networks and other Probabilistic Graphical Models.
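
Equilibrium probabilities in particular are easy to demonstrate in a few lines — here is a minimal power-iteration sketch of my own, with a made-up two-state transition matrix:

```python
# Hypothetical two-state Markov chain; each row of P holds the
# transition probabilities out of a state and sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]

# Power iteration: repeatedly apply P until the state distribution
# stops changing -- that fixed point is the equilibrium distribution.
pi = [1.0, 0.0]  # start entirely in state 0
for _ in range(200):
    pi = [sum(pi[i] * P[i][j] for i in range(2)) for j in range(2)]

print([round(p, 4) for p in pi])  # approaches [5/6, 1/6]
```

Solving pi = pi·P algebraically gives the same answer; iteration just makes the "long-run behavior" interpretation concrete.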

The chapter on Benford's Law (Chapter 12) for fraud detection is another unique touch, introducing readers to mantissa statistics. So is the chapter on Project Management (Chapter 13), which presents quantitative methods in project management (WBS, PERT, CPM, critical path) with actionable insights, bridging the gap between theory and project execution.
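
Benford's Law itself fits in one line: the expected frequency of leading digit d is log10(1 + 1/d). A quick sketch (mine, not the book's) checking it against the powers of 2, a sequence known to follow the law:

```python
import math
from collections import Counter

# Benford's Law: P(leading digit = d) = log10(1 + 1/d)
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# First digits of 2^1 .. 2^1000, a sequence known to be Benford-distributed.
digits = [int(str(2 ** n)[0]) for n in range(1, 1001)]
observed = {d: c / len(digits) for d, c in Counter(digits).items()}

for d in range(1, 10):
    print(d, round(benford[d], 3), round(observed.get(d, 0), 3))
```

Fraud detection applies the same comparison in reverse: invented figures (expenses, invoices) tend to have leading digits far from this distribution.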

The concluding chapter on Statistical Quality Control (Chapter 14) is packed with practical content—control charts (p, np, c, g, etc.), UCL/LCL, and key metrics—making it invaluable for readers in manufacturing, operations, or quality assurance roles.
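
The control-limit formulas are straightforward to apply; for instance, a p-chart's limits are p̄ ± 3·sqrt(p̄(1−p̄)/n). A quick sketch with made-up inspection data (my own, not the book's):

```python
import math

# Hypothetical inspection data: defectives found in samples of n = 200.
defectives = [8, 12, 10, 9, 14, 7, 11, 10]
n = 200

# Center line: overall defective proportion across all samples.
p_bar = sum(defectives) / (n * len(defectives))

# 3-sigma control limits for a p-chart (LCL floored at zero,
# since a proportion cannot be negative).
sigma = math.sqrt(p_bar * (1 - p_bar) / n)
ucl = p_bar + 3 * sigma
lcl = max(0.0, p_bar - 3 * sigma)

print(round(p_bar, 4), round(lcl, 4), round(ucl, 4))
```

A sample proportion falling outside [LCL, UCL] signals that the process may be out of statistical control and worth investigating.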

The book is ambitious in scope but succeeds in providing both breadth and depth, managing to hit all the high points without sacrificing quality in any of them. As I mentioned earlier, its coverage goes beyond just statistics, making it a bargain: you get useful statistics and quantitative techniques from a single book. I found both areas to be described in a very hands-on, example-driven manner, often highlighting concepts and metrics that are overlooked in more traditional texts, making it a useful reference for software professionals (DS and non-DS alike).
