![[noise-fischer-black.png]]

**Causal inference**

The abstract above is from Fischer Black's 1986 article "Noise". Black goes on to say:

*"Even highly trained people, though, seem to make certain kinds of errors consistently. For example, there is a strong tendency in looking at data to assume that when two events frequently happen together, one causes the other. There is an even stronger tendency to assume that the one that occurs first causes the one that occurs second. These tendencies are easy to resist in the simplest cases. But they seem to creep back in when econometric studies become more complex. Sometimes I wonder if we can draw any conclusions at all from the results of regression studies."*

Finding a link between a cause and its effect in the midst of noise is difficult.

**All kinds of analytics**

With recent advances in computational power, algorithms, and the collection, transfer, and analysis of large data sets, it is easier than ever to train models and make predictions with them. Given enough data, and with the help of almost off-the-shelf libraries, one can model data without having to think deeply about the problem. This is most evident with nonparametric methods, where formulating the problem receives little attention and the meaning and usefulness of method-dependent metrics such as variable importance are easily overstated. But even with parametric methods, one can develop and use models for decision making without checking whether the assumptions hold.

Conceptual models and assumptions go hand in hand. For example, a regression model with a statistically significant interaction between brightness and a weekday indicator, used on its own, may lead to the erroneous conclusion that increasing the brightness of a retail store during the week will increase sales more than doing so on weekends. If we rely on such a model alone and increase store brightness during the week, we may be making a costly but not necessarily beneficial intervention. Could XGBoost help?
Absolutely not. The missing piece is the conceptual model and its underlying assumptions, not a better curve-fitting method. There are at least two distinct questions here:

1\. What is the relationship between brightness and sales in a retail store on weekdays versus weekends? *This question seeks a correlation and can be answered using the regression. The positive interaction term indicates that the association between brightness and sales is stronger on weekdays.*

2\. Why is the effect of brightness on sales greater during the week than on the weekend? *This is a causal question, and the regression model alone cannot answer it. It could be that shoppers are in a hurry during the week and tend to make more purchases under brighter lighting. It could also be that weekday shoppers visit stores after work and the effect of brightness is greater because it is dark outside. Another reason could be that weekday shoppers are older and need more lighting than weekend shoppers.*

Some of this is noise and some of it is a true cause-and-effect relationship. The regression (or XGBoost) model alone is not capable of eliminating the noise and identifying the true relationship. That is why it cannot answer the second question. However, the same regression or XGBoost model can answer the question if a causal design pattern is applied: a conceptual model and some assumptions, along with the necessary data (either existing observational data or experimental data).

This book is a curated set of design patterns for causal inference, with each pattern applied using a variety of methods in three approaches: Statistics (very narrowly defined), Machine Learning, and Bayesian. Each design pattern is supported by business cases that use available data for causal inference. The three approaches are compared using the same data and model.

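
To make the brightness example above concrete, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the data-generating process (brightness helps mainly when it is dark outside, and weekday shoppers mostly arrive after dark), the coefficients, and the small `ols` helper, which stands in for a regression library so that the block runs with the standard library alone.

```python
import random


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


random.seed(42)
data = []
for _ in range(5000):
    weekday = 1.0 if random.random() < 5 / 7 else 0.0
    # Assumption: weekday shoppers mostly visit after work, when it is dark.
    dark = 1.0 if random.random() < (0.8 if weekday else 0.2) else 0.0
    brightness = random.uniform(0.0, 1.0)
    # Assumed mechanism: brightness matters more when it is dark outside;
    # the day of the week has no effect of its own.
    sales = 10.0 + 1.0 * brightness + 3.0 * brightness * dark + random.gauss(0.0, 1.0)
    data.append((weekday, dark, brightness, sales))

y = [s for _, _, _, s in data]

# Question 1 (correlational): sales ~ brightness * weekday
X_naive = [[1.0, b, w, b * w] for w, _, b, _ in data]
naive = ols(X_naive, y)

# Same regression, plus the variable the conceptual model points to
X_full = [[1.0, b, w, b * w, d, b * d] for w, d, b, _ in data]
full = ols(X_full, y)

print("brightness x weekday, naive model:", round(naive[3], 2))
print("brightness x weekday, with darkness:", round(full[3], 2))
print("brightness x dark, with darkness:", round(full[5], 2))
```

In this simulation the naive model reports a clearly positive brightness-by-weekday interaction even though the day of the week does nothing by itself. Once darkness enters the model, that interaction shrinks toward zero and the brightness-by-darkness term carries the effect. The regression did not discover this: darkness was added because a conceptual model suggested it, and with real data the competing explanations (hurried shoppers, older shoppers) would each need their own assumptions and data.
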
> [!NOTE]
> **What is a design pattern?**
> A (software) design pattern is a general, reusable solution to a common problem in a given (software) design context. It is not a finished design that can be transformed directly into code. Rather, it is a description or template for solving a problem that can be used in many different situations. Design patterns are formalized best practices that can be used to solve common problems.^[[Software design pattern - Wikipedia](https://en.wikipedia.org/wiki/Software_design_pattern)]
>
> **And why design patterns?**
> Early in my career, I was a programmer using C# and then Java. Our most valuable resources back then were design patterns. I still have a copy of the book _Head First Design Patterns: A Brain-Friendly Guide_ on my bookshelf from 20 years ago. It was a lifesaver when I moved from C# to Java. This is a tribute to those days.

**What is the goal here, beyond the design pattern idea?**

This project is not meant to replace the many great books on causal inference methods. Instead, it strives to accomplish two goals in an interactive format that presents all of the book's content as a network of ideas, concepts, and applications:

1. To present solutions to problems requiring causal inference that share the same conceptual model (on the same data, sometimes using the same estimators) but use different approaches: Statistics (in a very narrow sense), Machine Learning, and Bayesian. The applications include complete R and Python code, and in some cases there are comparative evaluations of the two.
2. To discuss the intricacies of modeling data for causal inference, along with lesser-known and sometimes puzzling situations encountered along the way, such as larger coefficients in IV models, negative R-squared values, conflicting standard error corrections, or seemingly unexpected multimodal posterior distributions. These are some of the more interesting cases encountered when applying the patterns.

**Got it, where can I start?**

[[Table of Contents]] would be a good place to start.

---

*Causal Book is a work in progress. I would appreciate it if you contact me [here](https://www.linkedin.com/in/gtozer/) for anything related to this project (comment, typo, question, bug, anything). G.T.*^[*Gorkem Turgut Ozer © 2025*]

> [!info]- Last updated: October 31, 2024