
SHAP diagram taken from the GitHub home page of Uber’s CausalML
In this brief blog post, I want to address a fundamental and glaring error being made by many Causal Inference (CI) practitioners, including Uber, Uplift Marketers, and many economists.
The error I am referring to is using decision tree methods (random forests, XGBoost, and the like) to calculate CATE/ATE (conditional and average treatment effects) without ever drawing a DAG (directed acyclic graph). To make the error even worse, this is followed by using SHAP to allegedly endow the results with explainability.
So why is this an error? Sidestepping and deprecating the use of a DAG, and conditioning on every covariate in sight, as advocated by Rubin and his economist acolytes (Joshua Angrist, Guido Imbens, Susan Athey, …), is a terrible CI doctrine. It sweeps the problem of good and bad controls under the rug, but hiding and ignoring this problem does not make it go away. Chances are that if you condition on everything in sight, you will condition on a collider (a covariate that is causally influenced by both the treatment and the outcome) and thereby introduce a bias into your ATE calculation. It also makes you condition on many more covariates than necessary. That is why I consider all CATE/ATE values calculated without a DAG to be highly suspect.
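To make the collider problem concrete, here is a minimal simulation (pure NumPy, with hypothetical variable names; the treatment is randomized for clarity, so the plain unadjusted contrast is the truth):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Ground-truth DAG: T -> Y (true ATE = 1.0) and T -> C <- Y (C is a collider).
t = rng.binomial(1, 0.5, n)              # treatment, randomized for clarity
y = 1.0 * t + rng.normal(size=n)         # outcome
c = t + y + rng.normal(size=n)           # collider: caused by both T and Y

# Correct estimate: with randomization, a plain difference of means suffices.
ate_unadjusted = y[t == 1].mean() - y[t == 0].mean()

# "Condition on everything in sight": regress Y on T *and* the collider C.
X = np.column_stack([np.ones(n), t, c])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
ate_collider_adjusted = beta[1]          # coefficient on T

print(f"unadjusted ATE        : {ate_unadjusted:.2f}")         # ~1.00 (correct)
print(f"collider-adjusted ATE : {ate_collider_adjusted:.2f}")  # ~0.00 (badly biased)
```

Even with a randomized treatment, “controlling” for the collider drives the estimated effect to roughly zero. With observational data and a mix of confounders and colliders, the damage is harder to spot, which is exactly why you need the DAG.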
So what is wrong with SHAP? SHAP is a game-theory-inspired software library that produces graphs that look like red and blue gummy worms on a skewer. These graphs are supposed to add explainability to the results of any machine learning algorithm. For me, SHAP is fake explainability. True explainability is given by the DAG.
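Here is what I mean by explainability from the DAG. Continuing the toy scenario above, but adding a confounder Z, a DAG-based library such as DoWhy (one possible choice, used here purely for illustration) can read the correct adjustment set straight off the graph:

```python
import numpy as np
import pandas as pd
from dowhy import CausalModel  # DoWhy is one possible DAG-based library

rng = np.random.default_rng(0)
n = 10_000
z = rng.normal(size=n)                        # confounder: Z -> T and Z -> Y
t = (z + rng.normal(size=n) > 0).astype(int)  # treatment
y = 1.0 * t + z + rng.normal(size=n)          # outcome (true ATE = 1.0)
c = t + y + rng.normal(size=n)                # collider, as before
df = pd.DataFrame({"Z": z, "T": t, "Y": y, "C": c})

# The causal assumptions, written down explicitly as a DAG (GML format).
gml = """graph [directed 1
  node [id "Z" label "Z"]
  node [id "T" label "T"]
  node [id "Y" label "Y"]
  node [id "C" label "C"]
  edge [source "Z" target "T"]
  edge [source "Z" target "Y"]
  edge [source "T" target "Y"]
  edge [source "T" target "C"]
  edge [source "Y" target "C"]
]"""

model = CausalModel(data=df, treatment="T", outcome="Y", graph=gml)
estimand = model.identify_effect()
print(estimand)  # backdoor adjustment set is {Z}; the collider C is excluded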
So who has been doing this?
- Uber’s “CausalML” software, which has 3.5K stars on GitHub. All you have to do is skim CausalML’s GitHub home page to verify that almighty Uber has been promoting this error for the last 2 years.
- Famous economist Susan Athey et al. (see https://arxiv.org/abs/1610.01271 and https://grf-labs.github.io/grf/).
- Uplift Marketers. Besides CausalML, which is promoted on its GitHub home page as an Uplift Marketing tool, see, for example, UpliftML by booking.com and PyLift by Wayfair.com.
One of CausalML’s developers (Totte Harinen) claims that Uplift Marketing is only used to stratify experimental data (RCTs, a.k.a. A/B tests). I agree that if you restrict yourself to experimental data, there is no good/bad control problem; in that case the per-stratum uplift is just a difference of means (see the sketch below). However, even then, a DAG gives true explainability whereas SHAP gives fake explainability.
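For concreteness, here is what stratifying experimental data looks like in its most minimal form (the segment names and response rates are hypothetical):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 50_000
segment = rng.choice(["new_user", "returning"], size=n)  # hypothetical strata
t = rng.binomial(1, 0.5, n)                              # randomized assignment (RCT)

# Hypothetical response rates: the treatment lifts new users by 5 points
# and returning users by 1 point, on a 10% baseline.
p = 0.10 + 0.05 * t * (segment == "new_user") + 0.01 * t * (segment == "returning")
y = rng.binomial(1, p)
df = pd.DataFrame({"segment": segment, "t": t, "y": y})

# Per-stratum uplift (CATE): a plain difference of means within each segment.
# Randomization is what licenses this; no good/bad-control problem arises.
uplift = df.groupby("segment").apply(
    lambda g: g.loc[g.t == 1, "y"].mean() - g.loc[g.t == 0, "y"].mean()
)
print(uplift)  # ~0.05 for new_user, ~0.01 for returning
```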
The problem I have with Totte’s argument is that the CausalML homepage asserts over and over again that you can use CausalML with both experimental data (RCTs) and observational data (e.g., surveys). Yet on Twitter he is now admitting that the observational CATE is “garbage”.
Quote from the CausalML homepage at GitHub:
An RCT is a really strong assumption. Saying one thing on the homepage and the opposite on Twitter is not very nice.
Comment by rrtucci — October 13, 2022 @ 5:35 pm
Comment by rrtucci — October 15, 2022 @ 12:12 am
[…] Explainability is still a hot topic in AI – with increasingly complicated models how can you generate understanding around why decisions are made and what the most important factors are. Disconcerting that errors have been found in some of the more well used approaches, including SHAP […]
Pingback by November Newsletter – Royal Statistical Society Data Science Section — November 7, 2022 @ 4:54 pm