Did You Achieve Balance?! Part II 

 Continuing from yesterday's post, another popular way to test balance is to examine standardized differences (SDIFF) between groups (Rosenbaum and Rubin 1985). The SDIFF captures the difference in means in the matched samples, scaled by the square root of the average variance in the unmatched groups. This test has been criticized for lacking formal criteria for judging the size of the standardized bias. Moreover, it may be open to manipulation, as one can add observations to the control group in order to decrease the variance in the denominator (Smith and Todd 2005).
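 To make the definition concrete, here is a minimal sketch of the SDIFF for a single covariate, following the verbal description above. The function name, the argument names, the scaling by 100, and the toy numbers are my own assumptions for illustration, not taken from any particular software package.

```python
from math import sqrt
from statistics import mean, variance


def standardized_difference(treated_matched, control_matched,
                            treated_full, control_full):
    """Standardized difference (in percent) for one covariate.

    Numerator: difference in means in the matched samples.
    Denominator: square root of the average of the two sample
    variances in the full (unmatched) groups.
    """
    scale = sqrt((variance(treated_full) + variance(control_full)) / 2)
    return 100 * (mean(treated_matched) - mean(control_matched)) / scale


# Hypothetical toy data: all treated units are matched, and controls
# are matched with replacement.
treated_full = [1, 2, 3, 4, 5]
control_full = [0, 2, 4, 6, 8]
control_matched = [0, 2, 2, 4, 4]

before = standardized_difference(treated_full, control_full,
                                 treated_full, control_full)
after = standardized_difference(treated_full, control_matched,
                                treated_full, control_full)
print(round(before, 1), round(after, 1))
```

 Note that the denominator is the same before and after matching, so only the difference in means can move the statistic. That is also why changing the composition of the unmatched control group changes the SDIFF without changing the matched samples at all.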

 Staying in the realm of univariate balance tests, some claim that difference-in-means tests are insufficient and that Kolmogorov-Smirnov (KS) tests are needed to test non-parametrically for the equality of distributions (Diamond and Sekhon 2005). These KS tests need to be bootstrapped, by the way, to yield correct coverage in the presence of point masses in the distributions of the covariates (Abadie 2002). Again, these tests would substantially raise the balance hurdle. Are they necessary for reliable causal inference?
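 For readers who have not seen a bootstrapped KS test, here is a rough sketch of one common variant: resample from the pooled data with replacement, split the draw into two groups of the original sizes, and count how often the resampled KS statistic is at least as large as the observed one. This is my own simplified illustration, not the exact procedure of any cited paper or package; the function names, the number of replications, and the seed are assumptions.

```python
import random
from bisect import bisect_right


def ks_statistic(x, y):
    """Largest absolute gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    points = set(xs) | set(ys)
    return max(abs(bisect_right(xs, v) / len(xs)
                   - bisect_right(ys, v) / len(ys)) for v in points)


def ks_bootstrap_pvalue(x, y, n_boot=1000, seed=0):
    """Bootstrap p-value for the null of equal distributions.

    Resamples from the pooled data, so ties (point masses) appear in
    the resampled data sets just as they do in the original data.
    """
    rng = random.Random(seed)
    observed = ks_statistic(x, y)
    pooled = list(x) + list(y)
    n_x = len(x)
    exceed = 0
    for _ in range(n_boot):
        draw = rng.choices(pooled, k=len(pooled))
        if ks_statistic(draw[:n_x], draw[n_x:]) >= observed:
            exceed += 1
    return exceed / n_boot


# Hypothetical binary covariate (a distribution made of point masses).
treated = [0] * 12 + [1] * 8
control = [0] * 10 + [1] * 10
print(ks_statistic(treated, control))
print(ks_bootstrap_pvalue(treated, control))
```

 With a binary covariate like this one, the KS statistic is just the absolute difference in proportions, which is exactly the kind of case where the usual continuous-distribution p-values are not appropriate.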


 Apart from univariate tests, there are also several multivariate balance tests floating around in the literature: the Hotelling T^2 test of the joint null of equal means across all covariates, multivariate (bootstrapped) Kolmogorov-Smirnov (KS) tests and chi-square null deviance tests based on the estimated assignment probabilities, and various regression-based tests for joint insignificance. Which of these tests is preferable in which situation? What is the relationship between uni- and multivariate balance?
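 As a reminder of what the first of these tests computes, here is the textbook two-sample Hotelling statistic. The notation is my own assumption: n_1 and n_2 are the group sizes, p is the number of covariates, \bar{x}_1 and \bar{x}_2 are the covariate mean vectors, and S_1 and S_2 are the within-group sample covariance matrices.

```latex
S_{\text{pooled}} = \frac{(n_1 - 1)\, S_1 + (n_2 - 1)\, S_2}{n_1 + n_2 - 2}

T^2 = \frac{n_1 n_2}{n_1 + n_2}\,
      (\bar{x}_1 - \bar{x}_2)^{\top} S_{\text{pooled}}^{-1} (\bar{x}_1 - \bar{x}_2)

\frac{n_1 + n_2 - p - 1}{(n_1 + n_2 - 2)\, p}\; T^2 \;\sim\; F_{p,\; n_1 + n_2 - p - 1}
\quad \text{under the null of equal mean vectors}
```

 The F reference distribution relies on multivariate normality and a common covariance matrix in both groups, and the test speaks only to means, not to other features of the covariate distributions.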

 Last but not least, there is the thorny question of significance levels. Is a p-value of 0.10, say against the null of equality of means, high enough for satisfactory balance? Is 0.05 permissible? There is evidence that conventional significance standards are too lenient to obtain reliable causal inference in the canonical LaLonde data set (Diamond and Sekhon 2005).

 These are many questions, and I do not know the answers to them. The current lack of a scholarly standard for covariate balance strikes me as troubling, because balance affects the quality of the causal inferences we draw. I think it is important to bring the balance issue to the forefront of the matching debate. That is why Jas Sekhon and I are currently working on a paper on this topic. Suppose you are reviewing a matching article. What does it take to convince you that the authors "achieved balance"? You are cordially invited to join the debate.