Don't Use Hypothesis Tests for Balance 

Jens' last two blog posts constitute an excellent statement of where the literature on matching is, but I think almost all of the literature has this point wrong. Hypothesis tests for checking balance in matching are in fact (1) unhelpful at best and (2) usually harmful.

Suppose you had a control group and a treatment group that are identical (exactly matched) except for one person, or except for a bunch of people in one very minor way. Suppose hypothesis tests indicate no difference between the groups, so you'd be in the position of reporting that balance was great and no further adjustment was needed. (We might think of this as a real experiment where the outcome variable hasn't been collected yet and is expensive to collect.) If you were given the chance to drop the one or few people who caused the two groups to differ and replace them with others who matched exactly, would you do so? Since the dimension on which the inexact match or matches occurred might be the one that has a huge effect on your outcome variable, the bias due to not switching could be huge. So you'd undoubtedly make the switch, despite the fact that the hypothesis test indicated that there was no problem. Hence (1) the tests are unhelpful: passing the test does not necessarily protect you from bias any more than failing it does.

Now suppose you have data that don't match very well according to every hypothesis test, and you drop observations randomly (rather than systematically to improve matching), in a bad application of matching. What will happen? Your t-tests or KS tests or any other hypothesis tests will lose power and so will indicate that balance is getting better and better. Yet bias is not changing at all, and efficiency is dropping fast. The tests are telling you to discard data! Hence (2) hypothesis tests to evaluate balance are harmful, quite seriously so.
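To make the random-dropping argument concrete, here is a small simulation sketch in Python. Everything in it is an assumption chosen for illustration and not taken from the paper: a single normally distributed covariate, a true mean shift of 0.3 between groups, the subsample sizes, and the hypothetical function names. It repeatedly keeps a random subset of each group and records the covariate mean difference alongside a Welch t statistic.

```python
import math
import random
import statistics


def mean(xs):
    """Arithmetic mean using exactly rounded summation."""
    return math.fsum(xs) / len(xs)


def sample_variance(xs):
    """Unbiased sample variance (divides by n - 1)."""
    m = mean(xs)
    return math.fsum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def welch_t(a, b):
    """Welch two-sample t statistic for the difference in means."""
    se = math.sqrt(sample_variance(a) / len(a) + sample_variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def random_drop_experiment(n=2000, shift=0.3, keep_sizes=(2000, 500, 100, 25),
                           reps=200, seed=0):
    """Randomly keep k units per group and summarize balance over reps.

    Returns one tuple per k:
    (k, average mean difference, spread of mean difference, average |t|).
    All defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    # One imbalanced covariate: treated units are shifted upward.
    treated = [rng.gauss(shift, 1.0) for _ in range(n)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]

    rows = []
    for k in keep_sizes:
        diffs, abs_ts = [], []
        for _ in range(reps):
            # Random (not systematic) dropping: keep k units from each group.
            t_sub = rng.sample(treated, k)
            c_sub = rng.sample(control, k)
            diffs.append(mean(t_sub) - mean(c_sub))
            abs_ts.append(abs(welch_t(t_sub, c_sub)))
        rows.append((k, mean(diffs), statistics.stdev(diffs), mean(abs_ts)))
    return rows


if __name__ == "__main__":
    print("kept  avg_diff  sd_diff  avg_abs_t")
    for k, avg_diff, sd_diff, avg_abs_t in random_drop_experiment():
        print(f"{k:5d}  {avg_diff:8.3f}  {sd_diff:7.3f}  {avg_abs_t:9.2f}")
```

Under these assumptions, the average mean difference should stay near the full-sample value as observations are discarded, its spread should grow, and the average |t| should shrink toward values a test would call "balanced". The imbalance itself has not improved; only the power to detect it has gone.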

 The fact is that there is no superpopulation to which we need to infer features of the explanatory variables; all analysis models we regularly use after matching are conditional on X.  Balance should be assessed on the observed data, and not be the subject of inference or hypothesis tests.

This message rehearses an argument in a to-be-revised version of our matching paper by Ho, Imai, King, and Stuart, which we hope to finish and post in a couple of weeks.