Beyond Standard Errors, Part II: What Makes an Inference Likely to Survive Rosenbaum-Type Sensitivity Tests?

Continuing from my previous post on this subject: sensitivity tests are still somewhat rarely (yet increasingly) used in applied research. This is unfortunate, I think, because, at least according to my own tests on several datasets, observational studies vary considerably in their sensitivity to hidden bias. Some results go away once you allow for only a tiny amount of hidden bias; others are rock solid, weathering even the strongest hidden bias. One should always give the reader this information, I think.
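To make "a tiny amount of hidden bias" concrete: in Rosenbaum's framework the amount of hidden bias is usually summarized by a single parameter Γ ≥ 1, which bounds how much two units with identical observed covariates may differ in their odds of receiving treatment (writing π_i for unit i's probability of treatment):

```latex
\frac{1}{\Gamma} \;\le\; \frac{\pi_i\,(1-\pi_j)}{\pi_j\,(1-\pi_i)} \;\le\; \Gamma
\qquad \text{whenever } x_i = x_j .
```

Γ = 1 is the randomized experiment; a result that collapses just above Γ = 1 is the fragile kind, while one that still holds at, say, Γ = 2 or beyond is what I would call rock solid.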


One reason (and maybe not the most important one) why these tests are infrequently used is that they take time and effort to compute. So I was thinking: instead of computing the sensitivity tests each time, maybe it would be good to have some quick rules of thumb for judging whether a study is insensitive to hidden bias.
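For matched pairs, the computation itself is not terribly heavy once you settle on a test statistic. Here is a minimal sketch of the usual large-sample Rosenbaum bounds for the Wilcoxon signed-rank test; this is my own illustration, not code from any existing package, and the function name and arguments are just placeholders:

```python
import numpy as np
from scipy.stats import norm, rankdata

def rosenbaum_bounds(treated, control, gamma=1.0):
    """Bounds on the one-sided p-value of the Wilcoxon signed-rank test
    for matched pairs when hidden bias may shift the odds of treatment
    by up to a factor of `gamma` (large-sample normal approximation)."""
    d = np.asarray(treated, dtype=float) - np.asarray(control, dtype=float)
    d = d[d != 0]                       # zero differences carry no sign information
    q = rankdata(np.abs(d))             # ranks of the absolute differences
    T = q[d > 0].sum()                  # observed signed-rank statistic

    def one_sided_p(p_sign):
        # Null distribution of T when each pair is "positive" with prob p_sign
        mean = p_sign * q.sum()
        var = p_sign * (1.0 - p_sign) * (q ** 2).sum()
        return 1.0 - norm.cdf((T - mean) / np.sqrt(var))

    upper = one_sided_p(gamma / (1.0 + gamma))  # bias working against the finding
    lower = one_sided_p(1.0 / (1.0 + gamma))    # bias working in its favour
    return lower, upper
```

At gamma = 1 the two bounds coincide (approximately) with the ordinary signed-rank p-value; as gamma grows the interval widens, and the upper bound is the worst-case p-value that a hidden confounder of that strength could produce.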

Imagine you have two studies with identical estimated effect sizes and standard errors. Which one would you trust more with respect to insensitivity to hidden bias? In other words, are there particular features of the data that make an inference drawn from it more likely to excel on Rosenbaum-type sensitivity tests? The literature I have read thus far provides little guidance on this issue.
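One way to make that comparison operational is to reduce each study to a single number: the largest Γ at which the worst-case p-value still stays below the chosen level. A rough sketch under the same matched-pairs setup as above; again the names are illustrative, and `diffs_study_a` / `diffs_study_b` stand in for your own vectors of pair differences:

```python
import numpy as np
from scipy.stats import norm, rankdata

def worst_case_pvalue(diffs, gamma):
    """Worst-case (upper-bound) one-sided signed-rank p-value under
    hidden bias of at most `gamma` (normal approximation)."""
    d = np.asarray(diffs, dtype=float)
    d = d[d != 0]
    q = rankdata(np.abs(d))
    T = q[d > 0].sum()
    p = gamma / (1.0 + gamma)
    mean, var = p * q.sum(), p * (1.0 - p) * (q ** 2).sum()
    return 1.0 - norm.cdf((T - mean) / np.sqrt(var))

def sensitivity_value(diffs, alpha=0.05, gammas=np.arange(1.0, 10.01, 0.05)):
    """Largest gamma on the grid at which the study still rejects at alpha."""
    surviving = [g for g in gammas if worst_case_pvalue(diffs, g) < alpha]
    return max(surviving) if surviving else None

# Two studies with near-identical estimates and standard errors can then be
# compared directly, e.g.:
# sensitivity_value(diffs_study_a) vs. sensitivity_value(diffs_study_b)
```

Whichever study has the larger such value is the one whose conclusion survives more hidden bias, however similar the headline estimates look.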

We have a few ideas about this (still underdeveloped). For example, ceteris paribus, one could think it is better to have a rather imbalanced vector of treatment assignments (only a few treated or only a few controls). Another idea is that, ceteris paribus, inferences obtained from a smaller (matched) dataset should be less prone to being knocked over by hidden-bias tests. Or, in the case of propensity score methods, one would like covariates that strongly predict treatment assignment, so that an omitted variable cannot tweak the results much.

This is still very much a work in progress; comments and feedback are highly appreciated.