A Handy Trick for Multiple Imputation of Categorical Data 

As an applied researcher, I've often encountered missing-data problems where the data are categorical. This raises issues because most standard multiple imputation packages assume a multivariate normal (MVN) distribution, which is generally a poor fit for binary and categorical variables.

The standard shortcut for overcoming this problem is to impute under the MVN assumption anyway, then round the continuous imputed values back to valid categories. But as Recai Yucel, Yulei He, and Alan Zaslavsky point out in their May 2008 article in The American Statistician, naive rounding can bias estimates, particularly when the underlying data are asymmetric or multimodal.
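A toy illustration of the problem (my own sketch, not from the paper): suppose the continuous imputations for a binary variable with true rate 0.2 come out roughly normal with standard deviation 0.4 (both numbers are invented for the sketch). Naively rounding at 0.5 then systematically overstates the rate of 1's.

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = 0.2   # true proportion of 1's (assumed for this sketch)
sd = 0.4       # assumed spread of the continuous MVN-style imputations

# Pretend these are the continuous imputed values for the missing entries
imp = rng.normal(p_true, sd, 100_000)

# Naive rounding: everything at or above 0.5 becomes a 1
rounded_rate = np.mean(imp >= 0.5)
print(rounded_rate)  # noticeably above the true rate of 0.2
```

The bias here depends on the spread of the imputations relative to the 0.5 threshold, which is exactly why a fixed cutoff can misbehave for asymmetric data.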

So what should the applied researcher do when multiply imputing categorical data?  The authors propose a calibration method: duplicate the original data, but in the duplicate set the observed values of the variable of interest to missing.  Stack the original and duplicated data and carry out the imputation on the stacked dataset.  Because the duplicated rows have known true values, comparing the fraction of 1's among their imputed versions (Y_obs(dup)) with the fraction of 1's in the originally observed data (Y_obs) lets one find the appropriate cutoff (c); imputed continuous values at or above c are assigned 1, and the rest 0.
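The cutoff step can be sketched in a few lines. This is my own minimal rendering of the idea, not the authors' code: since we want the fraction of the duplicated-then-imputed values at or above c to equal the observed fraction of 1's p, c is simply the (1 − p) quantile of those imputed values.

```python
import numpy as np

def calibrated_cutoff(y_obs, y_dup_imputed):
    """Cutoff c such that the share of duplicated-then-imputed values
    at or above c matches the observed share of 1's in y_obs."""
    p1 = np.mean(y_obs)                        # fraction of 1's actually observed
    return np.quantile(y_dup_imputed, 1 - p1)  # (1 - p1) quantile

def calibrated_round(y_imputed, c):
    """Assign 0/1 to continuously imputed values using the cutoff c."""
    return (y_imputed >= c).astype(int)

# Example: observed data are 30% 1's; pretend the stacked imputation
# returned these continuous values for the duplicated rows.
y_obs = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
y_dup_imputed = np.linspace(0, 1, 100)
c = calibrated_cutoff(y_obs, y_dup_imputed)
print(np.mean(calibrated_round(y_dup_imputed, c)))  # ~0.30 by construction
```

In practice the same cutoff c would then be applied to the continuously imputed values of the genuinely missing entries in the original data.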

This is a neat technique, and it benefits from being very easy to implement in practice.  In any case, check out the full paper for more details on the method.