Coarsened at Random 

 I’m the “teaching fellow” (the “teaching assistant” everywhere but Harvard, which has to have its lovely little quirks:  “Spring” semester beginning in February, anyone?) for a course in missing data this semester, and in a recent lecture, an interesting concept came up:  coarsened at random. 

 Suppose you have a dataset in which you know or suspect that some of your data values are rounded.  For example, ages of youngsters might be given to the nearest year or half-year.  Or perhaps in a survey, you’ve gotten some respondents’ incomes only within certain ranges.  Then the data have been “coarsened” in the sense that you know that the true value is within a certain range, but you don’t know where within that range. 
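 To make that concrete, here is a minimal sketch of how coarsened values can be represented as intervals.  The function names, the half-year step, and the income brackets are all made up for illustration; nothing here comes from a real survey.

```python
import math

def coarsen_age(true_age, step=0.5):
    """Interval of true ages consistent with an age reported to the nearest `step` years.

    Hypothetical helper: rounds half up, so a reported 3.0 stands for [2.75, 3.25).
    """
    reported = math.floor(true_age / step + 0.5) * step
    return (reported - step / 2, reported + step / 2)

def coarsen_income(true_income, brackets=(0, 25_000, 50_000, 100_000, math.inf)):
    """Income bracket [lo, hi) containing the true income.

    The bracket cut points are illustrative assumptions.
    """
    for lo, hi in zip(brackets, brackets[1:]):
        if lo <= true_income < hi:
            return (lo, hi)
    raise ValueError("income falls outside every bracket")

# The analyst sees only the intervals, never the values passed in.
print(coarsen_age(3.2))
print(coarsen_income(37_000))
```

 Either way, what you observe is a set of possible values rather than a single number.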


 Happily, techniques have been developed to handle this sort of situation.  In many ways, the game is the same as that in the missing data setting.  Just as in the missing data context, good things happen when the data are missing at random, so also in this context good things happen when the data are coarsened at random.  Thus, to begin with, you have to consider (among other things) whether you think the probability that you will observe only a range of possible data values, as opposed to the specific true value, depends on something you don’t observe (such as that specific true value).  A good place to start on all this is Heitjan & Rubin, “Inference from Coarse Data via Multiple Imputation with Application to Age Heaping,” 85 JASA 410 (1990). 
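 A small simulation can show why that question matters.  This is only a sketch under assumptions I have invented (normally distributed true values, two made-up coarsening rules), and it looks at a naive analysis that keeps just the exactly reported values, not at the multiple imputation approach in the paper.

```python
import random
import statistics

def simulate(coarsen_prob, n=20_000, seed=0):
    """Return (mean of all true values, mean of the values reported exactly).

    coarsen_prob is a hypothetical function mapping a true value to the
    probability that only a range, not the value itself, gets recorded.
    """
    rng = random.Random(seed)
    true_values = [rng.gauss(50, 10) for _ in range(n)]
    # Keep a value as "exact" when it escapes coarsening.
    exact = [x for x in true_values if rng.random() >= coarsen_prob(x)]
    return statistics.fmean(true_values), statistics.fmean(exact)

# Coarsening ignores the true value (strictly, "completely at random",
# a special case of coarsened at random).
truth, exact_mean = simulate(lambda x: 0.3)
bias_at_random = exact_mean - truth

# Coarsening depends on the unobserved true value: large values are
# far more likely to be reported only as a range.
truth, exact_mean = simulate(lambda x: 0.8 if x > 50 else 0.1)
bias_not_at_random = exact_mean - truth

print(f"bias, coarsening ignores true value:    {bias_at_random:+.2f}")
print(f"bias, coarsening depends on true value: {bias_not_at_random:+.2f}")
```

 With the first rule, the exactly reported values should average out close to the truth.  With the second, they should come out noticeably too low, because the large values are mostly hidden inside ranges.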

 One final point:  you might think that coarsened at random is a specific case of missing at random.  Actually, it’s the other way around.  Data can be (and often are assumed to be) coarsened at random but not missing at random.  Think and you’ll see why.