Differences-in-Differences in the Rubin Causal Model 

 At today's Applied Statistics Workshop, Dan Hopkins gave a talk on contextual effects on political views in the United States and United Kingdom. Dan presented evidence that national political discussions increase the salience of local context for opinion formation. Namely, those who live in areas with large immigrant populations tend to react more strongly to changes in the national discussion of immigration than others. The data and analysis are interesting, but the talk's derailment interested me slightly more. 

 The derailment involved Dan's choice of method, a version of the difference-in-differences (DID) estimator, and how to represent it in the Rubin Causal Model. Putting this model in terms of the usual counterfactual framework is slightly nuanced, but not impossible.  


 The typical setup for a DID estimator is that there are two groups G = {0,1} and two time periods T = {0,1}. Between time 0 and time 1, some policy is applied to group 1 and not applied to group 0. What we are interested in is the effect of that policy. For instance, if Y is the outcome in time 1, Y(1) is the potential outcome (in time 1) in the counterfactual world where we forced the policy to be implemented, and Y(0) is the potential outcome in the counterfactual world where we withheld it, then we can define a possible quantity of interest: the average treatment effect on the treated (ATT): E[Y(1) - Y(0) | G = 1].  
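As a concrete sketch (all numbers are hypothetical, mine rather than anything from the talk), we can simulate both potential outcomes for the treated group and compute the ATT directly. This is exactly what real data never allows, since we only ever observe one potential outcome per unit:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # treated units (group 1)

# Hypothetical potential outcomes in period 1 for group 1:
# y0 is the outcome without the policy, y1 with it.
y0 = 150.0 + rng.normal(0, 10, n)  # potential outcome under control
y1 = y0 - 10.0                     # the policy shifts the outcome by -10

# ATT = E[Y(1) - Y(0) | G = 1]; computable only because we simulated
# both potential outcomes for every unit.
att = (y1 - y0).mean()  # = -10
```

In observed data we would see y1 for the treated and y0 only for the untreated, which is why identification requires further assumptions.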

 We could proceed from here by simply making an ignorability assumption about the treatment assignment. Unfortunately, policies are often not randomly assigned to the groups, and the groups may differ in ways that affect the outcome. For instance, an example from the Wooldridge textbook is the effect of the placement of a trash-processing facility on house prices. The two groups in this case are "houses close to the facility" and "houses far from the facility" and the policy is the facility's placement. It would be borderline insane to imagine city planners randomly assigning the location of the facility, and these two groups will differ in ways that are closely related to house prices (I don't think I have seen too many newly minted trash dumps in rich neighborhoods). Thus, we cannot simply use the observed data from the control group to make the counterfactual inference.  
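To see the problem numerically, here is a sketch with hypothetical numbers (mine, not Wooldridge's): if houses near the facility sit in a cheaper part of town to begin with, the simple treated-vs-control comparison in period 1 mixes the facility's effect with that pre-existing gap:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # houses per group

base = {0: 200.0, 1: 150.0}  # group 1 (near the facility) starts cheaper
effect = -10.0               # true effect of the facility on group 1

# Observed period-1 prices; the policy only hits group 1.
y = {g: base[g] + rng.normal(0, 10, n) for g in (0, 1)}
y[1] = y[1] + effect

# Naive cross-sectional comparison: effect plus the baseline gap of -50.
naive = y[1].mean() - y[0].mean()  # ≈ -60, not -10
```

The naive estimate is off by the full baseline difference between neighborhoods, which is precisely the selection problem the DID setup is designed to handle.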

 What we can do, however, is look at how changes in the dependent variable occur for the two groups and use these changes to identify the model. For instance, if we let X be the outcome in period 0, with potential outcomes X(0) and X(1) defined analogously, then the DID identifying assumption is  

 E[Y(0) - X(0) | G = 1] = E[Y(0) - X(0) | G = 0], 

 which is simply saying that the average change in potential outcomes under control is the same for both groups. Or, that group 1 would have followed the same "path" as group 0 if they had not received treatment. With this assumption in hand, we can identify the ATT as the typical DID estimator 

 E[Y(1) - Y(0) | G = 1] = (E[Y|G=1] - E[X|G=1]) - (E[Y|G=0] - E[X|G=0]). 

 The proof is short and can be found in  Abadie (2005)  and  Athey & Imbens (2006)  (these papers also go into considerable depth on extensions of this simple scheme).  
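Under the parallel-trends assumption, the four observed group-period means are all we need. A sketch with the same hypothetical numbers, where both groups share a common trend and only group 1 receives the policy between the periods:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # units per group

base = {0: 200.0, 1: 150.0}  # group-specific baseline levels
trend = 5.0                  # common trend shared by both groups
effect = -10.0               # true ATT, built into the simulation

x = {g: base[g] + rng.normal(0, 10, n) for g in (0, 1)}          # period 0
y = {g: base[g] + trend + rng.normal(0, 10, n) for g in (0, 1)}  # period 1
y[1] = y[1] + effect  # policy applied to group 1 between the periods

# DID: (E[Y|G=1] - E[X|G=1]) - (E[Y|G=0] - E[X|G=0])
did = (y[1].mean() - x[1].mean()) - (y[0].mean() - x[0].mean())  # ≈ -10
```

Because of sampling noise the estimate is only approximately -10; the point is that the baseline gap of 50 between the groups, which sank the naive comparison, differences out entirely.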

 Two issues always arise for me when I see DID estimators. First is the incredibly difficult task of arguing that the policy is the only thing that changed between time 0 and time 1 with respect to the two groups. That is, perhaps the city also placed a freeway through the part of town where the trash-processing facility was built at the same time. The DID estimator would not be able to separate the two effects. Thus, it is up to the practitioner to argue that all other changes in the period are orthogonal to the two groups. Second, I have very little insight about how identification or estimands change as we move from a simple non-parametric world to a highly parametric world (where most applied researchers live). Do inferences change, and if so how, when we move away from simple conditional expectations?