More SUTVA questions 

 Since Jim's post has brought us back to the SUTVA problem, here is another situation to consider.  Let's say that I am interested in the effect of starting order on the performance of athletes in some competition.  For the sake of argument, let's say cycling.  We might conjecture that starting in the first position in a pack of cyclists confers some advantage, since the leader can stay clear of any trouble further back in the pack.  On the other hand, there might be an advantage to starting further back, so that the cyclist can take advantage of the draft behind the leaders.

 It is pretty clear what we would like to estimate in this case.  If X is the starting position from 1 to n, and Y is the length of time that it takes the athlete to complete the race, then the most intuitive quantity for the causal effect of starting first instead of second is E[Y_i|X_i=1] - E[Y_i|X_i=2], and similarly for any other pair of positions.  We still have the fundamental problem of causal inference in that we only observe one of the potential outcomes, but average treatment effects also make sense in this case, defining the ATE as E[Y|X=1] - E[Y|X=2].  Moreover, there is a clear manipulation involved (I can make you start first or I can make you start second), and such a manipulation would be easy to implement using a physical randomization to ensure balance on covariates in expectation.  Indeed, this procedure is used in several sports; one example is the keirin race in cycling, which is a paced sprint competition among 6-9 riders.

 So far, so good, but there is a problem... 


 It is pretty clear that we have a SUTVA violation here.  The issue is not that assigning Cyclist A to start in position 2 forces Cyclist B to start in some other position; SUTVA (as I understand it) doesn't require that it be possible for all subjects to be assigned to all values of the treatment.  The problem is that the potential outcome for Cyclist A starting in position 2 may depend on whether Cyclist B is assigned to position 1 and Cyclist C to position 3, or vice versa.  What if B is a strong cyclist who likes to lead from the front, enabling A to draft for most of the race, while C is a weak starter who invariably falls to the back of the pack?  In that case, E[Y_A | X_A = 2, X_B = 1, X_C = 3] will not be equal to E[Y_A | X_A = 2, X_B = 3, X_C = 1].  In other words, there is interference between units.  So the non-interference aspect of SUTVA is violated, and therefore E[Y|X=1] - E[Y|X=2] isn't a Rubin causal effect.  Bummer.
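
 To make the interference concrete, here is a toy sketch in Python.  Everything in it is made up for the sake of argument: the baseline times, the `STRONG_LEADER` flags, the `DRAFT_BONUS`, and the `race_time` function are hypothetical stand-ins, not a model of real racing.  The only point is that A's outcome at position 2 is a function of the whole assignment vector, not just of A's own position.

```python
# Hypothetical baseline race times in seconds (illustrative numbers only).
BASE = {"A": 60.0, "B": 58.0, "C": 62.0}
# Assumption: only B sets a pace from the front that is worth drafting behind.
STRONG_LEADER = {"A": False, "B": True, "C": False}
# Assumption: seconds saved when starting directly behind a strong leader.
DRAFT_BONUS = 1.5


def race_time(rider, assignment):
    """Race time for `rider` given the full assignment of riders to start positions.

    `assignment` maps each rider to a start position (1 = front).
    """
    position = assignment[rider]
    time = BASE[rider]
    for other, other_position in assignment.items():
        # Drafting only helps if the rider directly in front is a strong leader.
        if other_position == position - 1 and STRONG_LEADER[other]:
            time -= DRAFT_BONUS
    return time


# A starts second in both cases; only the positions of B and C change.
y_a_behind_b = race_time("A", {"A": 2, "B": 1, "C": 3})
y_a_behind_c = race_time("A", {"A": 2, "B": 3, "C": 1})

print(y_a_behind_b)  # 58.5
print(y_a_behind_c)  # 60.0
```

 Under these assumptions A has two different "potential outcomes" for X_A = 2, which is exactly what the non-interference part of SUTVA rules out.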

 On the other hand, if we are able to run this race over and over again with the same cyclists, we are in a sense going to average over all of the assignment vectors.  If we then take the observed data and plot E[Y|X = x], we are going to get a relationship in the data that is purely a function of the manipulation that we carried out.  How should we think about this quantity?  I would think that a reasonably informed lay person would interpret the difference in race times in a causal manner, but what, precisely, are we estimating and how should we talk about it?  I'd love to hear any suggestions, particularly since it relates to a project that I've been working on (and might have more to say about in a few weeks).
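
 One way to see what "averaging over all of the assignment vectors" produces is to enumerate every starting order in a small toy example and tabulate the mean time at each position.  The sketch below does this in Python for three riders; the baseline times, the `STRONG_LEADER` flags, the `DRAFT_BONUS`, and the function names are all hypothetical choices of mine, and the uniform weighting over permutations is an assumption standing in for a fair physical randomization.

```python
from collections import defaultdict
from itertools import permutations

RIDERS = ("A", "B", "C")
# Hypothetical baseline race times in seconds (illustrative numbers only).
BASE = {"A": 60.0, "B": 58.0, "C": 62.0}
# Assumption: only B sets a pace from the front that is worth drafting behind.
STRONG_LEADER = {"A": False, "B": True, "C": False}
# Assumption: seconds saved when starting directly behind a strong leader.
DRAFT_BONUS = 1.5


def race_time(rider, assignment):
    """Race time for `rider` given the full assignment of riders to positions."""
    position = assignment[rider]
    time = BASE[rider]
    for other, other_position in assignment.items():
        if other_position == position - 1 and STRONG_LEADER[other]:
            time -= DRAFT_BONUS
    return time


def mean_time_by_position():
    """Average race time at each start position over all equally likely orders."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for order in permutations(RIDERS):
        # The first rider in `order` starts in position 1, and so on.
        assignment = {rider: i + 1 for i, rider in enumerate(order)}
        for rider in RIDERS:
            totals[assignment[rider]] += race_time(rider, assignment)
            counts[assignment[rider]] += 1
    return {pos: totals[pos] / counts[pos] for pos in sorted(totals)}


means = mean_time_by_position()
print(means)  # {1: 60.0, 2: 59.5, 3: 59.5}
print(means[1] - means[2])  # 0.5
```

 In this toy world the half-second gap between positions 1 and 2 is well defined, but only relative to this particular set of riders and this particular randomization scheme: it averages A's and C's drafting gains over the orders in which B happens to be directly in front of them.  Change who is in the race, or how starting orders are drawn, and the number changes.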