Discovering Causal Knowledge? 

How do we learn about causal relationships when we can't run experiments? In my own work, the answer has been to look around for "natural experiments" in which something important varies for roughly random reasons: for example, the winners of close elections are selected almost at random, which allows you to draw conclusions about the effect of being elected on various outcomes (like the winner's wealth).

I recently read a paper by David Jensen and coauthors from the UMass Knowledge Discovery Laboratory that proposes a systematic way of uncovering causal relationships from databases. Their approach (which they call AIQ -- "Automated Identification of Quasi-experiments") is not to mine the joint density of variables for independencies that can produce a causal graph (as discussed in Jamie Robins' talk last March), but rather to produce a list of feasible quasi-experiments based on a standard database schema that has been augmented with some causal information (e.g. A might cause B, C does not cause A or B) and some temporal information (i.e. the ordering and frequency of events). In the paper, the authors provide an overview of the approach as applied to three commonly used databases, including some candidate quasi-experiments that the algorithm suggests.
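To make the idea concrete, here is a minimal sketch of the kind of enumeration involved. The variable names and the annotation format are entirely hypothetical (the paper's actual representation is much richer); the point is just that user-supplied causal and temporal annotations on a schema let you mechanically list treatment-outcome pairs that form plausible quasi-experiments.

```python
from itertools import permutations

# Hypothetical schema annotations (illustrative only, not the paper's format):
# 'may_cause' lists plausible causal edges the user allows; 'no_common_cause'
# lists pairs the user asserts share no confounder; 'order' gives temporal
# precedence of events.
schema = {
    "variables": ["nomination", "award", "rating"],
    "may_cause": {("award", "rating")},
    "no_common_cause": {("award", "rating")},
    "order": {"nomination": 0, "award": 1, "rating": 2},
}

def candidate_quasi_experiments(schema):
    """Enumerate (treatment, outcome) pairs that look like feasible
    quasi-experiments: the treatment precedes the outcome, the treatment
    may cause the outcome, and the user asserts no common cause."""
    designs = []
    for t, y in permutations(schema["variables"], 2):
        if (schema["order"][t] < schema["order"][y]
                and (t, y) in schema["may_cause"]
                and (t, y) in schema["no_common_cause"]):
            designs.append((t, y))
    return designs

print(candidate_quasi_experiments(schema))  # [('award', 'rating')]
```

Note that all the work here is done by the annotations: the enumeration itself is trivial once the user has asserted which pairs are unconfounded, which is exactly the limitation discussed below.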

My impression after reading the paper was that AIQ's discovery potential is pretty limited (at least at this stage), because most users who could provide the inputs AIQ needs could very likely think up the quasi-experimental design themselves. Any valid quasi-experimental design that AIQ can discover at this point appears to come from the user specifying that the treatment and outcome have no common cause or confounding factors -- a very unusual situation that is either quite obvious (e.g. because there is a lottery or other explicit randomization) or requires significant substantive knowledge to assert. I wonder how often a researcher would a) have in mind a causal model that is sufficiently restrictive to produce plausible quasi-experimental designs through AIQ, and b) not have already thought of those designs.

The example of causal discovery the authors provide comes from a combined IMDb/Netflix movie database; they assert that winning an Oscar improves the reviews a movie receives on Netflix. For AIQ to suggest this quasi-experiment, the authors had to specify in advance that the Oscar-winning film is chosen from among the nominees at random. One can of course criticize that assumption, but the point is that once you make it, it should be quite obvious that you have a quasi-experiment with which to study the effect of winning the Oscar on various outcomes; any film-specific, post-awards-ceremony outcome should do. AIQ may provide a structured way to go through that exercise, but I'm not convinced there are many circumstances in which it would be useful to a researcher.