Ecological Inference in the Law, Part II 

 In a previous post, I introduced the ecological inference problem as it arises in the law: the difficulty of drawing inferences about racial voting patterns from precinct-level data on candidate support and the racial makeup of the voting-age population.  As I mentioned in that post, very few lawyers and judges have contributed to the expansive literature on this question, even though ecological inference models are often used in high-profile courtroom cases. 

 Here's an initial contribution from the courtroom:  forget about two by two tables. 

 The overwhelming majority of publications on the ecological inference problem concern methods for sets of two by two contingency tables.  In the Voting Rights Act context, a two by two table problem might correspond to a jurisdiction in which almost every potential voter is African-American or Caucasian, and all we care about is who votes, not whom the voters supported.  In that case, the rows of each table are black and white, while the columns are vote and no-vote.  For each precinct, we observe only the row and column totals, so we need only predict one internal cell count, and the others are determined. 
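
 To make the two by two case concrete, here is a minimal Python sketch of a single precinct.  The function names and the precinct counts are hypothetical, chosen only for illustration.  It shows that the observed margins bound the unknown cell (black voters who voted), and that once that one cell is fixed, the other three follow by subtraction.

```python
def cell_bounds(row_total, col_total, grand_total):
    """Deterministic bounds on one internal cell given its margins."""
    lower = max(0, row_total + col_total - grand_total)
    upper = min(row_total, col_total)
    return lower, upper


def fill_two_by_two(black, white, voted, black_voted):
    """Complete a precinct table from its margins and one internal cell.

    Rows are black and white; columns are vote and no-vote.
    """
    black_not_voted = black - black_voted
    white_voted = voted - black_voted
    white_not_voted = white - white_voted
    table = [[black_voted, black_not_voted],
             [white_voted, white_not_voted]]
    if any(count < 0 for row in table for count in row):
        raise ValueError("cell value is inconsistent with the margins")
    return table


# Hypothetical precinct: 800 black and 200 white voting-age residents,
# of whom 500 in total turned out.
bounds = cell_bounds(row_total=800, col_total=500, grand_total=1000)
table = fill_two_by_two(black=800, white=200, voted=500, black_voted=350)
```

 In this made-up precinct, at least 300 and at most 500 black residents must have voted, whatever the statistical model says.  A guess outside that range makes `fill_two_by_two` raise an error, because it would force a negative count somewhere else in the table.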

 This two by two case is of almost no interest in the law.  The reason is that in jurisdictions in this country, the voters have three options in any electoral contest of interest:  Democrat, Republican, and not voting.  That means we have a minimum of three columns.  In most jurisdictions of interest these days, we also have more than two rows.  Hispanics constitute an increasingly important set of voters in the United States, and their voting patterns are rarely similar enough to those of African-Americans or Caucasians to allow an expert witness to combine Hispanics with one of these other groups. 

 Thus far, scant research exists on the R x C problem.  Until a few years ago, one had two options:  (i) run a set of C-1 linear models, a solution that often led to logically inconsistent predictions (such as an estimate that 115 percent of Hispanic voters supported the Democrat), or (ii) pick a two by two model that combines the information in the precinct-level bounds with the available statistical information, and apply it in some way to the set of R x C tables at hand, perhaps by collapsing cell counts down to a two by two shape, perhaps by applying the two by two method repeatedly.  Neither approach is very appealing. 
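
 The logical inconsistency in option (i) is easy to reproduce.  The sketch below is a simplified illustration, not any expert's actual analysis: it fits a single linear model (one column, two groups) by ordinary least squares to invented precinct data, then extrapolates to a hypothetical all-Hispanic precinct.  The data and variable names are assumptions made up for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented precinct data: fraction Hispanic, and fraction of all
# potential voters who supported the Democrat.
hispanic_share = [0.1, 0.2, 0.3, 0.4, 0.5]
democrat_share = [0.30, 0.42, 0.55, 0.66, 0.80]

intercept, slope = fit_line(hispanic_share, democrat_share)

# Under the linear model, estimated support in each group is the
# fitted value at a share of 0 (no Hispanics) and 1 (all Hispanic).
non_hispanic_support = intercept
hispanic_support = intercept + slope

print(f"non-Hispanic support for the Democrat: {non_hispanic_support:.0%}")
print(f"Hispanic support for the Democrat:     {hispanic_support:.0%}")
```

 Because no precinct in the invented data is more than half Hispanic, the estimate for Hispanic voters is an extrapolation, and here it lands well above 100 percent.  Nothing in the linear model knows that a fraction cannot exceed one.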

 A few years ago, Rosen et al. proposed a variant of a Dirichlet-Multinomial model, which was and remains a large step forward in the analysis of R x C ecological inference tables.  Nevertheless, there is always room for improvement.  The model does not respect the bounds deterministically, and it does not allow a great deal of flexibility in modeling intra-row and inter-row correlations.  On the latter point, an example may clarify:  Suppose we are analyzing a primary in which four candidates are running, two African-American and two Caucasian.  Would we expect, among (say) black voters, the vote counts or fractions (by precinct) for the two African-American candidates to be positively correlated? 

 I look forward to contributing to this research soon.