30 November 2005
You, Jong-Sung
The problem of “missing women” in many developing countries reflects not just gender inequality but a serious violation of human rights, as Amartya Sen reported in his book Development as Freedom (1999). The term refers to the phenomenon of excess female mortality and artificially lowered survival rates of women. Particularly disturbing is the practice of sex-selective abortion, which has become quite widespread in China and South Korea.
Statistical analysis, in particular the examination of anomalies in a distribution of interest, can give compelling evidence of crime or corruption. If nine out of ten babies delivered at a hospital are boys, we must have a strong suspicion that the doctor(s) in the hospital conduct(s) sex-selective abortion. It may not be evidence sufficient for a conviction, but it probably is sufficient grounds for an investigation. What, then, would be a good guide for deciding whom to investigate? Applying a fixed percentage threshold is not a good idea, because the probability of 6 or more boys out of 10 babies is much larger than the probability of 600 or more boys out of 1000 babies. So an appropriate guide may be the binomial probability distribution.
Suppose the probability of producing a boy or a girl is exactly 50 percent. Then the probability of producing six or more boys out of ten babies is 37.7 percent, while the probability of producing 60 or more boys out of 100 babies is only 2.8 percent in the absence of some explanatory factor (probably sex-selective abortion). The probability of producing 55 or more boys out of 100 babies is 18.4 percent, but the probability of producing 550 or more boys out of 1000 babies is only 0.09 percent in the absence of some explanatory factor (again, probably sex-selective abortion). If the police decide to investigate hospitals with more than a certain percentage of boy births, say 60 percent, then many honest small hospitals will be investigated, while large hospitals that really do engage in sex-selective abortion may escape investigation.
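These tail probabilities are easy to verify. A minimal sketch under the null assumption of a 50 percent boy-birth probability, reproducing the figures quoted above:

```python
# A minimal sketch of the binomial tail probabilities quoted above, assuming the
# null hypothesis that each birth is a boy with probability 0.5.
from scipy.stats import binom

def prob_at_least(boys, births, p=0.5):
    """P(X >= boys) when X ~ Binomial(births, p)."""
    return binom.sf(boys - 1, births, p)

for boys, births in [(6, 10), (60, 100), (55, 100), (550, 1000)]:
    print(f"P(>= {boys} boys out of {births} births) = {prob_at_least(boys, births):.4f}")
# Roughly 0.377, 0.028, 0.184, and 0.0009, matching the percentages in the text.
```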
Posted by Jong-sung You at 5:59 AM
29 November 2005
Jens Hainmueller
Stimulated by the lectures in Statistics 214 (Causal Inference in the Biomedical and Social Sciences), Holger Kern and I have been thinking about Rosenbaum-type tests for sensitivity to hidden bias. Hidden bias is pervasive in observational settings, and these sensitivity tests are a tool to deal with it. When you are done with your inference, it seems constructive to replace the usual qualitative statement that hidden bias “may be a problem” with a precise quantitative statement like “in order to account for my estimated effect, a hidden bias would have to be of magnitude X.” No?
Imagine you are (once again) estimating the causal effect of smoking on cancer and you have successfully adjusted for differences in observed covariates. Then you estimate the “causal” effect of smoking and you’re done. But wait a minute. Maybe subjects who appear similar in terms of their observed covariates actually differ in terms of important unmeasured covariates. Maybe there exists a smoking gene that causes cancer and makes people smoke. Did you achieve balance on the smoking gene? You have no clue. Are your results sensitive to this hidden bias? How big must the hidden bias be to account for your findings? Again, you have no clue (and neither does the reader of your article).
Enter Rosenbaum-type sensitivity tests. These come in different forms, but the basic idea is similar in all of them. We have a measure, call it (for lack of LaTeX in the blog) Gamma, which gives the degree to which your particular smoking study may deviate from a study that is free of hidden bias. You assume that two subjects with the same X may nonetheless differ in terms of some unobserved covariates, so that one subject has an odds of receiving the treatment that is up to Gamma ≥ 1 times greater than the odds for the other subject. So, for example, Gamma=1 would mean your study is indeed free of hidden bias (like a big randomized experiment), and Gamma=4 means that two subjects who are similar on their observed X can differ on unobservables such that one could be four times as likely as the other to receive treatment.
The key idea of the sensitivity test is to specify different values of Gamma and check whether the inferences change. If your results break down at Gamma values just above 1, this is bad news. We probably should not trust your findings, because the difference in outcomes you found may not be caused by your treatment but may instead be due to an unobserved covariate that you did not adjust for. But if the inferences hold at big values of Gamma, let’s say 7, then your results seem very robust to hidden bias. (That’s what happened in the smoking-on-cancer case, by the way.) Sensitivity tests allow you to shift the burden of proof back to the critic who laments about hidden bias: please, Mr. Know-it-all, go and find me this “magic omitted variable” which is so extremely imbalanced yet so strongly related to treatment assignment that it is driving my results.
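To make the mechanics concrete, here is a minimal sketch of one common variant, sensitivity bounds for matched pairs with a binary outcome; the pair counts are invented and this is not the specific test used in the smoking studies:

```python
# A minimal sketch of Rosenbaum-style sensitivity bounds for matched pairs with a
# binary outcome (McNemar-type test). The pair counts below are hypothetical.
from scipy.stats import binom

def sensitivity_pvalues(t, d, gammas):
    """Bounds on the one-sided p-value under hidden bias of magnitude Gamma.

    t: discordant pairs in which the treated unit had the event
    d: total number of discordant pairs
    Under Gamma, the chance that the treated unit is the 'case' in a discordant
    pair lies between 1/(1+Gamma) and Gamma/(1+Gamma).
    """
    bounds = {}
    for g in gammas:
        p_hi, p_lo = g / (1 + g), 1 / (1 + g)
        upper = binom.sf(t - 1, d, p_hi)   # worst-case (largest) p-value
        lower = binom.sf(t - 1, d, p_lo)   # best-case (smallest) p-value
        bounds[g] = (lower, upper)
    return bounds

# Hypothetical study: 100 discordant pairs, treated unit has the event in 70 of them.
for g, (lo, hi) in sensitivity_pvalues(70, 100, [1, 2, 3, 4, 5]).items():
    print(f"Gamma = {g}: p-value between {lo:.4f} and {hi:.4f}")
# The inference 'breaks down' at the Gamma where the upper bound crosses 0.05.
```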
More on this subject in a subsequent post.
Posted by Jens Hainmueller at 4:24 AM
28 November 2005
Drew Thomas
An MIT tag team of Prof. Josh Tenenbaum and his graduate student Charles Kemp presented their research to the IQSS Research Workshop on Wednesday, October 19. The overarching topic of Prof. Tenenbaum's research is machine learning; one major aspect of this is their method of categorizing the structure of the field to be learned.
For example, it has made sense for hundreds of years that forms of life could be taxonomically identified according to a tree structure so as to compare the closeness of two species, and it also makes some sense to rank them on an ordered scale by some other characteristic (one example presented was how jaw strength could be used to generalize to total strength.) The presenters then showed how Bayesian inference could be used to determine what organizational structures are best suited to which systems, based on a set of covariates corresponding to certain observable features, which could then be used to make other comparisons that might not be as evident, such as immune system behaviour.
What confused me for much of the time was their insistence that they could use the data to decide on a prior distribution, an idea that set some alarms off. I have been under the strongest of directives from professors to keep the prior distribution limited to prior knowledge. My current understanding is that the following method is used:
1. Choose a family to examine, such as the tree, ring or clique structure (all of which, notably, can be learned by kindergarteners rather quickly.)
2. Conduct an analysis in which the prior distribution assigns equal probability to all possible structures of this type.
3. Repeat this with the other relevant families. Those analyses with the most favorable results would then correspond to the most likely structure (see the sketch after this list).
4. Conduct further research on the system with the knowledge that one structure family is superior for this description.
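As I understand it, the comparison across families amounts to ordinary Bayesian model averaging within each family. A generic sketch, with hypothetical log marginal likelihoods standing in for whatever the Kemp and Tenenbaum model actually computes:

```python
# A generic sketch of steps 2-3: average over candidate structures with a uniform
# prior within each family, then compare families. The log marginal likelihoods
# below are hypothetical placeholders, not output from the Kemp/Tenenbaum model.
import numpy as np
from scipy.special import logsumexp

log_marglik = {                       # log P(data | structure), grouped by family
    "tree":   np.array([-105.2, -101.7, -103.9]),
    "ring":   np.array([-112.4, -110.8]),
    "clique": np.array([-118.0]),
}

# Uniform prior over structures within a family and over families:
# log P(data | family) = logsumexp(log P(data | structure)) - log(# structures)
log_family = {f: logsumexp(ll) - np.log(len(ll)) for f, ll in log_marglik.items()}
norm = logsumexp(list(log_family.values()))
posterior = {f: float(np.exp(v - norm)) for f, v in log_family.items()}
print(posterior)                      # the largest value is the best-supported family
```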
While I'm not as comfortable with their use of a data-driven prior distribution as they'd like me to be, the authors seem sensitive enough to this concern: they keep the actual structures separate and use the data only to confirm their heuristic interpretations of the structures at hand, which sets me more at ease.
Now, the key to this research is that this is a model for human learning - and wouldn't you know it, we're better at it than computers. But I'm still very encouraged at the direction in which this is heading, and am looking forward to later reports from the Tenenbaum group.
Posted by Andrew C. Thomas at 2:23 AM
23 November 2005
Eric Werker (guest author)
I enjoyed the chance to present a work in progress that attempts to measure the impact of AIDS on the economies and populations in Africa at the Applied Statistics Workshop on Wednesday, November 9. Given the possibility for some omitted variable to influence both the national AIDS rate and economic performance or some other outcome variable, I chose to pursue an instrumental variable strategy using variations in the male circumcision rate (which the bulk of the medical literature on this subject believes to have a causal impact on the spread of HIV/AIDS). Comments from the audience were useful and illuminating, and the debate was most interesting around potential violations of the exclusion restriction as well as the use of 2SLS in a small sample setting.
(Blogger's note: For more on this talk, see here and here.)
Posted by James Greiner at 5:47 AM
This recent article in the
Posted by Mike Kellermann at 5:00 AM
22 November 2005
John Friedman
In my previous two posts here and here, I discussed some of the game-theoretic reasons why lawyers' choice of experts in cases might only add noise to the process. In this post, I will draw on my own experience on a jury, evaluating expert witnesses, to speak to further pitfalls in our system.
First, some background on my case: I was on a jury for a medical malpractice trial, essentially deciding whether a tumor, which later killed the patient, should have been spotted on an earlier X-ray. The "standard of care" to which we were to hold the doctors in question was a completely relative metric: Did the doctors provide the level of care "expected" from the "ordinary" practicing radiologist? Predictably, radiologists testified for both the plaintiff and the defense, each claiming that it was obvious that the defendants violated/met the relevant standard of care.
My position, as might be expected given my earlier posts, was that these two experts, on net, provided very little information on the culpability of the defendants. For all I knew, 99% of qualified doctors could have believed these doctors were negligent, or not negligent - how would I ever know? Since my prior was uninformative in this case, I had no choice but to find for the defendants for lack of evidence in either direction.
My fellow jurors, however, had far stronger opinions. Many tended to believe or disbelieve an expert witness for irrelevant reasons. For instance, physical attractiveness, speech pattern, and general "likeability" played a great role. Furthermore, the experts usually made or lost ground based on their ability to explain the basics of the science underlying the issue at hand - the mechanics of an X-ray, for instance - to the jury. Of course, these basics were not in dispute by any party in the case. And, as any student at Harvard University knows, a witness's ability to clearly and succinctly explain the basics need not be related at all to her expertise in the field! That these factors influence juries should be no surprise to anyone familiar with trials; the existence of an entire industry of "jury consultants," the legal equivalent of marketing professionals, should be evidence enough that these issues of presentation matter a great deal.
Finally, even after the experts presented their cases, the priors of some jurors seemed to greatly affect their opinions of the case. Though jurors are screened for such biases, the test cannot be perfect. I often found jurors relating personal experiences with radiologists as evidence for one side or another. Given my arguments above about the lack of information from experts, perhaps it is not surprising that priors mattered as they did, but this seemed to further add noise into the process.
In the end, I supported my jury's decision in this case. But I could not help feeling that it was simply by random chance, by a peculiar confluence of misinterpretation and biases, that we had reached the right decision.
Posted by James Greiner at 4:03 AM
21 November 2005
Amy Perfors
I'm fascinated by the ongoing evolution controversy in America. Part of this is because as a scientist I realize how important it is to defend rational, scientific thinking -- meaning reliance on evidence, reasoning based on logic rather than emotion, and creating falsifiable hypotheses. I also recognize how deeply important it is that our students are not crippled educationally by not being taught how to think this way.
But from the cognitive science perspective, it's also interesting to try to understand why evolution is so unbelievable and creationism so logical and reasonable to many fairly intelligent laypeople. (I doubt it's just ignorance or mendacity!) What cognitive heuristics and ways of thinking cause this widespread misunderstanding?
There are probably a number of things. Two I'm not going to talk about are the emotional reasons for wanting not to believe in evolution, and the tendency of people who don't know much about either side of an issue to think the fair thing to do is "split the middle" and "teach both sides." The thing I do want to talk about today -- the one that's relevant to a statistical social science blog -- concerns people's notions of simplicity and complexity. My hypothesis is that laypeople and scientists probably apply Occam's Razor to the question of evolution in very different ways, which is part of what leads to such divergent views.
[Caveat: this is speculation; I don't study this myself. Second caveat: I am neither saying that it's scientifically okay to believe in creationism, nor that people who do are stupid; this post is about explaining, not justifying, the cognitive heuristics we use that make evolution so difficult to intuitively grasp].
Anyway...
Occam's Razor is a reasoning heuristic that says, roughly, that if two hypotheses both explain the data fairly well, the simpler is likely to be better. Simpler hypotheses, generally formalized as those with fewer free parameters, don't "overfit" the data too much and thus generalize to new data better. Simpler models are also better because they make strong predictions. Such models are therefore falsifiable (one can easily find something they don't predict, and see if it is true) and, in probabilistic terms, put a lot of the "probability mass" or "likelihood" on a few specific phenomena. Thus, when such a specific phenomenon does occur, a simpler model explains it better than a more complex theory, which spreads the probability mass over more possibilities. In other words, a model with many free parameters -- a complicated one -- will be compatible with many different types of data if you just tweak the parameters. This is bad because it then doesn't "explain" much of anything, since anything is consistent with it.
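This probability-mass intuition can be put into numbers. A toy sketch with invented outcome counts:

```python
# A toy illustration of the probability-mass point: a 'simple' model that concentrates
# its predictions beats a 'complex' model that spreads them, whenever the data fall
# where the simple model predicts. The outcome counts are invented.
simple_support = 10      # the simple model spreads its mass over 10 possible outcomes
complex_support = 1000   # the complex model is consistent with 1000 possible outcomes

p_simple = 1 / simple_support    # likelihood of the observed outcome: 0.1
p_complex = 1 / complex_support  # likelihood of the same outcome: 0.001

# With equal prior odds, the posterior odds equal the likelihood ratio (Bayes factor).
bayes_factor = p_simple / p_complex
print(f"Bayes factor in favor of the simpler model: {bayes_factor:.0f}")  # 100
```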
When it comes to evolution and creationism, I think that scientists and laypeople often make exactly the opposite judgments about which hypothesis is simple and which is complex; therefore their invocation of Occam's Razor results in opposite conclusions. For the scientist, the "God" hypothesis (um, I mean, "Intelligent Designer") is almost the prototypical example of a hypothesis that is so complex it's worthless scientifically. You can literally explain anything by invoking God (and if you can't, you just say "God works in mysterious ways" and feel like you've explained it), and thus God scientifically explains nothing. [I feel constrained to point out that God is perfectly fine in a religious or spiritual context where you're not seeking to explain the world scientifically!] This is why ID is not approved by scientists; not because it's wrong, but because it's not falsifiable -- the hypothesis of an Intelligent Designer is consistent with any data whatsoever, and thus as theories go ... well, it isn't one, really.
But if you look at "simplicity" in terms of something like the number of free parameters, you can see why a naive view would favor ID over evolution. On a superficial inspection, the ID hypothesis seems like it really has only one free parameter (God/ID exists, or not); this is the essence of a simple hypothesis. By contrast, evolution is complicated - though the basic idea of natural selection is fairly straightforward, even that is more complicated than a binary choice, and there are many interesting and complicated phenomena arising in the application of basic evolutionary theory (sympatric vs. allopatric speciation, the role of migration and bottlenecks, asexual vs. sexual reproduction, different mating styles, recessive genes, junk DNA, environmental and hormonal effects on genes, accumulated effects over time, group selection, canalization, etc.). The layperson either vaguely knows about all of this or else tries to imagine how you could get something as complicated as a human out of "random accidents" and concludes that you could do so only if the world were just one specific way (i.e., if you set many free parameters just exactly one way). Thus they conclude that it's an exceedingly complex hypothesis, and that by Occam's Razor one should favor the "simpler" ID hypothesis. And then when they hear that scientists not only believe this apparently unbelievable thing, but refuse to consider ID as a scientific alternative, they logically conclude that it's all just competing dogma and you might as well teach both.
This is a logical train of reasoning on the layperson's part. (Doesn't mean it's true, but it's logical given what they know). The reason it doesn't work is twofold: (a) a misunderstanding of evolution as "randomness"; seeing it as a search over the space of possible organisms is both more accurate and more illuminating, I think; and (b) misunderstanding the "God" hypothesis as the simple one.
If I'm right that these are among the fundamental errors the layperson makes in reasoning about evolution, then the best way to reach the non-mendacious, intelligent creationist is by pointing out these flaws. I don't know if anybody has studied whether this hunch is correct, but it sure would be fascinating to find out what sorts of arguments work best, not just because it would help us argue effectively on a national level, but also because it would reveal interesting things about how people tend to use Occam's Razor in real-life problems.
Posted by Amy Perfors at 4:04 AM
20 November 2005
There will be no session of the Applied Statistics workshop on Wednesday, November 23, in anticipation of the Thanksgiving recess. We hope to see you after the break (no leftover turkey for lunch, we promise!).
Posted by Mike Kellermann at 7:00 AM
18 November 2005
Mike Kellermann
We have talked a bit on the blog (here and here) about estimating the ideal points of legislators in different political systems. I've been doing some work on this problem in the United Kingdom, adapting an existing Bayesian ideal point model in an attempt to obtain plausible estimates of the preferences of British legislators.
The basic Bayesian ideal point model assumes that politicians have quadratic preferences over policy outcomes; this implies that they will support a proposal if it implements a policy closer to their ideal point than the status quo. Let q_i be the ideal point of legislator i, m_j the location of proposal j, and s_j the location of the status quo that proposal j seeks to overturn. The (random) utility for legislator i of voting for proposal j can thus be written as:
s_j^2 - m_j^2 + 2q_i(m_j - s_j) + e_ij
Or re-written as
a_j + b_j q_i + e_ij, where a_j = s_j^2 - m_j^2 and b_j = 2(m_j - s_j).
With the appropriate assumptions on the stochastic component, this is just a probit model with missing data in which the legislator votes in favor of the proposal when the random utility is positive and against when the random utility is negative. Fitting a Bayesian model with this sampling density is pretty easy, given some restrictions on the priors.
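For concreteness, here is a minimal sketch of this sampling density: it simulates division votes from hypothetical ideal points and proposal parameters under standard normal errors, and is not the actual estimation code.

```python
# A minimal sketch of the sampling density above: simulate division votes from
# hypothetical ideal points and proposal parameters, assuming standard normal
# errors (which gives the probit form). This is not the full Bayesian sampler.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n_mps, n_votes = 50, 200
q = rng.normal(0, 1, n_mps)          # ideal points q_i (hypothetical)
m = rng.normal(0, 1, n_votes)        # proposal locations m_j
s = rng.normal(0, 1, n_votes)        # status quo locations s_j

a = s**2 - m**2                      # a_j = s_j^2 - m_j^2
b = 2 * (m - s)                      # b_j = 2(m_j - s_j)
eta = a[None, :] + np.outer(q, b)    # a_j + b_j * q_i

p_yea = norm.cdf(eta)                # probit probability of voting in favor
votes = rng.binomial(1, p_yea)       # simulated vote matrix (MPs by divisions)
print(votes.shape, votes.mean())
```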
Unfortunately, applying this model to voting data in the British House of Commons produces results that lack face validity. The estimates for MPs known to be radical left-wingers are located in the middle of the political spectrum. Party discipline is the problem; the influence of the party whips (which is missing from the model) overwhelms the policy utility.
I try to address this problem by moving to a different source of information about legislative preferences. Early Day Motions allow MPs to express their opinions without being subject to the whips. EDMs are not binding, and can be introduced by any legislator. Other legislators can sign the EDM to indicate their support. There are well over 1000 EDMs introduced every year, which greatly exceeds the number of votes in the Commons.
We can't just apply the standard ideal point model to EDM data, however, because there is no way for MPs to indicate opposition to the policy proposed in an EDM. Instead of 'yea' and 'nay', one observes 'yea' or nothing. In particular, it is clear that some Members of Parliament are less likely to sign EDMs, regardless of their policy content. I model this by adding a cost term ci to legislator i's random utility.
s_j^2 - m_j^2 + 2q_i(m_j - s_j) + c_i + e_ij
This is a more realistic model of the decision facing legislators in the House of Commons. In this model, the proposal parameters are unidentified; I restrict the posterior distribution for these parameters with a prior that assumes the sponsors of EDMs make proposals close to their own ideal points.
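A similarly minimal sketch of the EDM signing density, again with invented parameters, just to show where the cost term c_i enters:

```python
# A minimal sketch of the EDM signing density, with invented parameters, showing
# where the legislator-specific cost c_i enters.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n_mps, n_edms = 50, 300
q = rng.normal(0, 1, n_mps)               # ideal points
m = rng.normal(0, 1, n_edms)              # EDM locations
s = np.zeros(n_edms)                      # status quo locations (assumed at zero here)
c = rng.normal(-1.5, 0.5, n_mps)          # signing costs c_i (negative: signing is costly)

eta = (s**2 - m**2)[None, :] + np.outer(q, 2 * (m - s)) + c[:, None]
signatures = rng.binomial(1, norm.cdf(eta))   # 1 = signed, 0 = did not sign
print(signatures.mean(axis=1)[:5])            # signing rates vary across MPs through c_i
```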
I'm still finalizing the results using data from the 1997-2001 Parliament, but the results on a subset of the data seem promising; left-wingers are on the left, right-wingers are on the right, and the (supposed) centrists are in the center. These estimates have much greater face validity than those generated from voting data.
If you are interested in this topic, I am going to be presenting my preliminary results at the G1-G2 Political Economy Workshop today (Friday, November 18) at noon in room N401. By convention, it is grad students only, so I hope there are not too many disappointed faculty out there (sure...).
Posted by Mike Kellermann at 3:09 AM
17 November 2005
Jim Greiner
In previous blog entries here, here, and here, I discussed the fundamental questions about the objectivity of expert witnesses raised by Professor of History Morgan Kousser's article entitled "Are Expert Witnesses Whores?".
In my view, Professor Kousser's article suggests that expert witnesses are not fully aware of the threat to their objectivity that litigation poses. For example, despite acknowledging that lawyers "perform[ed] most of the culling of primary sources" in the cases in which he offered testimony, Professor Kousser argues, for a number of reasons, that there was no threat to objectivity. Primary among these reasons was the adversarial process, which gave the other side an incentive to find adverse evidence and arguments, and thus gave an expert's own attorneys an incentive to share such evidence and arguments.
Professor Kousser's reasoning dovetails with private conversations I've had with social scientists about their litigation experiences; they, too, insisted that they retained their objectivity throughout. Invariably, they support this contention by describing critical moments during pre-trial preparation in which they refused requests from their attorneys to testify to something, saying that the requests pushed the data too far or contradicted their beliefs.
My response: think about what the attorneys had already done to your objectivity before you reached these critical moments. Might they even have pushed you into refusing so as to convince you of your own virtue?
Professor Kousser and other social scientists have misperceived the nature of the threat. Professor Kousser is correct when he suggests that lawyers, upon encountering a potentially damaging piece of source material or evidence within an expert's area, are unlikely to suppress it (in the hope that the other side is negligent). But we lawyers do accompany our transmission of the potentially damaging item with rhetoric about its lack of reliability, importance, or relevance. Similarly, when we prepare experts for deposition and trial, we do not avoid adverse arguments or potential weaknesses in reasoning. Instead, we raise them in a way so as to minimize their impact. Often, we (casually) use carefully tailored, ready-made rhetorical phrases about the issue, hoping to hear those phrases again at trial. Before conducting pretrial meetings with important experts, we meet amongst ourselves to decide how best to ask questions and discuss issues so as to "prop up" experts' resolve.
Social scientists have long known that the way a questioner phrases an inquiry affects the answer received, that the way in which a conversational subject is raised affects the opinions discussants will form. Perhaps social scientists believe that their knowledge of these phenomena makes them immune to such effects. My experience in prepping social scientist expert witnesses suggests that such is not the case.
Posted by SSS Coauthors at 2:54 AM
16 November 2005
Just read this nice entry on Andrew Gelman's blog about junk graphs. Somebody complemented the entry by posting a link to another site by Karl Broman in the Biostatistics department at Johns Hopkins. In case you missed it, please take a look. We all make these mistakes, but it's actually really funny...
Posted by Jens Hainmueller at 2:09 PM
John Friedman
I ended my last post by showing, in the context of the brief model I sketched, what the optimal outcome would look like. In practice, though, the court suffers from two problems.
First, it cannot conduct a broad survey, but must instead rely on those testimonies presented in court. Each side will offer an expert whose "true opinion" is as supportive of their argument as possible, regardless of whether that expert is at all representative of commonly accepted views in the field. Second, the court cannot distinguish between an expert's true opinion and her "slant." Experts probably suffer some cost for slanting their views away from their true opinions, so one should not expect most slants to be large. But the legal parties will look to pick experts who suffer as little a cost from slanting as possible, so that, in equilibrium, the slants could be quite large.
Given these strategies from the legal parties, what does the court see? Each side presents an expert (or slate of experts) with the most favorable combination of "true opinion" and "slant." Even if the court could disentangle the two components of testimony, the court would only see the endpoints of the distribution of "true opinions" among the potential pool of experts. But since they cannot even distinguish the slant, the court actually sees only a noisy signal of the extremes of the distribution.
Finally, I have already argued that the experts chosen will be those most able (or willing) to slant their opinions, so that the ratio of signal to noise – or of "true opinion" to "slant" - for the experts will be very low, in expectation. When the court performs the required signal extraction problem, very little signal remains. Because of the optimizing action of each party, the court will draw very little inference from any of the witnesses in many cases, ironically nullifying the effect of the efforts of the experts. No one deviates from this strategy, though; if one side presented a more representative expert, while the other played the old strategy, the evidence would appear lopsided.
I noted in my last post that the "first-best," or socially optimal solution, would be for the court to collect a representative sample of the opinions of experts for their decision. Even when the parties present their own experts, each side would be better off if they could somehow commit not to use "slant" in their expert's opinions, since the decision in the case would be less noisy. But the structure of the problem makes such an agreement impossible.
Jim is correct when he remarks that, given the adversarial nature of the legal system, expert testimony could not happen any other way. We should not celebrate this fact, though; rather, we should mourn it. We are stuck in a terrible equilibrium.
Posted by James Greiner at 4:59 AM
15 November 2005
Sebastian Bauhoff
This entry follows up on earlier ones here and here on spatial statistics and spatial lag, and discusses another consequence of spatial dependence. Spatial error autocorrelation arises if error terms are correlated across observations, i.e., the error of an observation affects the errors of its neighbors. It is similar to serial correlation in time-series analysis and leaves OLS coefficients unbiased but renders them inefficient. Because it is such a bothersome problem, spatial error is also called "nuisance dependence in the error."
There are a number of instances in which spatial error can arise. For example, similar to what can happen in time series, a source of correlation may come from unmeasured variables that are related through space. Correlation can also arise from aggregation of spatially correlated variables and systematic measurement error.
So what to do if there is good reason to believe that there is spatial error? Maybe the most famous test is Moran's I, which is based on the regression residuals and is related to Moran's scatterplot of residuals, which can be used to spot the problem graphically. There are other statistics, like Lagrange multiplier and likelihood ratio tests, and each has a different way of getting at the same problem. If there is good reason to believe that spatial error is a problem, then the way forward is either to model the error directly or to use autoregressive methods.
In any case, it's probably a good idea to assess whether spatial error might apply to your research problem. Because of its effect on OLS, there might be a better way to estimate the quantity you are interested in, and the results might improve quite a bit.
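For reference, a minimal sketch of Moran's I computed on OLS residuals, with a made-up contiguity matrix and residuals; the null distribution (moments or a permutation test) is omitted:

```python
# A minimal sketch of Moran's I computed on OLS residuals with a user-supplied spatial
# weights matrix W. The contiguity matrix and residuals are made up, and inference
# about the statistic is left out.
import numpy as np

def morans_i(residuals, W):
    e = residuals - residuals.mean()
    n, s0 = len(e), W.sum()
    return (n / s0) * (e @ W @ e) / (e @ e)

# Toy example: four regions on a line, each a neighbour of the adjacent ones.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
resid = np.array([1.2, 0.9, -0.8, -1.1])   # hypothetical OLS residuals
print(morans_i(resid, W))                  # a clearly positive value suggests clustering
```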
Posted by James Greiner at 3:54 AM
14 November 2005
This week, the Applied Statistics Workshop will present a talk by You, Jong-Sung, a PhD candidate in Public Policy at the Kennedy School of Government. Jong-Sung’s dissertation on “corruption, inequality, and social trust” explores how corruption and inequality reinforce each other and erode social trust. A dissertation chapter presenting a cross-national study of the causal effect of inequality on corruption was published in the ASR (February 2005) as an article with S. Khagram. His research interests include the comparative politics and political sociology of corruption and anti-corruption reform, and the political economy of inequality and social policy. Before coming to Harvard, he worked for an NGO in Korea, the “Citizens’ Coalition for Economic Justice,” as Director of Policy Research and General Secretary. He spent more than two years in prison because of his involvement in the democratization movement under the military regimes. He has a BA in social welfare from Seoul National University and an MPA from KSG. He is also one of the authors of this blog.
The talk is entitled “A Multilevel Analysis of Correlates of Social Trust: Fairness Matters More Than Similarity,” and draws on Jong-Sung’s dissertation research. The abstract follows on the jump:
I argue that the fairness of a society affects its level of social trust more than does its homogeneity. Societies with fair procedural rules (democracy), fair administration of rules (freedom from corruption), and fair (relatively equal and unskewed) income distribution produce incentives for trustworthy behavior, develop norms of trustworthiness, and enhance interpersonal trust. Based on a multi-level analysis using the World Values Surveys data that cover 80 countries, I find that (1) freedom from corruption, income equality, and mature democracy are positively associated with trust, while ethnic diversity loses significance once these factors are accounted for; (2) corruption and inequality have an adverse impact on norms and perceptions of trustworthiness; (3) the negative effect of inequality on trust is due to the skewness of income rather than its simple heterogeneity; and (4) the negative effect of minority status is greater in more unequal and undemocratic countries, consistent with the fairness explanation.
Posted by Mike Kellermann at 11:55 AM
Drew Thomas
Last year during Prof. Rima Izem's Spatial Statistics course, I started to wonder about different analytical techniques for comparing lattice data (say voting results, epidemiological information, or the prevalence of basketball courts) on a map with distinct spatial units such as counties.
A set of techniques had been demonstrated to determine spatial autocorrelation through the use of a fixed-value neighbour matrix, with one parameter determining the strength of the autocorrelation. The use of the fixed neighbour matrix perturbed me somewhat, since the practice of geostatistics uses a tool called the empirical variogram - an estimate of how the variance between sample sites grows with their separation, obtained by taking each possible pair of points and computing the squared difference between their values - which might give a more reasonable estimate of autocorrelation than the simpler model.
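To illustrate, a minimal sketch of an empirical semivariogram on invented coordinates and values: bin the pairwise distances and average half the squared differences within each bin.

```python
# A minimal sketch of an empirical semivariogram: half the average squared difference
# between values, binned by the distance separating each pair of sites. The site
# coordinates and values are invented.
import numpy as np

def empirical_variogram(coords, values, bin_edges):
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = 0.5 * (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices(len(values), k=1)        # use each pair of sites once
    d, sq = d[iu], sq[iu]
    return np.array([sq[(d >= lo) & (d < hi)].mean()
                     for lo, hi in zip(bin_edges[:-1], bin_edges[1:])])

rng = np.random.default_rng(2)
coords = rng.uniform(0, 10, size=(100, 2))        # hypothetical site locations
values = np.sin(coords[:, 0]) + rng.normal(0, 0.2, 100)
print(empirical_variogram(coords, values, np.linspace(0, 5, 6)))
```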
As it turned out, this same question was asked by Prof. Melanie Wall from the Biostatistics Department at the University of Minnesota about a year before I got around to it. In her paper "A close look at the spatial structure implied by the CAR and SAR models" (J. Stat. Planning and Inference, v121, no.2), Prof. Wall tests the idea of using a variogram approach to model spatial structure on SAT data against more common lattice models. And what do you know - the variogram approach holds up to scrutiny. In some cases it outperforms the lattice model, such as in the extreme case of Tennessee and Missouri, which have a bizarrely low correlation due to the fact that each state has eight neighbours.
As well as feeling relief that this difficulty with the model wasn't just in my imagination, I'm glad to see that this type of inference crosses so many borders.
Posted by Andrew C. Thomas at 3:37 AM
10 November 2005
John Friedman
No sooner had the recent posts on this blog by Jim Greiner about the use of statistics and expert witnesses in trials (see here and here, as well as yesterday's post) piqued my curiosity than I was empanelled on a jury for a 5-day medical malpractice trial. This gave me ample time to think through some of the issues of statistics and the law. I will spend my next posts discussing these issues from three different perspectives: the game-theoretic, the experiential, and the historical.
I first approach this problem from a game-theoretic framework. In Jim's second post, he spoke about how, in our adversarial legal system, an expert for one side tends to interpret the facts in the way most favorable for that side, without compromising her "academic integrity." He then listed several reasons why this might actually be best for the system. I tend to disagree on this final point; instead, I believe the adversarial nature of the system pushes us into a very bad situation.
To give my argument focus, we must first pin down the concept of "equilibrium." An equilibrium of a game is a strategy for each player such that, given the other players' strategies, each player is maximizing her return from the game. In this case, the game is relatively simple: the two parties to a lawsuit are the players, each with a set of expert testimony interpreting the relevant statistics in the case (which makes up the strategy). We can represent the net message from the expert testimony for each side as a number on the real line: the more positive the number, the more pro-plaintiff the testimony.
We must make some simplifying assumptions to analyze this problem. Let us assume that the testimony for each side comprises two components: the "true opinion" and the "slant." When added together, "true opinion" + "slant" = testimony. (For simplicity, let us assume that these numbers are the actual impact of the testimony. Thus, if a testimony seems too biased and is discounted, the true number would not lie far from zero). In an ideal world, the court (either judge or jury) would survey the "true opinions" of many experts in the field; if enough opinions were positive, the case would go for the plaintiff. Economists often refer to such a case as the "first-best," the socially optimal outcome.
Many games do not yield the socially optimal outcome, though. Both parties can even be worse off playing the equilibrium strategies than if each played some other strategy, despite the fact that each party maximizes her payoff given the other player's strategy. A classic example of such a situation is the "Prisoner's Dilemma." In my next post, I will explore how, in this legal setting, exactly this tragedy occurs.
Posted by James Greiner at 5:50 AM
9 November 2005
Jim Greiner
Continuing with the theme of quantitative social science expert witnesses in litigation introduced here and here, I shift gears to consider the experts' view of lawyers. Several expert witnesses with whom I have spoken confided that they often form low opinions of the lawyers who retain them. One common complaint is that the attorneys do not take the time to understand the guts of the issue experts were hired to examine. Another is that lawyers are uncommunicative and provide poor guidance as to their preferences for the testimony of experts.
Without question, poor lawyering is common, and some of what experts experience can be safely attributed to this source. But as was the case with lawyers' complaints about experts, experts' complaints about lawyers have their genesis partly in the structural rules that govern litigation. In most courts and jurisdictions, communications between a testifying expert and any other participant in the case (lawyer, fact witness, another expert) are discoverable. That means that, before trial, the other side is entitled to request, for example, copies of all email communications between lawyer and expert. In deposition, an expert may be questioned on telephone and other oral conversations with the retaining attorney. For this reason, good lawyers are careful about what they say to experts; they know that written or transcribed communications reach both parties to a case.
As is usually the situation, there are good reasons for this rule. An expert witness is one of the most dangerous creatures to enter a courtroom. By definition, he or she invariably knows more about the subject matter of the testimony than anyone else involved in the litigation, except perhaps the opposing expert. The judge and jury lack the knowledge and training to assess what the expert says. Thus, the law provides that experts must disclose anything that might form the basis of an expert's opinion, including communications with trial counsel (along with workpapers, references consulted, and other items).
Expert witness frustration aside, this discovery rule has other negative side effects; it affects not only how well lawyers prepare a case for trial, but also the treatment of the suit more generally. Parties and their attorneys need information to settle, and a lack of clear communication between lawyer and expert may cause the former to misjudge the settlement value of a case. Once again, we see how atypical Professor Kousser's experience as an expert was (see here), as lawsuits concerning the internal structure of a municipality or a state entity settle less often than, say, employment discrimination class actions.
In closing, a word to potential and actual social science expert witnesses: If you find yourself frustrated by a certain reticence or irrational exuberance on the part of the attorney retaining you, remember, there may be good reason for it.
Posted by SSS Coauthors at 2:50 AM
8 November 2005
Sebastian Bauhoff
In a recent presentation at Harvard, Caroline Hoxby outlined a paper-in-process on estimating the causal impact of higher education on economic growth in the US states (Aghion, Boustan, Hoxby and Vandenbussche (ABHV) "Exploiting States' Mistakes to Identify the Causal Impact of Higher Education on Growth", draft paper August 6, 2005).
ABHV's paper is interesting for the model and results, and you should read it to get the full story. But the paper is also interesting because of the instrument used to get around the endogeneity of education spending (rich states spend more on higher education).
The basic idea is as follows: politicians are motivated to channel pork to their constituents in return for support. They do so through appropriations committees that can disburse "earmarked" funds to research-type education. Observing that the membership of these committees is to a large extent random, ABHV have an instrument for research spending (and further instruments for spending on other types of education) and proceed to estimate the causal effect of education on growth. So this paper employs what could be called a political instrument. Of course, there are plenty of other classes of IVs, such as natural events (rainfall or natural disasters). But an instrument is only partly distinguished by its ability to fulfill the formal requirements; there is also plenty of scope for creativity.
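As a reminder of the mechanics such an instrument relies on, here is a minimal two-stage least squares sketch on simulated data; the variable names echo the ABHV setting, but the numbers are invented and the standard errors are ignored.

```python
# A minimal two-stage least squares sketch on simulated data. The variable names echo
# the ABHV setting, but the numbers are invented and standard errors are ignored.
import numpy as np

rng = np.random.default_rng(3)
n = 500
committee = rng.binomial(1, 0.3, n)                          # instrument: committee membership
u = rng.normal(0, 1, n)                                      # unobserved confounder
spending = 1.0 * committee + 0.8 * u + rng.normal(0, 1, n)   # endogenous regressor
growth = 0.5 * spending - 0.8 * u + rng.normal(0, 1, n)      # true effect is 0.5

def ols(y, X):
    return np.linalg.lstsq(X, y, rcond=None)[0]

X = np.column_stack([np.ones(n), spending])
Z = np.column_stack([np.ones(n), committee])

beta_ols = ols(growth, X)                            # biased by the confounder
spending_hat = Z @ ols(spending, Z)                  # first stage: project on the instrument
beta_2sls = ols(growth, np.column_stack([np.ones(n), spending_hat]))
print("OLS:", beta_ols[1], " 2SLS:", beta_2sls[1])   # 2SLS should land near 0.5
```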
The IQSS Social Science Statistics blog is soliciting suggestions and requests for instruments: send your favorite IV and its application. Or tell our readers what you always wanted to get instrumented and see if someone comes up with a suggestion.
Posted by Sebastian Bauhoff at 1:08 AM
7 November 2005
Thanks a lot to Mike Kellermann for inviting me over for the talk on Oct 26, 2005 at the IQSS (see here for details). I really enjoyed giving the talk and getting interesting comments and questions from the audience. In particular, Prof. Donald Rubin, Prof. Gary King and others made important contributions which I really appreciate. Prof. Kevin Quinn gave me some excellent suggestions on how to improve the structure of the talk, which I think will turn out to be very helpful in the near future when I prepare for the job market. In fact, along those lines, if anyone has any input, suggestions, or comments on the presentation, please feel free to send them to me at goswami@stat.harvard.edu.
Here are some afterthoughts on the talk. The PBC (Population-Based Clustering) moves I presented, namely SCSC:TWO-NEW, SCSC:ONE-NEW and SCRC, are new, and they are very specific to the sampling-based clustering problem (which is a discrete-space problem). I have not been successful in devising similar moves for the general sampling problem on a continuous space. In the Evolutionary Monte Carlo (EMC) literature these types of moves are also called "cross-over" moves, because they take two chromosomes (the states of two chains), called the two "parents," and implement some cross-over type operation with the parents to produce two chromosomes (the proposed states of two chains), called the "children."
The main motivation behind devising the above-mentioned moves, as I mentioned in the talk, is that we were looking for moves that propose to update "more than one coordinate but not too many" at a time. The Gibbs sampler proposes one-coordinate-at-a-time updates. This is the main reason why Jain and Neal ("A Split-Merge Markov Chain Monte Carlo Procedure for the Dirichlet Process Mixture Model," Journal of Computational and Graphical Statistics, vol. 13, no. 1, pp. 158-182, 2004) proposed their sampler, which updates more than one coordinate at a time but does so for too many of them. To counter this problem we proposed the above-mentioned PBC moves, which occupy a kind of middle ground between the Gibbs sampler and the Jain-Neal sampler.
The other main issue addressed by two of the moves, namely SCSC:ONE-NEW and SCRC, is that they produce only one new "child" after "cross-over." To expand on this, we note that since all the PBC moves, the mentioned ones included, are Metropolis-Hastings type moves, two "children" have to be produced to replace the parents so as to maintain reversibility, or detailed balance. But the children produced by two good parents are usually not good enough, and one does not want to throw away a good parent by chance. Thus, it has long been desirable to design moves that can both take advantage of the "cross-over" strategy and keep a good parent. Our new moves are the first such moves in the literature.
Lastly, some members of the audience were worried about the temperature placement problem in the parallel tempering setup. Prof. Jun Liu and I have proposed a first-cut solution that works in two steps. First, we determine the highest temperature to be used in the ladder, namely $t_1 = \tau_{max}$. Next, we look at the length and the structure of the ladder, i.e., the placement of the intermediate temperatures within the range $(\tau_{min}, \tau_{max})$. You can find the details in the paper at my website by clicking on "On Real-Parameter Evolutionary Monte Carlo Algorithm (ps file) [submitted]".
Posted by James Greiner at 5:42 AM
6 November 2005
This week, the Applied Statistics Workshop will present a talk by Eric Werker of the Harvard Business School. Professor Werker received his Ph.D. from the Harvard Economics Department in 2005. His research focuses on the economics of security in developing economies, including studies of forced migration and refugee camps, and he received an NBER Pre-doctoral Fellowship in the Economics of National Security. Professor Werker will present a talk entitled "Male Circumcision and the Impact of AIDS in Africa." The presentation will be at noon on Wednesday, November 9 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided. The abstract of the paper follows on the jump:
Theories abound on the possible impact of AIDS on economic growth and savings in Africa yet there have been surprisingly few empirical studies to test the mixed predictions. In this paper, we examine the impact of the AIDS epidemic on African nations through 2003 using the male circumcision rate to identify plausibly exogenous variation in HIV prevalence. Though medical researchers are still in debate over whether lack of male circumcision can lead to increased susceptibility of contracting HIV, the statistical correlation between the two is compelling. We assemble national circumcision rates for African nations and find that they are both a strong and robust predictor of HIV/AIDS prevalence and uncorrelated with other determinants of economic outcomes. Two-stage least squares regressions do not reveal that AIDS has had any measurable impact on economic growth in African nations, however we do find that AIDS has had considerable effect on the structure of the economy and on humanitarian outcomes such as undernutrition.
Posted by Mike Kellermann at 8:48 PM
4 November 2005
You, Jong-Sung
I had a very embarrassing experience when I presented an early draft of my paper on “Inequality and Corruption as Correlates of Social Trust” at a Work-in-Progress Seminar at the Kennedy School of Government last fall. Professor Edward Glaeser came to my talk, but I was not aware of who he was, although I had read his articles, including one about “measuring trust.” He asked a question about the measurement of social trust without identifying himself. Since I had already talked about the problem of measurement (apparently he did not hear that because he was late) and was about to present my results, I did not want to spend much time on the measurement issue. He was not satisfied with my brief answer and repeated his questions and comments, saying that the typical trust question in surveys, “Do you agree that most people can be trusted, or that you can’t be too careful?”, may reflect trustworthiness rather than trust, “according to a study.” Because I assumed that trust and trustworthiness reinforce one another, I did not think that was a great problem.
Our encounter was an unhappy one for us both. He probably had the impression that I did not respect him and did not give adequate attention and appreciation to his questions and comments, and I was also somewhat annoyed by his repeated interventions. One thing that made things even worse is that I am not a native English speaker; I have particular difficulty with husky voices like his, a difficulty that made the interaction even more problematic. After the talk, I asked him for the reference for the study on the measurement of trust he had mentioned. He wrote down Glaeser et al. (2000), and I realized that I had read the article he cited. Even then, I was unaware of who he was. I asked a participant in the seminar who he was, and to my surprise, he was Edward Glaeser, the lead author of the article on measuring trust. If I had recognized him, I would have paid much more attention to his questions and comments and tried to answer them better. How big a mistake I made!
Although I still think that the typical trust question captures both trust and trustworthiness, Glaeser et al.’s experimental results may indicate that the trust question needs to be designed better. One thing to note in this regard is that caution is not the opposite of trust, as Yamagishi et al. (1999) argued. In my case study of social trust in Korea, I found that inclusion or exclusion of the “being careful” option in trust questions produced substantially different results. More respondents agree that most people can be trusted when they are simply asked, “Do you think most people can be trusted?” than when they are given the two options “trusting most people” and “being careful.” The average percentage of trusting people was 42.9 per cent for the former type of question and 32.2 per cent for the latter. I looked at the GSS, and the same was true there. The trust question was asked without the option of being careful once during 1983-87, and 55.7 per cent of respondents agreed that most people can be trusted. When the “being careful” option was given, only 42.1 per cent of respondents did so.
Posted by SSS Coauthors at 5:54 AM
3 November 2005
John Friedman
In my last post, I wrote about the methodological identity of economics and some of the corresponding advantages. But perhaps the greatest benefit to economists from this definition of the discipline is the great range of subjects on which one can work.
There are, of course, areas of inquiry traditionally dominated by economists – monetary policy, or the profit-maximizing activities of companies, to name a few – and most people connect economics, as a field, to these subjects. Increasingly, though, economists are venturing further afield. Steven Levitt’s best-selling book, Freakonomics, exemplifies this trend, using the tools of economics to investigate corruption in sumo wrestling, cheating in Chicago schools, and ethnic names, to name a few. While Levitt currently sits farther from the mainstream than most economists, his work appears to be not a randomly scattered shot but rather the vanguard of a new generation of scholars.
What are the consequences of this expansion of economics across the social sciences? The increasing incidence of economists working on problems traditionally associated with other fields will, no doubt, create some conflict in the coming years. No local baron, ruling a fiefdom of land or knowledge, savors a challenge over his turf. And the “imperial” economists, many of whom view other fields as weak and primed for colonization, will surely disrespect the vast contributions of non-economists to date. But despite the inevitable (but still unfortunate) conflicts of ego, the majority of these interactions should be not only of great benefit to the world but also a wondrous sight to see. Nothing in academia is quite so spectacular as the collision of two great points of view, obliterating long-held dogmas and, in the heat of debate, forging new paradigms for generations to come.
As a young economist, I look forward to following (and even contributing to) these great arguments to come. And I hope that those of us writing this blog, viewing the questions in social science from diverse perspectives, can give you a look at the current state of these debates.
Posted by James Greiner at 4:00 AM
2 November 2005
Amy Perfors
If it's of interest, I will be blogging every so often about the numerous ways that humans seem to be remarkably adept statistical learners. This is a big question in cognitive science for two reasons. First, statistical learning looks like a promising approach to help answer the open question of how people learn as well and as quickly as they do. Second, better understanding how humans use statistical learning may be a good way to improve our statistical models in general, or at least investigate in what ways they might be applied to real data.
One of the more impressive demonstrations of human statistical learning is in the area usually called "implicit grammar learning." In this paradigm, people are presented with strings of nonsense syllables like "bo ti lo fa" in a continuous stream for a minute or two. One of the first examples of this paradigm, by Saffran et al., studied word segmentation -- for example, being able to tell that "the" and "bird" are two separate words, rather than guessing it is "thebird" or "theb" and "ird." If you ever listen to a foreign language, you realize that word boundaries aren't signaled by pauses, which is a huge problem if you're trying to learn the words. Anyway, in the study, syllables occurred in groups of three, thus making "words" like botifa or gikare. As in natural language, there was no pause between words; the only cues to word segmentation were the different transition probabilities between syllables -- that is, "ti" might be always followed by "fa" but "fa" could be followed by any of the first syllables of any other words. Surprisingly, people can pick up on these subtleties: adults who first heard a continuous stream of this "speech" were then able to identify which three-syllable items they heard were "words" or "nonwords" in the "language" they had just heard. That is, people could correctly say that "botifa" was a word, but "fagika" wasn't, at an above-chance level. Since the only cues to this information were in the transition probabilities, people must have been calculating those probabilities implicitly (none had the conscious sense they were doing much of anything). Most surprisingly of all, the same researchers demonstrated in a follow-up study that even 8-month-old infants can use these transitional probabilities as cues to word segmentation. Work like this has led many to believe that statistical learning might be one of the most powerful resources infants use during the difficult problem of language learning.
From the modeling perspective, this result can be captured by Markov models in which the learner keeps track of the string of syllables and the transition probabilities between them, updating the transition probabilities as they hear more data. More recent work has begun to investigate whether humans are capable of statistical learning that cannot be captured by a Markov model -- that is, learning nonadjacent dependencies (dependencies between syllables that do not directly follow each other) in a stream of speech. For instance, papers by Gomez et al. and Onnis et al. provide evidence that discovering even nonadjacent dependencies is possible through statistical learning, as long as the variability of the intervening items is low or high enough. This has obvious implications for how statistical learning might help in acquiring grammar (in which many dependencies are nonadjacent), but it also opens up new modeling issues, since simple Markov models are no longer applicable. What more sophisticated statistical and computational tools are necessary in order to capture our own unconscious, amazing abilities?
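The adjacent-dependency case is easy to mimic. Below is a toy sketch of transition-probability segmentation on an invented three-word "language", loosely in the spirit of the Saffran et al. stimuli (not their actual materials):

```python
# A toy sketch of transition-probability word segmentation on an invented three-word
# 'language'. The words and the 0.5 threshold are arbitrary choices for illustration.
import random
from collections import Counter

random.seed(4)
words = ["botifa", "gikare", "dulopa"]
stream = [random.choice(words) for _ in range(200)]
syllables = [w[i:i + 2] for w in stream for i in range(0, 6, 2)]   # unsegmented stream

# Estimate P(next syllable | current syllable) from adjacent pairs.
bigrams = Counter(zip(syllables, syllables[1:]))
firsts = Counter(syllables[:-1])
trans = {pair: count / firsts[pair[0]] for pair, count in bigrams.items()}

# Posit a word boundary wherever the forward transition probability dips.
boundaries = [i + 1 for i, pair in enumerate(zip(syllables, syllables[1:]))
              if trans[pair] < 0.5]
chunks = ["".join(syllables[a:b]) for a, b in zip([0] + boundaries, boundaries + [len(syllables)])]
print(chunks[:8])   # mostly recovers the three made-up 'words'
```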
Posted by James Greiner at 4:20 AM
1 November 2005
Jim Greiner
Social science statistics is everywhere. So is law. And both are tangled up with each other. I was forcefully reminded of these facts when my wife pointed out an article on Salon.com about an opinion Samuel Alito (as of yesterday, a nominee to the Supreme Court) wrote while a judge on the United States Court of Appeals for the Third Circuit in a case called Riley v. Taylor. The facts of the specific case, which concerned the potential use of race in peremptory challenges in a death penalty trial, are less important than Judge Alito's approach to statistics and the burden of proof.
Schematically, the facts of the case follow this pattern: Party A has the burden of proof on an issue concerning race. Party A produces some numbers that look funny, meaning instinctively unlikely in a race-neutral world, but conducts no significance test or other formal statistical analysis. The opposing side, Party B, doesn't respond at all, or if it does respond, it simply points out that a million different factors could explain the funny-looking numbers. Party B does not attempt to show that such innocent factors actually do explain the observed numbers, just that they could, and that Party A has failed to eliminate all such alternative explanations.
Such cases occur over and over again in litigation involving employment discrimination, housing discrimination, peremptory challenges, and racial profiling, just to name a few. When discussing them, judges inevitably lament the fact that one side or the other did not conduct a multiple regression analysis, as if that technique would provide all the answers (Judge Alito's Riley opinion is no exception here).
The point is, of course, that how a judge views such cases has almost nothing to do with the facts at bar and everything to do with a judge's priors on the role of race in modern society. For judges who believe that race has little relevance in the thought processes of modern decision makers (employers, landlords, prosecutors, cops), Party A in the above situation must eliminate all potential explanatory factors via (alas) multiple regression in order to meet its burden of production. For judges who believe that race still matters, Party B must respond in the above situation or lose the case. Judge Alito's Riley opinion demonstrates where he stands here.
Is there a middle way? Perhaps. In the above situation, what about requiring some sort of significance test from Party A, but not one that eliminates alternative explanations? In the specific facts of Riley, the number-crunching necessary for "some sort of significance test" is the statistical equivalent of riding a tricycle: a two-by-two hypergeometric table with row totals of 71 whites and 8 blacks, column totals of 31 strikes and 48 non-strikes, and an observed value of 8 black strikes yields a p-value of essentially zero.
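For anyone who wants to check the arithmetic, a short sketch of that hypergeometric calculation using the counts given above (the exact one-sided p-value comes out around 3 in 10,000):

```python
# A sketch of the 'tricycle' calculation, assuming the counts given in the text: a pool
# of 79 (8 black, 71 white), 31 peremptory strikes, and all 8 black members struck.
from scipy.stats import hypergeom

# P(X >= 8), where X = number of black members among 31 strikes drawn from the pool of 79.
p_value = hypergeom.sf(7, 79, 8, 31)   # sf(k) = P(X > k), so sf(7) = P(X >= 8)
print(p_value)                         # roughly 3e-4: essentially zero
```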
Posted by James Greiner at 3:58 AM