31 January 2006
Mike Kellermann
Those of you who have followed this blog know that making reasonable causal inferences from observational data usually presents a huge challenge. Using experimental data where we "know" the right answer, in the spirit of Lalonde (1986), provides one way for researchers to evaluate the performance of their estimators. Last month, Jens posed the question (here and here) "What did (and do we still) learn from the Lalonde dataset?" My own view is that we have beaten the NSW data to death, buried it, dug it back up, and whacked it around like a piñata. While I'm sure that others would disagree, I think that we would all like to see other experiment-based datasets with which to evaluate various methods.
In that light, it is worth mentioning "Comparing experimental and matching methods using a large-scale voter mobilization experiment" by Kevin Arceneaux, Alan Gerber, and Donald Green, which appears in the new issue of Political Analysis. The study draws on a large get-out-the-vote field experiment in which households were randomly assigned to receive turnout phone calls, so the experimental estimate of the calls' effect on turnout provides a benchmark against which non-experimental estimates can be judged.
The authors then attempt to replicate their experimental results using both OLS and various matching techniques. In this context, the goal of the matching process is to pick out people who would have listened to the phone call had they been contacted. The authors have a set of covariates on which to match, including age, gender, household size, geographic location, whether the voter was newly registered, and whether the voter turned out in each of the two previous elections. Because the control sample that they have to draw from is very large (almost two million voters), they don't have much difficulty in finding close matches for the treated group based on the covariates in their data. Unfortunately, the matching estimates don't turn out to be very close to the experimental baseline, and in fact are much closer to the plain-vanilla OLS estimates. Their conclusion from this result is that the assumptions necessary for causal inferences under matching (namely, unconfoundedness conditional on the covariates) are not met in this situation, and (at least by my reading) they seem to suggest that it would be difficult to find a dataset that was rich enough in covariates that the assumption would be met.
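To fix ideas, here is a minimal sketch of the kind of matching exercise the paper describes, using the MatchIt package. The data frame and variable names are hypothetical stand-ins for the Arceneaux, Gerber, and Green covariates, not the authors' actual code or data.

```r
# Nearest-neighbor matching of contacted voters to untreated voters on
# observed covariates, followed by a simple difference in turnout rates.
library(MatchIt)

# voters: one row per voter, with (hypothetical) columns
#   contacted   - 1 if reached by the phone canvass, 0 otherwise
#   turnout     - 1 if the voter voted in the target election
#   age, female, hh_size, newly_reg, voted_prev1, voted_prev2 - covariates
m.out <- matchit(contacted ~ age + female + hh_size + newly_reg +
                   voted_prev1 + voted_prev2,
                 data = voters, method = "nearest", ratio = 1)

matched <- match.data(m.out)

# Matching estimate of the effect of contact on turnout
with(matched, mean(turnout[contacted == 1]) - mean(turnout[contacted == 0]))
```

The paper's point is precisely that an estimate like this recovers the experimental benchmark only if treatment is unconfounded given the covariates, which the authors argue is not the case here.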
As a political scientist, I have to say that I like this dataset, because (a) it is not the NSW dataset and (b) it is not derived from a labor market experiment. What do these results mean for matching methods in political science? I'll have some thoughts on that tomorrow.
Posted by Mike Kellermann at 6:00 AM
30 January 2006
This week, the Applied Statistics Workshop resumes for the spring term with a talk by Jim Greiner, a Ph.D. candidate in the Statistics Department. The talk is entitled "Ecological Inference in Larger Tables: Bounds, Correlations, Individual-Level Stories, and a More Flexible Model," and is based on joint work with Kevin Quinn from the Government Department. Jim graduated with a B.A. in Government from the University of Virginia in 1991 and received a J.D. from the University of Michigan Law School in 1995. He clerked for Judge Patrick Higginbotham on the U.S. Court of Appeals for the Fifth Circuit and practiced law at the Justice Department and in private practice before joining the Statistics Department here at Harvard. As chair of the authors' committee, he is a familiar figure to readers of this blog.
As a reminder, the Applied Statistics Workshop meets in Room N354 in the CGIS Knafel Building (next to the Design School) at 12:00 on Wednesdays during the academic term. Everyone is welcome, and lunch is provided. We hope to see you there!
Posted by Mike Kellermann at 12:30 PM
Jim Greiner
In two previous posts here and here, I discussed the ecological inference problem as it relates to the legal question of racially polarized voting in litigation under Section 2 of the Voting Rights Act. In the latter of these two posts, I suggested that this field needed greater research into the case of R x C, as opposed to 2 x 2, tables.
Here's another suggestion from the courtroom: we need an individual level story.
The fundamental problem of ecological inference is that we do not observe data at the individual level; instead, we observe row and column totals for a set of aggregate units (precincts, in the voting context). This fact has led to some debate about whether a model or a story or an explanation about individual level behavior is necessary to make ecological inferences reliable, or at least as reliable as they can be. On the one hand, Achen & Shively, in their book Cross-Level Inference, have argued that an individual level story is always necessary to assure the coherence of the aggregate model and to assess its implications. On the other hand, Gary King, in his book A Solution to the Ecological Inference Problem, has argued that because we never observe the process by which ecological data are aggregated from individual to group counts, we need not consider individual level processes, so long as the row counts (or percentages) are uncorrelated with model parameters.
From a social science point of view, this question is debatable. From a legal point of view, we need an individual level story, regardless of whether such a story produces better statistical results. When judges and litigators encounter statistical methods in a litigation setting, they need to understand (or, at least, to feel that they understand) something about those methods. They know they will not comprehend everything, or perhaps even most things, and they have no interest in the gritty details. But they will not credit an expert witness who says, in effect, "I ran some numbers. Trust me." What can quantitative expert witnesses offer in an ecological inference setting? The easiest and best thing is some kind of individual level story that leads to the ecological model being used.
Posted by James Greiner at 6:01 AM
27 January 2006
It's a common truism, familiar to most people by now thanks to advertising and politics, that repeating things makes them more believable -- regardless of whether they're true or not. In fact, even if they know at the time that the information is false, people will still be more likely to believe something the more they hear it. This phenomenon, sometimes called the reiteration effect, is well-studied and well-documented. Nevertheless, from a statistical learning point of view, it is extremely counter-intuitive: shouldn't halfway decent learners learn to discount information they know is false, not "learn" from it?
One of the explanations for the repetition effect is related to source confusion -- the fact that, after a long enough delay, people are generally much better at remembering what they learned rather than where they learned it. Since a great deal of knowing that something is false means knowing that its source is unreliable, forgetting the source often means forgetting that it's not true.
Repetition increases the impact of source confusion for two reasons. First, the more often you hear something, the more sources there are to remember, and the more likely you are to forget at least some of them. I've had this experience myself: trying to judge the truth of some tidbit of information, I remember that I first read it somewhere I didn't trust, recall that I've also seen it somewhere else (but not where), and conclude that since that other source might have been trustworthy, the tidbit might be true.
The second reason is that the more sources there are, the less likely it seems that all of them would believe something that is false. This strategy makes some evolutionary and statistical sense. Hearing (or experiencing) something from two independent sources (or two independent events) makes it safer to generalize from it than if you only experienced it once. This is the logic behind large sample sizes: as long as the observations are independent, more of them means more evidence. Unfortunately, in the mass media today few sources of information are independent. Most media outlets get their material from the AP and other wire services, and most people get their information from the same media outlets, so even if you hear item X in 20 completely different contexts, chances are that all 20 of them stem from the same one or two original reports. If you've ever been the source of national press yourself, you will have experienced this firsthand.
I tried to think of a way to end this entry on a positive note, but I'm having a hard time. The reiteration effect is a largely unconscious byproduct of how our implicit statistical learning mechanisms operate, so even being aware of it is only somewhat useful: we can remind ourselves not to trust things simply because we've heard them often, but so much of the process is unconscious that it's hard to fight. Education about it is therefore worthwhile, but better still would be solutions encouraging a more heterogeneous media with more truly independent sources.
Posted by Amy Perfors at 6:00 AM
26 January 2006
Jens Hainmueller
January is exam period at Harvard. Since exams are usually pretty boring, I sometimes get distracted from studying by online games. Recently, I found a game that may even be (partly) useful for exam preparation, at least for an intro stats class. Yes, here it is, a computer game about statistics: StatsGames. StatsGames is a collection of 20 games or challenges designed to playfully test and refine your statistical thinking. As the author of StatsGames, economist Gary Smith, admits: "These games are not a substitute for a statistics course, but they may give you an enjoyable opportunity to develop your statistical reasoning." The program is free and runs on all platforms. Although the graphical makeup somewhat reminds me of the days when games were still played on Atari computers, most of the games in the collection are really fun. My favorites include Batting Practice (a game that teaches students to use the binomial distribution to test the hypothesis that you are equally likely to swing late or early) and the Stroop effect (a game featuring a simple cognitive-science-style experiment that is then evaluated using the F-test). I also enjoyed the simulation of Galton's Apparatus. Go check it out! But don't waste too much exam preparation time, of course - and good luck if you have any exams soon! I also wonder whether there are other computer games about statistics out there. Any ideas?
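The Batting Practice game boils down to an exact binomial test. A quick sketch in R, with made-up swing counts:

```r
# Suppose that out of 30 swings you were late on 19 and early on 11. Under the
# null hypothesis that late and early swings are equally likely (p = 0.5),
# an exact binomial test gives the two-sided p-value:
binom.test(x = 19, n = 30, p = 0.5)
```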
Posted by Jens Hainmueller at 6:00 AM
25 January 2006
Jim Greiner
This Spring, Harvard will be the site of something that has never been attempted before . . . I think. Matthew Stephenson of the Harvard Law School, Don Rubin of the Harvard Department of Statistics, and I will teach a seminar entitled Quantitative Social Science, Law, Expert Witnesses, and Litigation. The course will be offered jointly in the Law School and the Statistics Department and will, we hope, include students from both places as well as other Departments in the Graduate School of Arts & Sciences (Government, Sociology, Economics, etc.).
In the course, the quantitatively trained students will act as expert witnesses by analyzing datasets relating to a given fact scenario. The experts will draft expert reports and testify at depositions, which will be taken by the law students acting as (what else?) lawyers. The lawyers will then use the transcripts and expert reports to draft cross motions for summary judgment and responses to those motions. By the way: A very big thanks to New England Court Reporting Institute for agreeing to provide court reporters free of charge to assist the course!
Our hope is that by forcing law students and quantitatively trained students to communicate effectively under the pressure-cooker conditions of pre-trial litigation, we can teach them something about the critical process of communicating with one another generally. In my view, this communication process is underemphasized in both law schools and quantitative departments around the nation. For example, how often does the average law student have to communicate with a person with greater knowledge of another field (anything from construction to exporting fruit)? How often are students trained in quantitative fields required to explain methods and conclusions to those not so trained?
When I began putting together this course a year ago, I searched for analogs on academic websites around the country but found none. My question: are there other for-credit classes like this one out there? By "like this one" I mean courses in which quantitative and law students are in the same classroom, forced to work with each other effectively.
Either way, I'll be sharing some of the lessons learned from this effort throughout the upcoming semester.
Posted by James Greiner at 6:00 AM
24 January 2006
Sebastian Bauhoff
With the end of the Fall semester comes the happy time of shopping for (applied) quantitative methods courses for the Spring. Here's a partial list of currently planned offerings around Cambridge, with their descriptions.
Bio 503 Introduction to Programming and Statistical Modeling in R (Harezlak, Paciorek and Houseman)
An introduction to R in five 3-hour sessions combining demonstration, lecture, and laboratory components. Graded pass/fail on the basis of homework assignments. Taught in the Winter session at HSPH.
Gov 2001 Advanced Quantitative Research Methodology (King)
Introduces theories of inference underlying most statistical methods and how new approaches are developed. Examples include discrete choice, event counts, durations, missing data, ecological inference, time-series cross sectional analysis, compositional data, causal inference, and others. Main assignment is a research paper to be written alongside the class.
Econ 2120. Introduction to Applied Econometrics (Jorgenson)
Introduction to methods employed in applied econometrics, including linear regression, instrumental variables, panel data techniques, generalized method of moments, and maximum likelihood. Includes detailed discussion of papers in applied econometrics and computer exercises using standard econometric packages. Note: Enrollment limited to certain PhD candidates, check the website.
MIT 14.387 Topics in Applied Econometrics (Angrist and Chernozhukov)
Click here for 2004 website
Covers topics in econometrics and empirical modeling that are likely to be useful to applied researchers working on cross-section and panel data applications.
[It's not clear whether this class will be offered in Spring 06. Check the MIT class pages for updates.]
KSG API-208 Program Evaluation: Estimating Program Effectiveness with Empirical Analysis (Abadie)
Accessible from here (click on Spring Schedule)
Deals with a variety of evaluation designs (from random assignment to quasi-experimental evaluation methods) and teaches analysis of data from actual evaluations, such as the national Job Training Partnership Act Study. The course evaluates the strengths and weaknesses of alternative evaluation methods.
KSG PED-327 The Economic Analysis of Poverty in Poor Countries (Jensen)
Accessible from here (click on Spring Schedule)
Emphasizes modeling behavior, testing economic theories, and evaluating the success of policy. Topic areas include: conceptualizing and measuring poverty, inequality, and well-being; models of the household and intra-household allocation; risk, savings, credit, and insurance; gender and gender inequality; fertility; health and nutrition; and education and child labor.
Stat 221 Statistical Computing Methods (Goswami)
Advanced methods of fitting frequentist and Bayesian models. Generation of random numbers, Monte Carlo methods, optimization methods, numerical integration, and advanced Bayesian computational tools such as the Gibbs sampler, Metropolis-Hastings, the method of auxiliary variables, marginal and conditional data augmentation, slice sampling, exact sampling, and reversible jump MCMC.
Stat 232 Incomplete Multivariate Data (Rubin)
Methods for handling incomplete data sets with general patterns of missing data, emphasizing the likelihood-based and Bayesian approaches. Focus on the application and theory of iterative maximization methods, iterative simulation methods, and multiple imputation.
Stat 245 Quantitative Social Science, Law, Expert Witnesses, and Litigation (Stephenson and Rubin)
Explores the relationship between quantitative methods and the law via simulation of litigation and a short joint (law student and quantitative student) research project. Cross-listed with Harvard Law School.
Stat 249 Generalized Linear Models (Izem)
Methods for analyzing categorical data. Visualizing categorical data, analysis of contingency tables, odds ratios, log-linear models, generalized linear models, logistic regression, and model diagnostics.
Posted by Sebastian Bauhoff at 6:00 AM
21 January 2006
How much slower would scientific progress be if the near universal standards for scholarly citation of articles and books had never been developed? Suppose shortly after publication only some printed works could be reliably found by other scholars; or if researchers were only permitted to read an article if they first committed not to criticize it, or were required to coauthor with the original author any work that built on the original. How many discoveries would never have been made if the titles of books and articles in libraries changed unpredictably, with no link back to the old title; if printed works existed in different libraries under different titles; if researchers routinely redistributed modified versions of other authors' works without changing the title or author listed; or if publishing new editions of books meant that earlier editions were destroyed? How much less would we know about the natural, physical, and social worlds if the references at the back of most articles and books were replaced with casual mentions, in varying, unpredictable, and incomplete formats, of only a few of the works relied on?
These questions are all obviously counterfactuals when it comes to printed matter, but remarkably they are entirely accurate descriptions of our [in]ability to reliably cite, access, and find quantitative data, all of which remain in an entirely primitive state of affairs.
Micah Altman and I have just written a paper on this subject that may be of interest. The title is "A Proposed Standard for the Scholarly Citation of Quantitative Data" and a copy can be found here. The abstract follows. Comments welcome!
An essential aspect of science is a community of scholars cooperating and competing in the pursuit of common goals. A critical component of this community is the common language of and the universal standards for scholarly citation, credit attribution, and the location and retrieval of articles and books. We propose a similar universal standard for citing quantitative data that retains the advantages of print citations, adds other components made possible by, and needed due to, the digital form and systematic nature of quantitative data sets, and is consistent with most existing subfield-specific approaches. Although the digital library field includes numerous creative ideas, we limit ourselves to only those elements that appear ready for easy practical use by scientists, journal editors, publishers, librarians, and archivists.
Posted by Gary King at 5:50 PM
20 January 2006
Sebastian Bauhoff
In a three-day conference at IQSS, Jon Krosnick is currently presenting chapters of a forthcoming 'Handbook of Questionnaire Design: Insights from Social and Cognitive Psychology'. Applied social scientists have put a lot of effort into improving research methods once the data are collected. However, some of the evidence that Krosnick discusses shows that those efforts may be frustrated: getting the data may be a rather weak link in the chain of research.
Everyone who has collected data themselves will know about these issues. The Handbook might be a good way to get a structured review and facilitate more thorough thinking.
PS: The conference is this year's Eric M. Mindich 'Encounters with Authors' symposium. An outline is here.
Posted by Sebastian Bauhoff at 6:00 AM
19 January 2006
Sebastian Bauhoff
The Economist recently featured an interesting article on forthcoming research by Griffiths and Tenenbaum on how the brain works ("Bayes Rules", January 7, 2006).
Their research reportedly analyses how the brain makes judgements using prior distributions. Griffiths and Tenenbaum gave individuals a piece of information and asked them to draw general conclusions. Apparently the answers to most questions correspond well to a Bayesian approach to reasoning: people generally make accurate predictions and seem to pick the right probability distribution. And if you don't know the distribution, it seems, you can just run experiments and find out.
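As an illustration of the flavor of these tasks (not the authors' actual experiment), here is a toy Bayesian predictor of the "how long will this last, given how long it has lasted so far" variety; the prior here is made up.

```r
# Given that a process has lasted t_so_far units, predict its total duration:
# put a prior on totals, condition on total >= t_so_far (with the elapsed time
# assumed uniform over [0, total]), and report the posterior median.
predict_total <- function(t_so_far, prior_mean = 75, prior_sd = 15) {
  totals <- 1:150
  prior  <- dnorm(totals, prior_mean, prior_sd)
  lik    <- ifelse(totals >= t_so_far, 1 / totals, 0)
  post   <- prior * lik / sum(prior * lik)
  totals[which(cumsum(post) >= 0.5)[1]]          # posterior median
}

predict_total(18)   # e.g., predicted total lifespan for an 18-year-old
predict_total(90)   # the prediction shifts sensibly as the data change
```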
The interesting question of course is, where does the brain get this information from? Trial and error experience? Learning from your parents or others?
At any rate the results suggest what many readers of this blog already know: real humans are Bayesians. Tell a frequentist next time you meet one.
PS: Andrew Gelman also posted about this article on his blog. See here.
Posted by Sebastian Bauhoff at 4:12 AM
18 January 2006
Mike Kellermann
Regular visitors to this blog have read (here, here, and here) about the recent field research conducted by Mike Hiscox and Nick Smyth of the Government Department on consumer demand for labor standards. After they described the difficulties that they faced in convincing retailers to participate in their experiments, several workshop participants remarked that the retailers should be paying them for the market research done on their behalf. Indeed, bringing rigorous experimental design to bear in such cases should be worth at least as much to corporations as the advice that they receive from consulting firms - and all we want is their data, not their money!
This discussion reminded me of an Applied Statistics talk last year given by Sendhil Mullainathan of the Harvard Economics Department on randomization in the field. He argues that there are many more opportunities for field experiments than we typically assume in the social sciences. One of the projects that he described during the talk was a field experiment in South Africa, in which a lender (unidentified for reasons that should become clear) agreed to several manipulations of its standard letter offering credit to relatively low-income, high-risk consumers. These manipulations included both economic (varying the interest rate offered) and psychological (altering the presentation of the interest rate through frames and cues of various sorts) treatments. Among the remarkable things about this experiment is the sheer number of subjects - over 50,000 individuals (all of whom had previously borrowed from the lender) received letters. It is hard to imagine a field experiment of this magnitude funded by an academic institution. Of course, the motives of the lender in this case had little to do with scientific progress; it hoped that the lessons learned from the experiment would help the bottom line. The results from the experiment suggest that relatively minor changes in presentation dramatically affected the take-up rate of loans. As one example, displaying a single example loan amount (instead of several possible amounts) increased demand by nine percent.
So, the question is why don't we do more of these kinds of experiments? One answer is obvious; social science is not consulting. The whole project of social science depends on our ability to share results with other researchers, something unlikely to please companies that would otherwise love to have the information. Unfortunately, in many cases, paying social scientists in data is probably more expensive than paying consultants in dollars.
Posted by Mike Kellermann at 12:44 AM
17 January 2006
You, Jong-Sung
In my earlier entries on "Statistics and Detection of Corruption" and "Missing Women and Sex-Selective Abortion," I demonstrated that examining statistical anomalies can be a useful tool for detecting crime and corruption. In those cases, the binomial probability distribution was a very useful tool.
Professor Malcolm Sparrow at the Kennedy School of Government shows how network analysis can be used to detect health care fraud in his book, License to Steal: How Fraud Bleeds America's Health Care System (2000). He gives an example of network analysis performed within Blue Cross/Blue Shield of Florida in 1993.
An analyst explored the network of patient-provider relationships with twenty-one months of Medicare data, treating a patient as linked to a provider if the patient had received services during the twenty-one-month period. The resulting patient-provider network had 188,403 links within it. The analyst then looked for unnaturally dense cliques within that structure. He found a massive one. “At its densest core, the cluster consisted of a specific set of 122 providers, linked to a specific set of 181 beneficiaries. The (symmetric) density criteria between these sets were as follows:
A. Any one of these 122 providers was linked with (i.e., had billed for services for) a minimum of 47 of these 181 patients.
B. Any one of these 181 patients was linked with (i.e., had been "serviced" by) a minimum of 47, and an average of about 80, of these providers."
After the analyst found this unnaturally dense clique, field investigations confirmed a variety of illegal practices. "Some providers were indeed using the lists of patients for billing purposes without seeing the patients. Other patients were being paid cash to ride a bus from clinic to clinic and receive unnecessary tests, all of which were then billed to Medicare."
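A minimal sketch of the density check described above, in R; the edge list and candidate sets are hypothetical, and finding the candidate core in the first place would of course require a separate clustering step.

```r
# claims: a data frame with one row per distinct provider-patient link.
# providers, patients: candidate sets for the suspected dense core.
check_core <- function(claims, providers, patients, min_links = 47) {
  core <- subset(claims, provider %in% providers & patient %in% patients)
  per_provider <- tapply(core$patient,  core$provider, function(x) length(unique(x)))
  per_patient  <- tapply(core$provider, core$patient,  function(x) length(unique(x)))
  list(every_provider_meets_minimum = all(per_provider >= min_links),
       every_patient_meets_minimum  = all(per_patient  >= min_links),
       mean_providers_per_patient   = mean(per_patient))
}
```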
Professor Sparrow suggests that many ideas and concepts from network analysis can be useful in developing fraud-detection tools, in particular for monitoring organized and collusive multiparty frauds and conspiracies.
Posted by Jong-sung You at 2:36 AM
16 January 2006
Jens Hainmueller
I've decided this blog needs more humor: Check out some Statistics Movies that Never Made it to the Big Screen. Or even better: What's the question the Cauchy distribution hates the most? "Got a moment?"
Posted by Jens Hainmueller at 1:25 AM
13 January 2006
Jim Greiner
In a previous post, I introduced a definition of the ecological inference problem as it applies to the legal difficulty of drawing inferences about racial voting patterns from precinct-level data on candidate support and the racial makeup of the voting-age population. As I mentioned in that post, very few lawyers and judges have ever contributed to the expansive literature on this question, despite the fact that ecological inference models are often used in high-profile courtroom cases.
Here's an initial contribution from the courtroom: forget about two by two tables.
The overwhelming majority of publications on the ecological inference problem concern methods for sets of two by two contingency tables. In the Voting Rights Act context, a two by two table problem might correspond to a jurisdiction in which almost every potential voter is African-American or Caucasian, and all we care about is who votes, not who the voters supported. In that case, the rows of each table are black and white, while the columns are vote and no-vote. For each precinct, we need only predict one internal cell count, and the others are determined.
This two by two case is of almost no interest in the law. The reason is that in American jurisdictions, voters have three options in any electoral contest of interest: Democrat, Republican, and not voting. That means we have a minimum of three columns. In most jurisdictions of interest these days, we also have more than two rows. Hispanics constitute an increasingly important set of voters in the United States, and their voting patterns are rarely similar enough to those of African-Americans or Caucasians to allow an expert witness to combine Hispanics with one of these other groups.
Thus far, scant research exists on the R x C problem. Until a few years ago, one had two options: (i) run a set of C-1 linear models, a solution that often led to logically inconsistent predictions (such as an estimate that 115 percent of Hispanic voters supported the Democrat), or (ii) pick a two by two model that incorporates information from the precinct-level bounds as well as the available statistical information, and apply it in some way to the set of R x C tables at hand, perhaps by collapsing cell counts down to a two by two shape, perhaps by applying the two by two method repeatedly to draw inferences about the R x C problem. Neither approach is very appealing.
A few years ago, Rosen et al. proposed a variant of a Dirichlet-Multinomial model, a serious improvement in this area. This model was and is a large step forward in the analysis of R x C ecological inference tables. Nevertheless, there is always room for improvement. The model does not respect the bounds deterministically, and it does not allow a great deal of flexibility in modeling intra-row and inter-row correlations. On the latter point, an example may clarify: suppose we are analyzing a primary in which four candidates are running, two African-American and two Caucasian. Would we expect, among (say) black voters, the vote counts or fractions (by precinct) for the two African-American candidates to be positively correlated?
I look forward to contributing to this research soon.
Posted by James Greiner at 5:57 AM
12 January 2006
Amy Perfors
Two of the most enduring debates in cognitive science can be summarized baldly as the "rules vs. statistics" debate and the "language: innate or not?" debate. (I think these simple dichotomies are not only too simple to capture the state of the field and current thought, but also actively harmful in some ways; nevertheless, they are still a good first approximation for blogging purposes.) One of the talks at the BUCLD conference, by Gary Marcus at NYU, leapt squarely into both debates by examining simple rule-learning in seven-month-old babies and arguing that the subjects could only do this type of learning when the input was linguistic.
Marcus built on some earlier studies of his (e.g., pdf here) in which he familiarized seven-month-old infants with a list of nonsense "words" like latala or gofifi. Importantly, all of the words heard by any one infant had the same structure, such as A-B-B ("gofifi") or A-B-A ("latala"). The infants heard two minutes of these types of words and were then presented with a new set of words using different syllables, half of which followed the same pattern as before and half of which followed a new pattern. Marcus found that infants listened longer and paid more attention to the words with the unfamiliar structure, which they could have done only if they successfully abstracted that structure (not just the statistical relationships between particular syllables). Thus, for instance, an infant who heard many examples of words like "gofifi" and "bupapa" would be more surprised to hear "wofewo" than "wofefe"; they have abstracted the underlying rule. (The question of how and to what extent they abstract the rule is much debated, and I'm not going to address it here.)
The BUCLD talk focused instead on another question: did it matter at all that the stimuli the infants heard were linguistic rather than, say, tone sequences? To answer this question, Marcus ran the same experiment with sequences of various kinds of tones and timbres in place of syllables (e.g., "blatt blatt honk" instead of "gofifi"). His finding? Infants did not respond differently in testing to the structures they had heard - that is, they didn't seem to be abstracting the underlying rule this time.
There is an unfortunately large confounding factor, however: infants have a great deal more practice and exposure to language than they do to random tones. Perhaps the failure was rather one of discrimination: they didn't actually perceive different tones to be that different, and therefore of course could not abstract the rule. To test this, Marcus trained infants on syllables but tested them on tones. His reasoning was that if it was a complete failure of discrimination, they shouldn't be able to perceive the pattern in tones when presented in testing any more than they could when presented in training. To his surprise, they did respond differently to the tones in testing, as long as they were trained on syllables. His conclusion? Not only can infants do cross-modal rule transfer, but they can only learn rules when they are presented linguistically, though they can then apply them to other domains. Marcus argued that this was probably due to an innate tendency in language, not a learnt effect.
It's fascinating work, though rather counterintuitive. And, quite honestly, I remain unconvinced (at least about the innate-tendency part). Research on analogical mapping has shown that people who have a hard time perceiving underlying structure in one domain can nevertheless succeed in perceiving it if they learn about the same structure in another domain and map it over by analogy. (This is not news to good teachers!) It's entirely possible - and indeed a much simpler hypothesis - that babies trained on tones simply lack the experience with tones that they have with language, and hence find it more difficult to pick up on the differences between the tones and therefore the structural rule they embody. But when first trained on language - which they do have plenty of practice hearing - they can learn the structure more easily; and then when hearing the tones, they know "what to listen for" and can thus pick out the structure there, too. It's still rule learning, and even still biased to be easier for linguistically presented stimuli; but that bias is due to practice rather than some innate tendency.
Posted by Amy Perfors at 2:13 AM
11 January 2006
Drew Thomas
Spatial statistical methods are beginning to gain popularity as tools in the natural and social sciences. At Harvard, Prof. Rima Izem is leading the way toward the use of these techniques across many disciplines. This semester, Prof. Izem debuted her Spatial Statistics seminar, which met Wednesday afternoons in the Statistics Department.
Of the topics discussed in the seminar, lattice data analysis proves invaluable for the analysis of well-defined electoral districts. The premise of lattice data analysis is that a land area can be divided into mutually exclusive, exhaustive, and contiguous divisions; interactions between the divisions can then be analyzed through various covariance methods.
A full understanding of spatial interaction may prove to be valuable to electoral analysis. Determining the interdependence of districts through means other than traditional covariates may suggest the presence of a true "neighbor effect." How one determines the covariance of districts may prove to be more art than science, but the depth of work yet to be done in this field should give many opportunities for meaningful investigation.
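For concreteness, one standard lattice-data diagnostic is Moran's I for spatial autocorrelation. The sketch below computes it by hand from a binary contiguity ("neighbor") matrix; the districts and vote shares are invented, and in practice a package such as spdep provides full implementations and significance tests.

```r
# Moran's I: I = (n / sum(W)) * (z' W z) / (z' z), with z the centered outcome.
morans_i <- function(y, W) {
  n <- length(y)
  z <- y - mean(y)
  as.numeric((n / sum(W)) * (t(z) %*% W %*% z) / sum(z^2))
}

# A three-district chain in which district 2 borders districts 1 and 3.
W <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), nrow = 3, byrow = TRUE)
y <- c(0.62, 0.58, 0.48)   # hypothetical district-level vote shares
morans_i(y, W)
```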
Posted by Andrew C. Thomas at 1:41 AM
10 January 2006
I'm thrilled to announce that Adam Glynn, Ph.D. candidate in the Department of Statistics at the University of Washington, has accepted the offer of the Government Department to be an Assistant Professor here. Adam is a political methodologist and will also be a resident faculty member at the Institute for Quantitative Social Science. His recent work shows how to improve ecological inferences with small, strategically selected samples of individuals. And as it turns out, he can also do the reverse: his work uses ecological inferences from aggregate data to adjust the relationships among the variables in survey data in a manner better than the sometimes current practice of adjusting only the marginals. He has also done work in a variety of other interesting areas. Welcome Adam!
Posted by Gary King at 10:57 PM
You, Jong-Sung
Duggan and Levitt's (2002) article on "corruption in sumo wrestling" demonstrates how statistical analysis may be used to detect crime and corruption. Sumo wrestling is a national sport of Japan. A sumo tournament involves 66 wrestlers participating in 15 bouts each. A wrestler with a winning record rises up the official ranking, while a wrestler with a losing record falls in the rankings. An interesting feature of sumo wrestling is the existence of a sharp nonlinearity in the payoff function. There is a large gap in the payoffs for seven wins and eight wins. The critical eighth win garners a wrestler roughly four times the value of the typical victory.
Duggan and Levitt uncover striking evidence that match rigging takes place in the final days of sumo tournaments. They find that a wrestler who is on the margin for an eighth win is victorious with an unusually high frequency, but that the next time those same two wrestlers face each other, it is the opponent who has a very high win percentage. This suggests that part of the currency used in match rigging is a promise to throw future matches in return for taking a fall today. They present a histogram of final wins for the 60,000 wrestler-tournament observations between 1989 and 2000 in which a wrestler completed exactly 15 matches. Approximately 26.0 percent of all wrestlers finish with eight wins, compared to only 12.2 percent with seven wins. The binomial distribution predicts that these two outcomes should occur with an equal frequency of 19.6 percent. The null hypothesis that the probabilities of seven and eight wins are equal can be rejected at resounding levels of statistical significance. They report that two former sumo wrestlers have made public the names of 29 wrestlers who they allege to be corrupt and 14 wrestlers who they claim refuse to rig matches. Interestingly, they find that wrestlers identified as "not corrupt" do no better in matches on the bubble than in typical matches, whereas those accused of being corrupt are extremely successful on the bubble.
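The binomial arithmetic behind the 19.6 percent figure is easy to reproduce; the counts in the test below are back-of-the-envelope numbers derived from the quoted percentages, not the authors' data.

```r
# Under the null that a wrestler wins each of his 15 bouts with probability
# 0.5, seven and eight wins are equally likely:
dbinom(7, size = 15, prob = 0.5)   # about 0.196
dbinom(8, size = 15, prob = 0.5)   # about 0.196

# Conditional on finishing with seven or eight wins, the null says each is
# equally likely; the observed split is wildly lopsided.
wins8 <- round(0.260 * 60000)
wins7 <- round(0.122 * 60000)
binom.test(wins8, wins8 + wins7, p = 0.5)
```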
A similar kind of empirical study of corruption dates to 1846, when Quetelet documented that the height distribution among French men, based on measurements taken at conscription, was normally distributed except for a puzzling shortage of men measuring 1.57-1.597 meters (roughly 5 feet 2 inches to 5 feet 3 inches) and an excess number of men below 1.57 meters. Not coincidentally, the minimum height for conscription into the Imperial army was 1.57 meters (recounted in Duggan and Levitt 2002). These examples show that the detection of statistical anomalies can provide compelling evidence of corruption.
Corruption in conscription has been a big political issue in South Korea. Examining anomalies in the distributions of height, weight, and eyesight at each physical examination site for conscription may provide evidence of cheating and/or corruption. This kind of statistical evidence will fall short of proving crime or corruption, but it will make a sufficient case for thorough investigation.
Posted by Jong-sung You at 6:39 AM
9 January 2006
Sebastian Bauhoff and Jens Hainmueller
A perfect method for adding drama to life is to wait until a paper deadline looms large. So you find yourself on the eve of a deadline, "about to finish" for the last 4 hours, and not having formatted any of the tables yet? Still copying STATA tables into MS Word? Or just received the departmental brainwash regarding statistical software and research best practice? Here are some interesting tools you could use to make life easier and your research more effective. On the big-picture level, which tools to use is as much a question of philosophy as of your needs: open-source or commercial package? At Harvard, students often use one of two combos: MS Word and Stata (low tech) or LaTeX and R (high tech). What type are you?
If you're doing a lot of data-based research, need to type formulas, and often change your tables, you might want to consider learning LaTeX. Basically, LaTeX is a highly versatile typesetting environment for producing technical and scientific documents to the highest standards of typesetting quality. It's free, and LaTeX implementations are available for all platforms (Linux, Mac, Windows, etc.). Bibliographies are easily managed with BibTeX, and you can also produce cool slides using PPower4. At the Government Department, LaTeX is taught to all incoming graduate students, and many of them hate it at the beginning (it's a bit tricky to learn), but after a while many of them become true LaTeX fetishists (in the metaphorical sense, of course).
Ever wondered why some papers look nicer than Word files? They're done in LaTex. A drawback is that they all look the same, of course. But then, some say having your papers in LaTeX-look is a signal that you're part of the academic community...
LaTeX goes well with R, an open-source statistical package modeled on S. R is both a language and an environment for statistical computing. It's very powerful and flexible; some say the graphical capabilities are unparalleled. The nice thing is that R can output LaTeX tables which you can paste directly into your document. There are many ways to do this, one easy way is to use the "LaTeX" function in the design library. A mouse-click later, your paper shines in pdf format, all tables looking professional. As with LaTeX, many incoming graduate students at the Government Department suffer learning it, but eventually most of them never go back to their previous statistical software.
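As an alternative to the latex() route mentioned above, the xtable package does the same job; here is a minimal sketch (the model and data frame are hypothetical):

```r
# Fit a model and write its coefficient table as a LaTeX fragment, which can
# then be pulled into the paper with \input{results_table.tex}.
library(xtable)

fit <- lm(turnout ~ age + income, data = survey_df)   # survey_df is made up
print(xtable(fit, caption = "OLS estimates", digits = 3),
      file = "results_table.tex")
```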
But maybe you're looking for a more user-friendly modus vivendi? Don't feel like wasting your nights writing code and chasing bugs like a stats addict? Rather, you prefer canned functions and an easy-to-use working environment? Then consider the MS Word and STATA combo. Getting STATA output to look nice in Word is rather painful unless you use a little tool called outreg, or alternatively estout (the latter also produces copy-and-paste-able LaTeX tables). Outreg is an ado-file that produces a table in Word format, and you can simply apply the normal formatting functions in Word. The problem is that outreg handles only some of the tables that STATA produces, so you're stuck having to format at least some by hand. But of course there are many formatting tools available in Word.
So you make your choice depending on how user-friendly and/or flexible you like it. But whether you're using Word/STATA or LaTeX/R, one tool comes in handy anyway: WinEdt is shareware that can be used to write plain text, HTML, LaTeX, etc. (WinEdt automatically comes with a LaTeX engine, so you won't need to install that.) The software can also serve as a do-file editor for STATA and R. You can download configuration files that will highlight your commands in WinEdt, do auto-saves whenever you like (ever lost your STATA do-file?), and send your code to STATA or R just as the built-in editors would. Alternatives include other powerful text editors like Emacs.
Confused? Can't decide? Well, you're certainly not the only one. On the web, people fight fervent LaTeX vs. Word wars (google it!). We (the authors) recommend using LaTeX and R. This is the way we work because, as Gary likes to say, "if we knew a better way of working, we would use it" -- is that what's called a tautology?! :-)
Posted by Sebastian Bauhoff at 2:44 AM
6 January 2006
Jim Greiner
Alchemists' gold. The perpetual motion machine. One might also think of cold fusion and warm superconductors. These are some of the great mythical aims of the so-called "hard" sciences. A few of these concepts have also been compared to attempts at ecological inference, the search for accurate predictions about the internal cell counts of a set of contingency tables (such as one for each precinct) when only the row and column totals of each table are observed. The fundamental problem of ecological inference is, of course, that radically different internal cell counts can lead to identical row and column totals, and because we only get to see the row and column totals, we cannot distinguish among these different sets of counts. Another way of saying this is that the problem is impossible to solve deterministically (since the relationship between the cell entries and the row and column marginals is not one-to-one), causing some to label ecological inference an "ill-posed inverse problem." In fact, without making some statistical assumptions, the estimation problem would not be identified, although it would be bounded, because some values for the cell entries are ruled out for each precinct's contingency table by the observed column and row totals (these are called "the bounds").
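To make the bounds concrete, here is a minimal sketch for a single precinct's 2 x 2 table (race by vote choice); the precinct counts are invented.

```r
# Given the precinct's racial voting-age populations (row totals) and the
# candidate's vote total (a column total), the number of black residents who
# supported the candidate is only known to lie between these bounds.
ei_bounds <- function(n_black, n_white, votes_for_candidate) {
  c(lower = max(0, votes_for_candidate - n_white),
    upper = min(n_black, votes_for_candidate))
}

ei_bounds(n_black = 300, n_white = 700, votes_for_candidate = 800)
# lower = 100, upper = 300: at least a third of the black residents supported
# the candidate, but the aggregate data alone cannot narrow it further.
```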
Ecological inference arises in the legal setting in cases litigated under the Voting Rights Act. Section 2 of the VRA prohibits a state or municipality from depriving a citizen, on account of race or ethnicity, of an equal opportunity to participate in the political process and to elect candidates of his/her choice. The Delphic statute has been interpreted to disallow districting schemes that have the effect of diluting minority voting strength. In practice, to succeed in a vote dilution claim, a plaintiff must almost always prove that voting in the relevant jurisdiction is racially polarized, meaning that whites vote differently from blacks who vote differently from Hispanics. Because the secret ballot prevents direct observation of voting patterns, expert witnesses are forced to attempt the dangerous task of drawing inferences about racial voting patterns from precinct-level candidate support counts (column totals) and precinct-level racial voting-age-populations (row totals).
A large literature exists on the ecological inference problem. Bizarrely, one constituency has rarely if ever contributed to this debate: the lawyers and judges who consume a great deal of what the literature produces. I'll try to start filling this gap in subsequent entries.
Posted by James Greiner at 5:56 AM
5 January 2006
Drew Thomas
My home country is in chaos - of a sort. With the dissolution of Parliament on November 29, Canada is heading into a federal election.
As a multiparty parliamentary democracy, predicting political outcomes in Canada isn't simply a matter of reading a thermometer. Of course, it isn't even that simple in a two-party system, but it gets me thinking about prediction methods.
I've been working with Gary on JudgeIt, a program used to evaluate voting districts for a variety of conditions, designed for a two-party system. With an emphasis on Bayesian simulation, its methods make use of uniform partisan swing -- a shift in the percentage of voters moving from one party to the other, and in the same proportion in each district -- to determine the likely outcomes given a set of covariates and a history of behaviour in the particular system.
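A two-party uniform swing calculation is simple enough to sketch in a few lines of R (this is only the deterministic shift, not JudgeIt's full Bayesian simulation; the district results are invented).

```r
# Add the same swing to every district's Democratic two-party vote share and
# count how many seats end up above 50 percent.
seats_after_swing <- function(dem_share, swing) {
  shifted <- pmin(pmax(dem_share + swing, 0), 1)   # keep shares in [0, 1]
  sum(shifted > 0.5)
}

dem_share <- c(0.52, 0.48, 0.61, 0.45, 0.50)   # hypothetical district results
seats_after_swing(dem_share, swing = 0)         # seats at the observed shares
seats_after_swing(dem_share, swing = 0.03)      # after a 3-point uniform swing
```

The Canadian complication raised below is exactly that a single swing number is not enough with three or more parties; one needs a model of where each party's lost votes go.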
What caught my attention was a series of election prediction websites that use only previous election information and allow the user to input what they expect the vote shares or swings in support to be. This by itself is mathematically unremarkable, but it may keep political junkies up for hours.
The real question of interest remains: by what process can a system predict who will gain whose votes in a shift in support? In most Canadian ridings (districts), seats are contested by three parties: from left to right, the socialist New Democrats, the incumbent Liberals and the opposition Conservatives. For the most part, votes lost by an outer party would naturally flow to the Liberals. In this election, however, a scandal which led to the election call may prove to cost the Liberals a good deal of support.
Since geography -- and hence, demography -- dictate much of the Canadian political climate, I have no doubt that the appropriate covariates are out there, waiting to be measured and/or analyzed. In the meantime, I'm keeping my head away from election speculation and looking to see if this problem has already been solved. Anyone out there have any suggestions?
Posted by Andrew C. Thomas at 3:57 AM
4 January 2006
John Friedman
In my previous posts on this subject (see here for the most recent), I have explored our legal system's reliance on expert witnesses from game-theoretic and personal perspectives. In this post, I take an entirely different approach, and ask the question: why is our system structured so?
The first question by many might be: what are the alternatives? The traditional example is the French system, known as the Civil Law system (as opposed to the British-based Common Law system). In France, a government judge acts as would the lawyers, judge, and jury in the American system. This judge calls witnesses suggested by the parties (plus some others of his choosing), questions them himself, and then decides upon the proper course of action. Trials often finish in one day; justice is summarily, if crudely, dispensed.
So why did these two systems develop differently, separated by less than 100 miles of the English Channel? Though many answers surely exist in the historical literature, I offer one theory presented by Edward Glaeser and Andrei Shleifer, both in the Harvard Economics Department. They place the roots of the two legal systems in the political circumstances in England and France in the 12th and 13th centuries, when the first characteristics of these procedures emerged.
The key element of a legal system, argue Glaeser and Shleifer, is its ability to limit the influence of corruption and coercion. Viewed from this perspective, the strengths and weaknesses of juries versus government (then royal) judges become clear. Juries, composed mostly of local commoners, would be subject to much coercion by local feudal lords. Royal magistrates, on the other hand, would be far less susceptible to such forceful persuasion, but would be far more easily bribed by the king. A country's choice between these two systems should depend on which problem is more dire: The threat of regional "bullies" or of royal domination.
Glaeser and Shleifer survey the historical record to argue that exactly this difference existed between England and France in the late middle ages. England, recently conquered by and still under the rule of the Normans, had a much stronger monarchy, which imposed order on the countryside. The smaller lords, with whom King John negotiated the Magna Carta, feared royal domination far more than they feared each other, and were willing to accept the possibility of local bias in juries so that the king would not interfere. France, on the other hand, was far more violent, torn between many competing barons. These dukes feared each other most of all, and knew that any jury would quickly fall under the sway of the local ruler; thus, they were willing to cede control of the legal system to the king.
I am not an historian, and so I cannot know whether these arguments accurately reflect the genesis of our legal system. But even if the true explanation lies elsewhere, surely it will have the same historical feel. These institutions have great inertia, and so it does not surprise me that factors so long ago have explanatory power. Nonetheless, is this the best we can do? Does our legal system reduce to an historical anachronism?
Posted by James Greiner at 3:07 AM
3 January 2006
Amy Perfors
An issue inherent in studying language acquisition is the sheer difficulty of acquiring enough accurate naturalistic data. In particular, since many questions hinge on what language input kids hear - and what language mistakes and capabilities kids show - it's important to have an accurate way of measuring both of these things. Unfortunately, short of following a child around all day with a tape recorder (which people have done!), it's hard to get enough data to have an accurate record of low-frequency items and productions; it's also hard to know what would be enough. Typically, researchers will record a child for a few hours at a time for a few weeks and then hope that this represents a good "sample" of their linguistic knowledge.
A paper by Caroline Rowland at the University of Liverpool, presented at the BUCLD conference in early November, attempts to assess the reliability of this sort of naturalistic data by comparing it to diary data. Diary data is obtained by having the caregiver write down every single utterance produced by the child over a period of time; as you can imagine, this is difficult to persuade someone to do! There are clear drawbacks to diary data, of course, not least of which is that as the child speaks more and more it becomes less and less accurate. But because it has a much better likelihood of incorporating low-frequency utterances, it provides a good baseline comparison in that respect to naturalistic, tape-recorded data.
What Rowland and her coauthor found is perfectly in line with what is known about statistical sampling. As the subsets of tape-recorded conversations got smaller, estimates of low-frequency terms became increasingly unreliable, and single segments of less than three hours were nearly completely useless (as they said in the talk, they were "rubbish" - oh how I love British English!). It is also more accurate to use, say, four one-hour chunks from different conversations rather than one four-hour segment, as the former avoids the "burstiness effects" that come from particular conversations and settings predisposing speakers to certain topics.
Though this result isn't a surprise from a statistical sampling point of view, it is nice for the field to have some estimates of how little is "too little" (though of course how little depends somewhat on what you are looking for). And the paper highlights important methodological issues for those of us who can't trail after small children with our notebooks 24 hours a day.
Posted by Amy Perfors at 2:19 AM