28 April 2006
Amy Perfors
I've posted before about the "irrational" reasoning people use in some contexts, and how it might stem from applying cognitive heuristics to situations they were not evolved to cover. Lest we fall into the depths of despair about human irrationality, I thought I'd talk about another view on this issue, this time showing that people may be more rational than those findings suggest, as long as information is presented in a form they can actually use.
In Simple Heuristics That Make Us Smart, Gigerenzer et al. argue that, contrary to popular belief, many of the cognitive heuristics people use are actually very rational given the constraints on memory and time that we face. One strand of their research suggests that people are far better at reasoning about probabilities when they are presented as natural frequencies rather than as single-event probabilities (which is how most studies present them). Thus, for instance, if people see pictures of, say, 100 cars, 90 of which are blue, they are more likely not to "forget" this base rate than if they are just told that 90% of cars are blue.
A recent paper in the journal Cognition (vol 98, 287-308) expands on this theme. Zhu & Gigerenzer found that children steadily gain in the ability to reason about probabilities, as long as the information is presented using natural frequencies. Children were told a story such as the following:
Pingping goes to a small village to ask for directions. In this village, the probability that the person he meets will lie is 10%. If a person lies, the probability that he/she has a red nose is 80%. If a person doesn't lie, the probability that he/she also has a red nose is 10%. Imagine that Pingping meets someone in the village with a red nose. What is the probability that the person will lie?
Another version of the story gave natural frequencies instead of conditional probabilities, for instance "of the 10 people who lie, 8 have a red nose." None of the fourth- through sixth-grade children could answer the conditional probability question correctly, but sixth graders approached the performance of adult controls on the equivalent natural frequency question: 53% of them matched the correct Bayesian posterior probability. The fact that none of the kids could handle the probability question is not surprising -- they had not yet been taught the mathematical concepts of probability and percentage. What is striking is how well the older children did once the same information was phrased as natural frequencies.
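For readers who want to check the arithmetic behind the "correct Bayesian posterior," here is a minimal sketch of my own (not from the paper) that computes it both ways; the second version mirrors the natural-frequency framing the children found easier:

```python
# Conditional-probability version of the Pingping problem
p_lie = 0.10                # P(the person lies)
p_red_given_lie = 0.80      # P(red nose | lies)
p_red_given_honest = 0.10   # P(red nose | does not lie)

posterior = (p_lie * p_red_given_lie) / (
    p_lie * p_red_given_lie + (1 - p_lie) * p_red_given_honest
)
print(round(posterior, 3))  # 0.471

# Natural-frequency version: imagine 100 villagers
liars_with_red_nose = 10 * 0.8    # 8 of the 10 liars have a red nose
honest_with_red_nose = 90 * 0.1   # 9 of the 90 honest villagers do too
print(round(liars_with_red_nose / (liars_with_red_nose + honest_with_red_nose), 3))  # 0.471
```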
The most interesting part of this research, for me, is less the question of whether people "are Bayesian" (whatever that means) than the broader message it highlights: representation matters. When information is presented in a representation that is natural to us, we find it a lot easier to reason about it correctly. I wonder how many of our apparent limitations reveal less about problems with our reasoning, and more about the choice of representation or the nature of the task.
Posted by Amy Perfors at 6:00 AM
27 April 2006
Felix Elwert
Why did people code their missing values as real numbers such as 999 in the old days? Why not "." from the get-go? And why do many big, federally funded surveys insist on numerical missing values to this day?
Don't we all have stories about how funny missing value codes ("-8") got people in trouble (think The Bell Curve)? Are there any anecdotes where people got in trouble for mistaking "." for a legitimate observation?
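For what it's worth, here is a minimal sketch (with made-up toy data, not anyone's actual survey) of the kind of trouble a numeric missing-value code invites when it is never recoded:

```python
import numpy as np
import pandas as pd

# Toy data: ages with 999 used as the missing-value code
ages = pd.Series([34, 41, 999, 28, 999, 52])

print(ages.mean())                       # ~359: the sentinel codes silently inflate the mean
print(ages.replace(999, np.nan).mean())  # 38.75: after recoding 999 to a true missing value
```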
Posted by Felix Elwert at 6:00 AM
26 April 2006
Sebastian Bauhoff
A group at the Indiana School of Informatics has developed a software tool to detect whether a document is "human written and authentic or not." The idea was inspired by the successful attempt of MIT students in 2004 to place a computer-generated document at a conference (see here). Their program collated random fragments of computer science speak into a short paper that was accepted at a major conference without revision. (That program is online and you can generate your own paper, though unfortunately it only writes computer science articles.)
The new tool lets users paste pieces of text and then assesses whether the content is likely to be authentic or just gibberish. The program tries to identify human-style writing, which is characterized by certain repetition patterns, and apparently does rather well. It is not clear whether this works well for social science articles. The first paragraphs of a recent health economics article (to remain unnamed) only have a 35.5% chance of being authentic. Hmm...
So is this just a joke or useful programming? The authors say it could be used to differentiate whether a website is authentic or bogus, or to identify different types of texts (articles vs blogs, for example). I wonder what the algorithms behind such technology are, and whether this will lead to an arms race between fakers and detectors. If one of them can recognize a human-written text, could this be used by the faking software?
If further tweaked, could this have an application in the social sciences? Maybe we could use the faking software to search existing papers, collate them smartly, and use that to identify patterns and get new ideas? Maybe everyone should run their papers through detector software before submitting them to a journal or presenting at a workshop? And students, watch out! No more random collating at 3am to meet the next-day deadline!
PS: this blog entry has been classified as "inauthentic with a 26.3% chance of being an authentic text"...
Posted by Sebastian Bauhoff at 2:41 PM
Sebastian Bauhoff
In the last entry I wrote that China is the new exciting trend for researchers interested in development issues. There are now a number of surveys available, and it is getting easier to obtain data. (For a short list, see here.) However there are two key issues that are still pervasive: language difficulties and little sharing of experiences.
While some Chinese surveys are available in English translation, it is still difficult to fully understand their context. China is a very interesting yet peculiar place. It clearly helps to work with someone who speaks (and reads!) the language, though you might still miss some unexpected information -- and there are many things that can be surprising.
More annoying however is the lack of sharing of information and data. This problem has two associated parts. For the existing data, people seem to struggle with similar problems but don't provide their solutions to others. In the case of the China Health and Nutrition Survey for example, numerous papers have been written on different aspects and the key variables are being cleaned over and over. Apart from the time that goes into that, this can lead to different results.
Another gap in sharing concerns existing data and ongoing surveys. There are now a lot of people who either have collected or are currently collecting data in China, but it is rather difficult even to find out about existing sources. If you're lucky, you've found an article that uses one; if you're not, you might learn of one only after you've put in your funding application.
To really start exploring the exciting opportunities that China may have to offer for research, these problems need to get fixed. I can understand that people don't necessarily want to hand over their data, but it seems that there is too little known about existing surveys, even to researchers who have been working on China for a long time. And as for the cleaning of existing data and reporting problems, it just seems like a waste not to share. I wonder if there are similar experiences from other countries?
Posted by Sebastian Bauhoff at 6:00 AM
25 April 2006
You, Jong-Sung
There was a big scandal in scientific research recently. Dr. Hwang Woo-suk of Seoul National University in Korea announced last June that he and his team had cloned human embryonic stem cells from 11 patients. It was a remarkable breakthrough in stem cell research, and many people expected that he would eventually get a Nobel Prize. Hwang's team, however, was found to have intentionally fabricated key data in two landmark papers on human embryonic stem cells, according to a Seoul National University panel. Now, prosecutors are probing his team's alleged fabrication of data and violation of bioethics law.
Remarkably, the prestigious journal Science was not able to detect the data faking either before or after publication of the articles. This is understandable considering that peer reviewers typically examine the presented analysis of the data but neither receive nor examine the actual data itself. Even more surprisingly, most of the 26 co-authors of the June 2005 article were unaware of the data fabrication. It was revealed only through an inside whistleblower, who was the second author of the earlier article, and through a team of investigative journalists.
This incident makes us aware of the weakness and vulnerability of the review system of academic journals. Indeed, there have been many fraud cases in the history of scientific research, and Dr. Hwang has just added one more such case. Although outright faking may not be very common, errors in data and data analysis might be much more common than most people assume them to be.
I was struck by numerous errors that were found by students of Gov 2001 who replicated the analysis of an article published in a prominent social science journal. Many of the errors are probably benign and not critical to their key findings, but some errors may be critical and even deliberate. It can be tempting to distort the data or results of data analysis when a researcher has spent much time and energy to find evidence to support his or her hypothesis and the results are close but fall short of significance.
In his entry entitled Citing and Finding Data, Gary King discussed the [in]ability to reliably cite, access, and find quantitative data, all of which remain in an entirely primitive state of affairs. Sebastian Bauhoff also stressed the need for making data available in his entry Data Availability. I cannot agree with them more. If journals require authors to submit the data along with the manuscript, and publish the data used for each article as an online appendix, it will certainly reduce errors in data and data analysis as well as spur further research. This should apply to qualitative data (such as interview transcripts) as well as quantitative data.
Posted by Jong-sung You at 6:00 AM
24 April 2006
This week the Applied Statistics Workshop will present a talk by Brian Ripley, Professor of Applied Statistics at the University of Oxford. Professor Ripley received his Ph.D. from the University of Cambridge and has been on the faculties of Imperial College, Strathclyde, and Oxford. His current research interests are in pattern recognition and related areas, although he has worked extensively in spatial statistics and simulation, and continues to maintain an interest in those subjects. New statistical methods need good software if they are going to be adopted rapidly, so he maintains an interest in statistical computing. He is the co-author of Modern Applied Statistics with S, currently in its fourth edition. Professor Ripley is also a member of the R core team, which coordinates development of R, a widely adopted open-source language for statistical analysis.
Professor Ripley will present a talk entitled "Visualization for classification and clustering." Slides for the talk are available from the course website. The presentation will be at noon on Wednesday, April 26 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.
Posted by Mike Kellermann at 3:49 PM
Jim Greiner
I've blogged previously about the course in statistics and law I'm co-teaching this semester (see, for example, here). The course is now in its second simulation, which deals with employment discrimination. In a recent class, the 80% rule came up. I wish it hadn't. In fact, I wish the "rule" had never seen the light of day. In this post, I'll explain what the 80% rule is. In a subsequent post, I'll explain why it stinks.
Suppose we're interested in figuring out whether members of a protected class (say, women) are being hired, promoted, fired, disciplined, whatever, at a different rate from a comparison group (say, men; for the sake of discussion, let's say we're interested in hiring). Long ago, the Equal Employment Opportunity Commission ("EEOC") released a statement saying that it would ordinarily regard as suspect a situation in which the hiring rate for women was less than 80% of the hiring rate for men. Note that the EEOC has the authority to bring suit in the name of the United States against a defendant that has violated federal employment discrimination laws.
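To make the arithmetic concrete, here is a minimal sketch with toy numbers of my own (nothing here comes from the EEOC guidance itself):

```python
# Toy numbers: applicants and hires by group
women_applicants, women_hired = 100, 30
men_applicants, men_hired = 100, 50

women_rate = women_hired / women_applicants   # 0.30
men_rate = men_hired / men_applicants         # 0.50

ratio = women_rate / men_rate                 # 0.60
print(ratio, ratio < 0.8)                     # True: below the four-fifths threshold, hence "suspect"
```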
It would be bad enough if the EEOC used the 80% rule only for the purpose it gave, i.e., as a statement about how the agency would exercise its investigative and prosecutorial discretion. Alas, courts, perhaps desperate for guidance on quantitative principles, have picked up on the idea, and some now use it as an indicator of which disparities are legally significant. Courts do so despite the outcry of those in the quantitative community interested in such things. More on that outcry in my next post.
Posted by James Greiner at 6:00 AM
21 April 2006
Amy Perfors
Since the days of Kahneman & Tversky, researchers have been finding evidence showing that people do not reason about probabilities as they would if they were "fully rational." For instance, base-rate neglect -- in which people ignore the frequency of different environmental alternatives when making probability judgments about them -- is a common problem. People are also often insensitive to sample size and to the prior probability of various outcomes. (this page offers some examples of what each of these mean).
A common explanation is that these "errors" arise as the result of using certain heuristics that usually serve us well but lead to this sort of error in certain circumstances. Thus, base-rate neglect arises from the representativeness heuristic, in which people assume that each case is representative of its class. So, for instance, people watching a taped interview with a prison guard with extreme views will draw conclusions about the entire prison system based on this one interview -- even if they were told in advance that his views were extreme and unusual, and that most guards were quite different. The prison guard was believed to be representative of guards in general, the explicit warning notwithstanding.
In many circumstances, a heuristic of this sort is sensible: after all, it's statistically unlikely to meet up with someone or something that is, uh, statistically unlikely -- so it makes sense to usually assume that whatever you interact with is representative of things of that type. The problem is -- and here I'm harking back to a theme I touched on in an earlier post -- that this assumption no longer works in today's media-saturated environment. Things make it into the news precisely because they are unusual and unrepresentative.
Posted by Amy Perfors at 6:00 AM
20 April 2006
Jens Hainmueller and Michael Hiscox
We have written a paper that investigates individual attitudes toward immigration in 22 European countries. In line with our research on individual attitudes toward trade policies (see previous blog entries here, here, and here), we find that a simple labour market model (a la Heckscher-Ohlin) does not do very well in accounting for preferences at the individual level. This finding resonates well with economic theory, given that more recent economic models are actually quite equivocal about whether immigrants will have an adverse impact on the wages or employment opportunities of local workers with similar skills (see our discussion of these models here).
Please find our abstract after the jump. Here is the link to the paper. As always, comments are highly appreciated.
Educated Preferences: Explaining Attitudes Toward Immigration In Europe:
Recent studies of individual attitudes toward immigration emphasize concerns about labor market competition as a potent source of anti-immigrant sentiment, in particular among less-educated or less-skilled citizens who fear being forced to compete for jobs with low-skilled immigrants willing to work for much lower wages. We examine new data on attitudes toward immigration available from the 2003 European Social Survey. In contrast to predictions based upon conventional arguments about labor market competition, which anticipate that individuals will oppose immigration of workers with similar skills to their own, but support immigration of workers with different skill levels, we find that people with higher levels of education and occupational skills are more likely to favor immigration regardless of the skill attributes of the immigrants in question. Across Europe, higher education and higher skills mean more support for all types of immigrants. These relationships are almost identical among individuals in the labor force (i.e., those competing for jobs) and those not in the labor force. Contrary to the conventional wisdom, then, the connection between the education or skill levels of individuals and views about immigration appears to have very little, if anything, to do with fears about labor market competition. This finding is consistent with extensive economic research showing that the income and employment effects of immigration in European economies are actually very small. We find that a large component of the effect of education on attitudes toward immigrants is associated with differences among individuals in cultural values and beliefs. More educated respondents are significantly less racist and place greater value on cultural diversity than their counterparts; they are also more likely to believe that immigration generates benefits for the host economy as a whole.
Posted by Jens Hainmueller at 6:00 AM
19 April 2006
Drew Thomas
It seems that the difficulty in learning languages isn't always restricted to spoken words. A recent article in the New York Times ("Searching For Dummies", March 26 -- here's a link, though it's for pay now) cites a 2002 Israeli study demonstrating how inept graduate students were at making specific Internet searches.
Now, I know a lot has happened in the world of search engines in the last four years, and I admit my bias: being an MIT undergrad at the time meant that I was waist-deep in Google and its way of sorting information. See if you can do any of these challenges now, with no time limit:
"A picture of the Mona Lisa; the complete text of either "Robinson Crusoe" or "David Copperfield"; and a recipe for apple pie accompanied by a photograph."
What's the trick to this kind of searching? Unless you have an excellent, selective and disambiguating search engine, knowing search grammar and context is essential.
For example, getting the text of David Copperfield is now a three-hop, one-search process: search for it on Google, and select the Wikipedia entry, which has been cleanly separated from the magician and includes not one but several links to the complete text.
So the technology has gotten better. But the illusion of control remains; I find it more difficult to find other disambiguations that Wikipedia hasn't considered. Moreover, for any meaningful searches, such as to relevant papers in particular areas where I don't know the nomenclature, this feeling of power is challenged.
This is a skill that permeates all levels of society, from kindergarten on up, but there's a definite lack of appreciation for it. To learn it like a language, early on and with constant practice, seems to be the solution: to learn the context, grammar and syntax of the search (and research), and to appreciate that we're trying to communicate our intentions using all the tools we have available. When we blame the tools instead, we all typify the poor carpenter.
Posted by Andrew C. Thomas at 6:00 AM
18 April 2006
Jim Greiner
In my last post, I pointed out that when presented with a causal inference situation of treatment, intermediate outcome, and final outcome, we have to be careful to define a sharp question of interest. Sometimes, we’re interested in the ITT, or the effect of the treatment on the final outcome. At other times, we’re interested in the effect of the intermediate outcome on the final outcome, and the treatment is our best way of manipulating the intermediate outcome so as to draw causal inferences.
In my view, these principles are important in the legal context. Take race in capital sentencing, for example.
To begin, it's a big step to draw causal inferences about race in a potential outcomes framework; the maxim "no causation without manipulation" (due, I believe, to Paul Holland) explains why. I believe that step can be taken, but that's another subject. Suppose we take it, i.e., we decide to apply a potential outcomes framework to an immutable characteristic. The treatment (applied to the capital defendant) is being African-American, the intermediate outcome is whether the defendant is convicted, and the final outcome is whether a convicted defendant is sentenced to die. (Note that, in an instance of fairly macabre irony, if one applies the language of censoring or truncation due to death here, "death" is an acquittal on the capital charge.)
What causal question do we care about? If all we want to study is the relationship between race and the death penalty, then we don’t care whether a defendant avoids a death sentence via acquittal or avoids a death sentence after a conviction by being sentenced to life. If, on the other hand, what we want to study is fairness in sentencing proceedings, then we need principal stratification; we need to isolate a set of defendants who would be convicted of the capital charge if African-American and convicted of the capital charge if not African-American. Both are potentially interesting causal questions. Let’s just make sure we know which we’re asking.
Posted by James Greiner at 6:00 AM
17 April 2006
This week the Applied Statistics Workshop will present a talk by Gerard van den Berg, Professor of Labor Economics at the Free University of Amsterdam. Before joining the faculty at Amsterdam in 1996, he worked at Northwestern University, New York University, Stockholm School of Economics, Tilburg University, Groningen University, and INSEE-CREST. From 2001 to 2004, he was Joint Managing Editor of The Economic Journal, and has published in Econometrica, Review of Economic Studies, American Economic Review, and other journals. He is currently a visiting scholar at the Center for Health and Wellbeing at Princeton University. His research interests are in the fields of econometrics, labor economics, and health economics, notably duration analysis, treatment evaluation, and search theory.
Professor van den Berg will present a talk entitled "An Economic Analysis of Exclusion Restrictions for Instrumental Variable Estimation." The paper is available from the course website. The presentation will be at noon on Wednesday, April 19 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.
Posted by Mike Kellermann at 12:00 AM
14 April 2006
You, Jong-Sung
Here are some good statistics jokes for all of you.
Is the use of humor an effective way of teaching statistics? Lomax and Moosavi (1998), citing J. Bryant and D. Zillmann (1988), suggest that there is little empirical evidence that humor (1) increases student attention, (2) improves the classroom climate, or (3) reduces tension. Fortunately, however, the same research indicates that humor does (1) increase enjoyment and (2) motivate students toward higher achievement. Hence, it may not be a bad idea to incorporate some statistical jokes (their article and Gary Ramseyer's website are two good sources).
This isn't a joke as such, but here is another interesting statistical dialogue from Lomax and Moosavi:
Q. I read that a sex survey said the typical male has six sexual partners in his life and the typical female has two. Assuming the typical male is heterosexual, and since the number of males and females is approximately equal, how can this be true?
A. You've assumed that "typical" refers to the arithmetical average of the numbers. But "average" also means "middle" and "most common". (Statisticians call these three kinds of averages the mean, the median and the mode, respectively.) Here's how the three are used: Say you're having five guests at a dinner party. Their ages are 100, 99, 17, 2, and 2. You tell the butler that their average age is 44 (100+99+17+2+2=220; 220÷5=44). Just to be safe, you tell the footman their average age is 17 (the age right in the middle). And to be sure everything is right, you tell the cook their average age is 2 (the most common age). Voila! Everyone is treated to pureed peas accompanied by Michael Jackson's latest CD, followed by a fine cognac. In the case of the sex survey, "typical" may have referred to "most common", which would fit right in with all the stereotypes. (That is, if you believe sex surveys.)
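The dinner-party arithmetic, for anyone who wants to verify it (a quick check of my own, not part of the original dialogue):

```python
from statistics import mean, median, mode

ages = [100, 99, 17, 2, 2]
print(mean(ages))    # 44  (what you tell the butler)
print(median(ages))  # 17  (what you tell the footman)
print(mode(ages))    # 2   (what you tell the cook)
```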
Posted by Jong-sung You at 6:00 AM
13 April 2006
Sebastian Bauhoff
While the media keeps preaching that this century is Chinese, many researchers are getting excited about new opportunities for data collection and access to data. For the past decades, many development researchers have focused on India because of the regional variation and good infrastructure for surveys. It seems that now China holds a similar promise, and could provide an interesting comparison to India.
I recently started collecting information on China (here); below are some highlights. If you know of more surveys, do let me know.
Probably the best known micro-survey at this point is the China Health and Nutrition Survey (CHNS), which is a panel with rounds in 1989, 1991, 1993, 1997, 2000, and 2004 (the 2006 wave is funded) and covers more than 4,000 households in 9 provinces. Though this is an amazing dataset, using it is not always easy. For example, there are problems with linking individuals over time. New longitudinal master files are continuously released, but the fixes are sometimes hard to integrate into ongoing projects (the IDs get mixed up). Also there seem to be some inconsistencies in the recording, especially in earlier rounds and for some key variables such as education. The best waves seem to be those of 1997 and 2000.
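The workaround I have usually seen is to re-link individuals through whatever master ID file accompanies the latest release before merging waves. Here is a rough sketch; the file and column names are hypothetical, and the actual CHNS release uses different ones:

```python
import pandas as pd

# Hypothetical file and column names -- the real CHNS files are named differently
wave97 = pd.read_csv("chns_1997.csv")      # carries the old person identifier 'id_old'
crosswalk = pd.read_csv("master_ids.csv")  # maps 'id_old' to the corrected 'id'

# Attach the corrected IDs, then merge waves on the stable identifier
wave97 = wave97.merge(crosswalk, on="id_old", how="left")
wave00 = pd.read_csv("chns_2000.csv")      # already carries the corrected 'id'
panel = wave97.merge(wave00, on="id", how="inner", suffixes=("_97", "_00"))
```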
There is also a World Bank Living Standards Measurement Study (LSMS) for China. That survey used standardized (internationally comparable?) questionnaires and was conducted in 780 households and 31 villages in 1996/7. For those interested in the earlier periods, there is commercial data at the China Population Information and Research Center which has mainly census-based data starting from 1982. The census itself is also available electronically now (and with GIS maps) but there is a lively debate as to how reliable the figures are, and whether key measures changed over time. But it should still be good for basic cross-sectional analysis.
Posted by Sebastian Bauhoff at 6:00 AM
12 April 2006
Jim Greiner
A few weeks ago, Felix Elwert gave a bang-up presentation at the Wednesday seminar series on the effect of cohabitation on divorce rates (see here). One of the most interesting points I took away from the discussion was the following: in some social science situations in which a treatment is followed by an intermediate outcome, then by a final outcome, we might be interested in different causal questions. One causal question is the effect of the treatment on the final outcome; this is commonly called the intention-to-treat effect (ITT). The name comes from, I believe, an encouragement design context; the treatment is an encouragement to, say, get a vaccine, the intermediate outcome is whether a test subject gets a vaccine, the final outcome is whether the test subject gets a disease, and the ITT is the effect of encouragement on disease rates.
A second causal question different from the ITT is the effect of the intermediate outcome on the final outcome; in the vaccine example above, the question here would be the effect of the vaccine on disease rates.
Felix's point was that if we think of cohabitation as the treatment, marriage as the intermediate outcome, and divorce as the final outcome, there are different causal questions we might want to ask. Those of us steeped in a principal stratification and "truncation due to death" way of looking at things might jump to the conclusion that the idea of divorce makes no sense for people who don't get married. Thus, the only "right" way to look at this situation, we might say, is to isolate the set of people who would get married regardless of cohabitation (the treatment). Not so. If what we're really interested in is avoiding divorce per se (maybe because divorce is stigmatizing, more stigmatizing than never having been married), then perhaps we don't care whether people avoid divorce by not getting married or avoid divorce by getting married and staying that way. In that case, what we're after is the ITT. If, however, what we want is stable marriages, then we need to do the principal stratification and truncation-due-to-death bit.
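A toy simulation may make the contrast concrete. This is my own sketch with made-up probabilities, not Felix's analysis: the ITT counts the never-married as not divorced, while the principal-stratum effect conditions on people who would marry under either assignment -- something we can only read off directly in a simulation, where both potential outcomes are known.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Potential intermediate outcomes: would this person marry under each treatment arm?
m0 = rng.random(n) < 0.7   # marries without the treatment (no cohabitation)
m1 = rng.random(n) < 0.6   # marries with the treatment (cohabitation)

# Potential final outcomes: divorce can only happen to those who marry
y0 = np.where(m0, rng.random(n) < 0.30, False)
y1 = np.where(m1, rng.random(n) < 0.40, False)

z = rng.random(n) < 0.5    # treatment assignment (randomized, for the sake of the toy)
y_obs = np.where(z, y1, y0)

# Intention-to-treat effect: effect of assignment on divorce, never-married counted as not divorced
itt = y_obs[z].mean() - y_obs[~z].mean()

# Principal-stratum effect: among those who would marry under either assignment
always_married = m0 & m1
ps = y1[always_married].mean() - y0[always_married].mean()

print(round(itt, 3), round(ps, 3))
```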
I think Felix’s insight has some applicability to the legal context. More on that in a subsequent post.
Posted by James Greiner at 6:00 AM
11 April 2006
Felix Elwert
Race is a surprisingly malleable construct, though it’s usually taken as fixed in statistical models. In a recent paper with Nicholas Christakis (Widowhood and Race, American Sociological Review Vol 71(1), 2006) I had to engage changing racial responses head on.
Assorted previous research has shown that people may change their racial self-description over time because they are multiracial, because they marry somebody of a different racial group, or -- not to be neglected -- because the answer choices in surveys change over time.
Most people think that unstable or changing racial self-identification is an issue largely confined to a small group of multiracial individuals. This is a country, after all, of the one-drop rule. But research, including our own, shows that that isn’t so.
In a supplementary analysis of the 2001 Census Quality Survey (CQS), we showed that the racial self-identification of "whites" is also surprisingly unstable. The CQS asked more than 50,000 respondents twice, within the span of just a few months, to identify their own race. Once they were allowed to select only one race, and the other time they were given the option of selecting multiple races (this gets at the difference between the old and the new Census race questions). The answers were then matched to individual responses from the official 2000 Census.
Depending on whether we compared consecutive responses to the same race question on the Census and the CQS, or the different questions asked in the two waves of the CQS, and on whether we treat "Hispanic" as a category distinct from black and white, the agreement between answers for whites ranged from 95.6 to 97.5 percent. We obtained very similar results for blacks.
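Agreement here just means the share of respondents who gave the same answer both times; a minimal sketch with invented responses:

```python
import pandas as pd

# Invented paired responses to the race question, asked on two occasions
first = pd.Series(["white", "white", "black", "white", "white", "black"])
second = pd.Series(["white", "multiracial", "black", "white", "white", "black"])

agreement = (first == second).mean() * 100
print(f"{agreement:.1f} percent agreement")  # 83.3 in this toy example
```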
That means that between 2 and 5 percent of people who had identified as white would call themselves either something else or a mixture of races when given the chance. And the percentage of "whites" who will change their racial self-description as a function of question wording is about the same as the percentage of "blacks" who will do likewise.
Posted by Felix Elwert at 6:00 AM
10 April 2006
To follow up a previous post of mine, here's another statistics-related lesson to do with your kids. I came up with it at an ice cream shop with my 10-year-old daughter a couple of weeks ago. The point of the lesson is about the power of combinatorics and really, really big numbers. The result is pretty surprising. Here's the recipe:
INGREDIENTS: An ice cream shop, some money for some ice cream, a kid, and a calculator. [I hear 2 objections. To the first: Don't worry, you're probably already carrying a calculator; look closer at your cell phone. The second is: shouldn't we be requiring kids to make the calculations themselves? The fact is that lots of famous mathematicians and statisticians are pretty bad at arithmetic, even though they are obviously spectacularly good at higher level mathematics. Being able to multiply 2-digit numbers in your head is probably useful for something, but understanding the point of the calculation -- why you're doing it, what the inputs are, and what the result of the calculation means -- is far more important.]
DIRECTIONS: Make your order, sit down, and, while you're eating, pose this question to your kid: Suppose the choices on the menu on the wall have never changed since the shop opened. How many choices do you see that have never been chosen even once?
After thinking about weird but fun options like pouring coffee in an ice cream cone, we try it a little more systematically. First we set out to figure out how many options there are. So I ask, "how many ice cream flavors are there?" My daughter counts them up; it was 20. So how many combinations of one flavor can you have? 20, obviously. How many combinations of two flavors can you have (where for simplicity, we'll count a cone with chocolate on the bottom and vanilla on the top as different from the reverse)? The answer is 20 x 20, or 400. (It's not 40, it's 400. Think of a checkerboard with one flavor down the 20 rows and another across the 20 columns and the individual squares as the combinations of the two.)
So how many toppings could we have on that ice cream? She went to the counter and counted: 18. And then she did 18 x 400, which she figured out is 7,200. After that we used the calculator and just continued to multiply and multiply as I pointed out categories on the menu and she counted each up. The total gets big very fast. We got to numbers in the trillions in just a few minutes.
So we find that the total number of options is a really big number. But what does that say about how many options have been tried?
Let's suppose, I say, that it only takes one second for someone to make their choice and receive their order, and that the shop is open 24 hours a day, 7 days a week, all year round. (You could make more realistic assumptions, and teach some good data collection techniques, by watching people get their orders and timing them.) Then we figure out how long it would take for the shop to have been open (under these wildly optimistic assumptions) in order to serve up all the options. To calculate the number of years, all you do is take the number of options, divide by 60 (seconds a minute), 60 (minutes an hour), 24 (hours in a day), and 365 (days a year). In our case, to serve all the options, the shop would have had to be open for around 43,000 years!
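For anyone who wants to replay the arithmetic without a napkin, here is a rough sketch. Only the 20 flavors and 18 toppings come from our trip; the remaining category counts are placeholders of roughly the size we found at the counter:

```python
# Only the 20 flavors (counted twice for a two-scoop cone) and 18 toppings are from
# the post; the remaining counts are placeholders for the other menu categories.
category_counts = [20, 20, 18, 30, 25, 20, 15, 12, 10, 7]

options = 1
for count in category_counts:
    options *= count                               # about 1.36 trillion with these placeholders

seconds_per_year = 60 * 60 * 24 * 365
years_to_serve_all = options / seconds_per_year    # one order per second, around the clock
print(f"{options:,} options, about {years_to_serve_all:,.0f} years to serve them all")
```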
So even if the shop had been open for 100 years, it couldn't have served even a tiny fraction of the available options. So how many choices have never been tried at the ice cream shop? It's not just the few that we can cleverly dream up. In fact, almost all of them (over 99 percent of the possibilities) have never been tried!
(At which point my daughter said, "ok, let's get started!")
Actually, if you go to a deli and try this, you can get much larger numbers. For example, if the menu has about 85 items, and each one can be ordered in 10 different ways, the number of possible orders (10 to the 85) is larger than the number of elementary particles in the universe.
Posted by Gary King at 6:00 AM
This week the Applied Statistics Workshop will present a talk by Matthew Harding, a Ph.D. candidate in the Department of Economics at MIT. He received his BA from University College London and an M.Phil in economics from the University of Oxford. His work in econometrics focuses on stochastic eigenanalysis with applications to economic forecasting, modeling of belief distributions, and international political economy. He also works on modeling heterogeneity in nonlinear random coefficients models, duration models, and panels.
Matt will present a talk entitled "Evaluating Policy Counterfactuals in Voting Models with Aggregate Heterogeneity." A link to a background paper for the presentation is available from the workshop website. The presentation will be at noon on Wednesday, April 5 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided.
Posted by Mike Kellermann at 12:00 AM
6 April 2006
Jens Hainmueller
Great news for people studying immigration: the first full-cohort module of the New Immigrant Survey (NIS 2003) is now online. The NIS is "a nationally representative multi-cohort longitudinal study of new legal immigrants and their children to the United States based on nationally representative samples of the administrative records, compiled by the U.S. Immigration and Naturalization Service (INS), pertaining to immigrants newly admitted to permanent residence."
The sampling frame consists of new-arrival and adjustee immigrants. The Adult Sample covers all immigrants who are 18 years of age or older at admission to Lawful Permanent Residence (LPR). There is also a Child Sample, which covers immigrants with child-of-U.S.-citizen visas who are under 18 years of age and adopted orphans under five years of age. Overall, 8,573 adults and 810 children were interviewed, which constitutes a response rate of about 65%.
The NIS features a wide variety of questions regarding demographics, pre-immigration experiences, employment, health, health and life insurance, health care utilization and daily activities, income, assets, transfers, social variables, migration history, etc. There is also the controversial and much-discussed skin color scale test, in which interviewers rated respondents' skin color on an 11-point scale ranging from zero to 10, with zero representing albinism, or the total absence of color, and 10 representing the darkest possible skin. The scale was memorized by the interviewers, so that the respondent never sees the chart. Check out the ten shades of skin color corresponding to the points 1 to 10 and a description of the skin color test here.
Posted by Jens Hainmueller at 6:00 AM
5 April 2006
Felix Elwert
In a previous post, Mike quoted Alan Greenspan, "I suspect greater payoffs will come from more data than from more technique." Not an uncommon opinion. But there are more and less flattering ways of reading such statements.
What's behind the sentiment, I sometimes suspect (I'm not picking fights with the Maestro), is not just the desire for better data but a distrust of advanced statistical methods. There's a perception that more complicated math necessitates more assumptions, ergo less robust results. By this logic, the simpler the method, the more credible the conclusion. Crosstabs rule, ANOVA passes muster. The truth, of course, is the opposite: simple stats in observational data analysis usually require more assumptions. As we move from crosstabs to OLS to GEE for a given analytical goal, we are usually trying to relax assumptions. Tragically, the presence of said assumptions often becomes obvious only after the author points them out. And then it's open season on the messenger.
I witnessed this sort of thinking recently when I reviewed a paper for a leading sociological journal. The author pointed out some serious methodological flaws in one strand of comparative welfare state research, then proposed an alternative to one well-regarded analysis by relaxing some of the offending assumptions. Boom, did he get slammed by one reviewer for allegedly making the very assumptions he had exposed in the first place. The paper was rejected in the first round. (This is sort of a pet peeve of mine, and I might vent again.)
Posted by Felix Elwert at 6:00 AM
4 April 2006
Jim Greiner
In a previous post, I brought up the subject of how we quantitative analysts can abuse the trust decision makers (judges, government officials, members of the public) put in us, when they are inclined to trust us at all. Decision makers should be able to depend on us to give them not just a (clearly and understandably stated) summary of the inferences we believe are plausible, but also a (clearly and understandably stated) statement of the weak points of those inferences. "No kidding," you might say. OK. If it's that obvious, how come none of us is able to do it?
Here's an exercise, again something that's come out of my experience in teaching a class on statistical expert witnesses in litigation. Next time you think you've "got it," that you've done the right thing with a dataset and have drawn some solid inferences, step back and ask: "Suppose I were paid $____/hour to convince people that the work I've just done is not worthy of credence. What would I say?" If all you can come up with are criticisms that make you laugh (because they're so silly) or ideas that you can dismiss as unscrupulous babbling motivated by a desire for fees, then you might suffer from a mutilating and disfiguring disease: AE.
In the litigation and expert witnesses class, we’re giving students datasets and assigning them positions (plaintiffs or defendants). One of the refreshing things about this exercise has been that it is forcing the student-experts to think about where attacks on their reports will come from. Perhaps even more importantly, because the sources of those attacks are their friends and peers (i.e., people they respect), students begin to remember something they knew before the academic environment tried to make them forget it: there are weaknesses in everything they do.
I don’t know if all academics suffer from AE. Perhaps I’ve been unlucky in meeting a great many who suffer from especially severe cases. Who knows? Perhaps I’m a carrier myself? (Nah . . .)
Posted by James Greiner at 6:00 AM
3 April 2006
If inference is the process of using data we have in order to learn about data we do not have, it seems obvious that there can never be a proof that anyone has arrived at the "correct" theory of inference. After all, the data we have might have nothing to do with the data we don't have. So all the (fairly religious) attempts at unification -- likelihood, Bayes, Bayes with frequentist checks, bootstrapping, etc., etc. -- each contribute a great deal but they are unlikely to constitute The Answer. The best we can hope for is an agreement, or a convention, or a set of practices that are consistent across fields. But getting people to agree on normative principles in this area is not obviously different from getting them to agree on the normative principles of political philosophy (or any other normative principles).
It just doesn't happen, and even if it did, the result would have merely the status of a compromise rather than the correct answer, the latter being impossible.
Yet there is a unifying principle that would represent progress in the sense of advancing the field: we will know that something like unification has occurred when we can distribute the same data, and the same inferential question, to a range of scholars with different theories of inference that go by different names, use different conventions, and are implemented with different software, and they all produce approximately the same empirical answer.
We are not there yet, and there are some killer examples where the different approaches yield very different conclusions, but there does appear to be some movement in this direction. The basic unifying idea, I think, is that all theories of inference require some assumptions, but we should never take any theory of inference so seriously that we don't stop to check the veracity of those assumptions. The key is that conditioning on a model does not work, since of course all models are wrong, and some are really bad. What I notice is that most of the time, you can get roughly the same answers using (1) likelihood or Bayesian models with careful goodness-of-fit checks and adjustments to the model if necessary, (2) various types of robust, semi-parametric, etc. statistical methods, (3) matching used to preprocess data that are later analyzed or further adjusted by parametric likelihood or Bayesian methods, (4) Bayesian model averaging, with a large enough class of models to average over, (5) the related "committee methods", (6) mixture-of-experts models, and (7) some highly flexible functional forms, like neural network models. Done properly, these will all usually give similar answers.
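To illustrate the "roughly the same answers" point in miniature, here is a toy sketch of my own (simulated data and deliberately simple stand-ins for the approaches above): plain least squares, a ridge penalty standing in for a Bayesian posterior mean under a weak prior, and a bootstrap average. On well-behaved data they agree closely; the interesting cases are, of course, the ones where they don't.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)   # true intercept 2.0, true slope 1.5
X = np.column_stack([np.ones(n), x])

# (1) Plain least squares / maximum likelihood
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# (2) Ridge as a crude stand-in for a Bayesian posterior mean under a weak prior
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

# (3) Average of bootstrap least-squares fits
boot = []
for _ in range(200):
    idx = rng.integers(0, n, n)
    boot.append(np.linalg.lstsq(X[idx], y[idx], rcond=None)[0])
beta_boot = np.mean(boot, axis=0)

print(beta_ols[1], beta_ridge[1], beta_boot[1])   # all close to 1.5
```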
This is related to Xiao-Li Meng's self-efficiency result: the rule that "more data are better" only holds under the right model. Inference can't be completely automated for most quantities, and we typically can't make inferences without some modeling assumptions, but the answer won't be right unless the assumptions are correct, and we can't ever know that the assumptions are right. That means that any approach has to come to terms with the possibility that some of the data might not be right for the given model, or the model might be wrong for the observed data. Each of the approaches above has an extra component to try to get around the problem of incorrect models. This isn't a unification of statistical procedure, or a single unified theory of inference, but it may be leading to a unification of the results of many diverse procedures, as we take the intuition from each area and apply it across them all.
Posted by Gary King at 6:00 AM
This week the Applied Statistics Workshop will present a talk by L.J. Wei and Tianxi Cai of the Department of Biostatistics at the Harvard School of Public Health. Professor Wei received his Ph.D. in statistics from the University of Wisconsin at Madison and served on the faculties of several universities before coming to Harvard in 1991. Professor Cai received her Sc.D. from the Harvard School of Public Health in 1999 and was a faculty member at the University of Washington before returning to HSPH in 2002. Professors Wei and Cai will present a talk entitled "Evaluating Prediction Rules for t-Year Survivors With Censored Regression Models." The presentation will be at noon on Wednesday, April 5 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided. The abstract of the paper follows after the jump:
Suppose that we are interested in establishing simple, but reliable rules for predicting future t-year survivors via censored regression models. In this article, we present inference procedures for evaluating such binary classification rules based on various prediction precision measures quantified by the overall misclassification rate, sensitivity and specificity, and positive and negative predictive values. Specifically, under various working models we derive consistent estimators for the above measures via substitution and cross validation estimation procedures. Furthermore, we provide large sample approximations to the distributions of these nonsmooth estimators without assuming that the working model is correctly specified. Confidence intervals, for example, for the difference of the precision measures between two competing rules can then be constructed. All the proposals are illustrated with two real examples and their finite sample properties are evaluated via a simulation study.
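For readers less familiar with the precision measures named in the abstract, here is a minimal sketch of how each is computed from binary predictions. The numbers are toy values of my own, and the sketch ignores censoring entirely, which is precisely the complication the paper addresses:

```python
import numpy as np

# Toy data: did the subject survive past t years (truth), and did the rule predict it?
truth = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])
pred  = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])

tp = np.sum((pred == 1) & (truth == 1))
tn = np.sum((pred == 0) & (truth == 0))
fp = np.sum((pred == 1) & (truth == 0))
fn = np.sum((pred == 0) & (truth == 1))

misclassification = (fp + fn) / len(truth)
sensitivity = tp / (tp + fn)   # P(predict survivor | true survivor)
specificity = tn / (tn + fp)   # P(predict non-survivor | true non-survivor)
ppv = tp / (tp + fp)           # P(true survivor | predicted survivor)
npv = tn / (tn + fn)           # P(true non-survivor | predicted non-survivor)
print(misclassification, sensitivity, specificity, ppv, npv)
```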
Posted by Mike Kellermann at 12:00 AM