
27 February 2007

Adventures in Identification II: Exposing Corrupt Politicians

Today we continue our treasure hunt for identification in observational studies. After our sojourn in Spain two weeks ago, the next stopover is Brazil, where in a recent paper Claudio Ferraz and Frederico Finan discovered a nice natural experiment that allows them to estimate the effect of transparency on political accountability. Many in the policy world are agog over the beneficial impact of transparency on good governance, yet empirical studies of the subject are often bedevilled by selection problems, for obvious reasons. Ideally, we would like to find a situation in which changes in transparency are randomly assigned, which (also for obvious reasons) tends to be a low-probability event. But it does happen. It turns out that under a recent anti-corruption program in Brazil, the federal government randomly audits 60 municipalities every month and then discloses the findings of the report to the municipality and the media. The authors exploit this variation and find that the dissemination of information on corruption, which is facilitated by the media, does indeed have a detrimental impact on the incumbent's electoral performance.
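To make the identification logic concrete, here is a minimal sketch in Python; the data and variable names are invented for illustration, and this is not the authors' code:

    # Hypothetical illustration, not Ferraz and Finan's code or data.
    import pandas as pd

    # One row per municipality with an incumbent seeking re-election.
    df = pd.DataFrame({
        "audited_pre_election": [1, 1, 0, 0, 1, 0],   # audit released before vote?
        "reelected":            [0, 1, 1, 1, 0, 1],   # incumbent re-elected?
    })

    # Audit timing is random, so a simple difference in means identifies
    # the effect of pre-election disclosure on re-election.
    pre = df[df.audited_pre_election == 1]["reelected"].mean()
    post = df[df.audited_pre_election == 0]["reelected"].mean()
    print(f"Estimated effect of disclosure: {pre - post:+.2f}")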

Here is the abstract of the paper:

Exposing Corrupt Politicians: The Effects of Brazil's Publicly Released Audits on Electoral Outcomes

This paper examines whether access to information enhances political accountability. Based upon the results of Brazil's recent anti-corruption program that randomly audits municipal expenditures of federally-transferred funds, it estimates the effects of the disclosure of local government corruption practices upon the re-election success of incumbent mayors. Comparing municipalities audited before and after the elections, we show that the audit policy reduced the incumbent's likelihood of re-election by approximately 20 percent, and that the effect was more pronounced in municipalities with radio stations. These findings highlight the value of information and the role of the media in reducing informational asymmetries in the political process.

Posted by Jens Hainmueller at 12:48 PM

26 February 2007

Applied Statistics - Donald Rubin

This week, the Applied Statistics Workshop will present a talk by Donald Rubin, the John Loeb Professor of Statistics at Harvard. Professor Rubin has published widely on numerous topics in statistics, and is perhaps best known for his work on missing data and causal inference. His articles have appeared in over thirty journals, and he is the author or co-author of several books on missing data, causal inference, and Bayesian data analysis, many of which are the standards in their fields. In 1995, Professor Rubin received the Samuel S. Wilks Memorial Award from the American Statistical Association.

Professor Rubin will present a talk entitled "Principal Stratification for Causal Inference with Extended Partial Compliance," which is based on joint work with Hui Jin. Their paper is available from the workshop website. The presentation will be at noon on Wednesday, February 28 in Room N354, CGIS North, 1737 Cambridge St. As always, lunch will be provided. An abstract of the paper follows on the jump:

Principal Stratification for Causal Inference with Extended Partial Compliance

Hui Jin and Donald B. Rubin

Abstract

Many double-blind placebo-controlled randomized experiments with active drugs suffer from complications beyond simple noncompliance. First, the compliance with assigned dose is often partial, with patients taking only part of the assigned dose, whether active or placebo. Second, the blinding may be imperfect in the sense that there may be detectable positive or negative side-effects of the active drug, and consequently, simple compliance has to be extended to allow different compliances to active drug and placebo. Efron and Feldman (1991) presented an analysis of such a situation and discussed inference for dose-response from the non-randomized data in the active treatment arm, which stimulated active discussion, including concerning the role of the intention-to-treat principle in such studies. Here, we formulate the problem within the principal stratification framework of Frangakis and Rubin (2002), which adheres to the intention-to-treat principle, and we present a new analysis of the Efron-Feldman data within this framework. Moreover, we describe precise assumptions under which dose-response can be inferred from such non-randomized data, which seem debatable in the setting of this example. Although this article only deals in detail with the specific Efron-Feldman data, the same framework can be applied to various circumstances in both natural science and social science.

Posted by Mike Kellermann at 11:26 AM

23 February 2007

Translating Statistics-Speak

I wish we all talked more about how scientific results are translated by the media. Fully understanding the assumptions and limitations of a study is challenging enough for those performing the research. In some ways, the journalists’ job is harder, finding lay language to summarize outcomes and implications without generalizing or ignoring uncertainty. I do not envy them the task.

Byron Calame, the public editor of the New York Times, recently discussed his paper's presentation of a study about marital status. On January 16, the front page read, "51% of Women are Now Living Without Spouse.” Calame’s response noted that in the study, “women” included females aged 15 and older; the Census set the lower bound at 15 to catch all married women. The original article did not call attention to the fact that teenagers living at home were counted as single women.

Apparently, when other journalists pointed out the misleading lack of clarity, some readers felt that they had been deceived. Is the “true” parameter just over 50% or just under? I would argue that the lower age bound set by the census is as reasonable as any. I also think that it doesn’t make much difference whether the percentage of women who are unmarried is a tiny bit over 50 or a tiny bit under (Sam Roberts, who wrote the original article, eventually made the same argument).

Regardless, Calame reports that an executive Times editor plans to spend more time discussing statistical results with colleagues who have expertise in the relevant fields. This seems like a great plan. I wonder how far this idea could be taken – how can researchers best work with journalists to successfully translate results?

A Crimson article published yesterday went so far as to refer to the “basic statistical measures—such as p-values or R-squared values,” or lack thereof, in a study conducted by Philip Morris. And when covering The New England Journal of Medicine’s discussion of stents for heart patients, The Times focused on the fact that some risks are “tough to assess.” This journalistic direction seems promising.

Posted by Cassandra Wolos at 2:01 PM

22 February 2007

Cheating for Honest People

Let me follow up on yesterday’s post by Jim Greiner.

Jim’s problem: He’s touring the country touting tools for increased honesty in applied statistical research, only to be asked, effectively, for recommendations about using these tools to cheat more effectively. Yay academic job market.

Jim's example goes like this: An analyst is asked to model the effect of a treatment, T, on an outcome, Y, while controlling for a bunch of confounders, X. To minimize the potential for data dredging, we give the analyst only the treatment and the observed potential confounders to model the treatment assignment process, but we withhold the outcome data. Only after the analyst announces success in balancing the data (by including X, functions of X such as f(X), deleting off-support observations, etc.) would we hand over the outcome data, plug the outcome into the equation, run it once, and be done.
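For concreteness, here is a minimal sketch of that outcome-blind workflow in Python; the data are simulated, and the propensity-score and weighting choices are my own illustration, not Jim's:

    # Simulated sketch of the outcome-blind protocol described above.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))                       # observed confounders
    T = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))     # treatment assignment

    # Stage 1: model treatment assignment without ever seeing Y; iterate
    # on the specification here until covariate balance is acceptable.
    pscore = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]

    # Stage 2: only now is the outcome revealed; run the analysis once.
    Y = X[:, 0] + 2 * T + rng.normal(size=500)          # the revealed outcome
    w = np.where(T == 1, 1 / pscore, 1 / (1 - pscore))  # inverse-probability weights
    ate = (np.average(Y[T == 1], weights=w[T == 1])
           - np.average(Y[T == 0], weights=w[T == 0]))
    print(f"One-shot estimate of the effect of T on Y: {ate:.2f}")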

So how can we help Jim help his audience cheat? Let’s make two assumptions (which I’d be willing to defend with my life). First, although the analyst is not given the actual outcome data, the analyst does know what the outcome is (wages, say). Second, the analyst is permitted to drop elements of X from the analysis, based on his or her analytic judgment.

Now let's cheat. First, select the covariate, C, from the pool of potential confounders, X, that you believe correlates most strongly with the outcome, Y. Second, treat C as the outcome and build a model through data dredging to maximize (or minimize, if that is your objective) the "effect" of T on C. Specifically, find the subset of functions of X, S(f(X)), that maximizes the effect of T on C while maintaining balance in S(f(X)). Third, upon receiving the outcome data, just plug them into the model but "forget" to mention that you didn't include C in the treatment assignment model. If C really does correlate strongly with Y, this procedure should lead to an upwardly biased estimate of the effect of T on Y.
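Here is a toy version of the dredging step in Python, deliberately dishonest by construction and run on simulated data; a full version would also check balance for each candidate specification:

    # A deliberately dishonest specification search, shown only to make
    # the point; all data are simulated.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 5))        # pool of potential confounders
    T = rng.binomial(1, 0.5, size=1000)   # treatment

    # Step 1: C is the covariate believed to correlate most with the
    # (still unseen) outcome Y; say it is column 0.
    C = X[:, 0]

    # Step 2: dredge over subsets of the remaining covariates, keeping
    # the specification that maximizes the apparent "effect" of T on C.
    best_effect, best_cols = -np.inf, None
    for drop in range(1, 5):
        cols = [k for k in range(1, 5) if k != drop]
        design = np.column_stack([T, X[:, cols]])
        coef_T = LinearRegression().fit(design, C).coef_[0]
        if coef_T > best_effect:
            best_effect, best_cols = coef_T, cols

    # Step 3: when Y arrives, use best_cols in the treatment model but
    # "forget" C; if C predicts Y, the estimate of T's effect is biased.
    print(f"Largest dredged 'effect' of T on C: {best_effect:.3f}")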

I fear that this would work well in practice (though one could construct a counterexample). It seems to me, however, that cheating in this way would be more technically demanding than cheating in, say, a standard regression analysis.

Posted by Felix Elwert at 6:42 PM

21 February 2007

How do I cheat with potential outcomes?

As some folks know, I'm on the legal academic job market this year. My job talk paper is on the application of the potential outcomes framework for causation to legal matters, particularly anti-discrimination issues that arise in litigation. As I've presented the framework, I've highlighted one of its advantages as being the fact that much of the hard work of separating covariates from intermediate outcomes and balancing covariates can (and should) be done without access to the outcome variable. The idea is that without access to the outcome variable, it is harder for a researcher (or, God forbid, an expert witness) to model-snoop, i.e., to fit model after model until finding one that "proves" a pet theory.

In a few schools, reaction to the claim of increased objectivity has been chilly. Skeptics have said, in essence, "I don't know enough about statistics to argue with you, but I'm REALLY SURE that your method is just as manipulable as, say, regression, even if you don't have access to the outcome variable when you do the hard work." The skeptics have then asked me to tell them how to manipulate the potential outcomes framework (i.e., to tell them why they are right and I am wrong), assuming no access to the outcome variable.

Any ideas on this? I'm able to think of one way it can be done (although the results of "my" way would not be nearly as bad as those from model-snooping), but I'd prefer not to stifle any comments folks might have by putting forth my own thoughts.

Posted by James Greiner at 3:33 PM

20 February 2007

Borat's Effect on Kazakhstan

If you've paid any attention to the popular media in the past six months, you will not have missed the movie "Borat: Cultural Learnings of America for Make Benefit Glorious Nation of Kazakhstan" by Sacha Baron Cohen. The movie went from huge hype to packed theatres, and the DVD is due out on March 6. Some described the movie as "brilliant"; for others it was 15 minutes of mediocre jokes drawn out into 82 minutes of film.

Whatever you may think, the government of Kazakhstan certainly took issue. They felt that their country was portrayed in a particularly unfair light, and they started an image campaign with advertisements in the New York Times and other news media (see here for an article on the matter by the NYT). But what was the movie's actual impact on Kazakhstan's image? Fifteen minutes on Google Trends are suggestive (or frivolous, as Amy suggested).

Here is the timeline of events from Wikipedia: Borat was first screened at film festivals from July 2006 onwards. It was officially released at the Toronto Film Festival on September 7, 2006, which started the hype. The movie opened in early November in the US, Canada, and most European countries. It was number 1 at the US box office for two weeks and only left the top 10 in mid-December.

Here's a graph of search terms and their associated search volume from Google Trends through November 2006 (you can get this live here and modify it as you please). The blue line is the term "borat movie"; the red line is "kazakhstan"; and the orange line is "uzbekistan," which will serve as an (admittedly imperfect) control country. The news reference volume refers to the number of times each topic appeared in Google News.

[Figure: Google Trends search volume for "borat movie," "kazakhstan," and "uzbekistan" (borat_1.png)]

As you can see, searches for "borat movie" take off in September 2006, coinciding with the official release. Volume spikes in late October, just before the movie opens at the box office, and declines afterwards. Event B is the announcement of the movie as picked up by Google News. All as expected, even if the blips before July are a little strange.

Interestingly, the search volume for "uzbekistan" tracks that of "kazakhstan" quite well before the movie enters the spotlight in September. From September onwards, the volume for "kazakhstan" instead somewhat tracks the volume for the movie. If you look at monthly data, the relationship is not as clear, but there does seem to be a trend. So maybe the movie generated some interest in the country.

Here's another chart, for September 2006 (from here). The blue and red lines are as before, but now the orange line is "kazakstan." It turns out that the name can be spelled correctly with or without the "h," and maybe people spelling it for the first time would use this version. This search term shows up in the search volume just before the movie hits the theaters.

[Figure: Google Trends search volume for "borat movie," "kazakhstan," and "kazakstan," September 2006 (borat_2.png)]

Google Trends gives another hint. If you look at the cities from which the searches originate, you will notice a mix of US and European cities in the second half of 2006, and "kazakstan" is mostly searched for by British users. In the first half of the year, however, almost all searches come from Almaty, the largest city in Kazakhstan.

Now, obviously none of this is causal or proven, but it does look interesting. Not only did search volume on Google shoot up around the time the movie was released, but the geographic composition of the searches also shifted toward places where the movie was popular and the country was not well known before Fall 2006.

What does all this mean for Kazakhstan? Is this good or bad publicity? It seems that people became interested in the country beyond the movie (see a USA Today story here). A poll of users of a UK travel website put Kazakhstan in the top 3 places to visit (right after Italy and the UK, if you believe the results), and the Lonely Planet already has an article on the real Kazakhstan "beyond Borat." We'll see whether those people actually go in the end, and whether the trend persists over time as Google supplies more information. But all in all, the movie might have generated some useful publicity for the country. Estimating the impact on tourism and world opinion, anyone?

Posted by Sebastian Bauhoff at 1:24 AM

19 February 2007

Applied Statistics - Dan Hopkins

This week, the Applied Statistics Workshop will present a talk by Dan Hopkins, a Ph.D. candidate in the Government Department at Harvard. Dan has a long-standing association with Harvard, having graduated from the College in 2000. His research focuses on political behavior, state and local politics, and political methodology. His work has appeared in the American Political Science Review. He will present a talk entitled "Flooded Communities: Estimating the Post-Katrina Migration's Impact on Attitudes towards the Poor and African Americans." The paper is available from the workshop website. The presentation will be at noon on Wednesday, February 21 in Room N354, CGIS North, 1737 Cambridge St. As always, lunch will be provided. An abstract of the paper follows on the jump:

Flooded Communities: Estimating the Post-Katrina Migration's Impact on Attitudes towards the Poor and African Americans

This paper uses the post-Katrina migration as a quasi-experiment to confront concerns of selection bias and measurement error that have long plagued research on environmental effects. Drawing primarily on a phone survey of 3,879 respondents, it demonstrates that despite the attention to issues of race and poverty following Hurricane Katrina, people in communities that took in evacuees actually became less supportive of the poor, of African Americans, and of policies to help those groups. The patterns uncovered suggest that the key mechanism is not direct contact, physical proximity, or persuasion by local elites. Instead, the empirical observations accord with a new theory of environmental effects emphasizing the interaction of changing demographics and the media environment. Under the theory of politicized change, sudden changes in local demographics make demographics salient to local residents. Media coverage can convey information about these shifts and can also frame people's thinking on issues related to them.

Posted by Mike Kellermann at 12:07 PM

Unwed teenagers and other statistics in the news

Gary Langer, the director of polling for ABC News, has posted an interesting piece on some recent coverage (or mis-coverage) of social science and medical research. One of his targets is an article that appeared on the front page of the New York Times announcing that "51% of Women Are Now Living Without Spouse." I had heard a lot about this particular story, not least because one of my colleagues has it posted on the bulletin board in our office. As it turns out, the magic 51% number was obtained by including women aged 15-17 in the data, something that was not particularly transparent in the article. So, while there is nothing necessarily wrong with the data itself, it is not clear that these are the numbers that you should be looking at (unless you are concerned about the national epidemic of unwed teenagers living with their parents).

In addition to leading the polling unit, Langer serves as a kind of "statistical watchdog" for ABC News. He was on a panel here at IQSS about a year ago and told some great stories about the amount of garbage that crosses their desks on a regular basis. It would be nice if all of the major news organizations had similar arrangements in place to vet their statistical reporting. (Hat tip: Mystery Pollster)

Posted by Mike Kellermann at 11:40 AM

16 February 2007

Initiative in Innovative Computing - Edward Tufte

The Initiative in Innovative Computing, an interdisciplinary program that aims to "foster the creative use of computational resources to address issues at the forefront of data-intensive science," is hosting a talk by Edward Tufte next week. It is easy to forget that Tufte began his career as a political scientist, long before he became known for his work on the visual representation of evidence. His 1975 article on "Determinants of the Outcomes of Midterm Elections" is one of the 20 most-cited articles published in the first 100 years of the APSR. I don't know that I would want to leave political science in the way that Tufte did, but having a job entitled "Senior Critic" sounds like a lot of fun. The details of the talk follow:

February 21, 2007; 7:00pm
Biolabs Room 1068, 16 Divinity Avenue

Edward Tufte, Professor Emeritus of Political Science, Statistics, and Computer Science, and Senior Critic in the School of Art at Yale

An Academic and Otherwise Life, An N = 1

Abstract

Edward Tufte will talk about his education and careers in statistics, political economy, analytical design, landscape sculpture, book publishing, and consulting. A question session will follow the talk.

Bio

Edward Tufte's most recent book is Beautiful Evidence. He taught at Princeton and Yale for 32 years, and is Professor Emeritus of Political Science, Statistics, and Computer Science, and Senior Critic in the School of Art at Yale.

Posted by Mike Kellermann at 2:10 PM

14 February 2007

Data sharing and visualization

A friend of mine pointed me to this website, Many Eyes. Basically, any random person can upload any sort of dataset, visualize it in any number of ways, and then make the results publicly available so that anyone can see them.

The negative, of course, is much the same as with anything that "just anyone" can contribute to: there is a lot of useless stuff, and (if the source of the dataset is uncited) you can't be sure how valid the dataset itself is. There may be a lot of positives, though. The volume of data alone is like a fantastic dream for many a social scientist; it's a great tool for getting "ordinary people" interested in doing their own research or analysis of their lives (for instance, I noticed some people graphing changes in their own sports performance over time); and many of the interesting datasets have ongoing conversations about them. Only time will tell, but I imagine there is at least a chance this could end up being Wikipedia-like in its usefulness.

It may also serve as a template for data sharing among scientists. Wouldn't it be nice if, every time you published, you had to make your dataset (or code) publicly available? We might already be trending in that direction, but a centralized location for scientific data sharing sure would speed things along.

Posted by Amy Perfors at 10:24 AM

13 February 2007

Adventures in Identification I: Voting After the Bomb

Jens Hainmueller

I've decided to start a little series of entries under the header "Adventures in Identification." The title is inspired by the increasing trend in the social sciences, particularly economics, but also public health, political science, sociology, etc., to look for natural or quasi-experiments that identify causal effects in observational settings. Although there are of course plenty of bad examples of this type of study, I think the general line of research is very promising, and the rising interest in issues of identification is commendable. Natural experiments often provide the only credible way to answer many of the questions we care about in the social sciences, where real experiments are often unethical or infeasible (or both) and observational data usually has selection bias written all over it. Enough said; let's jump right into the material with "Adventures in Identification I: Voting After the Bomb," a macabre natural experiment in electoral politics.

A recent question in political science, and also in economics, is how terrorism affects democratic elections. This is clearly a fairly tricky question to get an (identification) handle on. Heretic graduate students riding their Rubin horses around IQSS will tell you two minutes into your talk that you can't just run a regression and call it "causal." One setting where an answer may be (partly) possible is the Spanish congressional elections of 2004. The incumbent conservative party, led by Prime Minister Jose Maria Aznar, had been favored to win by a comfortable margin according to opinion polls. On March 11, however, Islamic terrorists deposited nine backpacks full of explosives on several commuter trains in Madrid. The explosions killed 191 people and wounded 1,500. Three days later, Spain's socialists under the lead of Jose-Luis Rodriguez Zapatero scored a stunning victory in the elections. Turnout was high, and many have argued that voters were expressing anger with the government, accusing it of provoking the Madrid attacks by supporting the U.S.-led war in Iraq, which most Spaniards opposed.

Now the question is how (if at all) the terrorist attacks affected the election result. As usual, only one potential outcome is observed, and the crucial question is what the election results would have been in the absence of the attacks. One could do a simple before-and-after study, imputing this missing potential outcome from some extrapolated pre-attack trend in opinion polls. But then the question remains whether those opinion polls accurately capture how people would have voted on election day. A difference-in-differences design seems better suited, but given that the attacks probably affected all voters, a control group is hard to come by.

In a recent paper, Jose G. Montalvo actually found a control group. It turns out that by the time the attacks hit, Spanish residents abroad had already cast their absentee ballots, so their voting decisions could not have been affected by the attacks. The author sets up a diff-in-diffs exploiting voting trends in the treated group (Spanish residents) and the control group (Spanish citizens living abroad). He finds that the attacks had a large effect on the result, to the benefit of the opposition party. Interestingly, this result seems to differ from the findings of other, simpler before-and-after studies on the topic (although I can't say for sure, since I have not read the other papers cited).
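The design boils down to a two-by-two comparison. Here is a back-of-the-envelope version in Python, with invented vote shares rather than Montalvo's actual numbers:

    # Hypothetical incumbent vote shares; the numbers are invented for
    # illustration and are not from Montalvo's paper.
    share = {
        ("resident", 2000): 0.46, ("resident", 2004): 0.38,  # exposed to attacks
        ("absentee", 2000): 0.45, ("absentee", 2004): 0.44,  # voted pre-attacks
    }
    did = ((share[("resident", 2004)] - share[("resident", 2000)])
           - (share[("absentee", 2004)] - share[("absentee", 2000)]))
    print(f"DID estimate of the attacks' effect on the incumbent: {did:+.2f}")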

Of course, the usual disclaimers about DID estimates apply. Differential trends between the groups may exist if foreign residents' perceptions of terrorism evolved differently over time from those of Spanish residents, and foreign residents are probably quite different from Spanish residents to begin with. But in the author's defense, the results seem fairly robust given the checks he presents. And hey, it's a tough question to ask, and this provides a more appropriate way to identify the counterfactual outcome than simply comparing before and after.

Posted by Jens Hainmueller at 8:00 AM

12 February 2007

Applied Statistics - Jens Hainmueller

This week, the Applied Statistics Workshop will present a talk by Jens Hainmueller, a Ph.D. candidate in the Government Department at Harvard. Prior to joining the department, he received degrees from the London School of Economics and the Kennedy School of Government. His work has appeared in International Organization and the Journal of Legislative Studies. He will present a talk entitled "Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program." This talk is based on joint work with Alberto Abadie and Alexis Diamond; their paper and supporting software are available from the workshop website. The presentation will be at noon on Wednesday, February 14 in Room N354, CGIS North, 1737 Cambridge St. As always, lunch will be provided. An abstract of the paper follows on the jump:

Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program

Alberto Abadie – Harvard University and NBER
Alexis Diamond – Harvard University
Jens Hainmueller – Harvard University

Building on an idea in Abadie and Gardeazabal (2003), this article investigates the application of synthetic control methods to comparative case studies. We discuss the advantages of these methods and apply them to study the effects of Proposition 99, a large-scale tobacco control program that California implemented in 1988. We demonstrate that following Proposition 99 tobacco consumption fell markedly in California relative to a comparable synthetic control region. We estimate that by the year 2000 annual per-capita cigarette sales in California were about 26 packs lower than what they would have been in the absence of Proposition 99. Given that many policy interventions and events of interest in social sciences take place at an aggregate level (countries, regions, cities, etc.) and affect a small number of aggregate units, the potential applicability of synthetic control methods to comparative case studies is very large, especially in situations where traditional regression methods are not appropriate. The methods proposed in this article produce informative inference regardless of the number of available comparison units, the number of available time periods, and whether the data are individual (micro) or aggregate (macro). Software to compute the estimators proposed in this article is available at the authors' web-pages.
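For readers new to the method, here is a toy illustration of the core idea in Python (simulated data; this is not the authors' software, which is available from the workshop website): the synthetic control is a convex combination of control units chosen to reproduce the treated unit's pre-treatment characteristics.

    # Toy synthetic-control weight search on simulated data.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(2)
    X0 = rng.normal(size=(10, 4))   # pre-treatment predictors, 4 control units
    x1 = X0 @ np.array([0.5, 0.3, 0.2, 0.0]) + 0.1 * rng.normal(size=10)  # treated unit

    # Choose non-negative weights summing to one that best reproduce the
    # treated unit's pre-treatment predictors.
    n = X0.shape[1]
    res = minimize(lambda w: np.sum((x1 - X0 @ w) ** 2),
                   x0=np.full(n, 1 / n),
                   bounds=[(0, 1)] * n,
                   constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
    print("Synthetic-control weights:", np.round(res.x, 3))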

Posted by Mike Kellermann at 12:16 AM

9 February 2007

Corruption in the Classroom

In the fall, I mentioned the debate over teaching kids to read using whole language versus phonics methods. The heavily funded Reading First program, part of No Child Left Behind, is intended to promote phonics and relies on research published by the National Reading Panel (which I don’t completely trust, but today that’s beside the point).

The latest is a report by psychologist Louisa Moats claiming that instead of changing their curricula to focus on phonics, reading programs are sprinkling key phonics catchphrases throughout their marketing materials and selling the same old whole language lessons. The press release for Moats’ report contrasted the situation with the F.D.A.’s oversight of drugs. The government authority approves the treatment; companies marketing the treatment rely on public trust in the authority. The difference is that education companies get away with much more than the drug companies ever could.

Reports like this highlight for me the differences in how natural and social science results become policy. I see that medical dishonesty can kill people while the effects of corruption in education are less direct. But how does it happen that New York City public schools spend anti-whole language funding on thinly disguised whole language curricula? What other social programs are subject to this kind of deceit?

Posted by Cassandra Wolos at 9:37 AM

7 February 2007

Timing Is Everything

Jim Greiner

As announced previously on this blog, I'm giving today's presentation at CGIS on causal inference and immutable characteristics, and I've previewed some of the ideas from this research in earlier posts. Basically, the idea is that if we shift our thinking from "actual" immutable characteristics (e.g., race), a concept I find poorly defined in some situations, to perceived immutable characteristics, then the potential outcomes framework of causation can sometimes be usefully applied to things like race, gender, and ethnicity.

A key point here is the timing of treatment assignment. If treatment is conceptualized in terms of perceptions, then a natural point at which to consider treatment applied is the moment the decision maker whose conduct is being studied first perceives a unit's race, gender, ethnicity, whatever. This works well only if we're willing to exonerate the decision maker from responsibility for whatever happened before that moment of first perception. In the law, sometimes we're willing to do so. Sometimes, we're not.

Take the employment discrimination context. Typically, we don't hold an employer responsible for the discrimination of someone else, particularly when it occurred (say) prior to a job application, even if that prior discrimination means that some groups (e.g., minorities) have less attractive covariates (e.g., educational achievement levels) than others (e.g., whites). Perhaps potential outcomes could work here; a study of the employer's hiring can safely condition on educational achievement levels (i.e., take them as given, balance on them, etc.) and other covariates. More covariates means that the ignorability assumption required for most causal inference is more plausible.

Contrast the employment discrimination setting to certain standards applying to education institutions. For example, we may not want to allow a university to justify allocating fewer resources to female sports teams on the grounds that its female students show less interest in sports (even if we believed the university to be telling the truth). Here, we might consider that the preferences of the female students were probably shaped by prior stereotyping, and we might want to force the university to take steps to combat those stereotypes and change the female students' preferences. If so, we are unwilling to take the previous social pressure as "given," so we cannot balance on it. The result is fewer covariates and greater pressure on the ignorability assumption.

My thanks to Professor Roderick Hills of NYU law school, whose insightful question during a job talk I recently gave there helped solidify the above Title IX example.

Posted by James Greiner at 4:00 PM

6 February 2007

Ask why...why, why, why

[Image: "Ask why" cartoon (askwhy1.jpeg)]

Posted by Jens Hainmueller at 10:11 PM

Presentation, Presentation (at conferences, that is)

An article by Jane Miller in the current issue of Health Services Research explains strategies for preparing conference posters. As she writes, posters are a "hybrid of a published paper and an oral presentation," and people often fail to recognize this when preparing one. The article reviews the existing literature on research communication and provides guidelines on how to present statistical methods and results appropriately. It's all common-sense stuff, but it might come in handy for first-time presenters looking for guidance.

It also goes nicely with Gary's "Publication, Publication" guide to writing research papers, which you can find here.

Jane E. Miller (2007) "Preparing and Presenting Effective Research Posters" Health Services Research 42(1p1): 311–328. doi:10.1111/j.1475-6773.2006.00588.x

Posted by Sebastian Bauhoff at 3:10 PM

5 February 2007

Applied Statistics - Jim Greiner

This week, the Applied Statistics Workshop will present a talk by Jim Greiner, a Ph.D. candidate in the Statistics Department. The talk is entitled "Potential Outcomes and Immutable Characteristics," and is based on joint work with Don Rubin from the Statistics Department. An abstract of the talk follows on the jump.

Jim graduated with a B.A. in Government from the University of Virginia in 1991 and received a J.D. from the University of Michigan Law School in 1995. He clerked for Judge Patrick Higginbotham on the U.S. Court of Appeals for the Fifth Circuit and practiced law in the Justice Department and in private practice before joining the Statistics Department here at Harvard. His research interests focus on causal inference and ecological inference, particularly as they relate to issues arising in the legal process. He is also the former chair of, and a current contributor to, this blog.

The Applied Statistics Workshop will meet in Room N354 in the CGIS Knafel Building (next to the Design School) at 12:00 on Wednesday, February 7th. Everyone is welcome, and lunch is provided. We hope to see you there!

Potential Outcomes And Immutable Characteristics

D. James Greiner & Donald B. Rubin

In the United States legal system, various directives attempt to reduce the relevance of "immutable characteristics" (e.g., race, sex) in specified societal settings (e.g., employment, voting, capital punishment). Typically, the directive is phrased in terms of a prohibition on action taken "because of" or "on account of" a prohibited trait, suggesting some kind of causal inquiry. Some researchers, however, have suggested that causal reasoning is inappropriate in such settings because immutable characteristics cannot be manipulated or randomized. We demonstrate that a shift in focus from "actual" characteristics to perceptions of traits allows application of the potential outcomes framework of causation to some (but not all) civil rights concerns. We articulate assumptions necessary for such an application to produce well-posed questions and believable answers. To demonstrate the principles we discuss, we reanalyze data from one of the most famous empirical studies in the law, the so-called "Baldus Study" of the role of race in the administration of capital punishment in Georgia.

Posted by Mike Kellermann at 7:09 AM

1 February 2007

A Rash Of Senicide?

There have been an awful lot of stories lately about the world's oldest person dying; in fact, it seems to have happened about three times in the last month or so. Then again, being the world's oldest person is a dubious honour to be sure, since the winner isn't likely to hold the title for very long and likely isn't even aware of their status. (Full disclosure: my great-grandmother was a centenarian but likely never knew my name.)

These stories have been bouncing in my mind lately and I'm trying to figure out why. I can think of a few scientifically relevant explanations:

1) The life expectancy of a centenarian is on the order of a year, and three successive deaths in a month is a rare event. Conditional on the first death, and assuming independence and exponentially distributed lifespans (a reasonable assumption for the tail end), the probability of the next two deaths coming within a month is roughly 0.0033 (a quick check of this figure appears after the list). And this happened to be the month for it.

2) The events aren't at all rare, and the centenarian death rate is actually dramatically higher, but it's a slow news month, and the stories themselves are floating to the top of the pile.

3) Online news services like Reuters and CNN have dedicated spaces for more "entertaining" and "bizarre" news stories, meaning that no matter how much news there is, people are seeing these stories.

4) Guinness sales are down, despite the "brilliant!" advertising campaign, and the World Record people are seeking out these title changes for the sake of their own discreet advertising.

5) I read this in The Onion and the satire hit me point blank, meaning I'm selecting and remembering the stories more often when they appear.
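For what it's worth, the 0.0033 figure in option 1 checks out; here is a quick verification in Python under the stated exponential assumption:

    # P(next two deaths within one month), titleholder tenures i.i.d.
    # exponential with mean 12 months: a Gamma(2) tail calculation.
    from math import exp

    rate = 1 / 12                              # deaths per month
    t = 1.0                                    # window: one month
    p = 1 - exp(-rate * t) * (1 + rate * t)    # P(T1 + T2 <= t)
    print(f"{p:.4f}")                          # prints 0.0033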

I'm thinking it's Number 5, but I'd be curious to know if anyone knew the mean centenarian death rate and whether this was a rare occurrence or not.

Posted by Andrew C. Thomas at 9:56 AM