March 2010

16 March 2010

Humans make comeback in Zombie models!

Last Halloween, I alerted readers of the social science statistics blog to cutting-edge research suggesting that if zombies attacked, humans faced a serious risk of extinction.

It turns out that some of these conclusions may have been premature. Recent research by Blake Messer suggests that if there is terrain that favors humans in some way, then humans may have a better shot at survival.

But it doesn't end there.

UCLA's Gabriel Rossman points out that Messer's model doesn't account for the possibility of human stupidity/sabotage (always a good thing to include in our models, I guess). Rossman's findings suggest that in the face of a zombie onslaught, small islands of weapons stockpiles might be more favorable for the long-term survival of the human race than a single cache -- perhaps the most important policy implication to come out of this renewed debate.

I think future research in this area will be worth following. First, I hear there is interesting work afoot on the spread of zombification through social networks, although getting the zombies to accurately report who bit whom can be difficult. I've also heard rumors of some machine learning research that attempts to classify zombie speech (early results suggest that there is only one category: "BRAINS!"), and I believe some economists are using the apparent exogeneity of zombie outbreaks to finally identify the effect of education on wages.

Stay tuned!

Posted by Richard Nielsen at 12:49 PM

12 March 2010

Putting Statistics in Golf

Here's a neat article in the Wall Street Journal on a new putting statistic, developed by researchers at MIT's Sloan School of Management, that the PGA recently adopted. The article gives a great rundown on the deficiencies of the "putting average" traditionally used to rate pro golfers, then explains in detail how the new statistic improves upon it. Cool stuff!

Posted by John Graves at 12:01 PM

10 March 2010

Google public data explorer

The Google Public Data Explorer just went up and it is worth a look. They have collected a number of large datasets and created a set of visualization tools to explore the data. Probably most interesting is the ability to show how the data changes over time using animation. This will be familiar to you if you have seen any of Hans Rosling's TED talks.

While it is fun to play around with the data, it can be a bit overwhelming; content requires curation. One dataset that I found interesting was the World Bank data on net migration:

It's hard to get the colors and sizes quite right, since size measures only the magnitude of net migration (whether positive or negative) while the colors range from red (people arriving) to blue (people leaving). This sort of feels like the natural extension of programs like SPSS to the web.
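For concreteness, here is a minimal sketch in Python (with matplotlib) of the encoding just described; the country names and migration figures are invented for illustration, not taken from the World Bank data. Bubble size carries only the magnitude of net migration, while color carries its direction:

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical net-migration figures (thousands); not the World Bank numbers.
    countries = ["A", "B", "C", "D", "E"]
    net_migration = np.array([1200, -450, 300, -900, 60])
    population = np.array([50, 20, 80, 35, 10])  # millions, for the x-axis

    sizes = 0.5 * np.abs(net_migration)                   # size encodes magnitude only
    colors = np.where(net_migration > 0, "red", "blue")   # red = arriving, blue = leaving

    plt.scatter(population, net_migration, s=sizes, c=colors, alpha=0.6)
    for x, y, name in zip(population, net_migration, countries):
        plt.annotate(name, (x, y))
    plt.xlabel("Population (millions)")
    plt.ylabel("Net migration (thousands)")
    plt.title("Bubble size = |net migration|; color = direction")
    plt.show()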

Posted by Matt Blackwell at 4:53 PM

8 March 2010

Zajonc on "Bayesian Inference for Dynamic Treatment Regimes"

We hope you will join us this Wednesday, March 10th at the Applied Statistics workshop when we will be happy to have Tristan Zajonc (Harvard Kennedy School). Details and an abstract are below. A light lunch will be served. Thanks!

"Bayesian Inference for Dynamic Treatment Regimes"
Tristan Zajonc
Harvard Kennedy School
March 10th, 2010, 12 noon
K354 CGIS Knafel (1737 Cambridge St)

Abstract:

Policies in health, education, and economics often unfold sequentially and adapt to developing conditions. Doctors treat patients over time depending on their prognosis, educators assign students to courses given their past performance, and governments design social insurance programs to address dynamic needs and incentives. I present the Bayesian perspective on causal inference and optimal treatment choice for these types of adaptive policies or dynamic treatment regimes. The key empirical difficulty is dynamic selection into treatment: intermediate outcomes are simultaneously pre-treatment confounders and post-treatment outcomes, causing standard program evaluation methods to fail. Once properly formulated, however, sequential selection into treatment on past observables poses no unique difficulty for model-based inference, and analysis proceeds equivalently to a full-information analysis under complete randomization. I consider optimal treatment choice as a Bayesian decision problem. Given data on past treated and untreated units, analysts propose treatment rules for future units to maximize a policymaker's objective function. When policymakers have multidimensional preferences, the approach can estimate the set of feasible outcomes or the tradeoff between equity and efficiency. I demonstrate these methods through an application to optimal student tracking in ninth and tenth grade mathematics. An easy-to-implement optimal dynamic tracking regime increases tenth grade mathematics achievement by 0.1 standard deviations above the status quo, with no corresponding increase in inequality. The proposed methods provide a flexible and principled approach to causal inference for sequential treatments and optimal treatment choice under uncertainty.
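The dynamic-selection problem described in the abstract is easy to see in a toy simulation. The sketch below is my own illustration, not code from the talk; every model and coefficient is made up. An intermediate outcome L is affected by the first treatment and also drives assignment of the second, so a naive comparison of observed treatment histories is biased, while a model-based (g-computation-style) evaluation of a fixed regime recovers the intended contrast:

    import numpy as np

    rng = np.random.default_rng(42)
    n = 200_000

    # Period-1 treatment, an intermediate outcome L affected by it, then a
    # period-2 treatment assigned partly on the basis of L. L is at once a
    # post-treatment outcome of a1 and a confounder of a2.
    a1 = rng.binomial(1, 0.5, n)
    L = 0.5 * a1 + rng.normal(0.0, 1.0, n)
    a2 = rng.binomial(1, 1 / (1 + np.exp(-(L - 0.2))))
    Y = 1.0 * a1 + 1.0 * a2 + 0.8 * L + rng.normal(0.0, 1.0, n)

    # Naive contrast of "treated in both periods" vs "treated in neither" is
    # biased for the regime effect, because conditioning on a2 also selects on L.
    naive = Y[(a1 == 1) & (a2 == 1)].mean() - Y[(a1 == 0) & (a2 == 0)].mean()

    # A model-based evaluation instead simulates L from its model under a fixed
    # regime and averages the outcome model over those draws. The true
    # coefficients are used here as stand-ins for fitted (or posterior) ones.
    def regime_mean(a1_fix, a2_fix, n_sim=200_000):
        L_sim = 0.5 * a1_fix + rng.normal(0.0, 1.0, n_sim)
        return np.mean(1.0 * a1_fix + 1.0 * a2_fix + 0.8 * L_sim)

    print("naive contrast:", round(naive, 2))  # biased upward
    print("model-based contrast:", round(regime_mean(1, 1) - regime_mean(0, 0), 2))  # about 2.4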

Posted by Matt Blackwell at 12:42 PM

Tufte goes to Washington

In case you have not heard, Edward Tufte has been appointed to the Recovery Independent Advisory Panel by President Obama. The mission statement of the Panel is:

To promote accountability by coordinating and conducting oversight of Recovery funds to prevent fraud, waste, and abuse and to foster transparency on Recovery spending by providing the public with accurate, user-friendly information.

It is hard to imagine a better person for this panel than Tufte. As Feltron said, this is wonderful news for data nerds, designers, and the general public.

Posted by Matt Blackwell at 1:00 AM

6 March 2010

Teaching teachers

Andrew Gelman has some good comments on the great Elizabeth Green article about teaching in the New York Times Magazine. The article is about how to improve both classroom management and subject instruction for K-12 teachers, but Gelman correctly points out that many of these struggles resonate with those of us teaching statistics at the undergraduate and graduate levels.

I used to be of the opinion that the teaching of children and the teaching of adults were two fundamentally different beasts and that comparisons between the two were missing the point. The more I teach, though, the more I see teaching as a skill that is separate from the material being taught. Knowing a topic well does not imply being able to teach it well. This should have been obvious to me given the chasm between good research and good presentations.1 The article nails this as it talks about math instruction:

Mathematicians need to understand a problem only for themselves; math teachers need both to know the math and to know how 30 different minds might understand (or misunderstand) it. Then they need to take each mind from not getting it to mastery. And they need to do this in 45 minutes or less. This was neither pure content knowledge nor what educators call pedagogical knowledge, a set of facts independent of subject matter, like Lemov's techniques. It was a different animal altogether.

If this is true, how can we improve teaching? I think that Gelman is right in identifying student participation as important to teaching statistics. Most instructors would agree that statistics is all about learning by doing, but many of us struggle with how to actually implement this, especially in lectures. Cold-calling is extremely popular in law and business schools, but rare in the social sciences. Breaking off to do group work is another useful technique; in addition to giving up control of the class (which Gelman mentions), instructors really have to build the class around these breaks.

Reflecting on my own experience, both as a student and an instructor, I am starting to believe in three (related) fundamentals of statistics teaching:


  1. Repetition. If we really do learn by doing, then we should pony up and have students do many simple problems that involve the same fundamental skill or concept.

  2. Mantras. We are often trying to give students intuitions about the way statistics "works," but many students just need a simple, compact definition of the concept. Before I understood the Central Limit Theorem, I could tell you what it was ("The sums and means of random variables tend to be Normal as we get more data") because of the mantra that my first methods instructor taught me; a quick simulation, like the sketch after this list, makes the mantra concrete. As a friend told me, statistics is a foreign language: in order to write sentences you first need to know some vocabulary.

  3. Maps. It is so easy to feel lost in a statistics course and not understand how one week relates to the next. A huge help is to give students a diagram that represents where they are (the specific topic) and where they are going (the goals). The whole class should be organized around the path to the goals, and students should always be able to locate themselves on that path.
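Here is the kind of quick simulation I have in mind for the Central Limit Theorem mantra. It is only a sketch in Python; the Exponential distribution and the sample sizes are arbitrary choices for illustration:

    import numpy as np
    import matplotlib.pyplot as plt

    # Means of n draws from a decidedly non-Normal distribution (Exponential)
    # look more and more Normal as n grows.
    rng = np.random.default_rng(1)

    fig, axes = plt.subplots(1, 3, figsize=(9, 3))
    for ax, n in zip(axes, [1, 5, 50]):
        means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
        ax.hist(means, bins=50)
        ax.set_title("Means of n = %d draws" % n)
    plt.tight_layout()
    plt.show()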

There are probably more fundamentals that I am missing, but I think each of these is important and overlooked. Often this is simply because they are hard to implement, instructors have other commitments, and the value-added of improving instruction can be very low. In spite of these concerns and the usual red herrings2, I think that there are simple changes we can make to improve our teaching.
--
1Perhaps a more subtle point is that being a good presenter does not imply being a good instructor. They are related, though: good public speakers have an advantage as teachers, since they are presumably more comfortable in front of crowds. But the goal of presenting (persuasion) and the goal of instruction (training people in a skill) are very different. People confuse the two because the medium is often so similar (lecture halls, podiums, etc.).
2Teaching evaluations are important, but they are often very coarse. Students know if they didn't understand something, but rarely know why. Furthermore, improving evaluations need not come from improving instruction.

Posted by Matt Blackwell at 4:10 PM

5 March 2010

Collecting datasets

Infochimps hosts what looks to be a growing number of datasets, most of them free. There seems to be some ability to sell your dataset (at a 50% commission rate!), but the real story is the ability to quickly browse the data. It looks a little thin now, but as someone who is constantly looking for good teaching examples, I think this could be a valuable resource. (via gelman)

Posted by Matt Blackwell at 9:47 AM

2 March 2010

Newsdot maps the news

Newsdot is a new tool from Slate that displays a "social network" for topics in the news, be they people, organizations, or locations. Here's a look:

[Image: newsdot.png -- screenshot of the Newsdot topic network]

It uses a product called Calais, which does automatic tagging of documents by finding keywords. You can try it out on any text with their viewer. Here is a sample output from an article in the New York Times about the primary elections in Texas:

[Image: calais.png -- sample Calais output for the New York Times article]

You can see that Calais has been able to identify all of the references to Gov. Perry and Sen. Hutchison, in addition to the pronouns and verbs that refer to them.
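For readers who want to experiment without Calais itself, a generic named-entity tagger gives a rough feel for this kind of automatic tagging. The sketch below uses NLTK rather than Calais (my substitution, and it only finds entity mentions; it does not resolve pronouns the way the Calais output above does). The example sentence is made up:

    import nltk

    # Requires the standard NLTK tokenizer, POS-tagger, and NE-chunker models
    # (install them once with nltk.download()).
    text = ("Gov. Rick Perry defeated Sen. Kay Bailey Hutchison in the "
            "Texas Republican primary.")

    tokens = nltk.word_tokenize(text)
    tagged = nltk.pos_tag(tokens)
    tree = nltk.ne_chunk(tagged)

    # Print the labeled entity spans (PERSON, GPE, ORGANIZATION, ...).
    for subtree in tree:
        if hasattr(subtree, "label"):
            print(subtree.label(), " ".join(word for word, tag in subtree.leaves()))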

Some thoughts are below the fold.

  1. I love the idea of mapping the space of "news," and using tags is a creative way of doing this. One way of improving the whole enterprise would be to cluster the topics and use those clusters to color the dots, instead of coloring by node type (currently it's blue for countries, red for people, etc.).
  2. Calais is the kind of tool that really grabs my attention, much like Mechanical Turk did when I first heard about it. These types of products are going to completely change the way we do research. There used to be large barriers to entry in conducting research because of the resources needed to collect, manage, and store data. Even just a few years ago, if you wanted a large dataset, you would have to either spend a lot of time or hire someone. Tools like Calais and mturk allow non-programmers to collect and manage data much faster and much more cheaply. This opening up of data could shake up academia by increasing the speed of research production and allowing "startup" researchers to produce high-quality analyses. (Relatedly, the opening up of information (not limited to data) over the last decade lowered the cost of becoming an "expert" and altered the depth vs. breadth tradeoff.)

Posted by Matt Blackwell at 11:17 AM

1 March 2010

Steenburgh on "Substitution Patterns of the Random Coefficients Logit"

We hope you will join us this Wednesday, March 3rd at the Applied Statistics workshop when we will be happy to have Thomas Steenburgh (Harvard Business School). Details, an abstract, and a link to the paper are below. A light lunch will be served. Thanks!

"Substitution Patterns of the Random Coefficients Logit"
Thomas Steenburgh
Harvard Business School
March 3rd, 2010, 12 noon
K354 CGIS Knafel (1737 Cambridge St)

You can find the paper on SSRN.

Abstract:

Previous research suggests that the random coefficients logit is a highly flexible model that overcomes the problems of the homogeneous logit by allowing for differences in tastes across individuals. The purpose of this paper is to show that this is not true. We prove that the random coefficients logit imposes restrictions on individual choice behavior that limit the types of substitution patterns that can be found through empirical analysis, and we raise fundamental questions about when the model can be used to recover individuals' preferences from their observed choices.

Part of the misunderstanding about the random coefficients logit can be attributed to the lack of cross-level inference in previous research. To overcome this deficiency, we design several Monte Carlo experiments to show what the model predicts at both the individual and the population levels. These experiments show that the random coefficients logit leads a researcher to very different conclusions about individuals' tastes depending on how alternatives are presented in the choice set. In turn, these biased parameter estimates affect counterfactual predictions. In one experiment, the market share predictions for a given alternative in a given choice set range between 17% and 83% depending on how the alternatives are displayed both in the data used for estimation and in the counterfactual scenario under consideration. This occurs even though the market shares observed in the data are always about 50% regardless of the display.
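For readers unfamiliar with the model under discussion, here is a minimal sketch of how choice probabilities under a random coefficients (mixed) logit are typically simulated: draw individual taste vectors from a population distribution, compute logit probabilities for each draw, and average. This is my own illustration, not code from the paper; the attributes, taste distribution, and number of draws are made up.

    import numpy as np

    def rc_logit_shares(X, beta_mean, beta_cov, n_draws=5000, seed=0):
        """Simulated market shares for J alternatives with attributes X (J x K),
        where individual tastes are beta_i ~ N(beta_mean, beta_cov)."""
        rng = np.random.default_rng(seed)
        betas = rng.multivariate_normal(beta_mean, beta_cov, size=n_draws)  # (n_draws, K)
        u = betas @ X.T                                    # systematic utilities (n_draws, J)
        u -= u.max(axis=1, keepdims=True)                  # numerical stability
        p = np.exp(u) / np.exp(u).sum(axis=1, keepdims=True)
        return p.mean(axis=0)                              # average over taste draws

    # Two alternatives differing in price and quality (made-up numbers).
    X = np.array([[1.0, 2.0],
                  [1.5, 1.0]])
    print(rc_logit_shares(X, beta_mean=np.array([-1.0, 0.5]),
                          beta_cov=0.5 * np.eye(2)))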

Posted by Matt Blackwell at 10:43 AM