30 October 2008
LendingClub is a P2P lending site much like Prosper. What makes them special is that they've released a full data set of all 4,564 past loans and their current status. As a data source this is extraordinary, since most literature on credit scoring uses proprietary data. For the LendingClub data, can we beat the FICO score at default prediction by incorporating additional clues?
This post focuses on the borrower's "Loan Description," which I use along with FICO scores to predict defaults. The loan description is written by the borrower and usually pitches his qualifications and reasons for needing the money. Here's a randomly chosen example from someone who is current on his payments.
I have some credit card debt that I would like to pay-off. It makes sense to pay one lender as opposed to 5 credit card companies. I'd rather pay interest to one payee rather than split between 5 or 6.
This is a relatively short one -- the average description is 58 words long. Perhaps there are keywords in the description that impact the probability of default after controlling for the FICO score. To test for such keywords, I ran a Mantel-Haenszel (M-H) test of each word's association with delinquency, stratified by FICO score.
Now, the fun stuff. For our purposes, define a Delinquency as either being late on your payments or having defaulted completely. The words with the smallest p-values, i.e., the most significant ones, are below. I report marginal delinquency probabilities, not broken out by FICO score, simply for brevity; the actual M-H test controlled for the FICO scores.
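As a rough illustration of the kind of stratified test involved, here is a minimal sketch of a Mantel-Haenszel test for a single keyword, with strata defined by FICO bands. This is not the author's code; the counts and the FICO binning below are made-up placeholders.

```python
import numpy as np
from scipy.stats import chi2

def mantel_haenszel(tables):
    """Mantel-Haenszel chi-square test (1 df) for a common association
    across a list of 2x2 tables [[a, b], [c, d]], one table per stratum."""
    a_sum, e_sum, v_sum = 0.0, 0.0, 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        a_sum += a
        e_sum += (a + b) * (a + c) / n                      # expected a under no association
        v_sum += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
    stat = (abs(a_sum - e_sum) - 0.5) ** 2 / v_sum          # continuity-corrected statistic
    return stat, chi2.sf(stat, df=1)

# Hypothetical counts, one 2x2 table per FICO band:
# rows = description contains the keyword (yes/no), columns = delinquent (yes/no).
tables = [np.array([[12, 150], [40, 900]]),    # e.g. FICO 640-679
          np.array([[8, 180], [25, 1100]]),    # e.g. FICO 680-719
          np.array([[5, 210], [15, 1300]])]    # e.g. FICO 720+
stat, pval = mantel_haenszel(tables)
print(f"M-H chi-square = {stat:.2f}, p = {pval:.4f}")
```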
Word | Loans with word | P(Delinquency|No word) | P(Delinquency|Word) | p-value |
---|---|---|---|---|
also | 215 | 0.067 | 0.140 | 0.0004 |
need | 608 | 0.062 | 0.105 | 0.0015 |
business | 233 | 0.069 | 0.116 | 0.0038 |
live | 91 | 0.070 | 0.154 | 0.0057 |
already | 64 | 0.071 | 0.156 | 0.0059 |
other | 285 | 0.068 | 0.112 | 0.0081 |
bills | 223 | 0.067 | 0.135 | 0.0082 |
bill | 279 | 0.066 | 0.125 | 0.0117 |
interest | 660 | 0.081 | 0.053 | 0.0136 |
The top word, "also," shows up in descriptions like this one:

I have good credit and am looking to consolidate all my debt into one easy payment. I am looking to get married soon so the less multiple bills we have to keep track of the better. I have two credit cards with low balances that I would like to pay off. I have a furniture debt that I would also like to consolidate and I need to overhaul the commuter vehicle my fiance will begin driving. I have no recorded late or delinquent payments on my credit. I have worked for my current employer for 5 1/2 yrs and have good standing. I am excited to join hands in marriage with my lovely fiance and the remainder balance after consolidation will be used for marraige documentation purposes. I appreciate your consideration. Thank you.

As for the other words, "need" implies that the borrower is in straits of some kind, while "live," "bill" and "bills" suggest that the money will be used for day-to-day expenses rather than a targeted goal, implying a systemic negative cash flow. "Already" suggests an existing outstanding loan. All but one word on the list ("interest") is associated with higher delinquency risk. "Business" is somewhat surprising -- apparently people who want money to start businesses are greater risks. Here's an example:
i am trying to buy a residential Land in emerging and booming market like new delhi where building cost is very cheap and return of investment is 150% in just six months. I intend to purchase the land build the house with my friends help who is in building house business and make a six flats/3 floor house. and sale it each one of them under USD 12, 000.00.

I'm stunned something like this got funded! All in all, such keywords look like a good building block for enhancing a credit score model that goes beyond FICO scores. In a saner credit market, a viable strategy would be to fund P2P loans judged by an enhanced model to minimize default risk. Right now, however, I'd be worried that the credit crisis could wipe out all these sites at the drop of a hat.
Posted by Kevin Bartz at 2:16 PM
29 October 2008
Amid the name-calling, insinuation and jingoism of this political season it is easy to get a bit depressed about the democratic process. Joe Bafumi and Michael Herron have an interesting working paper that is cause for some comfort. The paper, entitled "Preference Aggregation, Representation, and Elected American Political Institutions," assesses the extent to which our federal political institutions are representative, in the sense that elected officials have similar views to those of their constituents. They do this by lining up survey questions from the Cooperative Congressional Elections Study (recently discussed in our weekly seminar by Steve Ansolabehere) alongside similar roll call votes recorded for members of Congress, as well as President Bush's positions on a number of pieces of legislation. There are enough survey questions to be able to place the survey respondents on an ideological scale (using Bayesian ideal point estimation), enough pieces of legislation to place the members of Congress and the President on an ideological scale, and enough survey questions that mirrored actual roll call votes to bring everyone together on a unified scale.
Overall, the authors find that the system is pretty effective at aggregating and representing voters' preferences. Members of Congress are more extreme than the constituencies they represent (perhaps because they represent partisans in their own districts), but the median member of a state's delegation is usually pretty close to the median voter in that state. Since the voters were surveyed in 2006, the paper is able to look at how the election affected the ideological proximity of government to the voters, and as one would hope Bafumi and Herron find that government moved somewhat closer to the voters as a result of the legislative reshuffling.
Below is one of the interesting figures from the paper. The grey line shows the density of estimated ideal points among the voters (i.e., CCES survey respondents); the green and purple solid lines are the density of estimated ideal points among members of the current House and Senate. The arrows show the location of the median member of the current and previous House and Senate, the median American at the time of the 2006 election (based on the survey responses), and President Bush. As you can see, before the 2006 election the House and Senate were both to the right of the median American (as was President Bush); after the Democratic sweep Congress has moved closer to the median American. Members of Congress are more partisan than the voters throughout, although this seems to be more the case on the right than the left.
Posted by Andy Eggers at 9:45 AM
26 October 2008
Please join us this Wednesday, October 29th, when Michael Kellerman, PhD Candidate in the Department of Government, will present his work on "Electoral Punishment as Signaling in Subnational Elections". Mike provided the following abstract:
It is a well-established empirical regularity that parties in federal office suffer setbacks in state-level elections. Many authors attribute this to a desire on the part of voters to balance the policy preferences of the federal incumbent. In this paper, I consider an alternative explanation with a long tradition in the literature: voters punish the party of the federal incumbent in state elections in order to send a signal to the federal government. I construct a simple signaling model to formalize this intuition, which predicts that under most circumstances signaling can occur at only one level of government. I estimate a statistical model allowing for electoral punishment using data from German elections and find support for punishment at the state level, rather than the punishment at both levels implied by balancing theories.
Mike also provided a copy of his paper, available here.
The applied statistics workshop meets each Wednesday in room K-354, CGIS-Knafel, 1737 Cambridge St, Cambridge, MA. The workshop convenes at 12 noon with a light lunch; presentations usually begin around 12:15 and conclude by 1:30 pm. As always, everyone is welcome!
Posted by Justin Grimmer at 10:03 PM
25 October 2008
There is an interesting paper by Guillermina Jasso and Samuel Kotz in Sociological Methods & Research in which they analyzed the mathematical connections between two kinds of inequality: inequality between persons and inequality between subgroups. They showed that a general inequality parameter (the shape parameter c of a two-parameter continuous univariate distribution), or a deep structure of inequality, governs both types of inequality. More concretely, they demonstrated that common measures of personal inequality, such as the Gini coefficient, Atkinson's measure, Theil's MLD and Pearson's coefficient of variation, as well as measures of inequality between subgroups, are nothing but functions of this general inequality parameter c. The parameter c, according to the authors, also governs the shape of the Lorenz curve, a conventional graphical tool for expressing inequality.
Given the unitary operation of this inequality parameter, the authors concluded that there is a monotonic connection between personal inequality and between-group inequality: as personal inequality increases, so does between-group inequality. This conclusion is somewhat surprising, even contrary to the intuition that personal inequality can plausibly change through within-group transfers while between-group inequality stays the same. The authors admitted that their conclusion holds only under a certain set of conditions; for example, the derived relation between the two types of inequality assumes a two-parameter distribution and non-intersecting Lorenz curves. You may consult the full article for more technical details if interested.
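As a small numerical illustration of the idea that standard personal-inequality measures are functions of a single shape parameter, here is a sketch (mine, not the authors') comparing the empirical Gini coefficient of simulated Pareto samples with the closed-form value 1/(2c - 1), which depends on the shape parameter c alone.

```python
import numpy as np

def gini(x):
    """Empirical Gini coefficient of a sample (sorted-cumulative-sum formula)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    cum = np.cumsum(x)
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

rng = np.random.default_rng(0)
for c in [1.5, 2.0, 3.0, 5.0]:
    # Classical Pareto (Type I) sample with shape c and scale 1
    sample = 1 + rng.pareto(c, size=200_000)
    print(f"c = {c}: empirical Gini = {gini(sample):.3f}, "
          f"theoretical 1/(2c-1) = {1 / (2 * c - 1):.3f}")
```

As the shape parameter c rises, both the simulated and the closed-form Gini fall together, which is the flavor of the "general inequality parameter" result.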
Source:
Jasso, Guillermina and Samuel Kotz. 2008. "Two Types of Inequality: Inequality Between Persons and Inequality Between Subgroups." Sociological Methods & Research 37: 31-74.
A working paper version is available from IDEAS here.
Posted by Weihua An at 2:40 PM
23 October 2008
Students here are often interested in how to efficiently collect information from the web. Here's a basic tool: iMacros is a plugin for the Firefox browser that lets you create macros to automate tasks or collect information. It exploits the fact that elements in HTML pages can be identified and hence targeted: for example, a form field will have an ID that iMacros can find and fill with a value of your choice, or it can click a specified button for you. Two nice features are that you can record your own macros without scripting, and that you can use the plugin to collect text information off the web. The capabilities are not what you would get from a customized Python script, but it's easy to use and edit, and gets the basics done.
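For comparison, the "customized Python script" route boils down to the same idea of targeting elements by their IDs. A minimal sketch, with a placeholder URL and a made-up element ID purely for illustration:

```python
import re
import urllib.request

URL = "http://example.com/"   # hypothetical target page
ELEMENT_ID = "price"          # hypothetical element id to scrape

html = urllib.request.urlopen(URL).read().decode("utf-8", errors="replace")

# Grab the text content of the element with the targeted id --
# the same ID-based targeting that iMacros relies on.
match = re.search(r'id="%s"[^>]*>(.*?)<' % re.escape(ELEMENT_ID), html, flags=re.S)
print(match.group(1).strip() if match else "Element not found")
```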
(The basic plugin is free but they also sell other editions with more capabilities.)
Posted by Sebastian Bauhoff at 3:22 PM
22 October 2008
In reading Bill Easterly's working paper "Can the West Save Africa?," I came across an interesting metric Easterly uses to compare African nations with the rest of the world on a set of development indicators. The metric is, "Given that there are K African nations, what percent of the K lowest scoring countries were African?" I don't think I've ever seen anyone use that particular metric, but maybe someone has. Does it have a name? Does it deserve one?
Generally, looking at the percent of units below (or above) a certain percentile that have some feature is a way of describing the composition of that tail of the distribution. What's interesting about using a cutoff corresponding to the total number of units with that feature is that it produces an intuitive measure of overlap of two distributions: it gives us a rough sense of how many countries would have to switch places before all the worst countries were African or, put differently, before all of the African countries are in the worst group. It reminds me a bit of measures of misclassification in machine learning, where here the default classification is, "All the worst countries are African."
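To make the metric concrete, here is a minimal sketch of how one might compute it for a single indicator; the country list and scores are made up, and lower scores are assumed to be worse.

```python
def bottom_k_share(scores, is_african):
    """Easterly-style overlap metric: among the K lowest-scoring countries,
    where K is the number of African countries, what share are African?"""
    k = sum(is_african)
    # Sort country indices by score, worst (lowest) first, and take the bottom K.
    bottom_k = sorted(range(len(scores)), key=lambda i: scores[i])[:k]
    return sum(is_african[i] for i in bottom_k) / k

# Hypothetical data: life expectancy for six countries, three of them African.
scores     = [46.0, 72.5, 51.2, 79.1, 49.8, 68.3]
is_african = [True, False, True, False, True, False]
print(f"{bottom_k_share(scores, is_african):.0%} of the bottom K are African")
```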
Needless to say, the numbers were bleak -- 88% for life expectancy, 84% for percent of population with HIV, 75% for infant mortality.
Posted by Andy Eggers at 11:02 PM
21 October 2008
Dan Hopkins, an IQSS post-doctoral fellow, is getting a lot of press lately for his paper on the vanishing Bradley effect (aka the Wilder effect), whereby pre-election polls overstate support for black candidates relative to their actual vote share. His results indicate that this effect has vanished, and he predicts it will play little or no role in the upcoming U.S. Presidential election. If you missed all the articles in the mainstream media, see this Science Magazine article. But more interesting is his paper on the subject, "No Wilder Effect, Never a Whitman Effect: When and Why Polls Mislead about Black and Female Candidates", which is easily the most extensive and definitive study of its kind; you can find a copy here.
Posted by Gary King at 10:14 AM
20 October 2008
Please note, there has been a scheduling change. Kosuke Imai, Department of Politics, Princeton University, will be presenting on November 12th.
In Kosuke's place, this Wednesday, October 22nd, Don Rubin, Professor of Statistics, Harvard University, will present his paper, "For Objective Causal Inference, Design Trumps Analysis". Don provided the following abstract:
For obtaining causal inferences that are objective, and therefore have the best chance of revealing scientific truths, carefully designed and executed randomized experiments are generally considered to be the gold standard. Observational studies, in contrast, are generally fraught with problems that compromise any claim for objectivity of the resulting causal inferences. The thesis here is that observational studies have to be carefully designed to approximate randomized experiments, in particular, without examining any final outcome data. Often a candidate data set will have to be rejected as inadequate because of lack of data on key covariates, or because of lack of overlap in the distributions of key covariates between treatment and control groups, often revealed by careful propensity score analyses. Sometimes the template for the approximating randomized experiment will have to be altered, and the use of principal stratification can be helpful in doing this. These issues are discussed and illustrated using the framework of potential outcomes to define causal effects, which greatly clarifies critical issues.
Don has provided the full paper, available here.
The applied statistics workshop meets at 12 noon in Room K-354, CGIS Knafel (1737 Cambridge St), with a light lunch. Our presentations begin at 12:15 and usually conclude around 1:30 pm. As always, everyone is welcome!
Posted by Justin Grimmer at 3:07 PM
15 October 2008
While everyone is thinking about how the U.S. presidential election will turn out, I thought some of you might also be interested in a forthcoming Journal of Economic History article on a venerable electoral question -- why a democratic electorate in Germany chose a party which then ended their democracy. The article is "Ordinary Economic Voting Behavior in the Extraordinary Election of Adolf Hitler," by me, Ori Rosen, Martin Tanner, and Alex Wagner. There's also a good SwissInfo news story about our article.
Here's the abstract: The enormous Nazi voting literature rarely builds on modern statistical or economic research. By adding these approaches, we find that the most widely accepted existing theories of this era cannot distinguish the Weimar elections from almost any others in any country. Via a retrospective voting account, we show that voters most hurt by the depression, and most likely to oppose the government, fall into separate groups with divergent interests. This explains why some turned to the Nazis and others turned away. The consequences of Hitler's election were extraordinary, but the voting behavior that led to it was not.
Posted by Gary King at 10:57 AM
Like many people I know, I often find it hard to stay on task and avoid the temptations of the internet while I work. Email, blogs, news of financial meltdown -- I find myself turning to these distractions in between spurts of productivity, knowing that I would get more done if I just turned off the wireless and kept on task for longer stretches of time.
Well, those of us who have trouble giving up our blogs and other internet distractions may have an unlikely enabler in Alfred Marshall, the great economist. When he was seventeen, Marshall observed an artist who took a lengthy break after drawing each element of a shop window sign. As he later recounted, the episode shaped his own productivity strategy, towards something that sounds vaguely similar to my own routine:
That set up a train of thought which led me to the resolve never to use my mind when it was not fresh, and to regard the intervals between successive strains as sacred to absolute repose. When I went to Cambridge and became full master of myself, I resolved never to read a mathematical book for more than a quarter of an hour at a time without a break. I had some light literature always by my side, and in the breaks I read through more than once nearly the whole of Shakespeare, Boswell's Life of Johnson, the Agamemnon of Aeschylus (the only Greek play I could read without effort), a great part of Lucretius and so on. Of course I often got excited by my mathematics, and read for half an hour or more without stopping, but that meant that my mind was intense, and no harm was done.
Now, somehow I doubt that Marshall would consider the NYT op-ed pages to be "light literature" on par with Boswell, or that he would agree that watching incendiary political videos at TalkingPointsMemo.com qualifies as "absolute repose." But never mind that. Alfred Marshall told me I shouldn't work for more than fifteen minutes without distractions!
Posted by Andy Eggers at 8:06 AM
14 October 2008
The good folks at CNN are hot on the trail of a swing to McCain in Ohio, a crucial battleground state. CNN's headline claims, "Ohio Poll of Polls: McCain Gains Some Ground in Tight Race". From the story, we learn that:
"CNN's new Ohio poll of polls shows Barack Obama leading McCain by three points, 49 to 46 percent. Five percent of the state's voters were unsure about their presidential pick.
The network's last Ohio poll of polls, released October 9, showed Obama leading McCain by four points, 50 to 46 percent. In the September 21 poll of polls, Obama led McCain by a single point, 47 to 46 percent."
This is the smallest possible shift that a network would be willing to report: a one-percentage-point decrease in support for Obama and no change in support for McCain. The survey design and the poll of polls would have to be incredibly powerful to detect this subtle shift in the electorate's preferences.
With my interest piqued, I read further. As it turns out, CNN's analysis of the poll of polls is based on some claims that are suspect:
"The Ohio general election "poll of polls" consists of four surveys: Ohio Newspaper Poll/University of Cincinnati (October 4-8), ARG (October 4-7), CNN/Time/ORC (October 3-6) and ABC/Washington Post (October 3-5). The poll of polls does not have a sampling error."
What? No sampling error?
If CNN thinks that averaging four polls removes all variability, then I have a bridge in Alaska up for sale (and I'll throw in some oceanfront property in Arizona, which also seems appropriate).
It is more likely that the author meant that the margin of error would be hard to calculate. This is not equivalent to the margin of error not existing at all. For example, it is hard to calculate when the Cubs are going to win another World Series. But I pray that this does not mean that the date is undefined (which seems infinitely worse than never).
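For a rough sense of scale, here is a back-of-the-envelope sketch of the sampling error a poll of polls would still carry. The individual poll sample sizes are assumptions (roughly 700 likely voters each), and the polls are treated as independent simple random samples, which flatters the precision.

```python
from math import sqrt

n_per_poll = 700   # assumed sample size of each individual poll
n_polls = 4        # CNN's Ohio poll of polls averages four surveys
p = 0.5            # worst-case proportion for the margin of error

# 95% margin of error for a single poll, in percentage points
moe_single = 1.96 * sqrt(p * (1 - p) / n_per_poll) * 100

# Averaging k independent polls of equal size shrinks the standard
# error by a factor of sqrt(k); it does not eliminate it.
moe_average = moe_single / sqrt(n_polls)

print(f"single poll: +/- {moe_single:.1f} points")
print(f"average of {n_polls} polls: +/- {moe_average:.1f} points")
```

Even under these friendly assumptions, the averaged margin is still around plus or minus two points, so a one-point move in Obama's number is comfortably within noise.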
Of course, news networks want to justify covering politics as a horse race and want to ignore the warnings that small changes in polls are not real, even when you average over four surveys. But this seems like a particularly egregious abuse of polling numbers to make a race seem more fluid than reality (or reasonable statistics) seems to permit.
Posted by Justin Grimmer at 10:45 AM
13 October 2008
Dear Applied Statistics Community,
Please join us this Wednesday (October 15th) when Stephen Ansolabehere, Professor in Harvard's Department of Government, will present his work on "Vote Validation in the 2006 CCES". Stephen provided the following abstract:

New technology and recent political reform have made vote validation an easier and more reliable process than it has been in the past. We present a basic summary of the vote validation procedure used in the 2006 CCES, a Web-based survey of nearly 35,000 Americans that has been validated electronically with new state-wide voter files. As the validation method in the CCES is quite different from the method used by the National Election Studies (NES) in the 1960s through 1980s, we compare the CCES procedure and results with the most recent midterm elections validated by the NES. We show that while the rate of vote misreporting is substantially higher in the 2006 Web-based survey, the pattern of misreporting is consistent with the NES samples. We also show how the large sample size in the CCES can be exploited to study phenomena beyond vote misreporting using the validated records.
A paper is available for download here.
The applied statistics workshop meets at 12 noon in Room K-354, CGIS Knafel (1737 Cambridge St), with a light lunch. Our presentations begin at 12:15 and usually conclude around 1:30 pm. As always, everyone is welcome!
Cheers
Justin Grimmer
Posted by Justin Grimmer at 2:05 PM
10 October 2008
Hiding in the ivory tower, I did not feel any impact of the financial collapse until several friends in Los Angeles told me that they had been laid off and were looking for new jobs. In contrast, another friend who owns a real estate appraisal firm said that his business has actually gotten better, because more people are coming to re-evaluate their housing values. I am not sure there is a direct causal effect of the financial collapse on my friends' unemployment. But a more interesting question, which arises very often these days, is what caused the financial collapse, or rather, who should be held responsible for it. Mortgage lenders, greedy Wall Street investment banks, the government with its loose regulation, home buyers, or someone else? As quantitative social scientists, are we able to answer this kind of question, and how well? Before building any models, formal or empirical, let's see what Anderson Cooper has for us.
Over the last couple of weeks we've heard politicians tell us that now is not the time to point fingers and blame people for the financial crisis. I remember them saying that in the days after Hurricane Katrina as well. The truth is that's what politicians always say. They mean that now is the time to fix the problem, but once the world's attention moves on, the time to hold people accountable never seems to arrive. Politicians point fingers at members of the opposite party, but no one ever seems to take real responsibility.
So who is to blame for this financial fiasco? That's the question we've begun investigating. We've put together a list of the Ten Most Wanted: Culprits of the Collapse. This week and next week, every night, we will be adding a name to the list and telling you what they have done, and how much it's costing you. It's a rogues gallery of Wall Street executives, politicians, and government officials who did not do their jobs. It's time you know their names, their faces, it's time they be asked to account for their actions. (Excerpt from AC360)
Think about the models while enjoying the videos!
http://ac360.blogs.cnn.com/category/culprits-of-the-collapse/
Posted by Weihua An at 6:56 PM
8 October 2008
Jim Snyder and David Stromberg have produced a very interesting working paper called "Press Coverage and Political Accountability." It's a big paper and I haven't processed the whole thing, but I think it is an important and clever paper that speaks to big issues about the media and democratic accountability.
The goal of the paper is to trace the cycle of political accountability: politicians go about their jobs, the media reports on the politicians, voters consume the news and become informed about the politicians, and politicians shape their behavior to respond to or anticipate pressure from voters. It is a difficult thing to measure any of the effects implied by this cycle (e.g. how much do politicians respond to voter pressure? how much does media coverage respond to actual politician behavior? how much do voters learn from the news?) for the usual endogeneity reasons endemic in social science. It usually takes a very careful research design to say something convincing about any part of this cycle. Here, the cleverness comes in the observation that the amount of news coverage devoted to a member of Congress depends to some extent on the congruence between congressional district boundaries and media market boundaries. This congruence is high if most people in a congressional district read newspaper X, and most of paper X's readers are in that congressional district. It can be low in bigger cities, particularly cities located on state boundaries, and in areas with a lot of gerrymandering.
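As a sketch of what such a congruence measure might look like (one plausible operationalization for illustration, not necessarily the authors' exact definition), for each district one can sum, over newspapers, the paper's share of the district's readership times the share of the paper's readers who live in the district:

```python
from collections import defaultdict

# Hypothetical readership counts: readers[(newspaper, district)] = number of readers.
readers = {
    ("Gazette", "D1"): 90_000, ("Gazette", "D2"): 10_000,
    ("Tribune", "D1"): 5_000,  ("Tribune", "D2"): 60_000, ("Tribune", "D3"): 55_000,
}

readers_in_district = defaultdict(int)   # total newspaper readers living in each district
readers_of_paper = defaultdict(int)      # total readers of each paper, across districts
for (paper, district), n in readers.items():
    readers_in_district[district] += n
    readers_of_paper[paper] += n

def congruence(district):
    """Sum over papers of (paper's share of the district's readers) x
    (share of the paper's readers who live in the district)."""
    total = 0.0
    for (paper, d), n in readers.items():
        if d == district:
            market_share = n / readers_in_district[district]
            reader_share = n / readers_of_paper[paper]
            total += market_share * reader_share
    return total

for d in sorted(readers_in_district):
    print(f"{d}: congruence = {congruence(d):.2f}")
```

In this toy example, D1 (dominated by a paper whose readers mostly live there) scores high, while districts that split a big regional paper score much lower, which is the intuition behind using congruence as a source of variation in coverage.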
The innovation of the paper is to use the degree of fit between congressional districts and media markets as an exogenous source of variation in how much political news voters are exposed to. The authors look to see whether their measure of congruence is correlated with how much media coverage is devoted to the member of Congress, how much voters know about their member of Congress, and how energetic and effective members of Congress appear to be in carrying out their jobs. The correlations are surprisingly strong at each point in the cycle.
I kept expecting to see an instrumental variables regression, where congruence would serve as an instrument for, e.g., voter information in its effect on member discipline. Instead they kept providing the reduced form regression for everything, which is fine. In a sense there are more IV regressions here than you could figure out what to do with, since congruence could be thought of as an instrument in estimating any subsequent effect.
Here's the part of their abstract where they describe their findings:
Exploring the links in the causal chain of media effects -- voter information, politicians' actions and policy -- we find statistically significant and substantively important effects. Voters living in areas with less coverage of their U.S. House representative are less likely to recall their representative's name, and less able to describe and rate them. Congressmen who are less covered by the local press work less for their constituencies: they are less likely to stand witness before congressional hearings, to serve on constituency-oriented committees (perhaps), and to vote against the party line. Finally, this congressional behavior affects policy. Federal spending is lower in areas where there is less press coverage of the local members of congress.
Posted by Andy Eggers at 10:58 PM
7 October 2008
With many of my friends preparing for the annual job market song and dance, one question they will soon face is what salary expectations are appropriate for a given position and institution.
It seems hard to know. Fortunately (and somewhat incredibly), the Department of Labor Foreign Labor Certification Data Center not only collects employer petitions for H-1B visas for foreign professionals, but also posts them online. The data go back to 2001; information for other visa types is sometimes available for earlier years. Overall this seems like a great source for studies of labor economics or the effects of visa restrictions. (Let us know if you use it!)
But the data is also good for a quick reality check on salary expectations. You can search by institution on the DOL website or type in a keyword in this search engine.
For example, looking for "assistant professor economics harvard" will reveal two visa petitions from the university, with a proposed salary of $115,000 in 2005. Stanford proposed to pay $120,000 in early 2006. The data is not just limited to academic jobs of course. You can also see that Morgan Stanley proposed to pay $85,000 for an analyst in New York in 2006. Or that a taxi company in Maryland proposed $11.41 per hour.
Naturally the data is limited since it only covers a specific group of job applicants. Maybe they'll take a lower salary in exchange for help with the visa, or they get paid more to leave their home countries. But the relative scales across institutions could be similar and it's better than no idea at all. Good luck on your job hunts and negotiations!
Posted by Sebastian Bauhoff at 2:40 PM
5 October 2008
Please join us this Wednesday, October 8th when Stefano Iacus, Department of Economics, Business and Statistics, University of Milan (yes, in Italy) will be presenting his work on Stochastic differential equations and applied statistics. Stefano provided the following abstract:
Stochastic differential equations (SDEs) arise naturally in many fields of science. Solutions of SDEs are continuous time processes and are usually proposed as alternative models to standard time series. While continuous time modeling seems better in describing the natural evolving nature of the underlying data generating process, observations always come in discrete form. This discrepancy raised new statistical challenges (e.g., the discrete time likelihood is not always available).
In the first part of the talk, we present a few examples (from biostatistics, econometrics, political analysis, etc.) in which SDEs naturally emerge. Then, we present the general statistical issues peculiar to these models, and finally we present some new applications (with solutions) such as change point analysis, hypothesis testing and cluster analysis for discretely observed stochastic differential equations.
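For readers new to the topic, here is a minimal sketch (mine, not from the talk) of the basic setup the abstract describes: a continuous-time process, here an Ornstein-Uhlenbeck SDE simulated with the Euler-Maruyama scheme, that the statistician only gets to observe at discrete times.

```python
import numpy as np

rng = np.random.default_rng(42)

# Ornstein-Uhlenbeck process: dX_t = theta * (mu - X_t) dt + sigma dW_t
theta, mu, sigma = 1.0, 0.0, 0.5
dt, n_steps = 0.001, 10_000          # fine simulation grid

x = np.empty(n_steps + 1)
x[0] = 1.0
for t in range(n_steps):
    dw = rng.normal(0.0, np.sqrt(dt))                 # Brownian increment
    x[t + 1] = x[t] + theta * (mu - x[t]) * dt + sigma * dw

# The statistician only sees the process at discrete, coarser times:
obs = x[::100]                                        # every 0.1 time units
print(f"simulated {n_steps} fine steps, observed {len(obs)} points")
print("first few observations:", np.round(obs[:5], 3))
```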
Stefano suggested that the following papers might offer helpful background information for his presentation.
De Gregorio, A., Iacus, S.M. (2008) Clustering of discretely observed diffusion processes
De Gregorio, A., Iacus, S.M. (2008) Divergences Test Statistics for Discretely Observed Diffusion Processes
The workshop will begin at 12 noon in room K-354, 1737 Cambridge St (CGIS-Knafel), with a light lunch; the presentation will commence around 12:15. The workshop usually adjourns around 1:30 pm. All are welcome!
Posted by Justin Grimmer at 3:36 PM
4 October 2008
A guest post from Marc Alexander of Harvard's Gov department, who blogs at Politics and Health:
Politics kills! A new study on traffic fatalities on election day...
A brilliant research report published in the Oct 2 issue of JAMA found that driving fatalities increase significantly on election day in the US. Donald Redelmeier of the University of Toronto and Robert Tibshirani of Stanford found that the hazard of being hurt or dying in a traffic accident rises on the day of the Presidential election. While the effect seems to be bipartisan (or non-partisan?), the risk is higher for men, for those in the Northeast, and for those who vote early in the day. To my knowledge, this is the best systematic evidence of the dark side of political participation in the US; despite all the benefits and necessity of active participation to keep democracy alive, there also seem to be significant costs. Remember to vote, but be careful when driving or crossing the street this election season! The article was covered by Reuters and the New York Times here.
The original research report is available from JAMA here and is titled "Driving Fatalities on US Presidential Election Days." Here is the free excerpt from JAMA:
The results of US presidential elections have large effects on public health by their influence on health policy, the economy, and diverse political decisions. We are unaware of studies testing whether the US presidential electoral process itself has a direct effect on public health. We hypothesized that mobilizing approximately 50% to 55% of the population, along with US reliance on motor vehicle travel, might result in an increased number of fatal motor vehicle crashes during US presidential elections.
Posted by Andy Eggers at 12:35 PM
3 October 2008
This post looks at the linguistics of last night's Biden-Palin debate. Palin used the word "reform" 12 times compared to Biden's none. Biden used "middle class" 12 times to Palin's one.
Here's a sequel to my earlier Obama-Clinton post.
Overall, Biden uttered 7,065 words and Palin 7,646, with a total of 2,117 unique words between them. Which words did Biden use significantly more or less than Palin? For each word, we apply a chi-squared test of the hypothesis that the candidates spoke the word with equal probability, then sort the list by p-value to highlight the biggest differences. I've eliminated words that appear over 50 times (mostly stop words like "the," which Palin evidently used a couple hundred times more than Biden).
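A minimal sketch of that per-word test (not the author's code; it assumes the word counts have already been tabulated and uses SciPy's chi-squared test on a 2x2 table):

```python
from scipy.stats import chi2_contingency

TOTAL_BIDEN, TOTAL_PALIN = 7065, 7646   # total words spoken, from the transcript

def word_pvalue(biden_count, palin_count):
    """Chi-squared test that Biden and Palin used the word at the same rate."""
    table = [[biden_count, TOTAL_BIDEN - biden_count],
             [palin_count, TOTAL_PALIN - palin_count]]
    _, pval, _, _ = chi2_contingency(table)
    return pval

# A few rows from the tables below, as a sanity check
for word, b, p in [("also", 3, 47), ("reform", 0, 12), ("middle class", 12, 1)]:
    print(f"{word!r}: p = {word_pvalue(b, p):.4f}")
```

With the default continuity correction, these come out close to the p-values reported in the tables below.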
Word | Biden | Palin | pval |
---|---|---|---|
also | 3 | 47 | 0.0000 |
their | 32 | 4 | 0.0000 |
number | 15 | 0 | 0.0002 |
want | 0 | 16 | 0.0003 |
united | 16 | 1 | 0.0004 |
policy | 22 | 4 | 0.0004 |
just | 6 | 28 | 0.0007 |
those | 10 | 34 | 0.0013 |
too | 0 | 13 | 0.0014 |
they | 41 | 18 | 0.0015 |
well | 24 | 7 | 0.0019 |
these | 1 | 15 | 0.0020 |
said | 40 | 18 | 0.0022 |
reform | 0 | 12 | 0.0023 |
who | 11 | 34 | 0.0025 |
even | 3 | 19 | 0.0025 |
down | 16 | 3 | 0.0034 |
gwen | 16 | 3 | 0.0034 |
Observations:
We can also look at bigrams, pairs of words, in a similar way.
Bigram | Biden | Palin | pval |
---|---|---|---|
the united | 16 | 1 | 0.0004 |
united states | 16 | 1 | 0.0004 |
we have | 9 | 34 | 0.0007 |
want to | 0 | 14 | 0.0009 |
he said | 11 | 0 | 0.0016 |
have got | 0 | 12 | 0.0023 |
and i | 6 | 25 | 0.0025 |
that is | 4 | 21 | 0.0026 |
and that's | 1 | 14 | 0.0032 |
middle class | 12 | 1 | 0.0035 |
Posted by Kevin Bartz at 11:38 AM
I recently came across a new paper by David Card, Alexandre Mas, and Jesse Rothstein entitled "Tipping and the Dynamics of Segregation." What's interesting from a methodological standpoint is that the authors use what may be called "inverted" regression discontinuity methods to test for race-based tipping in neighborhoods in American cities.
In a classic regression discontinuity design researchers commonly exploit the fact that treatment assignment changes discontinuously as a function of one or more underlying variables. For example scholarships may be assigned based on whether students exceed a test score threshold (like in the classic paper by Thistlethwaite and Campbell (1960)). Unlucky students who just miss the threshold are assumed to be virtually identical to lucky ones who score just above the cutoff value so that the threshold offers a clean identification of the counterfactual of interest (assuming no sorting).
In the Card et al. paper, the situation is slightly different because the authors have no hard-and-fast decision rule, but rather a theory that posits that whites' willingness to pay for homes depends on the neighborhood minority share and exhibits tipping behavior: if the minority share exceeds a critical threshold, all the white households will leave. Since the location of the (city-specific) tipping point is unknown, the authors estimate it from the data and find that there are indeed significant discontinuities in the white population growth rate at the identified tipping points. Once the tipping point is located, they go on to examine whether rents or housing prices exhibit non-linearity around the tipping point, but find no effects. They also try to explain the location of the tipping points by looking at survey data on racial attitudes of whites; cities with more tolerant whites appear to have higher tipping points.
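A minimal sketch of the flavor of that exercise (not the authors' procedure): scan candidate tipping points and pick the one at which adding a jump in white population growth best fits the data. The tract-level data here are simulated with a built-in tip at a 12 percent minority share.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated tracts: base-year minority share and subsequent white growth,
# with a discrete drop in growth once the minority share passes 12%.
minority_share = rng.uniform(0, 0.5, size=2_000)
white_growth = 0.05 - 0.10 * (minority_share > 0.12) + rng.normal(0, 0.05, 2_000)

def rss_with_jump(threshold):
    """Residual sum of squares from regressing growth on a constant,
    the minority share, and a jump indicator at the candidate threshold."""
    X = np.column_stack([np.ones_like(minority_share),
                         minority_share,
                         (minority_share > threshold).astype(float)])
    beta, *_ = np.linalg.lstsq(X, white_growth, rcond=None)
    resid = white_growth - X @ beta
    return float(resid @ resid)

candidates = np.arange(0.02, 0.40, 0.005)
best = min(candidates, key=rss_with_jump)
print(f"estimated tipping point: {best:.3f}")
```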
I think this is a very creative paper. The general approach could be useful in other contexts so take a look!
Posted by Jens Hainmueller at 8:10 AM
1 October 2008
I first saw IQSS's own Dan Hopkins' paper on the Wilder effect this summer at the PolMeth conference. Jens and I agreed that, of all the research that was presented at the conference, this was probably the thing that would have been most interesting to journalists. It directly addresses the speculation that, because survey respondents are afraid to appear racist, polls overstate Barack Obama's level of support. Here's the abstract:
The 2008 election has renewed interest in the Wilder effect, the gap between the share of survey respondents expressing support for a candidate and the candidate's vote share. Using new data from 133 gubernatorial and Senate elections from 1989 to 2006, this paper presents the first large-sample test of the Wilder effect. It demonstrates a significant Wilder effect only through the early 1990s, when Wilder himself was Governor of Virginia. Although the same mechanisms could affect female candidates, this paper finds no such effect at any point in time. It also shows how polls' over-estimation of front-runners' support can exaggerate estimates of the Wilder effect. Together, these results accord with theories emphasizing how short-term changes in the political context influence the role of race in statewide elections. The Wilder effect is the product of racial attitudes in specific political contexts, not a more general response to under-represented groups.
In the last couple of weeks, I have twice been in a situation where someone brings up the idea that Obama will do worse than the polls suggest because of the "Wilder effect." It's nice to have some research at hand to speak to this.
Googling around I notice that Dan's paper has been covered by a ton of blogs, as well as the Washington Post and some other papers. Nice work, Dan.
Posted by Andy Eggers at 5:41 PM