Predicting Pennsylvania, Updated 

  Update: Check out how my predictions fared! Two comparisons are given: one showing both maps in the same image and one as an animated GIF (kudos to the animation package in R). 

   


     

 Overall, my predictions did pretty well. Their correlation with the true vote shares was 0.89, which gives an R^2 of 0.79, just below the in-sample R^2 of 0.82. My biggest miss was Centre County, where I predicted that Clinton would edge out Obama. Instead, Obama won pretty convincingly, with over 60% of the vote. I also overestimated Obama's support in some of the counties surrounding Philadelphia. I'm not sure what I can do to improve the model next time. If you have any ideas, leave a comment. 
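 For anyone who wants to check the arithmetic, here is a minimal Python sketch of the two usual ways to score out-of-sample predictions. The function names and the three-county example values are hypothetical, made up for illustration; they are not the actual Pennsylvania numbers. Squaring the correlation (what I reported above) measures how well the predictions track the truth up to a linear rescaling, while 1 - SSE/SST also penalizes predictions that are systematically too high or too low, so the two can disagree.

```python
def squared_correlation(predicted, actual):
    """Squared Pearson correlation between predictions and true values."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov * cov / (var_p * var_a)


def direct_r2(predicted, actual):
    """1 - SSE/SST, scoring the predictions as-is (no rescaling)."""
    mean_a = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for p, a in zip(predicted, actual))
    sst = sum((a - mean_a) ** 2 for a in actual)
    return 1 - sse / sst


# The reported correlation of 0.89 squares to about 0.79.
reported_r2 = round(0.89 ** 2, 2)

# Hypothetical vote shares for three made-up counties (NOT real data).
example_predicted = [0.40, 0.50, 0.60]
example_actual = [0.45, 0.50, 0.70]
print(reported_r2)
print(squared_correlation(example_predicted, example_actual))
print(direct_r2(example_predicted, example_actual))
```

 In the made-up example the squared correlation comes out higher than 1 - SSE/SST, because the predictions run a bit low for the county with the largest true share.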

  Original entry: This isn't my normal blogging day, but I wanted to show my final Pennsylvania prediction map. Later on I will update this post to include the true map in the same color scheme, so we can compare. I have updated the prediction model in response to everyone's suggestions last time. 

     

 The big problems last time were: 

 
 Kerry's vote share was only a loose indicator of Obama's, not enough to base a model upon 
 The model didn't incorporate other obvious factors like population density, nearby colleges, etc. 
 R^2 = 0.16 isn't all that good! 
 

 There were other comments, too, but not all of them could be addressed effectively. (What else can I do besides predict at the county level? That's where we have data!) Still, I'm happy to say that for the latest model I pulled in a lot more covariates from the census: 

 
	 Kerry's 2004 vote share 
	 % Whites 
	 % Blacks 
	 % Hispanics 
	 % males 
	 % young people (age 18 through 21) 
	 % urban population 
	 Population density 
	 Median household income 
 

 With all these, the model fits like a dream come true: R^2 = 0.82 and a residual standard error of 0.04 (i.e., taking two standard errors, predictions should usually land within about ±8 percentage points of Obama's true share). Here are the estimated coefficients (after pruning some variables based on the BIC): 

    Name           Estimate   Std. Error   t value   Pr(>|t|)
    (Intercept)       -1.93         0.35     -5.44       0.00
    kerry             -0.29         0.06     -4.66       0.00
    black              1.00         0.10      9.81       0.00
    hisp               0.74         0.30      2.49       0.01
    male              -1.52         0.33     -4.60       0.00
    young              1.46         0.22      6.59       0.00
    log(income)        0.29         0.03      9.96       0.00
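 To make the table concrete, here is a rough Python sketch of how these coefficients turn into a county-level prediction. It rests on several assumptions that I have not spelled out in this post: that the response is Obama's vote share as a proportion, that the demographic covariates are proportions between 0 and 1, that income is in dollars, and that the log is the natural log. The function names and the example county are hypothetical, and the coefficients are the rounded values from the table, so treat the output as illustrative only.

```python
import math

# Rounded coefficients copied from the table above.
COEFS = {
    "intercept": -1.93,
    "kerry": -0.29,
    "black": 1.00,
    "hisp": 0.74,
    "male": -1.52,
    "young": 1.46,
    "log_income": 0.29,
}
RESIDUAL_SE = 0.04  # residual standard error reported for the model


def predict_obama_share(kerry, black, hisp, male, young, income):
    """Linear prediction of Obama's share.

    ASSUMPTIONS: kerry, black, hisp, male, young are proportions in [0, 1];
    income is median household income in dollars; natural log is used.
    """
    return (
        COEFS["intercept"]
        + COEFS["kerry"] * kerry
        + COEFS["black"] * black
        + COEFS["hisp"] * hisp
        + COEFS["male"] * male
        + COEFS["young"] * young
        + COEFS["log_income"] * math.log(income)
    )


def rough_band(prediction, n_se=2):
    """Prediction plus or minus n_se residual standard errors."""
    return prediction - n_se * RESIDUAL_SE, prediction + n_se * RESIDUAL_SE


# A made-up county (NOT real census data), purely for illustration.
share = predict_obama_share(
    kerry=0.45, black=0.05, hisp=0.02, male=0.49, young=0.06, income=40000
)
low, high = rough_band(share)
print(f"predicted share: {share:.2f} (rough band: {low:.2f} to {high:.2f})")
```

 The band is only a back-of-the-envelope range from the residual standard error; it ignores uncertainty in the coefficients themselves, so a proper prediction interval would be somewhat wider.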

 The coefficients are pretty much as you would expect: counties with more Blacks, more young people and higher incomes lean toward Obama. Poorer counties and counties where Kerry did well tend to go for Clinton. The only somewhat surprising part is the negative coefficient on male population. You would think counties with more females would go for Clinton. There's probably some confounder, because there were several counties in Ohio with 55% male populations that went for Clinton. 

 Anyway, I will update this post tomorrow comparing my predictions to the realized results.