Victorian Literature, Statistics Style 

A recent New York Times article highlights research by Dan Cohen and Fred Gibbs that uses computational power and statistics* to answer questions about what Victorian literature says about the Victorians. I've known that this kind of thing was in the works in various parts of the humanities, but I haven't been keeping up. I think this kind of analysis will make more inroads into the humanities and social sciences in the future (a previous NYT article in the series takes up this issue).

* OK, they are just using word frequencies at the moment, but the data they are collecting in collaboration with Google made me drool at the possibilities for machine learning applications.
 


One quick observation: it was interesting to me that the criticisms of quantitative text analysis in literature are the same as in political science.
(1) It lets people get away with not reading or interpreting the texts. 
(2) It undermines researchers' ability to get nuanced meaning out of texts.
(3) It shapes the kinds of questions researchers ask. 

 My quick thoughts on these: 
(1) People who use statistical text analysis in their work generally have to read a lot of texts. I don't think quantification is a substitute for reading.
(2) Quantitative text analysis does gloss over nuanced meanings, but it can reveal broad trends in a huge body of texts that a close reading of a handful of texts can't.
(3) We tend to entertain only those research questions we think we have the tools to answer, so until recently very few people tried to answer questions where you'd need to read 100,000 documents to get an answer. Putting these types of questions in the realm of possibility is not a bad thing.
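
To make the word-frequency idea concrete, here is a minimal sketch of the kind of counting involved. This is not Cohen and Gibbs's actual pipeline; the tiny corpus of invented titles is purely illustrative, and real work would handle tokenization, stemming, and scale far more carefully.

```python
import re
from collections import Counter

def word_frequencies(texts):
    """Count word occurrences across a corpus of documents.

    A toy version of corpus-wide word-frequency analysis:
    lowercase each document, split on simple word boundaries,
    and tally everything into one Counter.
    """
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

# Hypothetical mini-corpus standing in for thousands of book titles.
corpus = [
    "On the Origin of Species",
    "The Science of Morals and the Morals of Science",
    "Faith and Science in the Modern Age",
]

freqs = word_frequencies(corpus)
total = sum(freqs.values())
# Raw count and relative frequency of one term across the corpus.
print(freqs["science"], round(freqs["science"] / total, 3))
```

Tracking how such relative frequencies shift across publication years is essentially what turns raw counts into claims about, say, rising or falling Victorian interest in a topic.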