Intro to Text Analysis example.
Rich Nielsen
rnielsen@mit.edu
2/27/2014

This is a self-contained example of the process of quantitative text analysis.
I scrape blog posts from the IQSS blog (http://blogs.iq.harvard.edu/sss/),
parse them, and apply unsupervised and supervised text analysis methods.
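
Unsupervised and supervised text methods usually start from the same representation: a document-term matrix with one row per post, one column per word, and a word count in each cell. The project's own analysis is done in R (sss_analysis.R, using the tm package). The Python below is only a minimal, dependency-free sketch of that representation and does not reproduce what the R script does. The helper names `tokenize` and `document_term_matrix` are hypothetical, and the tokenizer is deliberately crude: it lowercases, keeps letters only, and drops an optional stopword list.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters (hypothetical helper)."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs, stopwords=frozenset()):
    """Return (vocab, matrix), where matrix[i][j] counts vocab[j] in docs[i].

    Hypothetical sketch of a bag-of-words matrix; not the project's R code.
    """
    # One Counter of non-stopword tokens per document.
    counts = [Counter(t for t in tokenize(d) if t not in stopwords) for d in docs]
    # The vocabulary is every term seen in any document, in alphabetical order.
    vocab = sorted(set().union(*counts))
    # Counter returns 0 for terms a document does not contain.
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


if __name__ == "__main__":
    docs = ["Causal inference with text.", "Text as data: text analysis."]
    vocab, dtm = document_term_matrix(docs, stopwords={"with", "as"})
    print(vocab)  # ['analysis', 'causal', 'data', 'inference', 'text']
    print(dtm)    # [[0, 1, 0, 1, 1], [1, 0, 1, 0, 2]]
```

Topic models and clustering work on the rows of a matrix like this one. Supervised classifiers use the same rows as features, paired with hand-coded labels.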

Files and directories:

sss_analysis.R: This is the main script. It scrapes the blog posts, then analyzes them.

parseSSShtml.py: This is a Python script that parses the raw HTML of the posts. A note in sss_analysis.R tells you when to run it.

~/sss_html: this directory contains the raw HTML.

~/sss_posts: this directory contains the processed blog posts (one of the two outputs of parseSSShtml.py).

sss_data.csv: this file contains metadata about the blog posts (the other output of parseSSShtml.py).
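
The split between raw HTML, processed post text, and a metadata table can be shown with a hypothetical parsing step. This is not the actual parseSSShtml.py. It assumes that each *.html file in the HTML directory holds one post, that the page <title> is the post title, and that the metadata columns are `file`, `title`, and `n_words`. All of these are illustrative assumptions, as is the function name `parse_directory`. It uses only the Python standard library (html.parser, csv, pathlib).

```python
import csv
from html.parser import HTMLParser
from pathlib import Path


class PostTextParser(HTMLParser):
    """Collect the <title> text and the other visible text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth:
            self.chunks.append(data)


def parse_directory(html_dir, posts_dir, csv_path):
    """Hypothetical sketch, not the project's parseSSShtml.py.

    Writes one .txt file per *.html file into posts_dir and one metadata row
    per post into csv_path. Returns the number of files parsed.
    """
    posts_dir = Path(posts_dir)
    posts_dir.mkdir(parents=True, exist_ok=True)
    rows = []
    for html_file in sorted(Path(html_dir).glob("*.html")):
        parser = PostTextParser()
        parser.feed(html_file.read_text(encoding="utf-8", errors="replace"))
        parser.close()
        # Join the text chunks and collapse all runs of whitespace to one space.
        body = " ".join(" ".join(parser.chunks).split())
        (posts_dir / (html_file.stem + ".txt")).write_text(body, encoding="utf-8")
        rows.append({"file": html_file.name,
                     "title": parser.title.strip(),
                     "n_words": len(body.split())})
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["file", "title", "n_words"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, `parse_directory("sss_html", "sss_posts", "sss_data.csv")` (paths adjusted to wherever the directories live) would produce outputs shaped like the ones listed above. A real blog page would also need its navigation and sidebar text filtered out. This sketch does not attempt that.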



UPDATE: 4/20/2015. The IQSS blog has been taken down, so the web-scraping step no longer works. Also, some options in the tm package have changed; I have updated the code accordingly.

