Tuesday, November 18, 2008

Releasing datasets and preserving privacy

This is an old but thought-provoking New York Times story describing how an AOL database of search queries, from which the searchers' names had been removed, could still be used to track down a single individual by analyzing that person's search terms. It points to the difficulty of anonymizing a database that may contain complex sources of information that hint at location.




Tracking flu trends using massive numbers of searches on Google

Google is using keyword searches from its home page to track flu trends across the country. http://www.google.org/flutrends/


Particularly interesting is the graph on this page (http://www.google.org/about/flutrends/how.html) showing predictions from search terms versus data from the CDC.


This is a clear example of how large amounts of data allow new types of data mining. It has also been flagged by some privacy groups (see http://bits.blogs.nytimes.com/2008/11/13/does-google-flu-trends-raises-new-privacy-risks/).


Saturday, November 1, 2008

Using mobile phones to report voting irregularities

This election, phones will be used to report voting irregularities. NPR, for example, is participating in Twitter VoteReport, a nationwide initiative to get voters to report problems using Twitter, text messaging, or just normal phone lines.


This is another example of a large, real-time feedback project that, just a few years ago, would have required a much larger amount of infrastructure to develop. (The benefit of Twitter or text messages is that they can be processed automatically more easily than voicemail messages.)
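
To make the point about automatic processing concrete, here is a minimal sketch in Python of how a short text report might be parsed. The message convention (a "#votereport" tag, a five-digit ZIP code tag, and a "#wait:<minutes>" tag) and the function name parse_report are assumptions made up for illustration; they are not taken from the project's own documentation.

```python
import re

# Hypothetical message convention, assumed for illustration only:
# a "#votereport" tag, an optional 5-digit ZIP tag such as "#27701",
# and an optional wait-time tag such as "#wait:90" (minutes).
ZIP_RE = re.compile(r"#(\d{5})\b")
WAIT_RE = re.compile(r"#wait:(\d+)\b", re.IGNORECASE)


def parse_report(message):
    """Return a dict of extracted fields, or None if the message is not a report."""
    if "#votereport" not in message.lower():
        return None
    zip_match = ZIP_RE.search(message)
    wait_match = WAIT_RE.search(message)
    return {
        "zip": zip_match.group(1) if zip_match else None,
        "wait_minutes": int(wait_match.group(1)) if wait_match else None,
        "text": message,
    }


if __name__ == "__main__":
    # Example message invented for this sketch.
    print(parse_report("Long line at my polling place #votereport #27701 #wait:90"))
```

A few lines of pattern matching are enough to turn a tagged text message into structured fields that can be counted or mapped, whereas a voicemail would first have to be transcribed by a person or by speech recognition.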