Inauthentic Paper Detector 

 A group at the Indiana School of Informatics has developed a piece of software to detect whether a document is "human written and authentic or not." The idea was inspired by the successful attempt of MIT students in 2004 to get a computer-generated paper into a conference (see here). Their program collated random fragments of computer-science-speak into a short paper that was accepted at a major conference without revision. (That program is online and you can generate your own paper, though unfortunately it only writes computer science articles.)


 The new tool lets users paste in a piece of text and then assesses whether the content is likely to be authentic or just gibberish. The program tries to identify human-style writing, which is characterized by certain repetition patterns, and apparently does rather well. It is not clear whether this works for social science articles: the first paragraphs of a recent health economics article (which shall remain unnamed) were given only a 35.5% chance of being authentic. Hmm...
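 The post doesn't say how the detector measures those repetition patterns. One crude, well-known proxy for repetition is compressibility: text with recurring words and phrases compresses better. The sketch below assumes that proxy purely for illustration; it is not the Indiana group's method, and the function names compression_ratio and shuffled_baseline are hypothetical.

```python
import random
import zlib


def compression_ratio(text: str) -> float:
    """Compressed size / raw size of the UTF-8 text.

    Lower values mean more repetition (an assumed proxy, not the real detector's measure).
    """
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("empty text")
    return len(zlib.compress(raw, 9)) / len(raw)


def shuffled_baseline(text: str, seed: int = 0) -> float:
    """Compression ratio of the same words in random order.

    A crude stand-in for "random collation": same vocabulary, no word-order structure.
    """
    words = text.split()
    rng = random.Random(seed)  # fixed seed so the baseline is reproducible
    rng.shuffle(words)
    return compression_ratio(" ".join(words))


if __name__ == "__main__":
    sample = "the cat sat on the mat " * 50
    print(compression_ratio(sample), shuffled_baseline(sample))
```

Comparing a passage's ratio with its shuffled baseline shows how much of its compressibility comes from word order rather than vocabulary alone. Whether a signal this simple separates real papers from generated ones is an open question, which may be part of why a health economics article could score so low.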

 So is this just a joke, or useful programming? The authors say it could be used to tell whether a website is authentic or bogus, or to distinguish different types of text (articles vs. blogs, for example). I wonder what algorithms lie behind such technology. Will this lead to an arms race between fakers and detectors? If a detector can recognize human-written text, could the faking software use it too?

 If further tweaked, could this have an application in the social sciences? Maybe we could use the faking software to search existing papers, collate them smartly and use the result to identify patterns and get new ideas? Maybe everyone should run their papers through a detector before submitting them to a journal or presenting at a workshop? And students, watch out! No more random collating at 3am to meet the next day's deadline!

 PS: this blog entry has been classified as "inauthentic with a 26.3% chance of being an authentic text"...