Gelman's Paradox (or, The Probabilistic Backwards Reasoning Fallacy) 

Andy Gelman posted this forwarded item regarding an apparent fallacy with averages and the misunderstanding of uncertainty. Essentially, it boils down to this reversal:

a) 100 students take a class, and 50 pass.
b) Given only that 50 students pass the (identical) class the next time it is offered, how many students, on average, were enrolled?

The "fallacy" is in assuming that the expected number of original enrollees is 100, when it must necessarily be greater than 100 due to the uncertainty in estimating the probability of passing the class. The article points out that ignorance of the prior distribution of passing students is at fault for the "fallacy"; I argue that it's the prior distribution of one student passing a test that's the cause of the paradox.


 Break the problem in two: 
a) 100 students take a class, and 50 pass. 

Assume for the moment that a student passes or fails the class independently of their peers (a reasonable assumption for the initial problem, which dealt with the failure rate of vehicles). Take the standard noninformative Jeffreys prior, Beta(0.5, 0.5), in effect "half a student" passing and "half a student" failing, and assume the students are basically identical. Then the posterior distribution of the probability of passing the test is a Beta(50.5, 50.5) distribution.

b) Given 50 students passed, on average how many enrolled?
For each student who passed, 1/p students enrolled, so the expected class size is 50 times the mean of 1/p. By Jensen's inequality (1/p is convex), the mean of 1/p (in this case, 2.02) is necessarily greater than 1/(the mean of p), which is 2. So the expected class size must be greater than 100 under these assumptions: roughly 101 students enrolled.
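This is easy to check numerically: for p ~ Beta(a, b) with a > 1, E[1/p] has the closed form (a + b - 1)/(a - 1). A minimal sketch using only the standard library, with a Monte Carlo sanity check:

```python
import random

def mean_inverse_beta(a, b):
    """Closed-form E[1/p] for p ~ Beta(a, b); requires a > 1."""
    return (a + b - 1) / (a - 1)

# Posterior after 50 passes and 50 fails under the Jeffreys Beta(0.5, 0.5) prior
a, b = 50.5, 50.5
e_inv = mean_inverse_beta(a, b)   # 100 / 49.5, about 2.0202
expected_size = 50 * e_inv        # about 101.01 enrolled students

# Monte Carlo sanity check on the closed form
random.seed(0)
n = 200_000
mc = sum(1 / random.betavariate(a, b) for _ in range(n)) / n
print(round(expected_size, 2), round(mc, 3))
```

With 200,000 draws the Monte Carlo estimate lands well within 0.01 of the closed form, confirming the ~101-student answer.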

The original authors, however, profoundly overestimate the expected number of starting students, choosing a "posterior" distribution that yields a class size of 150. To get an expectation this large with this prior information, we would need a posterior of Beta(2.0, 2.0), that is, 1.5 students passing and 1.5 failing! Putting this in perspective, the most likely way I can see this happening is that the students pooled their talents and produced 3 distinct final papers: one good, one bad, and one just good enough to get the professor to flip a coin.
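Working backwards with the same closed form: a symmetric posterior Beta(a, a) implies an expected class size of 50(2a - 1)/(a - 1), and setting that equal to 150 forces a = 2. A quick check:

```python
def mean_inverse_beta(a, b):
    # Closed-form E[1/p] for p ~ Beta(a, b); requires a > 1
    return (a + b - 1) / (a - 1)

# Beta(2, 2) = Jeffreys prior Beta(0.5, 0.5) plus 1.5 passes and 1.5 fails
print(50 * mean_inverse_beta(2.0, 2.0))  # 150.0
```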

 It does, however, seem to explain why Harvard classrooms always seem to overflow chaotically at the beginning of each term. 

P.S. The original authors call this the "backwards reasoning fallacy", even though Google says the name is better applied to startling schoolchildren deterministically rather than failing them stochastically. Resolving the namespace collision here: does this problem go by another name, or shall we go via Stigler and call it Gelman's paradox?

 ----------------------------------- 
Update:  We recently received this comment from the work's original author, as the comment system failed to post it. I've attached it verbatim. -AT, 8-12-08 

 I am the author of the original article and a colleague of mine alerted me to your posting on Andy Gellman's blog. You said (about my article): 

 "An interesting problem with an awful delivery." 

 You also said: 

 "I'd normally agree that someone's selling something with this, but the fact that the page was cosponsored by a university makes me wonder about their grossly exaggerated result." 

 For a start it would not have been too difficult for you to have found out who I was since my name is very clearly stated at the bottom of the article, and the web site provides full information about me. So it would have been nice for you to raise the concerns you have about the article with me directly rather than through the use of insulting comments on a third party web site. 

 As to the substance of your criticisms, you seem to have misunderstood the particular problem and context and have produced a different model, that does not address the very real example that we had to deal with. You say that 

 "The original authors ... make a profound overestimation of the average of starting students, choosing a "posterior" distribution that yields a class size of 150." 

 This is not what I did at all. I made it clear that the crucial assumption was the prior average class size. To illustrate the problem I chose an example in which the prior average was deliberately high, 180. The fact that this gives a posterior average class size of about 153 when the 50 passes is observed is exactly the point I wanted to emphasize. Your comment about us making a "profound overestimation" is quite simply nonsense. Part of the fallacy was to assume that the class size of 100 in the specific example was in any way representative of the average class size. 

 I suggest you read the article again and pay particular attention to the (real) vehicle example at the end. The model that I produced EXACTLY represented the real data. 

 You should also be aware that the aim of my probability puzzles/fallacies web page is to raise awareness of probability (and in particular Bayesian reasoning) to as broad an audience as possible. While I am pleased if other professional statisticians read it, it is not they who are the target. This means having to use a language and presentation style that does not fit with the traditional academic approach. 

 In fact, one thing I have discovered over the years is that too many academic statisticians tend to speak only to other like-minded academic statisticians. The result is that in practice (i.e. in the real world) potentially powerful arguments have been 'lost' or simply ignored due to the failure to present them in a way in which lay people can understand. I have seen this problem extensively first hand in work as an expert witness. For example, in a recent medical negligence case the core dispute was solved by a very straightforward Bayesian argument. However, this had been presented to the defence lawyers and expert physicians in the traditional formulaic way. Neither the lawyers nor the physicians could understand the argument, and the QC was adamant that he could not present it in court. We were brought in to check the validity of the Bayesian results and to provide a user-friendly explanation that would enable the lawyers and doctors to understand it sufficiently well to present it in court. The statisticians simply did not realise that what is simple to them may be incomprehensible to others, and that there are much better (visual) ways to present these arguments. We used a decision tree and all the parties understood it immediately because it was couched in term of real number of patients rather than abstract probabilities. Had we not been involved the (valid) Bayesian argument would simply have never been used. 

 Norman Fenton 

 Professor of Computer Science 

 Head of RADAR (Risk Assessment and Decision Analysis Research) 

 Computer Science Department 

 Queen Mary (University of London) 

 London E1 4NS. 

 Email: norman@dcs.qmul.ac.uk 

 www.dcs.qmul.ac.uk/research/radar/ 

 www.dcs.qmul.ac.uk/~norman/ 

 Tel: 020 7882 7860 

  
CEO 

 Agena Ltd 

 www.agena.co.uk 

 London Office: 

 32-33 Hatton Garden 

 London EC1N 8DL 

 Tel: +44 (0) 20 7404 9722 

 Fax: +44 (0) 20 7404 9723