Bayesian inference and natural selection 

I saw a thought-provoking post at John Baez's diary the other day pointing out an interesting analogy between natural selection and Bayesian inference, and I can't decide if I should classify it as just "neat" or as "neat, and potentially deep" (which is where I'm leaning). Because it's a rather lengthy post, I'll just quote the relevant bits:


 
The analogy is mathematically precise, and fascinating. In rough terms, it says that the process of natural selection resembles the process of Bayesian inference. A population of organisms can be thought of as having various "hypotheses" about how to survive - each hypothesis corresponding to a different  allele. (Roughly, an allele is one of several alternative versions of a gene.) In each successive generation, the process of natural selection modifies the proportion of organisms having each hypothesis, according to Bayes' law!

 Now let's be more precise: 

 Bayes' law says if we start with a "prior probability" for some hypothesis to be true, divide it by the probability that some observation is made, then multiply by the "conditional probability" that this observation will be made given that the hypothesis is true, we'll get the "posterior probability" that the hypothesis is true given that the observation is made. 

 Formally, the exact same equation shows up in population genetics! In fact, Chris showed it to me - it's equation 9.2 on page 30 of this book: 

 * R. Bürger, The Mathematical Theory of Selection, Recombination and Mutation, section I.9: Selection at a single locus, Wiley, 2000. 

 But, now all the terms in the equation have different meanings! 

 Now, instead of a "prior probability" for a hypothesis to be true, we have the frequency of occurrence of some allele in some generation of a population. Instead of the probability that we make some observation, we have the expected number of offspring of an organism. Instead of the "conditional probability" of making the observation, we have the expected number of offspring of an organism given that it has this allele. And, instead of the "posterior probability" of our hypothesis, we have the frequency of occurrence of that allele in the next generation.  
  

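To see how literal the correspondence is, here's a minimal sketch in Python of one generation of selection at a single locus in a haploid model. The specific frequencies and fitness values are made up for illustration; the point is just that the update rule is exactly Bayes' law, with allele frequencies as priors, per-allele expected offspring counts as likelihoods, and mean fitness as the normalizing "probability of the observation":

```python
# One generation of haploid selection at a single locus, written so the
# Bayesian structure is visible. Numbers here are arbitrary examples.

p = [0.5, 0.3, 0.2]   # allele frequencies: the "prior" P(allele)
w = [1.0, 1.5, 0.5]   # expected offspring per allele: the "likelihood"
                      # P(offspring | allele)

# Mean fitness plays the role of the normalizer P(offspring).
w_bar = sum(pi * wi for pi, wi in zip(p, w))

# Next generation's frequencies are exactly the Bayesian posterior:
# P(allele | offspring) = P(allele) * P(offspring | allele) / P(offspring)
p_next = [pi * wi / w_bar for pi, wi in zip(p, w)]

print(p_next)
```

Running this, the allele with above-average fitness (w = 1.5) gains frequency and the one with below-average fitness loses it, just as a hypothesis with a high likelihood gains posterior probability.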
 Baez goes on to wonder, as I do, whether people working on genetic programming or Bayesian approaches to machine learning have noticed this relationship. I feel like I would have remembered if I'd seen something like this (at least recently), and I don't remember anything, but that doesn't mean it's not out there -- any pointers, anyone? [The closest thing I can think of is an interesting chapter (pdf) by David MacKay called "Why have sex? Information acquisition and evolution", but it's mainly about using information theory to quantify why recombination (sex) is a better way to spread useful mutations and clear less-useful ones.]

 Also, re: the conceptual depth of this point... I've long thought (and I'm sure I'm not alone in this) that it's useful to see natural selection as a guided search over genotype (or phenotype) space; Bayesian inference, i.e., searching over "hypothesis space" so as to maximize posterior probability, is a valuable and widely used approach in machine learning and cognitive science. [Incidentally, I've also found this framing to be a useful rhetorical tool in discussing evolution with creationists -- the idea that computers can do intelligent searches over large spaces and find things with small "chance" probability is one that many of them can accept, and from there it's not so much of a leap to think that evolution might be analogous; it also helps them understand that "natural selection" is not "random chance", which seems to be the common misunderstanding.]

 Anyway, in that superficial sense it's perhaps not surprising that this analogy exists; on the other hand, the analogy goes deeper than "they are both searches over a space" -- it's more along the lines of "they are both, essentially, maximizing the same quantity (posterior probability)." And that's interesting; where they differ is just, of course, in how each particular probability is calculated. I'd guess that a lot of the work in natural selection is done in calculating P(offspring | allele) -- just as in much of Bayesian inference, the problem-specific part is often setting up and figuring out how to calculate the likelihood.

 Also, I wonder if you could do some interesting work in mathematical genetics by manipulating the prior, P(allele). Baez's excerpt defines it simply as the frequency of occurrence of some allele in some generation, but of course it's more complicated than that -- you also have to include the probability of a mutation to that allele, since otherwise you'd never get the emergence of novel alleles or combinations.
And I bet figuring out how to include the additional complexity would be very interesting and meaningful. 
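As a rough sketch of what I mean (this is my own toy construction, not Bürger's formulation -- the mutation matrix and all the numbers are invented for illustration), you could fold mutation in as a step that reshapes the distribution after the selection/Bayes step, so that an allele absent from the "prior" can still appear in the next generation:

```python
# Toy mutation-selection update. Selection is the Bayes step from the
# analogy; mutation then redistributes frequencies, letting novel
# alleles emerge. All numbers are made up for illustration.

p = [0.7, 0.3, 0.0]   # allele 2 is initially absent (zero "prior")
w = [1.0, 1.2, 2.0]   # per-allele expected offspring ("likelihoods")

# mu[j][i]: probability an offspring of a j-parent carries allele i
# (each row sums to 1; small off-diagonal entries are mutation rates).
mu = [
    [0.98, 0.01, 0.01],
    [0.01, 0.98, 0.01],
    [0.01, 0.01, 0.98],
]

# Selection: exactly the Bayesian update.
w_bar = sum(pi * wi for pi, wi in zip(p, w))
p_sel = [pi * wi / w_bar for pi, wi in zip(p, w)]

# Mutation: mix the post-selection frequencies through mu.
p_next = [sum(p_sel[j] * mu[j][i] for j in range(3)) for i in range(3)]

print(p_next)
```

After one generation, allele 2 has nonzero frequency despite its zero prior -- which is precisely what the pure Bayes/selection step alone could never produce, since multiplying a zero prior by any likelihood still gives zero.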

 Anyway, I'm now speculating on things I know very little about, and I should go read the Bürger book (which has been duly added to my ever-expanding reading list). But I thought I'd throw out these speculations anyway, since you all might find them interesting. And if anyone has any other references, I'd love to see them.