Applied Statistics - Gopi Goswami 

This week, the Applied Statistics Workshop will be presenting a talk by Gopi Goswami of the Harvard Statistics Department entitled "Evolutionary Monte Carlo Methods for Clustering." Gopi Goswami received his Ph.D. from the Department of Statistics at Harvard in June 2005. Before coming to Harvard, he was an undergraduate and master's student at the Indian Statistical Institute in Calcutta. His dissertation, "On Population-Based MCMC Methods," develops new techniques for sampling more efficiently from a target density. He is currently a post-doctoral scholar in the Harvard Statistics Department. The presentation will be at noon on Wednesday, October 26 in Room N354, CGIS North, 1737 Cambridge St. Lunch will be provided. The paper he will present on Wednesday explores these methods in the context of clustering problems:
 


We consider the problem of clustering a group of observations according to some objective function (e.g., K-means clustering, variable selection) or according to a posterior density (e.g., the posterior from a Dirichlet Process prior) of cluster indicators. We cast both kinds of problems in the framework of sampling for cluster indicators. So far, Gibbs sampling, the “split-merge” Metropolis-Hastings algorithm, and various modifications of these have been the basic tools used for sampling in this context. We propose a new population-based MCMC approach, in the same vein as parallel tempering. We introduce three new “crossover moves” (based on swapping and reshuffling sub-cluster intersections) which make such an algorithm very efficient with respect to the Integrated Autocorrelation Time (IAT) of various relevant statistics and also with respect to the ability to escape from local modes. We call this new algorithm the Population Based Clustering (PBC) algorithm. We apply the PBC algorithm to motif clustering, Beta mixture of Bernoulli clustering, and a Bayesian Information Criterion (BIC) based variable selection problem. We also discuss clustering of mixtures of Normals and compare the performance of the PBC algorithm as a stochastic optimizer with K-means clustering.
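The abstract treats clustering as sampling over cluster indicators with a population of chains, in the same vein as parallel tempering. The Python sketch below illustrates only that generic scaffolding, and it rests on assumptions that do not come from the paper. The energy is taken to be the K-means within-cluster sum of squares, and chain j targets a density proportional to exp(-E/t_j) over a hand-picked temperature ladder. Each chain uses single-site Metropolis label updates, and exchange moves swap states between adjacent temperatures. The function names (`kmeans_energy`, `parallel_tempering`, `integrated_autocorr_time`) are hypothetical. The paper's three crossover moves act between members of the population, but the abstract does not define them, so they are deliberately left out. The IAT helper is a simple estimator truncated at the first non-positive autocorrelation, which is one of several conventions.

```python
import numpy as np


def kmeans_energy(x, labels, k):
    """Within-cluster sum of squares (the K-means objective) for a labelling.
    Assumption: used here as the energy; the paper also covers other targets."""
    total = 0.0
    for c in range(k):
        members = x[labels == c]
        if len(members) > 0:
            total += float(((members - members.mean(axis=0)) ** 2).sum())
    return total


def parallel_tempering(x, k, temps, n_iter, rng):
    """Hypothetical sketch: one label vector per temperature; chain j targets
    exp(-E(z) / temps[j]), with temps[0] the coldest (e.g. 1.0).
    No crossover moves: only within-chain Metropolis and adjacent exchanges."""
    n = len(x)
    pop = [rng.integers(k, size=n) for _ in temps]
    energies = [kmeans_energy(x, z, k) for z in pop]
    trace = []  # energy of the coldest chain after each iteration
    for _ in range(n_iter):
        # Within-chain move: relabel one random observation (symmetric proposal).
        for j, t in enumerate(temps):
            proposal = pop[j].copy()
            proposal[rng.integers(n)] = rng.integers(k)
            e_new = kmeans_energy(x, proposal, k)
            if np.log(rng.random()) < (energies[j] - e_new) / t:
                pop[j], energies[j] = proposal, e_new
        # Exchange move: swap the states of a random adjacent temperature pair.
        if len(temps) > 1:
            j = rng.integers(len(temps) - 1)
            log_ratio = (energies[j] - energies[j + 1]) * (1.0 / temps[j] - 1.0 / temps[j + 1])
            if np.log(rng.random()) < log_ratio:
                pop[j], pop[j + 1] = pop[j + 1], pop[j]
                energies[j], energies[j + 1] = energies[j + 1], energies[j]
        trace.append(energies[0])
    return pop[0], np.array(trace)


def integrated_autocorr_time(series, max_lag=None):
    """IAT = 1 + 2 * sum of lag autocorrelations, truncated at the first
    non-positive estimate (a simple convention; others exist)."""
    s = np.asarray(series, dtype=float)
    s = s - s.mean()
    n = len(s)
    var = np.dot(s, s) / n
    if var == 0.0:
        return float("nan")  # undefined for a constant series
    if max_lag is None:
        max_lag = n // 2
    tau = 1.0
    for lag in range(1, max_lag):
        rho = np.dot(s[:-lag], s[lag:]) / (n * var)
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return float(tau)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated Gaussian blobs in 2-D (synthetic, for illustration only).
    x = np.vstack([rng.normal(0.0, 0.5, (30, 2)), rng.normal(4.0, 0.5, (30, 2))])
    labels, trace = parallel_tempering(x, 2, [1.0, 2.0, 4.0, 8.0], 3000, rng)
    print("final energy:", trace[-1])
    print("IAT of cold-chain energy:", integrated_autocorr_time(trace))
```

The exchange step is what lets a cold chain inherit states that hotter chains reached by crossing energy barriers. This matches the "escape from local modes" motivation in the abstract. Between-chain crossover moves of the kind the paper proposes would be a further step inside the same loop.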