My research is at the interface of Machine Learning, Statistics, and Optimization. I am interested in formalizing the process of learning, in analyzing learning models, and in deriving and implementing the resulting learning methods. A significant thrust of my research is in developing theoretical and algorithmic tools for online prediction, a learning framework in which data arrives sequentially.
A high-level description of a few research areas:
- Online Learning: We aim to develop robust prediction methods that do not rely on the i.i.d. or stationary nature of the data. In contrast to the well-studied setting of Statistical Learning, methods that predict in an online fashion are arguably more complex and nontrivial. Major questions that arise in this setting are: (a) How should one model the problem at hand? (b) How many examples are required to achieve a certain level of performance, and which methods are computationally efficient? (c) How should one deal with incomplete feedback and the exploration-exploitation dilemma? Examples: sequentially predicting users' preferences, classifying nodes in a social network, sequentially selecting medical treatment strategies while observing limited feedback about past decisions, etc.
- High-Dimensional Statistics: This setting is centered around the problem of recovering high-dimensional, structured signals hidden in noise. Since standard statistical methods are often computationally intractable, the question of the interplay between computation and statistical optimality arises. Examples: estimation of communities in networks, recovery of a few relevant genes in a large set of gene expression data, etc.
- Statistical Learning: We study the problem of building a good predictor based on an i.i.d. sample. While much is understood in this classical setting, our current focus is on Deep Learning models. In particular, we study various measures of complexity of neural networks that govern their out-of-sample performance. We aim to understand the "geometry" (in an appropriate sense) of neural networks and its relation to the prediction ability of trained models.
- Non-Convex Landscapes: Here we are interested in understanding the properties of high-dimensional empirical landscapes that arise when one attempts to fit a model with many parameters (such as a multi-layer neural network or a latent variable model) to data. Some of the questions that arise are: (a) How do optimization methods behave on such landscapes? (b) What salient features of the landscape arise from its random nature? (c) How can one exploit randomness in the optimization method to analyze its convergence?