In a joint work with David Baker (Biochemistry, UW) and his lab,
we are studying ways to improve the numerical optimization procedure in
their Fortran code Rosetta for
predicting the 3-dimensional structure of a folded protein
from the amino acid sequence.
This is one of the central problems in computational molecular biology.
See Baker Lab homepage.
for more information on this problem.
A protein may be modelled as a set of points (atoms) in R3, with
certain pairs of atoms linked, i.e., the
Euclidean distance between each pair is constant and known.
The free energy function depends on the pairwise distance
between the atoms, and involves highly nonlinear terms such as
the Lennard-Jones function. The atoms are further categorized as either
backbone atoms or sidechain atoms.
The main problem is to find torsion angles between adjacent
links of backbone atoms that globally minimize the energy function, thus
yielding a prediction of the folded protein.
In Rosetta, the global
minimization of this energy function is done by using a Monte Carlo method.
In the current implementation of Monte Carlo in the Baker lab's
Rosetta code, the torsion angles are updated by moves. Four
different types of moves are used to update the torsion angles for the backbone.
In each one of these moves, a small subset of torsion angles (e.g., two
adjacent angles or five randomly chosen angles)
is randomly perturbed according to a Gaussian distribution
and then local minimization
(Powell's derivative-free method or the BFGS method) is applied
to find a local minimum of the energy with respect to backbone
angles within, say, 5 residues of the perturbed angles (the other angles are held fixed).
The resulting set of angles is accepted if it doesn't increase the
energy and is accepted with small probability if it
increases the energy.
For every, say, 25 of these moves attempted, a fifth move is attempted
to minimize the total energy with respect to the sidechain torsion angles.
The latter minimization is combinatorial in nature and expensive
(which is why it is attempted less frequently than the other
moves). These five moves are currently attempted sequentially.
The energy function has many local minima, corresponding to different
folding configurations and packing of atoms, and getting near a global
minimum from a local minimum usually requires a sequence of
moves/perturbations that initially increases the function value.
Part of the difficulty in efficiently finding global minimum
lies in the dimension
of the problem (number of backbone torsion angles), between 100 and 200,
which makes searching for a neighborhood of global convergence challenging.