================================================================

Protein Folding


What is protein folding?

Proteins are extremely abundant and important biological molecules. Every protein is composed of a long chain of amino acids.

See structures of all the amino acids

But in cells, proteins do not exist as long linear chains. They exist as complex 3-dimensional structures. The 3-dimensional structure of a protein is very important to its function. Enzymes, for example, function by binding other molecules at particular sites that are formed by their 3-dimensional shape.

See 3D "ribbon models" of actual proteins

The 3-dimensional shape is obtained when the linear amino-acid chain folds up in a particular way. This happens very quickly after the protein is synthesized in the cell, sometimes even before synthesis is complete. Given nothing but its sequence of amino acids, the protein somehow finds the correct shape (presumably the one that is most energetically stable) within seconds and folds into it.

Though proteins seem to have no difficulty doing this, it is a hard problem for humans and computers. To take a sequence of amino acids and translate it into the correct 3-dimensional shape by brute force, a computer would have to generate every possible folding of the chain, estimate the energy of each one, and pick the most stable.

Bearing in mind that a typical protein is composed of hundreds to thousands of amino acids, and that the number of possible foldings grows exponentially with the length of the chain, it becomes clear that this problem cannot be solved efficiently by brute force. It is unknown how proteins themselves do it, but they "do not fold by a random search" (5) as a computer would: a random search through every possible folding of a small protein of only 100 amino acids would take on the order of 10^50 years.
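To see how quickly the number of possible foldings grows, the short Python sketch below counts every way to lay a chain of n amino acids on a 2D square grid without the chain crossing itself, with the first amino acid fixed at the origin. The grid, the 2D setting, and the function name `count_folds` are illustrative assumptions, not taken from the sources above; real proteins fold in three dimensions, without a grid.

```python
def count_folds(n_residues):
    """Count self-avoiding placements of an n-residue chain on a 2D square grid.

    Assumption (illustrative only): each amino acid sits on a grid point,
    consecutive amino acids are one grid step apart, and no two amino acids
    share a point. The first amino acid is fixed at (0, 0).
    """
    if n_residues < 1:
        raise ValueError("a chain needs at least one amino acid")

    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]

    def extend(x, y, remaining, visited):
        # Every residue has been placed: this is one complete folding.
        if remaining == 0:
            return 1
        total = 0
        for dx, dy in steps:
            nxt = (x + dx, y + dy)
            if nxt not in visited:          # the chain may not cross itself
                visited.add(nxt)
                total += extend(nxt[0], nxt[1], remaining - 1, visited)
                visited.remove(nxt)         # backtrack and try the next step
        return total

    # n residues are joined by n - 1 bonds, i.e. n - 1 grid steps.
    return extend(0, 0, n_residues - 1, {(0, 0)})


if __name__ == "__main__":
    for n in range(2, 10):
        print(n, count_folds(n))
```

For example, `count_folds(5)` is 100 and `count_folds(9)` is 5916. Each extra amino acid multiplies the count by a factor a little under 3, so even this toy version of the search grows exponentially with chain length.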

Because a direct search is so expensive, computer programs that study protein folding must rely on vastly simplified models.


Why is this an important problem?

Sometimes (as when drugs are being designed) it would be useful to find a protein with a particular shape. For example, a drug company might want to find a drug molecule that would fit into the binding site of a particular enzyme or receptor. If it were known what sequence of amino acids would produce the desired shape, then such a protein could be synthesized.

Also, given the amino acid sequence of an existing protein, it would be useful to know its shape. The Human Genome Project (a project to sequence the entire human genome) will near completion in the next year. This project will provide scientists with the DNA sequence of every gene, which can easily be translated into the amino acid sequence of the protein that the gene produces. However, the amino acid sequence of an unknown protein gives no clue as to the protein's function. If the shape could be found from the sequence, the function of many previously unknown proteins might then be elucidated.


What does this have to do with linkages?

A useful way of modeling proteins, as well as other molecules, is through linkages. A detailed model might treat every atom as a joint (vertex) and every bond as a bar (edge). Simpler models of proteins ignore properties like bond length, bond angle, and individual atomic interactions. These simple models treat each amino acid (actually a large multi-atom structure) as a single unit (vertex) and ignore all bond properties. The bonds between amino acids, rather than between individual atoms, are the edges.
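The simple model just described can be written down as a small data structure. This is a minimal sketch, assuming (beyond what the text states) that every bar has the same length and that two amino acids may never occupy the same point; the class `OpenChainLinkage` and its methods are hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class OpenChainLinkage:
    """Coarse-grained protein: one vertex per amino acid, one bar per bond
    between consecutive amino acids (a hypothetical illustration)."""

    residues: str            # one-letter amino-acid codes, e.g. "MKV"
    bar_length: float = 1.0  # assumption: every bar has the same length

    @property
    def edges(self):
        # Only consecutive amino acids are joined, so the linkage is an open chain.
        return [(i, i + 1) for i in range(len(self.residues) - 1)]

    def is_valid_placement(self, coords, tol=1e-9):
        """Return True if coords (one 2D or 3D point per amino acid) keeps
        every bar at bar_length and puts no two amino acids on the same point."""
        if len(coords) != len(self.residues):
            return False
        # Every bar must keep its fixed length.
        for i, j in self.edges:
            if abs(math.dist(coords[i], coords[j]) - self.bar_length) > tol:
                return False
        # Assumption: the chain may not pass through itself.
        for i in range(len(coords)):
            for j in range(i + 1, len(coords)):
                if math.dist(coords[i], coords[j]) < tol:
                    return False
        return True
```

For example, `OpenChainLinkage("MKV").is_valid_placement([(0, 0), (1, 0), (1, 1)])` is `True`, while `[(0, 0), (1, 0), (0, 0)]` is rejected because the first and third amino acids land on the same point.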

Using such a model, every protein can be turned into an open-chain linkage. These linkages are easy to fold up. By placing such a linkage on a grid, it becomes simple to compute possible foldings. Of course, such a highly simplified model ignores many of the atomic interactions that contribute to protein folding. But it is possible to start with a very simple open-chain linkage model and add other features until the folding becomes more realistic.
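The grid approach can be sketched in a few lines of Python. This is a minimal sketch under assumptions the text does not state: the grid is a 2D square lattice, a folding is written as a string of unit moves (U, D, L, R), and the energy function is borrowed from the HP (hydrophobic-polar) model, a common simplified lattice model in which each amino acid is labelled H or P and every pair of H amino acids that touch on the grid without being neighbours along the chain lowers the energy by 1. The function names are hypothetical. `best_fold` is a brute-force search over all 4^(n-1) move strings, so it is practical only for very short chains.

```python
from itertools import product

# Unit steps on the 2D square grid (assumed lattice).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold_to_coords(moves):
    """Turn a move string such as "RUL" into grid points, starting at (0, 0).
    Returns None if the chain runs into itself."""
    x, y = 0, 0
    coords = [(x, y)]
    seen = {(x, y)}
    for m in moves:
        dx, dy = MOVES[m]
        x, y = x + dx, y + dy
        if (x, y) in seen:      # two amino acids on the same grid point
            return None
        seen.add((x, y))
        coords.append((x, y))
    return coords


def hp_energy(sequence, coords):
    """HP-model energy (assumed scoring): -1 for each pair of H amino acids
    that are grid neighbours but not consecutive along the chain."""
    pos = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES.values():
            j = pos.get((x + dx, y + dy))
            # j > i + 1 counts each contact once and skips chain neighbours.
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy


def best_fold(sequence):
    """Brute force: try every move string and keep the lowest-energy fold."""
    best = None
    for moves in product("UDLR", repeat=len(sequence) - 1):
        coords = fold_to_coords(moves)
        if coords is None:
            continue
        e = hp_energy(sequence, coords)
        if best is None or e < best[0]:
            best = (e, "".join(moves))
    return best
```

For example, `best_fold("HPPH")` returns `(-1, "ULD")`: the chain bends into a square so that its two H ends touch. Extra terms added to `hp_energy` would be one way to add further features to this simple model.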



================================================================

Need more information?

An article from American Scientist describing a simple model of protein folding. (This is also linked from the Linkage Page.)
An article from Science News describing computer simulations of protein folding.
A paper in press on computer simulations of protein folding.
Some basic descriptions of protein folding.
Another student project on protein folding, discussing two different computer models.
Essays on the biological importance of protein folding.
Huge list of references on protein structure prediction by mathematical methods.
The Protein Structure Prediction Center

Still need more information?

All of these papers are available in Smith's science library.

  1. Chan, Hue Sun, and Dill, Ken A. 1993. The protein folding problem. Physics Today February: 24-32.
  2. Dinner, Aaron R., So, Sung-Sau, and Karplus, Martin. 1998. Use of quantitative structure-property relationships to predict the folding ability of model proteins. Proteins 33, no. 2: 177-203.
  3. Orengo, C. A. 1999. Analysis and assessment of ab initio three-dimensional prediction, secondary structure, and contacts prediction. Proteins Suppl. 3: 149-170.
  4. Radford, Sheena E., and Dobson, Christopher M. 1999. From computer simulations to human disease: Emerging themes in protein folding. Cell 97, no. 3: 291-298.
  5. Skolnick, Jeffrey, and Kolinski, Andrzej. 1990. Simulations of the folding of a globular protein. Science 250: 1121-1125.
  6. Wernisch, Lorenz, Hunting, Marcel, and Wodak, Shoshana J. 1999. Identification of structural domains in proteins by a graph heuristic. Proteins 35, no. 3: 338-352.
  7. Yue, Kaizhi, et al. 1995. A test of lattice protein folding algorithms. Proceedings of the National Academy of Sciences of the U.S.A. 92: 325-329.

Not good enough for you? Need some more complex modelling methods?

Ignoring individual bonds and atoms can be useful, but it is not very realistic. That's why there are also methods for exploring molecular conformation spaces more realistically.