Proteins are extremely abundant and important biological molecules. Every protein is composed exclusively of a long chain of amino acids.
See structures of all the amino acids
But in cells, proteins do not exist as long linear chains. They exist as complex 3-dimensional structures. The 3-dimensional structure of a protein is very important to its function. Enzymes, for example, function by binding other molecules at particular sites that are formed by their 3-dimensional shape.
See 3D "ribbon models" of actual proteins
The 3-dimensional shape is obtained when the linear amino-acid chain folds up in a particular way. This occurs almost instantaneously after the protein is synthesized in the cell. Given nothing but its sequence of amino acids, the protein somehow finds the correct shape (presumably the one that is most energetically stable) within seconds and folds into it.
Though proteins seem to have no difficulty doing this, it is a hard problem for humans and computers. To take a sequence of amino acids and translate it into the correct 3-dimensional shape, a computer would have to take the following steps:
Bearing in mind that a typical protein is composed of hundreds to thousands of amino acids, it becomes clear that this problem would be difficult to solve efficiently. It is unknown how the proteins themselves do it. They "do not fold by a random search" (5) as a computer would. Even for a small protein of only 100 amino acids, a random search through every possible folding would take on the order of 1E50 years.
Because an exhaustive search is so inefficient, computer programs must use vastly simplified models to study protein folding.
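The scale of such a search can be illustrated with a rough calculation. The sketch below assumes that each amino acid can take one of a small fixed number of states, so a chain of n amino acids has (states)^n possible foldings, and that foldings are tried at a fixed rate. The function name, the numbers of states, and the rate of 1E13 foldings per second are illustrative assumptions, not values from the source, so the output will not necessarily match the estimate quoted above.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_search(n_residues, states_per_residue, foldings_per_second):
    """Years needed to try every folding once, under the assumptions above."""
    # Each residue independently takes one of `states_per_residue` states,
    # so the number of foldings grows exponentially with chain length.
    total_foldings = states_per_residue ** n_residues
    return total_foldings / foldings_per_second / SECONDS_PER_YEAR


# Assumed (hypothetical) values: 100 residues, 1E13 foldings tried per second.
for states in (2, 3, 10):
    years = years_to_search(100, states, 1e13)
    print(f"{states} states per residue: about {years:.1e} years")
```

The estimate is extremely sensitive to the assumed number of states per amino acid, but for any plausible choice the exponential growth makes an exhaustive search impractical.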
Sometimes (as when drugs are being designed) it would be useful to find a protein with a particular shape. For example, a drug company might want to find a drug molecule that would fit into a particular enzyme receptor site. If it were known what sequence of amino acids would produce the shape wanted, then such a protein could be synthesized.
Also, given the amino acid sequence of an existing protein, it would be useful to know what its shape is. The Human Genome Project (a project to sequence the entire human genome) will near completion in the next year. This project will provide scientists with the DNA sequence of every gene, which can be easily translated into the amino acid sequence of the proteins that the genes produce. However, the amino acid sequence of an unknown protein gives no clue as to the protein's function. If the shape could be found from the sequence, the function of many previously unknown proteins might then be elucidated.
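The translation from DNA to amino acids mentioned above is a simple table lookup, as the following minimal sketch suggests. It assumes the standard genetic code, a coding-strand DNA sequence that has already been spliced, and a reading frame that starts at the first base. The function name translate is hypothetical, and amino acids are written in one-letter codes.

```python
BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA coding sequence into one-letter amino acid codes."""
    dna = dna.upper()
    protein = []
    # Read three bases at a time; a trailing partial codon is ignored.
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTGGTAA"))  # prints MAW
```

The lookup is trivial compared with the folding problem: the sequence comes out immediately, while the shape does not.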
A useful way of modeling proteins, as well as other molecules, is through the use of linkages. A complex model might treat every atom as a joint (vertex) and every bond as a bar (edge). Simpler models of proteins ignore properties like bond length, bond angle, and individual atomic interactions. These simple models treat each amino acid (actually a large multiatom structure) as a single unit (vertex) and ignore all bond properties. The bonds between amino acids, rather than between individual atoms, are the edges.
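As a small illustration of the simple model, a protein can be stored as a list of vertices (one per amino acid) and a list of edges joining consecutive amino acids. The class and function names below are hypothetical, and the five-letter sequence is made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Linkage:
    vertices: tuple  # one entry per amino acid (one-letter code)
    edges: tuple     # pairs of vertex indices joined by a bond


def chain_linkage(sequence):
    """Build an open-chain linkage from an amino acid sequence."""
    vertices = tuple(sequence)
    # Each amino acid is bonded only to the next one in the chain.
    edges = tuple((i, i + 1) for i in range(len(vertices) - 1))
    return Linkage(vertices, edges)


linkage = chain_linkage("MKTAY")
print(linkage.vertices)
print(linkage.edges)
```

A chain of n amino acids always gives n vertices and n - 1 edges, with no cycles, which is what makes it an open chain.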
Using such a model, every protein can be turned into an open-chain linkage. These linkages are easy to fold up. By picturing such a linkage on a grid, it becomes simple to compute possible foldings. Obviously, such a highly simplified model ignores many of the atomic interactions that contribute to protein folding. But it is possible to start with a very simple open-chain linkage model and add other features until the folding becomes more realistic.
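One way to compute foldings on a grid can be sketched as follows. The sketch assumes a 2-dimensional square grid, places each amino acid on a grid point next to the previous one, and forbids two amino acids from sharing a point. These choices, and the function name count_foldings, are assumptions made for illustration; rotations and mirror images are counted as separate foldings.

```python
def count_foldings(n_residues):
    """Count the ways to lay an open chain on a square grid without overlap."""
    if n_residues < 1:
        return 0
    moves = ((1, 0), (-1, 0), (0, 1), (0, -1))

    def extend(path, occupied):
        # A complete chain counts as one folding.
        if len(path) == n_residues:
            return 1
        x, y = path[-1]
        total = 0
        for dx, dy in moves:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:  # no two residues on the same point
                path.append(nxt)
                occupied.add(nxt)
                total += extend(path, occupied)
                occupied.remove(nxt)
                path.pop()
        return total

    start = (0, 0)  # the first residue is fixed at the origin
    return extend([start], {start})


for n in range(2, 7):
    print(n, count_foldings(n))
```

Even on this simplified grid the count grows quickly with chain length (4, 12, 36, 100, 284 for chains of 2 to 6 amino acids), so exhaustive enumeration is only practical for short chains.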
All of these papers are available in Smith's science library.
Ignoring individual bonds and atoms can be useful, but it is not very realistic. For that reason, there are also more realistic ways of finding molecular conformation spaces.