Proteins are an important class of biological macromolecules present in all organisms. Proteins are polymers of amino acids. Classified by their physical size, proteins are nanoparticles (definition: 1–100 nm). Each protein polymer, also known as a polypeptide, consists of a sequence formed from 20 possible L-α-amino acids, also referred to as residues. For chains under 40 residues the term peptide is frequently used instead of protein. To be able to perform their biological function, proteins fold into one or more specific spatial conformations, driven by a number of non-covalent interactions such as hydrogen bonding, ionic interactions, van der Waals forces, and hydrophobic packing. To understand the functions of proteins at a molecular level, it is often necessary to determine their three-dimensional structure. This is the topic of the scientific field of structural biology, which employs techniques such as X-ray crystallography, NMR spectroscopy, and dual polarisation interferometry to determine the structure of proteins.
Protein structures range in size from tens to several thousand residues. Very large aggregates can be formed from protein subunits: for example, many thousand actin molecules assemble into a microfilament.
A protein may undergo reversible structural changes in performing its biological function. The alternative structures of the same protein are referred to as different conformations, and transitions between them are called conformational changes.
Protein covalent structure and stereochemistry
The amino acids of a protein are joined into a single polypeptide chain in a condensation reaction. This reaction is catalysed by the ribosome in a process known as translation.
The 20 naturally occurring amino acids have different physical and chemical properties, including their electrostatic charge, pKa, hydrophobicity, size and specific functional groups. These properties play a major role in molding protein structure.
The peptide bond
The peptide bond tends to be planar due to the delocalization of the electrons from the double bond. The rigid peptide dihedral angle, ω (the bond between C1 and N), is almost always close to 180 degrees. The dihedral angles phi φ (the bond between N and Cα) and psi ψ (the bond between Cα and C1) can have a certain range of possible values. These angles are the internal degrees of freedom of a protein; they control the protein's conformation. They are restrained by geometry to allowed ranges typical for particular secondary structure elements, and represented in a Ramachandran plot. A few important bond lengths are given in the table below.
| Peptide bond | Average length | Single bond | Average length | Hydrogen bond | Average length (±30 pm) |
|---|---|---|---|---|---|
| Cα–C | 153 pm | C–C | 154 pm | O–H --- O–H | 280 pm |
| C–N | 133 pm | C–N | 148 pm | N–H --- O=C | 290 pm |
| N–Cα | 146 pm | C–O | 143 pm | O–H --- O=C | 280 pm |
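The backbone angles ω, φ and ψ discussed in this section are ordinary dihedral (torsion) angles, so each can be computed from the coordinates of four consecutive atoms. The following is a minimal sketch in Python, not a reference implementation: the function name `dihedral` and the test coordinates are illustrative assumptions, and the sign follows the usual convention in which a clockwise rotation of the far bond, viewed along the central bond, is positive.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points given as (x, y, z).

    The angle is measured around the central bond p1-p2.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates (not taken from any real structure):
# a planar "trans" arrangement and a planar "cis" arrangement.
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
print(round(abs(trans)), round(abs(cis)))
```

In the standard backbone convention, φ would be evaluated on the atoms C(i-1), N, Cα, C of consecutive residues, ψ on N, Cα, C, N(i+1), and ω on Cα, C, N(i+1), Cα(i+1), where C is the carbonyl carbon that the text calls C1.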
Side-chain conformation
The atoms along the side chain are named with Greek letters in Greek alphabetical order: α, β, γ, δ, ε, and so on. Cα refers to the carbon atom of the backbone closest to the carbonyl group of that amino acid, Cβ the second closest and so on. The dihedral angles around the bonds between these atoms are named χ1, χ2, χ3, etc. The dihedral angle of the first movable atom of the side chain, γ, defined as N-Cα-Cβ-γ, is named χ1. Side chains tend to adopt different staggered conformations called gauche(-), trans, and gauche(+), which correspond to rotation angles of 60°, 180°, and -60°, respectively, around the sp3-sp3 bonds.
The diversity of side-chain conformations is often expressed in rotamer libraries. A rotamer library is a collection of rotamers for each residue type. Side-chain dihedral angles are not evenly distributed, but for most side chain types, the χ angles occur in tight clusters around certain values. Rotamer libraries therefore are usually derived from statistical analysis of side-chain conformations in known structures of proteins by clustering observed conformations or by dividing dihedral angle space into bins, and determining an average conformation in each bin.
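The binning approach can be illustrated with a short Python sketch. It assumes three 120°-wide bins centred on 60°, 180° and -60° and uses the naming given in this section (60° is gauche(-), -60° is gauche(+)); the bin boundaries, the function names and the sample angles are assumptions made for illustration, not values from a published rotamer library.

```python
from collections import Counter


def wrap_angle(angle_deg):
    """Map any angle in degrees onto the interval [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0


def classify_chi1(angle_deg):
    """Assign a chi1 angle to one of three staggered bins.

    Naming follows the text: 60 deg -> gauche(-), 180 deg -> trans,
    -60 deg -> gauche(+). Each bin is 120 deg wide (an assumption).
    """
    a = wrap_angle(angle_deg)
    if 0.0 <= a < 120.0:
        return "gauche(-)"
    if -120.0 <= a < 0.0:
        return "gauche(+)"
    return "trans"


# Hypothetical chi1 observations in degrees, invented for this example.
observed = [62.0, 58.0, -65.0, 178.0, -175.0, 300.0]
occupancy = Counter(classify_chi1(a) for a in observed)
print(occupancy)
```

A real library would go one step further and store a representative angle per bin. Because the trans bin straddles ±180°, that average would need to be a circular mean rather than a plain arithmetic mean.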
Levels of protein structure
There are four distinct levels of protein structure.
Primary structure
The primary structure refers to the amino acid sequence of the polypeptide chain. The primary structure is held together by covalent peptide bonds, which are made during the process of protein biosynthesis or translation. The two ends of the polypeptide chain are referred to as the carboxyl terminus (C-terminus) and the amino terminus (N-terminus) based on the nature of the free group on each extremity. Counting of residues always starts at the N-terminal end (NH2 group), which is the end where the amino group is not involved in a peptide bond. The primary structure of a protein is determined by the gene corresponding to the protein. A specific sequence of nucleotides in DNA is transcribed into mRNA, which is read by the ribosome in a process called translation. The sequence of a protein is unique to that protein, and defines the structure and function of the protein. The sequence of a protein can be determined by methods such as Edman degradation or tandem mass spectrometry. Often, however, it is read directly from the sequence of the gene using the genetic code. Post-translational modifications such as disulfide formation, phosphorylations and glycosylations are usually also considered a part of the primary structure, and cannot be read from the gene.
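Reading a protein sequence from a gene with the genetic code can be sketched in a few lines of Python. The example assumes the standard genetic code, an intron-free coding sequence that starts in frame, and one-letter amino acid codes; the function name `translate` and the input sequence are hypothetical. As the text notes, post-translational modifications cannot be recovered this way.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_sequence):
    """Translate a coding DNA or mRNA sequence, N-terminus first.

    Translation stops at the first stop codon; incomplete trailing
    codons are ignored.
    """
    seq = coding_sequence.upper().replace("U", "T")
    residues = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


# Made-up coding sequence: Met-Ala-Trp followed by a stop codon.
print(translate("ATGGCCTGGTAA"))
```

The first character of the returned string corresponds to residue 1 at the N-terminus, matching the counting convention described above.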
Amino acid residues
Each α-amino acid consists of a backbone part that is present in all the amino acid types, and a side chain that is unique to each type of residue. An exception to this rule is proline, whose side chain is also bonded to the backbone nitrogen. Because the Cα atom is bound to four different groups it is chiral; however, only one of the isomers occurs in biological proteins. Glycine, however, is not chiral since its side chain is a hydrogen atom. A simple mnemonic for the correct L-form is "CORN": when the Cα atom is viewed with the H in front, the residues read "CO-R-N" in a clockwise direction.
Secondary structure
Secondary structure refers to highly regular local sub-structures. Two main types of secondary structure, the alpha helix and the beta strand, were suggested in 1951 by Linus Pauling and coworkers. These secondary structures are defined by patterns of hydrogen bonds between the main-chain peptide groups. They have a regular geometry, being constrained to specific values of the dihedral angles ψ and φ on the Ramachandran plot. Both the alpha helix and the beta-sheet represent a way of saturating all the hydrogen bond donors and acceptors in the peptide backbone. Some parts of the protein are ordered but do not form any regular structures. They should not be confused with random coil, an unfolded polypeptide chain lacking any fixed three-dimensional structure. Several sequential secondary structures may form a "supersecondary unit".
Tertiary structure
Tertiary structure refers to the three-dimensional structure of a single protein molecule. The alpha-helices and beta-sheets are folded into a compact globule. The folding is driven by non-specific hydrophobic interactions (the burial of hydrophobic residues from water), but the structure is stable only when the parts of a protein domain are locked into place by specific tertiary interactions, such as salt bridges, hydrogen bonds, the tight packing of side chains, and disulfide bonds. Disulfide bonds are extremely rare in cytosolic proteins, since the cytosol is generally a reducing environment.
Quaternary structure
Quaternary structure is a larger assembly of several protein molecules or polypeptide chains, usually called subunits in this context. The quaternary structure is stabilized by the same non-covalent interactions and disulfide bonds as the tertiary structure. Complexes of two or more polypeptides (i.e. multiple subunits) are called multimers. Specifically, a multimer is called a dimer if it contains two subunits, a trimer if it contains three subunits, and a tetramer if it contains four subunits. The subunits are frequently related to one another by symmetry operations, such as a 2-fold axis in a dimer. Multimers made up of identical subunits are referred to with a prefix of "homo-" (e.g. a homotetramer) and those made up of different subunits are referred to with a prefix of "hetero-" (e.g. a heterotetramer, such as the two alpha and two beta chains of hemoglobin).
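The naming rule for multimers can be written down as a small Python helper. This is only a sketch of the convention described above: the function name `multimer_name` is hypothetical, the subunit labels stand in for real chain sequences, and the fallback name for assemblies of more than four subunits is an assumption.

```python
SIZE_NAMES = {2: "dimer", 3: "trimer", 4: "tetramer"}


def multimer_name(subunits):
    """Name a complex from a list of subunit identifiers or sequences.

    Identical entries are treated as identical subunits.
    """
    count = len(subunits)
    if count == 0:
        raise ValueError("a complex needs at least one subunit")
    if count == 1:
        return "monomer"
    prefix = "homo" if len(set(subunits)) == 1 else "hetero"
    # Fallback such as "12-mer" for larger assemblies (an assumption).
    size = SIZE_NAMES.get(count, f"{count}-mer")
    return prefix + size


# Placeholder labels, not real sequences: two alpha and two beta chains,
# the arrangement the text gives for hemoglobin.
print(multimer_name(["alpha", "alpha", "beta", "beta"]))
```

Comparing subunits by exact equality is a simplification. In practice, deciding whether two chains count as "identical" may require sequence comparison.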
Domains, motifs, and folds in protein structure
Proteins are frequently described as consisting of several structural units.
- A structural domain is an element of the protein's overall structure that is self-stabilizing and often folds independently of the rest of the protein chain. Many domains are not unique to the protein products of one gene or one gene family but instead appear in a variety of proteins. Domains often are named and singled out because they figure prominently in the biological function of the protein they belong to; for example, the "calcium-binding domain of calmodulin". Because they are independently stable, domains can be "swapped" by genetic engineering between one protein and another to make chimeras.
- The structural and sequence motifs refer to short segments of protein three-dimensional structure or amino acid sequence that are found in a large number of different proteins.
- The supersecondary structure refers to a specific combination of secondary structure elements, such as beta-alpha-beta units or the helix-turn-helix motif. Some of them may also be referred to as structural motifs.
- Protein fold refers to the general protein architecture, like a helix bundle, beta-barrel, Rossmann fold or different "folds" provided in the Structural Classification of Proteins database.
Despite the fact that there are about 100,000 different proteins expressed in eukaryotic systems, there are many fewer different domains, structural motifs and folds. This is partly a consequence of evolution, since genes or parts of genes can be doubled or moved around within the genome. This means that, for example, a protein domain might be moved from one protein to another, thus giving the protein a new function. Because of these processes, pathways and mechanisms tend to be reused in several different proteins.
Protein folding
An unfolded polypeptide folds into its characteristic three-dimensional structure from a random coil.
Protein structure determination
Around 90% of the protein structures available in the Protein Data Bank have been determined by X-ray crystallography. This method allows one to measure the 3D density distribution of electrons in the protein (in the crystallized state) and thereby infer the 3D coordinates of all the atoms to a certain resolution. Roughly 9% of the known protein structures have been obtained by nuclear magnetic resonance techniques. The secondary structure composition can be determined via circular dichroism or dual polarisation interferometry.
Cryo-electron microscopy has recently become a means of determining protein structures to high resolution (less than 5 angstroms or 0.5 nanometers) and is anticipated to increase in power as a tool for high resolution work in the next decade. This technique is still a valuable resource for researchers working with very large protein complexes such as virus coat proteins and amyloid fibers.
Structure classification
Protein structures can be grouped based on their similarity or a common evolutionary origin. The SCOP and CATH databases provide two different structural classifications of proteins.
Computational prediction of protein structure
The generation of a protein sequence is much easier than the determination of a protein structure. However, the structure of a protein gives much more insight into the function of the protein than its sequence. Therefore, a number of methods for the computational prediction of protein structure from its sequence have been developed.
Ab initio prediction methods use just the sequence of the protein. Threading and homology modeling methods can build a 3D model for a protein of unknown structure from experimental structures of evolutionarily related proteins.
See also
- Distance geometry
- Protein design
- Protein dynamics
- Protein structure database
Wikis
- PDBWiki: a discussion forum for macromolecular structures
- Proteopedia: annotation of protein structures and other biomolecules
- TOPSAN: annotation of protein structures in structural genomics
Servers
- SSS Database — super-secondary structure protein database
- SPROUTS (Structural Prediction for pRotein fOlding UTility System)