Protein structure

Proteins are an important class of biological macromolecules present in all biological organisms, made up of such elements as carbon, hydrogen, nitrogen, oxygen, and sulphur. All proteins are polymers of amino acids. The polymers, also known as polypeptides, consist of a sequence drawn from 20 different L-α-amino acids, also referred to as residues. For chains under 40 residues the term peptide is frequently used instead of protein. To be able to perform their biological function, proteins fold into one or more specific spatial conformations, driven by a number of noncovalent interactions such as hydrogen bonding, ionic interactions, van der Waals forces and hydrophobic packing. In order to understand the functions of proteins at a molecular level, it is often necessary to determine their three-dimensional structure. This is the topic of the scientific field of structural biology, which employs techniques such as X-ray crystallography or NMR spectroscopy to determine the structure of proteins.

A minimum number of residues is necessary to perform a particular biochemical function, and around 40 to 50 residues appears to be the lower limit for a functional domain size. Protein sizes range from this lower limit to several thousand residues in multi-functional or structural proteins; the current estimate for the average protein length is around 300 residues. Very large aggregates can be formed from protein subunits; for example, many thousands of actin molecules assemble into a microfilament.

Levels of protein structure

Biochemists refer to four distinct aspects of a protein's structure:
  • Primary structure - the amino acid sequence of the peptide chains.
  • Secondary structure - highly regular sub-structures (alpha helices and strands of beta sheet) which are locally defined, meaning that there can be many different secondary motifs present in one single protein molecule.
  • Tertiary structure - the three-dimensional structure of a single protein molecule; a spatial arrangement of the secondary structures. It also describes the completely folded and compacted polypeptide chain.
  • Quaternary structure - a complex of several protein molecules or polypeptide chains, usually called protein subunits in this context, which function as part of the larger assembly or protein complex.


In addition to these levels of structure, a protein may shift between several similar structures in performing its biological function, and such shifts are reversible. In the context of these functional rearrangements, these tertiary or quaternary structures are usually referred to as conformations, and transitions between them are called conformational changes.

The primary structure is held together by covalent peptide bonds, which are made during the process of protein biosynthesis, or translation. These peptide bonds provide rigidity to the protein backbone. The two ends of the amino acid chain are referred to as the C-terminal end or carboxyl terminus (C-terminus) and the N-terminal end or amino terminus (N-terminus), based on the nature of the free group on each extremity.
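Each peptide bond forms by condensation and releases one water molecule, so the mass of a linear chain is roughly the sum of its residue masses plus one water for the free N- and C-terminal groups. The sketch below assumes that the integer "MW" values in this article's amino acid table are residue masses (free amino acid minus water), which is what their magnitudes suggest. The function name `peptide_mass` is hypothetical, and the result is only an approximation.

```python
# Approximate residue masses (Da), copied from the "MW" column of the
# amino acid table in this article (assumed to be residue masses).
RESIDUE_MASS = {
    "A": 71, "R": 157, "N": 114, "D": 114, "C": 103,
    "E": 128, "Q": 128, "G": 57, "H": 137, "I": 113,
    "L": 113, "K": 129, "M": 131, "F": 147, "P": 97,
    "S": 87, "T": 101, "W": 186, "Y": 163, "V": 99,
}

WATER = 18  # approximate mass of one H2O molecule in Da


def peptide_mass(sequence: str) -> int:
    """Approximate mass of a linear peptide written N- to C-terminus.

    Forming each peptide bond releases one water molecule, so the chain
    mass is the sum of the residue masses plus one water for the free
    N-terminal H and C-terminal OH.
    """
    if not sequence:
        raise ValueError("empty sequence")
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


if __name__ == "__main__":
    print(peptide_mass("GA"))  # 57 + 71 + 18 = 146
```

A single residue gives back the free amino acid, for example 57 + 18 = 75 for glycine.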

The various types of secondary structure are defined by their patterns of hydrogen bonds between the main-chain peptide groups. However, these hydrogen bonds are generally not stable by themselves, since the water-amide hydrogen bond is generally more favorable than the amide-amide hydrogen bond. Thus, secondary structure is stable only when the local concentration of water is sufficiently low, e.g., in the molten globule or fully folded states.

Similarly, the formation of molten globules and tertiary structure is driven mainly by structurally non-specific interactions, such as the rough propensities of the amino acids and hydrophobic interactions. However, the tertiary structure is fixed only when the parts of a protein domain are locked into place by structurally specific interactions, such as ionic interactions (salt bridges), hydrogen bonds and the tight packing of side chains. The tertiary structure of extracellular proteins can also be stabilized by disulfide bonds, which reduce the entropy of the unfolded state; disulfide bonds are extremely rare in cytosolic proteins, since the cytosol is generally a reducing environment.

Structure of the amino acids

An α-amino acid consists of a part that is present in all the amino acid types, and a side chain that is unique to each type of residue. The Cα atom is bound to 4 different atoms: a hydrogen atom (the H is omitted in the diagram), an amino group nitrogen, a carboxyl group carbon, and a side chain carbon specific for this type of amino acid. An exception to this pattern is proline, in which one hydrogen atom of the amino group is replaced by a bond to the side chain, closing a ring. Because the Cα atom is bound to four different groups, it is chiral; however, only one of the isomers, the L-form, occurs in biological proteins. Glycine, however, is not chiral, since its side chain is a hydrogen atom. A simple mnemonic for the correct L-form is "CORN": when the Cα atom is viewed with the H in front, the groups read "CO-R-N" in a clockwise direction. The side chain determines the chemical properties of the α-amino acid and may be any one of the 20 different side chains listed in the amino acid table.
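The CORN rule can also be checked numerically from atomic coordinates. The sketch below is a hypothetical helper (`corn_handedness` is not from any library) that takes x, y, z positions for Cα, its hydrogen, the carbonyl carbon (CO), the first side-chain carbon (R) and the amino nitrogen (N). "Clockwise as seen with H toward the viewer" is tested as the normal of the triangle CO, R, N pointing away from H. Any coordinates used with it here are idealized, made-up values, not measured data.

```python
from typing import Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def corn_handedness(ca: Vec, h: Vec, co: Vec, r: Vec, n: Vec) -> str:
    """Apply the CORN mnemonic to the groups around a C-alpha atom.

    With H pointing toward the viewer, CO -> R -> N runs clockwise for the
    L-form. By the right-hand rule, a clockwise triangle (as seen from the
    H side) has its normal pointing away from H, so the dot product with
    the C-alpha -> H direction is negative.
    """
    toward_viewer = sub(h, ca)
    normal = cross(sub(r, co), sub(n, co))
    orientation = dot(normal, toward_viewer)
    if orientation == 0:
        raise ValueError("degenerate geometry: cannot assign handedness")
    return "L" if orientation < 0 else "D"
```

With Cα at the origin and H on the +z axis, placing CO, R and N clockwise when seen from +z returns "L"; swapping R and N returns "D".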

Primary structure of proteins


The primary structure of peptides and proteins refers to the linear number and order of the amino acids present. By convention, the order of amino acids is written with the N-terminal end (i.e. the end bearing the residue with the free α-amino group) on the left, numbered as amino acid 1, and the C-terminal end (i.e. the end with the residue containing a free α-carboxyl group) on the right.
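The writing convention can be illustrated by numbering a one-letter sequence from the N-terminus. This is a minimal sketch; the three-letter codes are those in this article's amino acid table, and the helper names `numbered_residues` and `as_text` are hypothetical.

```python
# One-letter to three-letter codes, as listed in the amino acid table.
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "E": "GLU", "Q": "GLN", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def numbered_residues(sequence):
    """Return (position, code) pairs; the N-terminal residue is number 1."""
    return [(i, THREE_LETTER[aa]) for i, aa in enumerate(sequence.upper(), start=1)]


def as_text(sequence):
    """Write the chain left to right, from the free amino group to the free carboxyl group."""
    body = "-".join(f"{code}{i}" for i, code in numbered_residues(sequence))
    return f"H2N-{body}-COOH"
```

For example, `as_text("GAV")` gives `H2N-GLY1-ALA2-VAL3-COOH`.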

The proposal that proteins were linear chains of α-amino acids was made nearly simultaneously by two scientists at the same conference in 1902, the 74th meeting of the Society of German Scientists and Physicians, held in Karlsbad. Franz Hofmeister made the proposal in the morning, based on his observations of the biuret reaction in proteins. Hofmeister was followed a few hours later by Emil Fischer, who had amassed a wealth of chemical details supporting the peptide-bond model. For completeness, the proposal that proteins contained amide linkages was made as early as 1882 by the French chemist E. Grimaux.

Despite these data and later evidence that proteolytically digested proteins yielded only oligopeptides, the idea that proteins were linear, unbranched polymers of amino acids was not accepted immediately. Some well-respected scientists, such as William Astbury, doubted that covalent bonds were strong enough to hold such long molecules together; they feared that thermal agitation would shake such long molecules asunder. Hermann Staudinger faced similar prejudices in the 1920s when he argued that rubber was composed of macromolecules.

Thus, several alternative hypotheses arose. The colloidal protein hypothesis stated that proteins were colloidal assemblies of smaller molecules. This hypothesis was disproven in the 1920s by ultracentrifugation measurements by The Svedberg, which showed that proteins had a well-defined, reproducible molecular weight, and by electrophoretic measurements by Arne Tiselius that indicated that proteins were single molecules. A second hypothesis, the cyclol hypothesis advanced by Dorothy Wrinch, proposed that the linear polypeptide underwent a chemical cyclol rearrangement, C=O + HN → C(OH)-N, that crosslinked its backbone amide groups, forming a two-dimensional fabric. Other primary structures of proteins were proposed by various researchers, such as the diketopiperazine model of Emil Abderhalden and the pyrrol/piperidine model of Troensegaard in 1942. Although never given much credence, these alternative models were finally disproven by Frederick Sanger's sequencing of insulin and by the crystallographic determination of the structures of myoglobin by John Kendrew and hemoglobin by Max Perutz.

The primary structure of a biological polymer to a large extent determines the three-dimensional shape known as the tertiary structure, but nucleic acid and protein folding are so complex that knowing the primary structure often does not help either to deduce the shape or to predict localized secondary structure, such as the formation of loops or helices. However, knowing the structure of a similar, homologous sequence (for example, a member of the same protein family) can often identify the tertiary structure of the given sequence. Sequence families are often determined by sequence clustering, and structural genomics projects aim to produce a set of representative structures to cover the sequence space of possible non-redundant sequences.
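The paragraph does not say how sequence similarity is measured. One simple, common measure is percent identity over an existing alignment, sketched below. This assumes the two sequences have already been aligned (with `-` for gaps); it is not an alignment algorithm, and no particular identity threshold for structural similarity is implied. The name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Only columns where neither sequence has a gap are counted.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)
```

Counting only gap-free columns is one of several conventions; dividing by the full alignment length instead would give a lower value for gapped alignments.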

Secondary structure in proteins

The ordered array of amino acids in a protein confers regular conformational forms upon that protein. These conformations constitute the secondary structures of a protein. In general, proteins fold into two broad classes of structure, termed globular proteins and fibrous proteins. Globular proteins are compactly folded and coiled, whereas fibrous proteins are more filamentous or elongated. It is the partial double-bond character of the peptide bond that defines the conformations a polypeptide chain may assume. Within a single protein, different regions of the polypeptide chain may assume different conformations determined by the primary sequence of the amino acids.

The α-Helix
The α-helix is a common secondary structure encountered in proteins of the globular class. The formation of the α-helix is spontaneous and is stabilized by H-bonding between the amide N-H groups and carbonyl oxygens of peptide bonds spaced four residues apart. This orientation of H-bonding produces a helical coiling of the peptide backbone such that the R-groups lie on the exterior of the helix and perpendicular to its axis.
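The "four residues apart" pattern can be made concrete by listing which backbone groups pair up in an idealized helix. In this sketch (the function name is hypothetical), each tuple is (residue whose C=O accepts, residue whose N-H donates), following the i to i+4 spacing in the text; real helices have irregular ends that this ignores.

```python
def helix_hbond_pairs(first: int, last: int):
    """Backbone H-bonds of an ideal alpha-helix spanning residues first..last.

    The C=O of residue i accepts a hydrogen bond from the N-H of residue
    i + 4, for every i whose partner also lies inside the helix.
    """
    if last < first:
        raise ValueError("last must be >= first")
    return [(i, i + 4) for i in range(first, last - 3)]
```

An 8-residue helix numbered 1 to 8 gives the pairs (1, 5), (2, 6), (3, 7) and (4, 8); a segment of four residues or fewer has no internal i to i+4 pair.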

Not all amino acids favor the formation of the α-helix, due to steric constraints of the R-groups. Amino acids such as A, D, E, I, L and M favor the formation of α-helices, whereas G and P favor disruption of the helix. This is particularly true for P, since it is a pyrrolidine-based imino acid whose ring, which includes the backbone nitrogen, restricts rotation about the N-Cα bond and leaves no amide hydrogen for H-bonding, thereby interfering with extension of the helix. The disruption of the helix is important as it introduces additional folding of the polypeptide backbone to allow the formation of globular proteins.
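The residue lists in the paragraph above can be applied to a sequence as a simple marking pass. This sketch only restates those lists (the function name is hypothetical); it is not a secondary-structure prediction method, and, as noted under primary structure, sequence alone often does not predict helices.

```python
HELIX_FORMERS = set("ADEILM")   # residues the text lists as favoring the alpha-helix
HELIX_BREAKERS = set("GP")      # residues the text lists as disrupting it


def naive_helix_profile(sequence: str) -> str:
    """Mark each residue: 'h' favors helix, 'x' disrupts it, '.' neither."""
    marks = []
    for aa in sequence.upper():
        if aa in HELIX_FORMERS:
            marks.append("h")
        elif aa in HELIX_BREAKERS:
            marks.append("x")
        else:
            marks.append(".")
    return "".join(marks)
```

For example, `naive_helix_profile("AELKGPS")` returns `"hhh.xx."`, flagging the G-P pair as a likely end to a helical stretch.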

β-sheets
Whereas an α-helix is composed of a single linear array of helically disposed amino acids, β-sheets are composed of two or more separate stretches of the polypeptide chain, each typically 5 to 10 amino acids long. The folding and alignment of these stretches of the polypeptide backbone alongside one another to form β-sheets is stabilized by H-bonding between backbone amide N-H groups and carbonyl oxygens. However, the H-bonding residues are present in adjacently opposed stretches of the polypeptide backbone, as opposed to a linearly contiguous region of the backbone in the α-helix. β-sheets are described as pleated because the α-carbons of successive residues lie alternately above and below the plane of the sheet. β-sheets are either parallel or antiparallel. In parallel sheets adjacent peptide chains proceed in the same direction (i.e. the direction from the N-terminal to the C-terminal end is the same), whereas in antiparallel sheets adjacent chains are aligned in opposite directions. β-sheets can be depicted in ball and stick format or as ribbons in protein structure diagrams.

[Figures: ball and stick representation of a β-sheet; ribbon depiction of a β-sheet]
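Whether two paired strands are parallel or antiparallel follows from their N-to-C directions. A minimal sketch, assuming each strand is given as an ordered list of Cα coordinates from its N-terminal to its C-terminal end and that the two strands are already known to pair; the function names and the coordinates in the usage note are hypothetical.

```python
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def strand_direction(ca_coords: Sequence[Vec]) -> Vec:
    """N-to-C direction of a strand: last C-alpha position minus the first."""
    if len(ca_coords) < 2:
        raise ValueError("a strand needs at least two C-alpha positions")
    (x0, y0, z0), (x1, y1, z1) = ca_coords[0], ca_coords[-1]
    return (x1 - x0, y1 - y0, z1 - z0)


def classify_pairing(strand_a: Sequence[Vec], strand_b: Sequence[Vec]) -> str:
    """Return 'parallel' if both strands run N-to-C the same way, else 'antiparallel'."""
    a = strand_direction(strand_a)
    b = strand_direction(strand_b)
    d = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    if d == 0:
        raise ValueError("strands are perpendicular; pairing is undefined")
    return "parallel" if d > 0 else "antiparallel"
```

Two made-up strands running along +x give "parallel"; reversing the order of one strand's coordinates (so it runs along -x) gives "antiparallel".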

Super-Secondary Structure

Many proteins contain an ordered organization of several adjacent elements of secondary structure that form distinct, commonly observed structural motifs larger than individual secondary structures but smaller than domains or subunits. They are often hypothesized to act as early steps in the process of protein folding. Examples include β-hairpins, helix hairpins, right-handed β-α-β loops, and the helix-turn-helix motifs of bacterial proteins that regulate transcription.

Tertiary Structure of Proteins

Tertiary structure refers to the complete three-dimensional structure of the polypeptide units of a given protein. Included in this description is the spatial relationship of different secondary structures to one another within a polypeptide chain and how these secondary structures themselves fold into the three-dimensional form of the protein. Secondary structures of proteins often constitute distinct domains. Therefore, tertiary structure also describes the relationship of different domains to one another within a protein. The interactions of different domains are governed by several forces, including hydrogen bonding, hydrophobic interactions, electrostatic interactions, van der Waals forces and covalent bonding through disulfide bridges.

Quaternary Structure

Many proteins contain two or more polypeptide chains that are held in association by the same non-covalent forces that stabilize the tertiary structures of proteins. Proteins with multiple polypeptide chains are oligomeric proteins. The structure formed by monomer-monomer interactions in an oligomeric protein is known as quaternary structure.

Oligomeric proteins can be composed of multiple identical polypeptide chains or multiple distinct polypeptide chains. Proteins with identical subunits are termed homo-oligomers. Proteins containing several distinct polypeptide chains are termed hetero-oligomers.

Hemoglobin, the oxygen-carrying protein of the blood, contains two α and two β subunits arranged with a quaternary structure of the form α2β2. Hemoglobin is, therefore, a hetero-oligomeric protein (I. Shahid et al., 2008).
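The homo- versus hetero-oligomer distinction, and the α2β2-style notation used for hemoglobin, can be derived from a list of subunit names. A minimal sketch; the function name `describe_oligomer` is hypothetical, and subunit names are plain labels supplied by the caller.

```python
from collections import Counter


def describe_oligomer(chains):
    """Return (kind, stoichiometry) for a list of subunit names.

    e.g. ['alpha', 'alpha', 'beta', 'beta'] -> ('hetero-oligomer', 'alpha2beta2')
    """
    if len(chains) < 2:
        raise ValueError("an oligomeric protein has at least two chains")
    counts = Counter(chains)
    kind = "homo-oligomer" if len(counts) == 1 else "hetero-oligomer"
    # Subunit labels are sorted so the same composition always gives the same string.
    formula = "".join(f"{name}{n}" for name, n in sorted(counts.items()))
    return kind, formula
```

Called with `["α", "α", "β", "β"]` it returns `("hetero-oligomer", "α2β2")`, matching the hemoglobin description above.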

Forces Controlling Protein Structure


Hydrogen Bonding

Polypeptides contain numerous hydrogen-bond donors and acceptors, both in their backbone and in the R-groups of the amino acids. The aqueous environment in which proteins are found also supplies ample H-bond donors and acceptors in the form of water molecules. H-bonding, therefore, occurs not only within and between polypeptide chains but also with the surrounding aqueous medium.

Hydrophobic Forces

Proteins are composed of amino acids that contain either hydrophilic or hydrophobic R-groups. It is the nature of the interaction of the different R-groups with the aqueous environment that plays the major role in shaping protein structure. The spontaneous folded state of globular proteins is a reflection of a balance between the opposing energetics of H-bonding between hydrophilic R-groups and the aqueous environment and the repulsion from the aqueous environment by the hydrophobic R-groups. The hydrophobicity of certain amino acid R-groups tends to drive them away from the exterior of proteins and into the interior. This driving force restricts the available conformations into which a protein may fold.

Electrostatic Forces

Electrostatic forces are mainly of three types: charge-charge, charge-dipole and dipole-dipole. Typical charge-charge interactions that favor protein folding are those between oppositely charged R-groups, such as K or R with D or E. A substantial component of the energy involved in protein folding comes from charge-dipole interactions, that is, the interaction of ionized R-groups of amino acids with the dipole of the water molecule. The slight dipole moment that exists in the polar R-groups of amino acids also influences their interaction with water. It is, therefore, understandable that the majority of the amino acids found on the exterior surfaces of globular proteins contain charged or polar R-groups.
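Whether an ionizable R-group actually carries a charge depends on the pH relative to its pK. The sketch below estimates the net side-chain charge with the Henderson-Hasselbalch equation, using the side-chain pK values from this article's amino acid table. Simplifying assumptions: the terminal amino and carboxyl groups are ignored, cysteine is ignored (no pK is listed for it in the table), and each group titrates independently. The function name is hypothetical.

```python
# Side-chain pK values from the amino acid table in this article.
BASIC_PK = {"K": 10.5, "R": 12.5, "H": 6.0}    # positive when protonated
ACIDIC_PK = {"D": 3.9, "E": 4.3, "Y": 10.1}    # negative when deprotonated


def side_chain_charge(sequence: str, ph: float) -> float:
    """Estimate the net charge of the ionizable side chains at a given pH.

    Henderson-Hasselbalch: a basic group is protonated (+1) with fraction
    1 / (1 + 10**(pH - pK)); an acidic group is deprotonated (-1) with
    fraction 1 / (1 + 10**(pK - pH)). Termini and cysteine are ignored.
    """
    charge = 0.0
    for aa in sequence.upper():
        if aa in BASIC_PK:
            charge += 1.0 / (1.0 + 10 ** (ph - BASIC_PK[aa]))
        elif aa in ACIDIC_PK:
            charge -= 1.0 / (1.0 + 10 ** (ACIDIC_PK[aa] - ph))
    return charge
```

At pH 7 a K and a D side chain are each almost fully ionized, so a K/D pair nets close to zero charge, the kind of oppositely charged pairing described above. At a pH equal to a group's pK, that group is half ionized.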

van der Waals Forces

There are both attractive and repulsive van der Waals forces that control protein folding. Attractive van der Waals forces involve the interactions among induced dipoles that arise from fluctuations in the charge densities that occur between adjacent uncharged non-bonded atoms. Repulsive van der Waals forces involve the interactions that occur when uncharged non-bonded atoms come very close together but do not induce dipoles. The repulsion is the result of the electron-electron repulsion that occurs as two clouds of electrons begin to overlap. Although van der Waals forces are extremely weak relative to the other forces governing conformation, it is the huge number of such interactions in large protein molecules that makes them significant to the folding of proteins.
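The text does not name a model for these forces. A standard, widely used one is the Lennard-Jones 12-6 potential (a swapped-in choice here, not something the article specifies), in which a steep repulsive term dominates at short range and a weaker attractive term dominates farther out. The parameters `epsilon` (well depth) and `sigma` (separation where the energy is zero) are placeholders; the values in the usage note are arbitrary reduced units, not parameters for any real atom pair.

```python
def lennard_jones(r: float, epsilon: float, sigma: float) -> float:
    """Lennard-Jones 12-6 pair energy at separation r.

    The r**-12 term models repulsion as electron clouds overlap; the
    r**-6 term models attraction between induced dipoles.
    """
    if r <= 0:
        raise ValueError("separation must be positive")
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)
```

With `epsilon = sigma = 1`, the energy is zero at r = 1, reaches its minimum of -1 at r = 2**(1/6) (about 1.12), is positive (repulsive) for r < 1 and small but negative (attractive) beyond the minimum. Because each pair contributes at most `epsilon`, the total only becomes significant when summed over the very many atom pairs in a folded protein, which is the point made above.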

Complex Protein Structures

Proteins also are found to be covalently conjugated with carbohydrates. These modifications occur following the synthesis (translation) of proteins and are, therefore, termed post-translational modifications. These forms of modification impart specialized functions upon the resultant proteins. Proteins covalently associated with carbohydrates are termed glycoproteins. Glycoproteins are of two classes, N-linked and O-linked, referring to the site of covalent attachment of the sugar moieties. N-linked sugars are attached to the amide nitrogen of the R-group of asparagine; O-linked sugars are attached to the hydroxyl groups of either serine or threonine and occasionally to the hydroxyl group of the modified amino acid, hydroxylysine.

Extremely important glycoproteins are found on the surface of erythrocytes. It is the variability in the composition of the carbohydrate portions of many glycoproteins and glycolipids of erythrocytes that determines blood group specificities. There are at least 100 blood group determinants, most of which are due to carbohydrate differences. The most common blood groups, A, B, and O, are specified by the activity of specific gene products that add distinct sugar groups to RBC membrane glycosphingolipids as well as to secreted glycoproteins.

Structural complexes in which protein is associated with lipid via noncovalent interactions are termed lipoproteins. Their major function in the body is to aid in the storage and transport of lipids and cholesterol.

Amino-Terminal Sequence Determination

Prior to sequencing peptides, it is necessary to eliminate disulfide bonds within and between peptides. Several different chemical reactions can be used to permit separation of peptide strands and to prevent protein conformations that depend on disulfide bonds. The most common treatments use either 2-mercaptoethanol or dithiothreitol (DTT), both of which reduce disulfide bonds. To prevent the disulfide bonds from re-forming, the peptides are then treated with iodoacetic acid, which alkylates the free sulfhydryls.

There are three major chemical techniques for sequencing peptides and proteins from the N-terminus: the Sanger, dansyl chloride and Edman techniques.

Sanger's reagent: This sequencing technique uses the compound 2,4-dinitrofluorobenzene (DNF), which reacts with the N-terminal residue under alkaline conditions. When the labeled peptide is then hydrolyzed, the N-terminal amino acid is released carrying a dinitrophenyl (DNP) group that imparts a yellow color. Separation of the modified amino acids (DNP derivatives) by electrophoresis and comparison with the migration of DNP-derivative standards allows the N-terminal amino acid to be identified.

Dansyl chloride: Like DNF, dansyl chloride reacts with the N-terminal residue under alkaline conditions. Analysis of the modified amino acids is carried out similarly to the Sanger method, except that the dansylated amino acids are detected by fluorescence. This gives the technique higher sensitivity than the Sanger method.

Edman degradation: The utility of the Edman degradation technique is that it allows additional amino acid sequence to be obtained from the N-terminus inward. Using this method it is possible to obtain the entire sequence of peptides. This method uses phenylisothiocyanate, which reacts with the N-terminal residue under alkaline conditions. The resulting phenylthiocarbamyl derivative of the N-terminal amino acid is then cleaved from the peptide in anhydrous acid, and the released residue rearranges to a phenylthiohydantoin (PTH) derivative. As in the Sanger and dansyl chloride methods, the N-terminal residue is tagged with an identifiable marker; the added advantage of the Edman process, however, is that the remainder of the peptide is left intact. The entire sequence of reactions can therefore be repeated to read the peptide's sequence one residue at a time. This process has subsequently been automated to allow rapid and efficient sequencing of even extremely small quantities of peptide.
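The practical difference between end-group labeling (Sanger, dansyl chloride) and Edman degradation can be shown as bookkeeping on a one-letter sequence. This is an idealized sketch with hypothetical function names, not a model of the chemistry: real Edman runs lose some yield every cycle, which limits how far a single run can read.

```python
from typing import List, Tuple


def end_group_analysis(peptide: str) -> str:
    """Sanger- or dansyl-style labeling: the labeled chain is hydrolyzed,
    so a single run identifies only the N-terminal residue."""
    if not peptide:
        raise ValueError("empty peptide")
    return peptide[0]


def edman_sequence(peptide: str, cycles: int) -> Tuple[List[str], str]:
    """Idealized Edman degradation.

    Each cycle tags and removes only the N-terminal residue (released as
    its PTH derivative) and leaves the rest of the chain intact, so the
    cycle can be repeated. Returns (residues identified, remaining chain).
    """
    identified: List[str] = []
    remaining = peptide
    for _ in range(cycles):
        if not remaining:
            break
        identified.append(remaining[0])  # PTH-amino acid identified this cycle
        remaining = remaining[1:]        # shortened peptide, otherwise intact
    return identified, remaining
```

For the made-up peptide "GAVLK", end-group analysis reports only "G", while three Edman cycles report G, A, V and leave "LK" ready for further cycles.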
Name  (Residue) 3-letter
code
Single
code
Relative
abundance
(%) E.C.
MW pK VdW volume
(ų)
Charged,
Polar,
Hydrophobic,
Neutral
Alanine ALA A 13.0 71 – 67 H
Arginine ARG R 5.3 156 12.5 148 C+
Asparagine ASN N 9.9 114 – 96 P
Aspartate ASP D 9.9 115 3.9 91 C-
Cysteine CYS C 1.8 103 – 86 P
Glutamate GLU E 10.8 129 4.3 109 C-
Glutamine GLN Q 10.8 128 – 114 P
Glycine GLY G 7.8 57 – 48 N
Histidine HIS H 0.7 137 6.0 118 P,C+
Isoleucine ILE I 4.4 113 – 124 H
Leucine LEU L 7.8 113 – 124 H
Lysine LYS K 7.0 128 10.5 135 C+
Methionine MET M 3.8 131 – 124 H
Phenylalanine PHE F 3.3 147 – 135 H
Proline PRO P 4.6 97 – 90 H
Serine SER S 6.0 87 – 73 P
Threonine THR T 4.6 101 – 93 P
Tryptophan TRP W 1.0 186 – 163 P
Tyrosine TYR Y 2.2 163 10.1 141 P
Valine VAL V 6.0 99 – 105 H
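The MW column lists approximate residue masses, i.e. the mass of each amino acid minus one water molecule. The average mass of a linear peptide can therefore be estimated by summing its residue masses and adding back one water (about 18 Da) for the free termini. A minimal sketch, assuming rounded average residue masses in daltons; the name `peptide_mass` is hypothetical:

```python
# Rounded average residue masses in daltons (free amino acid minus one H2O).
RESIDUE_MASS = {
    "A": 71, "R": 156, "N": 114, "D": 115, "C": 103,
    "E": 129, "Q": 128, "G": 57, "H": 137, "I": 113,
    "L": 113, "K": 128, "M": 131, "F": 147, "P": 97,
    "S": 87, "T": 101, "W": 186, "Y": 163, "V": 99,
}
WATER_MASS = 18  # one H2O restores the free N- and C-termini


def peptide_mass(sequence: str) -> int:
    """Approximate average mass (Da) of a linear peptide in one-letter code."""
    seq = sequence.upper()
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unknown residue codes: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


print(peptide_mass("GA"))  # Gly-Ala dipeptide: 57 + 71 + 18 = 146
```

Modifications such as disulfide bonds or phosphorylation are ignored, so the result is only a first estimate.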


The 20 naturally occurring amino acids can be divided into several groups based on their chemical properties. Important factors are charge, hydrophobicity or hydrophilicity, size and functional groups. The nature of the interaction of the different side chains with the aqueous environment plays a major role in shaping protein structure. Hydrophobic side chains tend to be buried in the interior of the protein, whereas hydrophilic side chains are exposed to the solvent.

Examples of hydrophobic residues are leucine, isoleucine, phenylalanine and valine, and to a lesser extent tyrosine, alanine and tryptophan. The charge of the side chains plays an important role in protein structures, since ionic bonds can stabilize protein structures, and an unpaired charge in the interior of a protein can disrupt structures. Charged residues are strongly hydrophilic and are usually found on the outside of proteins. Positively charged side chains are found in lysine and arginine, and in some cases in histidine. Negative charges are found in glutamate and aspartate. The rest of the amino acids have smaller, generally hydrophilic side chains with various functional groups. Serine and threonine have hydroxyl groups, and asparagine and glutamine have amide groups. Some amino acids have special properties: cysteine can form covalent disulfide bonds to other cysteines, proline is cyclic, and glycine is small and more flexible than the other amino acids.
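The grouping above can be turned into a crude sequence summary: count charged residues to estimate the net side-chain charge near neutral pH, and measure the fraction of residues classified as hydrophobic (H) in the table. This is a rough sketch under simplifying assumptions: histidine is counted as uncharged, terminal charges and environment-dependent pKa shifts are ignored, and the function name `side_chain_summary` is hypothetical.

```python
# Side-chain classes taken from the amino acid table. Histidine (P,C+) is
# counted as uncharged here, a simplifying assumption.
POSITIVE = set("KR")
NEGATIVE = set("DE")
HYDROPHOBIC = set("AILMFPV")  # residues classified H in the table


def side_chain_summary(sequence: str) -> dict:
    """Crude net side-chain charge and hydrophobic fraction of a one-letter sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    positive = sum(aa in POSITIVE for aa in seq)
    negative = sum(aa in NEGATIVE for aa in seq)
    hydrophobic = sum(aa in HYDROPHOBIC for aa in seq)
    return {
        "net_charge": positive - negative,
        "hydrophobic_fraction": hydrophobic / len(seq),
    }


print(side_chain_summary("MKLLVDE"))  # net charge -1; 4 of 7 residues hydrophobic
```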

The peptide bond

Two amino acids can be combined in a condensation reaction, in which the carboxyl group of one reacts with the amino group of the other to form a peptide bond, releasing a molecule of water. By repeating this reaction, long chains of residues (amino acids linked by peptide bonds) can be generated. This reaction is catalysed by the ribosome in a process known as translation. The peptide bond is planar because electrons from the carbonyl double bond are delocalized over the C′-N bond, giving it partial double-bond character. The rigid peptide dihedral angle ω (around the bond between C′ and N) is usually close to 180 degrees (trans); cis peptide bonds, with ω near 0 degrees, are rare and occur mostly before proline. The dihedral angles phi, φ (around the bond between N and Cα), and psi, ψ (around the bond between Cα and C′), can take a range of values. These angles are the main degrees of freedom of the protein backbone and control the protein's three-dimensional structure. Steric constraints restrict them to allowed ranges typical of particular secondary structure elements, which can be visualized in a Ramachandran plot. A few important bond lengths are given in the table below.
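To make the φ, ψ and ω definitions concrete, the following sketch computes a signed dihedral angle from the Cartesian coordinates of four consecutive atoms, using the common atan2 formulation. For φ the four atoms would be C′(i-1), N(i), Cα(i), C′(i); for ψ, N(i), Cα(i), C′(i), N(i+1); and for ω, Cα(i), C′(i), N(i+1), Cα(i+1), where C′ is the carbonyl carbon. The coordinates in the usage line are made-up toy points, not taken from a real structure.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy trans arrangement: the first and last atoms point in opposite directions.
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))  # 180.0
```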

Primary structure

The sequence of the different amino acids is called the primary structure of the peptide or protein. Counting of residues always starts at the N-terminal end (NH2 group), which is the end where the amino group is not involved in a peptide bond. The primary structure of a protein is determined by the gene corresponding to the protein. A specific sequence of nucleotides in DNA is transcribed into mRNA, which is read by the ribosome in a process called translation. The sequence of a protein is unique to that protein and defines its structure and function. The sequence of a protein can be determined by methods such as Edman degradation or tandem mass spectrometry. Often, however, it is read directly from the sequence of the gene using the genetic code. Post-translational modifications such as disulfide formation, phosphorylation and glycosylation are usually also considered part of the primary structure, and cannot be read from the gene.
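Reading a protein sequence from a gene with the genetic code is mechanical enough to sketch in a few lines. The version below assumes the standard genetic code, an mRNA string whose reading frame starts exactly at its first nucleotide, and no post-translational modifications (which cannot be read from the gene). The 64-letter string is the standard code written out in codon order UUU, UUC, UUA, UUG, UCU, and so on, with `*` marking stop codons; the function name `translate` is hypothetical.

```python
BASES = "UCAG"
# Standard genetic code, one letter per codon in the order UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate codons from the first nucleotide until a stop codon ('*')."""
    seq = mrna.upper().replace("T", "U")  # also accept DNA-style input
    protein = []
    for pos in range(0, len(seq) - 2, 3):  # a trailing partial codon is ignored
        aa = CODON_TABLE[seq[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGGCUUGGUAA"))  # MAW (Met-Ala-Trp, then stop)
```

The protein is produced from its N-terminus onward, which matches the convention of numbering residues from the N-terminal end.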

Secondary structure

By building models of peptides using known information about bond lengths and angles, the first elements of secondary structure, the alpha helix and the beta sheet, were suggested in 1951 by Linus Pauling and coworkers. Both the alpha helix and the beta sheet represent a way of saturating all the hydrogen bond donors and acceptors in the peptide backbone: in the alpha helix, for example, every backbone N-H group donates a hydrogen bond to the backbone C=O group of the residue four positions earlier. These secondary structure elements depend only on properties that all the residues have in common, which explains why they occur frequently in most proteins. Since then, other elements of secondary structure have been discovered, such as various loops and other forms of helices. The part of the backbone that is not in a regular secondary structure is said to be random coil. Each of these two secondary structure elements has a regular geometry, meaning it is constrained to specific values of the dihedral angles φ and ψ. Thus each can be found in a specific region of the Ramachandran plot.
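Because each regular element occupies its own region of the Ramachandran plot, a residue's (φ, ψ) pair can be crudely labelled by its distance to a representative point for each region. The sketch below uses rounded textbook centres (alpha helix near φ = -57°, ψ = -47°; beta strand near φ = -120°, ψ = +130°) and an arbitrary 60° cut-off; both are illustrative assumptions. Angles are compared on the periodic plot, so -180° and +180° count as the same value. Real secondary structure assignment programs also use hydrogen-bond patterns rather than angles alone.

```python
import math

# Rounded, illustrative (phi, psi) centres in degrees; these values and the
# cut-off below are assumptions, not a standard definition.
REGION_CENTRES = {
    "alpha helix": (-57.0, -47.0),
    "beta strand": (-120.0, 130.0),
}


def angle_diff(a: float, b: float) -> float:
    """Smallest signed difference a - b between two angles, in degrees."""
    return (a - b + 180.0) % 360.0 - 180.0


def classify_phi_psi(phi: float, psi: float, cutoff: float = 60.0) -> str:
    """Label a residue by the nearest region centre on the periodic Ramachandran plot."""
    best_label, best_dist = "other", math.inf
    for label, (phi0, psi0) in REGION_CENTRES.items():
        dist = math.hypot(angle_diff(phi, phi0), angle_diff(psi, psi0))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= cutoff else "other"


print(classify_phi_psi(-60, -45))   # alpha helix
print(classify_phi_psi(-130, 140))  # beta strand
print(classify_phi_psi(60, -120))   # other
```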

[Figure: several alternative representations of the same alpha helix]

Turns, loops and a few other secondary structure elements, such as the 3₁₀ helix, complete the picture. We now have enough pieces to assemble a complete protein and display its typical tertiary structure.

Tertiary structure

The elements of secondary structure are usually folded into a compact shape using a variety of loops and turns. The formation of tertiary structure is usually driven by the burial of hydrophobic residues, but other interactions such as hydrogen bonding, ionic interactions and disulfide bonds can also stabilize the tertiary structure. The tertiary structure encompasses all the interactions, mostly noncovalent, that are not considered secondary structure; it defines the overall fold of the protein and is usually indispensable for the protein's function.

Quaternary structure

The quaternary structure is the interaction between several polypeptide chains. The individual chains are called subunits. The individual subunits are usually not covalently connected, but might be connected by a disulfide bond. Not all proteins have quaternary structure; many are functional as monomers. The quaternary structure is stabilized by the same range of interactions as the tertiary structure. Complexes of two or more polypeptides (i.e. multiple subunits) are called multimers. Specifically, a multimer is called a dimer if it contains two subunits, a trimer if it contains three subunits, and a tetramer if it contains four subunits. The subunits are usually related to one another by symmetry axes, such as a twofold axis in a dimer. Multimers made up of identical subunits may be referred to with the prefix "homo-" (e.g. a homotetramer), and those made up of different subunits with the prefix "hetero-" (e.g. a heterotetramer, such as the two alpha and two beta chains of hemoglobin).
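The naming rules in this paragraph (count the subunits to get the size, then check whether they are identical to choose "homo-" or "hetero-") can be expressed directly. In this sketch each subunit is represented only by a hypothetical chain-identity label such as "alpha" or "beta", symmetry is ignored, and the function name `multimer_name` is hypothetical.

```python
from typing import Sequence

SIZE_NAMES = {2: "dimer", 3: "trimer", 4: "tetramer"}


def multimer_name(subunits: Sequence[str]) -> str:
    """Name a complex from a list of subunit identities, e.g. ["alpha", "beta"]."""
    n = len(subunits)
    if n == 0:
        raise ValueError("a complex needs at least one subunit")
    if n == 1:
        return "monomer"
    prefix = "homo" if len(set(subunits)) == 1 else "hetero"
    if n in SIZE_NAMES:
        return prefix + SIZE_NAMES[n]
    return f"{prefix}multimer ({n} subunits)"


print(multimer_name(["alpha", "alpha", "beta", "beta"]))  # heterotetramer, as in hemoglobin
```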

Side chain conformation

The atoms along the side chain are named with Greek letters in Greek alphabetical order: α, β, γ, δ, ε and so on. Cα refers to the carbon atom closest to the carbonyl group of that amino acid (it is bonded to the carbonyl carbon), Cβ to the second closest, and so on. The Cα is usually considered part of the backbone. The dihedral angles around the bonds between these atoms are named χ1, χ2, χ3, etc. For example, the first and second carbon atoms of lysine counting from the backbone are named Cα and Cβ, and the dihedral angle around the Cα-Cβ bond is named χ1. Side chains can be in different conformations called gauche(-), trans and gauche(+). Side chains generally tend to adopt a staggered conformation around χ2, driven by the minimization of the overlap between the electron orbitals of the hydrogen atoms.
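Because the staggered conformations sit roughly 120° apart, a measured χ angle can be assigned to the nearest staggered position. The sketch below assumes the convention gauche(+) ≈ +60°, trans ≈ 180° and gauche(-) ≈ -60°, with bin edges halfway between them; some papers and programs use the opposite sign for the gauche labels, so the convention should be checked before comparing results. The function name `chi_rotamer` is hypothetical.

```python
def chi_rotamer(chi: float) -> str:
    """Assign a side-chain dihedral angle (degrees) to the nearest staggered position.

    Assumed convention: gauche(+) ~ +60, trans ~ 180, gauche(-) ~ -60.
    """
    chi = (chi + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    if 0.0 <= chi < 120.0:
        return "gauche(+)"
    if -120.0 <= chi < 0.0:
        return "gauche(-)"
    return "trans"  # |chi| >= 120, i.e. within 60 degrees of 180


for angle in (65.0, -70.0, 178.0):
    print(angle, chi_rotamer(angle))
```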

Domains, motifs, and folds in protein structure

Many proteins are organized into several units. A structural domain is an element of the protein's overall structure that is self-stabilizing and often folds independently of the rest of the protein chain. Many domains are not unique to the protein products of one gene or one gene family but instead appear in a variety of proteins. Domains are often named and singled out because they figure prominently in the biological function of the protein they belong to; for example, the "calcium-binding domain of calmodulin". Because they are self-stabilizing, domains can be "swapped" by genetic engineering between one protein and another to make chimeras. A motif in this sense refers to a small specific combination of secondary structural elements (such as the helix-turn-helix). These elements are often called supersecondary structures. A fold refers to a global type of arrangement, like a helix bundle or a beta barrel. Structural motifs usually consist of just a few elements; the helix-turn-helix, for example, has just three. Note that while the spatial sequence of elements is the same in all instances of a motif, they may be encoded in any order within the underlying gene. Protein structural motifs often include loops of variable length and unspecified structure, which in effect create the "slack" necessary to bring together in space two elements that are not encoded by immediately adjacent DNA sequences in a gene. Note also that even when two genes encode the secondary structural elements of a motif in the same order, they may nevertheless specify somewhat different sequences of amino acids. This is true not only because of the complicated relationship between tertiary and primary structure, but also because the size of the elements varies from one protein to the next. Although about 100,000 different proteins are expressed in eukaryotic systems, there are far fewer distinct domains, structural motifs and folds. This is partly a consequence of evolution, since genes or parts of genes can be duplicated or moved around within the genome. This means that, for example, a protein domain might be moved from one protein to another, giving the protein a new function. As a result of these mechanisms, pathways and mechanisms tend to be reused in several different proteins.

Protein folding


The process by which the higher-level structures form is called protein folding and is a consequence of the primary structure. A given polypeptide may have more than one stable folded conformation, each of which could have a different biological activity, but usually only one conformation is considered to be the active, or native, conformation.

Structure classification


Several methods have been developed for the structural classification of proteins. These seek to classify the data in the Protein Data Bank in a structured order. Several databases exist which classify proteins using different methods; SCOP (Structural Classification of Proteins), CATH (Class, Architecture, Topology, Homologous superfamily) and FSSP (Families of Structurally Similar Proteins) are the largest ones. Their methods are, respectively, purely manual, a combination of manual and automated, and purely automated. Work is being done to better integrate the current data. The classification is consistent between SCOP, CATH and FSSP for the majority of proteins that have been classified, but there are still some differences and inconsistencies.

Protein structure determination


Around 90% of the protein structures available in the Protein Data Bank have been determined by X-ray crystallography. This method measures the 3D density distribution of electrons in the protein (in the crystallized state), from which the 3D coordinates of all the atoms can be inferred to a certain resolution. Roughly 9% of the known protein structures have been obtained by nuclear magnetic resonance (NMR) techniques, which can also be used to determine secondary structure. Aspects of the secondary structure as a whole can also be determined via other biochemical techniques such as circular dichroism. Secondary structure can also be predicted with a high degree of accuracy (see next section). Cryo-electron microscopy has recently become a means of determining protein structures to high resolution (better than 5 ångströms, or 0.5 nanometers) and is anticipated to increase in power as a tool for high-resolution work in the next decade. This technique is particularly valuable for researchers working with very large protein complexes such as virus coat proteins and amyloid fibers.

A rough guide to the resolution of protein structures
Resolution (Å) | Meaning
>4.0 | Individual coordinates meaningless.
3.0-4.0 | Fold possibly correct, but errors are very likely. Many side chains placed with the wrong rotamer.
2.5-3.0 | Fold likely correct, except that some surface loops might be mismodelled. Several long, thin side chains (Lys, Glu, Gln, etc.) and small side chains (Ser, Val, Thr, etc.) likely to have wrong rotamers.
2.0-2.5 | As 2.5-3.0, but the number of side chains in the wrong rotamer is considerably smaller. Many small errors can normally be detected. Fold normally correct and number of errors in surface loops is small. Water molecules and small ligands become visible.
1.5-2.0 | Few residues have the wrong rotamer. Many small errors can normally be detected. Folds are extremely rarely incorrect, even in surface loops.
0.5-1.5 | In general, structures have almost no errors at this resolution. Rotamer libraries and geometry studies are made from these structures.
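The resolution guide above amounts to a lookup on the reported resolution. A small sketch, assuming that a value exactly on a boundary (for example 2.5 Å) belongs to the better of the two adjacent bins and that anything at or below 1.5 Å falls into the best bin; the summaries are abbreviated from the table, and the names `RESOLUTION_GUIDE` and `interpret_resolution` are hypothetical.

```python
# (upper bound in angstroms, abbreviated meaning), best resolution first.
RESOLUTION_GUIDE = [
    (1.5, "almost no errors; used for rotamer libraries and geometry studies"),
    (2.0, "few wrong rotamers; folds extremely rarely incorrect"),
    (2.5, "fewer wrong rotamers; waters and small ligands visible"),
    (3.0, "fold likely correct; some surface loops may be mismodelled"),
    (4.0, "fold possibly correct; errors very likely"),
]


def interpret_resolution(resolution: float) -> str:
    """Return the rough meaning of a crystal structure's resolution in angstroms."""
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    for upper, meaning in RESOLUTION_GUIDE:
        if resolution <= upper:
            return meaning
    return "individual coordinates meaningless"


print(interpret_resolution(2.2))  # fewer wrong rotamers; waters and small ligands visible
```

Note that a smaller number means a better, more detailed structure.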


Computational prediction of protein structure


Determining the sequence of a protein is much easier than determining its structure. However, the structure of a protein gives much more insight into its function than its sequence does. Therefore, a number of methods for the computational prediction of protein structure from sequence have been proposed. Ab initio prediction methods use just the sequence of the protein. Threading, also known as fold recognition, uses existing protein structures as templates for sequences that share a fold with a known structure but have no detectable sequence homology to it. Homology modeling aims to build a reliable 3D model of a protein of unknown structure from one or more related proteins of known structure. Recent progress and challenges in protein structure prediction have been reviewed by Zhang.

Rosetta@home is a distributed computing project that tries to predict the structures of proteins through massive sampling on thousands of home computers. Foldit is a video game designed to use human pattern recognition and puzzle-solving abilities to improve existing software.

Software

There are many available software packages, such as the free web-based STING, used to visualize and analyze protein structures. Another example is the web server that can visualize the quality of a protein-protein alignment in 3D and map sequence feature annotations, such as the underlying intron/exon structure, onto a protein structure.

Several packages, such as Quantum Pharmaceuticals software, can be used to predict conformational changes of proteins and their influence on protein function.

Several methods have been developed to compare the structures of different proteins; see structural alignment, which establishes equivalences between two or more structures based on their shape and three-dimensional structure.
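Structural comparisons usually report how far matched atoms lie from each other once two structures have been superimposed, most often as a root-mean-square deviation (RMSD), for example over Cα atoms. The sketch below computes only that final step: it assumes the structures are already optimally superimposed and that the atom correspondence is known, which are the hard parts that structural alignment methods actually solve (no optimal rotation such as the Kabsch algorithm is performed here). The function name `rmsd` is illustrative.

```python
import math
from typing import Sequence

Point = Sequence[float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root-mean-square deviation between two equally long lists of matched atoms.

    Assumes the structures are already superimposed and atoms are paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(total / len(coords_a))


# Two toy "structures" whose atoms are each displaced by 1 unit along z.
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))  # 1.0
```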

Computational tools are also frequently employed to check experimental and theoretical models of protein structures for errors (examples include NQ-Flipper, https://flipper.services.came.sbg.ac.at/).

Software for molecular mechanics modeling is useful for building and simulating protein models.


External links

  • super-secondary structure protein database
  • (Structural Prediction for pRotein fOlding Utility System)
  • ProSA-web (https://prosa.services.came.sbg.ac.at/prosa.php): web service for the recognition of errors in experimentally or theoretically determined protein structures
  • NQ-Flipper (https://flipper.services.came.sbg.ac.at/): checks for unfavorable rotamers of Asn and Gln residues in protein structures
  • That check nearly 200 aspects of protein structure, such as packing, geometry, unfavourable rotamers in general or for Asn, Gln and His especially, strange water molecules, backbone conformations, atom nomenclature, symmetry parameters, etc.
  • An interactive, fully free course explaining many of the aspects discussed in this wiki entry.