Protein

Proteins (ˈproʊtiːnz) are biochemical compounds consisting of one or more polypeptides, typically folded into a globular or fibrous form, that facilitate a biological function. A polypeptide is a single linear polymer chain of amino acids bonded together by peptide bonds between the carboxyl and amino groups of adjacent amino acid residues. The sequence of amino acids in a protein is defined by the sequence of a gene, which is encoded in the genetic code. In general, the genetic code specifies 20 standard amino acids; however, in certain organisms the genetic code can include selenocysteine and, in certain archaea, pyrrolysine. Shortly after or even during synthesis, the residues in a protein are often chemically modified by posttranslational modification, which alters the physical and chemical properties, folding, stability, activity, and ultimately the function of the protein. Some proteins have non-peptide groups attached, which can be called prosthetic groups or cofactors. Proteins can also work together to achieve a particular function, and they often associate to form stable protein complexes.
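
The relationship between a gene's sequence and a protein's amino acid sequence can be illustrated with a short sketch. The Python example below is a simplified model, not a description of the cellular machinery: it assumes the standard genetic code, an uppercase mRNA string containing only A, C, G, and U, and translation that begins at the first AUG and stops at the first stop codon. The names `CODON_TABLE` and `translate` are hypothetical and chosen only for this illustration.

```python
from itertools import product

# Standard genetic code in the conventional U, C, A, G ordering of each
# codon position. One-letter amino acid codes; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 4 x 4 x 4 = 64 codons to its amino acid (or stop signal).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate mRNA from its first AUG up to the first stop codon.

    Assumes an uppercase string of A, C, G, U (illustrative simplification).
    Returns the peptide as one-letter codes, written N-terminus first.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    residues = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break  # stop codon ends the chain
        residues.append(aa)
    return "".join(residues)


if __name__ == "__main__":
    print(len(CODON_TABLE))               # number of possible codons
    print(translate("GGAUGGCUUGGUAAGC"))  # AUG GCU UGG UAA -> M, A, W, stop
```

Because 64 codons map onto only 20 standard amino acids plus stop signals, several codons necessarily share the same amino acid; in this table, for instance, four different codons (GCU, GCC, GCA, GCG) all map to alanine. Selenocysteine and pyrrolysine are deliberately left out of the sketch, since their incorporation depends on context that a plain lookup table does not capture.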

One of the most distinguishing features of polypeptides is their ability to fold into a globular state. The extent to which proteins fold into a defined structure varies widely. Some proteins fold into a highly rigid structure with small fluctuations and are therefore considered to have a single structure. Other proteins undergo large rearrangements from one conformation to another. This conformational change is often associated with a signaling event. Thus, the structure of a protein serves as a medium through which to regulate either the function of a protein or the activity of an enzyme. Not all proteins require a folding process in order to function, as some function in an unfolded state.

Like other biological macromolecules such as polysaccharides and nucleic acids, proteins are essential parts of organisms and participate in virtually every process within cells. Many proteins are enzymes that catalyze biochemical reactions and are vital to metabolism. Proteins also have structural or mechanical functions, such as actin and myosin in muscle and the proteins in the cytoskeleton, which form a system of scaffolding that maintains cell shape. Other proteins are important in cell signaling, immune responses, cell adhesion, and the cell cycle. Proteins are also necessary in animals' diets, since animals cannot synthesize all the amino acids they need and must obtain essential amino acids from food. Through the process of digestion, animals break down ingested protein into free amino acids that are then used in metabolism.

Proteins were first described by the Dutch chemist Gerardus Johannes Mulder and named by the Swedish chemist Jöns Jacob Berzelius in 1838. Early nutritional scientists such as the German Carl von Voit believed that protein was the most important nutrient for maintaining the structure of the body, because it was generally believed that "flesh makes flesh." The central role of proteins as enzymes in living organisms was, however, not fully appreciated until 1926, when James B. Sumner showed that the enzyme urease was in fact a protein. The first protein to be sequenced was insulin, by Frederick Sanger, who won the Nobel Prize for this achievement in 1958. The first protein structures to be solved were hemoglobin and myoglobin, by Max Perutz and Sir John Cowdery Kendrew, respectively, in 1958. The three-dimensional structures of both proteins were first determined by X-ray diffraction analysis; Perutz and Kendrew shared the 1962 Nobel Prize in Chemistry for these discoveries.

Proteins may be purified from other cellular components using a variety of techniques such as ultracentrifugation, precipitation, electrophoresis, and chromatography; the advent of genetic engineering has made possible a number of methods to facilitate purification. Methods commonly used to study protein structure and function include immunohistochemistry, site-directed mutagenesis, nuclear magnetic resonance, and mass spectrometry. Distributed computing is a relatively new tool that researchers are using to examine the notoriously complex interactions that govern protein folding. The statistical analysis techniques employed to calculate a protein's probable tertiary structure from its amino acid sequence (primary structure) are well suited to the distributed computing environment, which has made this otherwise prohibitively expensive and time-consuming problem significantly more manageable.

Biochemistry


Most proteins consist of linear polymers built from series of up to 20 different L-α-amino acids. All proteinogenic amino acids possess common structural features, including an α-carbon to which an amino group, a carboxyl group, and a variable side chain are bonded. Only proline differs from this basic structure, as its side chain forms an unusual ring with the N-end amine group, which forces the CO–NH amide moiety into a fixed conformation. The side chains of the standard amino acids, detailed in the list of standard amino acids, have a great variety of chemical structures and properties; it is the combined effect of all of the amino acid side chains in a protein that ultimately determines its three-dimensional structure and its chemical reactivity.

The amino acids in a polypeptide chain are linked by peptide bonds. Once linked in the protein chain, an individual amino acid is called a residue, and the linked series of carbon, nitrogen, and oxygen atoms is known as the main chain or protein backbone.

The peptide bond has two resonance forms that contribute some double-bond character and inhibit rotation around its axis, so that the alpha carbons are roughly coplanar. The other two dihedral angles in the peptide bond determine the local shape assumed by the protein backbone. The end of the protein with a free carboxyl group is known as the C-terminus or carboxy terminus, whereas the end with a free amino group is known as the N-terminus or amino terminus.

The words protein, polypeptide, and peptide are somewhat ambiguous and can overlap in meaning. Protein is generally used to refer to the complete biological molecule in a stable conformation, whereas peptide is generally reserved for short amino acid oligomers that often lack a stable three-dimensional structure. However, the boundary between the two is not well defined and usually lies near 20–30 residues. Polypeptide can refer to any single linear chain of amino acids, usually regardless of length, but often implies an absence of a defined conformation.

Synthesis





Proteins are assembled from amino acids using information encoded in gene
Gene
A gene is a molecular unit of heredity of a living organism. It is a name given to some stretches of DNA and RNA that code for a type of protein or for an RNA chain that has a function in the organism. Living beings depend on genes, as they specify all proteins and functional RNA chains...

s. Each protein has its own unique amino acid sequence that is specified by the nucleotide
Nucleotide
Nucleotides are molecules that, when joined together, make up the structural units of RNA and DNA. In addition, nucleotides participate in cellular signaling , and are incorporated into important cofactors of enzymatic reactions...

 sequence of the gene encoding this protein. The genetic code
Genetic code
The genetic code is the set of rules by which information encoded in genetic material is translated into proteins by living cells....

 is a set of three-nucleotide sets called codons and each three-nucleotide combination designates an amino acid, for example AUG (adenine
Adenine
Adenine is a nucleobase with a variety of roles in biochemistry including cellular respiration, in the form of both the energy-rich adenosine triphosphate and the cofactors nicotinamide adenine dinucleotide and flavin adenine dinucleotide , and protein synthesis, as a chemical component of DNA...

-uracil
Uracil
Uracil is one of the four nucleobases in the nucleic acid of RNA that are represented by the letters A, G, C and U. The others are adenine, cytosine, and guanine. In RNA, uracil binds to adenine via two hydrogen bonds. In DNA, the uracil nucleobase is replaced by thymine.Uracil is a common and...

-guanine
Guanine
Guanine is one of the four main nucleobases found in the nucleic acids DNA and RNA, the others being adenine, cytosine, and thymine . In DNA, guanine is paired with cytosine. With the formula C5H5N5O, guanine is a derivative of purine, consisting of a fused pyrimidine-imidazole ring system with...

) is the code for methionine
Methionine
Methionine is an α-amino acid with the chemical formula HO2CCHCH2CH2SCH3. This essential amino acid is classified as nonpolar. This amino-acid is coded by the codon AUG, also known as the initiation codon, since it indicates mRNA's coding region where translation into protein...

. Because DNA
DNA
Deoxyribonucleic acid is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms . The DNA segments that carry this genetic information are called genes, but other DNA sequences have structural purposes, or are involved in...

 contains four nucleotides, the total number of possible codons is 64; hence, there is some redundancy in the genetic code, with some amino acids specified by more than one codon. Genes encoded in DNA are first transcribed
Transcription (genetics)
Transcription is the process of creating a complementary RNA copy of a sequence of DNA. Both RNA and DNA are nucleic acids, which use base pairs of nucleotides as a complementary language that can be converted back and forth from DNA to RNA by the action of the correct enzymes...

 into pre-messenger RNA
Messenger RNA
Messenger RNA is a molecule of RNA encoding a chemical "blueprint" for a protein product. mRNA is transcribed from a DNA template, and carries coding information to the sites of protein synthesis: the ribosomes. Here, the nucleic acid polymer is translated into a polymer of amino acids: a protein...

 (mRNA) by proteins such as RNA polymerase
RNA polymerase
RNA polymerase is an enzyme that produces RNA. In cells, RNAP is needed for constructing RNA chains from DNA genes as templates, a process called transcription. RNA polymerase enzymes are essential to life and are found in all organisms and many viruses...

. Most organisms then process the pre-mRNA (also known as a primary transcript) using various forms of post-transcriptional modification
Post-transcriptional modification
Post-transcriptional modification is a process in cell biology by which, in eukaryotic cells, primary transcript RNA is converted into mature RNA. A notable example is the conversion of precursor messenger RNA into mature messenger RNA , which includes splicing and occurs prior to protein synthesis...

 to form the mature mRNA, which is then used as a template for protein synthesis by the ribosome
Ribosome
A ribosome is a component of cells that assembles the twenty specific amino acid molecules to form the particular protein molecule determined by the nucleotide sequence of an RNA molecule....

. In prokaryote
Prokaryote
The prokaryotes are a group of organisms that lack a cell nucleus , or any other membrane-bound organelles. The organisms that have a cell nucleus are called eukaryotes. Most prokaryotes are unicellular, but a few such as myxobacteria have multicellular stages in their life cycles...

s the mRNA may either be used as soon as it is produced, or be bound by a ribosome after having moved away from the nucleoid
Nucleoid
The nucleoid is an irregularly-shaped region within the cell of a prokaryote that contains all or most of the genetic material. In contrast to the nucleus of a eukaryotic cell, it is not surrounded by a nuclear membrane. The genome of prokaryotic organisms generally is a circular, double-stranded...

. In contrast, eukaryote
Eukaryote
A eukaryote is an organism whose cells contain complex structures enclosed within membranes. Eukaryotes may more formally be referred to as the taxon Eukarya or Eukaryota. The defining membrane-bound structure that sets eukaryotic cells apart from prokaryotic cells is the nucleus, or nuclear...

s make mRNA in the cell nucleus
Cell nucleus
In cell biology, the nucleus is a membrane-enclosed organelle found in eukaryotic cells. It contains most of the cell's genetic material, organized as multiple long linear DNA molecules in complex with a large variety of proteins, such as histones, to form chromosomes. The genes within these...

 and then translocate it across the nuclear membrane into the cytoplasm
Cytoplasm
The cytoplasm is a small gel-like substance residing between the cell membrane holding all the cell's internal sub-structures , except for the nucleus. All the contents of the cells of prokaryote organisms are contained within the cytoplasm...

, where protein synthesis
Protein biosynthesis
Protein biosynthesis is the process in which cells build or manufacture proteins. The term is sometimes used to refer only to protein translation but more often it refers to a multi-step process, beginning with amino acid synthesis and transcription of nuclear DNA into messenger RNA, which is then...

 then takes place. The rate of protein synthesis is higher in prokaryotes than eukaryotes and can reach up to 20 amino acids per second.
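
The codon arithmetic above can be checked with a minimal Python sketch. It simply enumerates every ordered triplet of the four RNA bases; the variable names are illustrative and do not come from any library.

```python
from itertools import product

BASES = "ACGU"  # the four RNA nucleotides (DNA uses T in place of U)

# Each codon is an ordered triplet of bases, so there are 4 * 4 * 4 of them.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
n_codons = len(codons)

# With only twenty standard amino acids to encode, 64 codons cannot map
# one-to-one: some amino acids must be specified by more than one codon.
N_AMINO_ACIDS = 20
is_redundant = n_codons > N_AMINO_ACIDS

print(n_codons, is_redundant, "AUG" in codons)
```

The count depends only on the alphabet size and the codon length, which is why the same figure holds whether the triplets are written in DNA or RNA letters.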

The process of synthesizing a protein from an mRNA template is known as translation
Translation (genetics)
In molecular biology and genetics, translation is the third stage of protein biosynthesis . In translation, messenger RNA produced by transcription is decoded by the ribosome to produce a specific amino acid chain, or polypeptide, that will later fold into an active protein...

. The mRNA is loaded onto the ribosome and is read three nucleotides at a time by matching each codon to its base pair
Base pair
In molecular biology and genetics, the linking between two nitrogenous bases on opposite complementary DNA or certain types of RNA strands that are connected via hydrogen bonds is called a base pair...

ing anticodon located on a transfer RNA
Transfer RNA
Transfer RNA is an adaptor molecule composed of RNA, typically 73 to 93 nucleotides in length, that is used in biology to bridge the three-letter genetic code in messenger RNA with the twenty-letter code of amino acids in proteins. The role of tRNA as an adaptor is best understood by...

 molecule, which carries the amino acid corresponding to the codon it recognizes. The enzyme aminoacyl tRNA synthetase
Aminoacyl tRNA synthetase
An aminoacyl tRNA synthetase is an enzyme that catalyzes the esterification of a specific amino acid or its precursor to one of all its compatible cognate tRNAs to form an aminoacyl-tRNA. This is sometimes called "charging" the tRNA with the amino acid...

 "charges" the tRNA molecules with the correct amino acids. The growing polypeptide is often termed the nascent chain. Proteins are always biosynthesized from N-terminus to C-terminus.

The size of a synthesized protein can be measured by the number of amino acids it contains and by its total molecular mass
Molecular mass
The molecular mass of a substance is the mass of one molecule of that substance, in unified atomic mass unit u...

, which is normally reported in units of daltons (synonymous with atomic mass unit
Atomic mass unit
The unified atomic mass unit or dalton is a unit that is used for indicating mass on an atomic or molecular scale. It is defined as one twelfth of the rest mass of an unbound neutral atom of carbon-12 in its nuclear and electronic ground state, and has a value of...

s), or the derivative unit kilodalton (kDa). Yeast
Yeast
Yeasts are eukaryotic micro-organisms classified in the kingdom Fungi, with 1,500 species currently described estimated to be only 1% of all fungal species. Most reproduce asexually by mitosis, and many do so by an asymmetric division process called budding...

 proteins are on average 466 amino acids long and 53 kDa in mass. The largest known proteins are the titin
Titin
Titin , also known as connectin, is a protein that in humans is encoded by the TTN gene. Titin is a giant protein that functions as a molecular spring which is responsible for the passive elasticity of muscle. It is composed of 244 individually folded protein domains connected by unstructured...

s, a component of the muscle
Muscle
Muscle is a contractile tissue of animals and is derived from the mesodermal layer of embryonic germ cells. Muscle cells contain contractile filaments that move past each other and change the size of the cell. They are classified as skeletal, cardiac, or smooth muscles. Their function is to...

 sarcomere
Sarcomere
A sarcomere is the basic unit of a muscle. Muscles are composed of tubular muscle cells . Muscle cells are composed of tubular myofibrils. Myofibrils are composed of repeating sections of sarcomeres, which appear under the microscope as dark and light bands...

, with a molecular mass of almost 3,000 kDa and a total length of almost 27,000 amino acids.
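
As a rough consistency check on the figures above, the following Python sketch divides each quoted mass by the corresponding chain length to get a mean mass per residue. The function name is hypothetical, and the inputs are only the rounded values stated in the text, so the results are approximate.

```python
def mean_residue_mass_da(mass_kda, n_residues):
    """Mean mass per amino acid residue in daltons, from total mass in kDa."""
    return mass_kda * 1000.0 / n_residues

# Average yeast protein: 466 residues, 53 kDa (figures from the text).
yeast = mean_residue_mass_da(53, 466)

# Titin: almost 27,000 residues, almost 3,000 kDa (rounded figures from the text).
titin = mean_residue_mass_da(3000, 27000)

print(round(yeast, 1), round(titin, 1))
```

Both cases come out a little above 110 Da per residue, which is why chain length and molecular mass can be converted into one another to a first approximation.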

Chemical synthesis


Short proteins can also be synthesized chemically by a family of methods known as peptide synthesis
Peptide synthesis
In organic chemistry, peptide synthesis is the production of peptides, which are organic compounds in which multiple amino acids are linked via amide bonds which are also known as peptide bonds...

, which rely on organic synthesis
Organic synthesis
Organic synthesis is a special branch of chemical synthesis and is concerned with the construction of organic compounds via organic reactions. Organic molecules can often contain a higher level of complexity compared to purely inorganic compounds, so the synthesis of organic compounds has...

 techniques such as chemical ligation
Chemical ligation
Chemical ligation is a set of techniques used for creating long peptide or protein chains. It is the second step of a convergent approach. First, smaller peptides containing 30-50 amino acids are prepared by conventional chemical peptide synthesis. Then, they are completely deprotected...

 to produce peptides in high yield. Chemical synthesis allows for the introduction of non-natural amino acids into polypeptide chains, such as attachment of fluorescent probes to amino acid side chains. These methods are useful in laboratory biochemistry
Biochemistry
Biochemistry, sometimes called biological chemistry, is the study of chemical processes in living organisms, including, but not limited to, living matter. Biochemistry governs all living organisms and living processes...

 and cell biology
Cell biology
Cell biology is a scientific discipline that studies cells – their physiological properties, their structure, the organelles they contain, interactions with their environment, their life cycle, division and death. This is done both on a microscopic and molecular level...

, though generally not for commercial applications. Chemical synthesis is inefficient for polypeptides longer than about 300 amino acids, and the synthesized proteins may not readily assume their native tertiary structure
Tertiary structure
In biochemistry and molecular biology, the tertiary structure of a protein or any other macromolecule is its three-dimensional structure, as defined by the atomic coordinates.-Relationship to primary structure:...

. Most chemical synthesis methods proceed from C-terminus to N-terminus, opposite the biological reaction.

Structure



Most proteins fold
Protein folding
Protein folding is the process by which a protein structure assumes its functional shape or conformation. It is the physical process by which a polypeptide folds into its characteristic and functional three-dimensional structure from random coil....

 into unique three-dimensional structures. The shape into which a protein naturally folds is known as its native conformation. Although many proteins can fold unassisted, simply through the chemical properties of their amino acids, others require the aid of molecular chaperones to fold into their native states. Biochemists often refer to four distinct aspects of a protein's structure:
  • Primary structure
    Primary structure
    The primary structure of peptides and proteins refers to the linear sequence of its amino acid structural units. The term "primary structure" was first coined by Linderstrøm-Lang in 1951...

    : the amino acid sequence
    .
  • Secondary structure
    Secondary structure
    In biochemistry and structural biology, secondary structure is the general three-dimensional form of local segments of biopolymers such as proteins and nucleic acids...

    : regularly repeating local structures stabilized by hydrogen bond
    Hydrogen bond
    A hydrogen bond is the attractive interaction of a hydrogen atom with an electronegative atom, such as nitrogen, oxygen or fluorine, that comes from another molecule or chemical group. The hydrogen must be covalently bonded to another electronegative atom to create the bond...

    s. The most common examples are the alpha helix
    Alpha helix
    A common motif in the secondary structure of proteins, the alpha helix is a right-handed coiled or spiral conformation, in which every backbone N-H group donates a hydrogen bond to the backbone C=O group of the amino acid four residues earlier...

    , beta sheet
    Beta sheet
    The β sheet is the second form of regular secondary structure in proteins, only somewhat less common than the alpha helix. Beta sheets consist of beta strands connected laterally by at least two or three backbone hydrogen bonds, forming a generally twisted, pleated sheet...

     and turns
    Turn (biochemistry)
    A turn is an element of secondary structure in proteins where the polypeptide chain reverses its overall direction.- Definition :According to the most common definition, a turn is a structural motif where the Cα atoms of two residues separated by few peptide bonds are in close approach A turn is...

    . Because secondary structures are local, many regions of different secondary structure can be present in the same protein molecule.
  • Tertiary structure
    : the overall shape of a single protein molecule; the spatial relationship of the secondary structures to one another. Tertiary structure is generally stabilized by nonlocal interactions, most commonly the formation of a hydrophobic core, but also through salt bridge
    Salt bridge (protein)
    Salt bridges fall into the broader category of noncovalent interactions. A salt bridge is actually a combination of two noncovalent interactions: hydrogen bonding and electrostatic interactions . This is most commonly observed to contribute stability to the entropically unfavorable folded...

    s, hydrogen bonds, disulfide bond
    Disulfide bond
    In chemistry, a disulfide bond is a covalent bond, usually derived by the coupling of two thiol groups. The linkage is also called an SS-bond or disulfide bridge. The overall connectivity is therefore R-S-S-R. The terminology is widely used in biochemistry...

    s, and even posttranslational modification
    Posttranslational modification
    Posttranslational modification is the chemical modification of a protein after its translation. It is one of the later steps in protein biosynthesis, and thus gene expression, for many proteins....

    s. The term "tertiary structure" is often used as synonymous with the term fold. The tertiary structure is what controls the basic function of the protein.
  • Quaternary structure
    Quaternary structure
    In biochemistry, quaternary structure is the arrangement of multiple folded protein or coiling protein molecules in a multi-subunit complex.-Description and examples:...

    : the structure formed by several protein molecules (polypeptide chains), usually called protein subunit
    Protein subunit
    In structural biology, a protein subunit or subunit protein is a single protein molecule that assembles with other protein molecules to form a protein complex: a multimeric or oligomeric protein. Many naturally occurring proteins and enzymes are multimeric...

    s in this context, which function as a single protein complex
    Protein complex
    A multiprotein complex is a group of two or more associated polypeptide chains. If the different polypeptide chains contain different protein domain, the resulting multiprotein complex can have multiple catalytic functions...

    .


Proteins are not entirely rigid molecules. In addition to these levels of structure, proteins may shift between several related structures while they perform their functions. In the context of these functional rearrangements, these tertiary or quaternary structures are usually referred to as "conformations", and transitions between them are called conformational changes. Such changes are often induced by the binding of a substrate
Substrate (biochemistry)
In biochemistry, a substrate is a molecule upon which an enzyme acts. Enzymes catalyze chemical reactions involving the substrate. In the case of a single substrate, the substrate binds with the enzyme active site, and an enzyme-substrate complex is formed. The substrate is transformed into one or...

 molecule to an enzyme's active site
Active site
In biology the active site is part of an enzyme where substrates bind and undergo a chemical reaction. The majority of enzymes are proteins but RNA enzymes called ribozymes also exist. The active site of an enzyme is usually found in a cleft or pocket that is lined by amino acid residues that...

, the physical region of the protein that participates in chemical catalysis. In solution, proteins also undergo structural variation through thermal vibration and collisions with other molecules.


Proteins can be informally divided into three main classes, which correlate with typical tertiary structures: globular protein
s, fibrous protein
s, and membrane protein
Membrane protein
A membrane protein is a protein molecule that is attached to, or associated with the membrane of a cell or an organelle. More than half of all proteins interact with membranes.-Function:...

s. Almost all globular proteins are soluble and many are enzymes. Fibrous proteins are often structural, such as collagen
Collagen
Collagen is a group of naturally occurring proteins found in animals, especially in the flesh and connective tissues of mammals. It is the main component of connective tissue, and is the most abundant protein in mammals, making up about 25% to 35% of the whole-body protein content...

, the major component of connective tissue, or keratin
Keratin
Keratin refers to a family of fibrous structural proteins. Keratin is the key of structural material making up the outer layer of human skin. It is also the key structural component of hair and nails...

, the protein component of hair and nails. Membrane proteins often serve as receptors
Receptor (biochemistry)
In biochemistry, a receptor is a molecule found on the surface of a cell, which receives specific chemical signals from neighbouring cells or the wider environment within an organism...

 or provide channels for polar or charged molecules to pass through the cell membrane
Cell membrane
The cell membrane or plasma membrane is a biological membrane that separates the interior of all cells from the outside environment. The cell membrane is selectively permeable to ions and organic molecules and controls the movement of substances in and out of cells. It basically protects the cell...

.

Intramolecular hydrogen bonds within proteins that are poorly shielded from water attack, and hence promote their own dehydration
Dehydration
In physiology and medicine, dehydration is defined as the excessive loss of body fluid. It is literally the removal of water from an object; however, in physiological terms, it entails a deficiency of fluid within an organism...

, are called dehydron
Dehydron
A dehydron is an intramolecular hydrogen bond incompletely shielded from water attack, with a propensity to promote its own dehydration. Dehydrons constitute a special kind of packing defect in soluble proteins and were named and characterized by Argentine-American scientist Ariel Fernandez, from ,...

s.

Structure determination


Discovering the tertiary structure of a protein, or the quaternary structure of its complexes, can provide important clues about how the protein performs its function. Common experimental methods of structure determination include X-ray crystallography
X-ray crystallography
X-ray crystallography is a method of determining the arrangement of atoms within a crystal, in which a beam of X-rays strikes a crystal and causes the beam of light to spread into many specific directions. From the angles and intensities of these diffracted beams, a crystallographer can produce a...

 and NMR spectroscopy, both of which can produce information at atom
Atom
The atom is a basic unit of matter that consists of a dense central nucleus surrounded by a cloud of negatively charged electrons. The atomic nucleus contains a mix of positively charged protons and electrically neutral neutrons...

ic resolution. However, NMR experiments are able to provide information from which a subset of distances between pairs of atoms can be estimated, and the final possible conformations for a protein are determined by solving a distance geometry
Distance geometry
Distance geometry is the characterization and study of sets of points based only on given values of the distances between member pairs. Therefore distance geometry has immediate relevance where distance values are determined or considered, such as in surveying, cartography and...

 problem. Dual polarisation interferometry
Dual Polarisation Interferometry
Dual polarization interferometry is an analytical technique that can probe molecular scale layers adsorbed to the surface of a waveguide by using the evanescent wave of a laser beam confined to the waveguide...

 is a quantitative analytical method for measuring the overall protein conformation and conformational change
Conformational change
A macromolecule is usually flexible and dynamic. It can change its shape in response to changes in its environment or other factors; each possible shape is called a conformation, and a transition between them is called a conformational change...

s due to interactions or other stimuli. Circular dichroism
Circular dichroism
Circular dichroism refers to the differential absorption of left and right circularly polarized light. This phenomenon was discovered by Jean-Baptiste Biot, Augustin Fresnel, and Aimé Cotton in the first half of the 19th century. It is exhibited in the absorption bands of optically active chiral...

 is another laboratory technique for determining the internal beta-sheet and helical composition of proteins. Cryoelectron microscopy is used to produce lower-resolution structural information about very large protein complexes, including assembled virus
Virus
A virus is a small infectious agent that can replicate only inside the living cells of organisms. Viruses infect all types of organisms, from animals and plants to bacteria and archaea...

es; a variant known as electron crystallography
Electron crystallography
Electron crystallography is a method to determine the arrangement of atoms in solids using a transmission electron microscope .- Comparison with X-ray crystallography :...

 can also produce high-resolution information in some cases, especially for two-dimensional crystals of membrane proteins. Solved structures are usually deposited in the Protein Data Bank
Protein Data Bank
The Protein Data Bank is a repository for the 3-D structural data of large biological molecules, such as proteins and nucleic acids....

 (PDB), a freely available resource from which structural data about thousands of proteins can be obtained in the form of Cartesian coordinates for each atom in the protein.
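
To illustrate how per-atom Cartesian coordinates relate to the interatomic distances that NMR experiments constrain, here is a minimal Python sketch. The three coordinates are made up for the example (they are not taken from any PDB entry and are not realistic bond geometry), and `pairwise_distances` is a hypothetical helper name.

```python
import math
from itertools import combinations

# Hypothetical coordinates in angstroms, keyed by atom name (illustration only).
atoms = {
    "N": (0.0, 0.0, 0.0),
    "CA": (1.5, 0.0, 0.0),
    "C": (1.5, 2.0, 0.0),
}

def pairwise_distances(coords):
    """Euclidean distance for every unordered pair of atoms."""
    return {
        (a, b): math.dist(coords[a], coords[b])
        for a, b in combinations(coords, 2)
    }

distances = pairwise_distances(atoms)
for pair, d in distances.items():
    print(pair, round(d, 3))
```

Going from coordinates to distances is this simple; the distance geometry problem mentioned above is the harder inverse, recovering coordinates from an incomplete set of estimated distances.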

Many more gene sequences are known than protein structures. Further, the set of solved structures is biased toward proteins that can be easily subjected to the conditions required in X-ray crystallography
, one of the major structure determination methods. In particular, globular proteins are comparatively easy to crystallize in preparation for X-ray crystallography. Membrane proteins, by contrast, are difficult to crystallize and are underrepresented in the PDB. Structural genomics
Structural genomics
Structural genomics seeks to describe the 3-dimensional structure of every protein encoded by a given genome. This genome-based approach allows for a high-throughput method of structure determination by a combination of experimental and modeling approaches...

 initiatives have attempted to remedy these deficiencies by systematically solving representative structures of major fold classes. Protein structure prediction
Protein structure prediction
Protein structure prediction is the prediction of the three-dimensional structure of a protein from its amino acid sequence — that is, the prediction of its secondary, tertiary, and quaternary structure from its primary structure. Structure prediction is fundamentally different from the inverse...

 methods attempt to provide a means of generating a plausible structure for proteins whose structures have not been experimentally determined.

Cellular functions


Proteins are the chief actors within the cell, said to be carrying out the duties specified by the information encoded in genes. With the exception of certain types of RNA
RNA
Ribonucleic acid , or RNA, is one of the three major macromolecules that are essential for all known forms of life....

, most other biological molecules are relatively inert elements upon which proteins act. Proteins make up half the dry weight of an Escherichia coli
Escherichia coli
Escherichia coli is a Gram-negative, rod-shaped bacterium that is commonly found in the lower intestine of warm-blooded organisms . Most E. coli strains are harmless, but some serotypes can cause serious food poisoning in humans, and are occasionally responsible for product recalls...

 cell, whereas other macromolecules such as DNA and RNA make up only 3% and 20%, respectively. The set of proteins expressed in a particular cell or cell type is known as its proteome
Proteome
The proteome is the entire set of proteins expressed by a genome, cell, tissue or organism. More specifically, it is the set of expressed proteins in a given type of cells or an organism at a given time under defined conditions. The term is a portmanteau of proteins and genome.The term has been...

.

The chief characteristic of proteins that allows their diverse set of functions is their ability to bind other molecules specifically and tightly. The region of the protein responsible for binding another molecule is known as the binding site
Binding site
In biochemistry, a binding site is a region on a protein, DNA, or RNA to which specific other molecules and ions—in this context collectively called ligands—form a chemical bond...

 and is often a depression or "pocket" on the molecular surface. This binding ability is mediated by the tertiary structure of the protein, which defines the binding site pocket, and by the chemical properties of the surrounding amino acids' side chains. Protein binding can be extraordinarily tight and specific; for example, the ribonuclease inhibitor
Ribonuclease inhibitor
Ribonuclease inhibitor is a large , acidic , leucine-rich repeat protein that forms extremely tight complexes with certain ribonucleases. It is a major cellular protein, comprising ~0.1% of all cellular protein by weight, and appears to play an important role in regulating the lifetime of RNA.RI...

 protein binds to human angiogenin
Angiogenin
Angiogenin also known as ribonuclease 5 is a protein that in humans is encoded by the ANG gene. Angiogenin is a potent stimulator of new blood vessel formation...

 with a sub-femtomolar dissociation constant
Dissociation constant
In chemistry, biochemistry, and pharmacology, a dissociation constant is a specific type of equilibrium constant that measures the propensity of a larger object to separate reversibly into smaller components, as when a complex falls apart into its component molecules, or when a salt splits up into...

 (<10⁻¹⁵ M) but does not bind at all to its amphibian homolog onconase (>1 M). Extremely minor chemical changes such as the addition of a single methyl group to a binding partner can sometimes suffice to nearly eliminate binding; for example, the aminoacyl tRNA synthetase
 specific to the amino acid valine
Valine
Valine is an α-amino acid with the chemical formula HO2CCHCH2. L-Valine is one of 20 proteinogenic amino acids. Its codons are GUU, GUC, GUA, and GUG. This essential amino acid is classified as nonpolar...

 discriminates against the very similar side chain of the amino acid isoleucine
Isoleucine
Isoleucine is an α-amino acid with the chemical formula HO2CCHCHCH2CH3. It is an essential amino acid, which means that humans cannot synthesize it, so it must be ingested. Its codons are AUU, AUC and AUA....

.
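
To give a feel for what the two dissociation constants quoted above imply, the sketch below uses the standard expression for simple one-to-one binding, in which the fraction of protein bound equals [L] / ([L] + Kd). The free ligand concentration of 1 nM is an arbitrary assumption chosen for illustration, and the Kd values are the bounds stated in the text rather than measured constants.

```python
def fraction_bound(ligand_molar, kd_molar):
    """Fraction of protein occupied at a given free ligand concentration,
    assuming simple 1:1 reversible binding."""
    return ligand_molar / (ligand_molar + kd_molar)

FREE_LIGAND = 1e-9  # assumed free ligand concentration (1 nM), illustration only

tight = fraction_bound(FREE_LIGAND, 1e-15)  # femtomolar Kd, as quoted for angiogenin
weak = fraction_bound(FREE_LIGAND, 1.0)     # Kd of 1 M, the bound quoted for onconase

print(tight, weak)
```

Under these assumptions the tight complex is essentially fully formed while the weak one is essentially absent, a difference of many orders of magnitude at the same concentration.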

Proteins can bind to other proteins as well as to small-molecule
Small molecule
In the fields of pharmacology and biochemistry, a small molecule is a low molecular weight organic compound which is by definition not a polymer...

 substrates. When proteins bind specifically to other copies of the same molecule, they can oligomer
Oligomer
In chemistry, an oligomer is a molecule that consists of a few monomer units , in contrast to a polymer that, at least in principle, consists of an unlimited number of monomers. Dimers, trimers, and tetramers are oligomers. Many oils are oligomeric, such as liquid paraffin...

ize to form fibrils; this process occurs often in structural proteins that consist of globular monomers that self-associate to form rigid fibers. Protein–protein interactions also regulate enzymatic activity, control progression through the cell cycle
Cell cycle
The cell cycle, or cell-division cycle, is the series of events that takes place in a cell leading to its division and duplication . In cells without a nucleus , the cell cycle occurs via a process termed binary fission...

, and allow the assembly of large protein complex
es that carry out many closely related reactions with a common biological function. Proteins can also bind to, or even be integrated into, cell membranes. The ability of binding partners to induce conformational changes in proteins allows the construction of enormously complex signaling
Cell signaling
Cell signaling is part of a complex system of communication that governs basic cellular activities and coordinates cell actions. The ability of cells to perceive and correctly respond to their microenvironment is the basis of development, tissue repair, and immunity as well as normal tissue...

 networks.
Importantly, because interactions between proteins are reversible and depend heavily on the availability of different groups of partner proteins to form aggregates capable of carrying out discrete sets of functions, the study of interactions between specific proteins is key to understanding important aspects of cellular function and, ultimately, the properties that distinguish particular cell types.

Enzymes



The best-known role of proteins in the cell is as enzymes, which catalyze chemical reactions. Enzymes are usually highly specific and accelerate only one or a few chemical reactions. Enzymes carry out most of the reactions involved in metabolism, as well as manipulating DNA in processes such as DNA replication, DNA repair, and transcription. Some enzymes act on other proteins to add or remove chemical groups in a process known as posttranslational modification. About 4,000 reactions are known to be catalyzed by enzymes. The rate acceleration conferred by enzymatic catalysis is often enormous: as much as a 10^17-fold increase in rate over the uncatalyzed reaction in the case of orotate decarboxylase (78 million years without the enzyme, 18 milliseconds with the enzyme).
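
The order of magnitude of that rate acceleration can be checked from the two time scales quoted in the text. The short Python sketch below is only an illustration: it assumes that the two figures are comparable characteristic times for the same reaction (so that the ratio of times equals the ratio of rates) and it uses a 365.25-day year for the unit conversion. The function and variable names are made up for this example.

```python
# Rough check of the rate enhancement implied by the two time scales in the text.
# Assumption: both figures are characteristic times of the same reaction, so the
# ratio of the times equals the ratio of the rates.

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length for the conversion


def rate_enhancement(uncatalyzed_seconds: float, catalyzed_seconds: float) -> float:
    """Return how many times faster the catalyzed reaction is."""
    return uncatalyzed_seconds / catalyzed_seconds


uncatalyzed = 78e6 * SECONDS_PER_YEAR  # 78 million years, in seconds
catalyzed = 18e-3                      # 18 milliseconds, in seconds

ratio = rate_enhancement(uncatalyzed, catalyzed)
print(f"rate enhancement: about {ratio:.1e}-fold")
```

The result is slightly above 10^17, consistent with the figure given for this enzyme.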

The molecules bound and acted upon by enzymes are called substrates. Although enzymes can consist of hundreds of amino acids, it is usually only a small fraction of the residues that come in contact with the substrate, and an even smaller fraction (three to four residues on average) that are directly involved in catalysis. The region of the enzyme that binds the substrate and contains the catalytic residues is known as the active site.

Cell signaling and ligand binding



Many proteins are involved in the process of cell signaling and signal transduction. Some proteins, such as insulin, are extracellular proteins that transmit a signal from the cell in which they were synthesized to other cells in distant tissues. Others are membrane proteins that act as receptors whose main function is to bind a signaling molecule and induce a biochemical response in the cell. Many receptors have a binding site exposed on the cell surface and an effector domain within the cell, which may have enzymatic activity or may undergo a conformational change detected by other proteins within the cell.

Antibodies are protein components of an adaptive immune system whose main function is to bind antigens, or foreign substances in the body, and target them for destruction. Antibodies can be secreted into the extracellular environment or anchored in the membranes of specialized B cells known as plasma cells. Whereas enzymes are limited in their binding affinity for their substrates by the necessity of conducting their reaction, antibodies have no such constraints. An antibody's binding affinity to its target is extraordinarily high.

Many ligand transport proteins bind particular small biomolecules and transport them to other locations in the body of a multicellular organism. These proteins must have a high binding affinity when their ligand is present in high concentrations, but must also release the ligand when it is present at low concentrations in the target tissues. The canonical example of a ligand-binding protein is haemoglobin, which transports oxygen from the lungs to other organs and tissues in all vertebrates and has close homologs in every biological kingdom. Lectins are sugar-binding proteins which are highly specific for their sugar moieties. Lectins typically play a role in biological recognition phenomena involving cells and proteins. Receptors and hormones are highly specific binding proteins.
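
The requirement that a transport protein load its ligand at high concentration and release it at low concentration can be illustrated with a simple saturation curve. The sketch below uses the Hill equation, which is not discussed in this article and is brought in here purely as an assumption. The half-saturation pressure (26 mmHg), the Hill coefficient (2.8), and the oxygen pressures chosen for "lung" and "tissue" are illustrative textbook-style values, not measurements from the source.

```python
# Minimal sketch: fractional saturation of a binding protein as a function of
# ligand level, using the Hill equation (an assumption, not taken from the text).

def fractional_saturation(ligand: float, half_saturation: float, hill_n: float = 1.0) -> float:
    """Fraction of binding sites occupied at a given ligand level.

    ligand and half_saturation must share the same units (here, mmHg of O2).
    hill_n = 1 gives simple single-site binding; hill_n > 1 mimics cooperativity.
    """
    return ligand**hill_n / (half_saturation**hill_n + ligand**hill_n)


# Illustrative, assumed values for a haemoglobin-like protein.
P50 = 26.0      # ligand level at which half the sites are occupied
HILL_N = 2.8    # assumed cooperativity

lung = fractional_saturation(100.0, P50, HILL_N)   # high oxygen: mostly loaded
tissue = fractional_saturation(20.0, P50, HILL_N)  # low oxygen: mostly released

print(f"saturation at high O2: {lung:.2f}")
print(f"saturation at low O2:  {tissue:.2f}")
print(f"fraction released on the way: {lung - tissue:.2f}")
```

With these assumed parameters the protein is nearly saturated at the high ligand level and gives up a large share of its load at the low one, which is the behaviour the paragraph above describes.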

Transmembrane proteins can also serve as ligand transport proteins that alter the permeability of the cell membrane to small molecules and ions. The membrane alone has a hydrophobic core through which polar or charged molecules cannot diffuse. Membrane proteins contain internal channels that allow such molecules to enter and exit the cell. Many ion channel proteins are specialized to select for only a particular ion; for example, potassium and sodium channels often discriminate for only one of the two ions.

Structural proteins


Structural proteins confer stiffness and rigidity to otherwise-fluid biological components. Most structural proteins are fibrous proteins, although some are not: actin and tubulin, for example, are globular and soluble as monomers, but polymerize to form long, stiff fibers that make up the cytoskeleton, which allows the cell to maintain its shape and size. Collagen and elastin are critical components of connective tissue such as cartilage, and keratin is found in hard or filamentous structures such as hair, nails, feathers, hooves, and some animal shells.

Other proteins that serve structural functions are motor proteins such as myosin, kinesin, and dynein, which are capable of generating mechanical forces. These proteins are crucial for cellular motility of single-celled organisms and the sperm of many multicellular organisms which reproduce sexually. They also generate the forces exerted by contracting muscles.

Methods of study


Proteins are among the most commonly studied biological molecules, and their activities and structures are examined both in vitro and in vivo. In vitro studies of purified proteins in controlled environments are useful for learning how a protein carries out its function: for example, enzyme kinetics studies explore the chemical mechanism of an enzyme's catalytic activity and its relative affinity for various possible substrate molecules. By contrast, in vivo experiments on proteins' activities within cells or even within whole organisms can provide complementary information about where a protein functions and how it is regulated.

Protein purification



In order to perform in vitro analysis, a protein must be purified away from other cellular components. This process usually begins with cell lysis, in which a cell's membrane is disrupted and its internal contents released into a solution known as a crude lysate. The resulting mixture can be purified using ultracentrifugation, which separates the various cellular components into fractions containing soluble proteins; membrane lipids and proteins; cellular organelles; and nucleic acids. Precipitation by a method known as salting out can concentrate the proteins from this lysate. Various types of chromatography are then used to isolate the protein or proteins of interest based on properties such as molecular weight, net charge and binding affinity. The level of purification can be monitored using various types of gel electrophoresis if the desired protein's molecular weight and isoelectric point are known, by spectroscopy if the protein has distinguishable spectroscopic features, or by enzyme assays if the protein has enzymatic activity. Additionally, proteins can be isolated according to their charge using electrofocusing.

For natural proteins, a series of purification steps may be necessary to obtain protein sufficiently pure for laboratory applications. To simplify this process, genetic engineering is often used to add chemical features to proteins that make them easier to purify without affecting their structure or activity. Here, a "tag" consisting of a specific amino acid sequence, often a series of histidine residues (a "His-tag"), is attached to one terminus of the protein. As a result, when the lysate is passed over a chromatography column containing nickel, the histidine residues ligate the nickel and attach to the column while the untagged components of the lysate pass unimpeded. A number of different tags have been developed to help researchers purify specific proteins from complex mixtures.
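
The tagging idea can be pictured at the level of sequences. The toy Python sketch below is not a model of real chromatography: it assumes a tag of six histidines (a common choice, but an assumption here), uses made-up one-letter amino acid sequences, and treats "binds the nickel column" as nothing more than "carries the full tag at one terminus". All function names and sequences are hypothetical.

```python
# Toy illustration of His-tag purification at the sequence level.
# All sequences and names are invented for this sketch.

def add_his_tag(sequence: str, n_his: int = 6, terminus: str = "C") -> str:
    """Attach a run of histidines ('H') to the N- or C-terminus of a sequence."""
    tag = "H" * n_his
    if terminus == "N":
        return tag + sequence
    if terminus == "C":
        return sequence + tag
    raise ValueError("terminus must be 'N' or 'C'")


def binds_nickel_column(sequence: str, n_his: int = 6) -> bool:
    """Crude stand-in for column binding: true if a full tag sits at either end."""
    tag = "H" * n_his
    return sequence.startswith(tag) or sequence.endswith(tag)


# A hypothetical lysate: one tagged protein of interest among untagged proteins.
lysate = {
    "target": add_his_tag("MKTAYIAKQR"),
    "contaminant_a": "MSDNELLKHG",
    "contaminant_b": "MHHAGLVPRS",
}

retained = [name for name, seq in lysate.items() if binds_nickel_column(seq)]
flow_through = [name for name in lysate if name not in retained]

print("retained on column:", retained)
print("flow-through:", flow_through)
```

In this simplified picture only the tagged protein stays on the column, while proteins that merely contain a few scattered histidines pass through.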

Cellular localization



The study of proteins in vivo is often concerned with the synthesis and localization of the protein within the cell. Although many intracellular proteins are synthesized in the cytoplasm and membrane-bound or secreted proteins in the endoplasmic reticulum, the specifics of how proteins are targeted to specific organelles or cellular structures are often unclear. A useful technique for assessing cellular localization uses genetic engineering to express in a cell a fusion protein or chimera consisting of the natural protein of interest linked to a "reporter" such as green fluorescent protein (GFP). The fused protein's position within the cell can be cleanly and efficiently visualized using microscopy, as shown in the figure opposite.

Other methods for elucidating the cellular location of proteins require the use of known compartmental markers for regions such as the ER, the Golgi, lysosomes/vacuoles, mitochondria, chloroplasts, and the plasma membrane. With fluorescently tagged versions of these markers, or with antibodies to known markers, it becomes much simpler to identify the localization of a protein of interest. For example, indirect immunofluorescence allows fluorescence colocalization and thereby demonstrates location. Fluorescent dyes are used to label cellular compartments for a similar purpose.

Other possibilities exist as well. For example, immunohistochemistry usually uses an antibody to one or more proteins of interest that is conjugated to an enzyme yielding either luminescent or chromogenic signals that can be compared between samples, providing localization information. Another applicable technique is cofractionation in sucrose (or other material) gradients using isopycnic centrifugation. While this technique does not prove colocalization of a compartment of known density and the protein of interest, it does increase the likelihood, and is more amenable to large-scale studies.

Finally, the gold-standard method of cellular localization is immunoelectron microscopy. This technique also uses an antibody to the protein of interest, along with classical electron microscopy techniques. The sample is prepared for normal electron microscopic examination, and then treated with an antibody to the protein of interest that is conjugated to an extremely electron-dense material, usually gold. This allows for the localization of both ultrastructural details and the protein of interest.

Through another genetic engineering application known as site-directed mutagenesis, researchers can alter the protein sequence and hence its structure, cellular localization, and susceptibility to regulation. This technique even allows the incorporation of unnatural amino acids into proteins, using modified tRNAs, and may allow the rational design of new proteins with novel properties.

Proteomics and bioinformatics



The total complement of proteins present at a time in a cell or cell type is known as its proteome, and the study of such large-scale data sets defines the field of proteomics, named by analogy to the related field of genomics. Key experimental techniques in proteomics include 2D electrophoresis, which allows the separation of a large number of proteins; mass spectrometry, which allows rapid high-throughput identification of proteins and sequencing of peptides (most often after in-gel digestion); protein microarrays, which allow the detection of the relative levels of a large number of proteins present in a cell; and two-hybrid screening, which allows the systematic exploration of protein–protein interactions. The total complement of biologically possible such interactions is known as the interactome. A systematic attempt to determine the structures of proteins representing every possible fold is known as structural genomics.
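
The digestion step that precedes mass spectrometric identification can be illustrated with a minimal sketch. It assumes the protease is trypsin (the text above does not name one) and uses the commonly cited rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P). The function name and the example sequence are hypothetical, chosen only for illustration.

```python
def tryptic_digest(sequence):
    """Split a one-letter protein sequence into peptides.

    Assumed rule (simplified): cleave after K or R, except when the
    following residue is P. Real digests also show missed cleavages.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Whatever remains after the final cleavage site is the last peptide.
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


# Hypothetical example sequence: the R followed by P is not cleaved.
example_peptides = tryptic_digest("MKWVTFISLLRPAGK")
print(example_peptides)
```

In a real workflow the masses of such predicted peptides would be compared with the measured spectrum; that matching step is omitted here.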

The large amount of genomic and proteomic data available for a variety of organisms, including the human genome, allows researchers to efficiently identify homologous proteins in distantly related organisms by sequence alignment. Sequence profiling tools can perform more specific sequence manipulations such as restriction enzyme maps, open reading frame analyses for nucleotide sequences, and secondary structure prediction. From these data, phylogenetic trees can be constructed and evolutionary hypotheses developed, using software such as ClustalW, regarding the ancestry of modern organisms and the genes they express. The field of bioinformatics seeks to assemble, annotate, and analyze genomic and proteomic data, applying computational techniques to biological problems such as gene finding and cladistics.
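
As a rough illustration of what sequence alignment computes, the sketch below scores a global pairwise alignment with the classic Needleman-Wunsch dynamic programming recurrence. The scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions for the example; practical tools use substitution matrices and more elaborate gap penalties, and this is not the algorithm of any particular program named above.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


print(global_alignment_score("ACGT", "ACT"))
```

Higher scores indicate more similar sequences under the chosen scoring scheme; recovering the alignment itself would additionally require a traceback through the table.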

Structure prediction and simulation



Complementary to the field of structural genomics, protein structure prediction seeks to develop efficient ways to provide plausible models for proteins whose structures have not yet been determined experimentally. The most successful type of structure prediction, known as homology modeling, relies on the existence of a "template" structure with sequence similarity to the protein being modeled; structural genomics' goal is to provide sufficient representation in solved structures to model most of those that remain. Although producing accurate models remains a challenge when only distantly related template structures are available, it has been suggested that sequence alignment is the bottleneck in this process, as quite accurate models can be produced if a "perfect" sequence alignment is known. Many structure prediction methods have served to inform the emerging field of protein engineering, in which novel protein folds have already been designed. A more complex computational problem is the prediction of intermolecular interactions, such as in molecular docking and protein–protein interaction prediction.

The processes of protein folding and binding can be simulated using techniques such as molecular mechanics, in particular molecular dynamics and Monte Carlo, which increasingly take advantage of parallel and distributed computing (the Folding@home project; molecular modeling on GPU). The folding of small alpha-helical protein domains such as the villin headpiece and the HIV accessory protein have been successfully simulated in silico, and hybrid methods that combine standard molecular dynamics with quantum mechanics calculations have allowed exploration of the electronic states of rhodopsins.
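
The core of a molecular dynamics simulation is the repeated numerical integration of Newton's equations of motion. The sketch below shows one widely used scheme, velocity Verlet, applied to a single particle on a harmonic spring. The spring constant, mass, and time step are arbitrary assumptions in reduced units; a real protein simulation evaluates forces from a full force field over many thousands of atoms.

```python
def velocity_verlet(x, v, force, mass, dt, steps):
    """Integrate one particle in one dimension; return [(x, v), ...]."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory


SPRING_CONSTANT = 1.0  # assumed value, reduced units


def spring_force(x):
    return -SPRING_CONSTANT * x


def total_energy(x, v, mass=1.0):
    return 0.5 * mass * v * v + 0.5 * SPRING_CONSTANT * x * x


trajectory = velocity_verlet(x=1.0, v=0.0, force=spring_force,
                             mass=1.0, dt=0.01, steps=1000)
start_energy = total_energy(*trajectory[0])
end_energy = total_energy(*trajectory[-1])
print(start_energy, end_energy)
```

Comparing the total energy at the start and end of the run is a common sanity check: a well-behaved integrator with a sufficiently small time step should keep it nearly constant.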

Nutrition



Most microorganisms and plants can biosynthesize all 20 standard amino acids, while animals (including humans) must obtain some of the amino acids from the diet. The amino acids that an organism cannot synthesize on its own are referred to as essential amino acids. Key enzymes that synthesize certain amino acids are not present in animals, such as aspartokinase, which catalyzes the first step in the synthesis of lysine, methionine, and threonine from aspartate. If amino acids are present in the environment, microorganisms can conserve energy by taking up the amino acids from their surroundings and downregulating their biosynthetic pathways.

In animals, amino acids are obtained through the consumption of foods containing protein. Ingested proteins are then broken down into amino acids through digestion, which typically involves denaturation of the protein through exposure to acid and hydrolysis by enzymes called proteases. Some ingested amino acids are used for protein biosynthesis, while others are converted to glucose through gluconeogenesis, or fed into the citric acid cycle. This use of protein as a fuel is particularly important under starvation conditions, as it allows the body's own proteins, particularly those found in muscle, to be used to support life. Amino acids are also an important dietary source of nitrogen.

History and etymology



Proteins were recognized as a distinct class of biological molecules in the eighteenth century by Antoine Fourcroy and others, distinguished by the molecules' ability to coagulate or flocculate under treatments with heat or acid. Noted examples at the time included albumin from egg whites, blood serum albumin, fibrin, and wheat gluten. Dutch chemist Gerardus Johannes Mulder carried out elemental analysis of common proteins and found that nearly all proteins had the same empirical formula, C400H620N100O120P1S1. He came to the erroneous conclusion that they might be composed of a single type of (very large) molecule. The term "protein" to describe these molecules was proposed in 1838 by Mulder's associate Jöns Jacob Berzelius; protein is derived from the Greek word πρωτεῖος (proteios), meaning "primary", "in the lead", or "standing in front". Mulder went on to identify the products of protein degradation such as the amino acid leucine, for which he found a (nearly correct) molecular weight of 131 Da.
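
The figure of 131 Da for leucine can be checked with simple arithmetic. The sketch below sums standard atomic masses over a chemical formula; it assumes the modern molecular formula of leucine, C6H13NO2, and rounded atomic masses, neither of which appears in the text above. The same helper can be applied to the empirical formula attributed to Mulder, purely as arithmetic on that formula.

```python
import re

# Rounded standard atomic masses in daltons (assumed values).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007,
               "O": 15.999, "P": 30.974, "S": 32.06}


def parse_formula(formula):
    """Turn a flat formula such as 'C6H13NO2' into {'C': 6, 'H': 13, ...}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molar_mass(formula):
    """Sum atomic masses over all atoms in the formula."""
    return sum(ATOMIC_MASS[symbol] * count
               for symbol, count in parse_formula(formula).items())


leucine_mass = molar_mass("C6H13NO2")            # about 131.17 Da
mulder_mass = molar_mass("C400H620N100O120P1S1")  # roughly 8.8 kDa
print(round(leucine_mass, 2), round(mulder_mass, 1))
```

The parser handles only flat formulas without parentheses, which is sufficient for the two formulas used here.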

The difficulty of purifying proteins in large quantities made them very hard for early protein biochemists to study. Hence, early studies focused on proteins that could be purified in large quantities, e.g., those of blood, egg white, various toxins, and digestive/metabolic enzymes obtained from slaughterhouses. In the 1950s, the Armour Hot Dog Co. purified 1 kg of pure bovine pancreatic ribonuclease A and made it freely available to scientists; this gesture helped ribonuclease A become a major target for biochemical study for the following decades.

Linus Pauling is credited with the successful prediction of regular protein secondary structures based on hydrogen bonding, an idea first put forth by William Astbury in 1933. Later work by Walter Kauzmann on denaturation, based partly on previous studies by Kaj Linderstrøm-Lang, contributed an understanding of protein folding and structure mediated by hydrophobic interactions. In 1949 Fred Sanger correctly determined the amino acid sequence of insulin, thus conclusively demonstrating that proteins consisted of linear polymers of amino acids rather than branched chains, colloids, or cyclols. The first atomic-resolution structures of proteins were solved by X-ray crystallography in the 1960s and by NMR in the 1980s. The Protein Data Bank has over 55,000 atomic-resolution structures of proteins. In more recent times, cryo-electron microscopy of large macromolecular assemblies and computational protein structure prediction of small protein domains are two methods approaching atomic resolution.
