Protein sequencing
Protein sequencing is a technique to determine the amino acid sequence of a protein, as well as which conformation the protein adopts and the extent to which it is complexed with any non-peptide molecules. Discovering the structures and functions of proteins in living organisms is an important tool for understanding cellular processes, and makes it easier to design drugs that target specific metabolic pathways.

The two major direct methods of protein sequencing are mass spectrometry and the Edman degradation reaction. It is also possible to generate an amino acid sequence from the DNA or mRNA sequence encoding the protein, if this is known. In addition, a number of other reactions can be used to gain more limited information about protein sequences; these serve as preliminaries to the methods above or overcome specific inadequacies in them.

Determining amino acid composition

It is often desirable to know the unordered amino acid composition of a protein before attempting to find the ordered sequence, as this knowledge can be used to detect errors in the sequencing process or to distinguish between ambiguous results. Knowledge of the frequency of certain amino acids may also be used to choose which protease to use for digestion of the protein. A generalised method for determining amino acid frequency, often referred to as amino acid analysis, is as follows:
  1. Hydrolyse a known quantity of protein into its constituent amino acids.
  2. Separate the amino acids in some way.
  3. Determine their respective quantities.
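
As an illustration of how a known composition can expose sequencing errors, the following minimal Python sketch compares the composition implied by a candidate sequence with a measured composition. The function names and the example data are hypothetical and are not taken from any of the cited sources.

```python
from collections import Counter


def composition(sequence):
    """Return the unordered amino acid composition of a one-letter sequence."""
    return Counter(sequence.upper())


def composition_mismatches(candidate_sequence, measured_counts):
    """Return {residue: (count in sequence, measured count)} where the two disagree."""
    counts = composition(candidate_sequence)
    residues = set(counts) | set(measured_counts)
    return {
        residue: (counts.get(residue, 0), measured_counts.get(residue, 0))
        for residue in sorted(residues)
        if counts.get(residue, 0) != measured_counts.get(residue, 0)
    }


# Hypothetical data: amino acid analysis found two lysines,
# but the candidate sequence contains only one.
measured = {"G": 1, "A": 2, "K": 2, "L": 1}
mismatches = composition_mismatches("GAKLA", measured)
print(mismatches)
```

An empty result means the candidate sequence is consistent with the measured composition; it does not prove that the residue order is correct.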

Hydrolysis

Hydrolysis is done by heating a sample of the protein in 6 M hydrochloric acid at 100-110 degrees Celsius for 24 hours or longer. Proteins with many bulky hydrophobic groups may require longer heating periods. However, these conditions are so vigorous that some amino acids (serine, threonine, tyrosine, tryptophan, glutamine and cystine) are degraded. To circumvent this problem, Biochemistry Online suggests heating separate samples for different times, analysing each resulting solution, and extrapolating back to zero hydrolysis time. Rastall suggests a variety of reagents to prevent or reduce degradation: thiol reagents or phenol to protect tryptophan and tyrosine from attack by chlorine, and pre-oxidising cysteine. He also suggests measuring the quantity of ammonia evolved to determine the extent of amide hydrolysis.
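
The extrapolation to zero hydrolysis time can be sketched as a straight-line fit. The code below is a minimal illustration that assumes the loss of a labile amino acid is roughly linear over the sampled times (real degradation may be closer to first order, in which case the logarithm of the amount would be fitted instead). The function name and the recovery figures are hypothetical.

```python
def extrapolate_to_zero_time(times_h, amounts):
    """Fit a least-squares straight line to (time, amount) and return its value at t = 0."""
    n = len(times_h)
    if n < 2 or n != len(amounts):
        raise ValueError("need at least two matching (time, amount) pairs")
    mean_t = sum(times_h) / n
    mean_a = sum(amounts) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    if sxx == 0:
        raise ValueError("hydrolysis times must not all be identical")
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_h, amounts))
    slope = sxy / sxx
    # The intercept is the estimated amount before any degradation occurred.
    return mean_a - slope * mean_t


# Hypothetical serine recoveries (nmol) after 24, 48 and 72 hours of hydrolysis.
serine_at_zero = extrapolate_to_zero_time([24, 48, 72], [9.0, 8.0, 7.0])
print(serine_at_zero)
```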

Separation

The amino acids can be separated by ion-exchange chromatography or hydrophobic interaction chromatography. An example of the former is given by the NTRC: sulfonated polystyrene is used as a matrix, the amino acids are added in acid solution, and a buffer of steadily increasing pH is passed through the column. Each amino acid is eluted when the pH reaches its isoelectric point. The latter technique may be employed through the use of reversed-phase chromatography. Many commercially available C8 and C18 silica columns have demonstrated successful separation of amino acids in solution in less than 40 minutes through the use of an optimised elution gradient.
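
Under the simplified picture given above, in which each amino acid leaves the ion-exchange column when the buffer pH reaches its isoelectric point, the elution order for a rising pH gradient is simply the order of increasing isoelectric point. The sketch below assumes approximate textbook isoelectric points for a handful of amino acids; the values are illustrative only, and real elution behaviour also depends on the column and buffer.

```python
# Approximate textbook isoelectric points, used here only for illustration.
APPROX_PI = {
    "Asp": 2.77,
    "Glu": 3.22,
    "Gly": 5.97,
    "His": 7.59,
    "Lys": 9.74,
    "Arg": 10.76,
}


def elution_order(amino_acids, pi_table=APPROX_PI):
    """Order amino acids by increasing isoelectric point (rising pH gradient)."""
    return sorted(amino_acids, key=lambda name: pi_table[name])


order = elution_order(["Lys", "Gly", "Asp", "Arg"])
print(order)
```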

Quantitative analysis

Once the amino acids have been separated, their respective quantities are determined by adding a reagent that will form a coloured derivative. If the amounts of amino acids are in excess of 10 nmol, ninhydrin can be used for this: it gives a yellow colour when reacted with proline, and a vivid purple with other amino acids. The concentration of amino acid is proportional to the absorbance of the resulting solution. With very small quantities, down to 10 pmol, fluorescamine can be used as a marker: it forms a fluorescent derivative on reacting with an amino acid.
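
Because the absorbance is proportional to concentration, an unknown amount can be estimated by comparison with a standard of known amount measured under the same conditions. The following sketch assumes a single-point calibration and a strictly linear response; the function name and the numbers are hypothetical.

```python
def amount_from_absorbance(sample_abs, standard_abs, standard_nmol):
    """Estimate the amount in a sample from a single standard, assuming linearity."""
    if standard_abs <= 0:
        raise ValueError("standard absorbance must be positive")
    return standard_nmol * sample_abs / standard_abs


# Hypothetical readings: a 50 nmol standard gives an absorbance of 0.40,
# and the sample gives 0.10.
sample_nmol = amount_from_absorbance(0.10, 0.40, 50.0)
print(sample_nmol)
```

In practice a calibration curve built from several standards would be more robust than a single point.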

N-terminal amino acid analysis

Determining which amino acid forms the N-terminus of a peptide chain is useful for two reasons: to aid the ordering of individual peptide fragments' sequences into a whole chain, and because the first round of Edman degradation is often contaminated by impurities and therefore does not give an accurate determination of the N-terminal amino acid. A generalised method for N-terminal amino acid analysis follows:
  1. React the peptide with a reagent which will selectively label the terminal amino acid.
  2. Hydrolyse the protein.
  3. Determine the amino acid by chromatography and comparison with standards.


There are many different reagents which can be used to label terminal amino acids. They all react with amine groups and will therefore also bind to amine groups in the side chains of amino acids such as lysine; for this reason it is necessary to be careful in interpreting chromatograms to ensure that the right spot is chosen. Two of the more common reagents are Sanger's reagent (1-fluoro-2,4-dinitrobenzene) and dansyl derivatives such as dansyl chloride. Phenylisothiocyanate, the reagent for the Edman degradation, can also be used. The same considerations apply here as in the determination of amino acid composition, with the exception that no stain is needed, as the reagents produce coloured derivatives. Only qualitative analysis is required, so the amino acid does not have to be eluted from the chromatography column, just compared with a standard. Another consideration is that, since any amine groups will have reacted with the labelling reagent, ion-exchange chromatography cannot be used, and thin-layer chromatography or high-pressure liquid chromatography should be used instead.

C-terminal amino acid analysis

Far fewer methods are available for C-terminal amino acid analysis than for N-terminal analysis. The most common method is to add carboxypeptidases to a solution of the protein, take samples at regular intervals, and determine the terminal amino acid by analysing a plot of amino acid concentrations against time.
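
One simple way to read such a time course is to rank the amino acids by how much of each has been released at the earliest sample, since the C-terminal residue is the first to be freed. The sketch below does only that. It is a deliberate simplification: carboxypeptidases release different residues at different rates, so a real analysis would look at the whole curve. The function name and the concentrations are hypothetical.

```python
def rank_by_release(time_course):
    """Order residues by the amount released at the earliest sample, highest first."""
    return sorted(
        time_course,
        key=lambda residue: time_course[residue][0],
        reverse=True,
    )


# Hypothetical relative concentrations at three successive sampling times.
time_course = {
    "Ala": [0.05, 0.25, 0.60],
    "Leu": [0.60, 0.90, 1.00],
    "Ser": [0.20, 0.60, 0.90],
}
likely_order = rank_by_release(time_course)  # first entry: likely C-terminal residue
print(likely_order)
```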

Edman degradation

The Edman degradation is a very important reaction for protein sequencing, because it allows the ordered amino acid composition of a protein to be discovered. Automated Edman sequencers are now in widespread use, and are able to sequence peptides up to approximately 50 amino acids long. A reaction scheme for sequencing a protein by the Edman degradation follows; some of the steps are elaborated on subsequently.
  1. Break any disulfide bridges in the protein with an oxidising agent like performic acid or a reducing agent like 2-mercaptoethanol. A protecting group such as iodoacetic acid may be necessary to prevent the bonds from re-forming.
  2. Separate and purify the individual chains of the protein complex, if there is more than one.
  3. Determine the amino acid composition of each chain.
  4. Determine the terminal amino acids of each chain.
  5. Break each chain into fragments under 50 amino acids long.
  6. Separate and purify the fragments.
  7. Determine the sequence of each fragment.
  8. Repeat with a different pattern of cleavage.
  9. Construct the sequence of the overall protein.
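
The last two steps of the scheme can be sketched in code: fragments from one cleavage pattern are placed in order by requiring that every fragment from a second cleavage pattern appears in the assembled chain. The example below is a brute-force illustration over all orderings, so it is only practical for a handful of fragments. The function name and the peptide sequences are hypothetical.

```python
from itertools import permutations


def order_fragments(fragments, overlap_fragments):
    """Return every concatenation of `fragments` that contains all `overlap_fragments`."""
    candidates = set()
    for ordering in permutations(fragments):
        candidate = "".join(ordering)
        if all(piece in candidate for piece in overlap_fragments):
            candidates.add(candidate)
    return sorted(candidates)


# Hypothetical fragments of the same chain from two different cleavage patterns.
first_digest = ["MLER", "FMTK", "AGK"]
second_digest = ["AGKM", "LERFM", "TK"]
assembled = order_fragments(first_digest, second_digest)
print(assembled)
```

If more than one candidate survives, the two cleavage patterns do not determine the order uniquely and a further digest, or the terminal amino acid analyses, would be needed to decide between them.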


Digestion into peptide fragments

Peptides longer than about 50-70 amino acids cannot be sequenced reliably by the Edman degradation. Because of this, long protein chains need to be broken up into small fragments which can then be sequenced individually. Digestion is done either by endopeptidases such as trypsin or pepsin or by chemical reagents such as cyanogen bromide. Different enzymes give different cleavage patterns, and the overlap between fragments can be used to construct an overall sequence.
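
The idea that different reagents give different cleavage patterns can be illustrated with a small in-silico digestion. The sketch below assumes the commonly stated rules that trypsin cleaves on the carboxyl side of lysine (K) and arginine (R) unless the next residue is proline (P), and that cyanogen bromide cleaves on the carboxyl side of methionine (M). Real digests are less clean than this, and the function name and the sequence are hypothetical.

```python
def cleave_after(sequence, residues, blocked_by=""):
    """Split `sequence` after each residue in `residues`.

    No cut is made if the following residue is in `blocked_by`,
    or if the residue is the last one in the chain.
    """
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in residues and next_residue and next_residue not in blocked_by:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments


protein = "AGKMLERFMTK"  # hypothetical sequence
tryptic = cleave_after(protein, "KR", blocked_by="P")  # assumed trypsin rule
cnbr = cleave_after(protein, "M")                      # assumed cyanogen bromide rule
print(tryptic)
print(cnbr)
```

The two fragment sets cut the chain at different positions, which is what makes their overlaps informative when the overall sequence is reconstructed.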

The Edman degradation reaction

The peptide to be sequenced is adsorbed onto a solid surface; one common substrate is glass fibre coated with polybrene, a cationic polymer. The Edman reagent, phenylisothiocyanate (PITC), is added to the adsorbed peptide, together with a mildly basic buffer solution of 12% trimethylamine. The PITC reacts with the amine group of the N-terminal amino acid.

The terminal amino acid can then be selectively detached by the addition of anhydrous acid. The derivative then isomerises to give a substituted phenylthiohydantoin which can be washed off and identified by chromatography, and the cycle can be repeated. The efficiency of each step is about 98%, which allows about 50 amino acids to be reliably determined.
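
The link between the per-cycle efficiency and the practical read length can be made explicit: if each cycle succeeds with efficiency e, the fraction of peptide still yielding the correct residue after n cycles is e to the power n. The sketch below assumes the 98% figure quoted above and that every cycle has the same efficiency; the function name is hypothetical.

```python
def cumulative_yield(step_efficiency, cycles):
    """Fraction of peptide still giving the correct signal after `cycles` rounds."""
    return step_efficiency ** cycles


for cycles in (10, 30, 50, 70):
    print(cycles, round(cumulative_yield(0.98, cycles), 3))
```

At 98% efficiency only about 36% of the material is still in phase after 50 cycles, so the signal of the correct residue becomes progressively harder to distinguish from the accumulated background.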

Limitations of the Edman degradation

Because the Edman degradation proceeds from the N-terminus of the protein, it will not work if the N-terminal amino acid has been chemically modified or if it is concealed within the body of the protein. It also requires the use of either guesswork or a separate procedure to determine the positions of disulfide bridges.

Mass spectrometry

The other major direct method by which the sequence of a protein can be determined is mass spectrometry. This method has been gaining popularity in recent years as new techniques and increasing computing power have facilitated it. Mass spectrometry can, in principle, sequence any size of protein, but the problem becomes computationally more difficult as the size increases. Peptides are also easier to prepare for mass spectrometry than whole proteins, because they are more soluble. One method of delivering the peptides to the spectrometer is electrospray ionization, for which John Bennett Fenn won the Nobel Prize in Chemistry in 2002. The protein is digested by an endoprotease, and the resulting solution is passed through a high-pressure liquid chromatography column. At the end of this column, the solution is sprayed out of a narrow nozzle charged to a high positive potential into the mass spectrometer. The charge on the droplets causes them to fragment until only single ions remain. The peptides are then fragmented and the mass-to-charge ratios of the fragments measured. (It is possible to detect which peaks correspond to multiply charged fragments, because these will have auxiliary peaks corresponding to other isotopes; the distance between these other peaks is inversely proportional to the charge on the fragment.) The mass spectrum is analysed by computer and often compared against a database of previously sequenced proteins in order to determine the sequences of the fragments. This process is then repeated with a different digestion enzyme, and the overlaps in the sequences are used to construct a sequence for the protein.
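
The inverse relationship between isotope peak spacing and charge can be sketched as follows. The code assumes positively charged ions formed by adding protons, a spacing of about 1.00336 Da between neighbouring isotope peaks of a singly charged ion (the mass difference between carbon-13 and carbon-12), and a proton mass of about 1.00728 Da. The function names and the peak list are hypothetical.

```python
ISOTOPE_SPACING_DA = 1.00336  # approximate 13C - 12C mass difference
PROTON_MASS_DA = 1.00728      # approximate mass of a proton


def charge_from_spacing(mz_spacing):
    """Infer the charge state from the m/z spacing between adjacent isotope peaks."""
    if mz_spacing <= 0:
        raise ValueError("peak spacing must be positive")
    return round(ISOTOPE_SPACING_DA / mz_spacing)


def neutral_mass(mz, charge):
    """Mass of the uncharged fragment, assuming `charge` added protons."""
    return charge * (mz - PROTON_MASS_DA)


# Hypothetical isotope cluster: peaks 0.5 m/z units apart imply a doubly charged ion.
peaks = [500.25, 500.75, 501.25]
z = charge_from_spacing(peaks[1] - peaks[0])
mass = neutral_mass(peaks[0], z)
print(z, mass)
```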

Predicting protein sequence from DNA/RNA sequences

The amino acid sequence of a protein can also be determined indirectly from the mRNA or, in organisms that do not have introns (e.g. prokaryotes), the DNA that codes for the protein. If the sequence of the gene is already known, this is straightforward. However, it is rare that the DNA sequence of a newly isolated protein will be known, so if this method is to be used, the sequence has to be found in some way. One way to do this is to sequence a short section, perhaps 15 amino acids long, of the protein by one of the above methods, and then use this sequence to generate a complementary marker for the protein's RNA. This can then be used to isolate the mRNA coding for the protein, which can then be replicated in a polymerase chain reaction to yield a significant amount of DNA, which can then be sequenced relatively easily. The amino acid sequence of the protein can then be deduced from this. However, it is necessary to take into account the possibility of amino acids being removed after the mRNA has been translated.
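
Deducing the amino acid sequence from an mRNA sequence amounts to reading codons through the standard genetic code. The minimal sketch below builds the standard codon table from its usual compact form, starts at the first AUG and stops at the first stop codon. It assumes an uninterrupted reading frame and does not model anything that happens after translation, such as the removal of amino acids mentioned above. The function name and the example sequence are hypothetical.

```python
BASES = "UCAG"
# Standard genetic code, one letter per codon, with codons ordered
# UUU, UUC, UUA, UUG, UCU, ... GGG; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna):
    """Translate from the first AUG to the first stop codon (one-letter codes)."""
    mrna = mrna.upper().replace("T", "U")  # accept a DNA coding strand as well
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


peptide = translate("GGAUGGCCUGGAAAUAAGC")  # hypothetical mRNA fragment
print(peptide)
```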

See also

  • Henry Jakubowski. Biochemistry Online, chapter 2 B. http://employees.csbsju.edu/hjakubowski/classes/ch331/bcintro/default.html
  • Hanno Steen & Matthias Mann. The abc's (and xyz's) of peptide sequencing. Nature Reviews Molecular Cell Biology, 5:699-711, 2004.
  • Sergio Marchesini Michael W. King. Analysis of protein. http://www.med.unibs.it/marchesi/analysis.html
  • R A Rastall. Investigating protein structure and function. http://www.food.rdg.ac.uk/online/fs460/index.htm
  • Alberts, Bray, Johnson, Lewis, Raff, Roberts & Walter. 1998. Essential Cell Biology: An Introduction to the Molecular Biology of the Cell. Garland Publishing, New York.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 