GC-content (or guanine-cytosine content), in
molecular biologyMolecular biology is the study of biology at a molecular level. The field overlaps with other areas of biology and chemistry, particularly genetics and biochemistry...
, is the percentage of nitrogenous bases on a
DNADeoxyribonucleic acid is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms and some viruses. The main role of DNA molecules is the long-term storage of information...
molecule which are either
guanineGuanine is one of the five main nucleobases found in the nucleic acids DNA and RNA, the others being adenine, cytosine, thymine, and uracil. In DNA, guanine is paired with cytosine. With the formula C
5H
5N
5O, guanine is a derivative of purine, consisting of a fused...
or
cytosineCytosine is one of the four main bases found in DNA and RNA. It is a pyrimidine derivative, with a heterocyclic aromatic ring and two substituents attached . The nucleoside of cytosine is cytidine...
(from a possibility of four different ones, also including
adenineAdenine is a nucleobase with a variety of roles in biochemistry including cellular respiration, in the form of both the energy-rich adenosine triphosphate and the cofactors nicotinamide adenine dinucleotide and flavin adenine dinucleotide , and protein synthesis, as a chemical component of DNA...
and
thymineThymine is one of the four nucleobases in the nucleic acid of DNA that are represented by the letters G–C–A–T. The others are adenine, guanine, and cytosine. Thymine always pairs with adenine. Thymine is also known as 5-methyluracil, a pyrimidine nucleobase. As the name suggests, thymine may be...
). This may refer to a specific fragment of DNA or
RNARibonucleic acid is a biologically important type of molecule that consists of a long chain of nucleotide units. Each nucleotide consists of a nitrogenous base, a ribose sugar, and a phosphate...
, or that of the
whole genomeIn modern molecular biology the genome refers to all of its hereditary information encoded in DNA .The genome includes both the genes and the non-coding sequences of the DNA. The term was adapted in 1920 by Hans Winkler, Professor of Botany at the University of Hamburg, Germany...
. When it refers to a fragment of the genetic material, it may denote the GC-content of part of a gene (domain), single gene, group of genes (or gene clusters) or even a non-coding region.
GC-content (or guanine-cytosine content), in
molecular biologyMolecular biology is the study of biology at a molecular level. The field overlaps with other areas of biology and chemistry, particularly genetics and biochemistry...
, is the percentage of nitrogenous bases on a
DNADeoxyribonucleic acid is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms and some viruses. The main role of DNA molecules is the long-term storage of information...
molecule which are either
guanineGuanine is one of the five main nucleobases found in the nucleic acids DNA and RNA, the others being adenine, cytosine, thymine, and uracil. In DNA, guanine is paired with cytosine. With the formula C
5H
5N
5O, guanine is a derivative of purine, consisting of a fused...
or
cytosineCytosine is one of the four main bases found in DNA and RNA. It is a pyrimidine derivative, with a heterocyclic aromatic ring and two substituents attached . The nucleoside of cytosine is cytidine...
(from a possibility of four different ones, also including
adenineAdenine is a nucleobase with a variety of roles in biochemistry including cellular respiration, in the form of both the energy-rich adenosine triphosphate and the cofactors nicotinamide adenine dinucleotide and flavin adenine dinucleotide , and protein synthesis, as a chemical component of DNA...
and
thymineThymine is one of the four nucleobases in the nucleic acid of DNA that are represented by the letters G–C–A–T. The others are adenine, guanine, and cytosine. Thymine always pairs with adenine. Thymine is also known as 5-methyluracil, a pyrimidine nucleobase. As the name suggests, thymine may be...
). This may refer to a specific fragment of DNA or
RNARibonucleic acid is a biologically important type of molecule that consists of a long chain of nucleotide units. Each nucleotide consists of a nitrogenous base, a ribose sugar, and a phosphate...
, or that of the
whole genomeIn modern molecular biology the genome refers to all of its hereditary information encoded in DNA .The genome includes both the genes and the non-coding sequences of the DNA. The term was adapted in 1920 by Hans Winkler, Professor of Botany at the University of Hamburg, Germany...
. When it refers to a fragment of the genetic material, it may denote the GC-content of part of a gene (domain), single gene, group of genes (or gene clusters) or even a non-coding region. G (guanine) and C (cytosine) undergo a specific hydrogen bonding whereas A (adenine) bonds specifically with T (thymine).
The GC pair is bound by three
hydrogen bondA hydrogen bond is the attractive interaction of a hydrogen atom with an electronegative atom, like nitrogen, oxygen or fluorine . The hydrogen must be covalently bonded to another electronegative atom to create the bond...
s, while AT pairs are bound by two hydrogen bonds. DNA with high GC-content is more stable than DNA with low GC-content, but contrary to popular belief, the hydrogen bonds do not stabilize the DNA significantly and stabilization is mainly due to stacking interactions.
In spite of the higher
thermostabilityThermostability is the quality of a substance to resist irreversible change in its chemical or physical structure at a high relative temperature.Thermostable materials may be used industrially as fire retardants...
conferred to the genetic material, it is envisaged that cells with DNA with high GC-content undergo autolysis, thereby reducing the longevity of the cell
per se. Due to the robustness endowed to the genetic materials in high GC organisms it was commonly believed that the GC content played a vital part in adaptation temperatures, a hypothesis which has recently been refuted.
In PCR experiments, the GC-content of
primersA primer is a strand of nucleic acid that serves as a starting point for DNA replication. They are required because the enzymes that catalyze replication, DNA polymerases, can only add new nucleotides to an existing strand of DNA...
are used to predict their
annealing temperatureAnnealing temperature may refer to:*Annealing *Polymerase chain reaction...
to the template DNA. A higher GC-content level indicates a higher melting temperature.
Determination of GC content
GC content is usually expressed as a percentage value, but sometimes as a ratio (called
G+C ratio or
GC-ratio). GC-content percentage is calculated as
whereas the G+C ratio is calculated as
The GC-content percentages as well as GC-ratio can be measured by several means but one of the simplest methods is to measure what is called the melting temperature of the
DNADeoxyribonucleic acid is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms and some viruses. The main role of DNA molecules is the long-term storage of information...
double helixIn geometry a double helix typically consists of two congruent helices with the same axis, differing by a translation along the axis, which may or may not be half-way....
using
spectrophotometryIn physics, spectrophotometry is the quantifiable study of electromagnetic spectra. It is more specific than the general term electromagnetic spectroscopy in that spectrophotometry deals with visible light, near-ultraviolet, and near-infrared. Also, the term does not cover time-resolved...
. The
absorbanceIn spectroscopy, the absorbance A is defined as,where is the intensity of light at a specified wavelength λ that has passed through a sample and is the intensity of the light before it enters the sample or incident light intensity...
of
DNADeoxyribonucleic acid is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms and some viruses. The main role of DNA molecules is the long-term storage of information...
at a
wavelengthIn physics, the wavelength of a sinusoidal wave is the spatial period of the wave – the distance over which the wave's shape repeats.It is usually determined by considering the distance between consecutive corresponding points of the same phase, such as crests, troughs, or zero crossings, and is a...
of 260 nm increases fairly sharply when the double-stranded
DNADeoxyribonucleic acid is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms and some viruses. The main role of DNA molecules is the long-term storage of information...
separates into two single strands when sufficiently heated. The most commonly used protocol for determining GC ratios uses
flow cytometryFlow cytometry is a technique for counting and examining microscopic particles, such as cells and chromosomes, by suspending them in a stream of fluid and passing them by an electronic detection apparatus. It allows simultaneous multiparametric analysis of the physical and/or chemical...
for large number of samples.
Alternatively, if the DNA or RNA molecule under investigation has been sequenced then the GC-content can be accurately calculated by simple arithmetic or by using the
free online GC calculator.
GC ratio of genomes
GC ratios within a genome is found to be markedly variable. These variations in GC ratio within a genome of higher organisms results in a mosaic like formation with islet regions called
isochoresIn genetics, an isochore is a large region of DNA with a high degree uniformity in G-C and C-G which tends to have more genes, higher local melting or denaturation temperatures, and different flexibility...
. This results in the variations in staining intensity in the chromosomes. The isochores include in them essential protein coding genes, termed
housekeeping genes and thus determination of ratio of these specific regions contributes in mapping these essential genes.
GC ratios and coding sequence
Within a long region of genomic sequence, genes are often characterised by having a higher GC-content in contrast to the background GC-content for the entire genome. Evidence of GC ratio with that of length of the
coding regionCoding region, depending on context, can refer to an organism or a geneThe coding region of a gene is that portion of a gene's DNA or RNA that is composed of exons....
of a
geneA gene is the basic unit of heredity in a living organism. All living things depend on genes. Genes hold the information to build and maintain their cells and pass genetic traits to offspring...
has shown that the length of the coding sequence is directly proportional to higher G+C content. This has been pointed to the fact that the
stop codonIn the genetic code, a stop codon is a nucleotide triplet within messenger RNA that signals a termination of translation. Proteins are unique sequences of amino acids, and most codons in messenger RNA correspond to the addition of an amino acid to a growing protein chain — stop codons signal...
has a bias towards A and T nucleotides and thus the shorter the sequence the higher the AT bias.
Application in systematics
GC content is found to be variable with different organisms, the process of which is envisaged to be contributed to by variation in
selectionThe gene-centered view of evolution, gene selection theory or selfish gene theory holds that natural selection acts through differential survival of competing genes, increasing the frequency of those alleles whose phenotypic effects successfully promote their own propagation...
, mutational bias and biased recombination-associated
DNA repairDNA repair refers to a collection of processes by which a cell identifies and corrects damage to the DNA molecules that encode its genome. In human cells, both normal metabolic activities and environmental factors such as UV light and Radiation can cause DNA damage, resulting in as many as 1...
. The
species problemThe species problem is a mixture of difficult, related questions that often come up when biologists identify species and when they define the word "species"....
in prokaryotic taxonomy has led to various suggestions in classifying bacteria and the
ad hoc committee on reconciliation of approaches to bacterial systematics has recommended use of GC ratios in higher level hierarchical classification. For example, the
ActinobacteriaActinobacteria are a group of Gram-positive bacteria with high G+C ratio. They can be terrestrial or aquatic.Use of the ferric uptake regulator has been suggested for classification...
are characterised as "high GC-content bacteria". In
Streptomyces coelicolor A3(2), GC content is 72%. The GC-content of
YeastYeasts are eukaryotic micro-organisms classified in the kingdom Fungi, with about 1,500 species currently described; they dominate fungal diversity in the oceans. Most reproduce asexually by budding, although a few do so by binary fission...
(
Saccharomyces cerevisiaeSaccharomyces cerevisiae is a species of budding yeast. It is perhaps the most useful yeast owing to its use since ancient times in baking and brewing. It is believed that it was originally isolated from the skins of grapes...
) is 38%, and that of another common
model organismA model organism is a species that is extensively studied to understand particular biological phenomena, with the expectation that discoveries made in the organism model will provide insight into the workings of other organisms...
Thale Cress (
Arabidopsis thalianaArabidopsis thaliana , is a small flowering plant native to Europe, Asia, and northwestern Africa. A spring annual with a relatively short life cycle, Arabidopsis is popular as a model organism in plant biology and genetics...
) is 36%. Because of the nature of the
genetic codeThe genetic code is the set of rules by which information encoded in genetic material is translated into proteins by living cells. A more precise term for the concept might be "genetic cipher". The code defines a mapping between tri-nucleotide sequences, called codons, and amino acids...
, it is virtually impossible for an organism to have a genome with a GC-content approaching either 0% or 100%. A species with an extremely low GC-content is
Plasmodium falciparumPlasmodium falciparum is a protozoan parasite, one of the species of Plasmodium that cause malaria in humans. It is transmitted by the female...
(GC% = ~20%), and it is usually common to refer to such examples as being AT-rich instead of GC-poor.
External links
- Table with GC-content of all sequenced prokaryotes
- Taxonomic browser of bacteria based on GC ratio on NCBI website.