In molecular biology and genetics, the linking between two nitrogenous bases on opposite complementary DNA or certain types of RNA strands that are connected via hydrogen bonds is called a base pair (often abbreviated bp). In the canonical Watson-Crick DNA base pairing, adenine (A) forms a base pair with thymine (T) and guanine (G) forms a base pair with cytosine (C). In RNA, thymine is replaced by uracil (U). Alternate hydrogen bonding patterns, such as the wobble base pair and Hoogsteen base pair, also occur, in particular in RNA, giving rise to complex and functional tertiary structures. Pairing is the mechanism by which codons on messenger RNA molecules are recognized by anticodons on transfer RNA during protein translation. Some DNA- or RNA-binding enzymes can recognize specific base pairing patterns that identify particular regulatory regions of genes.
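As a minimal sketch of codon-anticodon recognition, the following Python function derives the anticodon that would pair with an mRNA codon under strict Watson-Crick rules. The function name and the example codon are illustrative assumptions, and wobble pairing at the third codon position is deliberately ignored.

```python
# Watson-Crick partners in RNA (A pairs with U, G pairs with C).
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Return the anticodon (written 5'->3') for an mRNA codon (written 5'->3').

    The two strands pair antiparallel, so the anticodon is the
    reverse complement of the codon.
    """
    codon = codon.upper()
    if len(codon) != 3 or any(base not in RNA_COMPLEMENT for base in codon):
        raise ValueError(f"not a valid RNA codon: {codon!r}")
    return "".join(RNA_COMPLEMENT[base] for base in reversed(codon))


print(anticodon_for("AUG"))  # CAU
```

Read 3' to 5', the same anticodon is UAC, which lines up base by base against the codon AUG.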
The size of an individual gene or an organism's entire genome is often measured in base pairs because DNA is usually double-stranded. Hence, the number of total base pairs is equal to the number of nucleotides in one of the strands (with the exception of non-coding single-stranded regions of telomeres). The haploid human genome (23 chromosomes) is estimated to be about 3 billion base pairs long and to contain 20,000–25,000 distinct genes. A kilobase (kb) is a unit of measurement in molecular biology equal to 1000 base pairs of DNA or RNA.
Hydrogen bonding and stability
Hydrogen bonding is the chemical interaction that underlies the base-pairing rules described above. Appropriate geometrical correspondence of hydrogen bond donors and acceptors allows only the "right" pairs to form stably. DNA with high GC-content is more stable than DNA with low GC-content, but, contrary to popular belief, the hydrogen bonds do not stabilize the DNA significantly, and stabilization is mainly due to stacking interactions.
The larger nucleobases, adenine and guanine, are members of a class of double-ringed chemical structures called purines; the smaller nucleobases, cytosine and thymine (and uracil), are members of a class of single-ringed chemical structures called pyrimidines. Purines are complementary only with pyrimidines: pyrimidine-pyrimidine pairings are energetically unfavorable because the molecules are too far apart for hydrogen bonding to be established; purine-purine pairings are energetically unfavorable because the molecules are too close, leading to overlap repulsion. The only other possible pairings are GT and AC; these pairings are mismatches because the patterns of hydrogen donors and acceptors do not correspond. The GU pairing, with two hydrogen bonds, does occur fairly often in RNA (see wobble base pair).
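The pairing rules in this paragraph can be sketched as a small lookup. This is an illustrative classifier only: the function name and the category labels are assumptions, and it considers base identity alone, not geometry or sequence context.

```python
PURINES = {"A", "G"}            # double-ringed bases
PYRIMIDINES = {"C", "T", "U"}   # single-ringed bases

# Pairs are stored as frozensets so that order does not matter.
WATSON_CRICK = {frozenset("AT"), frozenset("AU"), frozenset("GC")}
WOBBLE = {frozenset("GU")}


def classify_pair(base1: str, base2: str) -> str:
    """Classify two opposing bases by the rules described above."""
    b1, b2 = base1.upper(), base2.upper()
    for base in (b1, b2):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"unknown base: {base!r}")
    pair = frozenset((b1, b2))
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WOBBLE:
        return "wobble"
    if pair <= PURINES:
        return "purine-purine mismatch"
    if pair <= PYRIMIDINES:
        return "pyrimidine-pyrimidine mismatch"
    # A purine opposite a pyrimidine whose donors and acceptors do not line up.
    return "purine-pyrimidine mismatch"
```

For example, `classify_pair("G", "C")` returns "Watson-Crick", `classify_pair("G", "U")` returns "wobble", and `classify_pair("G", "T")` returns "purine-pyrimidine mismatch".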
Paired DNA and RNA molecules are comparatively stable at room temperature, but the two nucleotide strands will separate above a melting point that is determined by the length of the molecules, the extent of mispairing (if any), and the GC content. Higher GC content results in higher melting temperatures; it is, therefore, unsurprising that the genomes of extremophile organisms such as Thermus thermophilus are particularly GC-rich. Conversely, regions of a genome that need to separate frequently, such as the promoter regions of often-transcribed genes, are comparatively GC-poor (for example, see TATA box). GC content and melting temperature must also be taken into account when designing primers for PCR.
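As a rough sketch of how GC content feeds into a melting-temperature estimate for a short primer, the following Python snippet computes the GC fraction and applies the Wallace rule of thumb (2 °C per A or T, 4 °C per G or C). The Wallace rule is not part of the text above; it is brought in here as an assumption, it is only a crude approximation for short oligonucleotides, and the primer sequence is hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 to 1.0)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Crude melting temperature estimate in degrees Celsius (Wallace rule)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


primer = "ATGCATGCATGCATGCAT"  # hypothetical 18-nt primer
print(f"{gc_content(primer):.2f}", wallace_tm(primer))  # 0.44 52
```

Under this rule a GC-only sequence gets twice the estimate of an AT-only sequence of the same length, which mirrors the qualitative trend described above.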
Base stacking
Base stacking interactions in DNA and RNA are due to dispersion attraction, short-range exchange repulsion, and electrostatic interactions, which also contribute to stability. Again, GC stacking interactions with adjacent bases tend to be more favorable. (Note, however, that a GC stacking interaction with the next base pair is geometrically different from a CG interaction.) Base stacking effects are especially important in the secondary structure and tertiary structure of RNA; for example, RNA stem-loop structures are stabilized by base stacking in the loop region.
Base analogs and intercalators
Chemical analogs of nucleotides can take the place of proper nucleotides and establish non-canonical base-pairing, leading to errors (mostly point mutations) in DNA replication and DNA transcription. This is due to their isosteric chemistry. One common mutagenic base analog is 5-bromouracil, which resembles thymine but can base-pair to guanine in its enol form.
Other chemicals, known as DNA intercalators, fit into the gap between adjacent bases on a single strand and induce frameshift mutations by "masquerading" as a base, causing the DNA replication machinery to skip or insert additional nucleotides at the intercalated site. Most intercalators are large polyaromatic compounds and are known or suspected carcinogens. Examples include ethidium bromide and acridine.
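To illustrate why a single inserted nucleotide causes a frameshift, the sketch below splits a sequence into triplets before and after an insertion. The sequence, the insertion position, and the helper name `codons` are all hypothetical; the code only models the reading frame, not the chemistry of intercalation.

```python
def codons(seq: str) -> list[str]:
    """Split a sequence into consecutive triplets, dropping any incomplete tail."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


original = "ATGGCCTTAGGA"                      # hypothetical coding fragment
inserted = original[:4] + "A" + original[4:]   # one extra nucleotide after the 4th base

print(codons(original))  # ['ATG', 'GCC', 'TTA', 'GGA']
print(codons(inserted))  # ['ATG', 'GAC', 'CTT', 'AGG']
```

Every triplet downstream of the insertion changes, whereas a base substitution would alter only one.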
Examples
The following DNA sequences illustrate paired double-stranded patterns. By convention, the top strand is written from the 5' end to the 3' end; thus, the bottom strand is written 3' to 5'.
- A base-paired DNA sequence:
- The corresponding RNA sequence, in which uracil is substituted for thymine:
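As a minimal sketch of both patterns, the following Python snippet builds the paired bottom strand for a made-up seven-base top strand (not taken from the source) and then writes the RNA equivalents. The function names are illustrative assumptions.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Base-by-base complement; if the input reads 5'->3', the result reads 3'->5'."""
    return strand.upper().translate(DNA_COMPLEMENT)


def to_rna(strand: str) -> str:
    """Write a DNA sequence as RNA by substituting uracil for thymine."""
    return strand.upper().replace("T", "U")


top = "GATTACA"            # hypothetical top strand, 5'->3'
bottom = complement(top)   # paired bottom strand, 3'->5'

print("5'", top, "3'")                  # 5' GATTACA 3'
print("3'", bottom, "5'")               # 3' CTAATGT 5'
print(to_rna(top), to_rna(bottom))      # GAUUACA CUAAUGU
```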
Length measurements
The following abbreviations are commonly used to describe the length of a DNA or RNA molecule:
- bp = base pair(s); one bp corresponds to circa 3.4 Å of length along the strand
- kb (= kbp) = kilo base pairs = 1,000 bp
- Mb = mega base pairs = 1,000,000 bp
- Gb = giga base pairs = 1,000,000,000 bp.
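The unit definitions above can be sketched as a small conversion helper. The function names are illustrative assumptions; the 3.4 Å rise per base pair and the example lengths are the figures quoted in this article.

```python
ANGSTROM_PER_BP = 3.4  # length per base pair along the strand, as listed above
UNITS = [("Gb", 10**9), ("Mb", 10**6), ("kb", 10**3), ("bp", 1)]


def format_bp(n_bp: int) -> str:
    """Express a length in base pairs using the largest fitting unit."""
    for name, size in UNITS:
        if n_bp >= size:
            return f"{n_bp / size:g} {name}"
    return f"{n_bp} bp"


def contour_length_m(n_bp: int) -> float:
    """Approximate end-to-end length in meters (1 angstrom = 1e-10 m)."""
    return n_bp * ANGSTROM_PER_BP * 1e-10


print(format_bp(1542))                              # 1.542 kb
print(format_bp(3_000_000_000))                     # 3 Gb
print(round(contour_length_m(3_000_000_000), 2))    # 1.02
```

At 3.4 Å per base pair, a haploid human genome of about 3 billion base pairs corresponds to roughly one meter of DNA.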
In the case of single-stranded DNA or RNA, units of nucleotides are used, abbreviated nt (or knt, Mnt, Gnt), as the bases are not paired.
To distinguish base-pair units from units of computer storage, kbp, Mbp, Gbp, etc. may be used for base pairs. The 16S rDNA of bacteria is 1542 base pairs long.
The centimorgan is also often used to imply distance along a chromosome, but the number of base pairs it corresponds to varies widely. In the human genome, the centimorgan is about 1 million base pairs.
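As a back-of-the-envelope sketch, a genetic distance in centimorgans can be converted to an approximate physical distance using the human figure quoted above. The function name is an assumption, and because the ratio varies widely along a chromosome the result is only an order-of-magnitude estimate.

```python
BP_PER_CM_HUMAN = 1_000_000  # rough human average quoted above; local values vary widely


def cm_to_bp(distance_cm: float, bp_per_cm: float = BP_PER_CM_HUMAN) -> float:
    """Approximate physical distance in base pairs for a genetic distance in cM."""
    return distance_cm * bp_per_cm


print(cm_to_bp(2.5))  # 2500000.0
```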
External links
- DAN—webserver version of the EMBOSS tool for calculating melting temperatures