Base pair

In molecular biology and genetics, two nitrogenous bases on opposite, complementary strands of DNA or of certain types of RNA that are connected via hydrogen bonds are called a base pair (often abbreviated bp). In the canonical Watson-Crick DNA base pairing, adenine (A) forms a base pair with thymine (T) and guanine (G) forms a base pair with cytosine (C). In RNA, thymine is replaced by uracil (U). Alternative hydrogen bonding patterns, such as the wobble base pair and the Hoogsteen base pair, also occur, particularly in RNA, giving rise to complex and functional tertiary structures. Pairing is the mechanism by which codons on messenger RNA molecules are recognized by anticodons on transfer RNA during protein translation. Some DNA- or RNA-binding enzymes can recognize specific base pairing patterns that identify particular regulatory regions of genes.

The size of an individual gene or an organism's entire genome is often measured in base pairs because DNA is usually double-stranded. Hence, the total number of base pairs is equal to the number of nucleotides in one of the strands (with the exception of non-coding single-stranded regions of telomeres). The haploid human genome (23 chromosomes) is estimated to be about 3 billion base pairs long and to contain 20,000–25,000 distinct genes. A kilobase (kb) is a unit of measurement in molecular biology equal to 1000 base pairs of DNA or RNA.

Hydrogen bonding and stability


Hydrogen bonding is the chemical interaction that underlies the base-pairing rules described above. Appropriate geometrical correspondence of hydrogen bond donors and acceptors allows only the "right" pairs to form stably. DNA with high GC-content is more stable than DNA with low GC-content, but, contrary to popular belief, the hydrogen bonds do not stabilize the DNA significantly, and stabilization is mainly due to stacking interactions.

The larger nucleobases, adenine and guanine, are members of a class of double-ringed chemical structures called purines; the smaller nucleobases, cytosine and thymine (and uracil), are members of a class of single-ringed chemical structures called pyrimidines. Purines are complementary only with pyrimidines: pyrimidine-pyrimidine pairings are energetically unfavorable because the molecules are too far apart for hydrogen bonding to be established; purine-purine pairings are energetically unfavorable because the molecules are too close, leading to overlap repulsion. The only other possible pairings are GT and AC; these pairings are mismatches because the patterns of hydrogen bond donors and acceptors do not correspond. The GU pairing, with two hydrogen bonds, does occur fairly often in RNA (see wobble base pair).
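
The pairing rules in this section can be expressed as a small lookup. The following is only an illustrative sketch: the function name `classify_pair` and its labels are hypothetical, and the code encodes nothing beyond the rules stated above (Watson-Crick pairs, the GU wobble pair, and the purine and pyrimidine size classes).

```python
# Illustrative classification of a two-base pairing, following the rules in the text.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}

# Canonical Watson-Crick pairs (unordered), with U standing in for T in RNA.
WATSON_CRICK = {frozenset("AT"), frozenset("AU"), frozenset("GC")}
# The G-U wobble pair that the text notes is fairly common in RNA.
WOBBLE = {frozenset("GU")}


def classify_pair(base1: str, base2: str) -> str:
    """Return a coarse label for the pairing of two bases (hypothetical helper)."""
    b1, b2 = base1.upper(), base2.upper()
    for b in (b1, b2):
        if b not in PURINES | PYRIMIDINES:
            raise ValueError(f"unknown base: {b!r}")
    pair = frozenset((b1, b2))
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WOBBLE:
        return "wobble"
    if b1 in PURINES and b2 in PURINES:
        return "purine-purine (unfavorable)"
    if b1 in PYRIMIDINES and b2 in PYRIMIDINES:
        return "pyrimidine-pyrimidine (unfavorable)"
    # Remaining purine-pyrimidine combinations (GT, AC) are mismatches.
    return "mismatch"


for a, b in [("A", "T"), ("G", "C"), ("G", "U"), ("G", "T"), ("A", "G"), ("C", "T")]:
    print(a, b, classify_pair(a, b))
```

The labels are deliberately coarse; real pairing stability depends on context that a lookup table like this does not capture.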

Paired DNA and RNA molecules are comparatively stable at room temperature, but the two nucleotide strands will separate above a melting point that is determined by the length of the molecules, the extent of mispairing (if any), and the GC content. Higher GC content results in higher melting temperatures; it is, therefore, unsurprising that the genomes of extremophile organisms such as Thermus thermophilus are particularly GC-rich. Conversely, regions of a genome that need to separate frequently (for example, the promoter regions of often-transcribed genes) are comparatively GC-poor (for example, see TATA box). GC content and melting temperature must also be taken into account when designing primers for PCR.
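
As a rough illustration of how GC content feeds into melting temperature for short primers, the sketch below computes the GC fraction and applies the Wallace rule of thumb (about 2 °C per A or T and 4 °C per G or C). The Wallace rule is not described in this article and is brought in here only as a simple approximation for short oligonucleotides; dedicated tools use more detailed thermodynamic models. The function names and the example primer are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees Celsius by the Wallace rule.

    Adds 2 degrees per A/T and 4 degrees per G/C. This is only a rule of
    thumb for short oligonucleotides, not a thermodynamic model.
    """
    seq = seq.upper()
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc


primer = "ATGCGCATTAGC"  # hypothetical 12-nt primer, for illustration only
print(f"GC content: {gc_content(primer):.0%}")
print(f"Estimated Tm: {wallace_tm(primer)} degrees C")
```

For two primers of equal length, the one with more G and C bases gets the higher estimate, which matches the trend described above.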

Base stacking


Base stacking interactions in DNA and RNA are due to dispersion attraction, short-range exchange repulsion, and electrostatic interactions, which also contribute to stability. Again, GC stacking interactions with adjacent bases tend to be more favorable. (Note, however, that a GC stacking interaction with the next base pair is geometrically different from a CG interaction.) Base stacking effects are especially important in the secondary structure and tertiary structure of RNA; for example, RNA stem-loop structures are stabilized by base stacking in the loop region.

Base analogs and intercalators



Chemical analogs of nucleotides can take the place of proper nucleotides and establish non-canonical base-pairing, leading to errors (mostly point mutations) in DNA replication and DNA transcription. This is due to their isosteric chemistry. One common mutagenic base analog is 5-bromouracil, which resembles thymine but can base-pair to guanine in its enol form.

Other chemicals, known as DNA intercalators, fit into the gap between adjacent bases on a single strand and induce frameshift mutations by "masquerading" as a base, causing the DNA replication machinery to skip or insert additional nucleotides at the intercalated site. Most intercalators are large polyaromatic compounds and are known or suspected carcinogens. Examples include ethidium bromide and acridine.

Examples


Base-paired, double-stranded sequences are written as two aligned rows. By convention, the top strand is written from the 5' end to the 3' end; thus, the bottom strand is written 3' to 5'. In the corresponding RNA sequence, uracil takes the place of thymine.
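
The writing convention can be illustrated with a short sketch. The sequence used here is a made-up example rather than one taken from this article, and the helper names `complement` and `to_rna` are hypothetical.

```python
# Watson-Crick partner of each DNA base.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Base-by-base complement, in the same left-to-right order as the input."""
    return "".join(DNA_COMPLEMENT[base] for base in strand.upper())


def to_rna(strand: str) -> str:
    """Write a DNA sequence with uracil in place of thymine."""
    return strand.upper().replace("T", "U")


top = "ATCGATTG"  # hypothetical sequence, written 5' to 3'
bottom = complement(top)  # aligned under the top strand, so it reads 3' to 5'

print("A base-paired DNA sequence:")
print("5'-" + top + "-3'")
print("3'-" + bottom + "-5'")

print("The same pairing written with uracil in place of thymine:")
print("5'-" + to_rna(top) + "-3'")
print("3'-" + to_rna(bottom) + "-5'")
```

Because the bottom row is left unreversed, each base sits directly under its partner, which is why that row reads 3' to 5'.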

Length measurements


The following abbreviations are commonly used to describe the length of a DNA or RNA molecule:
  • bp = base pair(s); one bp corresponds to about 3.4 Å of length along the strand
  • kb (= kbp) = kilo base pairs = 1,000 bp
  • Mb = mega base pairs = 1,000,000 bp
  • Gb = giga base pairs = 1,000,000,000 bp.
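
The units in the list above can be combined into a small conversion sketch. It assumes the figure of about 3.4 Å per base pair quoted in the list; the function names are hypothetical, and the result is an idealized end-to-end length, not a measurement.

```python
ANGSTROM_PER_BP = 3.4  # approximate length per base pair, from the list above
ANGSTROM_PER_METER = 1e10  # 1 angstrom is 1/10,000,000,000 of a meter


def format_bp(n_bp: int) -> str:
    """Express a base-pair count in the largest fitting unit (bp, kb, Mb, Gb)."""
    for factor, unit in ((10**9, "Gb"), (10**6, "Mb"), (10**3, "kb")):
        if n_bp >= factor:
            return f"{n_bp / factor:g} {unit}"
    return f"{n_bp} bp"


def contour_length_m(n_bp: int) -> float:
    """Idealized length in meters of n_bp stacked base pairs."""
    return n_bp * ANGSTROM_PER_BP / ANGSTROM_PER_METER


genome_bp = 3_000_000_000  # haploid human genome size quoted in this article
print(format_bp(genome_bp))
print(f"{contour_length_m(genome_bp):.2f} m")
```

With these figures, the roughly 3 billion base pairs of the haploid human genome come to about one meter of DNA.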


For single-stranded DNA or RNA, units of nucleotides are used, abbreviated nt (or knt, Mnt, Gnt), because the bases are not paired.
To distinguish base counts from units of computer storage, kbp, Mbp, Gbp, etc. may be used for base pairs. Bacterial 16S rDNA, for example, is 1542 base pairs long.

The centimorgan is also often used to indicate distance along a chromosome, but the number of base pairs it corresponds to varies widely. In the human genome, one centimorgan corresponds to about 1 million base pairs.

External links

  • DAN: webserver version of the EMBOSS tool for calculating melting temperatures