DNA

Overview
Deoxyribonucleic acid (diˌɒksiˌraɪbɵ.njuːˌkleɪ.ɨk ˈæsɪd; DNA) is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms (with the exception of RNA viruses). The DNA segments that carry this genetic information are called genes, but other DNA sequences have structural purposes, or are involved in regulating the use of this genetic information. Along with RNA and proteins, DNA is one of the three major macromolecules that are essential for all known forms of life.

DNA consists of two long polymers of simple units called nucleotides, with backbones made of sugars and phosphate groups joined by ester bonds. These two strands run in opposite directions to each other and are therefore anti-parallel. Attached to each sugar is one of four types of molecules called nucleobases (informally, bases). It is the sequence of these four nucleobases along the backbone that encodes information. This information is read using the genetic code, which specifies the sequence of the amino acids within proteins. The code is read by copying stretches of DNA into the related nucleic acid RNA in a process called transcription.
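The flow from DNA sequence to RNA to amino acids can be illustrated with a minimal Python sketch. It rests on several assumptions that are not part of the article: the input is the coding strand (so the RNA copy is the same sequence with U in place of T), the codon table is a deliberately tiny excerpt of the standard genetic code, and the function names `transcribe` and `translate` are hypothetical.

```python
# Tiny excerpt of the standard genetic code (RNA codon -> amino acid).
# "*" marks a stop codon. A complete table has 64 entries.
CODON_TABLE = {
    "AUG": "Met",
    "UUU": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "UAA": "*",
}


def transcribe(coding_strand: str) -> str:
    """Return the RNA copy of a DNA coding strand (T is replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Read the RNA three bases at a time until a stop codon is reached."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "?")  # "?" = not in excerpt
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return peptide


mrna = transcribe("ATGTTTGGCGCTTAA")
print(mrna)             # AUGUUUGGCGCUUAA
print(translate(mrna))  # ['Met', 'Phe', 'Gly', 'Ala']
```

In a cell the RNA is built against the opposite (template) strand, so replacing T with U on the coding strand is only a shortcut that gives the same sequence.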

Within cells DNA is organized into long structures called chromosomes. During cell division these chromosomes are duplicated in the process of DNA replication, providing each cell its own complete set of chromosomes. Eukaryotic organisms (animals, plants, fungi, and protists) store most of their DNA inside the cell nucleus and some of their DNA in organelles, such as mitochondria or chloroplasts. In contrast, prokaryotes (bacteria and archaea) store their DNA only in the cytoplasm. Within the chromosomes, chromatin proteins such as histones compact and organize DNA. These compact structures guide the interactions between DNA and other proteins, helping control which parts of the DNA are transcribed.

Properties


DNA is a long polymer made from repeating units called nucleotides. As first discovered by James D. Watson and Francis Crick, the structure of DNA of all species comprises two helical chains each coiled round the same axis, and each with a pitch of 34 Ångströms (3.4 nanometres) and a radius of 10 Ångströms (1.0 nanometres). According to another study, when measured in a particular solution, the DNA chain measured 22 to 26 Ångströms wide (2.2 to 2.6 nanometres), and one nucleotide unit measured 3.3 Å (0.33 nm) long. Although each individual repeating unit is very small, DNA polymers can be very large molecules containing millions of nucleotides. For instance, the largest human chromosome, chromosome number 1, is approximately 220 million base pairs long.
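The figures quoted above can be combined in a short back-of-the-envelope calculation. The sketch below assumes that the pitch and the per-nucleotide length, which come from two different measurements, can be used together, and that chromosome 1 is treated as one fully stretched double helix. Both are simplifications made here for illustration, and the variable names are hypothetical.

```python
# Values quoted in the text above.
PITCH_ANGSTROM = 34.0                # length of one full helical turn
RISE_PER_NUCLEOTIDE_ANGSTROM = 3.3   # length of one nucleotide unit
CHROMOSOME_1_BASE_PAIRS = 220_000_000

ANGSTROM_IN_METRES = 1e-10

# Roughly how many nucleotide units fit into one turn of the helix.
nucleotides_per_turn = PITCH_ANGSTROM / RISE_PER_NUCLEOTIDE_ANGSTROM

# Stretched-out length of chromosome 1, in metres.
chromosome_1_length_m = (
    CHROMOSOME_1_BASE_PAIRS * RISE_PER_NUCLEOTIDE_ANGSTROM * ANGSTROM_IN_METRES
)

print(f"{nucleotides_per_turn:.1f} nucleotides per turn")          # 10.3
print(f"{chromosome_1_length_m * 100:.2f} cm if fully stretched")   # 7.26
```

Under these assumptions a single molecule only a couple of nanometres wide would be several centimetres long, which gives a sense of how much compaction is needed to fit it inside a cell.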

In living organisms DNA does not usually exist as a single molecule, but instead as a pair of molecules that are held tightly together. These two long strands entwine like vines, in the shape of a double helix. Each nucleotide repeat contains both a segment of the backbone of the molecule, which holds the chain together, and a nucleobase, which interacts with the other DNA strand in the helix. A nucleobase linked to a sugar is called a nucleoside, and a base linked to a sugar and one or more phosphate groups is called a nucleotide. A polymer comprising multiple linked nucleotides (as in DNA) is called a polynucleotide.

The backbone of the DNA strand is made from alternating phosphate and sugar residues. The sugar in DNA is 2-deoxyribose, which is a pentose (five-carbon) sugar. The sugars are joined together by phosphate groups that form phosphodiester bonds between the third and fifth carbon atoms of adjacent sugar rings. These asymmetric bonds mean a strand of DNA has a direction. In a double helix the direction of the nucleotides in one strand is opposite to their direction in the other strand: the strands are antiparallel. The asymmetric ends of DNA strands are called the 5′ (five prime) and 3′ (three prime) ends, with the 5′ end having a terminal phosphate group and the 3′ end a terminal hydroxyl group. One major difference between DNA and RNA is the sugar, with the 2-deoxyribose in DNA being replaced by the alternative pentose sugar ribose in RNA.

The DNA double helix is stabilized primarily by two forces: hydrogen bonds between nucleotides and base-stacking interactions among the aromatic nucleobases. In the aqueous environment of the cell, the conjugated π bonds of nucleotide bases align perpendicular to the axis of the DNA molecule, minimizing their interaction with the solvation shell and, therefore, the Gibbs free energy. The four bases found in DNA are adenine (abbreviated A), cytosine (C), guanine (G) and thymine (T). These four bases are attached to the sugar/phosphate to form the complete nucleotide, as shown for adenosine monophosphate.

The nucleobases are classified into two types: the purines, A and G, being fused five- and six-membered heterocyclic compounds, and the pyrimidines, the six-membered rings C and T. A fifth pyrimidine nucleobase, uracil (U), usually takes the place of thymine in RNA and differs from thymine by lacking a methyl group on its ring. Uracil is not usually found in DNA, occurring only as a breakdown product of cytosine. In addition to RNA and DNA, a large number of artificial nucleic acid analogues have also been created to study the properties of nucleic acids, or for use in biotechnology.

Grooves


Twin helical strands form the DNA backbone. Another double helix may be found by tracing the spaces, or grooves, between the strands. These voids are adjacent to the base pairs and may provide a binding site. As the strands are not directly opposite each other, the grooves are unequally sized. One groove, the major groove, is 22 Å wide and the other, the minor groove, is 12 Å wide. The narrowness of the minor groove means that the edges of the bases are more accessible in the major groove. As a result, proteins like transcription factors that can bind to specific sequences in double-stranded DNA usually make contacts to the sides of the bases exposed in the major groove. This situation varies in unusual conformations of DNA within the cell (see below), but the major and minor grooves are always named to reflect the differences in size that would be seen if the DNA is twisted back into the ordinary B form.

Base pairing


In a DNA double helix, each type of nucleobase on one strand normally interacts with just one type of nucleobase on the other strand. This is called complementary base pairing. Here, purines form hydrogen bonds to pyrimidines, with A bonding only to T, and C bonding only to G. This arrangement of two nucleotides binding together across the double helix is called a base pair. As hydrogen bonds are not covalent, they can be broken and rejoined relatively easily. The two strands of DNA in a double helix can therefore be pulled apart like a zipper, either by a mechanical force or high temperature. As a result of this complementarity, all the information in the double-stranded sequence of a DNA helix is duplicated on each strand, which is vital in DNA replication. Indeed, this reversible and specific interaction between complementary base pairs is critical for all the functions of DNA in living organisms.
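Because A pairs only with T and C only with G, and because the two strands are antiparallel, one strand is enough to reconstruct the other. The following Python sketch illustrates this. The function name `reverse_complement` and the example sequence are chosen here for illustration, and sequences are assumed to be written 5′ to 3′ using only the letters A, C, G and T.

```python
# Pairing rules from the text: A<->T and C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5' to 3'.

    Each base is swapped for its pairing partner, then the result is
    reversed because the two strands run in opposite directions.
    """
    return strand.upper().translate(COMPLEMENT)[::-1]


strand = "ATGCCG"
partner = reverse_complement(strand)
print(partner)                      # CGGCAT
print(reverse_complement(partner))  # ATGCCG, the original strand again
```

Applying the operation twice returns the starting sequence, which is the sense in which the information is duplicated on each strand.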

The two types of base pairs form different numbers of hydrogen bonds, AT forming two hydrogen bonds, and GC forming three hydrogen bonds (see figures, left). DNA with high GC-content is more stable than DNA with low GC-content. Although it is often stated that this is due to the added stability of an additional hydrogen bond, this is incorrect. DNA with high GC-content is more stable due to intra-strand base stacking interactions.

As noted above, most DNA molecules are actually two polymer strands, bound together in a helical fashion by noncovalent bonds; this double-stranded structure (dsDNA) is maintained largely by the intrastrand base stacking interactions, which are strongest for G,C stacks. The two strands can come apart, a process known as melting, to form two ssDNA molecules. Melting occurs when conditions favor ssDNA; such conditions are high temperature, low salt and high pH (low pH also melts DNA, but since DNA is unstable at low pH due to acid depurination, low pH is rarely used).

The stability of the dsDNA form depends not only on the GC-content (% G,C base pairs) but also on sequence (since stacking is sequence specific) and on length (longer molecules are more stable). The stability can be measured in various ways; a common way is the "melting temperature", which is the temperature at which 50% of the double-stranded molecules are converted to single-stranded molecules; melting temperature is dependent on ionic strength and the concentration of DNA.

As a result, it is both the percentage of GC base pairs and the overall length of a DNA double helix that determine the strength of the association between the two strands of DNA. Long DNA helices with a high GC-content have stronger-interacting strands, while short helices with high AT content have weaker-interacting strands. In biology, parts of the DNA double helix that need to separate easily, such as the TATAAT Pribnow box in some promoters, tend to have a high AT content, making the strands easier to pull apart.

In the laboratory, the strength of this interaction can be measured by finding the temperature required to break the hydrogen bonds between the strands, known as the melting temperature (also called the Tm value). When all the base pairs in a DNA double helix melt, the strands separate and exist in solution as two entirely independent molecules. These single-stranded DNA molecules (ssDNA) have no single common shape, but some conformations are more stable than others.
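As a rough illustration of how GC-content relates to strand stability, the following minimal sketch computes the GC percentage of a sequence and a crude melting-temperature estimate. The estimate uses the Wallace rule of thumb (2 °C per A or T, 4 °C per G or C), which is an assumption brought in for illustration and is not part of the text above; it is only meaningful for short oligonucleotides and ignores the sequence-specific stacking, ionic strength, and DNA concentration effects described in this section. The function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def wallace_tm(seq: str) -> int:
    """Crude Tm estimate in degrees Celsius (Wallace rule of thumb).

    Assumption: short oligonucleotide; salt, concentration and
    base-stacking context are all ignored.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


pribnow_box = "TATAAT"   # AT-rich promoter element mentioned in the text
gc_rich = "GCGCGC"       # hypothetical GC-rich sequence of the same length

print(gc_content(pribnow_box), wallace_tm(pribnow_box))
print(gc_content(gc_rich), wallace_tm(gc_rich))
```

For two sequences of equal length, the AT-rich one receives the lower estimate, which matches the qualitative statement that AT-rich regions are easier to pull apart.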

Sense and antisense


A DNA sequence is called "sense" if its sequence is the same as that of a messenger RNA
Messenger RNA
Messenger RNA is a molecule of RNA encoding a chemical "blueprint" for a protein product. mRNA is transcribed from a DNA template, and carries coding information to the sites of protein synthesis: the ribosomes. Here, the nucleic acid polymer is translated into a polymer of amino acids: a protein...

 copy that is translated into protein. The sequence on the opposite strand is called the "antisense" sequence. Both sense and antisense sequences can exist on different parts of the same strand of DNA (i.e. both strands contain both sense and antisense sequences). In both prokaryotes and eukaryotes, antisense RNA sequences are produced, but the functions of these RNAs are not entirely clear. One proposal is that antisense RNAs are involved in regulating gene expression
Gene expression
Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product. These products are often proteins, but in non-protein coding genes such as ribosomal RNA , transfer RNA or small nuclear RNA genes, the product is a functional RNA...

 through RNA-RNA base pairing.
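The relationship between sense, antisense, and mRNA sequences can be sketched in a few lines. This is a minimal illustration that assumes sequences are written 5′ to 3′ and contain only the letters A, C, G and T; the function names are hypothetical.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite-strand sequence, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mrna_from_sense(sense: str) -> str:
    """The mRNA matches the sense sequence, with U in place of T."""
    return sense.upper().replace("T", "U")


sense = "ATGGCC"
antisense = reverse_complement(sense)
mrna = mrna_from_sense(sense)

print(sense, antisense, mrna)
```

Applying `reverse_complement` twice returns the original sequence, reflecting the fact that each strand is the antisense of the other.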

A few DNA sequences in prokaryotes and eukaryotes, and more in plasmid
Plasmid
In microbiology and genetics, a plasmid is a DNA molecule that is separate from, and can replicate independently of, the chromosomal DNA. They are double-stranded and, in many cases, circular...

s and virus
Virus
A virus is a small infectious agent that can replicate only inside the living cells of organisms. Viruses infect all types of organisms, from animals and plants to bacteria and archaea...

es, blur the distinction between sense and antisense strands by having overlapping genes. In these cases, some DNA sequences do double duty, encoding one protein when read along one strand, and a second protein when read in the opposite direction along the other strand. In bacteria
Bacteria
Bacteria are a large domain of prokaryotic microorganisms. Typically a few micrometres in length, bacteria have a wide range of shapes, ranging from spheres to rods and spirals...

, this overlap may be involved in the regulation of gene transcription, while in viruses, overlapping genes increase the amount of information that can be encoded within the small viral genome.

Supercoiling



DNA can be twisted like a rope in a process called DNA supercoil
DNA supercoil
DNA supercoiling refers to the over- or under-winding of a DNA strand, and is an expression of the strain on the polymer. Supercoiling is important in a number of biological processes, such as compacting DNA. Additionally, certain enzymes such as topoisomerases are able to change DNA topology to...

ing. With DNA in its "relaxed" state, a strand usually circles the axis of the double helix once every 10.4 base pairs, but if the DNA is twisted the strands become more tightly or more loosely wound. If the DNA is twisted in the direction of the helix, this is positive supercoiling, and the bases are held more tightly together. If it is twisted in the opposite direction, this is negative supercoiling, and the bases come apart more easily. In nature, most DNA has slight negative supercoiling that is introduced by enzyme
Enzyme
Enzymes are proteins that catalyze chemical reactions. In enzymatic reactions, the molecules at the beginning of the process, called substrates, are converted into different molecules, called products. Almost all chemical reactions in a biological cell need enzymes in order to occur at rates...

s called topoisomerase
Topoisomerase
Topoisomerases are enzymes that regulate the overwinding or underwinding of DNA. The winding problem of DNA arises due to the intertwined nature of its double helical structure. For example, during DNA replication, DNA becomes overwound ahead of a replication fork...

s. These enzymes are also needed to relieve the twisting stresses introduced into DNA strands during processes such as transcription
Transcription (genetics)
Transcription is the process of creating a complementary RNA copy of a sequence of DNA. Both RNA and DNA are nucleic acids, which use base pairs of nucleotides as a complementary language that can be converted back and forth from DNA to RNA by the action of the correct enzymes...

 and DNA replication
DNA replication
DNA replication is a biological process that occurs in all living organisms and copies their DNA; it is the basis for biological inheritance. The process starts with one double-stranded DNA molecule and produces two identical copies of the molecule...

.
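The arithmetic behind the 10.4 base pairs per turn figure can be sketched as follows. The linking number and the superhelical density σ = (Lk − Lk0) / Lk0 are standard quantities in DNA topology, but they are not introduced in the text above, so their use here is an assumption made for illustration; the function names and the example plasmid size are hypothetical.

```python
BP_PER_TURN = 10.4  # helical repeat of relaxed DNA, as quoted in the text


def relaxed_turns(n_bp: int) -> float:
    """Number of helical turns (Lk0) in a relaxed molecule of n_bp base pairs."""
    return n_bp / BP_PER_TURN


def superhelical_density(n_bp: int, linking_number: float) -> float:
    """Return sigma = (Lk - Lk0) / Lk0 for a closed circular molecule."""
    lk0 = relaxed_turns(n_bp)
    return (linking_number - lk0) / lk0


def classify(sigma: float, tol: float = 1e-9) -> str:
    """Label the supercoiling state from the sign of sigma."""
    if sigma > tol:
        return "positive"
    if sigma < -tol:
        return "negative"
    return "relaxed"


# Hypothetical 5,200 bp plasmid: about 500 turns when relaxed.
sigma = superhelical_density(5200, 470)
print(round(sigma, 3), classify(sigma))
```

A molecule with fewer turns than its relaxed form is underwound, which corresponds to the negative supercoiling described above.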

Alternate DNA structures



DNA exists in many possible conformations
Conformational isomerism
In chemistry, conformational isomerism is a form of stereoisomerism in which the isomers can be interconverted exclusively by rotations about formally single bonds...

 that include A-DNA
A-DNA
A-DNA is one of the many possible double helical structures of DNA. A-DNA is thought to be one of three biologically active double helical structures along with B- and Z-DNA. It is a right-handed double helix fairly similar to the more common and well-known B-DNA form, but with a shorter more...

, B-DNA, and Z-DNA
Z-DNA
Z-DNA is one of the many possible double helical structures of DNA. It is a left-handed double helical structure in which the double helix winds to the left in a zig-zag pattern...

 forms, although only B-DNA and Z-DNA have been directly observed in functional organisms. The conformation that DNA adopts depends on the hydration level, DNA sequence, the amount and direction of supercoiling, chemical modifications of the bases, the type and concentration of metal ion
Ion
An ion is an atom or molecule in which the total number of electrons is not equal to the total number of protons, giving it a net positive or negative electrical charge. The name was given by physicist Michael Faraday for the substances that allow a current to pass between electrodes in a...

s, as well as the presence of polyamine
Polyamine
A polyamine is an organic compound having two or more primary amino groups .This class of compounds includes several synthetic substances that are important feedstocks for the chemical industry, such as ethylene diamine , 1,3-diaminopropane , and hexamethylenediamine...

s in solution.

The first published reports of A-DNA X-ray diffraction patterns
X-ray scattering techniques
X-ray scattering techniques are a family of non-destructive analytical techniques which reveal information about the crystallographic structure, chemical composition, and physical properties of materials and thin films...

 (and also of B-DNA) used analyses based on Patterson transforms
Patterson function
The Patterson function is used to solve the phase problem in X-ray crystallography. It was introduced in 1935 by Arthur Lindo Patterson while he was a visiting researcher in the laboratory of Bertram Eugene Warren at MIT....

 that provided only a limited amount of structural information for oriented fibers of DNA. An alternate analysis was then proposed by Wilkins et al., in 1953, for the in vivo B-DNA X-ray diffraction/scattering patterns of highly hydrated DNA fibers in terms of squares of Bessel function
Bessel function
In mathematics, Bessel functions, first defined by the mathematician Daniel Bernoulli and generalized by Friedrich Bessel, are canonical solutions y of Bessel's differential equation:...

s. In the same journal, James D. Watson
James D. Watson
James Dewey Watson is an American molecular biologist, geneticist, and zoologist, best known as one of the co-discoverers of the structure of DNA in 1953 with Francis Crick...

 and Francis Crick
Francis Crick
Francis Harry Compton Crick OM FRS was an English molecular biologist, biophysicist, and neuroscientist, and most noted for being one of two co-discoverers of the structure of the DNA molecule in 1953, together with James D. Watson...

 presented their molecular modeling
Molecular models of DNA
Molecular models of DNA structures are representations of the molecular geometry and topology of Deoxyribonucleic acid molecules using one of several means, with the aim of simplifying and presenting the essential, physical and chemical, properties of DNA molecular structures either in vivo or in...

 analysis of the DNA X-ray diffraction patterns to suggest that the structure was a double helix.

Although the "B-DNA form" is most common under the conditions found in cells, it is not a well-defined conformation but a family of related DNA conformations that occur at the high hydration levels present in living cells. Their corresponding X-ray diffraction and scattering patterns are characteristic of molecular paracrystals
Paracrystalline
Paracrystalline materials are defined as having short and medium range ordering in their lattice but lacking long-range ordering at least in one direction....

 with a significant degree of disorder.

Compared to B-DNA, the A-DNA form is a wider right-handed spiral, with a shallow, wide minor groove and a narrower, deeper major groove. The A form occurs under non-physiological conditions in partially dehydrated samples of DNA, while in the cell it may be produced in hybrid pairings of DNA and RNA strands, as well as in enzyme-DNA complexes. Segments of DNA where the bases have been chemically modified by methylation
Methylation
In the chemical sciences, methylation denotes the addition of a methyl group to a substrate or the substitution of an atom or group by a methyl group. Methylation is a form of alkylation with, to be specific, a methyl group, rather than a larger carbon chain, replacing a hydrogen atom...

 may undergo a larger change in conformation and adopt the Z form
. Here, the strands turn about the helical axis in a left-handed
 spiral, the opposite of the more common B form. These unusual structures can be recognized by specific Z-DNA binding proteins and may be involved in the regulation of transcription.

Alternate DNA chemistry


For a number of years exobiologists have proposed the existence of a shadow biosphere
Shadow biosphere
The term "shadow biosphere" was coined by Carol Cleland and Shelley Copley. A shadow biosphere is a hypothetical microbial biosphere of Earth that uses radically different biochemical and molecular processes than currently known life...

, a postulated microbial biosphere of Earth that uses radically different biochemical and molecular processes than currently known life. One of the proposals was the existence of lifeforms that use arsenic instead of phosphorus in DNA.

A December 2010 NASA
NASA
The National Aeronautics and Space Administration is the agency of the United States government that is responsible for the nation's civilian space program and for aeronautics and aerospace research...

 press conference stated that the bacterium GFAJ-1
GFAJ-1
GFAJ-1 is a strain of rod-shaped bacterium in the family Halomonadaceae. The extremophile was isolated from the hypersaline and alkaline Mono Lake in eastern California by a research team led by NASA astrobiologist Felisa Wolfe-Simon...

, which has evolved in an arsenic-rich environment, is the first terrestrial lifeform found which may have this ability. The bacterium was found in Mono Lake
Mono Lake
Mono Lake is a large, shallow saline lake in Mono County, California, formed at least 760,000 years ago as a terminal lake in a basin that has no outlet to the ocean...

, east of Yosemite National Park
Yosemite National Park
Yosemite National Park is a United States National Park spanning eastern portions of Tuolumne, Mariposa and Madera counties in east central California, United States. The park covers an area of and reaches across the western slopes of the Sierra Nevada mountain chain...

. GFAJ-1 is a rod-shaped extremophile
Extremophile
An extremophile is an organism that thrives in physically or geochemically extreme conditions that are detrimental to most life on Earth. In contrast, organisms that live in more moderate environments may be termed mesophiles or neutrophiles...

 bacterium in the family Halomonadaceae
Halomonadaceae
The Halomonadaceae are a family of halophilic Proteobacteria.-History:The family was originally created in 1988 to contain the genera Halomonas and Deleya....

 that, when starved of phosphorus
Phosphorus
Phosphorus is the chemical element that has the symbol P and atomic number 15. A multivalent nonmetal of the nitrogen group, phosphorus as a mineral is almost always present in its maximally oxidized state, as inorganic phosphate rocks...

, may be capable of incorporating the usually poisonous element arsenic
Arsenic
Arsenic is a chemical element with the symbol As, atomic number 33 and relative atomic mass 74.92. Arsenic occurs in many minerals, usually in conjunction with sulfur and metals, and also as a pure elemental crystal. It was first documented by Albertus Magnus in 1250.Arsenic is a metalloid...

 in its DNA. This discovery may lend weight to the long-standing idea that extraterrestrial life
 could have a different chemical makeup from life on Earth
Earth
Earth is the third planet from the Sun, and the densest and fifth-largest of the eight planets in the Solar System. It is also the largest of the Solar System's four terrestrial planets...

. The research was carried out by a team led by Felisa Wolfe-Simon
Felisa Wolfe-Simon
Felisa Wolfe-Simon is an American microbial geobiologist and biogeochemist. As a NASA research fellow in residence at the US Geological Survey and a member of the NASA Astrobiology Institute, Wolfe-Simon led the team that discovered GFAJ-1, an extremophile bacterium that they claim is capable of...

, a geomicrobiologist and geobiochemist, a Postdoctoral Fellow of the NASA Astrobiology Institute
NASA Astrobiology Institute
The NASA Astrobiology Institute was established in 1998 by the National Aeronautics and Space Administration "to develop the field of astrobiology and provide a scientific framework for flight missions". The NAI is a virtual, distributed organization that integrates astrobiology research and...

 with Arizona State University
Arizona State University
Arizona State University is a public research university located in the Phoenix Metropolitan Area of the State of Arizona...

. This finding has, however, faced strong criticism from the scientific community; scientists have argued that there is no evidence that arsenic is actually incorporated into biomolecules. Independent confirmation of this finding has also not yet been possible.

Quadruplex structures


At the ends of the linear chromosomes are specialized regions of DNA called telomere
Telomere
A telomere is a region of repetitive DNA sequences at the end of a chromosome, which protects the end of the chromosome from deterioration or from fusion with neighboring chromosomes. Its name is derived from the Greek nouns telos "end" and merοs "part"...

s. The main function of these regions is to allow the cell to replicate chromosome ends using the enzyme telomerase
Telomerase
Telomerase is an enzyme that adds DNA sequence repeats to the 3' end of DNA strands in the telomere regions, which are found at the ends of eukaryotic chromosomes. This region of repeated nucleotide called telomeres contains non-coding DNA material and prevents constant loss of important DNA from...

, as the enzymes that normally replicate DNA cannot copy the extreme 3′ ends of chromosomes. These specialized chromosome caps also help protect the DNA ends, and stop the DNA repair
DNA repair
DNA repair refers to a collection of processes by which a cell identifies and corrects damage to the DNA molecules that encode its genome. In human cells, both normal metabolic activities and environmental factors such as UV light and radiation can cause DNA damage, resulting in as many as 1...

 systems in the cell from treating them as damage to be corrected. In human cells, telomeres are usually lengths of single-stranded DNA containing several thousand repeats of a simple TTAGGG sequence.

These guanine-rich sequences may stabilize chromosome ends by forming structures of stacked sets of four-base units, rather than the usual base pairs found in other DNA molecules. Here, four guanine bases form a flat plate and these flat four-base units then stack on top of each other, to form a stable G-quadruplex
G-quadruplex
In molecular biology, G-quadruplexes are nucleic acid sequences that are rich in guanine and are capable of forming a four-stranded structure...

 structure. These structures are stabilized by hydrogen bonding between the edges of the bases and chelation
Chelation
Chelation is the formation or presence of two or more separate coordinate bonds between apolydentate ligand and a single central atom....

 of a metal ion in the centre of each four-base unit. Other structures can also be formed, with the central set of four bases coming from either a single strand folded around the bases, or several different parallel strands, each contributing one base to the central structure.
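Guanine-rich stretches that could in principle fold into such stacked four-base units are often located with a simple pattern search. The sketch below assumes a commonly used heuristic, four runs of at least three guanines separated by loops of one to seven bases; this pattern is an assumption introduced for illustration and does not come from the text above. A match only flags a candidate sequence and says nothing about whether the structure actually forms. The names are hypothetical.

```python
import re

# Heuristic motif: G-run, loop, G-run, loop, G-run, loop, G-run.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def find_g4_motifs(seq: str) -> list[tuple[int, str]]:
    """Return (start index, matched text) for each candidate motif in seq."""
    seq = seq.upper()
    return [(m.start(), m.group()) for m in G4_PATTERN.finditer(seq)]


# Four copies of the human telomeric repeat mentioned in the text.
telomere_like = "TTAGGG" * 4
print(find_g4_motifs(telomere_like))
```

Four consecutive TTAGGG repeats supply the four guanine runs the pattern requires, whereas three repeats do not.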

In addition to these stacked structures, telomeres also form large loop structures called telomere loops, or T-loops. Here, the single-stranded DNA curls around in a long circle stabilized by telomere-binding proteins. At the very end of the T-loop, the single-stranded telomere DNA is held onto a region of double-stranded DNA by the telomere strand disrupting the double-helical DNA and base pairing to one of the two strands. This triple-stranded
Triple-stranded DNA
A triple-stranded DNA is a structure of DNA in which three oligonucleotides wind around each other and form a triple helix. In this structure, one strand binds to a B-form DNA double helix through Hoogsteen or reversed Hoogsteen hydrogen bonds....

 structure is called a displacement loop or D-loop
D-loop
In molecular biology, a displacement loop or D-loop is a DNA structure where the two strands of a double-stranded DNA molecule are separated for a stretch and held apart by a third strand of DNA. The third strand has a base sequence which is complementary to one of the main strands and pairs with...

.

Branched DNA


In DNA, fraying occurs when non-complementary regions exist at the end of an otherwise complementary double strand of DNA. Branched DNA can occur if a third strand of DNA is introduced that contains adjoining regions able to hybridize with the frayed regions of the pre-existing double strand. Although the simplest example of branched DNA involves only three strands of DNA, complexes involving additional strands and multiple branches are also possible. Branched DNA can be used in nanotechnology
Nanotechnology
Nanotechnology is the study of manipulating matter on an atomic and molecular scale. Generally, nanotechnology deals with developing materials, devices, or other structures possessing at least one dimension sized from 1 to 100 nanometres...

 to construct geometric shapes; see the section on uses in technology below.

Vibration


DNA may carry out low-frequency collective motion, as observed by Raman spectroscopy
Raman spectroscopy
Raman spectroscopy is a spectroscopic technique used to study vibrational, rotational, and other low-frequency modes in a system.It relies on inelastic scattering, or Raman scattering, of monochromatic light, usually from a laser in the visible, near infrared, or near ultraviolet range...

 and analyzed with a quasi-continuum model.

Base modifications



The expression of genes is influenced by how the DNA is packaged in chromosomes, in a structure called chromatin
Chromatin
Chromatin is the combination of DNA and proteins that make up the contents of the nucleus of a cell. The primary functions of chromatin are; to package DNA into a smaller volume to fit in the cell, to strengthen the DNA to allow mitosis and meiosis and prevent DNA damage, and to control gene...

. Base modifications can be involved in packaging, with regions that have low or no gene expression usually containing high levels of methylation
 of cytosine
Cytosine
Cytosine is one of the four main bases found in DNA and RNA, along with adenine, guanine, and thymine . It is a pyrimidine derivative, with a heterocyclic aromatic ring and two substituents attached . The nucleoside of cytosine is cytidine...

 bases. For example, cytosine methylation produces 5-methylcytosine
5-Methylcytosine
5-Methylcytosine is a methylated form of the DNA base cytosine that may be involved in the regulation of gene transcription. When cytosine is methylated, the DNA maintains the same sequence, but the expression of methylated genes can be altered .In the figure on the right, a methyl group, is...

, which is important for X-chromosome inactivation
X-inactivation
X-inactivation is a process by which one of the two copies of the X chromosome present in female mammals is inactivated. The inactive X chromosome is silenced by packaging into transcriptionally inactive heterochromatin...

. The average level of methylation varies between organisms – the worm Caenorhabditis elegans
Caenorhabditis elegans
Caenorhabditis elegans is a free-living, transparent nematode , about 1 mm in length, which lives in temperate soil environments. Research into the molecular and developmental biology of C. elegans was begun in 1974 by Sydney Brenner and it has since been used extensively as a model...

 lacks cytosine methylation, while vertebrate
Vertebrate
Vertebrates are animals that are members of the subphylum Vertebrata . Vertebrates are the largest group of chordates, with currently about 58,000 species described. Vertebrates include the jawless fishes, bony fishes, sharks and rays, amphibians, reptiles, mammals, and birds...

s have higher levels, with up to 1% of their DNA containing 5-methylcytosine. Despite the importance of 5-methylcytosine, it can deaminate
Deamination
Deamination is the removal of an amine group from a molecule. Enzymes which catalyse this reaction are called deaminases.In the human body, deamination takes place primarily in the liver, however glutamate is also deaminated in the kidneys. Deamination is the process by which amino acids are...

 to leave a thymine base, so methylated cytosines are particularly prone to mutation
Mutation
In molecular biology and genetics, mutations are changes in a genomic sequence: the DNA sequence of a cell's genome or the DNA or RNA sequence of a virus. They can be defined as sudden and spontaneous changes in the cell. Mutations are caused by radiation, viruses, transposons and mutagenic...

s. Other base modifications include adenine methylation in bacteria, the presence of 5-hydroxymethylcytosine
5-Hydroxymethylcytosine
5-Hydroxymethylcytosine is a DNA pyrimidine nitrogen base. It is formed from the DNA base cytosine by adding a methyl group and then a hydroxy group. It is important in epigenetics, because the hydroxymethyl group on the cytosine can possibly switch a gene on and off. It was first seen in...

 in the brain
Brain
The brain is the center of the nervous system in all vertebrate and most invertebrate animals—only a few primitive invertebrates such as sponges, jellyfish, sea squirts and starfishes do not have one. It is located in the head, usually close to primary sensory apparatus such as vision, hearing,...

, and the glycosylation
Glycosylation
Glycosylation is the reaction in which a carbohydrate, i.e. a glycosyl donor, is attached to a hydroxyl or other functional group of another molecule . In biology glycosylation refers to the enzymatic process that attaches glycans to proteins, lipids, or other organic molecules...

 of uracil to produce the "J-base" in kinetoplastid
Kinetoplastid
The kinetoplastids are a group of single-cell flagellate protozoa, including a number of parasites responsible for serious diseases in humans and other animals, as well as various forms found in soil and aquatic environments...

s.

Damage



DNA can be damaged by many sorts of mutagen
Mutagen
In genetics, a mutagen is a physical or chemical agent that changes the genetic material, usually DNA, of an organism and thus increases the frequency of mutations above the natural background level. As many mutations cause cancer, mutagens are therefore also likely to be carcinogens...

s, which change the DNA sequence. Mutagens include oxidizing agent
Oxidizing agent
An oxidizing agent can be defined as a substance that removes electrons from another reactant in a redox chemical reaction...

s, alkylating agents
Alkylation
Alkylation is the transfer of an alkyl group from one molecule to another. The alkyl group may be transferred as an alkyl carbocation, a free radical, a carbanion or a carbene . Alkylating agents are widely used in chemistry because the alkyl group is probably the most common group encountered in...

 and also high-energy electromagnetic radiation
Electromagnetic radiation
Electromagnetic radiation is a form of energy that exhibits wave-like behavior as it travels through space...

 such as ultraviolet
Ultraviolet
Ultraviolet light is electromagnetic radiation with a wavelength shorter than that of visible light, but longer than X-rays, in the range 10 nm to 400 nm, and energies from 3 eV to 124 eV...

 light and X-ray
X-ray
X-radiation is a form of electromagnetic radiation. X-rays have a wavelength in the range of 0.01 to 10 nanometers, corresponding to frequencies in the range 30 petahertz to 30 exahertz and energies in the range 120 eV to 120 keV. They are shorter in wavelength than UV rays and longer than gamma...

s. The type of DNA damage produced depends on the type of mutagen. For example, UV light can damage DNA by producing thymine dimers, which are cross-links between pyrimidine bases. On the other hand, oxidants such as free radicals
Radical (chemistry)
Radicals are atoms, molecules, or ions with unpaired electrons on an open shell configuration. Free radicals may have positive, negative, or zero charge...

 or hydrogen peroxide
Hydrogen peroxide
Hydrogen peroxide is the simplest peroxide and an oxidizer. Hydrogen peroxide is a clear liquid, slightly more viscous than water. In dilute solution, it appears colorless. With its oxidizing properties, hydrogen peroxide is often used as a bleach or cleaning agent...

 produce multiple forms of damage, including base modifications, particularly of guanosine, and double-strand breaks. A typical human cell contains about 150,000 bases that have suffered oxidative damage. Of these oxidative lesions, the most dangerous are double-strand breaks, as these are difficult to repair and can produce point mutation
Point mutation
A point mutation, or single base substitution, is a type of mutation that causes the replacement of a single base nucleotide with another nucleotide of the genetic material, DNA or RNA. Often the term point mutation also includes insertions or deletions of a single base pair...

s, insertions and deletions from the DNA sequence, as well as chromosomal translocation
Chromosomal translocation
In genetics, a chromosome translocation is a chromosome abnormality caused by rearrangement of parts between nonhomologous chromosomes. A gene fusion may be created when the translocation joins two otherwise separated genes, the occurrence of which is common in cancer. It is detected on...

s.

Many mutagens fit into the space between two adjacent base pairs; this is called intercalation
Intercalation (chemistry)
In chemistry, intercalation is the reversible inclusion of a molecule between two other molecules . Examples include DNA intercalation and graphite intercalation compounds.- DNA intercalation :...

. Most intercalators are aromatic
Aromaticity
In organic chemistry, Aromaticity is a chemical property in which a conjugated ring of unsaturated bonds, lone pairs, or empty orbitals exhibit a stabilization stronger than would be expected by the stabilization of conjugation alone. The earliest use of the term was in an article by August...

 and planar molecules; examples include ethidium bromide
Ethidium bromide
Ethidium bromide is an intercalating agent commonly used as a fluorescent tag in molecular biology laboratories for techniques such as agarose gel electrophoresis. It is commonly abbreviated as "EtBr", which is also an abbreviation for bromoethane...

, acridine
Acridine
Acridine, C13H9N, is an organic compound and a nitrogen heterocycle. Acridine is also used to describe compounds containing the C13N tricycle....

s, daunomycin
Daunorubicin
Daunorubicin or daunomycin is chemotherapeutic of the anthracycline family that is given as a treatment for some types of cancer. It is most commonly used to treat specific types of leukaemia...

, and doxorubicin
Doxorubicin
Doxorubicin INN is a drug used in cancer chemotherapy. It is an anthracycline antibiotic, closely related to the natural product daunomycin, and like all anthracyclines, it works by intercalating DNA....

. In order for an intercalator to fit between base pairs, the bases must separate, distorting the DNA strands by unwinding of the double helix. This inhibits both transcription and DNA replication, causing toxicity and mutations. As a result, DNA intercalators may be carcinogen
Carcinogen
A carcinogen is any substance, radionuclide, or radiation that is an agent directly involved in causing cancer. This may be due to the ability to damage the genome or to the disruption of cellular metabolic processes...

s and, in the case of thalidomide, a teratogen. Others, such as benzo[a]pyrene diol epoxide and aflatoxin
Aflatoxin
Aflatoxins are naturally occurring mycotoxins that are produced by many species of Aspergillus, a fungus, the most notable ones being Aspergillus flavus and Aspergillus parasiticus. Aflatoxins are toxic and among the most carcinogenic substances known...

 form DNA adducts which induce errors in replication. Nevertheless, due to their ability to inhibit DNA transcription and replication, other similar toxins are also used in chemotherapy
Chemotherapy
Chemotherapy is the treatment of cancer with an antineoplastic drug or with a combination of such drugs into a standardized treatment regimen....

 to inhibit rapidly growing cancer
Cancer
Cancer , known medically as a malignant neoplasm, is a large group of different diseases, all involving unregulated cell growth. In cancer, cells divide and grow uncontrollably, forming malignant tumors, and invade nearby parts of the body. The cancer may also spread to more distant parts of the...

 cells.

Biological functions


DNA usually occurs as linear chromosome
Chromosome
A chromosome is an organized structure of DNA and protein found in cells. It is a single piece of coiled DNA containing many genes, regulatory elements and other nucleotide sequences. Chromosomes also contain DNA-bound proteins, which serve to package the DNA and control its functions.Chromosomes...

s in eukaryote
Eukaryote
A eukaryote is an organism whose cells contain complex structures enclosed within membranes. Eukaryotes may more formally be referred to as the taxon Eukarya or Eukaryota. The defining membrane-bound structure that sets eukaryotic cells apart from prokaryotic cells is the nucleus, or nuclear...

s, and circular chromosomes in prokaryote
Prokaryote
The prokaryotes are a group of organisms that lack a cell nucleus , or any other membrane-bound organelles. The organisms that have a cell nucleus are called eukaryotes. Most prokaryotes are unicellular, but a few such as myxobacteria have multicellular stages in their life cycles...

s. The set of chromosomes in a cell makes up its genome
Genome
In modern molecular biology and genetics, the genome is the entirety of an organism's hereditary information. It is encoded either in DNA or, for many types of virus, in RNA. The genome includes both the genes and the non-coding sequences of the DNA/RNA....

; the human genome
Human genome
The human genome is the genome of Homo sapiens, which is stored on 23 chromosome pairs plus the small mitochondrial DNA. 22 of the 23 chromosomes are autosomal chromosome pairs, while the remaining pair is sex-determining...

 has approximately 3 billion base pairs of DNA arranged into 46 chromosomes. The information carried by DNA is held in the sequence
DNA sequence
The sequence or primary structure of a nucleic acid is the composition of atoms that make up the nucleic acid and the chemical bonds that bond those atoms. Because nucleic acids, such as DNA and RNA, are unbranched polymers, this specification is equivalent to specifying the sequence of...

 of pieces of DNA called gene
Gene
A gene is a molecular unit of heredity of a living organism. It is a name given to some stretches of DNA and RNA that code for a type of protein or for an RNA chain that has a function in the organism. Living beings depend on genes, as they specify all proteins and functional RNA chains...

s. Transmission
Transmission (genetics)
Genetic transmission is the transfer of genetic information from genes to another generation , almost synonymous with heredity, or from one location in a cell to another....

 of genetic information in genes is achieved via complementary base pairing. For example, in transcription, when a cell uses the information in a gene, the DNA sequence is copied into a complementary RNA sequence through the attraction between the DNA and the correct RNA nucleotides. Usually, this RNA copy is then used to make a matching protein sequence
Peptide sequence
Peptide sequence or amino acid sequence is the order in which amino acid residues, connected by peptide bonds, lie in the chain in peptides and proteins. The sequence is generally reported from the N-terminal end containing free amino group to the C-terminal end containing free carboxyl group...

 in a process called translation, which depends on the same interaction between RNA nucleotides. Alternatively, a cell may simply copy its genetic information in a process called DNA replication. The details of these functions are covered in other articles; here we focus on the interactions between DNA and other molecules that mediate the function of the genome.

Genes and genomes



Genomic DNA is packed tightly and in an orderly way by a process called DNA condensation
DNA condensation
DNA condensation refers to the process of compacting DNA molecules in vitro or in vivo. Mechanistic details of DNA packing are essential for its functioning in the process of gene regulation in living systems. Condensed DNA often has surprising properties, which one would not predict from classical...

 to fit the small available volumes of the cell. In eukaryotes, DNA is located in the cell nucleus
Cell nucleus
In cell biology, the nucleus is a membrane-enclosed organelle found in eukaryotic cells. It contains most of the cell's genetic material, organized as multiple long linear DNA molecules in complex with a large variety of proteins, such as histones, to form chromosomes. The genes within these...

, as well as small amounts in mitochondria
Mitochondrion
In cell biology, a mitochondrion is a membrane-enclosed organelle found in most eukaryotic cells. These organelles range from 0.5 to 1.0 micrometers in diameter...

 and chloroplast
Chloroplast
Chloroplasts are organelles found in plant cells and other eukaryotic organisms that conduct photosynthesis. Chloroplasts capture light energy to conserve free energy in the form of ATP and reduce NADP to NADPH through a complex set of processes called photosynthesis.Chloroplasts are green...

s. In prokaryotes, the DNA is held within an irregularly shaped body in the cytoplasm called the nucleoid
Nucleoid
The nucleoid is an irregularly-shaped region within the cell of a prokaryote that contains all or most of the genetic material. In contrast to the nucleus of a eukaryotic cell, it is not surrounded by a nuclear membrane. The genome of prokaryotic organisms generally is a circular, double-stranded...

. The genetic information in a genome is held within genes, and the complete set of this information in an organism is called its genotype
Genotype
The genotype is the genetic makeup of a cell, an organism, or an individual usually with reference to a specific character under consideration...

. A gene is a unit of heredity
Heredity
Heredity is the passing of traits to offspring . This is the process by which an offspring cell or organism acquires or becomes predisposed to the characteristics of its parent cell or organism. Through heredity, variations exhibited by individuals can accumulate and cause some species to evolve...

 and is a region of DNA that influences a particular characteristic in an organism. Genes contain an open reading frame
Open reading frame
In molecular genetics, an open reading frame is a DNA sequence that does not contain a stop codon in a given reading frame.Normally, inserts which interrupt the reading frame of a subsequent region after the start codon cause frameshift mutation of the sequence and dislocate the sequences for stop...

 that can be transcribed, as well as regulatory sequence
Regulatory sequence
A regulatory sequence is a segment of DNA where regulatory proteins such as transcription factors bind preferentially. These regulatory proteins bind to short stretches of DNA called regulatory regions, which are appropriately positioned in the genome, usually a short distance 'upstream' of the...

s such as promoters and enhancers
Enhancer (genetics)
In genetics, an enhancer is a short region of DNA that can be bound with proteins to enhance transcription levels of genes in a gene cluster...

, which control the transcription of the open reading frame.
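The idea of an open reading frame can be illustrated with a short Python sketch. It is only a simplified model, not a gene-finding method: it assumes that a reading frame opens at an ATG codon and closes at the first in-frame stop codon (TAA, TAG or TGA), it scans only the given strand, and it ignores introns and regulatory sequences. The function name `find_orfs` and the `min_codons` parameter are hypothetical names chosen for this illustration.

```python
# Stop codons written as DNA triplets.
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end) index pairs of ATG...stop stretches on the given strand.

    Simplifying assumptions (illustration only): a frame opens at ATG,
    closes at the first in-frame stop codon, and only the three forward
    reading frames are scanned.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a reading frame
            elif start is not None and codon in STOPS:
                # count the codons before the stop, including ATG
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the frame and keep scanning
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAG"))
```

In the example sequence the only reading frame found runs from the ATG at index 2 to the end of the TAG stop codon.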

In many species
Species
In biology, a species is one of the basic units of biological classification and a taxonomic rank. A species is often defined as a group of organisms capable of interbreeding and producing fertile offspring. While in many cases this definition is adequate, more precise or differing measures are...

, only a small fraction of the total sequence of the genome
Genome
In modern molecular biology and genetics, the genome is the entirety of an organism's hereditary information. It is encoded either in DNA or, for many types of virus, in RNA. The genome includes both the genes and the non-coding sequences of the DNA/RNA....

 encodes protein. For example, only about 1.5% of the human genome consists of protein-coding exon
Exon
An exon is a nucleic acid sequence that is represented in the mature form of an RNA molecule either after portions of a precursor RNA have been removed by cis-splicing or when two or more precursor RNA molecules have been ligated by trans-splicing. The mature RNA molecule can be a messenger RNA...

s, with over 50% of human DNA consisting of non-coding repetitive sequences
Repeated sequence (DNA)
In the study of DNA sequences, one can distinguish two main types of repeated sequence: tandem repeats (satellite DNA, minisatellites, microsatellites) and interspersed repeats (SINEs...

. The reasons for the presence of so much noncoding DNA
Noncoding DNA
In genetics, noncoding DNA describes components of an organism's DNA sequences that do not encode for protein sequences. In many eukaryotes, a large percentage of an organism's total genome size is noncoding DNA, although the amount of noncoding DNA, and the proportion of coding versus noncoding...

 in eukaryotic genomes and the extraordinary differences in genome size
Genome size
Genome size is the total amount of DNA contained within one copy of a single genome. It is typically measured in terms of mass in picograms or less frequently in Daltons or as the total number of nucleotide base pairs typically in megabases . One picogram equals 978 megabases...

, or C-value
C-value
The term C-value refers to the amount of DNA contained within a haploid nucleus or one half the amount in a diploid somatic cell of a eukaryotic organism, expressed in picograms...

, among species represent a long-standing puzzle known as the "C-value enigma
C-value enigma
The C-value enigma or C-value paradox is a term used to describe the complex puzzle surrounding the extensive variation in nuclear genome size among eukaryotic species...

". However, DNA sequences that do not code protein may still encode functional non-coding RNA
Non-coding RNA
A non-coding RNA is a functional RNA molecule that is not translated into a protein. Less-frequently used synonyms are non-protein-coding RNA , non-messenger RNA and functional RNA . The term small RNA is often used for short bacterial ncRNAs...

 molecules, which are involved in the regulation of gene expression
Regulation of gene expression
Regulation of gene expression includes the processes that cells and viruses use to regulate the way that the information in genes is turned into gene products...

.

Some noncoding DNA sequences play structural roles in chromosomes. Telomere
Telomere
A telomere is a region of repetitive DNA sequences at the end of a chromosome, which protects the end of the chromosome from deterioration or from fusion with neighboring chromosomes. Its name is derived from the Greek nouns telos "end" and merοs "part"...

s and centromere
Centromere
A centromere is a region of DNA typically found near the middle of a chromosome where two identical sister chromatids come closest in contact. It is involved in cell division as the point of mitotic spindle attachment...

s typically contain few genes, but are important for the function and stability of chromosomes. Also abundant among noncoding DNA in humans are pseudogene
Pseudogene
Pseudogenes are dysfunctional relatives of known genes that have lost their protein-coding ability or are otherwise no longer expressed in the cell...

s, which are copies of genes that have been disabled by mutation. These sequences are usually just molecular fossil
Fossil
Fossils are the preserved remains or traces of animals , plants, and other organisms from the remote past...

s, although they can occasionally serve as raw genetic material for the creation of new genes through the process of gene duplication
Gene duplication
Gene duplication is any duplication of a region of DNA that contains a gene; it may occur as an error in homologous recombination, a retrotransposition event, or duplication of an entire chromosome.The second copy of the gene is often free from selective pressure — that is, mutations of it have no...

 and divergence
Divergent evolution
Divergent evolution is the accumulation of differences between groups which can lead to the formation of new species, usually a result of diffusion of the same species to different and isolated environments which blocks the gene flow among the distinct populations allowing differentiated fixation...

.

Transcription and translation



A gene is a sequence of DNA that contains genetic information and can influence the phenotype
Phenotype
A phenotype is an organism's observable characteristics or traits: such as its morphology, development, biochemical or physiological properties, behavior, and products of behavior...

 of an organism. Within a gene, the sequence of bases along a DNA strand defines a messenger RNA
Messenger RNA
Messenger RNA is a molecule of RNA encoding a chemical "blueprint" for a protein product. mRNA is transcribed from a DNA template, and carries coding information to the sites of protein synthesis: the ribosomes. Here, the nucleic acid polymer is translated into a polymer of amino acids: a protein...

 sequence, which then defines one or more protein sequences. The relationship between the nucleotide sequences of genes and the amino-acid
Amino acid
Amino acids are molecules containing an amine group, a carboxylic acid group and a side-chain that varies between different amino acids. The key elements of an amino acid are carbon, hydrogen, oxygen, and nitrogen...

 sequences of proteins is determined by the rules of translation, known collectively as the genetic code
Genetic code
The genetic code is the set of rules by which information encoded in genetic material is translated into proteins by living cells....

. The genetic code consists of three-letter 'words' called codons formed from a sequence of three nucleotides (e.g. ACT, CAG, TTT).

In transcription, the codons of a gene are copied into messenger RNA by RNA polymerase
RNA polymerase
RNA polymerase is an enzyme that produces RNA. In cells, RNAP is needed for constructing RNA chains from DNA genes as templates, a process called transcription. RNA polymerase enzymes are essential to life and are found in all organisms and many viruses...

. This RNA copy is then decoded by a ribosome
Ribosome
A ribosome is a component of cells that assembles the twenty specific amino acid molecules to form the particular protein molecule determined by the nucleotide sequence of an RNA molecule....

 that reads the RNA sequence by base-pairing the messenger RNA to transfer RNA
Transfer RNA
Transfer RNA is an adaptor molecule composed of RNA, typically 73 to 93 nucleotides in length, that is used in biology to bridge the three-letter genetic code in messenger RNA with the twenty-letter code of amino acids in proteins. The role of tRNA as an adaptor is best understood by...

, which carries amino acids. Since there are 4 bases in 3-letter combinations, there are 64 possible codons (4³ combinations). These encode the twenty standard amino acids, giving most amino acids more than one possible codon. There are also three 'stop' or 'nonsense' codons signifying the end of the coding region; these are the TAA, TGA and TAG codons.
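The codon arithmetic and the decoding step can be sketched in Python. This is a minimal illustration, not a model of the ribosome: it builds the standard genetic code as a lookup table of DNA codons, using one-letter amino acid codes with `*` for a stop codon, and reads a coding sequence three bases at a time. The names `CODON_TABLE`, `transcribe` and `translate` are hypothetical, and the sketch assumes the input is a coding strand that is already in frame.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order (first, second, third base),
# written as one-letter amino acid codes; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every three-base combination (4 * 4 * 4 = 64) to its amino acid.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand):
    """Return the mRNA sequence matching a DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def translate(coding_strand):
    """Read a DNA coding sequence codon by codon until a stop codon."""
    seq = coding_strand.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon ends the coding region
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    stops = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
    print(len(CODON_TABLE), stops)
    print(transcribe("ATGGCCTAA"), translate("ATGGCCTAA"))
```

Counting the entries of the table gives the 64 codons, three of which are stop codons, leaving 61 codons shared among the twenty standard amino acids.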


Replication


Cell division
Cell division
Cell division is the process by which a parent cell divides into two or more daughter cells . Cell division is usually a small segment of a larger cell cycle. This type of cell division in eukaryotes is known as mitosis, and leaves the daughter cell capable of dividing again. The corresponding sort...

 is essential for an organism to grow, but, when a cell divides, it must replicate the DNA in its genome so that the two daughter cells have the same genetic information as their parent. The double-stranded structure of DNA provides a simple mechanism for DNA replication
DNA replication
DNA replication is a biological process that occurs in all living organisms and copies their DNA; it is the basis for biological inheritance. The process starts with one double-stranded DNA molecule and produces two identical copies of the molecule...

. Here, the two strands are separated and then each strand's complementary DNA
Complementary DNA
In genetics, complementary DNA is DNA synthesized from a messenger RNA template in a reaction catalyzed by the enzyme reverse transcriptase and the enzyme DNA polymerase. cDNA is often used to clone eukaryotic genes in prokaryotes...

 sequence is recreated by an enzyme
Enzyme
Enzymes are proteins that catalyze chemical reactions. In enzymatic reactions, the molecules at the beginning of the process, called substrates, are converted into different molecules, called products. Almost all chemical reactions in a biological cell need enzymes in order to occur at rates...

 called DNA polymerase
DNA polymerase
A DNA polymerase is an enzyme that helps catalyze the polymerization of deoxyribonucleotides into a DNA strand. DNA polymerases are best known for their feedback role in DNA replication, in which the polymerase "reads" an intact DNA strand as a template and uses it to synthesize the new strand....

. This enzyme makes the complementary strand by finding the correct base through complementary base pairing, and bonding it onto the original strand. In this way, the base on the old strand dictates which base appears on the new strand, and the cell ends up with a perfect copy of its DNA. As DNA polymerases can only extend a DNA strand in a 5′ to 3′ direction, different mechanisms are used to copy the antiparallel strands of the double helix.
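The way each old strand dictates its new partner can be sketched in Python. The sketch below is an illustration of base pairing only, not of the enzymatic mechanism: a duplex is represented as a pair of strings, each written 5′ to 3′, and the function names `complement_strand` and `replicate` are hypothetical.

```python
# Watson-Crick pairing rules for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(template):
    """Return the strand that pairs with `template`.

    Both strands are written 5'->3'. Because the two strands of a duplex
    are antiparallel, the partner is read in the reverse direction.
    """
    return "".join(COMPLEMENT[base] for base in reversed(template.upper()))


def replicate(duplex):
    """Separate a duplex and rebuild the missing partner of each old strand.

    `duplex` is a (top, bottom) pair. Each daughter duplex keeps one old
    strand and gains one newly made strand.
    """
    top, bottom = duplex
    daughter_1 = (top, complement_strand(top))        # old top, new bottom
    daughter_2 = (complement_strand(bottom), bottom)  # new top, old bottom
    return daughter_1, daughter_2


if __name__ == "__main__":
    parent = ("ATGC", complement_strand("ATGC"))
    print(parent, replicate(parent))
```

Both daughter duplexes come out identical to the parent, which is the property the paragraph describes.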

Interactions with proteins


All the functions of DNA depend on interactions with proteins. These protein interactions
Protein interactions
Proteins can interact with many types of molecules. Such interactions are related to their function and are therefore an object of study in molecular biology, and of computational methods of prediction in bioinformatics.Protein interactions can be classified as:...

 can be non-specific, or the protein can bind specifically to a single DNA sequence. Enzymes can also bind to DNA and of these, the polymerases that copy the DNA base sequence in transcription and DNA replication are particularly important.

DNA-binding proteins


Structural proteins that bind DNA are well-understood examples of non-specific DNA-protein interactions. Within chromosomes, DNA is held in complexes with structural proteins. These proteins organize the DNA into a compact structure called chromatin
Chromatin
Chromatin is the combination of DNA and proteins that make up the contents of the nucleus of a cell. The primary functions of chromatin are; to package DNA into a smaller volume to fit in the cell, to strengthen the DNA to allow mitosis and meiosis and prevent DNA damage, and to control gene...

. In eukaryotes this structure involves DNA binding to a complex of small basic proteins called histone
Histone
In biology, histones are highly alkaline proteins found in eukaryotic cell nuclei that package and order the DNA into structural units called nucleosomes. They are the chief protein components of chromatin, acting as spools around which DNA winds, and play a role in gene regulation...

s, while in prokaryotes multiple types of proteins are involved. The histones form a disk-shaped complex called a nucleosome
Nucleosome
Nucleosomes are the basic unit of DNA packaging in eukaryotes, consisting of a segment of DNA wound around a histone protein core. This structure is often compared to thread wrapped around a spool....

, which contains two complete turns of double-stranded DNA wrapped around its surface. These non-specific interactions are formed through basic residues in the histones making ionic bond
Ionic bond
An ionic bond is a type of chemical bond formed through an electrostatic attraction between two oppositely charged ions. Ionic bonds are formed between a cation, which is usually a metal, and an anion, which is usually a nonmetal. Pure ionic bonding cannot exist: all ionic compounds have some...

s to the acidic sugar-phosphate backbone of the DNA, and are therefore largely independent of the base sequence. Chemical modifications of these basic amino acid residues include methylation
Methylation
In the chemical sciences, methylation denotes the addition of a methyl group to a substrate or the substitution of an atom or group by a methyl group. Methylation is a form of alkylation with, to be specific, a methyl group, rather than a larger carbon chain, replacing a hydrogen atom...

, phosphorylation
Phosphorylation
Phosphorylation is the addition of a phosphate group to a protein or other organic molecule. Phosphorylation activates or deactivates many protein enzymes....

 and acetylation
Acetylation
Acetylation describes a reaction that introduces an acetyl functional group into a chemical compound...

. These chemical changes alter the strength of the interaction between the DNA and the histones, making the DNA more or less accessible to transcription factor
Transcription factor
In molecular biology and genetics, a transcription factor is a protein that binds to specific DNA sequences, thereby controlling the flow of genetic information from DNA to mRNA...

s and changing the rate of transcription. Other non-specific DNA-binding proteins in chromatin include the high-mobility group proteins, which bind to bent or distorted DNA. These proteins are important in bending arrays of nucleosomes and arranging them into the larger structures that make up chromosomes.

A distinct group of DNA-binding proteins are those that specifically bind single-stranded DNA. In humans, replication protein A

 is the best-understood member of this family and is used in processes where the double helix is separated, including DNA replication, recombination and DNA repair. These binding proteins seem to stabilize single-stranded DNA and protect it from forming stem-loop
Stem-loop
Stem-loop intramolecular base pairing is a pattern that can occur in single-stranded DNA or, more commonly, in RNA. The structure is also known as a hairpin or hairpin loop. It occurs when two regions of the same strand, usually complementary in nucleotide sequence when read in opposite directions,...

s or being degraded by nuclease
Nuclease
A nuclease is an enzyme capable of cleaving the phosphodiester bonds between the nucleotide subunits of nucleic acids. Older publications may use terms such as "polynucleotidase" or "nucleodepolymerase"....

s.


In contrast, other proteins have evolved to bind to particular DNA sequences. The most intensively studied of these are the various transcription factor
Transcription factor
In molecular biology and genetics, a transcription factor is a protein that binds to specific DNA sequences, thereby controlling the flow of genetic information from DNA to mRNA...

s, which are proteins that regulate transcription. Each transcription factor binds to one particular set of DNA sequences and activates or inhibits the transcription of genes that have these sequences close to their promoters. The transcription factors do this in two ways. Firstly, they can bind the RNA polymerase responsible for transcription, either directly or through other mediator proteins; this locates the polymerase at the promoter and allows it to begin transcription. Alternatively, transcription factors can bind enzyme
Enzyme
Enzymes are proteins that catalyze chemical reactions. In enzymatic reactions, the molecules at the beginning of the process, called substrates, are converted into different molecules, called products. Almost all chemical reactions in a biological cell need enzymes in order to occur at rates...

s that modify the histones at the promoter; this will change the accessibility of the DNA template to the polymerase.

As these DNA targets can occur throughout an organism's genome, changes in the activity of one type of transcription factor can affect thousands of genes. Consequently, these proteins are often the targets of the signal transduction
Signal transduction
Signal transduction occurs when an extracellular signaling molecule activates a cell surface receptor. In turn, this receptor alters intracellular molecules creating a response...

 processes that control responses to environmental changes or cellular differentiation
Cellular differentiation
In developmental biology, cellular differentiation is the process by which a less specialized cell becomes a more specialized cell type. Differentiation occurs numerous times during the development of a multicellular organism as the organism changes from a simple zygote to a complex system of...

 and development. The specificity of these transcription factors' interactions with DNA comes from the proteins making multiple contacts to the edges of the DNA bases, allowing them to "read" the DNA sequence. Most of these base-interactions are made in the major groove, where the bases are most accessible.


Nucleases and ligases


Nuclease
Nuclease
A nuclease is an enzyme capable of cleaving the phosphodiester bonds between the nucleotide subunits of nucleic acids. Older publications may use terms such as "polynucleotidase" or "nucleodepolymerase"....

s are enzyme
Enzyme
Enzymes are proteins that catalyze chemical reactions. In enzymatic reactions, the molecules at the beginning of the process, called substrates, are converted into different molecules, called products. Almost all chemical reactions in a biological cell need enzymes in order to occur at rates...

s that cut DNA strands by catalyzing the hydrolysis
Hydrolysis
Hydrolysis is a chemical reaction during which molecules of water are split into hydrogen cations and hydroxide anions in the process of a chemical mechanism. It is the type of reaction that is used to break down certain polymers, especially those made by condensation polymerization...

 of the phosphodiester bond
Phosphodiester bond
A phosphodiester bond is a group of strong covalent bonds between a phosphate group and two 5-carbon ring carbohydrates over two ester bonds. Phosphodiester bonds are central to all known life, as they make up the backbone of each helical strand of DNA...

s. Nucleases that hydrolyse nucleotides from the ends of DNA strands are called exonuclease
Exonuclease
Exonucleases are enzymes that work by cleaving nucleotides one at a time from the end of a polynucleotide chain. A hydrolyzing reaction that breaks phosphodiester bonds at either the 3’ or the 5’ end occurs. Its close relative is the endonuclease, which cleaves phosphodiester bonds in the middle ...

s, while endonuclease
Endonuclease
Endonucleases are enzymes that cleave the phosphodiester bond within a polynucleotide chain, in contrast to exonucleases, which cleave phosphodiester bonds at the end of a polynucleotide chain. Typically, a restriction site will be a palindromic sequence four to six nucleotides long. Most...

s cut within strands. The most frequently used nucleases in molecular biology
Molecular biology
Molecular biology is the branch of biology that deals with the molecular basis of biological activity. This field overlaps with other areas of biology and chemistry, particularly genetics and biochemistry...

 are the restriction endonucleases
Restriction enzyme
A Restriction Enzyme is an enzyme that cuts double-stranded DNA at specific recognition nucleotide sequences known as restriction sites. Such enzymes, found in bacteria and archaea, are thought to have evolved to provide a defense mechanism against invading viruses...

, which cut DNA at specific sequences. For instance, the EcoRV enzyme recognizes the 6-base sequence 5′-GAT|ATC-3′ and makes a cut at the vertical line. In nature, these enzymes protect bacteria
Bacteria
Bacteria are a large domain of prokaryotic microorganisms. Typically a few micrometres in length, bacteria have a wide range of shapes, ranging from spheres to rods and spirals...

 against phage
Bacteriophage
A bacteriophage is any one of a number of viruses that infect bacteria. They do this by injecting genetic material, which they carry enclosed in an outer protein capsid...

 infection by digesting the phage DNA when it enters the bacterial cell, acting as part of the restriction modification system
Restriction modification system
The restriction modification system is used by bacteria, and perhaps other prokaryotic organisms to protect themselves from foreign DNA, such as the one borne by bacteriophages. This phenomenon was first noticed in the 1950s. Certain bacteria strains were found to inhibit the growth of viruses...

. In technology, these sequence-specific nucleases are used in molecular cloning
Molecular cloning
Molecular cloning refers to a set of experimental methods in molecular biology that are used to assemble recombinant DNA molecules and to direct their replication within host organisms...

 and DNA fingerprinting
Genetic fingerprinting
DNA profiling is a technique employed by forensic scientists to assist in the identification of individuals by their respective DNA profiles. DNA profiles are encrypted sets of numbers that reflect a person's DNA makeup, which can also be used as the person's identifier...

.
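A sequence-specific cut of the kind described for EcoRV can be sketched in Python. The sketch is a minimal illustration under stated assumptions: the DNA is linear and given as one strand written 5′ to 3′, and the enzyme cuts every occurrence of GATATC between GAT and ATC. Because this site reads the same on both strands, searching one strand is enough. The function name `ecorv_digest` is hypothetical.

```python
SITE = "GATATC"   # EcoRV recognition sequence, 5'->3'
CUT_OFFSET = 3    # the cut falls between GAT and ATC


def ecorv_digest(seq):
    """Return the fragments left after cutting a linear DNA at every EcoRV site."""
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(SITE)
    while pos != -1:
        cut = pos + CUT_OFFSET
        fragments.append(seq[start:cut])  # piece up to and including GAT
        start = cut
        pos = seq.find(SITE, pos + 1)
    fragments.append(seq[start:])         # remainder after the last cut
    return fragments


if __name__ == "__main__":
    print(ecorv_digest("AAGATATCGG"))
```

A sequence with no recognition site is returned as a single fragment, and joining the fragments always restores the input sequence.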

Enzymes called DNA ligase
DNA ligase
In molecular biology, DNA ligase is a specific type of enzyme, a ligase, that repairs single-stranded discontinuities in double-stranded DNA molecules. Purified DNA ligase is used in gene cloning to join DNA molecules together...

s can rejoin cut or broken DNA strands. Ligases are particularly important in lagging strand
Replication fork
The replication fork is a structure that forms within the nucleus during DNA replication. It is created by helicases, which break the hydrogen bonds holding the two DNA strands together. The resulting structure has two branching "prongs", each one made up of a single strand of DNA...

 DNA replication, as they join together the short segments of DNA produced at the replication fork
Replication fork
The replication fork is a structure that forms within the nucleus during DNA replication. It is created by helicases, which break the hydrogen bonds holding the two DNA strands together. The resulting structure has two branching "prongs", each one made up of a single strand of DNA...

 into a complete copy of the DNA template. They are also used in DNA repair
DNA repair
DNA repair refers to a collection of processes by which a cell identifies and corrects damage to the DNA molecules that encode its genome. In human cells, both normal metabolic activities and environmental factors such as UV light and radiation can cause DNA damage, resulting in as many as 1...

 and genetic recombination
Genetic recombination
Genetic recombination is a process by which a molecule of nucleic acid is broken and then joined to a different one. Recombination can occur between similar molecules of DNA, as in homologous recombination, or dissimilar molecules, as in non-homologous end joining. Recombination is a common method...

.

Topoisomerases and helicases


Topoisomerases are enzymes with both nuclease and ligase activity. These proteins change the amount of supercoiling in DNA. Some of these enzymes work by cutting the DNA helix and allowing one section to rotate, thereby reducing its level of supercoiling; the enzyme then seals the DNA break. Other types of these enzymes are capable of cutting one DNA helix and then passing a second strand of DNA through this break, before rejoining the helix. Topoisomerases are required for many processes involving DNA, such as DNA replication and transcription.

Helicases are proteins that are a type of molecular motor. They use the chemical energy in nucleoside triphosphates, predominantly ATP, to break hydrogen bonds between bases and unwind the DNA double helix into single strands. These enzymes are essential for most processes where enzymes need to access the DNA bases.

Polymerases


Polymerases are enzymes that synthesize polynucleotide chains from nucleoside triphosphates. The sequences of their products are copies of existing polynucleotide chains, which are called templates. These enzymes function by adding nucleotides onto the 3′ hydroxyl group of the previous nucleotide in a DNA strand. As a consequence, all polymerases work in a 5′ to 3′ direction. In the active site of these enzymes, the incoming nucleoside triphosphate base-pairs to the template: this allows polymerases to accurately synthesize the complementary strand of their template. Polymerases are classified according to the type of template that they use.
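The template-directed, 5′ to 3′ synthesis described above can be illustrated with a minimal Python sketch. It assumes an idealized DNA template containing only the letters A, C, G and T and perfect Watson-Crick pairing; the function names are hypothetical and chosen for illustration only.

```python
# Watson-Crick pairing used to choose each incoming nucleotide (assumed ideal).
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_complement(template_3to5: str) -> str:
    """Return the new strand, written 5'->3', for a template read 3'->5'.

    Each step mimics the addition of one nucleotide onto the 3' end
    of the growing strand.
    """
    new_strand = []
    for base in template_3to5:
        new_strand.append(PAIR[base])  # pairs with the current template base
    return "".join(new_strand)


def reverse_complement(strand_5to3: str) -> str:
    """Complement of a strand given 5'->3', also reported 5'->3'."""
    # Reversing the input presents it to the "polymerase" in 3'->5' order.
    return synthesize_complement(strand_5to3[::-1])


print(synthesize_complement("TACGGT"))  # ATGCCA
print(reverse_complement("ACCGTA"))     # TACGGT
```

Because the two strands are antiparallel, taking the reverse complement twice returns the original sequence, which is a convenient sanity check for this kind of helper.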

In DNA replication, a DNA-dependent DNA polymerase makes a copy of a DNA sequence. Accuracy is vital in this process, so many of these polymerases have a proofreading activity. Here, the polymerase recognizes the occasional mistakes in the synthesis reaction by the lack of base pairing between the mismatched nucleotides. If a mismatch is detected, a 3′ to 5′ exonuclease activity is activated and the incorrect base removed. In most organisms, DNA polymerases function in a large complex called the replisome that contains multiple accessory subunits, such as the DNA clamp or helicases.

RNA-dependent DNA polymerases are a specialized class of polymerases that copy the sequence of an RNA strand into DNA. They include reverse transcriptase, which is a viral enzyme involved in the infection of cells by retroviruses, and telomerase, which is required for the replication of telomeres. Telomerase is an unusual polymerase because it contains its own RNA template as part of its structure.

Transcription is carried out by a DNA-dependent RNA polymerase that copies the sequence of a DNA strand into RNA. To begin transcribing a gene, the RNA polymerase binds to a sequence of DNA called a promoter and separates the DNA strands. It then copies the gene sequence into a messenger RNA transcript until it reaches a region of DNA called the terminator, where it halts and detaches from the DNA. As with human DNA-dependent DNA polymerases, RNA polymerase II, the enzyme that transcribes most of the genes in the human genome, operates as part of a large protein complex with multiple regulatory and accessory subunits.
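The promoter-to-terminator logic of transcription can be mimicked with a deliberately simplified Python sketch. It is a toy model under strong assumptions: the promoter and terminator are treated as exact placeholder motifs, transcription is assumed to start immediately after the promoter, and the input is the coding strand, so the transcript is obtained by replacing T with U. Real promoters and terminators are not recognized this way, and the function name and example sequences are hypothetical.

```python
def transcribe(coding_strand: str, promoter: str, terminator: str) -> str:
    """Toy transcription: copy the region between two motifs into RNA.

    Returns an empty string when no promoter motif is found.
    """
    start = coding_strand.find(promoter)
    if start == -1:
        return ""  # the "polymerase" never binds
    begin = start + len(promoter)
    end = coding_strand.find(terminator, begin)
    if end == -1:
        end = len(coding_strand)  # no terminator: run to the end of the DNA
    # The transcript has the coding-strand sequence with U in place of T.
    return coding_strand[begin:end].replace("T", "U")


# Placeholder sequence: flank + promoter motif + gene + terminator motif + flank.
dna = "GG" + "TATAAT" + "ATGGCA" + "TTTTTT" + "CC"
print(transcribe(dna, promoter="TATAAT", terminator="TTTTTT"))  # AUGGCA
```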

Genetic recombination



A DNA helix usually does not interact with other segments of DNA, and in human cells the different chromosomes even occupy separate areas in the nucleus called "chromosome territories". This physical separation of different chromosomes is important for the ability of DNA to function as a stable repository for information, as one of the few times chromosomes interact is during chromosomal crossover when they recombine. Chromosomal crossover is when two DNA helices break, swap a section and then rejoin.
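The "break, swap a section and rejoin" description of crossover can be sketched in Python as a single-point exchange between two equal-length sequences. This is only an abstraction: it assumes one break at the same position in both molecules and ignores the enzymatic steps, and the function name and example strings are hypothetical.

```python
def crossover(chrom_a: str, chrom_b: str, point: int) -> tuple[str, str]:
    """Exchange everything after `point` between two equal-length sequences."""
    if len(chrom_a) != len(chrom_b):
        raise ValueError("this sketch assumes sequences of equal length")
    if not 0 <= point <= len(chrom_a):
        raise ValueError("crossover point lies outside the sequences")
    # Break both sequences at `point`, swap the tails, and rejoin.
    recombinant_a = chrom_a[:point] + chrom_b[point:]
    recombinant_b = chrom_b[:point] + chrom_a[point:]
    return recombinant_a, recombinant_b


# Upper and lower case stand in for two versions of the same five genes.
print(crossover("ABCDE", "abcde", 2))  # ('ABcde', 'abCDE')
```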

Recombination allows chromosomes to exchange genetic information and produces new combinations of genes, which increases the efficiency of natural selection and can be important in the rapid evolution of new proteins. Genetic recombination can also be involved in DNA repair, particularly in the cell's response to double-strand breaks.

The most common form of chromosomal crossover is homologous recombination, where the two chromosomes involved share very similar sequences. Non-homologous recombination can be damaging to cells, as it can produce chromosomal translocations and genetic abnormalities. The recombination reaction is catalyzed by enzymes known as recombinases, such as RAD51. The first step in recombination is a double-stranded break, caused either by an endonuclease or by damage to the DNA. A series of steps catalyzed in part by the recombinase then leads to joining of the two helices by at least one Holliday junction, in which a segment of a single strand in each helix is annealed to the complementary strand in the other helix. The Holliday junction is a tetrahedral junction structure that can be moved along the pair of chromosomes, swapping one strand for another. The recombination reaction is then halted by cleavage of the junction and re-ligation of the released DNA.

Evolution



DNA contains the genetic information that allows all modern living things to function, grow and reproduce. However, it is unclear how long in the 4-billion-year history of life DNA has performed this function, as it has been proposed that the earliest forms of life may have used RNA as their genetic material. RNA may have acted as the central part of early cell metabolism as it can both transmit genetic information and carry out catalysis as part of ribozymes. This ancient RNA world, where nucleic acid would have been used for both catalysis and genetics, may have influenced the evolution of the current genetic code based on four nucleotide bases. This would occur because the number of different bases in such an organism is a trade-off: a small number of bases increases replication accuracy, while a large number of bases increases the catalytic efficiency of ribozymes.

However, there is no direct evidence of ancient genetic systems, as recovery of DNA from most fossils is impossible. This is because DNA will survive in the environment for less than one million years and slowly degrades into short fragments in solution. Claims for older DNA have been made, most notably a report of the isolation of a viable bacterium from a salt crystal 250 million years old, but these claims are controversial.

A report published on August 8, 2011, based on NASA studies of meteorites found on Earth, suggested that building blocks of DNA (adenine, guanine and related organic molecules) may have been formed extraterrestrially in outer space.

Genetic engineering



Methods have been developed to purify DNA from organisms, such as phenol-chloroform extraction, and to manipulate it in the laboratory, such as restriction digests and the polymerase chain reaction. Modern biology and biochemistry make intensive use of these techniques in recombinant DNA technology. Recombinant DNA is a man-made DNA sequence that has been assembled from other DNA sequences. Such sequences can be transformed into organisms in the form of plasmids or, in the appropriate format, by using a viral vector. The genetically modified organisms produced can be used to make products such as recombinant proteins, can be used in medical research, or can be grown in agriculture.

Forensics


Forensic scientists can use DNA in blood, semen, skin, saliva or hair found at a crime scene to identify the matching DNA of an individual, such as a perpetrator. This process is formally termed DNA profiling, but may also be called "genetic fingerprinting". In DNA profiling, the lengths of variable sections of repetitive DNA, such as short tandem repeats and minisatellites, are compared between people. This method is usually an extremely reliable technique for identifying matching DNA. However, identification can be complicated if the scene is contaminated with DNA from several people. DNA profiling was developed in 1984 by British geneticist Sir Alec Jeffreys, and first used in forensic science to convict Colin Pitchfork in the 1988 Enderby murders case.
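The comparison of repeat lengths can be illustrated with a small Python sketch. It is a simplification under stated assumptions: each locus is reduced to a single repeat count (real profiles record two alleles per locus and are measured from fragment lengths in the laboratory, not by scanning a sequence string), and the function names, locus names and sequences are hypothetical.

```python
def longest_tandem_run(sequence: str, motif: str) -> int:
    """Largest number of back-to-back copies of `motif` found in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    n = len(motif)
    best = 0
    for start in range(len(sequence) - n + 1):
        run = 0
        pos = start
        # Count consecutive copies of the motif beginning at `start`.
        while sequence[pos:pos + n] == motif:
            run += 1
            pos += n
        best = max(best, run)
    return best


def mismatched_loci(profile_a: dict, profile_b: dict) -> list:
    """Loci whose repeat counts differ, or that appear in only one profile."""
    loci = profile_a.keys() | profile_b.keys()
    return sorted(l for l in loci if profile_a.get(l) != profile_b.get(l))


sample = "CC" + "GATA" * 4 + "TT"
print(longest_tandem_run(sample, "GATA"))  # 4

scene = {"locus1": 4, "locus2": 9}
suspect = {"locus1": 4, "locus2": 11}
print(mismatched_loci(scene, suspect))  # ['locus2']
```

An empty list from `mismatched_loci` means the two toy profiles agree at every locus; any mismatch excludes a match in this simplified model.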

The development of forensic science, and the ability to now obtain genetic matching on minute samples of blood, skin, saliva or hair, has led to a re-examination of a number of cases. Evidence can now be uncovered that was not scientifically possible at the time of the original examination. Combined with the removal of the double jeopardy law, this allows cases to be reopened where previous trials have failed to produce sufficient evidence to convince a jury. People charged with serious crimes may be required to provide a sample of DNA for matching purposes. The most obvious defence to DNA matches obtained forensically is to claim that cross-contamination of evidence has taken place. This has resulted in meticulously strict handling procedures with new cases of serious crime.

DNA profiling is also used to identify victims of mass casualty incidents. As well as positively identifying bodies or body parts in serious accidents, DNA profiling is being successfully used to identify individual victims in mass war graves by matching them to family members.

Bioinformatics



Bioinformatics
Bioinformatics
Bioinformatics is the application of computer science and information technology to the field of biology and medicine. Bioinformatics deals with algorithms, databases and information systems, web technologies, artificial intelligence and soft computing, information and computation theory, software...

 involves the manipulation, searching, and data mining
Data mining
Data mining , a relatively young and interdisciplinary field of computer science is the process of discovering new patterns from large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics and database systems...

 of biological data, and this includes DNA sequence data. The development of techniques to store and search DNA sequences have led to widely applied advances in computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

, especially string searching algorithm
String searching algorithm
String searching algorithms, sometimes called string matching algorithms, are an important class of string algorithms that try to find a place where one or several strings are found within a larger string or text....

s, machine learning
Machine learning
Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases...

 and database theory
Database theory
Database theory encapsulates a broad range of topics related to the study and research of the theoretical realm of databases and database management systems....

. String searching or matching algorithms, which find an occurrence of a sequence of letters inside a larger sequence of letters, were developed to search for specific sequences of nucleotides. The DNA sequenced may be aligned
Sequence alignment
In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. Aligned sequences of nucleotide or amino acid residues are...

 with other DNA sequences to identify homologous
Homology (biology)
Homology forms the basis of organization for comparative biology. In 1843, Richard Owen defined homology as "the same organ in different animals under every variety of form and function". Organs as different as a bat's wing, a seal's flipper, a cat's paw and a human hand have a common underlying...

 sequences and locate the specific mutation
Mutation
In molecular biology and genetics, mutations are changes in a genomic sequence: the DNA sequence of a cell's genome or the DNA or RNA sequence of a virus. They can be defined as sudden and spontaneous changes in the cell. Mutations are caused by radiation, viruses, transposons and mutagenic...

s that make them distinct. These techniques, especially multiple sequence alignment
Multiple sequence alignment
A multiple sequence alignment is a sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences are assumed to have an evolutionary relationship by which they share a lineage and are descended from a common ancestor...

, are used in studying phylogenetic
Phylogenetics
In biology, phylogenetics is the study of evolutionary relatedness among groups of organisms , which is discovered through molecular sequencing data and morphological data matrices...

 relationships and protein function. Data sets representing entire genomes' worth of DNA sequences, such as those produced by the Human Genome Project
Human Genome Project
The Human Genome Project is an international scientific research project with a primary goal of determining the sequence of chemical base pairs which make up DNA, and of identifying and mapping the approximately 20,000–25,000 genes of the human genome from both a physical and functional...

, are difficult to use without the annotations that identify the locations of genes and regulatory elements on each chromosome. Regions of DNA sequence that have the characteristic patterns associated with protein- or RNA-coding genes can be identified by gene finding
Gene prediction
In computational biology gene prediction or gene finding refers to the process of identifying the regions of genomic DNA that encode genes. This includes protein-coding genes as well as RNA genes, but may also include prediction of other functional elements such as regulatory regions...

 algorithms, which allow researchers to predict the presence of particular gene product
Gene product
A gene product is the biochemical material, either RNA or protein, resulting from expression of a gene. A measurement of the amount of gene product is sometimes used to infer how active a gene is. Abnormal amounts of gene product can be correlated with disease-causing alleles, such as the...

s and their possible functions in an organism even before they have been isolated experimentally. Entire genomes may also be compared, which can shed light on the evolutionary history of particular organisms and permit the examination of complex evolutionary events.
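
As a rough illustration of the gene-finding idea described above, the following sketch scans the forward strand of a DNA sequence for open reading frames: a start codon followed, in the same reading frame, by a stop codon. It is a deliberately simplified stand-in, not how production gene-prediction software works; real tools typically rely on statistical models and also examine the reverse strand. The function name `find_orfs` and the `min_codons` threshold are assumptions made for this example.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    start is the 0-based index of the start codon, end is exclusive and
    includes the stop codon. min_codons counts the codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == START:
                j = i
                # Walk codon by codon until a stop codon or the sequence end.
                while j + 3 <= len(seq):
                    if seq[j:j + 3] in STOPS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, frame))
                        break
                    j += 3
                # Resume scanning after this candidate region.
                i = j + 3
            else:
                i += 3
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC"))
```

Candidate regions found this way would still need annotation and experimental confirmation before being treated as genes.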

DNA nanotechnology



DNA nanotechnology uses the unique molecular recognition
Molecular recognition
The term molecular recognition refers to the specific interaction between two or more molecules through noncovalent bonding such as hydrogen bonding, metal coordination, hydrophobic forces, van der Waals forces, π-π interactions, electrostatic and/or electromagnetic effects...

 properties of DNA and other nucleic acids to create self-assembling branched DNA complexes with useful properties. DNA is thus used as a structural material rather than as a carrier of biological information. This has led to the creation of two-dimensional periodic lattices (both tile-based and using the "DNA origami
DNA origami
DNA origami is the nanoscale folding of DNA to create arbitrary two and three dimensional shapes at the nanoscale. The specificity of the interactions between complementary base pairs make DNA a useful construction material through design of its base sequences...

" method) as well as three-dimensional structures in the shapes of polyhedra
Polyhedron
In elementary geometry a polyhedron is a geometric solid in three dimensions with flat faces and straight edges...

. Nanomechanical devices
DNA machine
A DNA machine is a molecular machine constructed from DNA. Research into DNA machines was pioneered in the late 1980s by Nadrian Seeman and co-workers from New York University...

 and algorithmic self-assembly
DNA computing
DNA computing is a form of computing which uses DNA, biochemistry and molecular biology, instead of the traditional silicon-based computer technologies. DNA computing, or, more generally, biomolecular computing, is a fast developing interdisciplinary area...

 have also been demonstrated, and these DNA structures have been used to template the arrangement of other molecules such as gold nanoparticles
Colloidal gold
Colloidal gold is a suspension of sub-micrometre-sized particles of gold in a fluid, usually water. The liquid is usually either an intense red colour, or a dirty yellowish colour...

 and streptavidin
Streptavidin
Streptavidin is a 60,000-dalton protein purified from the bacterium Streptomyces avidinii. Streptavidin homo-tetramers have an extraordinarily high affinity for biotin. With a dissociation constant on the order of ≈10⁻¹⁴ mol/L, the binding of biotin to streptavidin is one of the strongest...

 proteins.

History and anthropology



Because DNA collects mutations over time, which are then inherited, it contains historical information, and, by comparing DNA sequences, geneticists can infer the evolutionary history of organisms, their phylogeny
Phylogenetics
In biology, phylogenetics is the study of evolutionary relatedness among groups of organisms, which is discovered through molecular sequencing data and morphological data matrices...

. This field of phylogenetics is a powerful tool in evolutionary biology. If DNA sequences within a species are compared, population geneticists
Population genetics
Population genetics is the study of allele frequency distribution and change under the influence of the four main evolutionary processes: natural selection, genetic drift, mutation and gene flow. It also takes into account the factors of recombination, population subdivision and population...

 can learn the history of particular populations. This can be used in studies ranging from ecological genetics
Ecological genetics
Ecological genetics is the study of genetics in natural populations. This contrasts with classical genetics, which works mostly on crosses between laboratory strains, and DNA sequence analysis, which studies genes at the molecular level....

 to anthropology
Anthropology
Anthropology is the study of humanity. It has origins in the humanities, the natural sciences, and the social sciences. The term "anthropology" is from the Greek anthrōpos, "man", understood to mean mankind or humanity, and -logia, "discourse" or "study", and was first used in 1501 by German...

; for example, DNA evidence is being used to try to identify the Ten Lost Tribes of Israel
Ten Lost Tribes
The Ten Lost Tribes of Israel refers to those tribes of ancient Israel that formed the Kingdom of Israel and which disappeared from Biblical and all other historical accounts after the kingdom was destroyed in about 720 BC by ancient Assyria...

.

DNA has also been used to examine modern family relationships, such as those between the descendants of Sally Hemings
Sally Hemings
Sarah "Sally" Hemings was a mixed-race slave owned by President Thomas Jefferson through inheritance from his wife. She was the half-sister of Jefferson's wife, Martha Wayles Skelton Jefferson by their father John Wayles...

 and Thomas Jefferson
Thomas Jefferson
Thomas Jefferson was the principal author of the United States Declaration of Independence and the Statute of Virginia for Religious Freedom, the third President of the United States and founder of the University of Virginia...

. This usage is closely related to the use of DNA in criminal investigations detailed above. Indeed, some criminal investigations have been solved when DNA from crime scenes has matched relatives of the guilty individual.

History of DNA research



DNA was first isolated by the Swiss
Switzerland
Switzerland, in its full name the Swiss Confederation, is a federal republic consisting of 26 cantons, with Bern as the seat of the federal authorities. The country is situated in Western Europe, or Central Europe depending on the definition....

 physician Friedrich Miescher
Friedrich Miescher
Johannes Friedrich Miescher was a Swiss physician and biologist. He was the first researcher to isolate and identify nucleic acid...

 who, in 1869, discovered a microscopic substance in the pus
Pus
Pus is a viscous exudate, typically whitish-yellow, yellow, or yellow-brown, formed at the site of inflammatory during infection. An accumulation of pus in an enclosed tissue space is known as an abscess, whereas a visible collection of pus within or beneath the epidermis is known as a pustule or...

 of discarded surgical bandages. As it resided in the nuclei of cells, he called it "nuclein". In 1878, Albrecht Kossel
Albrecht Kossel
Ludwig Karl Martin Leonhard Albrecht Kossel was a German biochemist and pioneer in the study of genetics. He was awarded the Nobel Prize for Physiology or Medicine in 1910 for his work in determining the chemical composition of nucleic acids, the genetic substance of biological cells...

 isolated the non-protein component of "nuclein", nucleic acid
Nucleic acid
Nucleic acids are biological molecules essential for life, and include DNA and RNA. Together with proteins, nucleic acids make up the most important macromolecules; each is found in abundance in all living things, where they function in encoding, transmitting and expressing genetic information...

, and later isolated its five primary nucleobases. In 1919, Phoebus Levene
Phoebus Levene
Phoebus Aaron Theodore Levene, M.D. was a Russian-American biochemist who studied the structure and function of nucleic acids...

 identified the base, sugar and phosphate nucleotide unit. Levene suggested that DNA consisted of a string of nucleotide units linked together through the phosphate groups. However, Levene thought the chain was short and the bases repeated in a fixed order. In 1937 William Astbury
William Astbury
William Thomas Astbury FRS was an English physicist and molecular biologist who made pioneering X-ray diffraction studies of biological molecules. His work on keratin provided the foundation for Linus Pauling's discovery of the alpha helix...

 produced the first X-ray diffraction
X-ray scattering techniques
X-ray scattering techniques are a family of non-destructive analytical techniques which reveal information about the crystallographic structure, chemical composition, and physical properties of materials and thin films...

 patterns that showed that DNA had a regular structure.

In 1927 Nikolai Koltsov
Nikolai Koltsov
Nikolai Konstantinovich Koltsov was a Russian biologist. He was one of the creators of modern genetics. Nikolai Koltsov was a teacher of Nikolay Timofeeff-Ressovsky...

 proposed that inherited traits would be passed on via a "giant hereditary molecule" which would be made up of "two mirror strands that would replicate in a semi-conservative fashion using each strand as a template". In 1928, Frederick Griffith
Frederick Griffith
Frederick Griffith was a British bacteriologist whose focus was the epidemiology and pathology of bacterial pneumonia. In January 1928 he reported what is now known as Griffith's Experiment, the first widely accepted demonstrations of bacterial transformation, whereby a bacterium distinctly...

 discovered that traits
Trait (biology)
A trait is a distinct variant of a phenotypic character of an organism that may be inherited, environmentally determined or be a combination of the two...

 of the "smooth" form of the Pneumococcus could be transferred to the "rough" form of the same bacteria by mixing killed "smooth" bacteria with the live "rough" form. This system provided the first clear suggestion that DNA carries genetic information (the Avery–MacLeod–McCarty experiment) when Oswald Avery
Oswald Avery
Oswald Theodore Avery ForMemRS was a Canadian-born American physician and medical researcher. The major part of his career was spent at the Rockefeller University Hospital in New York City...

, along with coworkers Colin MacLeod and Maclyn McCarty
Maclyn McCarty
Maclyn McCarty was an American geneticist. Maclyn McCarty, who devoted his life as a physician-scientist to studying infectious disease organisms, was best known for his part in the monumental discovery that DNA, rather than protein, constituted the chemical nature of a gene...

, identified DNA as the transforming principle
Griffith's experiment
Griffith's experiment, reported in 1928 by Frederick Griffith, was one of the first experiments suggesting that bacteria are capable of transferring genetic information through a process known as transformation....

 in 1943. DNA's role in heredity
Heredity
Heredity is the passing of traits to offspring. This is the process by which an offspring cell or organism acquires or becomes predisposed to the characteristics of its parent cell or organism. Through heredity, variations exhibited by individuals can accumulate and cause some species to evolve...

 was confirmed in 1952, when Alfred Hershey
Alfred Hershey
Alfred Day Hershey was an American Nobel Prize-winning bacteriologist and geneticist. He was born in Owosso, Michigan and received his B.S. in chemistry at Michigan State University in 1930 and his Ph.D. in bacteriology in 1934, taking a position shortly thereafter at the Department of Bacteriology...

 and Martha Chase
Martha Chase
Martha Cowles Chase, also known as Martha C. Epstein, was an American geneticist famously known for being a member of the 1952 team which experimentally showed that DNA rather than protein is the genetic material of life. She was greatly respected as a geneticist. Chase was born in 1927 in...

 in the Hershey–Chase experiment showed that DNA is the genetic material of the T2 phage
Enterobacteria phage T2
Enterobacteria phage T2 is a virulent bacteriophage of the T4-like viruses genus, in the family Myoviridae. It infects Escherichia coli and is the best known of the T-even phages. Its virion contains linear double-stranded DNA, terminally redundant and circularly permuted. The phage is covered by a...

.

In 1953, James D. Watson
James D. Watson
James Dewey Watson is an American molecular biologist, geneticist, and zoologist, best known as one of the co-discoverers of the structure of DNA in 1953 with Francis Crick...

 and Francis Crick
Francis Crick
Francis Harry Compton Crick OM FRS was an English molecular biologist, biophysicist, and neuroscientist, and most noted for being one of two co-discoverers of the structure of the DNA molecule in 1953, together with James D. Watson...

 suggested what is now accepted as the first correct double-helix model of DNA structure
Molecular structure of Nucleic Acids
The "Molecular structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid" was an article published by James D. Watson and Francis Crick in the scientific journal Nature in its 171st volume on pages 737–738. It was the first publication which described the discovery of the double helix...

 in the journal Nature
Nature (journal)
Nature, first published on 4 November 1869, is ranked the world's most cited interdisciplinary scientific journal by the Science Edition of the 2010 Journal Citation Reports...

. Their double-helix molecular model of DNA was based on a single X-ray diffraction image (labeled "Photo 51
Photo 51
Photo 51 is the nickname given to an X-ray diffraction image of DNA taken by Rosalind Franklin in 1952 that was critical evidence in identifying the structure of DNA. The photo was taken by Franklin while working at King's College London in Sir John Randall's group. James D...

") taken by Rosalind Franklin
Rosalind Franklin
Rosalind Elsie Franklin was a British biophysicist and X-ray crystallographer who made critical contributions to the understanding of the fine molecular structures of DNA, RNA, viruses, coal and graphite...

 and Raymond Gosling
Raymond Gosling
Raymond Gosling is a distinguished scientist who worked with both Maurice Wilkins and Rosalind Franklin at King's College London in deducing the structure of DNA, under the direction of Sir John Randall. His other KCL colleagues included Alex Stokes and Herbert Wilson. He was born in...

 in May 1952, as well as on the information that the DNA bases are paired, also obtained through private communications from Erwin Chargaff
Erwin Chargaff
Erwin Chargaff was an American biochemist who emigrated to the United States during the Nazi era. Through careful experimentation, Chargaff discovered two rules that helped lead to the discovery of the double helix structure of DNA...

 in the previous years. Chargaff's rules
Chargaff's rules
Chargaff's rules state that DNA from any cell of all organisms should have a 1:1 ratio of pyrimidine and purine bases and, more specifically, that the amount of guanine is equal to cytosine and the amount of adenine is equal to thymine. This pattern is found in both strands of the DNA...

 played a very important role in establishing double-helix configurations for B-DNA as well as A-DNA.
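
The link between base pairing and Chargaff's rules can be checked with a small calculation. The sketch below builds the complementary strand of an arbitrary sequence and counts the bases across both strands; because every A pairs with a T and every G with a C, the duplex counts come out equal within each pair. The function name `duplex_base_counts` and the sample sequence are assumptions chosen for illustration only.

```python
from collections import Counter

# Watson-Crick pairing: A with T, G with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def duplex_base_counts(strand):
    """Count bases over a strand and its reverse-complement partner."""
    strand = strand.upper()
    partner = strand.translate(COMPLEMENT)[::-1]
    return Counter(strand + partner)


counts = duplex_base_counts("AAAGC")
print(counts["A"], counts["T"], counts["G"], counts["C"])
```

The single strand here is far from balanced (three A and no T), yet the duplex as a whole has equal amounts of A and T and of G and C.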

Experimental evidence supporting the Watson and Crick
Watson and Crick
James D. Watson and Francis Crick were the two co-discoverers of the structure of DNA in 1953. They used x-ray diffraction data collected by Rosalind Franklin and proposed the double helix or spiral staircase structure of the DNA molecule...

 model was published in a series of five articles in the same issue of Nature. Of these, Franklin and Gosling's paper was the first publication of their own X-ray diffraction data and original analysis method that partially supported the Watson and Crick model; this issue also contained an article on DNA structure by Maurice Wilkins
Maurice Wilkins
Maurice Hugh Frederick Wilkins CBE FRS was a New Zealand-born English physicist and molecular biologist, and Nobel Laureate whose research contributed to the scientific understanding of phosphorescence, isotope separation, optical microscopy and X-ray diffraction, and to the development of radar...

 and two of his colleagues, whose analysis and in vivo B-DNA X-ray patterns also supported the presence in vivo of the double-helical DNA configuration that Crick and Watson had proposed in the previous two pages of Nature. In 1962, after Franklin's death, Watson, Crick, and Wilkins jointly received the Nobel Prize
Nobel Prize
The Nobel Prizes are annual international awards bestowed by Scandinavian committees in recognition of cultural and scientific advances. The will of the Swedish chemist Alfred Nobel, the inventor of dynamite, established the prizes in 1895...

 in Physiology or Medicine
Nobel Prize in Physiology or Medicine
The Nobel Prize in Physiology or Medicine administered by the Nobel Foundation, is awarded once a year for outstanding discoveries in the field of life science and medicine. It is one of five Nobel Prizes established in 1895 by Swedish chemist Alfred Nobel, the inventor of dynamite, in his will...

. Nobel rules of the time allowed only living recipients, and a vigorous debate continues on who should receive credit for the discovery.

In an influential presentation in 1957, Crick laid out the central dogma of molecular biology
Central dogma of molecular biology
The central dogma of molecular biology was first articulated by Francis Crick in 1958 and re-stated in a Nature paper published in 1970. In other words, the process of producing proteins is irreversible: a protein cannot be used to create DNA....

, which foretold the relationship between DNA, RNA, and proteins, and articulated the "adaptor hypothesis". Final confirmation of the replication mechanism that was implied by the double-helical structure followed in 1958 through the Meselson–Stahl experiment. Further work by Crick and coworkers showed that the genetic code was based on non-overlapping triplets of bases, called codons, allowing Har Gobind Khorana, Robert W. Holley
Robert W. Holley
Robert William Holley was an American biochemist. He shared the Nobel Prize in Physiology or Medicine in 1968 for describing the structure of alanine transfer RNA, linking DNA and protein synthesis.Holley was born in Urbana, Illinois, and graduated from Urbana High School in 1938...

 and Marshall Warren Nirenberg
Marshall Warren Nirenberg
Marshall Warren Nirenberg was an American biochemist and geneticist of Jewish origin. He shared a Nobel Prize in Physiology or Medicine in 1968 with Har Gobind Khorana and Robert W. Holley for "breaking the genetic code" and describing how it operates in protein synthesis...

 to decipher the genetic code. These findings represent the birth of molecular biology
Molecular biology
Molecular biology is the branch of biology that deals with the molecular basis of biological activity. This field overlaps with other areas of biology and chemistry, particularly genetics and biochemistry...

.
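
To make "non-overlapping triplets" concrete, the sketch below splits a sequence into codons by stepping three bases at a time from a chosen reading frame, so that each base belongs to exactly one codon. The function name `codons`, its `frame` parameter, and the sample sequence are assumptions for this example; it does not translate codons into amino acids.

```python
def codons(seq, frame=0):
    """Split seq into non-overlapping triplets, starting at offset frame.

    Trailing bases that do not fill a complete triplet are dropped.
    """
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


print(codons("ATGGCCTAA"))           # reading frame 0
print(codons("ATGGCCTAA", frame=1))  # same bases, shifted by one
```

Shifting the frame by a single base changes every codon that is read, which is why the choice of reading frame matters when interpreting a sequence.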

Further reading


  • Judson, Horace F
    Horace Freeland Judson
    Horace Freeland Judson was a historian of molecular biology and the author of several books, including The Eighth Day of Creation, a history of molecular biology, and The Great Betrayal: Fraud In Science, an examination of the deliberate manipulation of scientific data. The Eighth...

    . 1979. The Eighth Day of Creation: Makers of the Revolution in Biology. Touchstone Books, ISBN 0-671-22540-5. 2nd edition: Cold Spring Harbor Laboratory Press, 1996 paperback: ISBN 0-87969-478-5., first published in October 1974 by MacMillan, with foreword by Francis Crick; the definitive DNA textbook, revised in 1994 with a 9-page postscript
  • Micklos, David. 2003. DNA Science: A First Course. Cold Spring Harbor Press: ISBN 978-0879696368
  • Rosenfeld, Israel. 2010. DNA: A Graphic Guide to the Molecule that Shook the World. Columbia University Press: ISBN 978-0231142717
  • Schultz, Mark and Zander Cannon. 2009. The Stuff of Life: A Graphic Guide to Genetics and DNA. Hill and Wang: ISBN 0809089475
  • Watson, James D
    James D. Watson
    James Dewey Watson is an American molecular biologist, geneticist, and zoologist, best known as one of the co-discoverers of the structure of DNA in 1953 with Francis Crick...

    . 2004. DNA: The Secret of Life. Random House: ISBN 978-0099451846


External links