Molecular phylogenetics (məˈlɛkjʊlər faɪlɵdʒɪˈnɛtɪks) is the analysis of hereditary molecular differences, mainly in DNA sequences, to gain information on an organism's evolutionary relationships. The result of a molecular phylogenetic analysis is expressed in a phylogenetic tree. Molecular phylogenetics is one aspect of molecular systematics, a broader term that also includes the use of molecular data in taxonomy and biogeography.
History of molecular phylogenetics
The theoretical foundations of molecular systematics were laid in the 1960s in the work of Emile Zuckerkandl, Emanuel Margoliash, Linus Pauling, and Walter M. Fitch. Applications of molecular systematics were pioneered by Charles G. Sibley (birds), Herbert C. Dessauer (herpetology), and Morris Goodman (primates), followed by Allan C. Wilson, Robert K. Selander, and John C. Avise (who studied various groups). Work with protein electrophoresis began around 1956. Although the results were not quantitative and did not initially improve on morphological classification, they provided tantalizing hints that long-held notions of the classifications of birds, for example, needed substantial revision. From 1974 to 1986, DNA-DNA hybridization was the dominant technique.
Techniques and applications
Every living organism contains DNA, RNA, and proteins. In general, closely related organisms show a high degree of agreement in the molecular structure of these substances, while the molecules of distantly related organisms usually show a pattern of dissimilarity. Conserved sequences, such as mitochondrial DNA, are expected to accumulate mutations over time and, assuming a constant rate of mutation, provide a molecular clock for dating divergence. Molecular phylogeny uses such data to build a "relationship tree" that shows the probable evolution of various organisms. Only in recent decades, however, has it been possible to isolate and identify these molecular structures.
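The constant-rate assumption above can be turned into a simple calculation. This is a minimal sketch assuming a strict clock: after two lineages split, each accumulates substitutions at the same rate r (substitutions per site per million years), so their pairwise distance is d ≈ 2rT and the divergence time is T = d / (2r). The function name and the rate in the example are hypothetical; real analyses calibrate rates against fossils or other dated events.

```python
def divergence_time(distance: float, rate_per_lineage: float) -> float:
    """Estimate time since divergence (million years) under a strict molecular clock.

    distance: substitutions per site separating the two sequences.
    rate_per_lineage: substitutions per site per million years along ONE lineage
        (a hypothetical calibration in this sketch).
    Both lineages change independently after the split, so d = 2 * r * T.
    """
    if rate_per_lineage <= 0:
        raise ValueError("rate must be positive")
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return distance / (2.0 * rate_per_lineage)


# Hypothetical example: 0.04 substitutions/site apart, 0.01 substitutions/site/Myr per lineage
print(f"{divergence_time(0.04, 0.01):.1f} million years")  # 2.0 million years
```

The factor of 2 is the design point worth noticing: the distance between two living sequences reflects change along both branches since their common ancestor.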
The most common approach is the comparison of homologous gene sequences, using sequence alignment techniques to identify similarity. Another application of molecular phylogeny is DNA barcoding, in which the species of an individual organism is identified using small sections of mitochondrial DNA. The techniques that make this possible are also applied in the more restricted field of human genetics, for example in the increasingly common use of genetic testing to determine a child's paternity, and in a newer branch of criminal forensics focused on evidence known as genetic fingerprinting.
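Sequence alignment can be illustrated with the Needleman-Wunsch dynamic-programming algorithm for globally aligning two sequences. This is a minimal sketch under an assumed simple scoring scheme (match +1, mismatch -1, linear gap penalty -1); practical phylogenetic pipelines typically use substitution matrices, affine gap penalties and multiple-sequence aligners instead.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Globally align a and b; return (score, aligned_a, aligned_b).

    The match/mismatch/linear-gap scoring is a simplifying assumption of this sketch.
    """
    n, m = len(a), len(b)

    def pair(i: int, j: int) -> int:
        # Score for aligning a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])  # gap inserted in b
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")  # gap inserted in a
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

When several alignments share the optimal score, this traceback returns just one of them, preferring a diagonal step, then a gap in the second sequence.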
Theoretical background
Early attempts at molecular systematics were also termed chemotaxonomy and made use of proteins, enzymes, carbohydrates, and other molecules that were separated and characterized using techniques such as chromatography. These have largely been replaced in recent times by DNA sequencing, which produces the exact sequences of nucleotides or bases in either DNA or RNA segments extracted using different techniques. In general, these sequences are considered superior for evolutionary studies, since the actions of evolution are ultimately reflected in the genetic sequences. At present, it is still a long and expensive process to sequence the entire DNA of an organism (its genome), and this has been done for only a few species. However, it is quite feasible to determine the sequence of a defined region of a particular chromosome. Typical molecular systematic analyses require the sequencing of around 1000 base pairs. At any position within such a sequence, the base found may vary between organisms. The particular sequence found in a given organism is referred to as its haplotype. In principle, since there are four base types, a region of 1000 base pairs could have $4^{1000}$ distinct haplotypes. However, for organisms within a particular species or in a group of related species, it has been found empirically that only a minority of sites show any variation at all, and most of the variations that are found are correlated, so that the number of distinct haplotypes found is relatively small.
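The contrast between the theoretical $4^{1000}$ possible haplotypes and the few actually observed can be checked on a small sample. The sketch below assumes the sequences are already aligned to equal length and counts variable (segregating) sites and distinct haplotypes; the helper name and the sample sequences are invented for illustration, not real data.

```python
from collections import Counter


def summarize_haplotypes(aligned: list[str]) -> tuple[int, Counter]:
    """Return (number of variable sites, Counter of distinct haplotypes)."""
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    # A site is variable if more than one base occurs in that column
    variable_sites = sum(1 for column in zip(*aligned) if len(set(column)) > 1)
    return variable_sites, Counter(aligned)


# Theoretical ceiling for a 1000-bp region: 4**1000 possible sequences
print(len(str(4 ** 1000)))  # 603 (decimal digits)

# Hypothetical aligned sample of five individuals
sample = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "ACGTACGTAC", "ACGTTCGTAC"]
n_variable, haplotypes = summarize_haplotypes(sample)
print(n_variable, len(haplotypes))  # 1 2
```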
In a molecular systematic analysis, the haplotypes are determined for a defined region of genetic material; a substantial sample of individuals of the target species or other taxon is used, although many current studies are based on single individuals. Haplotypes of individuals of closely related, but different, taxa are also determined. Finally, haplotypes from a smaller number of individuals from a clearly different taxon are determined; these are referred to as an outgroup. The base sequences for the haplotypes are then compared. In the simplest case, the difference between two haplotypes is assessed by counting the number of positions where they have different bases; this is referred to as the number of substitutions (other kinds of differences between haplotypes can also occur, for example the insertion of a section of nucleic acid in one haplotype that is not present in another). The difference between organisms is usually re-expressed as a percentage divergence, by dividing the number of substitutions by the number of base pairs analysed; the hope is that this measure will be independent of the location and length of the section of DNA that is sequenced.
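Counting substitutions and converting them to a percentage divergence takes only a few lines. This is a sketch under two stated assumptions: the haplotypes are already aligned, and positions with a gap ('-') in either sequence are simply skipped rather than scored, so insertions are not counted as substitutions. The function names and sequences are hypothetical.

```python
def count_substitutions(h1: str, h2: str) -> int:
    """Number of aligned positions at which two haplotypes carry different bases.

    Positions with a gap ('-') in either haplotype are skipped (an assumption
    of this sketch), so insertions and deletions are not counted.
    """
    if len(h1) != len(h2):
        raise ValueError("haplotypes must be aligned to the same length")
    return sum(1 for x, y in zip(h1, h2) if x != y and "-" not in (x, y))


def percentage_divergence(h1: str, h2: str) -> float:
    """Substitutions divided by the number of base pairs compared, as a percentage."""
    substitutions = count_substitutions(h1, h2)  # also checks that lengths match
    compared = sum(1 for x, y in zip(h1, h2) if "-" not in (x, y))
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * substitutions / compared


# Hypothetical haplotypes that differ at two of ten positions
print(percentage_divergence("ACGTACGTAC", "ACGTTCGTAA"))  # 20.0
```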
An older and superseded approach was to determine the divergences between the genotypes of individuals by DNA-DNA hybridisation. The advantage claimed for using hybridisation rather than gene sequencing was that it was based on the entire genotype, rather than on particular sections of DNA. Modern sequence comparison techniques overcome this objection by the use of multiple sequences.
Once the divergences between all pairs of samples have been determined, the resulting triangular matrix of differences is submitted to some form of statistical cluster analysis, and the resulting dendrogram is examined to see whether or not the samples cluster in the way that current ideas about the taxonomy of the group would predict. Any group of haplotypes that are all more similar to one another than any of them is to any other haplotype may be said to constitute a clade. Statistical techniques such as bootstrapping and jackknifing provide reliability estimates for the positions of haplotypes within the evolutionary trees.
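As one example of such a cluster analysis, the sketch below applies UPGMA (average-linkage hierarchical clustering) to a small distance matrix and returns the dendrogram as nested tuples. UPGMA is chosen here only because it is short; it is one of several possible methods, it implicitly assumes a constant rate of change, and branch lengths are omitted. The taxon labels and distances are hypothetical.

```python
def upgma(labels: list[str], matrix: list[list[float]]):
    """Cluster taxa with UPGMA and return the dendrogram as nested tuples.

    matrix[i][j] is the distance between labels[i] and labels[j]; only the
    lower triangle (j < i) is read. Branch lengths are omitted for brevity.
    """
    if not labels:
        raise ValueError("need at least one taxon")
    # Active clusters: id -> (subtree, number of leaves it contains)
    clusters = {i: (name, 1) for i, name in enumerate(labels)}
    # Distances between active clusters, keyed by (smaller id, larger id)
    dist = {(j, i): matrix[i][j] for i in range(len(labels)) for j in range(i)}
    next_id = len(labels)
    while len(clusters) > 1:
        a, b = min(dist, key=dist.get)  # closest pair of clusters
        tree_a, n_a = clusters.pop(a)
        tree_b, n_b = clusters.pop(b)
        del dist[(a, b)]
        # New distance = leaf-count-weighted average of the two old distances
        for k in clusters:
            d_ak = dist.pop((min(a, k), max(a, k)))
            d_bk = dist.pop((min(b, k), max(b, k)))
            dist[(k, next_id)] = (n_a * d_ak + n_b * d_bk) / (n_a + n_b)
        clusters[next_id] = ((tree_a, tree_b), n_a + n_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


taxa = ["A", "B", "C", "D"]
d = [[0, 2, 6, 10],
     [2, 0, 6, 10],
     [6, 6, 0, 10],
     [10, 10, 10, 0]]
print(upgma(taxa, d))  # ('D', ('C', ('A', 'B')))
```

Because each new cluster receives the largest id so far, every key stays ordered as (smaller id, larger id), which keeps the lookups in the update loop simple.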
Limitations of molecular systematics
Molecular systematics is an essentially cladistic approach: it assumes that classification must correspond to phylogenetic descent, and that all valid taxa must be monophyletic.
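The requirement that valid taxa be monophyletic can be stated as a simple test on a rooted tree: a set of taxa is monophyletic if it is exactly the set of leaves below some single node. This sketch assumes the tree is rooted (for example, by an outgroup) and encoded as nested Python tuples; the helper names and the example tree are hypothetical.

```python
def leaves(tree) -> set:
    """Return the set of leaf labels below a nested-tuple tree node."""
    if isinstance(tree, tuple):
        found = set()
        for child in tree:
            found |= leaves(child)
        return found
    return {tree}


def is_monophyletic(tree, taxa) -> bool:
    """True if some node of the rooted tree has exactly `taxa` as its leaves."""
    target = set(taxa)

    def walk(node) -> bool:
        if leaves(node) == target:
            return True
        return isinstance(node, tuple) and any(walk(child) for child in node)

    return walk(tree)


tree = ("D", ("C", ("A", "B")))  # hypothetical rooted tree
print(is_monophyletic(tree, {"A", "B"}))       # True
print(is_monophyletic(tree, {"A", "B", "C"}))  # True
print(is_monophyletic(tree, {"C", "D"}))       # False
```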
Molecular phylogenies can be affected by a range of problems, including long-branch attraction, saturation, and taxon sampling, and strikingly different results can be obtained by applying different models to the same dataset.
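Saturation can be illustrated with the Jukes-Cantor (JC69) model, used here only as one simple example of a substitution model. Because a site can change more than once, the observed proportion of differing sites p underestimates the true number of substitutions per site d. Under JC69, p approaches but never exceeds 3/4, and the corrected distance d = -(3/4) ln(1 - 4p/3) grows without bound as p nears 0.75. This is why highly saturated data carry little reliable signal about deep divergences. The function name is hypothetical.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """JC69-corrected distance (substitutions per site) from an observed p-distance."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Small changes in observed p near 0.75 imply huge changes in estimated distance
for p in (0.05, 0.30, 0.60, 0.74):
    print(f"p = {p:.2f}  ->  d = {jukes_cantor_distance(p):.3f}")
```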
See also
- molecular evolution
- computational phylogenetics
- PhyloCode
- Microbial phylogenetics
External links