Cladistics



 
 
Cladistics is the hierarchical classification of species based on evolutionary ancestry. Cladistics is distinguished from other taxonomic systems because it focuses on evolution rather than similarities between species, and because it places heavy emphasis on objective, quantitative analysis.

Cladistics generates diagrams called cladograms that represent the evolutionary tree of life. DNA and RNA sequencing data are used in many important cladistic efforts. Computer programs are widely used in cladistics, due to the highly complex nature of cladogram generation procedures.

Cladistics originated in the work of the German entomologist Willi Hennig, who himself referred to it as phylogenetic systematics; the use of the terms "cladistics" and "clade" was popularized by other researchers. The term phylogenetics is often used synonymously with cladistics. Cladistics originated in the field of biology but in recent years has found application in other disciplines. The word cladistics is derived from the ancient Greek κλάδος, klados, "branch."

Terminology

Clade Types
  • A clade is an ancestor and all of its descendants
  • A monophyletic group is a clade
  • A paraphyletic group is a monophyletic group that excludes some of the descendants (e.g. reptiles are sauropsids excluding birds). Most cladists discourage the use of paraphyletic groups.
  • A polyphyletic group is a group consisting of members from two non-overlapping monophyletic groups (e.g. flying animals). Most cladists discourage the use of polyphyletic groups.
  • An outgroup is an organism that is considered not to be part of the group in question, but is closely related to the group.
  • A characteristic that is present in both the outgroups and in the ancestors is called a plesiomorphy (meaning "close form", also called an ancestral state).
  • A characteristic that occurs only in later descendants is called an apomorphy (meaning "separate form", also called a "derived" state) for that group. Note: The adjectives plesiomorphic and apomorphic are used instead of "primitive" and "advanced" to avoid placing value-judgments on the evolution of the character states, since both may be advantageous in different circumstances. It is not uncommon to refer informally to a collective set of plesiomorphies as a ground plan for the clade or clades they refer to.
  • A species or clade is basal to another clade if it holds more plesiomorphic characters than that other clade. Usually a basal group is very species-poor as compared to a more derived group. It is not a requirement that a basal group be extant. For example, palaeodicots are basal to flowering plants.
  • A clade or species located within another clade is said to be nested within that clade.
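
The three group types above can be told apart mechanically once a tree is fixed. The following is a minimal sketch, not a standard algorithm: the tree, the species names, and the function names are hypothetical, and the line between paraphyletic and polyphyletic is drawn with a simplifying assumption (a group counts as paraphyletic only if the excluded descendants form a single clade).

```python
# Hypothetical toy tree written as nested tuples; leaves are strings.
AMNIOTES = (("mouse", "bat"), (("lizard", "snake"), ("crocodile", "bird")))

def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def subtrees(tree):
    """Yield every subtree, from the whole tree down to single leaves."""
    yield tree
    if not isinstance(tree, str):
        for child in tree:
            yield from subtrees(child)

def classify_group(tree, group):
    """Label a set of leaves as monophyletic, paraphyletic or polyphyletic."""
    group = set(group)
    clade_sets = [leaves(t) for t in subtrees(tree)]
    # The smallest clade containing the whole group belongs to its last common ancestor.
    ancestor_clade = min((c for c in clade_sets if group <= c), key=len)
    if ancestor_clade == group:
        return "monophyletic"
    # Simplifying assumption: exactly one excluded clade means paraphyletic.
    excluded = ancestor_clade - group
    if excluded in clade_sets:
        return "paraphyletic"
    return "polyphyletic"

print(classify_group(AMNIOTES, {"crocodile", "bird"}))
print(classify_group(AMNIOTES, {"lizard", "snake", "crocodile"}))  # "reptiles" without birds
print(classify_group(AMNIOTES, {"bat", "bird"}))                   # "flying animals"
```

Working only from leaf names is a deliberate shortcut; a fuller treatment would also need to say whether the common ancestor itself is counted as a member of the group.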


Three definitions of clade
There are three major ways to define a clade for use in a cladistic taxonomy.

  • Node-based: the last common ancestor of A and B, and all descendants of that ancestor. Crown groups are a type of node-based clade.


  • Branch-based: the first ancestor of A which is not also an ancestor of Z, and all descendants of that ancestor. (This type of definition was originally called "stem-based", but this was changed to avoid confusion with the term "stem group".) Total groups are a type of branch-based clade.


  • Apomorphy-based: the first ancestor of A to possess derived trait M homologously (that is, synapomorphically) with that trait in A, and all descendants of that ancestor.
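
The node-based and branch-based definitions can be illustrated on a small tree. This is a sketch under stated assumptions: the tree and the function names are hypothetical, only leaf species are represented (so extinct stem lineages are absent), and the apomorphy-based definition is left out because it would also require reconstructed ancestral traits.

```python
# Hypothetical toy tree written as nested tuples; leaves are strings.
AMNIOTES = (("mouse", "bat"), (("lizard", "snake"), ("crocodile", "bird")))

def leaves(tree):
    """Return the frozenset of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def all_clades(tree):
    """List the leaf set of every subtree in the tree."""
    found = [leaves(tree)]
    if not isinstance(tree, str):
        for child in tree:
            found.extend(all_clades(child))
    return found

def node_based(tree, a, b):
    """Smallest clade containing both a and b (their last common ancestor's clade)."""
    return min((c for c in all_clades(tree) if a in c and b in c), key=len)

def branch_based(tree, a, z):
    """Largest clade that contains a but does not contain z."""
    return max((c for c in all_clades(tree) if a in c and z not in c), key=len)

print(sorted(node_based(AMNIOTES, "lizard", "bird")))
print(sorted(branch_based(AMNIOTES, "bird", "mouse")))
```

On a tree of living species only, the two definitions can pick out the same set of leaves, as they do here; they differ once extinct lineages that branch off below the node are added.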


History of cladistics

Hennig's major book, even the 1979 version, does not contain the term cladistics in the index. He referred to his own approach as phylogenetic systematics, as implied by the book's title. A review paper by Dupuis observes that the term clade was introduced in 1958 by Julian Huxley, cladistic by Cain and Harrison in 1960, and cladist (for an adherent of Hennig's school) by Mayr in 1965.

From the time of Hennig's original formulation until the end of the 1980s, cladistics remained a minority approach to classification. However, in the 1990s it rapidly became the dominant method of classification in evolutionary biology. Cheap but increasingly powerful personal computers made it possible to process large quantities of data about organisms and their characteristics. At about the same time, the development of effective polymerase chain reaction techniques made it possible to apply cladistic methods of analysis to biochemical features of organisms as well as to anatomical ones.

Cladistics as a successor to phenetics
For some decades in the mid to late 20th century, a commonly used methodology was phenetics ("numerical taxonomy"). This can be seen as a predecessor to some methods of today's cladistics (namely distance matrix methods like neighbor-joining), but made no attempt to resolve phylogeny, only similarities.

Cladograms

The starting point of cladistic analysis is a group of species and molecular, morphological, or other data characterizing those species. The end result is a tree-like relationship diagram called a cladogram, or sometimes a dendrogram (Greek for "tree drawing"). The cladogram graphically represents a hypothetical evolutionary process. Cladograms are subject to revision as additional data become available.

Synonyms

The terms evolutionary tree and, sometimes, phylogenetic tree are often used synonymously with cladogram, but some authors treat phylogenetic tree as a broader term that includes trees generated with a nonevolutionary emphasis.

Subtrees are clades

In cladograms, all organisms lie at the leaves. The two taxa on either side of a split are called sister taxa or sister groups. Each subtree, whether it contains only two or a hundred thousand items, is called a clade.
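
As an illustration of subtrees and sister groups, the sketch below walks a small cladogram and lists every clade and every pair of sister groups. The nested-tuple representation, the leaf labels, and the function names are assumptions made for this example only.

```python
# Hypothetical cladogram with five leaves; every fork is 2-way.
TREE = (("A", "B"), ("C", ("D", "E")))

def leaves(tree):
    """Return the frozenset of leaf names under a subtree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def clades(tree):
    """List the leaf set of every subtree, single leaves included."""
    result = [leaves(tree)]
    if not isinstance(tree, str):
        for child in tree:
            result.extend(clades(child))
    return result

def sister_groups(tree):
    """Yield (left, right) leaf sets for the two sides of every split."""
    if isinstance(tree, str):
        return
    left, right = tree
    yield leaves(left), leaves(right)
    yield from sister_groups(left)
    yield from sister_groups(right)

for left, right in sister_groups(TREE):
    print(sorted(left), "is sister to", sorted(right))
```

With N leaves and only 2-way forks there are N - 1 splits, so this five-leaf tree has four pairs of sister groups.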

2-way versus 3-way forks

Many cladists require that all forks in a cladogram be 2-way forks. Some cladograms include 3-way or 4-way forks when there are insufficient data to resolve the forking to a higher level of detail. See phylogenetic tree for more information about forking choices in trees.

Number of distinct cladograms

For a given set of species, the number of distinct cladograms that can be drawn (ignoring which cladogram best matches the species characteristics) is:

This growth in the number of possible cladograms, which is faster than exponential, explains why manual creation of cladograms becomes very difficult when the number of species is large.
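
For rooted cladograms in which every fork is 2-way, a standard counting result gives 1 × 3 × 5 × ... × (2N − 3) distinct trees for N ≥ 2 labelled species. The sketch below assumes exactly that setting; the function name is hypothetical, and cladograms with 3-way or larger forks would need a different count.

```python
def count_rooted_binary_cladograms(n_species):
    """Number of distinct rooted, fully 2-way cladograms on n labelled species."""
    if n_species < 1:
        raise ValueError("need at least one species")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (an empty product for n <= 2).
    for odd in range(3, 2 * n_species - 2, 2):
        count *= odd
    return count

for n in (2, 3, 4, 5, 10, 20):
    print(n, count_rooted_binary_cladograms(n))
```

Ten species already allow 34,459,425 such cladograms, which is why exhaustive comparison by hand is impractical beyond a handful of species.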

Depth

If a cladogram represents N species, the number of levels (the "depth") in the cladogram is on the order of log2(N). For example, if there are 32 species of deer, a cladogram representing deer will be around 5 levels deep (because 2^5 = 32). A cladogram representing the complete tree of life, with about 10 million species, would be about 23 levels deep. This formula gives a lower limit: in most cases the actual depth will be a larger value because the various branches of the cladogram will not be uniformly deep. Conversely, the depth may be shallower if forks larger than 2-way forks are permitted.
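
The bounds on depth can be checked with a few lines of arithmetic. This sketch assumes 2-way forks throughout and counts depth as the number of forks on the longest path from the root to a leaf; the function names are hypothetical. Note that log2 of 10 million is roughly 23.3, so a whole number of levels rounds up to 24.

```python
def min_depth(n_species):
    """Fewest levels a fully 2-way cladogram with n leaves can have: ceil(log2(n))."""
    return (n_species - 1).bit_length()

def max_depth(n_species):
    """Levels in a completely unbalanced ("ladder") cladogram with n leaves."""
    return n_species - 1

for n in (32, 10_000_000):
    print(n, "species: between", min_depth(n), "and", max_depth(n), "levels")
```

The gap between the two bounds is why the logarithmic figure should be read as a lower limit rather than a typical value.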

Time scale

A cladogram tree has an implicit time axis, with time running forward from the base of the tree to the leaves of the tree. If the approximate date (for example, expressed as millions of years ago) of all the evolutionary forks were known, those dates could be captured in the cladogram. Thus, the time axis of the cladogram could be assigned a time scale (e.g. 1 cm = 1 million years), and the forks of the tree could be graphically located along the time axis. Such cladograms are called scaled cladograms. Many cladograms are not scaled along the time axis, for a variety of reasons:
  • Many cladograms are built from species characteristics that cannot be readily dated (e.g. morphological data in the absence of fossils or other dating information)
  • When the characteristic data are DNA/RNA sequences, it is feasible to use sequence differences to establish the relative ages of the forks, but converting those ages into actual years requires a significant approximation of the rate of change
  • Even when the dating information is available, positioning the cladogram's forks along the time axis in proportion to their dates may cause the cladogram to become difficult to understand or hard to fit within a human-readable format
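
The second point, converting sequence differences into years, can be made concrete with a simple constant-rate ("molecular clock") calculation. This is a sketch under strong assumptions: the rate is taken to be the same on both lineages, and the distance and rate values below are invented for illustration, not measured.

```python
def divergence_time_years(distance, rate_per_site_per_year):
    """Estimate years since two lineages split, assuming a constant rate.

    distance: substitutions per site separating the two sequences.
    rate_per_site_per_year: substitutions per site per year along one lineage.
    The factor 2 reflects that both lineages accumulate changes after the split.
    """
    return distance / (2.0 * rate_per_site_per_year)

# Hypothetical numbers: the same distance dated with three assumed rates.
distance = 0.02
for rate in (0.5e-9, 1e-9, 2e-9):
    print(f"rate {rate:.1e}: about {divergence_time_years(distance, rate):,.0f} years")
```

Doubling or halving the assumed rate halves or doubles the estimated date, which is the approximation problem described in the list above.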


Extinct species

Cladistics makes no distinction between extinct and nonextinct species, and it is appropriate to include extinct species in the group of organisms being analyzed. Cladograms that are based on DNA/RNA generally do not include extinct species because DNA/RNA samples from extinct species are rare. Cladograms based on morphology, especially morphological characteristics that are preserved in fossils, are more likely to include extinct species.

Cladistics in taxonomy


Cladistics contrasted with traditional taxonomy


Prior to the advent of cladistics, most taxonomists used Linnaean taxonomy and later evolutionary taxonomy to organize life forms. These traditional approaches, still in use by some researchers (especially in works intended for a more general audience), use several fixed levels of a hierarchy, such as kingdom, phylum, class, order, and family. Cladistics does not use those terms, because one of the fundamental premises of cladistics is that the evolutionary tree is so deep and so complex that it is inadvisable to set a fixed number of levels.

Evolutionary taxonomy insists that groups reflect phylogenies. In contrast, Linnaean taxonomy allows both monophyletic and paraphyletic groups as taxa. Since the early 20th century, Linnaean taxonomists have generally attempted to make genus-level and lower-level taxa monophyletic. Ernst Mayr drew a distinction between the terms cladistics and phylogeny, using the term cladistics to refer to classifications which only take into account genealogy, as opposed to phylogeny, which had previously been used in a broader sense to refer to the combination of genealogy and amount of divergence from an ancestor (i.e. evolutionary taxonomy). Mayr wrote, in 1985:

Willi Hennig's pioneering work provoked a spirited debate about the relative merits of cladistics versus traditional taxonomy, which has continued down to the present. Some of the debates that the cladists engaged in had been running since the 19th century, but they entered these debates with a new fervor, as can be seen from the Foreword to Hennig (1979) by Rosen, Nelson, and Patterson:

Cladistics strictly and exclusively follows phylogeny and has arbitrarily deep trees with binary branching: each taxon is a clade. Linnaean taxonomy, while following phylogeny, also subjectively considers morphology and has a fixed hierarchy, whose taxa are not always clades.

Paraphyletic groups discouraged

Many cladists discourage the use of paraphyletic groups because they detract from cladistics' emphasis on clades (monophyletic groups). In contrast, proponents of the use of paraphyletic groups argue that any dividing line in a cladogram creates both a monophyletic section above and a paraphyletic section below. They also contend that paraphyletic taxa are necessary for classifying earlier sections of the tree: for instance, the early vertebrates that would someday evolve into the family Hominidae cannot be placed in any other monophyletic family. They also argue that paraphyletic taxa provide information about significant changes in organisms' morphology, ecology, or life history; in short, that both paraphyletic groups and clades are valuable notions with separate purposes.

Complexity of the Tree of Life

One argument in favor of cladistics is that it supports arbitrarily complex, arbitrarily deep trees. Especially when extinct species are considered (both known and unknown), the complexity and depth of the tree can be very large. Every single speciation event, including all the species that are now extinct, represents an additional fork on the hypothetical, complete cladogram representing the full tree of life. Fractals can be used to represent this notion of increasing detail: as a viewpoint zooms into the tree of life, the complexity remains virtually constant. This great complexity of the tree, and the uncertainty associated with the complexity, are among the reasons that cladists cite for the attractiveness of cladistics over traditional taxonomy.

Proponents of noncladistic approaches to taxonomy point to punctuated equilibrium to bolster the case that the tree of life has a finite depth and finite complexity. If the number of species currently alive is finite, and the number of extinct species that we will ever know about is finite, then the depth and complexity of the tree of life are bounded, and there is no need to handle arbitrarily deep trees.

PhyloCode approach to naming species

A formal code of phylogenetic nomenclature, the PhyloCode, is currently under development for cladistic taxonomy. It is intended for use by both those who would like to abandon Linnaean taxonomy and those who would like to use taxa and clades side by side. In several instances (see for example Hesperornithes) it has been employed to clarify uncertainties in Linnaean systematics so that in combination they yield a taxonomy that unambiguously places problematic groups in the evolutionary tree in a way that is consistent with current knowledge.

Example

For example, Linnaean taxonomy contains the taxon Tetrapoda, defined morphologically as vertebrates with four limbs (as well as animals with four-limbed ancestors, such as snakes), which is often given the rank of superclass, and divides into the classes Amphibia, Reptilia, Aves, Mammalia, and some extinct families.

Cladistics also contains the taxon Tetrapoda, whose living members can be classified phylogenetically as "the clade defined by the common ancestor of amphibians and mammals", or more precisely the clade defined by the common ancestor of a specific amphibian and mammal (or bird or reptile), but whose tree is still being worked out (there are a number of extinct branches). The taxon does not have a rank, and its subtaxa are subclades: these can be contained within one another, but one does not divide the clade into several non-overlapping taxa (as in traditional taxonomy): one can split into two clades at the first branching, but that is all. With regard to the traditional classes, Aves and Mammalia are subclades, contained in the subclade Amniota, but Reptilia* is a paraphyletic taxon, not a clade: "At best, the cladists suggest, we could say that the traditional Reptilia are 'non-avian, non-mammalian amniotes'." Instead, one divides Amniota into the two clades Sauropsida (which contains birds and all living amniotes other than mammals, including all living traditional reptiles) and Theropsida (mammals and the extinct "mammal-like reptiles"). Similarly, Amphibia* is a paraphyletic taxon.

Summary of advantages of cladistics

Proponents of cladistics enumerate key distinctions between cladistics and Linnaean taxonomy as follows:

Summary of criticisms of cladistics

Critics of cladistics include Ashlock (Ashlock PD. 1972. Monophyly again. Systematic Zoology 21: 430–438; Ashlock PD. 1974. The uses of cladistics. Annual Review of Ecology and Systematics 5: 81–89; Ashlock PD. 1979. An evolutionary systematist's view of classification. Systematic Zoology 28: 441–450), Mayr (Mayr E. 1978. Origin and history of some terms in systematic and evolutionary biology. Systematic Zoology 27: 83–88; Mayr E, Bock WJ. 2002. Classifications and other ordering systems. Journal of Zoological Systematics and Evolutionary Research 40: 169–194), Williams, and Envall. Some of their criticisms include:

Process to generate a cladogram

A simplified procedure for generating a cladogram is:
  1. Gather and organize data
  2. Consider possible cladograms
  3. Select best cladogram


Step 1

A cladistic analysis begins with the following data:
  • a list of species to be organized
  • a list of characteristics to be compared
  • for each species, the value of each of the listed characteristics or character states


For example, if analyzing 20 species of birds, the data might be:
  • the list of 20 species
  • characteristics such as genome sequence, skeletal anatomy, biochemical processes, and feather coloration
  • for each of the 20 species, its particular genome sequence, skeletal anatomy, biochemical processes, and feather coloration
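
One way to hold these three lists together is a small character matrix: species as rows, characteristics as columns. The sketch below is an assumption about layout rather than a standard format; the class name, species names, and character states are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class CharacterMatrix:
    characters: list  # names of the characteristics being compared
    states: dict      # species name -> list of states, one per characteristic

    def __post_init__(self):
        # Every species must supply exactly one state per characteristic.
        for species, row in self.states.items():
            if len(row) != len(self.characters):
                raise ValueError(
                    f"{species}: expected {len(self.characters)} states, got {len(row)}")

    def state(self, species, character):
        """State of one characteristic in one species."""
        return self.states[species][self.characters.index(character)]

    def column(self, character):
        """States of one characteristic across all species."""
        i = self.characters.index(character)
        return {species: row[i] for species, row in self.states.items()}

# Hypothetical data for three of the bird species.
birds = CharacterMatrix(
    characters=["feather_coloration", "webbed_feet", "bill_shape"],
    states={
        "bird_a": ["brown", "yes", "flat"],
        "bird_b": ["blue", "no", "hooked"],
        "bird_c": ["brown", "no", "hooked"],
    },
)
print(birds.column("bill_shape"))
```

Checking row lengths when the matrix is built catches incomplete scoring early, before any tree is considered.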


Molecular versus morphological data
The characteristics used to create a cladogram can be roughly categorized as either morphological (synapsid skull, warm blooded, notochord, unicellular, etc.) or molecular (DNA, RNA, or other genetic information). Prior to the advent of DNA sequencing, all cladistic analysis used morphological data.

As DNA sequencing has become cheaper and easier, molecular systematics has become a more and more popular way to reconstruct phylogenies. Using a parsimony criterion is only one of several methods to infer a phylogeny from molecular data; maximum likelihood and Bayesian inference, which incorporate explicit models of sequence evolution, are non-Hennigian ways to evaluate sequence data. Another powerful method of reconstructing phylogenies is the use of genomic retrotransposon markers, which are thought to be less prone to the problem of reversion that plagues sequence data. They are also generally assumed to have a low incidence of homoplasies because it was once thought that their integration into the genome was entirely random; however, this seems at least sometimes not to be the case.

Ideally, morphological, molecular, and possibly other phylogenies should be combined into an analysis of total evidence, because all have different intrinsic sources of error. For example, character convergence (homoplasy) is much more common in morphological data than in molecular sequence data, but character reversions that are unrecognizable as such are more common in the latter (see long branch attraction). Morphological homoplasies can usually be recognized as such if character states are defined with enough attention to detail.

Plesiomorphies and synapomorphies
The researcher must decide which character states were present before the last common ancestor of the species group (plesiomorphies) and which first arose in the last common ancestor (synapomorphies), and does so by comparison to one or more outgroups. The choice of an outgroup is a crucial step in cladistic analysis because different outgroups can produce trees with profoundly different topologies. Note that only synapomorphies are of use in characterizing clades.
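
As a rough illustration of outgroup comparison, the Python sketch below treats any state that differs from the outgroup's state as derived, and flags derived states shared by two or more ingroup taxa as candidate synapomorphies. The small character matrix and the function names `polarize` and `candidate_synapomorphies` are illustrative assumptions, not part of any cladistics package, and real analyses use more careful polarity criteria than this single-outgroup rule.

```python
from typing import Dict, List

# Hypothetical binary character matrix (1 = present, 0 = absent).
MATRIX: Dict[str, Dict[str, int]] = {
    "lamprey": {"jaws": 0, "four_limbs": 0, "amnion": 0, "hair": 0},
    "shark":   {"jaws": 1, "four_limbs": 0, "amnion": 0, "hair": 0},
    "frog":    {"jaws": 1, "four_limbs": 1, "amnion": 0, "hair": 0},
    "lizard":  {"jaws": 1, "four_limbs": 1, "amnion": 1, "hair": 0},
    "mouse":   {"jaws": 1, "four_limbs": 1, "amnion": 1, "hair": 1},
}


def polarize(matrix: Dict[str, Dict[str, int]], outgroup: str) -> Dict[str, List[str]]:
    """For each character, list the ingroup taxa whose state differs from the outgroup's."""
    outgroup_states = matrix[outgroup]
    derived: Dict[str, List[str]] = {}
    for character, outgroup_state in outgroup_states.items():
        derived[character] = sorted(
            taxon
            for taxon, states in matrix.items()
            if taxon != outgroup and states[character] != outgroup_state
        )
    return derived


def candidate_synapomorphies(matrix: Dict[str, Dict[str, int]], outgroup: str) -> Dict[str, List[str]]:
    """Keep only derived states shared by at least two ingroup taxa."""
    return {
        character: taxa
        for character, taxa in polarize(matrix, outgroup).items()
        if len(taxa) >= 2
    }


if __name__ == "__main__":
    for character, taxa in candidate_synapomorphies(MATRIX, "lamprey").items():
        print(character, "->", ", ".join(taxa))
```

In this toy matrix, `hair` is derived in a single taxon only, so `polarize` reports it but it is not a candidate synapomorphy and cannot help group taxa.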

Avoid homoplasies
A homoplasy is a character that is shared by multiple species due to some cause other than common ancestry. Typically, homoplasies occur due to convergent evolution. Use of homoplasies when building a cladogram is sometimes unavoidable but is to be avoided when possible.

A well-known example of homoplasy due to convergent evolution would be the character, "presence of wings". Though the wings of birds, bats, and insects serve the same function, each evolved independently, as can be seen by their anatomy. If a bird, bat, and a winged insect were scored for the character, "presence of wings", a homoplasy would be introduced into the dataset, and this would confound the analysis, possibly resulting in a false evolutionary scenario.

Homoplasies can often be avoided outright in morphological datasets by defining characters more precisely and increasing their number. When analyzing "supertrees" (datasets incorporating as many taxa of a suspected clade as possible), it may become unavoidable to introduce character definitions that are imprecise, as otherwise the characters might not apply at all to a large number of taxa; to continue with the "wings" example, the presence of wings would hardly be a useful character if attempting a phylogeny of all Metazoa, as most of these do not have wings at all. Careful choice and definition of characters is thus another important element in cladistic analyses. With a faulty outgroup or character set, no method of evaluation is likely to produce a phylogeny representing the evolutionary reality.

Step 2

When there are just a few species being organized, it is possible to do this step manually, but most cases require a computer program. There are scores of computer programs available to support cladistics. See phylogenetic tree for more information about tree-generating computer programs.

Because the total number of possible cladograms grows faster than exponentially with the number of species, it is impractical for a computer program to evaluate every individual cladogram. A typical cladistic program begins by using heuristic techniques to identify a small number of candidate cladograms. Many cladistic programs then continue the search with the following repetitive steps:

  1. Evaluate the candidate cladograms by comparing them to the characteristic data
  2. Identify the best candidates that are most consistent with the characteristic data
  3. Create additional candidates by creating several variants of each of the best candidates from the prior step
  4. Use heuristics to create several new candidate cladograms unrelated to the prior candidates
  5. Repeat these steps until the cladograms stop getting better
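
The loop above can be sketched in Python roughly as follows. This is a minimal illustration under several assumptions that are not from the original text: trees are nested tuples of taxon names, candidates are scored by Fitch parsimony length (lower is better), variants are made by swapping two leaves, and all function names and parameters (`search`, `n_keep`, `patience`, and so on) are hypothetical. Real programs use richer rearrangements such as nearest-neighbor interchange or subtree pruning and regrafting.

```python
import random
from typing import Dict, List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf name or a pair of subtrees

# Hypothetical character data: one string of 0/1 states per taxon.
DATA: Dict[str, str] = {
    "lamprey": "0000", "shark": "1000", "frog": "1100",
    "lizard": "1110", "mouse": "1111",
}


def fitch_length(tree: Tree, data: Dict[str, str]) -> int:
    """Steps 1-2 scoring: total state changes the tree requires (Fitch's method)."""
    def visit(node: Tree, i: int):
        if isinstance(node, str):
            return {data[node][i]}, 0
        (ls, lc), (rs, rc) = visit(node[0], i), visit(node[1], i)
        shared = ls & rs
        return (shared, lc + rc) if shared else (ls | rs, lc + rc + 1)
    n_chars = len(next(iter(data.values())))
    return sum(visit(tree, i)[1] for i in range(n_chars))


def leaves(tree: Tree) -> List[str]:
    return [tree] if isinstance(tree, str) else leaves(tree[0]) + leaves(tree[1])


def relabel(tree: Tree, mapping: Dict[str, str]) -> Tree:
    if isinstance(tree, str):
        return mapping.get(tree, tree)
    return (relabel(tree[0], mapping), relabel(tree[1], mapping))


def random_tree(taxa: List[str], rng: random.Random) -> Tree:
    """Step 4: build a fresh candidate by joining random pairs of nodes."""
    nodes: List[Tree] = list(taxa)
    while len(nodes) > 1:
        a = nodes.pop(rng.randrange(len(nodes)))
        b = nodes.pop(rng.randrange(len(nodes)))
        nodes.append((a, b))
    return nodes[0]


def variants(tree: Tree, rng: random.Random, k: int) -> List[Tree]:
    """Step 3: derive k variants of a tree by swapping two of its leaves."""
    names = leaves(tree)
    out = []
    for _ in range(k):
        a, b = rng.sample(names, 2)
        out.append(relabel(tree, {a: b, b: a}))
    return out


def search(data, n_start=5, n_keep=3, n_variants=4, n_fresh=2, patience=5, seed=0):
    rng = random.Random(seed)
    taxa = sorted(data)
    candidates = [random_tree(taxa, rng) for _ in range(n_start)]
    best_tree, best_score, stale = None, float("inf"), 0
    while stale < patience:  # Step 5: stop once the score stops improving
        keep = sorted(candidates, key=lambda t: fitch_length(t, data))[:n_keep]
        score = fitch_length(keep[0], data)
        if score < best_score:
            best_tree, best_score, stale = keep[0], score, 0
        else:
            stale += 1
        candidates = list(keep)
        for tree in keep:
            candidates += variants(tree, rng, n_variants)
        candidates += [random_tree(taxa, rng) for _ in range(n_fresh)]
    return best_tree, best_score


if __name__ == "__main__":
    print(search(DATA))
```

Because the search is heuristic, the tree it returns is only the best one it happened to visit; changing `seed` or the population sizes may give a different result.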


Computer programs that generate cladograms use algorithms that are very computationally intensive, because finding the optimal cladogram is, under criteria such as maximum parsimony, an NP-hard problem.
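
To give a sense of the scale involved, the number of distinct rooted, fully bifurcating trees on n labelled taxa is the double factorial (2n - 3)!! = 1 × 3 × 5 × ... × (2n - 3). The short Python sketch below computes it; the function name `count_rooted_binary_trees` is an illustrative choice.

```python
def count_rooted_binary_trees(n_taxa: int) -> int:
    """Number of rooted, fully bifurcating trees on n labelled taxa: (2n - 3)!!."""
    if n_taxa < 1:
        raise ValueError("need at least one taxon")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (the factor 1 is implicit).
    for k in range(3, 2 * n_taxa - 2, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 20):
        print(n, count_rooted_binary_trees(n))
```

Four taxa give 15 possible trees, five give 105, and ten already give 34,459,425, which is why exhaustive evaluation quickly becomes impractical.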

Step 3

There are several algorithms available to identify the "best" cladogram. Most algorithms use a metric to measure how consistent a candidate cladogram is with the data. Most cladogram algorithms use the mathematical techniques of optimization and minimization.
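
One widely used metric of this kind is parsimony tree length: the minimum number of character-state changes a cladogram requires to explain the data. The Python sketch below computes it with Fitch's method for unordered characters on a rooted binary tree. The nested-tuple tree format, the toy character matrix, and the names `fitch_steps` and `tree_length` are assumptions made for illustration, not the interface of any particular program.

```python
from typing import Dict, List, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf name or a pair of subtrees

# Hypothetical matrix; columns are jaws, four limbs, amnion, hair.
MATRIX: Dict[str, List[int]] = {
    "lamprey": [0, 0, 0, 0],
    "shark":   [1, 0, 0, 0],
    "frog":    [1, 1, 0, 0],
    "lizard":  [1, 1, 1, 0],
    "mouse":   [1, 1, 1, 1],
}


def fitch_steps(tree: Tree, states: Dict[str, int]) -> int:
    """Minimum number of state changes one character needs on the tree."""
    def visit(node: Tree) -> Tuple[Set[int], int]:
        if isinstance(node, str):  # leaf: its observed state, no changes yet
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        shared = left_set & right_set
        if shared:  # the two subtrees can agree without an extra change
            return shared, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1
    return visit(tree)[1]


def tree_length(tree: Tree, matrix: Dict[str, List[int]]) -> int:
    """Sum of Fitch step counts over every character (lower = more parsimonious)."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_steps(tree, {taxon: row[i] for taxon, row in matrix.items()})
        for i in range(n_chars)
    )


LADDER: Tree = ("lamprey", ("shark", ("frog", ("lizard", "mouse"))))
ALTERNATIVE: Tree = (("lamprey", "lizard"), ("shark", ("frog", "mouse")))

if __name__ == "__main__":
    print(tree_length(LADDER, MATRIX), tree_length(ALTERNATIVE, MATRIX))
```

On this toy matrix the ladder-shaped tree needs 4 changes and the alternative needs 6, so a parsimony-based program would prefer the former.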

In general, cladogram generation algorithms must be implemented as computer programs, although some algorithms can be performed manually when the data sets are trivial (for example, just a few species and a couple of characteristics).

Some algorithms are useful only when the characteristic data are molecular (DNA, RNA); other algorithms are useful only when the characteristic data are morphological. Other algorithms can be used when the characteristic data include both molecular and morphological data.

Algorithms for cladograms include least squares, neighbor-joining, parsimony, maximum likelihood, and Bayesian inference.

Biologists sometimes use the term parsimony for a specific kind of cladogram generation algorithm and sometimes as an umbrella term for all cladogram algorithms.

Algorithms that perform optimization tasks (such as building cladograms) can be sensitive to the order in which the input data (the list of species and their characteristics) is presented. Inputting the data in various orders can cause the same algorithm to produce different "best" cladograms. In these situations, the user should input the data in various orders and compare the results.

Using different algorithms on a single data set can sometimes yield different "best" cladograms, because each algorithm may have a unique definition of what is "best".

Because of the astronomical number of possible cladograms, algorithms cannot guarantee that the solution is the overall best solution. A nonoptimal cladogram will be selected if the program settles on a local minimum rather than the desired global minimum. To help solve this problem, many cladogram algorithms use a simulated annealing approach to increase the likelihood that the selected cladogram is the optimal one.
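
The core idea of simulated annealing can be sketched generically in Python: worse candidates are sometimes accepted, with a probability that shrinks as a "temperature" parameter is lowered, so the search can climb out of a local minimum. Everything in this sketch is an assumption made for illustration: the function `anneal`, its parameters, the geometric cooling schedule, and the one-dimensional toy landscape that stands in for the space of cladograms. In a cladistic program the state would be a tree, `neighbor` a tree rearrangement, and `score` a metric such as tree length.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def anneal(
    start: S,
    neighbor: Callable[[S, random.Random], S],
    score: Callable[[S], float],
    t_start: float = 2.0,
    cooling: float = 0.95,
    steps: int = 500,
    seed: int = 0,
) -> Tuple[S, float]:
    """Minimize `score`, returning the best state visited and its score."""
    rng = random.Random(seed)
    current, current_score = start, score(start)
    best, best_score = current, current_score
    temperature = t_start
    for _ in range(steps):
        candidate = neighbor(current, rng)
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Metropolis rule: always accept improvements; accept a worse
        # candidate with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score < best_score:
                best, best_score = current, current_score
        temperature *= cooling  # geometric cooling schedule
    return best, best_score


# Toy landscape: index 1 is a local minimum, index 6 the global minimum.
LANDSCAPE = [5, 3, 4, 6, 2, 1, 0, 2]


def step(position: int, rng: random.Random) -> int:
    """Move one index left or right, staying inside the landscape."""
    return min(len(LANDSCAPE) - 1, max(0, position + rng.choice((-1, 1))))


if __name__ == "__main__":
    print(anneal(1, step, lambda i: LANDSCAPE[i]))
```

Starting from the local minimum at index 1, a purely greedy search would never move; the annealing rule gives it a chance to cross the worse positions at indices 2 and 3, although success is not guaranteed for any particular seed or schedule.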

Application to other disciplines

The processes used to generate cladograms are not limited to the field of biology. The generic nature of cladistics means that it can be used to organize groups of items in many different academic fields. The only requirement is that the items have characteristics that can be identified and measured.

Recent applications of cladistic methods outside biology address problems in:
  • anthropology;
  • languages;
  • the filiation of manuscripts in textual criticism, usually called a stemma; and
  • the lineage of Linux distros, a class of computer operating system.


See also

  • Bauplan
  • Bioinformatics
  • Biomathematics
  • Clade
  • Coalescent theory
  • Computational phylogenetics
  • Dendrogram
  • Evolution of Mollusca for a cladistic illustration
  • Last common ancestor
  • Important publications in phylogenetics
  • Language family
  • Maximum parsimony
  • Molecular phylogeny
  • PhyloCode
  • Phylogenetics
  • Phylogenetic tree
  • Phylogenetic network
  • Phylogenetics software
  • Phylogenomics
  • Phylogeography
  • Phylogenetic comparative methods
  • Scientific classification
  • Systematics


External links

  • A high-level cladogram showing the complete tree of life.
  • from Talk.Origins
  • For a cladistic approach to animal classification: