Human mitochondrial molecular clock
The human mitochondrial molecular clock is the rate at which mutations have been accumulating in the mitochondrial genome of hominids during the course of human evolution. The archeological record of human activity from early periods in human prehistory is relatively limited and its interpretation has been controversial. Because of the uncertainties in the archeological record, scientists have turned to molecular dating techniques in order to refine the timeline of human evolution. A major goal of scientists in the field is to develop an accurate hominid mitochondrial molecular clock which could then be used to confidently date events that occurred during the course of human evolution.

Estimates of the mutation rate of human mitochondrial DNA (mtDNA) vary greatly depending on the available data and the method used for estimation. The two main methods of estimation, phylogeny based methods and pedigree based methods, have produced mutation rates that differ by almost an order of magnitude. Current research has been focused on resolving the high variability obtained from different rate estimates.

Rate variability

A major assumption of the molecular clock theory is that mutations within a particular genetic system occur at a statistically uniform rate and that this uniform rate can be used for dating genetic events. In practice the assumption of a single uniform rate is an oversimplification. Though a single mutation rate is often applied, it is usually a composite or an average of several different mutation rates. Many factors influence observed mutation rates, including the type of samples, the region of the genome studied and the time period covered.

Actual vs. observed rates

The rate at which mutations occur during reproduction, the germline mutation rate, is thought to be higher than all observed mutation rates, because not all mutations are successfully passed down to subsequent generations. MtDNA is only passed down along the matrilineal line, and therefore mutations passed down to sons are lost. Random genetic drift may also cause the loss of mutations. For these reasons, the actual mutation rate will not be equivalent to the mutation rate observed from a population sample.

Population size

Population dynamics are believed to influence observed mutation rates. When a population is expanding, more germline mutations are preserved in the population. As a result, observed mutation rates tend to increase in an expanding population. When populations contract, as in a population bottleneck, more germline mutations are lost. Population bottlenecks thus tend to slow down observed mutation rates. Since the emergence of the species Homo sapiens about 200,000 years ago, human populations have expanded from a few thousand individuals living in Africa to over 6 billion all over the world. However, the expansion has not been uniform: the history of human populations may have consisted of both bottlenecks and expansions.

Structural variability

The mutation rate across the mitochondrial genome is not uniformly distributed. Certain regions of the genome are known to mutate more rapidly than others. The hypervariable regions are known to be highly polymorphic relative to other parts of the genome.

The rate at which mutations accumulate in coding and non-coding regions of the genome also differs, as mutations in the coding region are subject to purifying selection. For this reason, some studies avoid coding region or non-synonymous mutations when calibrating the molecular clock. One study considered only synonymous mutations and recalibrated the molecular clock of human mtDNA as 7990 years per synonymous mutation over the mitochondrial genome. Another approach considers both coding and non-coding region mutations to arrive at a single mutation rate, but applies a correction factor to account for selection in the coding region.
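As a rough illustration of how a calibration such as "7990 years per synonymous mutation" is applied, the sketch below multiplies a mutation count by that figure. It assumes a strict linear clock; the function name and the example count of 10 mutations are hypothetical and are not taken from any study.

```python
# Calibration quoted in the text: years per synonymous mutation
# over the whole mitochondrial genome.
YEARS_PER_SYNONYMOUS_MUTATION = 7990


def age_from_synonymous_mutations(mean_mutations: float) -> float:
    """Estimated age in years, assuming a strict linear clock.

    mean_mutations is the average number of synonymous mutations
    separating the sampled lineages from their common ancestor.
    """
    if mean_mutations < 0:
        raise ValueError("mutation count cannot be negative")
    return mean_mutations * YEARS_PER_SYNONYMOUS_MUTATION


# Hypothetical example: lineages averaging 10 synonymous mutations.
print(f"{age_from_synonymous_mutations(10):,.0f} years")  # 79,900 years
```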

Temporal variability

The mutation rate has been observed to vary with time. Mutation rates within the human species are faster than those observed along the human-ape lineage. The mutation rate is also thought to be faster in recent times, since the beginning of the Holocene 11,000 years ago.

Parallel mutations and saturation

Parallel mutation (sometimes referred to as homoplasy) or convergent evolution occurs when separate lineages have the same mutation independently occur at the same site in the genome. Saturation occurs when a single site experiences multiple mutations. Parallel mutations and saturation result in the underestimation of the mutation rate because they are likely to be overlooked.

Heteroplasmy

Individuals affected by heteroplasmy have a mixture of mtDNA types, some with new mutations and some without. The new mutations may or may not be passed down to subsequent generations. Thus the presence of heteroplasmic individuals in a sample may complicate the calculation of mutation rates.

Pedigree based

Pedigree methods estimate the mutation rate by comparing the mtDNA sequences of a sample of parent/offspring pairs or analyzing mtDNA sequences of individuals from a deep-rooted genealogy. The number of new mutations in the sample is counted and divided by the total number of parent-to-child DNA transmission events to arrive at a mutation rate.
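The counting procedure described above can be sketched in a few lines. This is only an illustration: the function name and the example counts are hypothetical and do not come from any pedigree study.

```python
def pedigree_mutation_rate(new_mutations: int, transmissions: int) -> float:
    """Mutations per parent-to-child mtDNA transmission event."""
    if transmissions <= 0:
        raise ValueError("transmissions must be positive")
    return new_mutations / transmissions


# Hypothetical counts, for illustration only.
rate = pedigree_mutation_rate(new_mutations=3, transmissions=150)
print(f"{rate:.3f} mutations per transmission")
```

Converting this per-transmission figure into a per-year rate would additionally require an assumed generation time.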

Phylogeny based

Phylogeny based estimates are made by first reconstructing the haplotype of the most recent common ancestor (MRCA) of a sample of two or more genetic lineages. A requirement is that the time to the most recent common ancestor (TMRCA) of the sample of lineages must already be known from other independent sources, usually the archeological record. The average number of mutations that have accumulated since the MRCA is then computed and divided by the TMRCA to arrive at the mutation rate. The human mutation rate is usually estimated by comparing the sequences of modern humans and chimpanzees and then reconstructing the ancestral haplotype of the chimpanzee-human common ancestor. According to the paleontological record, the last common ancestor of humans and chimpanzees may have lived around 6 million years ago.
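A minimal sketch of the final step of the phylogeny based calculation follows, assuming the mutations on each lineage since the MRCA have already been counted. The function name and the example values are hypothetical.

```python
from typing import Sequence


def phylogenetic_mutation_rate(
    mutations_since_mrca: Sequence[int], tmrca_years: float
) -> float:
    """Average mutations accumulated since the MRCA, per year.

    mutations_since_mrca holds one count per sampled lineage;
    tmrca_years must come from an independent source.
    """
    if not mutations_since_mrca or tmrca_years <= 0:
        raise ValueError("need at least one lineage and a positive TMRCA")
    average = sum(mutations_since_mrca) / len(mutations_since_mrca)
    return average / tmrca_years


# Hypothetical example: three lineages and a 50,000-year TMRCA.
rate = phylogenetic_mutation_rate([10, 12, 8], tmrca_years=50_000)
print(f"{rate:.1e} mutations per year over the compared sequence")
```

Dividing the result by the number of compared sites would give a per-site rate.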

Pedigree vs. Phylogeny comparison

Rates obtained by pedigree methods are about 10 times faster than those obtained by phylogenetic methods. Several factors acting together may be responsible for this difference. As pedigree methods record mutations in living subjects, the mutation rates from pedigree studies are closer to the germline mutation rate. Pedigree studies use genealogies that are only a few generations deep, whereas phylogeny based methods use timescales that are thousands or millions of years deep. According to Henn et al. (2009), phylogeny based methods take into account events that occur over long time scales and are thus less affected by stochastic fluctuations. Howell et al. (2003) suggest that selection, saturation, parallel mutations and genetic drift are responsible for the differences observed between pedigree based methods and phylogeny based methods.

Estimating based on AMH archaeology

Methods/parameters for archaeologically estimated dates of mitochondrial Eve

Sequence type | TAnchor (location) | Referencing method (correction method)
Restriction fragments | 40, 30, and 12 Ka (Australia, New Guinea, New World) | archaeologically defined migrations matched with estimated sequence divergence rates
Genomic | 40 to 55 Ka (Papua New Guinea); 14.5 to 21.5 Ka (Haps H1 and H3) | PNG following Haplogroup P

Anatomically modern humans (AMH) spread out of Africa and over a large area of Eurasia, leaving artifacts along the northern coast of Southwest, South, Southeast and East Asia. Cann et al. (1987) did not rely on a predicted TCHLCA to estimate SNP rates. Instead, they used evidence of colonization in Southeast Asia and Oceania to estimate mutation rates. In addition, they used restriction fragment length polymorphism (RFLP) technology to examine differences between DNA samples. Using these techniques, this group arrived at a TMRCA of 140,000 to 290,000 years. It should be noted, however, that the TMRCA of approximately 210 ky estimated by Cann et al. (1987) and the most recent estimate, from Soares et al. (2009) (using a 7-million-year chimpanzee-human mtDNA MRCA), differ by only 9%, which is relatively close considering the wide confidence range of both estimates and the calls for a more ancient TCHLCA.

Another study has reevaluated the predicted migrations globally and compared those to the actual evidence. This group used the coding regions of sequences. They postulate that the molecular clock based on chimp-human comparisons is not reliable, particularly in predicting recent migrations, such as founding migrations into Europe, Australia, and the Americas. With this technique, this group arrived at a TMRCA of 82,000 to 134,000 years.

Estimating based on CHLCA

Because chimps and humans share a matrilineal ancestor, establishing the geological age of that last ancestor allows the estimation of the mutation rate. The chimp-human last common ancestor (CHLCA) is frequently applied as an anchor for mt-TMRCA studies, with ranges between 4 and 13 million years cited in the literature. This is one source of variation in the time estimates. The other weakness is the non-clocklike accumulation of SNPs, which would tend to make more recent branches look older than they actually are.
SNP rates as described by Soares et al. (2009)

Region | Subregion (or site within codon) | SNP rate (per site per year)
Control region | HVR I | 1.6 × 10^-7
Control region | HVR II | 2.3 × 10^-7
Control region | remaining | 1.5 × 10^-8
Protein-coding | 1st and 2nd | 8.8 × 10^-9
Protein-coding | 3rd | 1.9 × 10^-8
DNA encoding rRNA (rDNA) | | 8.2 × 10^-9
DNA encoding tRNA (tDNA) | | 6.9 × 10^-9
other | | 2.4 × 10^-8

TCHLCA assumed 6.5 Ma, relative rate to 1st & 2nd codons

These two sources of error may balance each other or amplify each other depending on the direction of the TCHLCA error. There are two major reasons why this method is widely employed. First, the pedigree based rates are inappropriate for estimates over very long periods of time. Second, while the archaeology anchored rates represent the intermediate range, archaeological evidence for human colonization often dates to well after colonization. For example, colonization of Eurasia from west to east is believed to have occurred along the Indian Ocean. However, the oldest archaeological sites that also demonstrate anatomically modern humans (AMH) are in China and Australia, at greater than 42,000 years in age. The oldest Indian site with AMH remains dates to 34,000 years, while another site with AMH compatible archaeology is in excess of 76,000 years in age. Therefore application of the anchor is a subjective interpretation of when humans were first present.

A simple measure of the sequence divergence between humans and chimps can be obtained by counting the SNPs between them. The mitogenome is about 16,553 base pairs in length, and each base pair that can be aligned with known references is called a site. Consistent with the description that follows, the rate can be written as:

$$\text{rate} = \frac{\text{SNPs}}{2 \times \text{sites} \times T_{\text{CHLCA}}}$$

The '2' in the denominator is derived from the 2 lineages, human and chimpanzee, that split from the CHLCA. Ideally it represents the accumulation of mutations on both lineages but in different positions (SNPs). As long as the number of SNPs observed approximates the number of mutations, this formula works well. However, at rapidly evolving sites mutations are obscured by saturation effects. Sorting positions within the mitogenome by rate and compensating for saturation are alternative approaches.
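The calculation can be sketched as follows. The form of the formula, SNPs divided by twice the product of sites and TCHLCA, is an assumption reconstructed from the description of the '2' in the denominator. The function name and the SNP count in the example are hypothetical; the site count is the approximate mitogenome length quoted above, and the 6.5 Ma TCHLCA is chosen only for illustration.

```python
def divergence_rate(snps: float, sites: int, tchlca_years: float) -> float:
    """Substitutions per site per year on a single lineage.

    Assumed form: rate = SNPs / (2 * sites * TCHLCA). The factor 2
    accounts for the two lineages (human and chimpanzee) that have
    each accumulated mutations since the CHLCA.
    """
    if sites <= 0 or tchlca_years <= 0:
        raise ValueError("sites and tchlca_years must be positive")
    return snps / (2 * sites * tchlca_years)


# Hypothetical SNP count; 16553 sites as quoted in the text.
rate = divergence_rate(snps=1500, sites=16553, tchlca_years=6.5e6)
print(f"{rate:.2e} substitutions per site per year")
```

No saturation correction is applied here, so at rapidly evolving sites this would understate the true rate.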

Because the TCHLCA is subject to change with more paleontological information, the equation described above allows the comparison of TMRCA from different studies.
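One way to make such a comparison is sketched below. It rests on the assumption that the rate is strictly clocklike and inversely proportional to the assumed TCHLCA, so that a TMRCA derived from it scales linearly with TCHLCA. The function name and the numbers are hypothetical.

```python
def rescale_tmrca(
    tmrca_years: float, tchlca_used_years: float, tchlca_new_years: float
) -> float:
    """Re-express a CHLCA-anchored TMRCA under a different TCHLCA.

    Assumes the only difference between the two estimates is the
    chimpanzee-human divergence time used as the anchor.
    """
    if tchlca_used_years <= 0 or tchlca_new_years <= 0:
        raise ValueError("TCHLCA values must be positive")
    return tmrca_years * (tchlca_new_years / tchlca_used_years)


# Hypothetical example: a 150,000-year TMRCA obtained with a 5 Ma
# anchor, re-expressed under a 6.5 Ma anchor.
print(f"{rescale_tmrca(150_000, 5.0e6, 6.5e6):,.0f} years")  # 195,000 years
```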
Methods/parameters for estimating date of mitochondrial Eve

Sequence type | TCHLCA (sorting time) | Referencing method (correction method)
HVR | 4 to 6 Ma | CH transversions (15:1 transition:transversion)
genomic (not HVR) | 5 Ma | CH genomic comparison
genomic (not HVR) | 5 to 7.5 Ma | CH (relaxed rate, rate-class defined)
genomic (not HVR) | 6.0 Ma (+ 0.5 Ma) | CH (rate class defined)
genomic (not HVR) | 6.5 Ma (+ 0.5 Ma) | CH (rate class defined)
genomic | 6.5 Ma (+ 0.5 Ma) | CHLCA anchored (examined selection by Ka/(Ks + k))

Chimpanzee to Human = CH, LCA = last common ancestor

Early, HVR, sequence-based methods

To overcome the effects of saturation, HVR analysis relied on the transversional distance between humans and chimpanzees. A transition to transversion ratio was applied to this distance to estimate sequence divergence in the HVR between chimpanzees and humans, and the result was divided by an assumed TCHLCA of 4 to 6 million years. Based on 26.4 transversions between chimpanzee and human and a 15:1 ratio, the estimated 396 transitions over 610 base pairs demonstrated sequence divergence of 69.2% (rate * TCHLCA of 0.369), producing divergence rates of roughly 11.5% to 17.3% per million years.
Vigilant et al. (1991) also estimated the sequence divergence rate for the sites in the rapidly evolving HVR I and HVR II regions. As noted in the table above, the rate of evolution is so high that site saturation occurs in direct chimpanzee and human comparisons. Consequently this study used transversions, which evolve at a slower rate than the more common transition polymorphisms. Comparing chimp and human mitogenomes, they noted 26.4 transversions within the HVR regions; however, they made no correction for saturation. As more HVR sequence was obtained following this study, it was noted that the dinucleotide site CRS:16181-16182 experienced numerous transversions in parsimony analysis, many of which were considered to be sequencing errors. However, the sequencing of the Feldhofer I Neanderthal revealed that there was also a transversion between humans and Neanderthals at this site. In addition, later work noted three sites in which recurrent transversions had occurred in human lineages, two of which are in HVR I: 16265 (12 occurrences) and 16318 (8 occurrences). (Soares et al. excluded 16182 and 16183 from their analysis.) Therefore, 26.4 transversions was an underestimate of the likely number of transversion events. The 1991 study also used a transition-to-transversion ratio of 15:1 taken from the study of Old World monkeys. However, examination of chimp and gorilla HVR reveals a ratio that is lower, and the examination of humans places the ratio at 34:1. Therefore this study underestimated the level of sequence divergence between chimpanzee and human. The estimated sequence divergence of 0.738 per site (including transversions) is significantly lower than the ~2.5 per site suggested by Soares et al. (2009). These two errors would result in an overestimate of the human mitochondrial TMRCA. However, the study failed to detect the basal L0 lineage in the analysis and also failed to detect recurrent transitions in many lineages, both of which lead to underestimation of the TMRCA. Also, Vigilant et al. (1991) used a more recent CHLCA anchor of 4 to 6 million years.
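The arithmetic behind the HVR figures quoted in this section (26.4 transversions, a 15:1 transition-to-transversion ratio, 610 base pairs, and a TCHLCA of 4 to 6 million years) can be retraced as below. This is only a sketch of the calculation as described in the text, with no saturation correction; the function name is hypothetical.

```python
def hvr_divergence(transversions: float, ts_tv_ratio: float, sites: int) -> float:
    """Total substitutions per site inferred from a transversion count.

    Transitions are estimated as transversions * ts_tv_ratio and then
    added back to the observed transversions.
    """
    transitions = transversions * ts_tv_ratio
    return (transitions + transversions) / sites


divergence = hvr_divergence(transversions=26.4, ts_tv_ratio=15, sites=610)
print(f"sequence divergence: {divergence:.1%}")  # 69.2%

# Divide by the assumed TCHLCA range (in millions of years).
for tchlca_myr in (6, 4):
    per_myr = divergence / tchlca_myr
    print(f"TCHLCA {tchlca_myr} Ma: {per_myr:.1%} per million years")
# TCHLCA 6 Ma: 11.5% per million years
# TCHLCA 4 Ma: 17.3% per million years
```

Substituting the 34:1 ratio reported for humans in place of 15:1 shows how strongly the result depends on the assumed ratio.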

Coding region sequence based methods

Partial coding region sequence originally supplemented HVR studies because complete coding region sequence was uncommon. There were suspicions, based on some earlier RFLP and coding region studies, that the HVR studies had missed major branches. A study published in 2000 was the first to compare genomic sequences for coalescence analysis. Coding region sequence discriminated the M and N haplogroups and the L0 and L1 macrohaplogroups. Because the genomic DNA sequencing resolved the two deepest branches, it improved some aspects of estimating TMRCA over HVR sequence alone. Excluding the D-loop and using a 5-million-year TCHLCA, the study estimated the mutation rate to be 1.70 × 10^-8 per site per year (rate * TCHLCA = 0.085, 15,435 sites).
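The figures quoted above can be checked against each other, and the rate re-expressed under a different anchor, with a short calculation. The 6.5 Ma value is the TCHLCA given with the Soares et al. (2009) rate table and is used here only as an illustrative alternative; the rescaling assumes the rate is inversely proportional to the TCHLCA. The function names are hypothetical.

```python
def rate_times_tchlca(rate_per_site_per_year: float, tchlca_years: float) -> float:
    """Per-site divergence accumulated on one lineage since the CHLCA."""
    return rate_per_site_per_year * tchlca_years


def rescale_rate(
    rate_per_site_per_year: float, tchlca_used_years: float, tchlca_new_years: float
) -> float:
    """Rate implied by the same sequence data under a different TCHLCA."""
    return rate_per_site_per_year * tchlca_used_years / tchlca_new_years


# Values quoted in the text: 1.70e-8 per site per year with a 5 Ma TCHLCA.
product = rate_times_tchlca(1.70e-8, 5.0e6)
print(f"rate * TCHLCA = {product:.3f}")  # 0.085

# Same data anchored at 6.5 Ma (illustrative assumption).
print(f"rescaled rate = {rescale_rate(1.70e-8, 5.0e6, 6.5e6):.2e}")  # 1.31e-08
```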

However, coding region DNA has come under question because coding sequences are either under purifying selection to maintain structure and function, or under regional selection to evolve new capacities. The problem with mutations in the coding region has been described as follows: mutations occurring in the coding region that are not lethal to the mitochondria can persist but are negatively selected in the host; over a few generations these will persist, but over thousands of generations they are slowly pruned from the population, leaving SNPs. However, over thousands of generations regionally selective mutations may not be distinguishable from these transient coding region mutations. The problem with rare mutations in human mitogenomes is significant enough to have prompted a half-dozen recent studies on the matter.

The year 2000 study estimated the rate of evolution of the non-D-loop region at 1.7 × 10^-8 per site per year, based on 53 non-identical genomic sequences that over-represented Africa in a global sample. Despite this over-representation, the resolution of the L0 subbranches was lacking, and one other deep L1 branch has since been found. Despite these limitations, the sampling was adequate for the landmark study. Today, L0 is restricted to African populations, whereas L1 is the ancestral haplogroup of all non-Africans, as well as most Africans. Mitochondrial Eve's sequence can be approximated by comparing a sequence from L0 with a sequence from L1 and reconciling the mutations in the two. The mtDNA sequences of contemporary human populations will generally differ from Mitochondrial Eve's sequence by about 50 mutations. Mutation rates were not classified according to site (other than excluding the HVR regions). The TCHLCA of 5 Ma used in the year 2000 study was also lower than values used in the most recent studies.

Inter-comparing rates and studies

Molecular clocking of mitochondrial DNA has been criticized because of its inconsistent molecular clock. A retrospective analysis of any pioneering process will reveal inadequacies. With mitochondrial DNA, the inadequacies are the argument from ignorance of rate variation and overconfidence concerning the TCHLCA of 5 Ma. Lack of historical perspective might explain the second issue; the problem of rate variation is something that could only be resolved by the massive study of mitochondria that followed. The number of HVR sequences that accumulated from 1987 to 2000 increased by orders of magnitude. Soares et al. (2009) used 2196 mitogenomic sequences and uncovered 10,683 substitution events within these sequences. Eleven of 16560 sites in the mitogenome produced greater than 11% of all the substitutions, with statistically significant rate variation within the 11 sites (CRS sites 16519, 152, 16311, 145, 195, 16189, 16129, 16083, 16362, 160, 709, 16129, 16083, 16362, 150, and 709). They argue that there is a neutral-site mutation rate which is an order of magnitude slower than the rate observed for the fastest site, CRS 16519. Consequently, purifying selection aside, the rate of mutation itself varies between sites, with a few sites much more likely to undergo new mutations relative to others. Soares et al. (2009) noted two spans of DNA, CRS 2651-2700 and 3028-3082, that had no SNPs within the 2196 mitogenomic sequences.
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 