Pseudogenes are defunct relatives of known genes that have lost their protein-coding ability or are otherwise no longer expressed in the cell. Although some do not have introns or promoters (these pseudogenes are copied from mRNA and incorporated into the chromosome and are called processed pseudogenes), most have some gene-like features (such as promoters, CpG islands, and splice sites). They are nonetheless considered nonfunctional, due to their lack of protein-coding ability resulting from various genetic disablements (stop codons, frameshifts, or a lack of transcription) or their inability to encode RNA (such as with rRNA pseudogenes). Thus the term, coined in 1977 by Jacq et al., is composed of the prefix pseudo, which means false, and the root gene, which is the central unit of molecular genetics.
Because pseudogenes are generally thought of as the last stop for genomic material that is to be removed from the genome, they are often labeled as junk DNA. Nonetheless, pseudogenes contain fascinating biological and evolutionary histories within their sequences. This is due to a pseudogene's shared ancestry with a functional gene: in the same way that Darwin thought of two species as possibly having a shared common ancestry followed by millions of years of evolutionary divergence (see speciation), a pseudogene and its associated functional gene also share a common ancestor and have diverged as separate genetic entities over millions of years.
Properties of pseudogenes
Pseudogenes are characterized by a combination of homology to a known gene and nonfunctionality. That is, although every pseudogene has a DNA sequence that is similar to some functional gene, it is nonetheless unable to produce a functional final product (nonfunctionality). Pseudogenes are quite difficult to identify and characterize in genomes, because the two requirements of homology and nonfunctionality are inferred through sequence calculations and alignments rather than biologically proven.
- Homology is implied by sequence identity between the DNA sequences of the pseudogene and parent gene. After aligning the two sequences, the percentage of identical base pairs is computed. A high sequence identity (usually between 40% and close to 100%) means that it is highly likely that these two sequences diverged from a common ancestral sequence (are homologous), and highly unlikely that these two sequences were independently created (see typewriting monkeys).
- Nonfunctionality can manifest itself in many ways. Normally, a gene must go through several steps in going from a genetic DNA sequence to a fully functional protein: transcription, pre-mRNA processing, translation, and protein folding are all required parts of this process. If any of these steps fails, then the sequence may be considered nonfunctional. In high-throughput pseudogene identification, the most commonly identified disablements are premature stop codons and frameshifts, which almost universally prevent the translation of a functional protein product.
- Pseudogenes for RNA genes are often easier to discover. Many RNA genes occur as multiple copy genes, and pseudogenes are identified through sequence identity and location within the region.
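The two sequence-based checks described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular pseudogene pipeline: it assumes the parent gene and the candidate have already been aligned (gaps written as `-`), that the parent sequence is a coding sequence starting in frame and ending with its own stop codon, and that the standard stop codons apply. The function names `percent_identity` and `find_disablements` are hypothetical. Premature stops are looked up in the parent's reading frame, which is a simplification once a frameshift has occurred.

```python
import re

# Standard DNA stop codons (assumption: standard genetic code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of ungapped alignment columns that carry identical bases."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # columns with a gap are not counted
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def find_disablements(aligned_parent: str, aligned_candidate: str) -> dict:
    """Report frameshifting indels and premature stop codons in the candidate."""
    parent = aligned_parent.upper()
    candidate = aligned_candidate.upper()
    if len(parent) != len(candidate):
        raise ValueError("aligned sequences must have equal length")

    # An indel whose length is not a multiple of three shifts the reading frame.
    frameshifts = sorted(
        (m.start(), len(m.group()))
        for seq in (parent, candidate)
        for m in re.finditer(r"-+", seq)
        if len(m.group()) % 3 != 0
    )

    # Alignment columns that hold a parent base define the parent's codons.
    parent_cols = [i for i, base in enumerate(parent) if base != "-"]
    n_codons = len(parent_cols) // 3

    premature_stops = []
    for k in range(n_codons - 1):  # the last codon is the parent's own stop
        cols = parent_cols[3 * k : 3 * k + 3]
        codon = "".join(candidate[c] for c in cols)
        if codon in STOP_CODONS:
            premature_stops.append(k)  # zero-based codon index

    return {"frameshifts": frameshifts, "premature_stops": premature_stops}
```

A candidate with high identity to its parent and at least one reported disablement would be a plausible pseudogene under these assumptions; neither check proves nonfunctionality biologically.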
Types and origin of pseudogenes
There are three main types of pseudogenes, all with distinct mechanisms of origin and characteristic features. The classifications of pseudogenes are as follows:
- Processed (or retrotransposed) pseudogenes. In higher eukaryotes, particularly mammals, retrotransposition is a fairly common event that has had a huge impact on the composition of the genome. For example, somewhere between 30% and 44% of the human genome consists of repetitive elements such as SINEs and LINEs (see retrotransposons). In the process of retrotransposition, a portion of the mRNA transcript of a gene is spontaneously reverse transcribed back into DNA and inserted into chromosomal DNA. Although retrotransposons usually create copies of themselves, it has been shown in an in vitro system that they can create retrotransposed copies of random genes, too. Once these pseudogenes are inserted back into the genome, they usually contain a poly-A tail, and usually have had their introns spliced out; these are both hallmark features of cDNAs. However, because they are derived from a mature mRNA product, processed pseudogenes also lack the upstream promoters of normal genes; thus, they are considered "dead on arrival", becoming non-functional pseudogenes immediately upon the retrotransposition event. A further characteristic of processed pseudogenes is common truncation of the 5' end relative to the parent sequence, which is a result of the relatively non-processive retrotransposition mechanism that creates processed pseudogenes.
- Non-processed (or duplicated) pseudogenes. Gene duplication is another common and important process in the evolution of genomes. A copy of a functional gene may arise as a result of a gene duplication event and subsequently acquire mutations that cause it to become nonfunctional. Duplicated pseudogenes usually have all the same characteristics as genes, including an intact exon-intron structure and promoter sequences. The loss of a duplicated gene's functionality usually has little effect on an organism's fitness, since an intact functional copy still exists. According to some evolutionary models, shared duplicated pseudogenes indicate the evolutionary relatedness of humans and the other primates.
- Disabled genes, or unitary pseudogenes. Various mutations can stop a gene from being successfully transcribed or translated, and a gene may become nonfunctional or deactivated if such a mutation becomes fixed in the population. This is the same mechanism by which non-processed genes become deactivated, but the difference in this case is that the gene was not duplicated before becoming disabled. Normally, such gene deactivation would be unlikely to become fixed in a population, but various population effects, such as genetic drift, a population bottleneck, or in some cases, natural selection, can lead to fixation. The classic example of a unitary pseudogene is the gene that presumably coded the enzyme L-gulono-γ-lactone oxidase (GULO) in primates. In all mammals studied other than primates and guinea pigs, GULO aids in the biosynthesis of ascorbic acid (vitamin C), but it exists as a disabled gene (GULOP) in humans and other primates. Another, more recent example of a disabled gene is caspase 12, whose deactivation (through a nonsense mutation) has been linked to positive selection in humans.
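The distinguishing features of the three classes can be summarized as a small decision rule. The sketch below is only a heuristic reading of the descriptions above, not an established classification algorithm: the `PseudogeneFeatures` fields and the `classify` function are hypothetical names, and real cases are messier (for example, a processed copy of a single-exon gene cannot be recognized by missing introns, and poly-A tails decay over time).

```python
from dataclasses import dataclass


@dataclass
class PseudogeneFeatures:
    """Observed features of a candidate pseudogene (hypothetical field names)."""
    has_introns: bool                # exon-intron structure of the parent is retained
    has_poly_a_tail: bool            # a poly-A tract follows the 3' end
    functional_paralog_exists: bool  # an intact copy is present in the same genome


def classify(features: PseudogeneFeatures) -> str:
    """Assign one of the three classes described above using simple rules."""
    # Copies made from mature mRNA lack introns and usually carry a poly-A tail.
    if not features.has_introns and features.has_poly_a_tail:
        return "processed"
    # A disabled copy that sits alongside an intact gene arose by duplication.
    if features.functional_paralog_exists:
        return "duplicated"
    # No intact copy remains: the original gene itself was disabled.
    return "unitary"
```

Under these rules, GULOP in humans would fall into the last branch, since no functional GULO copy remains in the genome.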
Pseudogenes can complicate molecular genetic studies. For example, a researcher who wants to amplify a gene by PCR may simultaneously amplify a pseudogene that shares similar sequences. This is known as PCR bias or amplification bias. Similarly, pseudogenes are sometimes annotated as genes in genome sequences.
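The co-amplification problem can be illustrated with a toy check of whether a primer pair would bind both a gene and its pseudogene. This is a deliberately simplified sketch: it assumes exact primer matches on a single strand and ignores mismatch tolerance, melting temperature, and product length limits. The function names `reverse_complement` and `amplicon` are hypothetical.

```python
# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplicon(template: str, forward: str, reverse: str):
    """Return the product a primer pair would yield on the template, or None.

    The forward primer must match the template strand exactly, and the
    reverse primer must match the opposite strand downstream of it.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start : end + len(reverse_site)]


def amplifies_both(gene: str, pseudogene: str, forward: str, reverse: str) -> bool:
    """True when the same primer pair yields a product from both templates."""
    return (
        amplicon(gene, forward, reverse) is not None
        and amplicon(pseudogene, forward, reverse) is not None
    )
```

When `amplifies_both` returns true, one practical response is to move a primer onto a position where the two sequences differ, or across an exon-exon junction or into an intron, depending on whether the pseudogene is processed.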
Processed pseudogenes often pose a problem for gene prediction programs, often being misidentified as real genes or exons. It has been proposed that identification of processed pseudogenes can help improve the accuracy of gene prediction methods.
It has also been shown that the parent sequences that give rise to processed pseudogenes lose their coding potential faster than those giving rise to non-processed pseudogenes.
Functional pseudogenes?
By definition, pseudogenes lack a function. However, the classification of pseudogenes generally relies on computational analysis of genomic sequences using complex algorithms. This has led to the incorrect identification of pseudogenes. For example, the functional, chimeric gene jingwei in Drosophila was once thought to be a processed pseudogene.
It has been established that quite a few pseudogenes can go through the process of transcription, either because their own promoter is still intact or, in some cases, by using the promoter of a nearby gene; this expression of pseudogenes also appears to be tissue-specific. In 2003, Hirotsune et al. identified a retrotransposed pseudogene whose transcript purportedly plays a trans-regulatory role in the expression of its homologous gene, Makorin1, and suggested this as a general model under which pseudogenes may play an important biological role. Other researchers have since hypothesized similar roles for other pseudogenes. Hirotsune's report prompted two molecular biologists to carefully review scientific literature on the subject of pseudogenes. To the surprise of many, they found a number of examples in which pseudogenes play a role in gene regulation and expression, forcing Hirotsune's group to rescind their claim that they were the first to identify pseudogene function. Furthermore, the original findings of Hirotsune et al. concerning Makorin1 have recently been strongly contested; thus, the possibility that some pseudogenes could have important biological functions was disputed.
Additionally, University of Chicago and University of Cincinnati scientists reported in 2002 that a processed pseudogene called phosphoglycerate mutase 3 actually produces a functional protein.
A 2008 publication in Nature reports that some endogenous siRNAs are derived from pseudogenes, and thus that some pseudogenes play a role in regulating protein-coding transcripts.
External links