Pseudogene

Pseudogenes are defunct relatives of known genes that have lost their protein-coding ability or are otherwise no longer expressed in the cell. Some lack introns or promoters (these pseudogenes are copied from mRNA and incorporated into the chromosome, and are called processed pseudogenes), but most have some gene-like features (such as promoters, CpG islands, and splice sites). They are nonetheless considered nonfunctional, due to their lack of protein-coding ability resulting from various genetic disablements (stop codons, frameshifts, or a lack of transcription) or their inability to encode RNA (such as with rRNA pseudogenes). The term, coined in 1977 by Jacq et al., is composed of the prefix pseudo, which means false, and the root gene, which is the central unit of molecular genetics.

Because pseudogenes are generally thought of as the last stop for genomic material that is to be removed from the genome, they are often labeled as junk DNA. Nonetheless, pseudogenes contain fascinating biological and evolutionary histories within their sequences. This is due to a pseudogene's shared ancestry with a functional gene: in the same way that Darwin thought of two species as possibly having a shared common ancestry followed by millions of years of evolutionary divergence (see speciation), a pseudogene and its associated functional gene also share a common ancestor and have diverged as separate genetic entities over millions of years.

Properties of pseudogenes


Pseudogenes are characterized by a combination of homology to a known gene and nonfunctionality. That is, although every pseudogene has a DNA sequence that is similar to some functional gene, it is nonetheless unable to produce a functional final product. Pseudogenes are quite difficult to identify and characterize in genomes, because the two requirements of homology and nonfunctionality are inferred from sequence calculations and alignments rather than demonstrated biologically.
  1. Homology is implied by sequence identity between the DNA sequences of the pseudogene and parent gene. After aligning the two sequences, the percentage of identical base pairs is computed. A high sequence identity (usually between 40% and close to 100%) means that it is highly likely that these two sequences diverged from a common ancestral sequence (are homologous), and highly unlikely that these two sequences were independently created (see typewriting monkeys).
  2. Nonfunctionality can manifest itself in many ways. Normally, a gene must go through several steps in going from a genetic DNA sequence to a fully functional protein: transcription, pre-mRNA processing, translation, and protein folding are all required parts of this process. If any of these steps fails, then the sequence may be considered nonfunctional. In high-throughput pseudogene identification, the most commonly identified disablements are stop codons and frameshifts, which almost universally prevent the translation of a functional protein product.
  3. Pseudogenes for RNA genes are often easier to discover. Many RNA genes occur as multiple copy genes, and pseudogenes are identified through sequence identity and location within the region.
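
The two criteria in the list above can be illustrated with a small sketch. It assumes the candidate has already been aligned to the coding sequence of its parent gene (gaps written as `-`), counts identity only over ungapped columns, and reads the candidate in frame 0. The function names `percent_identity` and `find_disablements`, and the toy sequences, are illustrative assumptions rather than part of any published pipeline.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical bases over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are not counted in this simple definition
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def find_disablements(aligned_parent: str, aligned_candidate: str) -> dict:
    """Return frameshifting gaps and premature in-frame stop codons.

    Frameshifts are reported as (alignment column, gap length) for every gap
    run whose length is not a multiple of three. Premature stops are reported
    as 0-based codon indices in the ungapped candidate, read in frame 0.
    """
    frameshifts = []
    for seq in (aligned_parent, aligned_candidate):
        for run in re.finditer(r"-+", seq):
            length = run.end() - run.start()
            if length % 3 != 0:
                frameshifts.append((run.start(), length))

    ungapped = aligned_candidate.replace("-", "").upper()
    n_codons = len(ungapped) // 3
    premature_stops = [
        i for i in range(n_codons - 1)  # the final codon may be the normal stop
        if ungapped[3 * i:3 * i + 3] in STOP_CODONS
    ]
    return {"frameshifts": sorted(frameshifts), "premature_stops": premature_stops}


# Toy sequences, invented for illustration only.
parent = "ATGGCCAAGTTTGGATAA"
with_stop = "ATGGCCTAGTTTGGATAA"   # AAG -> TAG introduces an in-frame stop
with_shift = "ATGGCC-AGTTTGGATAA"  # one-base deletion shifts the reading frame

print(round(percent_identity(parent, with_stop), 1))
print(find_disablements(parent, with_stop))
print(find_disablements(parent, with_shift))
```

A real analysis would work from a proper alignment and follow the reading frame through each frameshift. This sketch only flags the two disablements named above.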

Types and origin of pseudogenes


There are three main types of pseudogenes, all with distinct mechanisms of origin and characteristic features. The classifications of pseudogenes are as follows:
  1. Processed (or retrotransposed) pseudogenes. In higher eukaryotes, particularly mammals, retrotransposition is a fairly common event that has had a huge impact on the composition of the genome. For example, somewhere between 30% and 44% of the human genome consists of repetitive elements such as SINEs and LINEs (see retrotransposons). In the process of retrotransposition, a portion of the mRNA transcript of a gene is spontaneously reverse transcribed back into DNA and inserted into chromosomal DNA. Although retrotransposons usually create copies of themselves, it has been shown in an in vitro system that they can create retrotransposed copies of random genes, too. Once these pseudogenes are inserted back into the genome, they usually contain a poly-A tail, and usually have had their introns spliced out; these are both hallmark features of cDNAs. However, because they are derived from a mature mRNA product, processed pseudogenes also lack the upstream promoters of normal genes; thus, they are considered "dead on arrival", becoming non-functional pseudogenes immediately upon the retrotransposition event. A further characteristic of processed pseudogenes is common truncation of the 5' end relative to the parent sequence, which is a result of the relatively non-processive retrotransposition mechanism that creates processed pseudogenes.
  2. Non-processed (or duplicated) pseudogenes. Gene duplication is another common and important process in the evolution of genomes. A copy of a functional gene may arise as a result of a gene duplication event and subsequently acquire mutations that cause it to become nonfunctional. Duplicated pseudogenes usually have all the same characteristics as genes, including an intact exon-intron structure and promoter sequences. The loss of a duplicated gene's functionality usually has little effect on an organism's fitness, since an intact functional copy still exists. According to some evolutionary models, shared duplicated pseudogenes indicate the evolutionary relatedness of humans and the other primates.
  3. Disabled genes, or unitary pseudogenes. Various mutations can stop a gene from being successfully transcribed or translated, and a gene may become nonfunctional or deactivated if such a mutation becomes fixed in the population. This is the same mechanism by which duplicated genes become deactivated, but the difference in this case is that the gene was not duplicated before becoming disabled. Normally, such gene deactivation would be unlikely to become fixed in a population, but various population effects, such as genetic drift, a population bottleneck, or in some cases, natural selection, can lead to fixation. The classic example of a unitary pseudogene is the gene that presumably encoded the enzyme L-gulono-γ-lactone oxidase (GULO) in primates. In the other mammals studied (with the exception of guinea pigs), GULO aids in the biosynthesis of ascorbic acid (vitamin C), but it exists as a disabled gene (GULOP) in humans and other primates. Another, more recent example of a disabled gene is caspase 12, whose deactivation (through a nonsense mutation) has been linked to positive selection in humans.
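
The distinguishing features in the list above can be condensed into a rough decision rule. The sketch below is only a heuristic under stated assumptions: the three boolean features are taken as already determined by some upstream analysis, and the names `CandidateFeatures` and `classify_pseudogene` are hypothetical. It does not handle edge cases such as parent genes that have no introns to begin with.

```python
from dataclasses import dataclass


@dataclass
class CandidateFeatures:
    """Features of a sequence already judged homologous and nonfunctional."""
    has_introns: bool            # exon-intron structure of the parent is retained
    has_poly_a_tail: bool        # a poly-A tract follows the 3' end
    intact_copy_in_genome: bool  # a functional copy exists in the same genome


def classify_pseudogene(f: CandidateFeatures) -> str:
    """Apply the rough rules described in the list above."""
    # Derived from mature mRNA: introns spliced out and a poly-A tail present.
    if not f.has_introns and f.has_poly_a_tail:
        return "processed"
    # A functional copy survives elsewhere, so the disabled one is a duplicate.
    if f.intact_copy_in_genome:
        return "duplicated"
    # No functional copy remains: the single original gene was disabled.
    return "unitary"


print(classify_pseudogene(CandidateFeatures(False, True, True)))
print(classify_pseudogene(CandidateFeatures(True, False, True)))
print(classify_pseudogene(CandidateFeatures(True, False, False)))
```

The order of the checks reflects the text: the cDNA hallmarks are tested first, and the presence or absence of an intact copy then separates duplicated from unitary pseudogenes.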


Pseudogenes can complicate molecular genetic studies. For example, a researcher who wants to amplify a gene by PCR may simultaneously amplify a pseudogene that shares similar sequences. This is known as PCR bias or amplification bias. Similarly, pseudogenes are sometimes annotated as genes in genome sequences.
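
The co-amplification problem can be sketched as a simple in silico check: if both primers match a pseudogene as well as the intended gene, both templates are expected to yield a product. The sketch assumes exact primer matches on the forward strand only, and the function `amplicon` and all sequences are invented for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon(template: str, fwd_primer: str, rev_primer: str):
    """Return the predicted product on the forward strand, or None.

    Assumes exact matches: the forward primer must occur in the template and
    the reverse complement of the reverse primer must occur downstream of it.
    """
    template = template.upper()
    start = template.find(fwd_primer.upper())
    if start == -1:
        return None
    rev_site = reverse_complement(rev_primer)
    end = template.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


# Toy sequences: the pseudogene differs from the gene by one internal base,
# so both primer sites are still intact.
gene = "TT" + "ACGGATCC" + "GATTGC" + "AAGCTTGG" + "CATT"
pseudogene = "TT" + "ACGGATCC" + "GATAGC" + "AAGCTTGG" + "CATT"
diverged = "TT" + "ACGTATCC" + "GATAGC" + "AAGCTTGG" + "CATT"  # primer site mutated

fwd = "ACGGATCC"
rev = reverse_complement("AAGCTTGG")

for name, seq in [("gene", gene), ("pseudogene", pseudogene), ("diverged", diverged)]:
    print(name, amplicon(seq, fwd, rev))
```

Primers placed over positions where the gene and pseudogene differ, as in the `diverged` case, would be expected to avoid the unwanted product. Real primer design also has to account for mismatches that are tolerated during annealing.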

Processed pseudogenes often pose a problem for gene prediction programs, which frequently misidentify them as real genes or exons. It has been proposed that identification of processed pseudogenes can help improve the accuracy of gene prediction methods.

It has also been shown that the parent sequences that give rise to processed pseudogenes lose their coding potential faster than those giving rise to non-processed pseudogenes.

Functional pseudogenes?


By definition, pseudogenes lack a function. However, the classification of pseudogenes generally relies on computational analysis of genomic sequences using complex algorithms. This has led to the incorrect identification of some sequences as pseudogenes. For example, the functional, chimeric gene jingwei in Drosophila was once thought to be a processed pseudogene.

It has been established that quite a few pseudogenes can be transcribed, either from their own promoter if it is still intact or, in some cases, from the promoter of a nearby gene; this expression of pseudogenes also appears to be tissue-specific. In 2003, Hirotsune et al. identified a retrotransposed pseudogene whose transcript purportedly plays a trans-regulatory role in the expression of its homologous gene, Makorin1, and suggested this as a general model under which pseudogenes may play an important biological role. Other researchers have since hypothesized similar roles for other pseudogenes. Hirotsune's report prompted two molecular biologists to carefully review the scientific literature on pseudogenes. To the surprise of many, they found a number of examples in which pseudogenes play a role in gene regulation and expression, forcing Hirotsune's group to rescind their claim that they were the first to identify pseudogene function. Furthermore, the original findings of Hirotsune et al. concerning Makorin1 have since been strongly contested; thus, the possibility that some pseudogenes have important biological functions has been disputed.

Additionally, scientists at the University of Chicago and the University of Cincinnati reported in 2002 that a processed pseudogene called phosphoglycerate mutase 3 actually produces a functional protein.

A 2008 publication in Nature reports that some endogenous siRNAs are derived from pseudogenes, and thus that some pseudogenes play a role in regulating protein-coding transcripts.
