Conserved non-coding sequence
A conserved non-coding sequence (CNS) is a sequence of noncoding DNA that is evolutionarily conserved. These sequences are of interest for their potential to regulate gene expression.

CNSs in plants and animals are highly associated with transcription factor binding sites and other cis-acting regulatory elements. Conserved non-coding sequences can be important sites of evolutionary divergence, as mutations in these regions may alter the regulation of conserved genes, producing species-specific patterns of gene expression. These features have made them an invaluable resource in comparative genomics.

Sources of CNSs

All CNSs are likely to perform some function, since function is what would place constraints on their evolution, but they can be distinguished by where in the genome they are found and how they arose.

Introns

Introns are stretches of sequence, found mostly in eukaryotic organisms, that interrupt the coding regions of genes; their lengths in base pairs vary across three orders of magnitude. Intron sequences may be conserved, often because they contain expression-regulating elements that put functional constraints on their evolution. Patterns of conserved introns between species of different kingdoms have been used to make inferences about intron density at different points in evolutionary history. This makes them an important resource for understanding the dynamics of intron gain and loss in eukaryotes (1,28).

UTRs

Some of the most highly conserved noncoding regions are found in the untranslated regions (UTRs) at the 3’ end of mature RNA transcripts, rather than in the introns. This suggests an important function operating at the post-transcriptional level. If these regions perform an important regulatory function, the increase in 3’-UTR length over evolutionary time suggests that conserved UTRs contribute to organism complexity. Regulatory motifs in UTRs, which are often conserved among genes belonging to the same metabolic family, could potentially be used to develop highly specific medicines that target RNA transcripts.

UCRs

Ultraconserved regions (UCRs) are regions over 200 bp in length with 100% identity across species. These unique sequences are mostly found in noncoding regions. It is still not fully understood why the negative selective pressure on these regions is so much stronger than the selection in protein-coding regions. Though these regions can be seen as unique, the distinction between regions with a high degree of sequence conservation and those with perfect sequence conservation is not necessarily one of biological significance. One study in Nature found that all extremely conserved noncoding sequences have important regulatory functions regardless of whether the conservation is perfect, making the distinction of ultraconservation appear somewhat arbitrary.
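The length-and-identity criterion above can be illustrated with a short sketch. It assumes the sequences have already been aligned by some external tool, so that all strings have equal length and use "-" for gaps. The function name `ultraconserved_runs` and its `min_len` parameter are hypothetical, chosen only for this illustration, and the sketch is not the method of any particular study.

```python
def ultraconserved_runs(aligned, min_len=200):
    """Return half-open (start, end) column ranges of an alignment in which
    every sequence carries the same base, with no gaps, for >= min_len columns.

    `aligned` is a list of equal-length strings (an assumed, pre-computed
    multiple alignment); `min_len` defaults to the 200 bp used in the text.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    runs = []
    run_start = None  # column where the current identical run began
    for i in range(length):
        column = {seq[i].upper() for seq in aligned}
        identical = len(column) == 1 and column != {"-"}
        if identical:
            if run_start is None:
                run_start = i
        else:
            # The run (if any) ends just before column i.
            if run_start is not None and i - run_start >= min_len:
                runs.append((run_start, i))
            run_start = None
    # Close a run that extends to the end of the alignment.
    if run_start is not None and length - run_start >= min_len:
        runs.append((run_start, length))
    return runs
```

Lowering `min_len`, or replacing the strict identity test with a tolerance for a few mismatches, turns the same scan into a search for highly but not perfectly conserved regions, which is the distinction the paragraph above describes as somewhat arbitrary.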

Transposable Elements

Repetitive elements can accumulate in an organism’s genome as the result of several different transposition processes. The extent to which this has taken place during the evolution of eukaryotes varies greatly: repetitive DNA accounts for just 3% of the fly genome, but for 50% of the human genome.

There are different theories explaining the conservation of transposable elements. One holds that, like pseudogenes, they provide a source of new genetic material, allowing for faster adaptation to changes in the environment. A simpler alternative is that, because eukaryotic genomes may have no means to prevent the proliferation of transposable elements, they are free to accumulate as long as they are not inserted into or near a gene in such a way that they would disrupt essential functions. A recent study showed that transposons contribute at least 16% of the eutherian-specific CNSs, marking them as a “major creative force” in the evolution of gene regulation in mammals. There are three major classes of transposable elements, distinguished by the mechanisms by which they proliferate.

Classes of Transposable Elements

DNA transposons encode a transposase protein, and the element is flanked by inverted repeat sequences. The transposase excises the sequence and reintegrates it elsewhere in the genome. By excising immediately following DNA replication and inserting into target sites that have not yet been replicated, the element can increase the number of transposons in the genome.
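An inverted repeat is a sequence whose reverse complement appears further downstream, for example 5'-GACTGC....GCAGTC-3'. As a minimal sketch of how such flanking repeats might be recognised in a candidate element, the helper below counts how many bases at the 5' end are mirrored, as a reverse complement, at the 3' end. The function names are hypothetical and the approach ignores mismatches, so it should be read as an illustration rather than a transposon-annotation method.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(element):
    """Length of the longest prefix of `element` whose reverse complement
    is the equally long suffix (perfect matches only)."""
    rc = reverse_complement(element)
    # The first n bases of rc are the reverse complement of the last n bases
    # of element, so comparing prefixes of element and rc tests the two ends.
    limit = len(element) // 2  # the two repeats may not overlap
    n = 0
    while n < limit and element[n].upper() == rc[n].upper():
        n += 1
    return n
```

For the example sequence above with a short spacer, `terminal_inverted_repeat_length("GACTGCTTTTGCAGTC")` reports a six-base terminal inverted repeat.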

Retrotransposons use reverse transcriptase to generate a cDNA from the transposable element (TE) transcript. These are further divided into long terminal repeat (LTR) retrotransposons, long interspersed nuclear elements (LINEs), and short interspersed nuclear elements (SINEs). In LTR retrotransposons, after the RNA template is degraded, a DNA strand complementary to the reverse-transcribed cDNA returns the element to a double-stranded state. Integrase, an enzyme encoded by the LTR retrotransposon, then reincorporates the element at a new target site. These elements are flanked by long terminal repeats (300-500 bp) which mediate the transposition process.

LINEs use a simpler method in which the cDNA is synthesized at the target site following cleavage by a LINE-encoded endonuclease. LINE-encoded reverse transcriptase is not highly sequence-specific. The incorporation by LINE machinery of unrelated RNA transcripts gives rise to non-functional processed pseudogenes. If a small gene’s promoter is included in the transcribed portion of the gene, the stable transcript can be duplicated and reinserted into the genome multiple times. The elements produced by this process are called SINEs.

Conserved Regulatory TEs

When these elements are active in a genome, they can introduce new promoter regions, disrupt existing regulatory sites, or, if inserted into transcribed regions, alter splicing patterns. A particular transposed element will be positively selected for if the altered expression it produces confers an adaptive advantage. This has resulted in some of the conserved regions found in humans. Nearly 25% of characterized promoters in humans contain transposed elements. This is of particular interest in light of the fact that most transposable elements in humans are no longer active.

Pseudogenes

Pseudogenes are vestiges of once-functional genes disabled by sequence deletions, insertions, or mutations. The primary evidence for this process is the presence of fully functioning orthologues to these inactivated sequences in lower-vertebrate genomes. Pseudogenes commonly emerge following a gene duplication or polyploidization event. With two functional copies of a gene, there is no selective pressure to maintain expressibility of both, leaving one free to accumulate mutations as a nonfunctioning pseudogene. This is the typical case, whereby neutral evolution allows pseudogenes to accumulate mutations, serving as “reservoirs” of new genetic material, with potential to be reincorporated into the genome. However, some pseudogenes have been found to be conserved in mammals. The simplest explanation for this is that these noncoding regions may serve some biological function, and this has been found to be the case for several conserved pseudogenes. Makorin1 mRNA, for example, was found to be stabilized by its paralogous pseudogene, Makorin1-p1, which is conserved in several mouse species. Other pseudogenes have also been found to be conserved between humans and mice and between humans and chimpanzees, originating from duplication events prior to the divergence of the species. Evidence of these pseudogenes’ transcription also supports the hypothesis that they have a biological function. Findings of potentially functional pseudogenes create difficulty in defining them, since the term was originally meant for degenerate sequences with no biological function.

CNSs in Comparative Genomics: Evolutionary Insights

The conservation of both functional and nonfunctional noncoding regions provides an important tool for comparative genomics, though conservation of cis-regulatory elements has proven particularly useful.

The presence of CNSs could be due in some cases to a lack of divergence time, though the more common thinking is that they perform functions which place varying degrees of constraint on their evolution. Consistent with this theory, cis-regulatory elements are commonly found in conserved noncoding regions. Thus, sequence similarity is often used as a parameter to limit the search space when trying to identify regulatory elements conserved across species. This is most useful in analyzing distantly related organisms, since closer relatives have sequence conservation among nonfunctional elements as well.
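Using sequence similarity to narrow the search space can be sketched as a sliding-window identity scan over a pairwise alignment. The window size, step, and identity threshold below are illustrative assumptions rather than values taken from this article, and the function names are hypothetical. The two input strings are assumed to be already aligned, with "-" marking gaps.

```python
def window_identity(seq_a, seq_b, window=100, step=10):
    """Yield (start, fraction_identical) for each window of a pairwise
    alignment. Gap columns never count as matches."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    a, b = seq_a.upper(), seq_b.upper()
    for start in range(0, len(a) - window + 1, step):
        matches = sum(
            1
            for x, y in zip(a[start:start + window], b[start:start + window])
            if x == y and x != "-"
        )
        yield start, matches / window


def candidate_regions(seq_a, seq_b, window=100, step=10, threshold=0.7):
    """Return start positions of windows whose identity meets the threshold.
    These are only candidates: conservation alone does not show function."""
    return [
        start
        for start, identity in window_identity(seq_a, seq_b, window, step)
        if identity >= threshold
    ]
```

For closely related species most windows would pass such a filter, which is why the approach is more informative for distantly related organisms; in practice the threshold would have to be chosen with the divergence time in mind.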

Conservation of noncoding sequence is important for the analysis of paralogs within a single species as well. CNSs shared by paralogous clusters of Hox genes are candidates for expression-regulating regions, possibly coordinating the similar expression patterns of these genes.

Comparative genomic studies of the promoter regions of orthologous genes can also detect differences in the presence and relative positioning of transcription factor binding sites. Orthologues with high sequence similarity may not share the same regulatory elements. These differences may account for different expression patterns across species.

The regulatory functions commonly associated with conserved non-coding regions are thought to play a role in the evolution of eukaryotic complexity. On average, plants contain fewer CNSs per gene than mammals. This is thought to be related to their having undergone more polyploidization (genome duplication) events. During the subfunctionalization that ensues following gene duplication, there is potential for a greater rate of CNS loss per gene. Thus, genome duplication events may account for the fact that plants have more genes, each with fewer CNSs. Assuming the number of CNSs to be a proxy for regulatory complexity, this may account for the disparity in complexity between plants and mammals.

Because changes in gene regulation are thought to account for most of the differences between humans and chimpanzees, researchers have looked to CNSs to test this idea. A portion of the CNSs shared between humans and other primates are enriched for human-specific single-nucleotide polymorphisms (SNPs), suggesting positive selection for these SNPs and accelerated evolution of those CNSs. Many of these SNPs are also associated with changes in gene expression, suggesting that these CNSs played an important role in human evolution.

Online Bioinformatic Software for Analyzing CNSs

Program        Website
Consite        http://asp.ii.uib.no:8090/cgi-bin/CONSITE/consite
FootPrinter    http://bio.cs.washington.edu/software
GenomeTrafac   http://genometrafac.cchmc.org/genome-trafac/index.jsp
rVISTA         http://rvista.dcode.org/
Toucan         http://homes.esat.kuleuven.be/~saerts/software/toucan.php
Trafac         http://trafac.chmcc.org/trafac/index.jsp
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 