CpG island
In genetics, CpG islands or CG islands are genomic regions that contain a high frequency of CpG sites, but to date objective definitions for CpG islands are limited. In mammalian genomes, CpG islands are typically 300–3,000 base pairs in length. They are found in and near approximately 40% of the promoters of mammalian genes. About 70% of human promoters have a high CpG content. Given the GC frequency of the genome, however, the number of CpG dinucleotides is much lower than would be expected. The "p" in CpG refers to the phosphodiester bond between the cytosine and the guanine, which indicates that the C and the G are next to each other in sequence, regardless of whether the nucleic acid is single- or double-stranded. More explicitly, both the C and the G are on the same strand of DNA or RNA, covalently bonded (chemically connected) by a phosphodiester bond (a strong bond). This differs from the easily confused base pairing of C and G, in which the two bases sit on separate strands of DNA and share three hydrogen bonds (a weaker bond), also known as complementary base pairing.
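
The distinction between a CpG site on one strand and a C-G base pair across two strands can be illustrated with a minimal Python sketch. The function names `cpg_positions` and `reverse_complement` are hypothetical helpers introduced here for illustration; the sketch assumes a plain DNA string containing only A, C, G and T, read 5' to 3'.

```python
def cpg_positions(strand):
    """Return 0-based indices where a C is immediately followed by a G
    on the same strand (read 5' to 3'). A "GC" step is not a CpG."""
    strand = strand.upper()
    return [i for i in range(len(strand) - 1) if strand[i:i + 2] == "CG"]


def reverse_complement(strand):
    """Return the opposite strand, also written 5' to 3'.
    Assumes the input contains only A, C, G and T."""
    complement = str.maketrans("ACGT", "TGCA")
    return strand.upper().translate(complement)[::-1]


top = "ACGTCGGC"
bottom = reverse_complement(top)

print(cpg_positions(top))     # CpG sites on the given strand
print(bottom)                 # the complementary strand
print(cpg_positions(bottom))  # CpG sites on the complementary strand
```

Because "CG" read 5' to 3' is its own reverse complement, every CpG found on one strand lines up with a CpG on the other, which is why the count is the same for both strands in this example.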

The usual formal definition of a CpG island is a region of at least 200 bp with a GC percentage greater than 50% and an observed/expected CpG ratio greater than 60%, where the expected CpG value is calculated by the formula (GC content/2)².
Another, more recent study revised the rules of CpG island prediction in order to exclude other GC-rich genomic sequences such as Alu repeats. Based on an extensive search of the complete sequences of human chromosomes 21 and 22, DNA regions greater than 500 bp with a GC content greater than 55% and an observed/expected CpG ratio of at least 0.65 were more likely to be true CpG islands associated with the 5' regions of genes.
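
The criteria above can be sketched as a short Python check. This is an illustration under stated assumptions, not a reference implementation of any published tool: the function names `cpg_stats` and `looks_like_cpg_island` are hypothetical, the expected CpG count is taken as (GC fraction/2)² multiplied by the sequence length, and the whole input is scored as one window rather than scanned with a sliding window.

```python
def cpg_stats(seq):
    """Return (length, GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    gc_fraction = (seq.count("C") + seq.count("G")) / n
    observed = seq.count("CG")             # CpG dinucleotides on this strand
    expected = (gc_fraction / 2) ** 2 * n  # assumption: (GC content/2)^2 scaled by length
    ratio = observed / expected if expected > 0 else 0.0
    return n, gc_fraction, ratio


def looks_like_cpg_island(seq, min_len=200, min_gc=0.50, min_ratio=0.60):
    """Apply the usual thresholds: length >= 200 bp, GC > 50%, obs/exp > 0.60."""
    n, gc_fraction, ratio = cpg_stats(seq)
    return n >= min_len and gc_fraction > min_gc and ratio > min_ratio


if __name__ == "__main__":
    candidate = "CG" * 300  # artificial 600 bp sequence, for illustration only
    print(cpg_stats(candidate))
    print(looks_like_cpg_island(candidate))
    # Thresholds approximating the stricter, revised rules:
    print(looks_like_cpg_island(candidate, min_len=500, min_gc=0.55, min_ratio=0.65))
```

The thresholds are passed as parameters so the same function can be run with either set of rules. A practical island finder would also need to slide a window along the chromosome and merge adjacent hits, which this sketch leaves out.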

CpG islands are characterized by a CpG dinucleotide content of at least 60% of that which would be statistically expected (~4–6%), whereas the rest of the genome has a much lower CpG frequency (~1%), a phenomenon called CG suppression. Unlike CpG sites in the coding region of a gene, in most instances the CpG sites in the CpG islands of promoters are unmethylated if the genes are expressed. This observation led to the speculation that methylation of CpG sites in the promoter of a gene may inhibit the expression of that gene. Methylation, along with histone modifications, is central to imprinting.

CpG islands typically occur at or near the transcription start site of genes, particularly housekeeping genes, in vertebrates. Normally a C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the cytosines in such an arrangement tend to be methylated. This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over evolutionary time methylated cytosines tend to turn into thymines because of spontaneous deamination. While there is a special enzyme in humans (thymine-DNA glycosylase, or TDG) that specifically removes T's from T/G mismatches, it is not sufficiently effective to prevent the relatively rapid mutation of the dinucleotides. The result is that CpGs are relatively rare unless there is selective pressure to keep them or a region is not methylated for some reason, perhaps having to do with the regulation of gene expression.
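
The gradual loss of CpGs described above can be illustrated with a toy simulation. This is a deliberately simplified sketch and not a biological model: the function name `decay_cpgs` and its parameters are hypothetical, every CpG cytosine is assumed to be methylated, only one strand is tracked, and the probabilities are arbitrary illustration values rather than measured rates.

```python
import random


def decay_cpgs(seq, p_deaminate, p_repair, generations, seed=0):
    """Toy model of CpG loss on a single strand.

    In each generation, every C that is followed by a G deaminates to T
    with probability p_deaminate; the resulting T/G mismatch is repaired
    (the C is kept) with probability p_repair. Cytosines outside a CpG
    context are left untouched.
    """
    rng = random.Random(seed)  # fixed seed so a run is repeatable
    bases = list(seq.upper())
    for _ in range(generations):
        for i in range(len(bases) - 1):
            if bases[i] == "C" and bases[i + 1] == "G":
                deaminated = rng.random() < p_deaminate
                repaired = rng.random() < p_repair
                if deaminated and not repaired:
                    bases[i] = "T"  # CpG becomes TpG
    return "".join(bases)


if __name__ == "__main__":
    start = "ACGTCGCGAACGTTCG" * 10
    end = decay_cpgs(start, p_deaminate=0.05, p_repair=0.8, generations=200)
    print(start.count("CG"), "CpGs before,", end.count("CG"), "CpGs after")
```

Even with a high repair probability, any non-zero rate of unrepaired deamination removes CpGs over enough generations in this model, whereas setting `p_deaminate` to zero (standing in for an unmethylated region) leaves the sequence unchanged.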

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.