Expressed sequence tag
An expressed sequence tag or EST is a short sub-sequence of a cDNA sequence. ESTs may be used to identify gene transcripts and are instrumental in gene discovery and gene sequence determination. The identification of ESTs has proceeded rapidly, with approximately 65.9 million ESTs available in public databases (e.g. GenBank, 18 June 2010, all species).

An EST results from one-shot sequencing of a cloned cDNA (i.e. several hundred base pairs of sequence starting from an end of a cDNA). The cDNAs used for EST generation are typically individual clones from a cDNA library. The resulting sequence is a relatively low quality fragment whose length is limited by current technology to approximately 500 to 800 nucleotides. Because these clones consist of DNA that is complementary to mRNA, the ESTs represent portions of expressed genes. They may be represented in databases as either cDNA/mRNA sequence or as the reverse complement of the mRNA, the template strand.
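
Because an EST may be stored in either orientation, a tool that compares ESTs usually needs to convert between the two. The following is a minimal Python sketch of that conversion; the function name and the example read are hypothetical, and only the bases A, C, G, T and N are handled.

```python
# Complement table for the four DNA bases plus N (unknown base).
# Other IUPAC ambiguity codes are left unchanged in this sketch.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Hypothetical EST stored in the mRNA (sense) orientation.
est_sense = "ATGGCCAAGTTC"
# The same EST as it would appear on the template strand.
est_template = reverse_complement(est_sense)
print(est_template)
```

Applying the function twice returns the original sequence, which is a convenient sanity check when normalizing the orientation of a set of reads.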

ESTs can be mapped to specific chromosome locations using physical mapping techniques, such as radiation hybrid mapping, HAPPY mapping, or FISH. Alternatively, if the genome of the organism that originated the EST has been sequenced, one can align the EST sequence to that genome using a computer.
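
The idea of placing an EST on a sequenced genome can be illustrated with a deliberately simplified Python sketch. It assumes an exact, unspliced match and searches both strands. Real alignment programs also have to tolerate sequencing errors and the introns that separate exons in genomic DNA, which this sketch does not attempt. The function names and the toy sequences are hypothetical.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def locate_est(genome: str, est: str) -> Optional[Tuple[int, str]]:
    """Return (0-based start, strand) of the first exact match, or None.

    Simplifying assumption: the EST matches the genome exactly and
    without gaps, so mismatches and spliced alignments are not found.
    """
    pos = genome.find(est)
    if pos != -1:
        return pos, "+"
    # Try the opposite strand; the position is still reported
    # in forward-strand coordinates.
    pos = genome.find(reverse_complement(est))
    if pos != -1:
        return pos, "-"
    return None


toy_genome = "TTGACCATGGCCAAGTTCGGA"  # hypothetical genomic fragment
print(locate_est(toy_genome, "ATGGCCAAGTTC"))
print(locate_est(toy_genome, "GAACTTGGCCAT"))
```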

The current understanding of the human set of genes includes the existence of thousands of genes based solely on EST evidence. In this respect, ESTs have become a tool to refine the predicted transcripts for those genes, which leads to the prediction of their protein products and ultimately their function. Moreover, the situation in which those ESTs are obtained (tissue, organ, disease state, e.g. cancer) gives information on the conditions in which the corresponding gene is active. ESTs contain enough information to permit the design of precise probes for DNA microarrays, which can then be used to measure gene expression.

Some authors use the term "EST" to describe genes for which little or no further information exists besides the tag.

The significance of ESTs, their properties, methods to analyze EST datasets and their applications in different areas of biology have been reviewed by Nagaraj et al. (2007).

dbEST

dbEST is a division of GenBank established in 1992. As with GenBank, data in dbEST are submitted directly by laboratories worldwide and are not curated.

EST contigs

Because of the way ESTs are sequenced, many distinct expressed sequence tags are often partial sequences that correspond to the same mRNA of an organism. In an effort to reduce the number of expressed sequence tags for downstream gene discovery analyses, several groups assembled expressed sequence tags into EST contigs. Examples of resources that provide EST contigs include:
  • TIGR gene indices
  • UniGene
  • STACK

Constructing EST contigs is not trivial and may yield artifacts (contigs that contain two distinct gene products). When the complete genome sequence of an organism is available and transcripts are annotated, it is possible to bypass contig assembly and directly match transcripts with ESTs. This approach is used in the TissueInfo system (see below) and makes it easy to link annotations in the genomic database to tissue information provided by EST data.
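
The core step of contig construction, joining two reads that overlap, can be sketched in Python as follows. This is a toy illustration under strong assumptions (exact overlaps, same orientation, a hypothetical `min_overlap` threshold), not the method used by any of the resources listed above.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for length in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def merge_pair(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Merge two overlapping ESTs into one contig, or return None."""
    n = overlap_length(left, right, min_overlap)
    if n == 0:
        return None
    # Keep `left` whole and append the part of `right` beyond the overlap.
    return left + right[n:]


# Two hypothetical ESTs read from the same transcript.
est_a = "ATGGCCAAG"
est_b = "CCAAGTTCGGA"
print(merge_pair(est_a, est_b))
```

The sketch also shows where artifacts can come from: if `min_overlap` is small, two ESTs from unrelated genes that happen to share a short stretch of sequence would be merged into a single chimeric contig.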

Tissue information

High-throughput analyses of ESTs often encounter similar data management challenges. A first challenge is that the tissue provenance of EST libraries is described in plain English in dbEST. This makes it difficult to write programs that can unambiguously determine that two EST libraries were sequenced from the same tissue. Similarly, disease conditions for the tissue are not annotated in a computationally friendly manner. For instance, the cancer origin of a library is often mixed with the tissue name (e.g., the tissue name "glioblastoma" indicates that the EST library was sequenced from brain tissue and the disease condition is cancer). With the notable exception of cancer, the disease condition is often not recorded in dbEST entries.

The TissueInfo project was started in 2000 to help with these challenges. The project provides curated data (updated daily) to disambiguate tissue origin and disease state (cancer/non-cancer), offers a tissue ontology that links tissues and organs by "is part of" relationships (i.e., formalizes knowledge that hypothalamus is part of brain, and that brain is part of the central nervous system) and distributes open-source software for linking transcript annotations from sequenced genomes to tissue expression profiles calculated with data in dbEST.
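
How curated labels and "is part of" relationships make library comparison computable can be sketched in Python. The data structures and function names below are hypothetical and do not reflect the actual TissueInfo data format or software. The hypothalamus, brain and central nervous system chain and the "glioblastoma" example are taken from the text above, and each tissue is assumed to have a single parent for simplicity.

```python
from typing import Dict, List, Tuple

# Hypothetical "is part of" edges (child -> parent).
PART_OF: Dict[str, str] = {
    "hypothalamus": "brain",
    "brain": "central nervous system",
}

# Hypothetical curated mapping from free-text library labels
# to (tissue, is_cancer).
LIBRARY_LABELS: Dict[str, Tuple[str, bool]] = {
    "glioblastoma": ("brain", True),
    "hypothalamus": ("hypothalamus", False),
}


def ancestors(tissue: str) -> List[str]:
    """Return every structure that `tissue` is part of, nearest first."""
    chain: List[str] = []
    seen = {tissue}
    while tissue in PART_OF:
        tissue = PART_OF[tissue]
        if tissue in seen:  # guard against accidental cycles in the data
            break
        seen.add(tissue)
        chain.append(tissue)
    return chain


def is_same_or_part_of(label_a: str, label_b: str) -> bool:
    """True if the tissue of `label_a` equals, or is part of, that of `label_b`."""
    tissue_a, _ = LIBRARY_LABELS[label_a]
    tissue_b, _ = LIBRARY_LABELS[label_b]
    return tissue_a == tissue_b or tissue_b in ancestors(tissue_a)


print(ancestors("hypothalamus"))
print(is_same_or_part_of("hypothalamus", "glioblastoma"))
```

With such a mapping, a library labelled "hypothalamus" and one labelled "glioblastoma" can both be recognized as brain-derived, while the cancer flag keeps the disease state separate from the tissue name.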

See also

  • Gene expression
  • Complementary DNA (cDNA)
  • IMAGE cDNA clones
  • Whole genome sequencing (WGS)

External links

  • ESTs Factsheet from NCBI, a good and easy to read introduction to ESTs.
  • The NCBI Handbook, Part 3, Chapter 21 has a very nice overview.
  • ECLAT, a server for the classification of ESTs from mixed EST pools (from fungus-infected plants) using codon usage.
  • http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html The current number of EST sequences in the GenBank division dbEST.
  • Web Resources for EST data and analysis
  • http://icb.med.cornell.edu/crt/tissueinfo/ TissueInfo project: Curated EST tissue provenance, tissue ontology, open-source software.
  • http://www.estinformatics.org/ Web resource containing all publicly available ESTs, processed through cleaning steps that remove contaminating DNA (e.g. vector and E. coli sequences) and short sequences (<100 bp).
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.