Metagenomics
Metagenomics is the study of metagenomes, genetic material recovered directly from environmental samples. The broad field may also be referred to as environmental genomics, ecogenomics or community genomics. Traditional microbiology and microbial genome sequencing rely upon cultivated clonal cultures. Metagenomics offers a powerful lens for viewing the microbial world that has the potential to revolutionize understanding of the entire living world.

Early environmental gene sequencing cloned specific genes (often the 16S rRNA gene) to produce a profile of diversity in a natural sample. Such work revealed that the vast majority of microbial biodiversity had been missed by cultivation-based methods. Recent studies use "shotgun" Sanger sequencing or massively parallel pyrosequencing to get largely unbiased samples of all genes from all the members of the sampled communities.

Origin of the term

The term "metagenomics" was first used by Jo Handelsman, Jon Clardy, Robert M. Goodman, and others, and first appeared in publication in 1998. The term metagenome referenced the idea that a collection of genes sequenced from the environment could be analyzed in a way analogous to the study of a single genome. The exploding interest in environmental genetics, along with the buzzword-like nature of the term, has resulted in the broader use of metagenomics to describe any sequencing of genetic material from environmental (i.e. uncultured) samples, even work that focuses on one organism or gene. Recently, Kevin Chen and Lior Pachter (researchers at the University of California, Berkeley) defined metagenomics as "the application of modern genomics techniques to the study of communities of microbial organisms directly in their natural environments, bypassing the need for isolation and lab cultivation of individual species."

Environmental gene surveys

Conventional sequencing begins with a culture of identical cells as a source of DNA. However, early metagenomic studies revealed that there are probably large groups of microorganisms in many environments that cannot be cultured and thus cannot be sequenced. These early studies focused on 16S ribosomal RNA sequences, which are relatively short, often conserved within a species, and generally different between species. Many 16S rRNA sequences have been found that do not belong to any known cultured species, indicating that numerous organisms have yet to be isolated.

Early molecular work in the field was conducted by Norman R. Pace and colleagues, who used PCR to explore the diversity of ribosomal RNA sequences. The insights gained from these breakthrough studies led Pace to propose the idea of cloning DNA directly from environmental samples as early as 1985. This led to the first report of isolating and cloning bulk DNA from an environmental sample, published by Pace and colleagues in 1991 while Pace was in the Department of Biology at Indiana University. Considerable efforts ensured that these were not PCR false positives and supported the existence of a complex community of unexplored species. Although this methodology was limited to exploring highly conserved, non-protein coding genes, it did support early microbial morphology-based observations that diversity was far more complex than was known by culturing methods.

Soon after that, in 1995, Healy reported the metagenomic isolation of functional genes from "zoolibraries" constructed from a complex culture of environmental organisms grown in the laboratory on dried grasses. After leaving the Pace laboratory, Ed DeLong continued in the field and has published work that has largely laid the groundwork for environmental phylogenies based on signature 16S sequences, beginning with his group's construction of libraries from marine samples.

Longer sequences from environmental samples

Recovery of DNA sequences longer than a few thousand base pairs from environmental samples was very difficult until recent advances in molecular biological techniques, particularly related to constructing libraries in bacterial artificial chromosomes (BACs), provided better vectors for molecular cloning.

Shotgun metagenomics

Advances in bioinformatics, refinements of DNA amplification, and proliferation of computational power have greatly aided the analysis of DNA sequences recovered from environmental samples. These advances have enabled the adaptation of shotgun sequencing to metagenomic samples. The approach, used to sequence many cultured microorganisms as well as the human genome, randomly shears DNA, sequences many short sequences, and reconstructs them into a consensus sequence.
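The shear, sequence and reconstruct cycle can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the method of any real assembler: reads are error-free, all come from one strand of one short genome, and fragments are merged greedily by longest suffix-prefix overlap. The function names (shear, overlap, greedy_assemble) and the example sequence are hypothetical, chosen only for illustration.

```python
import random


def shear(genome, read_len, n_reads, seed=0):
    """Sample n_reads substrings of length read_len at random start positions."""
    rng = random.Random(seed)
    last_start = len(genome) - read_len
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def drop_contained(frags):
    """Remove duplicates and fragments that occur inside another fragment."""
    unique = list(dict.fromkeys(frags))
    return [f for f in unique if not any(f != g and f in g for g in unique)]


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two fragments with the largest overlap."""
    frags = drop_contained(reads)
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: the result stays in several contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        rest = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags = drop_contained(rest + [merged])
    return frags


# Hand-picked overlapping reads from the illustrative sequence ATGGCGTACGTTAGC.
reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]
contigs = greedy_assemble(reads)
```

With randomly sheared reads (for example shear(genome, 8, 20)) the same function may return several contigs when coverage is too low to link every region. Real metagenomic data add sequencing errors, reverse-complement reads, repeats and many genomes of unequal abundance, none of which this sketch handles.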

In 2002, Mya Breitbart, Forest Rohwer, and colleagues used environmental shotgun sequencing to show that 200 liters of seawater contains over 5000 different viruses. Subsequent studies showed that there are more than 1000 viral species in human stool and possibly a million different viruses per kilogram of marine sediment, including many bacteriophages. Essentially all of the viruses in these studies were new species. In 2004, Gene Tyson, Jill Banfield, and colleagues at the University of California, Berkeley and the Joint Genome Institute sequenced DNA extracted from an acid mine drainage system. This effort resulted in the complete, or nearly complete, genomes for a handful of bacteria and archaea that had previously resisted attempts to culture them. It was now possible to study entire genomes without the biases associated with laboratory cultures.

Global Ocean Sampling Expedition

Since 2003, Craig Venter, leader of the privately funded parallel of the Human Genome Project, has led the Global Ocean Sampling Expedition, circumnavigating the globe and collecting metagenomic samples throughout. All of these samples are sequenced using shotgun sequencing, in the hope that new genomes (and therefore new organisms) will be identified. The pilot project, conducted in the Sargasso Sea, found DNA from nearly 2000 different species, including 148 types of bacteria never before seen. As of 2009, Venter has circumnavigated the globe and thoroughly explored the West Coast of the United States, and is currently in the midst of a two-year expedition to explore the Baltic, Mediterranean and Black Seas.

Pyrosequencing

In 2006, Robert Edwards, Forest Rohwer, and colleagues at San Diego State University published the first sequences of environmental samples generated with so-called next generation sequencing, in this case chip-based pyrosequencing developed by 454 Life Sciences. This technique for sequencing DNA generates shorter fragments than conventional techniques; however, this limitation is compensated for by the very large number of sequences generated. In addition, this technique does not require cloning the DNA before sequencing, removing one of the main biases in metagenomics.

Software

A major problem with metagenomes is binning, the process of identifying the organism from which a particular sequence originated. Traditionally, BLAST has been used to rapidly search for similar sequences in existing public databases. More advanced methods have also been employed to bin sequences. Notable successes have been achieved by a family of methods that use intrinsic features of the sequence, such as oligonucleotide frequencies. These methods include TETRA (Teeling et al., 2004), Phylopythia (McHardy et al., 2007), TACOA (Diaz et al., 2009), PCAHIER (Zheng and Wu, 2010), DiScRIBinATE (Ghosh et al., 2010), SPHINX (Mohammed et al., 2011), and Parallel-META (Su et al., 2011). In 2007, Daniel Huson and Stephan Schuster developed and published the first stand-alone metagenome analysis tool, MEGAN, which can be used to perform a first analysis of a metagenomic shotgun dataset. This tool was originally developed to analyse the metagenome of a mammoth sample. However, a study by Monzoorul et al. (2009) showed that adopting the LCA approach (of MEGAN) based solely on the bit-score of the alignment leads to a number of false positive assignments, especially for metagenomic sequences originating from new organisms. This study proposed a new approach called SOrt-ITEMS, which uses several alignment parameters to increase the accuracy of assignments.
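The idea of binning by intrinsic sequence features can be sketched in a few lines of Python. The example below is a generic illustration and does not reproduce the algorithm of any tool named above: it assumes that each candidate organism is represented by a reference sequence, describes every sequence by its tetranucleotide (4-mer) frequency profile, and assigns a fragment to the reference with the highest cosine similarity. The function names, the choice of cosine similarity and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq, k=4):
    """Relative frequency of every k-mer over the alphabet ACGT, in a fixed order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)  # ignores k-mers containing N etc.
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_bin(fragment, references, k=4):
    """Name of the reference whose k-mer profile is most similar to the fragment."""
    profile = kmer_profile(fragment, k)
    return max(
        references,
        key=lambda name: cosine(profile, kmer_profile(references[name], k)),
    )


# Toy references with very different composition (hypothetical sequences).
references = {
    "at_rich": "ATATATTTAAATATTAATTTATAAATAT" * 3,
    "gc_rich": "GCGCCGGGCCGCGGCCCGCGGGCGCC" * 3,
}
at_bin = assign_bin("TTATAAATATTA", references)
gc_bin = assign_bin("CGCGGGCC", references)
```

Composition profiles become noisy for short fragments, because a short read contains only a handful of the 256 possible tetranucleotides. Similarity-based approaches such as BLAST searches rely on database hits instead of composition.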

MG-RAST

In 2007, Folker Meyer, Robert Edwards, and a team at Argonne National Laboratory and the University of Chicago released the Metagenomics RAST server (MG-RAST), a community resource for metagenome data set analysis. As of October 2011, 3.7 terabases (10^12 bases) of DNA have been analyzed by MG-RAST, and more than 4300 public data sets are freely available for comparison within MG-RAST. Over 7000 users have submitted a total of 38,000 metagenomes to MG-RAST. The server also acts as the de facto repository for metagenomics data.

Applications

Metagenomics can improve strategies for monitoring the impact of pollutants on ecosystems and for cleaning up contaminated environments. Increased understanding of how microbial communities cope with pollutants is helping to assess the potential of contaminated sites to recover from pollution and to increase the chances that bioaugmentation or biostimulation trials will succeed.

Recent progress in mining the rich genetic resource of non-culturable microbes has led to the discovery of new genes, enzymes, and natural products. The impact of metagenomics is witnessed in the development of commodity and fine chemicals, agrochemicals and pharmaceuticals, where the benefit of enzyme-catalyzed chiral synthesis is increasingly recognized.

Metagenomic sequencing is being used to characterize the microbial communities from 15 to 18 body sites from at least 250 individuals. This is part of the Human Microbiome initiative, whose primary goals are to determine whether there is a core human microbiome, to understand the changes in the human microbiome that can be correlated with human health, and to develop new technological and bioinformatics tools to support these goals.

It is well known that the vast majority of microbes have not been cultivated. Functional metagenomics strategies are being used to explore the interactions between plants and microbes through cultivation-independent study of the microbial communities.

Finally, metagenomic sequencing is particularly useful in the study of viral communities. As viruses lack a shared universal phylogenetic marker (such as 16S rRNA for bacteria and archaea, and 18S rRNA for eukarya), the only way to access the genetic diversity of the viral community from an environmental sample is through metagenomics. Viral metagenomes (also called viromes) should thus provide more and more information about viral diversity and evolution.

Microbial diversity

Much of the interest in metagenomics comes from the discovery that the vast majority of microorganisms had previously gone unnoticed. Traditional microbiological methods relied upon laboratory cultures of organisms. Surveys of ribosomal RNA (rRNA) genes taken directly from the environment revealed that cultivation-based methods find less than 1% of the bacterial and archaeal species in a sample.

Gene surveys

Shotgun sequencing and screens of clone libraries reveal genes present in environmental samples. This provides information both on which organisms are present and what metabolic processes are possible in the community. This can be helpful in understanding the ecology of a community, particularly if multiple samples are compared to each other.

Environmental genomes

Shotgun metagenomics is also capable of sequencing nearly complete microbial genomes directly from the environment. Because the collection of DNA from an environment is largely uncontrolled, the most abundant organisms in an environmental sample are most highly represented in the resulting sequence data. To achieve the high coverage needed to fully resolve the genomes of underrepresented community members, large samples are needed, often prohibitively large ones. On the other hand, the random nature of shotgun sequencing ensures that many of these organisms will be represented by at least some small sequence segments. Due to the limitations of microbial isolation methods, the vast majority of these organisms would go unnoticed using traditional culturing techniques.
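The relationship between abundance and coverage can be made concrete with a back-of-the-envelope calculation. The Python sketch below assumes a simple idealized model that is not taken from the article: reads are drawn uniformly at random, each genome receives a share of the sequenced bases equal to its share of the DNA in the sample, and the chance that a given position is hit at least once follows the Poisson approximation 1 - e^(-c) for mean coverage c. The function names and all numbers are hypothetical.

```python
from math import exp


def expected_coverage(total_bases, abundance_fraction, genome_size):
    """Mean sequencing depth for one genome.

    abundance_fraction is the assumed share of the sample's DNA that
    belongs to this genome; genome_size and total_bases are in bases.
    """
    return total_bases * abundance_fraction / genome_size


def fraction_covered(coverage):
    """Expected fraction of positions hit by at least one read (Poisson model)."""
    return 1.0 - exp(-coverage)


def bases_needed(target_coverage, abundance_fraction, genome_size):
    """Total sequencing effort required to reach target_coverage for one genome."""
    return target_coverage * genome_size / abundance_fraction


# Illustrative numbers: 1 gigabase of sequence, 4-megabase genomes.
total = 1e9
genome = 4e6
dominant = expected_coverage(total, 0.5, genome)    # organism making up 50% of the DNA
rare = expected_coverage(total, 0.001, genome)      # organism making up 0.1% of the DNA
rare_seen = fraction_covered(rare)                  # share of the rare genome sampled at all
effort_for_rare = bases_needed(8, 0.001, genome)    # bases needed for 8x depth on the rare genome
```

Under these assumptions the dominant organism is sequenced to a depth of 125, while the rare one reaches a depth of only 0.25, so roughly a fifth of its genome is expected to be sampled at all. Reaching eight-fold depth for the rare organism would take about 32 gigabases, which is the sense in which the required sample can become prohibitively large.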

Community metabolism

Many bacterial communities show significant division of labor in metabolism. Waste products of some organisms are metabolites for others. Working together they turn raw resources into fully metabolized waste. Using comparative gene studies and expression experiments with microarrays or proteomics, researchers can piece together a metabolic network that goes beyond species boundaries. Such studies require detailed knowledge about which versions of which proteins are coded by which species and even by which strains of which species. Therefore, community genomic information is another fundamental tool (with metabolomics and proteomics) in the quest to determine how metabolites are transferred and transformed by a community.

Ancient DNA



The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 