Computational genomics
Computational genomics refers to the use of computational analysis to decipher biology from genome sequences and related data, including both DNA and RNA sequence as well as other "post-genomic" data (i.e. experimental data obtained with technologies that require the genome sequence, such as genomic DNA microarrays). As such, computational genomics may be regarded as a subset of bioinformatics, but with a focus on using whole genomes (rather than individual genes) to understand the principles of how the DNA of a species controls its biology at the molecular level and beyond. With the current abundance of massive biological datasets, computational studies have become one of the most important means of biological discovery.

History

The roots of computational genomics are shared with those of bioinformatics. During the 1960s, Margaret Dayhoff and others at the National Biomedical Research Foundation assembled databases of homologous protein sequences for evolutionary study. Their research developed a phylogenetic tree that determined the evolutionary changes that were required for a particular protein to change into another protein based on the underlying amino acid sequences. This led them to create a scoring matrix that assessed the likelihood of one protein being related to another.

Beginning in the 1980s, databases of genome sequences began to be recorded, which presented new challenges in searching and comparing the stored gene information. Unlike the text-searching algorithms used on websites such as Google or Wikipedia, a search for sections of genetic similarity must find strings that are not simply identical, but similar. Sequence alignment algorithms address this problem. The Needleman-Wunsch algorithm, published in 1970 by Saul B. Needleman and Christian D. Wunsch, is a dynamic programming algorithm that performs a global alignment of two sequences; it can compare amino acid sequences with each other by using scoring matrices derived from the earlier research by Dayhoff. Later, the BLAST algorithm was developed for performing fast, optimized searches of gene sequence databases. BLAST and its derivatives are probably the most widely used algorithms for this purpose.
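
The dynamic programming idea behind global alignment can be illustrated with a minimal sketch. It is not the published implementation: the function name needleman_wunsch and the match, mismatch and gap values are assumptions chosen for illustration, and a flat match/mismatch score stands in for a Dayhoff-style substitution matrix.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    Scoring values are illustrative assumptions, not taken from the article.
    """
    n, m = len(a), len(b)

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only

    # Fill the table: each cell depends on its three neighbours.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("GATTACA", "GATACA")
    print(total)
    print(top)
    print(bottom)
```

The table has (n + 1) x (m + 1) cells, so time and memory grow with the product of the two sequence lengths. That cost is one reason heuristic tools such as BLAST are preferred for searching large databases.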

The emergence of the phrase "computational genomics" coincides with the availability of complete sequenced genomes in the mid-to-late 1990s. The first meeting of the Annual Conference on Computational Genomics was organized by scientists from The Institute for Genomic Research (TIGR) in 1998, providing a forum for this speciality and effectively distinguishing this area of science from the more general fields of genomics or computational biology. The first use of this term in scientific literature, according to MEDLINE abstracts, was just one year earlier in Nucleic Acids Research. The final Computational Genomics conference was held in 2006, featuring a keynote talk by Nobel Laureate Barry Marshall, co-discoverer of the link between Helicobacter pylori and stomach ulcers. As of 2010, the leading conferences in the field include Intelligent Systems for Molecular Biology (ISMB), RECOMB, and the Cold Spring Harbor Laboratory and Sanger Institute's meetings titled "Biology of Genomes" and "Genome Informatics".

The development of computer-assisted mathematics (using products such as Mathematica or MATLAB) has helped engineers, mathematicians and computer scientists to start operating in this domain, and a public collection of case studies and demonstrations is growing, ranging from whole genome comparisons to gene expression analysis. This has brought in ideas from other disciplines, including concepts from systems and control, information theory, string analysis and data mining. Computational approaches are expected to become and remain a standard topic for research and teaching, and the multiple courses created in the past few years are beginning to train students fluent in both fields.

Contributions of computational genomics research to biology

Contributions of computational genomics research to biology include:
  • discovering subtle patterns in genomic sequences
  • proposing cellular signalling networks
  • proposing mechanisms of genome evolution
  • predicting precise locations of all human genes using comparative genomics techniques with several mammalian and vertebrate species
  • predicting conserved genomic regions that are related to early embryonic development
  • discovering potential links between repeated sequence motifs and tissue-specific gene expression
  • measuring regions of genomes that have undergone unusually rapid evolution

See also

  • Bioinformatics
  • Biowiki
  • Computational biology
  • Genomics
  • Microarray
  • BLAST
  • Computational epigenetics

External links

  • Harvard Extension School Biophysics 101, Genomics and Computational Biology, http://www.courses.fas.harvard.edu/~bphys101/info/syllabus.html
  • University of Bristol course in Computational Genomics, http://www.computational-genomics.net/
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 