Cheminformatics
Cheminformatics is the use of computer and informational techniques, applied to a range of problems in the field of chemistry. These in silico techniques are used in pharmaceutical companies in the process of drug discovery. These methods can also be used in chemical and allied industries in various other forms.

History

The term chemoinformatics was defined by F.K. Brown in 1998:


Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization.


Since then, both spellings have been in use. Some communities have settled on cheminformatics, while European academia settled on chemoinformatics in 2006. The establishment of the Journal of Cheminformatics is a strong push towards the shorter variant.

Basics

Cheminformatics combines the scientific working fields of chemistry and computer science, for example in the areas of topology, chemical graph theory, and mining of chemical space.
Cheminformatics can also be applied to data analysis in industries such as paper and pulp, dyes, and allied industries.
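
As an illustration of the chemical graph theory mentioned above, the following minimal sketch treats a molecule as a graph whose nodes are atoms and whose edges are bonds, then computes the Wiener index (the sum of shortest-path lengths over all atom pairs), one commonly used topological descriptor. The adjacency-list encoding and the names ISOPENTANE, shortest_path_lengths and wiener_index are assumptions made for this example, not part of any particular toolkit.

```python
from collections import deque

# Hydrogen-suppressed graph of 2-methylbutane (isopentane).
# Carbon atoms are numbered 0..4; each entry lists the bonded neighbours.
ISOPENTANE = {
    0: [1],
    1: [0, 2, 4],
    2: [1, 3],
    3: [2],
    4: [1],
}


def shortest_path_lengths(graph, start):
    """Breadth-first search: number of bonds from `start` to every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbour in graph[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def wiener_index(graph):
    """Sum of shortest-path lengths over all unordered pairs of atoms."""
    total = sum(sum(shortest_path_lengths(graph, a).values()) for a in graph)
    return total // 2  # each pair was counted once from either end


print(wiener_index(ISOPENTANE))
```

Branched isomers have shorter average paths than their straight-chain counterparts, so a descriptor of this kind gives a crude numerical handle on molecular shape.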

Storage and retrieval

The primary application of cheminformatics is in the storage, indexing and search of information relating to compounds. The efficient search of such stored information draws on topics that are dealt with in computer science, such as data mining, information retrieval, information extraction and machine learning. Related research topics include:
  • Unstructured data
    • Information retrieval
    • Information extraction
  • Structured data mining
    • Database mining
    • Graph mining
    • Molecule mining
    • Sequence mining
    • Tree mining
  • Digital libraries
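
The indexing and search of compound information described above can be sketched with an inverted index, a standard information-retrieval structure. In this hedged example each compound is described by a set of hand-assigned feature keys (the key names and the FeatureIndex class are hypothetical); a real system would derive such keys from the structure itself and would typically follow this pre-filter with an exact substructure match.

```python
from collections import defaultdict


class FeatureIndex:
    """Inverted index mapping a feature key to the compounds that carry it."""

    def __init__(self):
        self._postings = defaultdict(set)
        self._compounds = set()

    def add(self, compound_id, features):
        """Register a compound under every one of its feature keys."""
        self._compounds.add(compound_id)
        for feature in features:
            self._postings[feature].add(compound_id)

    def search(self, required):
        """Return the ids of compounds that carry every required feature."""
        required = list(required)
        if not required:
            return set(self._compounds)
        # Start from the smallest posting set to keep intersections cheap.
        postings = sorted(
            (self._postings.get(feature, set()) for feature in required),
            key=len,
        )
        result = set(postings[0])
        for posting in postings[1:]:
            result &= posting
        return result


# Hand-assigned feature keys, for illustration only.
index = FeatureIndex()
index.add("aspirin", {"aromatic_ring", "carboxylic_acid", "ester"})
index.add("benzoic_acid", {"aromatic_ring", "carboxylic_acid"})
index.add("phenol", {"aromatic_ring", "hydroxyl"})
index.add("ethanol", {"hydroxyl"})

print(sorted(index.search(["aromatic_ring", "carboxylic_acid"])))
```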

File formats

The in silico representation of chemical structures uses specialized formats such as the XML-based Chemical Markup Language or SMILES. These representations are often used for storage in large chemical databases. While some formats are suited for visual representations in 2 or 3 dimensions, others are more suited for studying physical interactions, modeling and docking studies.
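
To show how a line notation such as SMILES encodes a structure, here is a minimal sketch of a parser for a deliberately small subset of the notation: unbracketed atoms, the bond symbols -, = and #, parenthesised branches, and single-digit ring closures. Aromatic lowercase atoms, bracket atoms, charges and stereochemistry are not handled, and the function name parse_smiles and its return shape are assumptions made for this example. Production work would normally rely on an established toolkit instead.

```python
# Two-letter symbols come first so that "Cl" is not read as "C".
ORGANIC_SUBSET = ("Cl", "Br", "B", "C", "N", "O", "P", "S", "F", "I")
BOND_ORDERS = {"-": 1, "=": 2, "#": 3}


def parse_smiles(smiles):
    """Parse a small SMILES subset into (atoms, bonds).

    atoms is a list of element symbols; bonds is a list of
    (atom_index, atom_index, bond_order) tuples.
    """
    atoms, bonds = [], []
    previous = None      # index of the atom the next atom will bond to
    pending_order = 1    # bond order set by a preceding '=' or '#'
    branch_stack = []    # atoms to return to when ')' closes a branch
    open_rings = {}      # ring-closure digit -> (atom index, bond order)
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        symbol = next((s for s in ORGANIC_SUBSET if smiles.startswith(s, i)), None)
        if symbol is not None:
            atoms.append(symbol)
            current = len(atoms) - 1
            if previous is not None:
                bonds.append((previous, current, pending_order))
            previous, pending_order = current, 1
            i += len(symbol)
            continue
        if ch in BOND_ORDERS:
            pending_order = BOND_ORDERS[ch]
        elif ch == "(":
            branch_stack.append(previous)
        elif ch == ")":
            previous = branch_stack.pop()
        elif ch.isdigit():
            if ch in open_rings:
                start, order = open_rings.pop(ch)
                bonds.append((start, previous, max(order, pending_order)))
            else:
                open_rings[ch] = (previous, pending_order)
            pending_order = 1
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
        i += 1
    return atoms, bonds


# Acetic acid: a methyl carbon, a carbonyl carbon, and two oxygens.
print(parse_smiles("CC(=O)O"))
```

The result is the same atom-and-bond graph that a format such as Chemical Markup Language spells out explicitly, which is why the two can be converted into one another.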

Virtual libraries

Chemical data can pertain to real or virtual molecules. Virtual libraries of compounds
may be generated in various ways to explore chemical space and hypothesize novel
compounds with desired properties.

Virtual libraries of classes of compounds (drugs, natural products, diversity-oriented synthetic products) were recently generated using the FOG (fragment optimized growth) algorithm. This was done by using cheminformatic tools to train transition probabilities of a Markov chain on authentic classes of compounds, and then using the Markov chain to generate novel compounds that were similar to the training database.
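
The general idea of training Markov chain transition probabilities and then sampling from them can be sketched as follows. This is a toy first-order chain over linear sequences of named fragments, not the FOG algorithm itself, which grows molecular structures. The fragment names, the training sequences and the function names are all hypothetical.

```python
import random
from collections import defaultdict

START, END = "^", "$"

# Hypothetical training set: each compound is reduced to a fragment sequence.
TRAINING_SEQUENCES = [
    ["phenyl", "amide", "methyl"],
    ["phenyl", "ester", "ethyl"],
    ["pyridyl", "amide", "ethyl"],
]


def train_transitions(sequences):
    """Estimate first-order transition probabilities from fragment sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for sequence in sequences:
        path = [START, *sequence, END]
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


def generate(transitions, rng, max_length=20):
    """Sample one fragment sequence by walking the chain from START to END."""
    state, result = START, []
    while len(result) < max_length:
        followers = transitions[state]
        state = rng.choices(list(followers), weights=list(followers.values()))[0]
        if state == END:
            break
        result.append(state)
    return result


model = train_transitions(TRAINING_SEQUENCES)
print(generate(model, random.Random()))
```

Because each step depends only on the current fragment, the chain can emit combinations such as pyridyl, amide, methyl that never occur in the training set while still following transitions that do.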

Virtual screening

In contrast to high-throughput screening, virtual screening involves computationally screening in silico libraries of compounds, by means of various methods such as docking, to identify members likely to possess desired properties such as biological activity against a given target. In some cases, combinatorial chemistry is used in the development of the library to increase the efficiency in mining the chemical space. More commonly, a diverse library of small molecules or natural products is screened.
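
Docking requires a three-dimensional model of the target and is beyond a short example. The sketch below instead shows a simpler, ligand-based form of screening: every library member is scored by its Tanimoto similarity to a known active compound, and those above a threshold are kept. The feature sets stand in for structural fingerprints, and the compound names, feature numbers and threshold are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def screen(library, reference, threshold=0.5):
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = ((name, tanimoto(features, reference)) for name, features in library.items())
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Hypothetical fingerprints: each integer marks a structural feature that is present.
KNOWN_ACTIVE = {1, 2, 3, 4, 5}
LIBRARY = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2, 3, 9},
    "mol_c": {7, 8},
}

for name, score in screen(LIBRARY, KNOWN_ACTIVE):
    print(name, round(score, 2))
```

The ranked hits would then go forward to more expensive methods or to experimental testing.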

Quantitative structure-activity relationship (QSAR)

This is the calculation of quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) values, used to predict the activity of compounds from their structures. In this context there is also a strong relationship to chemometrics. Chemical expert systems are also relevant, since they represent parts of chemical knowledge as an in silico representation.
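
In its simplest form, a QSAR model is a regression from a numerical descriptor of the structure to a measured activity. The following minimal sketch fits a straight line by ordinary least squares and uses it to predict the activity of an untested compound. The descriptor values and activities are invented for illustration, and real models usually combine many descriptors with careful validation.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical training data: one structural descriptor per compound
# and the corresponding measured activity.
DESCRIPTOR = [1.0, 2.0, 3.0, 4.0]
ACTIVITY = [5.1, 5.9, 7.1, 7.9]

slope, intercept = fit_line(DESCRIPTOR, ACTIVITY)


def predict(descriptor):
    """Predicted activity for a compound with the given descriptor value."""
    return slope * descriptor + intercept


print(round(predict(2.5), 2))
```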

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 