SMILES arbitrary target specification
SMiles ARbitrary Target Specification (SMARTS) is a language for specifying substructural patterns in molecules. The SMARTS line notation is expressive and allows extremely precise and transparent substructural specification and atom typing.

SMARTS is related to the SMILES line notation that is used to encode molecular structures and, like SMILES, was originally developed by David Weininger and colleagues at Daylight Chemical Information Systems. The most comprehensive descriptions of the SMARTS language can be found in Daylight's SMARTS theory manual, tutorial and examples. OpenEye Scientific Software has developed its own version of SMARTS, which differs from the original Daylight version in how the R descriptor (see Cyclicity below) is defined.

Atomic properties

Atoms can be specified by symbol or atomic number. Aliphatic carbon is matched by [C], aromatic carbon by [c] and any carbon by [#6] or [C,c]. The wild card symbols '*', 'A' and 'a' match any atom, any aliphatic atom and any aromatic atom respectively. Implicit hydrogens are considered to be a characteristic of atoms, and the SMARTS for an amino group can be written as [NH2]. Charge is specified by the descriptors '+' and '-', as exemplified by the SMARTS [nH+] (protonated aromatic nitrogen atom) and [O-]C(=O)c (deprotonated aromatic carboxylic acid).

Bonds

A number of bond types can be specified: '-' (single), '=' (double), '#' (triple), ':' (aromatic) and '~' (any).

Connectivity

The X and D descriptors are used to specify the total numbers of connections (including implicit hydrogen atoms) and connections to explicit atoms. Thus [CX4] matches carbon atoms with bonds to 4 other atoms while [CD4] matches quaternary carbon.
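The difference between X and D can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the atoms are hand-built tuples rather than parsed structures, and the helper name connectivity is hypothetical, not part of any SMARTS toolkit.

```python
def connectivity(explicit_neighbors, implicit_h):
    """Return (D, X) for one atom.

    D counts connections to explicit atoms only; X adds implicit hydrogens.
    """
    d = explicit_neighbors
    x = explicit_neighbors + implicit_h
    return d, x


# Hypothetical hand-built carbons: (label, explicit neighbours, implicit H count)
carbons = [
    ("methane C", 0, 4),
    ("ethane C", 1, 3),
    ("neopentane central C", 4, 0),
    ("ethene C", 1, 2),
]

for label, neighbors, hydrogens in carbons:
    d, x = connectivity(neighbors, hydrogens)
    # [CX4] needs four total connections; [CD4] needs four explicit ones.
    print(f"{label}: D{d} X{x}  [CX4]={x == 4}  [CD4]={d == 4}")
```

In this sketch every saturated carbon satisfies [CX4], but only the central carbon of neopentane also satisfies [CD4].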

Cyclicity

As originally defined by Daylight, the R descriptor is used to specify ring membership. In the Daylight model for cyclic systems, the smallest set of smallest rings (SSSR) is used as a basis for ring membership. For example, indole is perceived as a 5-membered ring fused with a 6-membered ring rather than a 9-membered ring. The two carbon atoms that make up the ring fusion would match [cR2] and the other carbon atoms would match [cR1].

The SSSR model has been criticised by OpenEye who, in their implementation of SMARTS, use R to denote the number of ring bonds for an atom. The two carbon atoms in the ring fusion match [cR3] and the other carbons match [cR2] in the OpenEye implementation of SMARTS. Used without a number, R specifies an atom in a ring in both implementations, for example [CR] (aliphatic carbon atom in ring).

Lower case r specifies the size of the smallest ring of which the atom is a member. The carbon atoms of the ring fusion would both match [cr5]. Bonds can be specified as cyclic, for example C@C matches directly bonded atoms in a ring.
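The indole example can be worked through in a short Python sketch. It assumes the two SSSR rings are supplied by hand (no ring perception is performed), uses conventional indole atom labels as plain strings, and the function names are hypothetical. It computes the Daylight-style R (number of SSSR rings containing the atom), the OpenEye-style R (number of ring bonds on the atom) and r (size of the smallest ring containing the atom).

```python
# SSSR of indole, written out by hand with conventional atom labels.
FIVE_RING = ["N1", "C2", "C3", "C3a", "C7a"]
SIX_RING = ["C3a", "C4", "C5", "C6", "C7", "C7a"]
SSSR = [FIVE_RING, SIX_RING]


def ring_bonds(ring):
    """Return the bonds of a ring as a set of unordered atom pairs."""
    return {
        frozenset((ring[i], ring[(i + 1) % len(ring)]))
        for i in range(len(ring))
    }


def ring_descriptors(atom, rings):
    """Compute R (both definitions) and r for one atom."""
    member_of = [ring for ring in rings if atom in ring]
    all_bonds = set().union(*(ring_bonds(ring) for ring in rings))
    return {
        "R_daylight": len(member_of),
        "R_openeye": sum(1 for bond in all_bonds if atom in bond),
        "r": min(len(ring) for ring in member_of) if member_of else 0,
    }


for atom in ("C3a", "C7a", "C2", "C5"):
    print(atom, ring_descriptors(atom, SSSR))
```

The fusion atoms C3a and C7a come out as R2 under the Daylight definition, R3 under the OpenEye definition and r5, while the remaining carbons are R1 and R2 respectively.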

Logical operators

Four logical operators allow atom and bond descriptors to be combined. The low-priority 'and' operator ';' can be used to define a protonated primary amine as [N;H3;+][C;X4]. The 'or' operator ',' has a higher priority than ';', so [c,n;H] defines (aromatic carbon or aromatic nitrogen) with implicit hydrogen. The 'and' operator '&' has a higher priority than ',', so [c,n&H] defines aromatic carbon or (aromatic nitrogen with implicit hydrogen).

The 'not' operator '!' can be used to define unsaturated aliphatic carbon as [C;!X4] and acyclic bonds as *-!@*.
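The operator priorities can be illustrated with a minimal evaluator for the contents of a single bracket atom. This is a sketch, not a SMARTS parser: it assumes atoms are plain dictionaries with hypothetical keys (symbol, aromatic, h_count, connections, charge), supports only a handful of primitives, and requires every 'and' to be written explicitly (so N;H2 rather than NH2).

```python
import re


def primitive(token):
    """Build a predicate for one primitive from a small, illustrative subset."""
    if token.startswith("!"):
        inner = primitive(token[1:])
        return lambda atom: not inner(atom)
    counted = re.fullmatch(r"([HX])(\d*)", token)
    if counted:
        key = {"H": "h_count", "X": "connections"}[counted.group(1)]
        # A bare H or X is read as a count of exactly one.
        wanted = int(counted.group(2) or 1)
        return lambda atom: atom[key] == wanted
    if token in ("+", "-"):
        wanted = 1 if token == "+" else -1
        return lambda atom: atom["charge"] == wanted
    if len(token) == 1 and token.isalpha():
        # Upper case is aliphatic, lower case is aromatic.
        return lambda atom: (atom["symbol"] == token.upper()
                             and atom["aromatic"] == token.islower())
    raise ValueError(f"unsupported primitive: {token!r}")


def matches(expression, atom):
    """Evaluate an atom expression: ';' binds loosest, then ',', then '&'."""
    return all(
        any(
            all(primitive(p)(atom) for p in conjunction.split("&"))
            for conjunction in low_and_term.split(",")
        )
        for low_and_term in expression.split(";")
    )


# Aromatic carbon without hydrogen, such as a ring-fusion carbon.
fusion_carbon = {"symbol": "C", "aromatic": True, "h_count": 0,
                 "connections": 3, "charge": 0}

print(matches("c,n;H", fusion_carbon))  # (c or n) and H
print(matches("c,n&H", fusion_carbon))  # c or (n and H)
```

Splitting on ';' first, then ',', then '&' is enough to reproduce the priorities because atom expressions have no grouping parentheses outside recursive SMARTS, which this sketch does not handle.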

Recursive SMARTS

Recursive SMARTS allow detailed specification of an atom's environment. For example, the more reactive (with respect to electrophilic aromatic substitution) ortho and para carbon atoms of phenol can be defined as: [$(c1c([OH])cccc1),$(c1ccc([OH])cc1)]
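Because each $(...) environment is itself a SMARTS with its own parentheses, it is easy to mistype one. The following sketch pulls the top-level environments out of a bracket atom and reports unbalanced parentheses. The function name recursive_environments is hypothetical, and the code does only string scanning; it does not validate the chemistry or the inner SMARTS.

```python
def recursive_environments(atom_expression):
    """Return the SMARTS inside each top-level $(...) of a bracket atom."""
    environments = []
    i = 0
    while i < len(atom_expression):
        if atom_expression.startswith("$(", i):
            depth = 1
            start = i + 2
            i = start
            # Scan forward until the opening parenthesis is closed.
            while i < len(atom_expression) and depth:
                if atom_expression[i] == "(":
                    depth += 1
                elif atom_expression[i] == ")":
                    depth -= 1
                i += 1
            if depth:
                raise ValueError("unbalanced parentheses in recursive SMARTS")
            environments.append(atom_expression[start:i - 1])
        else:
            i += 1
    return environments


phenol_ortho_para = "[$(c1c([OH])cccc1),$(c1ccc([OH])cc1)]"
for environment in recursive_environments(phenol_ortho_para):
    print(environment)
```

In each extracted environment the first atom is the one being matched: it sits next to the hydroxyl-bearing carbon in the first pattern (ortho) and opposite it in the second (para).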

Examples of SMARTS

A number of illustrative examples of SMARTS have been assembled by Daylight.

The definitions of hydrogen bond donors and acceptors used to apply Lipinski's Rule of Five are easily coded in SMARTS. Donors are defined as nitrogen or oxygen atoms that have at least one directly bonded hydrogen atom:

[N,n,O;!H0] or [#7,#8;!H0] (aromatic oxygen cannot have a bonded hydrogen)

Acceptors are defined as nitrogen or oxygen:

[N,n,O,o] or [#7,#8]
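The two definitions translate directly into counting rules. The sketch below applies [#7,#8;!H0] and [#7,#8] to a hand-built list of heavy atoms; the (atomic number, hydrogen count) tuple format and the function name are assumptions made for illustration, and paracetamol is used only as a convenient example molecule.

```python
def count_donors_acceptors(atoms):
    """Count Lipinski donors and acceptors.

    atoms: iterable of (atomic_number, total_h_count) pairs for heavy atoms.
    """
    n_or_o = [(z, h) for z, h in atoms if z in (7, 8)]  # [#7,#8]
    donors = sum(1 for _, h in n_or_o if h != 0)        # [#7,#8;!H0]
    acceptors = len(n_or_o)
    return donors, acceptors


# Heavy atoms of paracetamol, CC(=O)Nc1ccc(O)cc1, entered by hand.
paracetamol = [
    (6, 3), (6, 0), (8, 0), (7, 1),          # CH3, C=O carbon, =O, NH
    (6, 0), (6, 1), (6, 1), (6, 0), (8, 1),  # ring C, CH, CH, C-OH carbon, OH
    (6, 1), (6, 1),                          # remaining ring CH atoms
]

donors, acceptors = count_donors_acceptors(paracetamol)
print(f"donors={donors} acceptors={acceptors}")
```

Every donor is also counted as an acceptor here, which follows from the acceptor pattern being a superset of the donor pattern.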

A simple definition of aliphatic amines that are likely to protonate at physiological pH can be written as the following recursive SMARTS:

[$([NH2][CX4]),$([NH]([CX4])[CX4]),$([NX3]([CX4])([CX4])[CX4])]

In real applications the CX4 atoms would need to be defined more precisely to prevent matching against electron-withdrawing groups such as CF3 that would render the amine insufficiently basic to protonate at physiological pH.

SMARTS can be used to encode pharmacophore elements such as anionic centers. In the following example, recursive SMARTS notation is used to combine acid oxygen and tetrazole nitrogen in a definition of atoms that are likely to be anionic under normal physiological conditions.

[$([OH][C,S,P]=O),$([nH]1nnnc1)]

The SMARTS above would only match the acid hydroxyl and the tetrazole NH. When a carboxylic acid deprotonates, the negative charge is delocalised over both oxygen atoms and it may be desirable to designate both as anionic. This can be achieved using the following SMARTS.

[$([OH]C=O),$(O=C[OH])]

Applications of SMARTS

The precise and transparent substructural specification that SMARTS allows has been exploited in a number of applications.

Substructural filters defined in SMARTS have been used to identify undesirable compounds when performing strategic pooling of compounds for high-throughput screening. The REOS (rapid elimination of swill) procedure uses SMARTS to filter out reactive, toxic and otherwise undesirable moieties from databases of chemical structures.

RECAP (Retrosynthetic Combinatorial Analysis Procedure) uses SMARTS to define bond types. RECAP is a molecule editor which generates fragments of structures by breaking bonds of defined types, and the original link points in these are specified using isotopic labels. Searching databases of biologically active compounds for occurrences of fragments allows privileged structural motifs to be identified. The Molecular Slicer is similar to RECAP and has been used to identify fragments that are commonly found in marketed oral drugs.

The Leatherface program is a general purpose molecule editor which allows automated modification of a number of substructural features of molecules in databases, including protonation state, hydrogen count, formal charge, isotopic weight and bond order. The molecular editing rules used by Leatherface are defined in SMARTS. Leatherface can be used to standardise tautomeric and ionization states and to set and enumerate these in preparation of databases for virtual screening. Leatherface has been used in Matched Molecular Pair Analysis, which enables the effects of structural changes (e.g. substitution of hydrogen with chlorine) to be quantified over a range of structural types.

ALADDIN is a pharmacophore matching program that uses SMARTS to define recognition points (e.g. neutral hydrogen bond acceptor) of pharmacophores. A key problem in pharmacophore matching is that functional groups that are likely to be ionised at physiological pH are typically registered in their neutral forms in structural databases. The ROCS shape matching program allows atom types to be defined using SMARTS.

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 