Simplified molecular input line entry specification
The simplified molecular-input line-entry specification, or SMILES, is a specification in the form of a line notation for describing the structure of chemical molecules using short ASCII strings. SMILES strings can be imported by most molecule editors for conversion back into two-dimensional drawings or three-dimensional models of the molecules.

The original SMILES specification was developed by Arthur Weininger and David Weininger in the late 1980s. It has since been modified and extended by others, most notably by Daylight Chemical Information Systems Inc. In 2007, an open standard called "OpenSMILES" was developed by the Blue Obelisk open-source chemistry community. Other 'linear' notations include the Wiswesser Line Notation (WLN), ROSDAL and SLN (Tripos Inc).

In July 2006, the IUPAC introduced the InChI as a standard for formula representation. SMILES is generally considered to have the advantage of being slightly more human-readable than InChI; it also has a wide base of software support with extensive theoretical (e.g., graph theory) backing.

Terminology

The term SMILES refers to a line notation for encoding molecular structures and specific instances should strictly be called SMILES strings. However, the term SMILES is also commonly used to refer to both a single SMILES string and a number of SMILES strings; the exact meaning is usually apparent from the context. The terms Canonical and Isomeric can lead to some confusion when applied to SMILES. The terms describe different attributes of SMILES strings and are not mutually exclusive.

Typically, a number of equally valid SMILES can be written for a molecule. For example, CCO, OCC and C(O)C all specify the structure of ethanol. Algorithms have been developed to ensure the same SMILES is generated for a molecule regardless of the order of atoms in the structure. This SMILES is unique for each structure, although dependent on the canonicalisation algorithm used to generate it, and is termed the Canonical SMILES. These algorithms first convert the SMILES to an internal representation of the molecular structure and do not simply manipulate strings as is sometimes thought. Various algorithms for generating Canonical SMILES have been developed, including those by Daylight Chemical Information Systems, OpenEye Scientific Software, MEDIT, Chemical Computing Group, MolSoft LLC, and the Chemistry Development Kit. A common application of Canonical SMILES is indexing and ensuring uniqueness of molecules in a database.

SMILES notation allows the specification of configuration at tetrahedral centers, and double bond geometry. These are structural features that cannot be specified by connectivity alone and SMILES which encode this information are termed Isomeric SMILES. A notable feature of these rules is that they allow rigorous partial specification of chirality. The term Isomeric SMILES is also applied to SMILES in which isotopes are specified.

Graph-based definition

In terms of a graph-based computational procedure, SMILES is a string obtained by printing the symbol nodes encountered in a depth-first tree traversal of a chemical graph. The chemical graph is first trimmed to remove hydrogen atoms and cycles are broken to turn it into a spanning tree. Where cycles have been broken, numeric suffix labels are included to indicate the connected nodes. Parentheses are used to indicate points of branching on the tree.
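The procedure can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the graph is already hydrogen-suppressed, every bond is treated as single, atoms are in the organic subset, and the function name write_smiles is hypothetical. The output is a valid SMILES for the graph but is not canonical.

```python
def write_smiles(symbols, bonds, start=0):
    """Write a non-canonical SMILES string by depth-first traversal.

    symbols: element symbols, one per heavy atom (hydrogens already removed).
    bonds:   (i, j) index pairs; every bond is treated as a single bond.
    """
    neighbours = {i: [] for i in range(len(symbols))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    # Pass 1: build a spanning tree and collect the bonds left out of it.
    visited, children, closures = set(), {}, []

    def explore(atom, parent):
        visited.add(atom)
        children[atom] = []
        for nxt in neighbours[atom]:
            if nxt == parent:
                continue
            if nxt in visited:
                # A bond to an already visited atom closes a cycle.
                if (nxt, atom) not in closures:
                    closures.append((atom, nxt))
            else:
                children[atom].append(nxt)
                explore(nxt, atom)

    explore(start, None)

    # Pass 2: give both ends of each broken bond the same numeric label.
    labels = {i: [] for i in neighbours}
    for number, (a, b) in enumerate(closures, start=1):
        text = str(number) if number < 10 else "%" + str(number)
        labels[a].append(text)
        labels[b].append(text)

    # Pass 3: print the tree; every child except the last is a branch.
    def emit(atom):
        out = symbols[atom] + "".join(labels[atom])
        kids = children[atom]
        for kid in kids[:-1]:
            out += "(" + emit(kid) + ")"
        if kids:
            out += emit(kids[-1])
        return out

    return emit(start)
```

Starting the traversal from a different atom gives a different but equally valid string, which is why a separate canonicalisation step is needed when a unique SMILES is required.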

Atoms

Atoms are represented by the standard abbreviation of the chemical elements, in square brackets, such as [Au] for gold. Brackets can be omitted for the "organic subset" of B, C, N, O, P, S, F, Cl, Br, and I. All other elements must be enclosed in brackets. If the brackets are omitted, the proper number of implicit hydrogen atoms is assumed; for instance the SMILES for water is simply O.
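The implicit hydrogen rule can be illustrated with a small sketch. The table of default valences below is an assumption taken from the commonly documented Daylight/OpenSMILES convention rather than from this article, and the function name implicit_hydrogens is hypothetical.

```python
# Assumed default valences for the organic subset (Daylight/OpenSMILES convention).
DEFAULT_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,),
    "P": (3, 5), "S": (2, 4, 6),
    "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(symbol, bond_order_sum):
    """Hydrogens implied for an unbracketed atom.

    bond_order_sum is the total order of the bonds written explicitly
    in the SMILES (single = 1, double = 2, triple = 3).
    """
    for valence in DEFAULT_VALENCES[symbol]:
        # Fill up to the lowest normal valence that fits the explicit bonds.
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # explicit bonds already exceed every normal valence


# Water is written as a lone O: no explicit bonds, so two hydrogens are implied.
print(implicit_hydrogens("O", 0))
```

For ethanol written as CCO, the same rule gives three hydrogens on the first carbon, two on the second and one on the oxygen.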

An atom carrying one or more electrical charges is always enclosed in brackets. Inside the brackets, the element symbol is followed by the symbol H if the atom is bonded to one or more hydrogen atoms (with the number of hydrogens appended unless there is only one, e.g. NH4 for ammonium), then by the sign '+' for a positive charge or '-' for a negative charge. The number of charges is specified after the sign (unless there is only one); alternatively, the sign can be repeated as many times as the ion has charges: instead of "Ti+4", one can also write "Ti++++" (titanium IV, Ti4+). Thus, the hydroxide anion is represented by [OH-], the oxonium cation is [OH3+] and the cobalt III cation (Co3+) is either [Co+3] or [Co+++].
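As a rough illustration of these rules, the following sketch reads the element symbol, hydrogen count and charge from a bracket atom. It deliberately ignores isotopes and chirality marks, and the names BRACKET_ATOM and parse_bracket_atom are hypothetical.

```python
import re

BRACKET_ATOM = re.compile(
    r"\[(?P<symbol>[A-Z][a-z]?|[a-z]{1,2})"   # element symbol, e.g. O, Co, n
    r"(?P<hydrogens>H\d*)?"                    # optional H, H2, H3, ...
    r"(?P<signs>\++|-+)?(?P<count>\d+)?\]"     # optional charge: +, +++, +3, -
)


def parse_bracket_atom(text):
    """Return (symbol, hydrogen_count, charge) for a simple bracket atom."""
    match = BRACKET_ATOM.fullmatch(text)
    if match is None:
        raise ValueError(f"not a supported bracket atom: {text!r}")

    h = match["hydrogens"]
    # "H" alone means one hydrogen; "H3" means three.
    hydrogens = 0 if h is None else int(h[1:] or 1)

    signs, count = match["signs"], match["count"]
    if signs is None:
        if count is not None:
            raise ValueError(f"charge count without a sign: {text!r}")
        charge = 0
    else:
        # Either a number follows the sign, or the sign itself is repeated.
        magnitude = int(count) if count else len(signs)
        charge = magnitude if signs[0] == "+" else -magnitude
    return match["symbol"], hydrogens, charge


for atom in ["[OH-]", "[OH3+]", "[Co+3]", "[Co+++]", "[NH4+]"]:
    print(atom, parse_bracket_atom(atom))
```

Both spellings of the cobalt III cation yield the same charge of +3, which is the equivalence described above.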

Bonds

Bonds between aliphatic atoms are assumed to be single unless specified otherwise and are implied by adjacency in the SMILES. For example, the SMILES for ethanol can be written as CCO. Ring closure labels are used to indicate connectivity between non-adjacent atoms in the SMILES, which for cyclohexane and dioxane can be written as C1CCCCC1 and O1CCOCC1 respectively. For a second ring, the label will be 2 (naphthalene: c1cccc2c1cccc2), and so on. Labels above 9 must be preceded by a '%' to distinguish them from two separate single-digit labels on the same atom (~C12~ means that the carbon atom holds the ring closure labels 1 and 2, whereas ~C%12~ indicates a single label, 12). Double, triple, and quadruple bonds are represented by the symbols '=', '#', and '$' respectively, as illustrated by the SMILES O=C=O (carbon dioxide), C#N (hydrogen cyanide), and [Ga-]$[As+] (gallium arsenide).
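The difference between single-digit and '%'-prefixed ring closure labels can be made concrete with a short sketch. It assumes well-formed input, skips anything inside square brackets (where digits mean something else), and uses the hypothetical name ring_closure_labels.

```python
def ring_closure_labels(smiles):
    """List the ring closure labels of a SMILES string in order of appearance."""
    labels = []
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Digits inside a bracket atom are not ring labels; skip the atom.
            i = smiles.index("]", i) + 1
        elif ch == "%":
            # '%' introduces a two-digit label such as %12.
            labels.append(int(smiles[i + 1:i + 3]))
            i += 3
        elif ch.isdigit():
            # Every bare digit is a separate single-digit label.
            labels.append(int(ch))
            i += 1
        else:
            i += 1
    return labels


print(ring_closure_labels("C12"))    # two labels on one atom
print(ring_closure_labels("C%12"))   # one two-digit label
```

In a complete SMILES string each label appears in pairs, one occurrence on each of the two atoms joined by the ring closure bond.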

Aromaticity

Aromatic C, O, S and N atoms are shown in their lower case 'c', 'o', 's' and 'n' respectively. Benzene, pyridine and furan can be represented respectively by the SMILES c1ccccc1, n1ccccc1 and o1cccc1. Bonds between aromatic atoms are, by default, aromatic although these can be specified explicitly using the ':' symbol. Aromatic atoms can be singly bonded to each other and biphenyl can be represented by c1ccccc1-c2ccccc2. Aromatic nitrogen bonded to hydrogen, as found in pyrrole, must be represented as [nH] and imidazole is written in SMILES notation as n1c[nH]cc1.

The Daylight and OpenEye algorithms for generating canonical SMILES differ in their treatment of aromaticity.

Branching

Branches are described with parentheses, as in CCC(=O)O for propionic acid and C(F)(F)F for fluoroform. Substituted rings can be written with the branching point in the ring as illustrated by the SMILES COc(c1)cccc1C#N (see depiction) and COc(cc1)ccc1C#N (see depiction), which encode the 3- and 4-cyanoanisole isomers. Writing SMILES for substituted rings in this way can make them more human-readable.
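One way to see how parentheses work is to read a SMILES string with a stack: an opening parenthesis remembers the current atom and a closing parenthesis returns to it. The sketch below is a simplified reader for bracket-free SMILES only; the names TOKEN and parse_bonds are hypothetical, and a bond symbol written at the opening of a ring closure is ignored.

```python
import re

# Atoms of the organic subset, aromatic atoms, punctuation, and ring labels.
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]|[()=#$:/\\-]|%\d\d|\d")


def parse_bonds(smiles):
    """Return (atoms, bonds) for a bracket-free SMILES string.

    bonds holds (i, j, symbol) tuples; symbol is "" for an implicit bond.
    """
    atoms, bonds = [], []
    branch_stack = []   # atoms to return to when a ")" is reached
    open_rings = {}     # ring label -> index of the atom that opened it
    previous, pending = None, ""
    position = 0
    for match in TOKEN.finditer(smiles):
        if match.start() != position:
            raise ValueError(f"unsupported character at {position}")
        position = match.end()
        token = match.group()
        if token == "(":
            branch_stack.append(previous)
        elif token == ")":
            previous = branch_stack.pop()
        elif token in "=#$:/\\-":
            pending = token
        elif token[0].isdigit() or token[0] == "%":
            label = int(token.lstrip("%"))
            if label in open_rings:
                bonds.append((open_rings.pop(label), previous, pending))
            else:
                open_rings[label] = previous
            pending = ""
        else:
            atoms.append(token)
            if previous is not None:
                bonds.append((previous, len(atoms) - 1, pending))
            previous, pending = len(atoms) - 1, ""
    if position != len(smiles):
        raise ValueError(f"unsupported character at {position}")
    return atoms, bonds


print(parse_bonds("CCC(=O)O"))
print(parse_bonds("C(F)(F)F"))
```

For C(F)(F)F all three fluorine atoms end up bonded to atom 0, because each closing parenthesis returns the reader to the carbon.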

Stereochemistry

Configuration around double bonds is specified using the characters "/" and "\". For example, F/C=C/F (see depiction) is one representation of trans-difluoroethene, in which the fluorine atoms are on opposite sides of the double bond, whereas F/C=C\F (see depiction) is one possible representation of cis-difluoroethene, in which the Fs are on the same side of the double bond, as shown in the figure.

Configuration at tetrahedral carbon is specified by @ or @@. L-Alanine, the more common enantiomer of the amino acid alanine can be written as N[C@@H](C)C(=O)O (see depiction). The @@ specifier indicates that, when viewed from nitrogen along the bond to the chiral center, the sequence of substituents hydrogen (H), methyl (C) and carboxylate (C(=O)O) appear clockwise. D-Alanine can be written as N[C@H](C)C(=O)O (see depiction). The order of the substituents in the SMILES string is very important and D-alanine can also be encoded as N[C@@H](C(=O)O)C (see depiction).
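The dependence on substituent order follows a parity rule: exchanging two neighbours of a tetrahedral center (an odd permutation) switches between @ and @@, while an even permutation leaves the mark unchanged. A minimal sketch of that rule, using the hypothetical names permutation_is_even and reorder_chirality and plain text labels for the neighbours:

```python
def permutation_is_even(old_order, new_order):
    """True if new_order is an even permutation of old_order."""
    positions = [old_order.index(item) for item in new_order]
    # Count pairs that appear in the opposite relative order (inversions).
    inversions = sum(
        1
        for i in range(len(positions))
        for j in range(i + 1, len(positions))
        if positions[i] > positions[j]
    )
    return inversions % 2 == 0


def reorder_chirality(mark, old_order, new_order):
    """Chirality mark describing the same center with neighbours reordered."""
    if permutation_is_even(old_order, new_order):
        return mark
    return "@" if mark == "@@" else "@@"


# Neighbours of the alanine center as written in N[C@@H](C)C(=O)O.
written = ["N", "H", "CH3", "COOH"]
swapped = ["N", "H", "COOH", "CH3"]   # order used in N[C...H](C(=O)O)C
print(reorder_chirality("@@", written, swapped))
```

Applied to D-alanine, N[C@H](C)C(=O)O, the same swap turns @ into @@, which is why N[C@@H](C(=O)O)C also encodes D-alanine.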

Isotopes

Isotopes are specified with a number equal to the integer isotopic mass preceding the atomic symbol. Benzene in which one atom is carbon-14 is written as [14c]1ccccc1 and deuterochloroform is [2H]C(Cl)(Cl)Cl.

Application on some molecules

Molecule | Structure | SMILES
Dinitrogen | N≡N | N#N
Methyl isocyanate (MIC) | CH3–N=C=O | CN=C=O
Copper(II) sulfate | Cu2+ SO42- | [Cu+2].[O-]S(=O)(=O)[O-]
Œnanthotoxin (C17H22O2) | | CCC[C@@H](O)CC\C=C\C=C\C#CC#C\C=C\CO
Pyrethrin II (C22H28O5) | | COC(=O)C(\C)=C\C1C(C)(C)[C@H]1C(=O)O[C@@H]2C(C)=C(C(=O)C2)CC=CC=C
Aflatoxin B1 (C17H12O6) | | O1C=C[C@H]([C@H]1O2)c3c2cc(OC)c4c3OC(=O)C5=C4CCC(=O)5
Glucose (glucopyranose) (C6H12O6) | | OC[C@@H](O1)[C@@H](O)[C@H](O)[C@@H](O)[C@@H](O)1
Bergenin (cuscutin) (a resin) (C14H16O9) | | OC[C@@H](O1)[C@@H](O)[C@H](O)[C@@H]2[C@@H]1c3c(O)c(OC)c(O)cc3C(=O)O2
A pheromone of the Californian scale insect | | CC(=O)OCCC(/C)=C\C[C@H](C(C)=C)CCC=C
2S,5R-Chalcogran: a pheromone of the bark beetle Pityogenes chalcographus | | CC[C@H](O1)CC[C@@]12CCCO2
Vanillin | | O=Cc1ccc(O)c(OC)c1
Melatonin (C13H16N2O2) | | CC(=O)NCCC1=CNc2c1cc(OC)cc2
Flavopereirin (C17H15N2) | | CCc(c1)ccc2[n+]1ccc3c2Nc4c3cccc4
Nicotine (C10H14N2) | | CN1CCC[C@H]1c2cccnc2
Alpha-thujone (C10H16O) | | CC(C)[C@@]12C[C@@H]1[C@@H](C)C(=O)C2
Thiamin (vitamin B1) (C12H17N4OS+) | | OCCc1c(C)[n+](=cs1)Cc2cnc(C)nc(N)2


Illustration with a molecule with more than 9 rings: Cephalostatin-1 (a steroidal trisdecacyclic pyrazine with the empirical formula C54H74N2O10, isolated from the Indian Ocean hemichordate Cephalodiscus gilchristi).
Starting from the left-most methyl group in the figure, this gives:

C[C@@](C)(O1)C[C@@H](O)[C@@]1(O2)[C@@H](C)[C@@H]3CC=C4[C@]3(C2)C(=O)C[C@H]5[C@H]4CC[C@@H](C6)[C@]5(C)Cc(n7)c6nc(C[C@@]89(C))c7C[C@@H]8CC[C@@H]%10[C@@H]9C[C@@H](O)[C@@]%11(C)C%10=C[C@H](O%12)[C@]%11(O)[C@H](C)[C@]%12(O%13)[C@H](O)C[C@@]%13(C)CO

(Note the '%' in front of the ring closure labels above 9; see the section "Bonds" above.)

Other examples of SMILES

The SMILES notation is described extensively in the SMILES theory manual provided by Daylight Chemical Information Systems and a number of illustrative examples are presented. Daylight's depict utility provides users with the means to check their own examples of SMILES and is a valuable educational tool.

Extensions

SMARTS is a line notation for specification of substructural patterns in molecules. While it uses many of the same symbols as SMILES, it also allows specification of wildcard atoms and bonds, which can be used to define substructural queries for chemical database searching. One common misconception is that SMARTS-based substructural searching involves matching of SMILES and SMARTS strings. In fact, both SMILES and SMARTS strings are first converted to internal graph representations which are searched for subgraph isomorphism. SMIRKS is a line notation for specifying reaction transforms.

Conversion

SMILES can be converted back to 2-dimensional representations using Structure Diagram Generation algorithms (Helson, 1999). This conversion is not always unambiguous. Conversion to 3-dimensional representation is achieved by energy minimization approaches. There are many downloadable and web-based conversion utilities.

See also

  • SMILES arbitrary target specification (SMARTS), a language for specification of substructural queries
  • SYBYL Line Notation (another line notation)
  • Molecular Query Language – query language allowing also numerical properties, e.g. physicochemical values or distances
  • Chemistry Development Kit (2D layout and conversion)
  • International Chemical Identifier (InChI), the free and open alternative to SMILES by the IUPAC
  • OpenBabel, JOELib, OELib (conversion)

SMILES related software utilities

  • NCI/CADD Chemical Identifier Resolver – resolves or generates SMILES from chemical names, CAS Registry Numbers, InChI/InChIKey and many other chemical structure file formats
  • NCI/CADD Online SMILES Translator and Structure File Generator – Java online molecule editor
  • PubChem server side structure editor – online molecule editor
  • smi23d – 3D Coordinate Generation
  • Daylight Depict – Translate a SMILES formula into graphics
  • GIF/PNG-Creator for 2D Plots of Chemical Structures
  • JME molecule editor
  • ACD/ChemSketch
  • Marvin by ChemAxon – online chemical editor/viewer and SMILES generator/converter
  • Instant JChem by ChemAxon – desktop application for storing/generating/converting/visualizing/searching SMILES structures, particularly batch processing; personal edition free
  • JChem for Excel by ChemAxon – MS Excel add-in for storing/generating/converting/visualizing/searching SMILES structures
  • Smormo-Ed – a molecule editor for Linux which can read and write SMILES
  • InChI.info – an unofficial InChI website featuring on-line converter from InChI and SMILES to molecular drawings
  • Balloon – A free program for 3D coordinate generation and conformational analysis.
  • Indigo – an open-source cross-platform cheminformatics library with a plugin for IUPAC-compliant molecule and reaction 2D structural formula rendering.
  • Open Babel – an open-source chemical toolbox allowing anyone to search, convert, analyze, or store biochemical data.
  • Bioclipse – a free and open source workbench for the life sciences
  • MolEngine – A .NET cheminformatics toolkit to read/write SMILES, generate 2D coordinates from SMILES, and convert SMILES from/into other chemical file formats.
  • JSDraw – A cross-platform javascript chemical structure editor to generate SMILES and SMARTS.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 