SMILES

SMILES (Simplified Molecular Input Line Entry System) is a notation for describing the structure of chemicals using ASCII alphanumeric strings. It was originally invented to allow the input and output of chemical information on early character-based computer terminals. The SMILES specification was developed by David Weininger in the late 1980s. It has since been modified and extended by others, most notably by Daylight Chemical Information Systems Inc. Other 'linear' notations include the Wiswesser Line Notation (WLN), ROSDAL and SLN (Tripos Inc).

Graph based definition

In terms of a graph-based computational procedure, a SMILES string is obtained by printing the symbols of the nodes encountered in a depth-first traversal of a chemical graph. The chemical graph is first trimmed to remove hydrogen atoms, and cycles are broken to turn it into a spanning tree. Where cycles have been broken, numeric suffix labels are included to indicate the connected nodes. Parentheses are used to indicate points of branching on the tree.
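
The procedure can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name write_smiles and the input format (a list of element symbols plus a dictionary of bonds) are assumptions made for the example. It further assumes a connected, hydrogen-trimmed graph, atoms that may be written without square brackets, and fewer than ten rings.

```python
import itertools

BOND_SYMBOL = {1: "", 2: "=", 3: "#"}  # single bonds are left implicit

def write_smiles(atoms, bonds, root=0):
    """atoms: list of element symbols; bonds: {(i, j): bond order}.
    Hypothetical helper: assumes a connected graph and fewer than 10 rings."""
    adj = {i: [] for i in range(len(atoms))}
    for (i, j), order in bonds.items():
        adj[i].append((j, order))
        adj[j].append((i, order))

    children = {i: [] for i in adj}   # spanning-tree edges
    closures = {i: "" for i in adj}   # ring-closure labels per atom
    visited, seen_edges = set(), set()
    digits = itertools.count(1)

    def explore(a):
        # Pass 1: depth-first search; edges leading back to an already
        # visited atom are the "broken" cycle bonds and get a numeric label.
        visited.add(a)
        for b, order in adj[a]:
            edge = frozenset((a, b))
            if edge in seen_edges:
                continue
            seen_edges.add(edge)
            if b in visited:
                d = str(next(digits))
                closures[b] += d
                closures[a] += BOND_SYMBOL[order] + d
            else:
                children[a].append((b, order))
                explore(b)

    def emit(a):
        # Pass 2: print the tree; every child but the last is a branch.
        s = atoms[a] + closures[a]
        kids = children[a]
        for b, order in kids[:-1]:
            s += "(" + BOND_SYMBOL[order] + emit(b) + ")"
        if kids:
            b, order = kids[-1]
            s += BOND_SYMBOL[order] + emit(b)
        return s

    explore(root)
    return emit(root)

cyclohexane = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 4): 1, (4, 5): 1, (5, 0): 1}
print(write_smiles(["C"] * 6, cyclohexane))  # C1CCCCC1
```

The bond from atom 5 back to atom 0 is the one that is broken to form the spanning tree, which is why both of those atoms carry the label 1 in the output.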

Examples

Atoms are represented by the standard symbols of the chemical elements, in square brackets, such as [Au] for gold. The hydroxide anion is [OH-]. For elements of the common organic subset (B, C, N, O, P, S, F, Cl, Br and I) the brackets may be omitted, in which case the proper number of implicit hydrogen atoms is assumed; for instance, the SMILES for water is simply O and that for ethanol is CCO. Double bonds are written with = and triple bonds with #: carbon dioxide is O=C=O and hydrogen cyanide is C#N. Cyclohexane is represented as C1CCCCC1, the idea being that the two 1s mark the two atoms joined by the ring-closing bond, thus forming a ring of six carbons. Branches are described with parentheses, as in CCC(=O)O for propionic acid and FC(F)F, or alternatively C(F)(F)F, for fluoroform.
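
To make the implicit-hydrogen rule concrete, the following sketch reads the bracket-free examples above and counts the hydrogens each one implies. It is an illustration under stated assumptions: the names parse and hydrogen_count are invented for the example, only unbracketed atoms, the bond symbols -, = and #, parentheses and single-digit ring closures are handled, and the valence table is the usual set of default valences for the organic subset.

```python
import re

# Default valences assumed for atoms written without brackets.
VALENCES = {"B": [3], "C": [4], "N": [3, 5], "O": [2], "P": [3, 5],
            "S": [2, 4, 6], "F": [1], "Cl": [1], "Br": [1], "I": [1]}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#()]|\d")

def parse(smiles):
    """Return (atom symbols, total bond order at each atom)."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("unsupported SMILES feature in %r" % smiles)
    atoms, bond_sum = [], []
    stack = []       # atoms at which open branches started
    ring_open = {}   # ring digit -> (atom index, bond order or None)
    prev, pending = None, None
    for tok in tokens:
        if tok in BOND_ORDER:
            pending = BOND_ORDER[tok]
        elif tok == "(":
            stack.append(prev)
        elif tok == ")":
            prev = stack.pop()
        elif tok.isdigit():
            if tok in ring_open:   # second occurrence closes the ring
                other, other_order = ring_open.pop(tok)
                order = pending or other_order or 1
                bond_sum[prev] += order
                bond_sum[other] += order
            else:                  # first occurrence opens it
                ring_open[tok] = (prev, pending)
            pending = None
        else:                      # an atom symbol
            atoms.append(tok)
            bond_sum.append(0)
            cur = len(atoms) - 1
            if prev is not None:
                order = pending or 1
                bond_sum[prev] += order
                bond_sum[cur] += order
            prev, pending = cur, None
    return atoms, bond_sum

def hydrogen_count(smiles):
    """Total implicit hydrogens: fill each atom up to its lowest fitting valence."""
    atoms, bond_sum = parse(smiles)
    total = 0
    for symbol, used in zip(atoms, bond_sum):
        fitting = [v for v in VALENCES[symbol] if v >= used]
        total += fitting[0] - used if fitting else 0
    return total

for s in ["O", "CCO", "O=C=O", "C#N", "C1CCCCC1", "CCC(=O)O", "FC(F)F"]:
    print(s, hydrogen_count(s))
```

For the examples in the text this gives 2 hydrogens for water, 6 for ethanol, 0 for carbon dioxide, 1 for hydrogen cyanide, 12 for cyclohexane, 6 for propionic acid and 1 for fluoroform. Bracketed atoms such as [Au] or [OH-] are rejected by this sketch rather than interpreted.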

Extensions

SMARTS is a modification of SMILES that allows, in addition to the SMILES elements, the specification of wildcard atoms and bonds. It is used to specify search structures and is widely used in chemical database search applications. This practice has led to a common misconception that chemical substructure search is achieved computationally by matching SMILES/SMARTS strings, when in fact it is achieved by the computationally more intensive search for subgraph isomorphism in the graphs reconstructed from the SMILES representations.
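
The difference between string matching and graph matching can be illustrated with a small backtracking search. This is a simplified sketch, not how any particular toolkit implements substructure search: the function find_match and the graph format (a list of element symbols plus a dictionary of bond orders) are assumptions for the example, and the only SMARTS feature modelled is the wildcard atom "*".

```python
def find_match(q_atoms, q_bonds, t_atoms, t_bonds):
    """Map each query atom to a distinct target atom so that atom labels
    and all query bonds are preserved. Returns a list (query index ->
    target index) or None. '*' in the query matches any element."""
    def bond(bonds, i, j):
        return bonds.get((i, j)) or bonds.get((j, i))

    def extend(mapping):
        q = len(mapping)                  # next query atom to place
        if q == len(q_atoms):
            return mapping
        for t in range(len(t_atoms)):
            if t in mapping:
                continue
            if q_atoms[q] not in ("*", t_atoms[t]):
                continue
            # Every query bond to an already placed atom must exist
            # in the target with the same bond order.
            if all(bond(q_bonds, q, p) is None
                   or bond(q_bonds, q, p) == bond(t_bonds, t, mapping[p])
                   for p in range(q)):
                result = extend(mapping + [t])
                if result is not None:
                    return result
        return None                       # backtrack

    return extend([])

# Query: carboxylic acid group, written C(=O)O in SMILES.
carboxyl = (["C", "O", "O"], {(0, 1): 2, (0, 2): 1})
# Target: propionic acid, CCC(=O)O.
acid = (["C", "C", "C", "O", "O"], {(0, 1): 1, (1, 2): 1, (2, 3): 2, (2, 4): 1})

print(find_match(*carboxyl, *acid))       # [2, 3, 4]
# The same molecule can also be written OC(CC)=O; a plain string search
# for the query text fails there even though the graphs match.
print("C(=O)O" in "OC(CC)=O")             # False
```

The worst-case cost of this search grows exponentially with the size of the query, which is why it is more expensive than comparing strings.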

Since SMILES is generated by tree-traversal, the string can vary depending on the root node chosen as well as the order in which nodes are encountered. A unique or 'canonical' form of the SMILES representation can be generated by applying rules to preprocess the tree before tree-traversal. A common application of unique SMILES is for exact matching of two structures and also for ensuring uniqueness among molecules in a database.
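
As an illustration of the idea, the sketch below produces a unique string for molecules without rings by sorting the branches at every atom and then taking the smallest string over all possible root atoms. This is a deliberately simple stand-in technique chosen for the example, not the algorithm used by Daylight or any other toolkit; practical canonicalizers also have to handle rings. The function name canonical_acyclic_smiles and the input format are assumptions.

```python
BOND_SYMBOL = {1: "", 2: "=", 3: "#"}

def canonical_acyclic_smiles(atoms, bonds):
    """atoms: list of element symbols; bonds: {(i, j): bond order}.
    Assumes a connected graph with no rings (hypothetical helper)."""
    adj = {i: [] for i in range(len(atoms))}
    for (i, j), order in bonds.items():
        adj[i].append((j, order))
        adj[j].append((i, order))

    def emit(a, parent):
        # Sort the strings of all subtrees so that the result does not
        # depend on the order in which neighbours happen to be stored.
        subtrees = sorted(BOND_SYMBOL[o] + emit(b, a)
                          for b, o in adj[a] if b != parent)
        s = atoms[a]
        for sub in subtrees[:-1]:
            s += "(" + sub + ")"
        if subtrees:
            s += subtrees[-1]
        return s

    # Try every atom as root and keep the lexicographically smallest string.
    return min(emit(root, None) for root in adj)

# Propionic acid with two different atom numberings.
a = canonical_acyclic_smiles(["C", "C", "C", "O", "O"],
                             {(0, 1): 1, (1, 2): 1, (2, 3): 2, (2, 4): 1})
b = canonical_acyclic_smiles(["O", "C", "O", "C", "C"],
                             {(0, 1): 1, (1, 2): 2, (1, 3): 1, (3, 4): 1})
print(a, b, a == b)
```

Both numberings yield the same string, so an exact-match lookup reduces to a string comparison. The result is not the most readable SMILES for the molecule; a canonical form only needs to be reproducible.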

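Important enhancements to SMILES include extensions to store information on stereochemistry, such as the / and \ bond symbols for the configuration around double bonds and the @ and @@ atom markers for tetrahedral centres.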

This article is from Wikipedia. All text is available under the terms of the GNU Free Documentation License.