SMILES

Simplified Molecular Input Line Entry System

Updated May 2, 2026

The Simplified Molecular Input Line Entry System (SMILES) is an open chemical notation that represents the structure of a molecule as an ASCII string. Developed in the late 1980s by David Weininger at Daylight Chemical Information Systems and now maintained as a public specification through the OpenSMILES project, SMILES uses letters for atoms, digits for ring closures, and a small set of symbols for bond types and stereochemistry. SMILES is human-readable for small molecules, compact, and the de facto input format for cheminformatics software. A SMILES string converts to and from a molecular graph (atoms, bonds, charges, and any stereochemistry written into the string) without loss, although it carries no 2D or 3D coordinates. It can also be canonicalised so that, within a given toolkit, the same molecule always produces the same string.

What a SMILES looks like

Substance	SMILES
Water	`O`
Methane	`C`
Ethanol	`CCO`
Acetic acid	`CC(=O)O`
Benzene	`c1ccccc1`
Caffeine	`CN1C=NC2=C1C(=O)N(C(=O)N2C)C`
Sodium chloride	`[Na+].[Cl-]`
Sulphuric acid	`OS(=O)(=O)O`
Sodium hydroxide	`[Na+].[OH-]`

The notation is compact: methane and water are one letter each, and ethanol is three. Strings grow with the molecule but stay human-readable for structures of up to roughly 30 to 50 atoms.

SMILES syntax in brief

Element	Syntax
Atoms	Capital letters for organic-subset atoms (`B`, `C`, `N`, `O`, `S`, `P`, `F`, `Cl`, `Br`, `I`); lowercase for aromatic atoms (`c`, `n`, `o`); brackets for everything else (`[Na+]`, `[OH-]`, `[Fe+3]`)
Bonds	Single bond is implicit; `=` for double, `#` for triple, `:` for aromatic (also implicit in lowercase atoms)
Branches	Parentheses: `CC(C)C` is isobutane (a methyl branch on the second carbon)
Ring closures	Matching digits: `C1CCCCC1` is cyclohexane (open ring at first 1, close at second 1)
Disconnections	Period: `[Na+].[Cl-]` is sodium chloride as separate ions
Stereochemistry	`@` or `@@` for tetrahedral configuration; `/` and `\` for cis/trans on double bonds
Charges	Inside brackets: `[NH4+]`, `[OH-]`, `[Cu+2]`
Isotopes	Mass number prefix in brackets: `[13C]`, `[2H]`

The full grammar is more nuanced (aromaticity rules, explicit hydrogen counts, atom classes), but the core syntax is the eight elements above. Most cheminformatics software auto-generates SMILES from a structure, so day-to-day users rarely write SMILES by hand.
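The syntax table above maps fairly directly onto a tokenizer. The following is a minimal sketch in Python, assuming a simplified subset of the grammar; the token names (`bracket`, `atom`, `aromatic`, `bond`, `branch`, `ring`, `dot`) and the `tokenize` function are illustrative choices, not part of the OpenSMILES specification, and the sketch does no chemical validation.

```python
import re

# Token patterns for a simplified SMILES subset (illustrative only).
# Order matters: bracket atoms first, two-letter symbols before one-letter.
TOKEN_SPEC = [
    ("bracket", r"\[[^\]]+\]"),        # [Na+], [OH-], [13C], [C@@H]
    ("atom", r"Cl|Br|[BCNOSPFI]"),     # organic-subset atoms
    ("aromatic", r"[bcnosp]"),         # lowercase aromatic atoms
    ("bond", r"[-=#:/\\]"),            # explicit bonds, including / and \
    ("branch", r"[()]"),               # branch open and close
    ("ring", r"%\d\d|\d"),             # ring-closure label
    ("dot", r"\."),                    # disconnection between components
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(smiles):
    """Split a SMILES string into (kind, text) tokens.

    Raises ValueError on any character the simplified subset does not cover.
    """
    tokens = []
    pos = 0
    while pos < len(smiles):
        match = TOKEN_RE.match(smiles, pos)
        if match is None:
            raise ValueError(
                f"unexpected character {smiles[pos]!r} at position {pos}"
            )
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for example in ["CC(=O)O", "c1ccccc1", "[Na+].[Cl-]"]:
        print(example, tokenize(example))
```

A tokenizer like this only checks that the characters are legal; it does not confirm that branches are balanced, that ring closures are paired, or that valences make sense. A real toolkit does all three.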

Canonical SMILES vs arbitrary SMILES

A given molecule has many valid SMILES strings. Acetic acid can be written as `CC(=O)O`, `OC(=O)C`, `O=C(O)C`, or `OC(C)=O`: all valid, all the same molecule. For database use, a canonicalisation algorithm picks one of these forms as the unique canonical SMILES for the molecule. Canonical SMILES aims to give SMILES what standard InChI has by design: a deterministic single representation per substance.

Different toolkits implement different canonicalisation algorithms, so canonical SMILES is software-specific: PubChem’s canonical SMILES for a molecule is not necessarily the same string as RDKit’s. This is the main reason InChIKey is preferred over canonical SMILES for cross-database interchange: the InChI algorithm is a single specification that everyone implements identically.
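To see why the different acetic acid spellings are "the same molecule", it helps to compare the graphs they describe rather than the strings. The sketch below is a toy under stated assumptions: the `parse` and `signature` functions are hypothetical helpers written for this illustration, they handle only uncharged organic-subset atoms, `-`/`=`/`#` bonds, branches, and single-digit ring closures, and the signature uses a simple neighbourhood-refinement scheme that is not the canonicalisation algorithm of any real toolkit and can, in principle, fail to separate some distinct graphs.

```python
BOND_ORDER = {"-": 1, "=": 2, "#": 3}
# Two-letter symbols come first so "Cl" is not read as "C" followed by "l".
ORGANIC = ("Cl", "Br", "B", "C", "N", "O", "S", "P", "F", "I")


def parse(smiles):
    """Parse a small SMILES subset into (atoms, bonds).

    atoms is a list of element symbols; bonds is a list of (i, j, order).
    """
    atoms, bonds = [], []
    branch_stack, ring_open = [], {}
    prev, order = None, 1
    i = 0
    while i < len(smiles):
        symbol = next((s for s in ORGANIC if smiles.startswith(s, i)), None)
        if symbol is not None:
            atoms.append(symbol)
            if prev is not None:
                bonds.append((prev, len(atoms) - 1, order))
            prev, order = len(atoms) - 1, 1
            i += len(symbol)
            continue
        ch = smiles[i]
        if ch in BOND_ORDER:
            order = BOND_ORDER[ch]
        elif ch == "(":
            branch_stack.append(prev)      # remember the branch point
        elif ch == ")":
            prev = branch_stack.pop()      # return to the branch point
        elif ch.isdigit():
            if ch in ring_open:            # second occurrence closes the ring
                other, open_order = ring_open.pop(ch)
                bonds.append((other, prev, max(order, open_order)))
            else:                          # first occurrence opens it
                ring_open[ch] = (prev, order)
            order = 1
        else:
            raise ValueError(f"unsupported character {ch!r} at position {i}")
        i += 1
    return atoms, bonds


def signature(smiles, rounds=3):
    """Order-independent fingerprint of the parsed graph (toy refinement)."""
    atoms, bonds = parse(smiles)
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b, order in bonds:
        neighbours[a].append((b, order))
        neighbours[b].append((a, order))
    labels = list(atoms)
    for _ in range(rounds):
        # Each atom's new label folds in its neighbours' current labels.
        labels = [
            labels[i] + "("
            + ",".join(sorted(f"{o}{labels[j]}" for j, o in neighbours[i]))
            + ")"
            for i in range(len(atoms))
        ]
    return tuple(sorted(labels))


if __name__ == "__main__":
    forms = ["CC(=O)O", "OC(=O)C", "O=C(O)C", "OC(C)=O"]
    print(len({signature(s) for s in forms}))  # number of distinct signatures
```

The four spellings differ only in where the string starts and which branch is written first, so they parse to the same atoms and bonds. Production canonicalisers go one step further and also choose a single preferred traversal order, which is where toolkits diverge.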

When SMILES is the right notation

SMILES is the right notation for:

Computational chemistry input. Quantum chemistry software, molecular dynamics, machine-learning models, and most cheminformatics tools accept SMILES as input.
Compact human-readable representation of small molecules in technical documents, R&D notebooks, and patents.
Substructure search queries in chemistry databases. SMARTS (an extension of SMILES) is the dominant query language for “find me all molecules containing this fragment.”
AI-friendly chemistry content. Like InChI, SMILES gives an AI engine a deterministic way to identify the molecule. SMILES is more compact and arguably more readable for small molecules.

SMILES is the wrong notation for:

Mixtures and undefined substances. SMILES represents a single defined molecule, not a mixture or a polymer with variable composition.
Customs and commercial documents. No customs authority asks for SMILES on a commercial invoice. CAS number and product name are the standard.
Stereochemistry that is unknown. SMILES can either specify stereochemistry or omit it; it cannot represent “we know this is one of two stereoisomers but do not know which.”
Large macromolecules. A SMILES for a large protein or polymer becomes unreadably long.

SMILES vs InChI vs IUPAC name

The three open structural notations differ in purpose:

Identifier	Strength	Weakness
SMILES	Compact, human-readable for small molecules, fast computational input	Multiple valid forms per molecule unless canonicalised; software-specific canonicalisation
InChI	Single canonical algorithm; InChIKey is hyperlink-friendly; cross-database stable	Long, less human-readable, harder to write by hand
IUPAC name	Systematic and pronounceable; common in scientific literature	Long for complex molecules; multiple valid IUPAC names possible; not machine-readable without parsing

For chemistry content tuned for AI and search engines, listing CAS number, InChIKey, and SMILES together on a product page is the most thorough approach. Different AI engines and different downstream users prefer different identifiers.

How Chinese factories produce SMILES for export documentation

Most Chinese factories generate SMILES from internal product master data using standard chemistry software:

The molecular structure is drawn or imported into a chemistry editor (ChemDraw, MarvinSketch, BIOVIA Draw) or loaded into an open-source toolkit such as RDKit in Python.
Canonical SMILES is generated from the structure using the editor’s canonicalisation function.
The SMILES is added to the product master data and propagated to the SDS, the product page, and any data exchange with downstream buyers.

For factories producing pharmaceutical intermediates, fine chemicals, or specialty chemicals where downstream chemistry matters, SMILES is part of the standard product data. For bulk industrial chemical factories, SMILES is rarely included unless specifically requested.

Common SMILES mistakes

Three patterns recur:

Inconsistent canonical SMILES across databases. Two databases listing the same chemical with different canonical SMILES strings are using different canonicalisation algorithms. Confirm by comparing InChIKey instead.
Aromatic notation inconsistency. Benzene can be written `c1ccccc1` (aromatic form, lower-case atoms) or `C1=CC=CC=C1` (Kekulé form, upper-case atoms with explicit double bonds). Different software prefers different forms; both are valid but only one will round-trip cleanly through a given pipeline.
Salt and counterion handling. A salt like sodium acetate can be written `[Na+].CC(=O)[O-]` (separate ions) or `CC(=O)O[Na]` (covalent representation). The first is correct for ionic species in solution; the second can mislead a reader into expecting a covalent compound.
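The salt-handling pattern can be checked mechanically. The sketch below is a rough illustration, not a validated tool: `bracket_charge`, `components`, and `has_uncharged_metal` are hypothetical helper names, and the `ALKALI_METALS` set is a deliberately short assumption chosen for the example. It splits a SMILES on `.`, sums the formal charges written in bracket atoms, and flags an alkali metal drawn without a charge.

```python
import re

BRACKET_RE = re.compile(r"\[([^\]]+)\]")       # contents of each bracket atom
CHARGE_RE = re.compile(r"([+-]+)(\d*)$")       # trailing +, -, ++, +2, -3 ...
ELEMENT_RE = re.compile(r"^\d*([A-Z][a-z]?)")  # optional isotope, then symbol

# Assumption for this sketch: a short list, not a complete set of metals.
ALKALI_METALS = {"Li", "Na", "K", "Rb", "Cs"}


def bracket_charge(body):
    """Formal charge from a bracket-atom body such as 'Na+', 'O-' or 'Fe+3'."""
    body = body.split(":")[0]                  # drop an atom class if present
    match = CHARGE_RE.search(body)
    if match is None:
        return 0
    signs, digits = match.groups()
    sign = 1 if signs[0] == "+" else -1
    magnitude = int(digits) if digits else len(signs)
    return sign * magnitude


def components(smiles):
    """Split on '.' and return (fragment, net formal charge) pairs."""
    return [
        (frag, sum(bracket_charge(b) for b in BRACKET_RE.findall(frag)))
        for frag in smiles.split(".")
    ]


def has_uncharged_metal(smiles):
    """True if an alkali metal appears in brackets with no formal charge."""
    for body in BRACKET_RE.findall(smiles):
        element = ELEMENT_RE.match(body)
        if (element is not None
                and element.group(1) in ALKALI_METALS
                and bracket_charge(body) == 0):
            return True
    return False


if __name__ == "__main__":
    for salt in ["[Na+].CC(=O)[O-]", "CC(=O)O[Na]"]:
        print(salt, components(salt), has_uncharged_metal(salt))
```

For the ionic form, the two components carry charges of +1 and -1 and sum to zero; for the covalent form, there is a single neutral component and the uncharged `[Na]` is flagged. A check like this catches the representation choice only; it says nothing about whether the structure itself is right.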

InChI is the IUPAC-developed open identifier with a deterministic InChIKey hash. IUPAC Name is the systematic chemical name. CAS Number is the proprietary registry identifier. EC Number is the EU regulatory identifier. SMILES sits alongside these four as a structural notation; together, the five cover most chemical-identification use cases.

Reference: https://opensmiles.org/
