cmiles package

Subpackages

Submodules

cmiles.generator module

Generate canonical identifiers for chemical databases, specifically quantum chemical data.

cmiles.generator.get_inchi_and_key(molecule)[source]

Generate the InChI and InChIKey. Uses RDKit, which calls the InChI API.

Parameters:
molecule: rdkit.Chem.Mol

If an oechem.OEMol is provided, it is converted to an rdkit.Chem.Mol.

Returns:
tuple (inchi, inchi_key)

cmiles.generator.get_iupac(molecule)[source]

Generate IUPAC name

Parameters:
molecule :

oechem.OEMol

Returns:
str:

iupac name

Notes

The name is generated only if a valid OpenEye license is available.

cmiles.generator.get_molecule_ids(molecule_input, toolkit='openeye', strict=True, **kwargs)[source]

Generate a dictionary of canonical identifiers

The molecule_input can be either a JSON serialized molecule (see QCSchema) or an isomeric SMILES with all stereochemistry defined.

Required fields for the QCSchema molecule:

  1. symbols
  2. geometry
  3. connectivity
Parameters:
molecule_input: dict or str

A JSON serialized QC molecule or an isomeric SMILES

toolkit: str, optional, default ‘openeye’

toolkit to use for canonicalization. Currently supports openeye and rdkit

strict: bool, optional. Default True

If True, raises an exception if the SMILES is missing explicit hydrogens.

**permute_xyz: bool, optional, default False

Only use if input molecule is in QCSchema format.

If True, the geometry will be permuted to reflect the canonical atom order in the mapped SMILES. get_molecule_ids will return the permuted QCSchema, with the cmiles identifiers in the identifiers field.

If False, the map indices in the mapped SMILES will reflect the order of the atoms in the input QCSchema.

Returns:
dict

If permute_xyz=True, returns the permuted QCSchema with the cmiles identifiers in the identifiers field. Otherwise, returns the dictionary of canonical identifiers.
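
The QCSchema input described above can be sketched as a plain dictionary. The block below is a minimal illustration, not part of cmiles: `missing_required_fields` is a hypothetical helper, the water coordinates are illustrative values, and the layout (a flat list of Cartesian coordinates, connectivity as `[atom_a, atom_b, bond_order]` triples with 0-based atom indices) is an assumption based on the QCSchema molecule format.

```python
# Fields the documentation lists as required for a QCSchema molecule.
REQUIRED_FIELDS = ("symbols", "geometry", "connectivity")


def missing_required_fields(molecule):
    """Return the required fields absent from `molecule` (hypothetical helper)."""
    return [field for field in REQUIRED_FIELDS if field not in molecule]


# Illustrative water molecule; geometry is a flat list of x, y, z per atom.
water = {
    "symbols": ["O", "H", "H"],
    "geometry": [0.0, 0.0, 0.0,
                 0.0, 1.43, 1.11,
                 0.0, -1.43, 1.11],
    # Assumed layout: [atom_a, atom_b, bond_order] with 0-based atom indices.
    "connectivity": [[0, 1, 1], [0, 2, 1]],
}

print(missing_required_fields(water))
print(missing_required_fields({"symbols": ["O"]}))
```

A dictionary of this shape is the kind of value that would be passed as molecule_input; checking it up front is only a convenience and is not something the documentation says cmiles requires.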

cmiles.generator.get_unique_protomer(molecule)[source]

Generate a unique protomer for all tautomers and charge states of the molecule.

Requires an OpenEye license.

Parameters:
molecule: oechem.OEMol

An rdkit.Chem.Mol is converted to an oechem.OEMol if OpenEye is installed and the license is valid.

Returns:
str

unique protomer

cmiles.generator.standardize_tautomer(iso_can_smi)[source]

Standardize tautomer to one universal tautomer.

Parameters:
iso_can_smi: str

isomeric SMILES

Returns:
str:

standardized tautomer

Notes

Does not standardize ionization states. In some cases performs better than oequacpac.OEGetUniqueProtomer. See notebook

cmiles.utils module

Utility functions for cmiles generator

cmiles.utils.add_explicit_hydrogen(molecule)[source]

Add explicit hydrogen to molecule

Parameters:
molecule:

oechem.OEMol or rdkit.Chem.Mol

Returns:
molecule

oechem.OEMol or rdkit.Chem.Mol with explicit hydrogens

cmiles.utils.get_atom_map(molecule, mapped_smiles, **kwargs)[source]

Get mapping of map index -> atom index

Parameters:
molecule:

oechem.OEMol or rdkit.Chem.Mol

mapped_smiles: str

explicit hydrogen mapped SMILES

Returns:
atom_map: dict

dictionary mapping {map_index: atom_index}
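
To make the {map_index: atom_index} shape concrete, here is a minimal sketch that reads the map indices off a mapped SMILES string. It is not how cmiles builds the map (the function works on a toolkit molecule object). `map_indices_in_smiles_order` is a hypothetical helper, and it assumes every atom is written in brackets with a map index and that an atom's index equals its position in the string.

```python
import re


def map_indices_in_smiles_order(mapped_smiles):
    """Return {map_index: position} for atoms as written in a mapped SMILES.

    Hypothetical helper: assumes each atom appears as [symbol:map_index]
    and that atom indices follow the order in which atoms are written.
    """
    map_indices = [int(m) for m in re.findall(r":(\d+)\]", mapped_smiles)]
    return {map_idx: position for position, map_idx in enumerate(map_indices)}


# Water written with the oxygen second: H has map index 3, O has 1, H has 2.
atom_map = map_indices_in_smiles_order("[H:3][O:1][H:2]")
print(atom_map)
```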

cmiles.utils.get_charge(molecule)[source]

Get charge state of molecule

Parameters:
molecule:

oechem.OEMol or rdkit.Chem.Mol

Returns:
int

total charge of molecule

cmiles.utils.get_connectivity_table(molecule, atom_map)[source]

Generate connectivity table

Parameters:
molecule:

oechem.OEMol or rdkit.Chem.Mol

atom_map: dict

{map_idx : atom_idx}

Returns:
list: list of lists

lists of atoms bonded and the bond order [[map_idx_1, map_idx_2, bond_order] …]
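
The return shape can be illustrated without a toolkit. The sketch below assumes bonds are available as (atom_idx, atom_idx, bond_order) tuples and rewrites them in map indices, following the docstring as written. `connectivity_in_map_indices` is a hypothetical name, and any index offset cmiles may apply is not covered here.

```python
def connectivity_in_map_indices(bonds, atom_map):
    """Rewrite bonds given in atom indices as [map_idx, map_idx, bond_order].

    bonds:    iterable of (atom_idx_a, atom_idx_b, bond_order)
    atom_map: {map_idx: atom_idx}
    """
    # Bonds are stored by atom index, so look up map indices by atom index.
    atom_to_map = {atom_idx: map_idx for map_idx, atom_idx in atom_map.items()}
    return [[atom_to_map[a], atom_to_map[b], order] for a, b, order in bonds]


# Water with atoms stored as H, O, H; the oxygen carries map index 1.
bonds = [(0, 1, 1), (1, 2, 1)]
atom_map = {3: 0, 1: 1, 2: 2}
table = connectivity_in_map_indices(bonds, atom_map)
print(table)
```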

cmiles.utils.has_atom_map(molecule)[source]

Check if the molecule has atom map indices. Returns True even if only one atom has a map index.

Parameters:
molecule:

oechem.OEMol or rdkit.Chem.Mol

Returns:
bool

True if at least one atom has a map index. False if the molecule has no map indices.

cmiles.utils.has_explicit_hydrogen(molecule)[source]

Check if molecule has explicit hydrogen.

Parameters:
molecule:

oechem.OEMol or rdkit.Chem.Mol

Returns:
bool

True if all hydrogens are explicit, False otherwise.

cmiles.utils.has_stereo_defined(molecule)[source]

Checks if molecule has all stereo defined.

Parameters:
molecule:

oechem.OEMol or rdkit.Chem.Mol

Returns:
bool

True if all stereo defined, False otherwise

Notes

This does not check if all chirality or bond stereo are consistent. The best way to check is to try to generate a 3D conformer. If stereo information is inconsistent, this will fail.

cmiles.utils.invert_atom_map(atom_map)[source]

Invert atom map {map_idx:atom_idx} –> {atom_idx:map_idx}

Parameters:
atom_map: dict

{map_idx:atom_idx}

Returns:
dict

{atom_idx:map_idx}
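
The inversion amounts to swapping keys and values. A minimal sketch follows; `invert_map` is a hypothetical stand-in rather than the cmiles implementation, and the one-to-one check is an added safeguard that the documentation does not mention.

```python
def invert_map(atom_map):
    """Turn {map_idx: atom_idx} into {atom_idx: map_idx}."""
    inverted = {atom_idx: map_idx for map_idx, atom_idx in atom_map.items()}
    # Two map indices pointing at the same atom would silently collapse.
    if len(inverted) != len(atom_map):
        raise ValueError("atom_map is not one-to-one")
    return inverted


atom_map = {1: 2, 2: 0, 3: 1}
inverted = invert_map(atom_map)
print(inverted)
```

Inverting twice returns the original map, which is a quick sanity check when moving between the two conventions.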

cmiles.utils.is_map_canonical(molecule)[source]

Check if the map indices on the molecule are in canonical order

Parameters:
molecule:

oechem.OEMol or rdkit.Chem.Mol

Returns:
bool

cmiles.utils.is_missing_atom_map(molecule)[source]

Check if any atom in molecule is missing atom map index

Parameters:
molecule:

oechem.OEMol or rdkit.Chem.Mol

Returns:
bool

True if at least one atom is missing a map index. False if all atoms have map indices.
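
has_atom_map and is_missing_atom_map answer different questions, and a partially mapped molecule makes both return True. The sketch below shows the distinction on a plain list of per-atom map indices. It assumes 0 marks an unmapped atom, and both function names are hypothetical.

```python
def has_any_map(map_indices):
    """True if at least one atom carries a map index (0 means unmapped)."""
    return any(idx != 0 for idx in map_indices)


def is_missing_any_map(map_indices):
    """True if at least one atom lacks a map index (0 means unmapped)."""
    return any(idx == 0 for idx in map_indices)


fully_mapped = [1, 2, 3]
partially_mapped = [1, 0, 0]
unmapped = [0, 0, 0]

for label, indices in [("full", fully_mapped),
                       ("partial", partially_mapped),
                       ("none", unmapped)]:
    print(label, has_any_map(indices), is_missing_any_map(indices))
```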

cmiles.utils.load_molecule(inp_molecule, toolkit='openeye', **kwargs)[source]

Load molecule.

Input is restrictive. Allowed inputs are:

  1. Isomeric SMILES
  2. JSON serialized molecule
Parameters:
inp_molecule: str or dict

isomeric SMILES or QCSchema

toolkit: str, optional, default openeye.

cheminformatics toolkit to use

Returns:
molecule:

oechem.OEMol or rdkit.Chem.Mol

cmiles.utils.mol_from_json(inp_molecule, toolkit='openeye', **kwargs)[source]

Load a molecule from QCSchema

see QCSchema

Required fields for the QCSchema molecule:
  1. symbols
  2. geometry
  3. connectivity
Parameters:
inp_molecule: dict

QCSchema molecule with symbols, geometry and connectivity

toolkit: str, optional. Default openeye

cheminformatics toolkit to use. Currently supports openeye and rdkit

**permute_xyz: bool, optional, default False

If False, a flag is added to the molecule so that the mapped SMILES retains the order of the serialized geometry. If True, the mapped SMILES will be in canonical order and the serialized geometry will have to be reordered.

Returns:
molecule

oechem.OEMol or rdkit.Chem.Mol

cmiles.utils.mol_to_hill_molecular_formula(molecule)[source]

Generate Hill sorted empirical formula.

Hill order lists C first, then H, and then all other symbols in alphabetical order

Parameters:
molecule:

oechem.OEMol or rdkit.Chem.Mol

Returns:
str

hill sorted empirical formula
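
The ordering rule can be sketched on a plain list of element symbols. `hill_formula` below is a hypothetical helper, not the cmiles implementation. Two details are assumptions taken from the usual Hill convention rather than from this documentation: when no carbon is present, every symbol (including H) is sorted alphabetically, and a count of 1 is left implicit.

```python
from collections import Counter


def hill_formula(symbols):
    """Build a Hill-ordered formula string from a list of element symbols."""
    counts = Counter(symbols)
    if "C" in counts:
        # Carbon first, hydrogen second, everything else alphabetically.
        ordered = ["C"] + (["H"] if "H" in counts else [])
        ordered += sorted(s for s in counts if s not in ("C", "H"))
    else:
        # Assumed convention: without carbon, sort all symbols alphabetically.
        ordered = sorted(counts)
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in ordered)


ethanol = ["C", "C", "O"] + ["H"] * 6
print(hill_formula(ethanol))
print(hill_formula(["Br", "C", "H", "H", "H"]))
print(hill_formula(["H", "H", "O"]))
```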

cmiles.utils.mol_to_map_ordered_qcschema(molecule, molecule_ids, multiplicity=1, **kwargs)[source]

Generate a JSON serialized molecule following the QCSchema specs

Geometry, symbols and connectivity table ordered according to map indices in mapped SMILES

Parameters:
molecule:

oechem.OEMol or rdkit.Chem.Mol. The molecule must have a conformer.

molecule_ids: dict

cmiles generated molecular ids.

multiplicity: int, optional, default 1

multiplicity of molecule

Returns:
dict

JSON serialized molecule following QCSchema specs

cmiles.utils.mol_to_smiles(molecule, **kwargs)[source]

Generate canonical smiles from molecule

Parameters:
molecule:

oechem.OEMol or rdkit.Chem.Mol

**isomeric: bool, optional, default True

If False, SMILES will not include stereo information

**explicit_hydrogen: bool, optional, default True

If True, SMILES will have explicit hydrogen.

**mapped: bool, optional, default True

If True, SMILES will have map indices

Example: O=O will be [O:1]=[O:2]

Returns:
str

SMILES

cmiles.utils.permute_qcschema(json_mol, molecule_ids, **kwargs)[source]

Permute geometry and symbols to correspond to map indices on mapped SMILES

Parameters:
json_mol: dict

JSON serialized molecule.

Required fields: symbols, geometry, connectivity and multiplicity

molecule_ids: dict

cmiles generated molecular ids

Returns:
dict

JSON serialized molecule. symbols, geometry, and connectivity ordered according to map indices on mapped SMILES.

Also includes identifiers field with cmiles generated identifiers.
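
The reordering itself can be sketched on bare lists. The block below is an illustration under stated assumptions, not the cmiles code: `permute_by_map` is a hypothetical helper, the geometry is taken to be a flat list of x, y, z values per atom, and the map indices are taken to run contiguously from 1, so the atom with map index 1 ends up first.

```python
def permute_by_map(symbols, geometry, atom_map):
    """Reorder symbols and a flat geometry list by ascending map index.

    atom_map: {map_idx: atom_idx}; geometry holds 3 floats per atom.
    """
    new_symbols, new_geometry = [], []
    for map_idx in sorted(atom_map):
        atom_idx = atom_map[map_idx]
        new_symbols.append(symbols[atom_idx])
        # Copy the x, y, z triple belonging to this atom.
        new_geometry.extend(geometry[3 * atom_idx: 3 * atom_idx + 3])
    return new_symbols, new_geometry


# Water stored as H, O, H; the map puts the oxygen first.
symbols = ["H", "O", "H"]
geometry = [0.0, 1.43, 1.11,
            0.0, 0.0, 0.0,
            0.0, -1.43, 1.11]
atom_map = {1: 1, 2: 0, 3: 2}

new_symbols, new_geometry = permute_by_map(symbols, geometry, atom_map)
print(new_symbols)
print(new_geometry)
```

A full permutation would also rewrite the connectivity table in the new order, which this sketch leaves out.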

cmiles.utils.remove_atom_map(molecule, keep_map_data=True)[source]

Remove atom map from molecule

Parameters:
molecule

oechem.OEMol or rdkit.Chem.Mol

keep_map_data: bool, optional, default True

If True, will save map indices in atom data

cmiles.utils.restore_atom_map(molecule)[source]

Restore atom map from atom data in place

Parameters:
molecule

oechem.OEMol or rdkit.Chem.Mol
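
remove_atom_map and restore_atom_map form a round trip: the map indices are stashed in per-atom data and later copied back in place. A minimal sketch of that pattern follows, using dictionaries in place of toolkit atoms. The key names `map_idx` and `saved_map_idx`, the use of 0 for "no map index", and both function names are assumptions made for illustration.

```python
def remove_map(atoms, keep_map_data=True):
    """Clear map indices in place, optionally stashing them per atom."""
    for atom in atoms:
        if keep_map_data:
            atom["saved_map_idx"] = atom["map_idx"]
        atom["map_idx"] = 0  # 0 stands for "no map index" in this sketch


def restore_map(atoms):
    """Copy stashed map indices back in place; atoms without one stay unmapped."""
    for atom in atoms:
        atom["map_idx"] = atom.pop("saved_map_idx", 0)


atoms = [{"symbol": "O", "map_idx": 1}, {"symbol": "H", "map_idx": 2}]

remove_map(atoms)
cleared = [atom["map_idx"] for atom in atoms]

restore_map(atoms)
restored = [atom["map_idx"] for atom in atoms]
print(cleared, restored)
```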

Module contents

Generate canonical identifiers for chemical databases, specifically quantum chemical data.