cmiles package
Subpackages
Submodules
cmiles.generator module
Generate canonical identifiers for chemical databases, specifically quantum chemical data.
cmiles.generator.get_inchi_and_key(molecule)
Generate the InChI and InChIKey for a molecule. Uses RDKit, which in turn uses the InChI API.
Parameters: - molecule: rdkit.Chem.Mol
If an oechem.OEMol is provided, will convert it to an rdkit.Chem.Mol
Returns: - tuple (inchi, inchi_key)
cmiles.generator.get_iupac(molecule)
Generate IUPAC name
Parameters: - molecule: oechem.OEMol
Returns: - str
IUPAC name
Notes
The name is only generated if a valid OpenEye license is available.
cmiles.generator.get_molecule_ids(molecule_input, toolkit='openeye', strict=True, **kwargs)
Generate a dictionary of canonical identifiers.
The molecule_input can be either a JSON-serialized molecule (see QCSchema) or an isomeric SMILES with all stereochemistry defined.
Required fields for the QCSchema molecule:
- symbols
- geometry
- connectivity
Parameters: - molecule_input: dict or str
A JSON serialized QC molecule or an isomeric SMILES
- toolkit: str, optional, default ‘openeye’
toolkit to use for canonicalization. Currently supports openeye and rdkit
- strict: bool, optional. Default True
If true, will raise an exception if SMILES is missing explicit H.
- **permute_xyz: bool, optional, default False
Only use if the input molecule is in QCSchema format.
If True, the geometry will be permuted to reflect the canonical atom order in the mapped SMILES. get_molecule_ids will return the permuted QCSchema, with the cmiles identifiers in the identifiers field.
If False, the map indices in the mapped SMILES will reflect the order of the atoms in the input QCSchema.
Returns: - dict
If permute_xyz=True, will return the permuted QCSchema with cmiles identifiers in the identifiers field.
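As a rough illustration of the QCSchema input described above, the sketch below builds a water molecule with the three required fields and checks that none is missing. The helper `missing_required_fields` is hypothetical and not part of cmiles; the coordinates are illustrative values, and the layout assumes the QCSchema conventions of a flat geometry list in Bohr and zero-based connectivity entries.

```python
# Hypothetical helper, not part of cmiles: checks the three required fields.
REQUIRED_FIELDS = ("symbols", "geometry", "connectivity")


def missing_required_fields(qc_molecule):
    """Return the required QCSchema fields that are absent from the dict."""
    return [field for field in REQUIRED_FIELDS if field not in qc_molecule]


# Water as a QCSchema-style molecule. Coordinates are illustrative values,
# assumed to be a flat list of x, y, z per atom in Bohr.
water = {
    "symbols": ["O", "H", "H"],
    "geometry": [0.0, 0.0, 0.0,
                 0.0, 1.43, 1.11,
                 0.0, -1.43, 1.11],
    # Each entry is [atom_index_a, atom_index_b, bond_order], zero-based.
    "connectivity": [[0, 1, 1], [0, 2, 1]],
}

print(missing_required_fields(water))
print(missing_required_fields({"symbols": ["O"]}))

# With cmiles and a supported toolkit installed, the documented call would
# look like this (not executed here):
# from cmiles.generator import get_molecule_ids
# ids = get_molecule_ids(water, toolkit="rdkit")
```

Checking the input before calling the library is only a convenience here; how cmiles itself reports a missing field is not described in this reference.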
cmiles.generator.get_unique_protomer(molecule)
Generate a unique protomer for all tautomers and charge states of the molecule.
Requires an OpenEye license.
Parameters: - molecule: oechem.OEMol
Will convert rdkit.Chem.Mol to oechem.OEMol if openeye is installed and license is valid
Returns: - str
unique protomer
cmiles.generator.standardize_tautomer(iso_can_smi)
Standardize tautomer to one universal tautomer.
Parameters: - iso_can_smi: str
isomeric SMILES
Returns: - str:
standardized tautomer
Notes
Does not standardize for ionization states. In some cases performs better than oequacpac.OEGetUniqueProtomer. See notebook
cmiles.utils module
Utility functions for cmiles generator
cmiles.utils.add_explicit_hydrogen(molecule)
Add explicit hydrogen to molecule
Parameters: - molecule:
oechem.OEMol or rdkit.Chem.Mol
Returns: - molecule
oechem.OEMol or rdkit.Chem.Mol with explicit hydrogen
cmiles.utils.get_atom_map(molecule, mapped_smiles, **kwargs)
Get mapping of map index -> atom index
Parameters: - molecule:
oechem.OEMol or rdkit.Chem.Mol
- mapped_smiles: str
explicit hydrogen mapped SMILES
Returns: - atom_map: dict
dictionary mapping {map_index: atom_index}
cmiles.utils.get_charge(molecule)
Get charge state of molecule
Parameters: - molecule:
oechem.OEMol or rdkit.Chem.Mol
Returns: - int
total charge of molecule
cmiles.utils.get_connectivity_table(molecule, atom_map)
Generate connectivity table
Parameters: - molecule:
oechem.OEMol or rdkit.Chem.Mol
- atom_map: dict
{map_idx : atom_idx}
Returns: - list: list of lists
lists of atoms bonded and the bond order [[map_idx_1, map_idx_2, bond_order] …]
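The shape of the returned table can be sketched in plain Python. This is not the cmiles implementation: `connectivity_from_bonds` is a hypothetical helper, and the `bonds` input (tuples of atom indices and bond order) stands in for what a toolkit molecule would provide. It follows the documented form `[map_idx_1, map_idx_2, bond_order]`; whether the library shifts the map indices to zero-based values is not stated here.

```python
def connectivity_from_bonds(bonds, atom_map):
    """Build [[map_idx_1, map_idx_2, bond_order], ...] from atom-indexed bonds.

    bonds: iterable of (atom_idx_a, atom_idx_b, bond_order), a hypothetical
           stand-in for the bonds of a toolkit molecule
    atom_map: {map_idx: atom_idx}
    """
    # Look up map indices by atom index.
    atom_to_map = {atom_idx: map_idx for map_idx, atom_idx in atom_map.items()}
    return [[atom_to_map[a], atom_to_map[b], order] for a, b, order in bonds]


# Water: atom 0 is O, atoms 1 and 2 are H.
bonds = [(0, 1, 1), (0, 2, 1)]
atom_map = {1: 1, 2: 0, 3: 2}  # map index -> atom index
table = connectivity_from_bonds(bonds, atom_map)
print(table)
```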
cmiles.utils.has_atom_map(molecule)
Check if molecule has atom map indices. Will return True even if only one atom has a map index.
Parameters: - molecule:
oechem.OEMol or rdkit.Chem.Mol
Returns: - bool
True if at least one atom has a map index. False if the molecule has no map indices.
cmiles.utils.has_explicit_hydrogen(molecule)
Check if molecule has explicit hydrogen.
Parameters: - molecule:
oechem.OEMol or rdkit.Chem.Mol
Returns: - bool
True if has all explicit H. False otherwise.
cmiles.utils.has_stereo_defined(molecule)
Checks if molecule has all stereo defined.
Parameters: - molecule:
oechem.OEMol or rdkit.Chem.Mol
Returns: - bool
True if all stereo defined, False otherwise
Notes
This does not check if all chirality or bond stereo are consistent. The best way to check is to try to generate a 3D conformer. If stereo information is inconsistent, this will fail.
cmiles.utils.invert_atom_map(atom_map)
Invert atom map {map_idx:atom_idx} -> {atom_idx:map_idx}
Parameters: - atom_map: dict
{map_idx:atom_idx}
Returns: - dict
{atom_idx:map_idx}
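The inversion itself amounts to swapping keys and values. A minimal sketch follows, assuming the map is one-to-one; `invert_map` is a hypothetical stand-in rather than the library function, and the duplicate check is an addition for illustration.

```python
def invert_map(atom_map):
    """Turn {map_idx: atom_idx} into {atom_idx: map_idx}."""
    inverted = {atom_idx: map_idx for map_idx, atom_idx in atom_map.items()}
    # If two map indices pointed at the same atom, entries would be lost.
    if len(inverted) != len(atom_map):
        raise ValueError("atom_map is not one-to-one")
    return inverted


atom_map = {1: 2, 2: 0, 3: 1}
print(invert_map(atom_map))
```

Inverting twice returns the original dictionary, which is a quick sanity check for a valid atom map.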
cmiles.utils.is_map_canonical(molecule)
Check if the map indices on the molecule are in canonical order
Parameters: - molecule:
oechem.OEMol or rdkit.Chem.Mol
Returns: - bool
-
cmiles.utils.
is_missing_atom_map
(molecule)[source]¶ Check if any atom in molecule is missing atom map index
Parameters: - molecule:
oechem.Mol or rdkit.Chem.Mol
Returns: - bool
True if even if only one atom map is missing. False if all atoms have atom maps.
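The difference between has_atom_map and is_missing_atom_map is easiest to see on a partially mapped molecule, where both return True. The sketch below works on a plain list of per-atom map indices instead of a toolkit molecule, and assumes that an index of 0 means "no map index" (the convention in both RDKit and OpenEye). The function names are hypothetical.

```python
def has_any_map(map_indices):
    """True if at least one atom carries a map index (non-zero)."""
    return any(idx != 0 for idx in map_indices)


def is_missing_map(map_indices):
    """True if at least one atom has no map index (zero)."""
    return any(idx == 0 for idx in map_indices)


fully_mapped = [1, 2, 3]
partially_mapped = [1, 0, 0]
unmapped = [0, 0, 0]

for indices in (fully_mapped, partially_mapped, unmapped):
    print(indices, has_any_map(indices), is_missing_map(indices))
```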
cmiles.utils.load_molecule(inp_molecule, toolkit='openeye', **kwargs)
Load molecule.
Input is restrictive. Allowed inputs are:
- Isomeric SMILES
- JSON serialized molecule
Parameters: - inp_molecule: str or dict
isomeric SMILES or QCSchema
- toolkit: str, optional, default openeye.
cheminformatics toolkit to use
Returns: - molecule:
oechem.OEMol or rdkit.Chem.Mol
cmiles.utils.mol_from_json(inp_molecule, toolkit='openeye', **kwargs)
Load a molecule from QCSchema (see QCSchema).
Required fields for the QCSchema molecule:
- symbols
- geometry
- connectivity
Parameters: - inp_molecule: dict
QCSchema molecule with symbols, geometry and connectivity
- toolkit: str, optional. Default openeye
cheminformatics toolkit to use. Currently supports openeye and rdkit
- **permute_xyz: bool, optional, default False
If False, will add flag to molecule such that the mapped SMILES retains the order of serialized geometry. If True, mapped SMILES will be in canonical order and serialized geometry will have to be reordered.
Returns: - molecule
oechem.OEMol or rdkit.Chem.Mol
cmiles.utils.mol_to_hill_molecular_formula(molecule)
Generate Hill sorted empirical formula.
Hill sorting lists C and H first and then all other symbols in alphabetical order.
Parameters: - molecule:
oechem.OEMol or rdkit.Chem.Mol
Returns: - str
hill sorted empirical formula
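The Hill ordering can be sketched from a plain list of element symbols. This is an illustration, not the cmiles code: `hill_formula` is a hypothetical function, it follows the standard Hill convention (C then H first only when carbon is present, otherwise strictly alphabetical), and it assumes a count of 1 is left implicit. How cmiles treats carbon-free molecules is not stated in this reference.

```python
from collections import Counter


def hill_formula(symbols):
    """Return a Hill-ordered formula string for a list of element symbols."""
    counts = Counter(symbols)
    if "C" in counts:
        # Carbon first, then hydrogen, then everything else alphabetically.
        ordered = ["C"] + (["H"] if "H" in counts else [])
        ordered += sorted(s for s in counts if s not in ("C", "H"))
    else:
        # Without carbon, all symbols (including H) are alphabetical.
        ordered = sorted(counts)
    return "".join(
        s + (str(counts[s]) if counts[s] > 1 else "") for s in ordered
    )


ethanol = ["C", "C", "O"] + ["H"] * 6
print(hill_formula(ethanol))
print(hill_formula(["O", "H", "H"]))
```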
cmiles.utils.mol_to_map_ordered_qcschema(molecule, molecule_ids, multiplicity=1, **kwargs)
Generate a JSON-serialized molecule following the QCSchema specs.
Geometry, symbols and connectivity table are ordered according to the map indices in the mapped SMILES.
Parameters: - molecule:
oechem.OEMol or rdkit.Chem.Mol. The molecule must have a conformer.
- molecule_ids: dict
cmiles generated molecular ids.
- multiplicity: int, optional, default 1
multiplicity of molecule
Returns: - dict
JSON serialized molecule following QCSchema specs
cmiles.utils.mol_to_smiles(molecule, **kwargs)
Generate canonical smiles from molecule
Parameters: - molecule:
oechem.OEMol or rdkit.Chem.Mol
- **isomeric: bool, optional, default True
If False, SMILES will not include stereo information
- **explicit_hydrogen: bool, optional, default True
If True, SMILES will have explicit hydrogen.
- **mapped: bool, optional, default True
If True, SMILES will have map indices
Example: O=O will be [O:1]=[O:2]
Returns: - str
SMILES
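To make the mapped form concrete, the sketch below pulls the map indices out of a mapped SMILES string with a regular expression. It is a string-level illustration only, not a SMILES parser and not part of cmiles; `map_indices` is a hypothetical helper that assumes every mapped atom is written in brackets as `[symbol:index]`.

```python
import re

# Matches the ":<digits>" atom-map suffix just before a closing bracket.
_MAP_INDEX = re.compile(r":(\d+)\]")


def map_indices(mapped_smiles):
    """Return the map indices in the order they appear in the string."""
    return [int(m) for m in _MAP_INDEX.findall(mapped_smiles)]


print(map_indices("[O:1]=[O:2]"))
print(map_indices("O=O"))
```

An unmapped SMILES yields an empty list, which mirrors the distinction between mapped=True and mapped=False output.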
cmiles.utils.permute_qcschema(json_mol, molecule_ids, **kwargs)
Permute geometry and symbols to correspond to the map indices on the mapped SMILES.
Parameters: - json_mol: dict
JSON serialized molecule.
Required fields: symbols, geometry, connectivity and multiplicity
- molecule_ids: dict
cmiles generated molecular ids
Returns: - dict
JSON serialized molecule. symbols, geometry, and connectivity ordered according to map indices on mapped SMILES.
Also includes identifiers field with cmiles generated identifiers.
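The reordering can be sketched on plain lists. This is an illustration under stated assumptions, not the library's implementation: `permute_by_map` is a hypothetical helper, the geometry is assumed to be a flat list of x, y, z per atom, and the atom with map index k is assumed to land at position k - 1. Connectivity reordering and the identifiers field are left out.

```python
def permute_by_map(symbols, geometry, atom_map):
    """Reorder atoms so the atom with map index k lands at position k - 1.

    symbols: list of element symbols in input order
    geometry: flat list [x0, y0, z0, x1, y1, z1, ...] in input order
    atom_map: {map_idx: atom_idx}
    """
    new_symbols, new_geometry = [], []
    for map_idx in sorted(atom_map):
        atom_idx = atom_map[map_idx]
        new_symbols.append(symbols[atom_idx])
        # Copy the three coordinates that belong to this atom.
        new_geometry.extend(geometry[3 * atom_idx: 3 * atom_idx + 3])
    return new_symbols, new_geometry


symbols = ["H", "O", "H"]
geometry = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # placeholder values
atom_map = {1: 1, 2: 0, 3: 2}  # map index -> atom index
new_symbols, new_geometry = permute_by_map(symbols, geometry, atom_map)
print(new_symbols)
print(new_geometry)
```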
Module contents
Generate canonical identifiers for chemical databases, specifically quantum chemical data.