It's time to stop generating SMILES with RDKit without understanding the options

Overview

If you have been calling Chem.MolToSmiles without really understanding its default options, it's time to graduate from that. This memo summarizes what each option does.

Environment

RDKit 2020.03.5

Option description

Option: Description
isomericSmiles: Include stereochemistry information in the SMILES. The default is true.
kekuleSmiles: Write the SMILES in Kekule form (no aromatic bonds). The default is false.
rootedAtAtom: If not negative, forces the SMILES to start at the atom with this index. The default is -1.
canonical: If false, the SMILES is not canonicalized. The default is true.
allBondsExplicit: If true, all bond orders are written explicitly in the output SMILES. The default is false.
allHsExplicit: If true, all H counts are written explicitly in the output SMILES. The default is false.
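The checks in this memo skip rootedAtAtom, so here is a minimal sketch of what it does, assuming ethanol ("CCO") as a toy input that is not part of the original tests. The atom index passed to rootedAtAtom is the index in the parsed molecule.

```python
from rdkit import Chem

# Toy molecule chosen for illustration; atom indices are 0 = C, 1 = C, 2 = O
mol = Chem.MolFromSmiles("CCO")

# Default: rootedAtAtom is -1, so RDKit picks the starting atom itself
default_smiles = Chem.MolToSmiles(mol)

# Force the output to start at atom 2 (the oxygen)
rooted_smiles = Chem.MolToSmiles(mol, rootedAtAtom=2)

print(f"default = {default_smiles}")
print(f"rooted  = {rooted_smiles}")
```

Both strings describe the same molecule; only the starting atom of the string differs.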

Let's try it

Define a helper function like the following to check how each option other than rootedAtAtom behaves.

from rdkit import Chem

def generate_smiles(old_smiles, isomeric=True, kekule=False, allBondsExplicit=False, allHsExplicit=False, canonical=True):
    """Parse old_smiles and print the SMILES regenerated with the given options."""
    print(f"\n\ngenerate smiles {old_smiles}")
    print(f"prev smiles = {old_smiles}")
    old_mol = Chem.MolFromSmiles(old_smiles)
    # Pass each option straight through to Chem.MolToSmiles
    new_smiles = Chem.MolToSmiles(old_mol, isomericSmiles=isomeric, kekuleSmiles=kekule,
                                  allBondsExplicit=allBondsExplicit, allHsExplicit=allHsExplicit, canonical=canonical)

    print(f"new smiles = {new_smiles}")
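The one-option-at-a-time comparison in the sections that follow could also be driven by a small loop. This is only a sketch: the flipped_options dictionary and the results variable are names made up here, and it calls Chem.MolToSmiles directly instead of going through the helper.

```python
from rdkit import Chem

smiles = "C[C@H]1COC2=C1C(=O)C(=O)c1c2ccc2c1CCCC2(C)C"
mol = Chem.MolFromSmiles(smiles)

# Each entry flips exactly one option away from its default value
flipped_options = {
    "isomericSmiles": False,
    "canonical": False,
    "kekuleSmiles": True,
    "allBondsExplicit": True,
    "allHsExplicit": True,
}

results = {}
for name, value in flipped_options.items():
    # Only the named option is overridden; the rest keep their defaults
    results[name] = Chem.MolToSmiles(mol, **{name: value})
    print(f"{name}={value}: {results[name]}")
```

The exact strings may differ between RDKit versions, so compare them with the outputs listed here rather than assuming they match.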

SMILES to be tested

Let's test with this molecule, which has both stereochemistry (a chiral center) and an aromatic ring.

C[C@H]1COC2=C1C(=O)C(=O)c1c2ccc2c1CCCC2(C)C

When isomericSmiles is changed from the default true to false

CC1COC2=C1C(=O)C(=O)c1c2ccc2c1CCCC2(C)C

Oh, the stereochemistry information (the @ in [C@H]) has disappeared.
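One practical consequence of dropping the chirality marks is that two enantiomers end up with the same string. A minimal sketch, assuming the two alanine enantiomers as toy inputs (they are not part of the original test):

```python
from rdkit import Chem

# Two mirror-image molecules that differ only in the chirality tag
enantiomer_a = Chem.MolFromSmiles("C[C@H](N)C(=O)O")
enantiomer_b = Chem.MolFromSmiles("C[C@@H](N)C(=O)O")

# Default (isomericSmiles=True): the chirality is kept in the output
iso_a = Chem.MolToSmiles(enantiomer_a)
iso_b = Chem.MolToSmiles(enantiomer_b)

# isomericSmiles=False: the chirality is dropped from the output
flat_a = Chem.MolToSmiles(enantiomer_a, isomericSmiles=False)
flat_b = Chem.MolToSmiles(enantiomer_b, isomericSmiles=False)

print(f"isomeric strings differ: {iso_a != iso_b}")
print(f"flat strings are equal:  {flat_a == flat_b}")
```

So isomericSmiles=False is only safe when the two forms really should be treated as the same compound.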

When canonical is changed from the default true to false

C[C@H]1COC2=C1C(=O)C(=O)c1c2ccc2c1CCCC2(C)C

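No change. The other outputs in this memo, which are generated with canonical=true, keep the same atom order as the input, so the original SMILES was most likely already canonical.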
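To actually see the effect of canonical, the input needs to be a SMILES that is not already in canonical form. A minimal sketch, assuming two different spellings of ethanol as toy inputs (not part of the original test):

```python
from rdkit import Chem

# Two spellings of the same molecule (ethanol)
inputs = ("CCO", "OCC")

canonical_forms = {}
input_order_forms = {}
for smi in inputs:
    mol = Chem.MolFromSmiles(smi)
    # canonical=True (default): the output should not depend on the input spelling
    canonical_forms[smi] = Chem.MolToSmiles(mol)
    # canonical=False: no canonical ordering is applied to the atoms
    input_order_forms[smi] = Chem.MolToSmiles(mol, canonical=False)
    print(f"{smi}: canonical={canonical_forms[smi]}, non-canonical={input_order_forms[smi]}")
```

If SMILES strings are used as dictionary keys or for duplicate checks, leaving canonical at its default is the safer choice.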

When kekuleSmiles is changed from the default false to true

C[C@H]1COC2=C1C(=O)C(=O)C1:C2:C:C:C2:C:1CCCC2(C)C

The aromatic c atoms have been capitalized and colons (aromatic bond symbols) have appeared. I was expecting a pattern of alternating - and = bonds instead.
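The colons suggest that the molecule still carried its aromatic bonds when the SMILES was written. One commonly used way to get alternating single and double bonds is to kekulize the molecule first with Chem.Kekulize. The following is a sketch under that assumption, and the behavior of kekuleSmiles on its own may differ between RDKit versions.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("C[C@H]1COC2=C1C(=O)C(=O)c1c2ccc2c1CCCC2(C)C")

# Chem.Kekulize modifies the molecule in place, so work on a copy
kekulized = Chem.Mol(mol)

# clearAromaticFlags=True removes the aromatic flags from atoms and bonds,
# leaving explicit single and double bonds in the ring
Chem.Kekulize(kekulized, clearAromaticFlags=True)

kekule_smiles = Chem.MolToSmiles(kekulized, kekuleSmiles=True)
print(f"kekule smiles = {kekule_smiles}")
```

Parsing the resulting string again with Chem.MolFromSmiles re-detects the aromatic ring, so the Kekule form is just another spelling of the same molecule.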

When allBondsExplicit is changed from the default false to true

C-[C@H]1-C-O-C2=C-1-C(=O)-C(=O)-c1:c-2:c:c:c2:c:1-C-C-C-C-2(-C)-C

The single bonds are now written out explicitly as well. Well, it's noisy (laughs).

When allHsExplicit is changed from the default false to true

[CH3][C@H]1[CH2][O][C]2=[C]1[C](=[O])[C](=[O])[c]1[c]2[cH][cH][c]2[c]1[CH2][CH2][CH2][C]2([CH3])[CH3]

The implicit hydrogens are now written out. Even noisier (laughs).

Finally, set allBondsExplicit and allHsExplicit to true at the same time (bonus)

[CH3]-[C@H]1-[CH2]-[O]-[C]2=[C]-1-[C](=[O])-[C](=[O])-[c]1:[c]-2:[cH]:[cH]:[c]2:[c]:1-[CH2]-[CH2]-[CH2]-[C]-2(-[CH3])-[CH3]

This one is for people who cannot read between the lines.
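However noisy it looks, the fully explicit string should still describe the same molecule. A quick way to check this is to parse it back and compare canonical SMILES, sketched below (the variable names are made up for illustration):

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("C[C@H]1COC2=C1C(=O)C(=O)c1c2ccc2c1CCCC2(C)C")

# Canonical SMILES with all options left at their defaults
canonical_smiles = Chem.MolToSmiles(mol)

# Fully explicit form: every bond and every H count is written out
verbose_smiles = Chem.MolToSmiles(mol, allBondsExplicit=True, allHsExplicit=True)

# Parse the verbose string and write it again with default options
round_tripped = Chem.MolToSmiles(Chem.MolFromSmiles(verbose_smiles))

print(f"verbose       = {verbose_smiles}")
print(f"round-tripped = {round_tripped}")
print(f"same molecule = {round_tripped == canonical_smiles}")
```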

Reference

https://www.rdkit.org/docs/source/rdkit.Chem.rdmolfiles.html
