If you've been calling Chem.MolToSmiles without really understanding its default options, it's time to graduate. This memo summarizes them.
Version: RDKit 2020.03.5
Option | Description |
---|---|
isomericSmiles | Include stereochemistry information in the SMILES. Default: True |
kekuleSmiles | Write the SMILES in Kekulé form (no aromatic bonds). Default: False |
rootedAtAtom | If non-negative, force the SMILES to start at the atom with this index. Default: -1 |
canonical | If False, the SMILES is not canonicalized. Default: True |
allBondsExplicit | If True, every bond order is written explicitly in the output SMILES. Default: False |
allHsExplicit | If True, every atom's hydrogen count is written explicitly in the output SMILES. Default: False |
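rootedAtAtom is the one option left out of the checks in this memo, so here is a minimal sketch of it on its own. Ethanol is an arbitrary example that is not from the original, and the index 2 assumes RDKit numbers atoms in the order they appear in the input SMILES.

```python
from rdkit import Chem

# Ethanol: atoms are indexed in input order, so C=0, C=1, O=2
mol = Chem.MolFromSmiles("CCO")

# Default: RDKit picks the starting atom as part of canonicalization
default_smiles = Chem.MolToSmiles(mol)
# rootedAtAtom=2: force the string to start at the oxygen
rooted_smiles = Chem.MolToSmiles(mol, rootedAtAtom=2)

print(default_smiles)  # CCO
print(rooted_smiles)   # OCC
```

Both strings describe the same molecule; only the starting point of the traversal changes.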
To check how each option other than rootedAtAtom behaves, I wrote the following function:

```python
from rdkit import Chem

def generate_smiles(old_smiles, isomeric=True, kekule=False,
                    allBondsExplicit=False, allHsExplicit=False, canonical=True):
    # Parse the input SMILES and write it back out with the given MolToSmiles options
    print(f"\n\ngenerate smiles {old_smiles}")
    print(f"prev smiles = {old_smiles}")
    old_mol = Chem.MolFromSmiles(old_smiles)
    new_smiles = Chem.MolToSmiles(old_mol, isomericSmiles=isomeric, kekuleSmiles=kekule,
                                  allBondsExplicit=allBondsExplicit,
                                  allHsExplicit=allHsExplicit, canonical=canonical)
    print(f"new smiles = {new_smiles}")
    return new_smiles
```
Let's try it on this molecule, which has both stereochemistry (a chiral center) and an aromatic ring:
`C[C@H]1COC2=C1C(=O)C(=O)c1c2ccc2c1CCCC2(C)C`
With isomericSmiles=False:
`CC1COC2=C1C(=O)C(=O)c1c2ccc2c1CCCC2(C)C`
Oh, the stereochemistry (the `@` in `[C@H]`) has disappeared.
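Dropping stereochemistry is sometimes exactly what you want, for example when deduplicating a compound list regardless of stereoisomers. A minimal sketch, using 1-aminoethanol as an arbitrary chiral example that is not from the original:

```python
from rdkit import Chem

# The two enantiomers of 1-aminoethanol; the SMILES differ only in @ vs @@
enantiomers = [Chem.MolFromSmiles("C[C@H](N)O"), Chem.MolFromSmiles("C[C@@H](N)O")]

# Default (isomericSmiles=True): the chirality tag is kept, so the strings differ
with_stereo = {Chem.MolToSmiles(m) for m in enantiomers}
# isomericSmiles=False: the tag is dropped, so both collapse to one string
without_stereo = {Chem.MolToSmiles(m, isomericSmiles=False) for m in enantiomers}

print(len(with_stereo))     # 2
print(len(without_stereo))  # 1
```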
With canonical=False:
`C[C@H]1COC2=C1C(=O)C(=O)c1c2ccc2c1CCCC2(C)C`
No change. Presumably the original SMILES was already in canonical form.
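Whether canonical=False changes anything depends on the input. With canonical=False, RDKit roughly follows the input atom order instead of renumbering atoms, so an input that is already canonical comes back unchanged. Starting from a non-canonical spelling makes the difference visible; the ethanol input here is an arbitrary example, not from the original.

```python
from rdkit import Chem

# Ethanol written starting from the oxygen; RDKit's canonical form is CCO
mol = Chem.MolFromSmiles("OCC")

canonical_smiles = Chem.MolToSmiles(mol)                     # canonical=True (default)
input_order_smiles = Chem.MolToSmiles(mol, canonical=False)  # keeps the input atom order

print(canonical_smiles)    # CCO
print(input_order_smiles)  # OCC
```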
With kekuleSmiles=True:
`C[C@H]1COC2=C1C(=O)C(=O)C1:C2:C:C:C2:C:1CCCC2(C)C`
The lowercase `c` atoms are now uppercase `C`, and colons (explicit aromatic bonds) have appeared. Shouldn't single and double bonds alternate here instead?
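The colons suggest that, at least in the version used here, kekuleSmiles=True only changes how the writer prints the molecule and does not assign alternating bonds itself, so the ring bonds are still aromatic and get written as `:`. Kekulizing the molecule first with Chem.Kekulize, as the RDKit Getting Started guide does, gives the alternating form. A sketch on the same molecule:

```python
from rdkit import Chem

smiles = "C[C@H]1COC2=C1C(=O)C(=O)c1c2ccc2c1CCCC2(C)C"
mol = Chem.MolFromSmiles(smiles)

# Work on a copy so the aromatic original stays untouched
kek_mol = Chem.Mol(mol)
# Assign explicit single/double bonds and clear the aromatic flags on atoms and bonds
Chem.Kekulize(kek_mol, clearAromaticFlags=True)

kekule_smiles = Chem.MolToSmiles(kek_mol, kekuleSmiles=True)
# Expect uppercase C with = bonds in the former aromatic ring and no colons
print(kekule_smiles)
```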
With allBondsExplicit=True:
`C-[C@H]1-C-O-C2=C-1-C(=O)-C(=O)-c1:c-2:c:c:c2:c:1-C-C-C-C-2(-C)-C`
Single bonds are now written explicitly as `-`, and aromatic bonds as `:`. It's pretty noisy, though (laughs).
With allHsExplicit=True:
`[CH3][C@H]1[CH2][O][C]2=[C]1[C](=[O])[C](=[O])[c]1[c]2[cH][cH][c]2[c]1[CH2][CH2][CH2][C]2([CH3])[CH3]`
Every atom is now in brackets with its hydrogen count written out, so the implicit hydrogens become visible. Even noisier (laughs).
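Note that allHsExplicit only changes how hydrogen counts are printed; it does not add hydrogen atoms to the molecule. That is different from Chem.AddHs, which turns the hydrogens into real atoms of the graph. A minimal comparison, using methane as an arbitrary example not taken from the original:

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("C")  # methane: one heavy atom with four implicit hydrogens

# allHsExplicit: the H count goes inside the atom's brackets; no atoms are added
bracketed = Chem.MolToSmiles(mol, allHsExplicit=True)
print(bracketed)  # [CH4]

# AddHs: each hydrogen becomes its own atom, written as [H]
mol_h = Chem.AddHs(mol)
with_h_atoms = Chem.MolToSmiles(mol_h)
print(mol.GetNumAtoms(), mol_h.GetNumAtoms())  # 1 5
print(with_h_atoms)
```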
With both allBondsExplicit=True and allHsExplicit=True:
`[CH3]-[C@H]1-[CH2]-[O]-[C]2=[C]-1-[C](=[O])-[C](=[O])-[c]1:[c]-2:[cH]:[cH]:[c]2:[c]:1-[CH2]-[CH2]-[CH2]-[C]-2(-[CH3])-[CH3]`
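However noisy the explicit variants look, they should describe the same molecule. One way to confirm that is to parse each verbose string again and compare its re-canonicalized form with the default output. A sketch, assuming the same input molecule as above; the variant names in the dict are just labels chosen for this example:

```python
from rdkit import Chem

smiles = "C[C@H]1COC2=C1C(=O)C(=O)c1c2ccc2c1CCCC2(C)C"
reference = Chem.MolToSmiles(Chem.MolFromSmiles(smiles))  # default canonical form

# Option combinations corresponding to the explicit outputs above
variants = {
    "allBondsExplicit": dict(allBondsExplicit=True),
    "allHsExplicit": dict(allHsExplicit=True),
    "both": dict(allBondsExplicit=True, allHsExplicit=True),
}

results = {}
for name, options in variants.items():
    written = Chem.MolToSmiles(Chem.MolFromSmiles(smiles), **options)
    # Re-parse the verbose string and canonicalize it with default options
    results[name] = Chem.MolToSmiles(Chem.MolFromSmiles(written)) == reference
    print(name, written, results[name])
```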
For anyone who wants it spelled out, here is the official documentation:
https://www.rdkit.org/docs/source/rdkit.Chem.rdmolfiles.html