I put together the compound preprocessing steps that I often use in Python.
This time I used RDKit and MolVS. See Resources for installation instructions. MolVS is a library specializing in compound preprocessing (standardization), and it appears to have been incorporated into RDKit as well.
RDKit : SanitizeMol
Reference: http://rdkit.org/docs/source/rdkit.Chem.rdmolops.html
Performs Kekulization, valence checking, aromaticity perception, and assignment of conjugation and hybridization, among other things.
When you create a mol object from SMILES with RDKit, sanitization appears to be performed by default. It is probably meant to be called explicitly after you edit a mol object yourself.
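As a rough illustration of calling sanitization by hand, the sketch below parses SMILES with `sanitize=False` and then runs `Chem.SanitizeMol` explicitly. The two input molecules (phenol and a carbon with five bonds) are illustrative choices of mine, not compounds from this article.

```python
from rdkit import Chem

# Parse without sanitization, as you might before editing a molecule by hand.
mol = Chem.MolFromSmiles("Oc1ccccc1", sanitize=False)

# Run the sanitization steps explicitly (Kekulization, valence check, aromaticity, ...).
Chem.SanitizeMol(mol)
smiles = Chem.MolToSmiles(mol)
print(smiles)

# A carbon with five bonds fails the valence check.
# With catchErrors=True the failed step is returned instead of raising an exception.
bad = Chem.MolFromSmiles("C(C)(C)(C)(C)C", sanitize=False)
status = Chem.SanitizeMol(bad, catchErrors=True)
print(status)
```

If `status` is anything other than `Chem.SanitizeFlags.SANITIZE_NONE`, sanitization stopped at that step.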
MolVS : Normalize
Reference: https://molvs.readthedocs.io/en/latest/guide/standardize.html
A series of transformations that fix common drawing errors and standardize functional groups. As far as I can tell, it mainly corrects how charges are represented.
Let's try it.
from rdkit import Chem
from molvs.normalize import Normalizer, Normalization
old_smiles = "[Na]OC(=O)c1ccc(C[S+2]([O-])([O-]))cc1"
print("PREV:" + old_smiles)
old_mol = Chem.MolFromSmiles(old_smiles)
normalizer = Normalizer(normalizations=[Normalization('Sulfone to S(=O)(=O)', '[S+2:1]([O-:2])([O-:3])>>[S+0:1](=[O-0:2])(=[O-0:3])')])
new_mol = normalizer.normalize(old_mol)
new_smiles = Chem.MolToSmiles(new_mol)
print("NEW:" + new_smiles)
In the code above, only the normalization rule named "Sulfone to S(=O)(=O)" is applied. The result is shown below: the formal charges on the sulfur and oxygen atoms have been removed. If you create a Normalizer with no arguments, all of the normalization rules predefined in MolVS are applied.
PREV:[Na]OC(=O)c1ccc(C[S+2]([O-])([O-]))cc1
NEW: O=C(O[Na])c1ccc(C[S](=O)=O)cc1
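For comparison, here is a minimal sketch of running the whole default rule set rather than a single rule. It assumes RDKit's built-in port of MolVS (`rdkit.Chem.MolStandardize.rdMolStandardize`) instead of the MolVS package itself, and the dimethyl sulfone input is an illustrative molecule of my own.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# Dimethyl sulfone drawn in a charge-separated form (illustrative input).
old_mol = Chem.MolFromSmiles("C[S+2]([O-])([O-])C")

# A Normalizer created with no arguments uses RDKit's default normalization rules.
normalizer = rdMolStandardize.Normalizer()
new_mol = normalizer.normalize(old_mol)

# Collect the formal charges after normalization.
charges = [atom.GetFormalCharge() for atom in new_mol.GetAtoms()]
print(Chem.MolToSmiles(new_mol), charges)
```

The default rules in RDKit may not match the MolVS defaults exactly, so treat this as an approximation of the no-argument `Normalizer()` behavior described above.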
MolVS : TautomerCanonicalizer
Reference: https://molvs.readthedocs.io/en/latest/guide/tautomer.html
Tautomers are a set of molecules that readily interconvert through the movement of hydrogen atoms. Phenol and its keto form are said to be a typical example. (Example of phenol) https://en.wikipedia.org/wiki/File:Phenol_tautomers.svg
Let's try it.
from rdkit import Chem
from molvs.tautomer import TautomerCanonicalizer, TautomerTransform
tautomerCanonicalizer = TautomerCanonicalizer((
TautomerTransform('1,7 aromatic heteroatom H shift r', '[#7,S,O,Se,Te,CX4;!H0]-[#6,#7X2]=[#6]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[NX2,S,O,Se,Te]'),
))
mol = Chem.MolFromSmiles("O=C1CC=CC=C1")
print("prev:" + Chem.MolToSmiles(mol))
mol2 = tautomerCanonicalizer.canonicalize(mol)
print("after: "+ Chem.MolToSmiles(mol2))
In the code above, only the tautomer rule '1,7 aromatic heteroatom H shift r' is applied, here to the keto tautomer of phenol. As a result, phenol is produced as shown below. If TautomerCanonicalizer is created without any arguments, all of the tautomer rules predefined in MolVS are used.
prev:O=C1C=CC=CC1
after: Oc1ccccc1
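The purpose of canonicalization is that different tautomers of one compound end up as the same structure. The sketch below checks this with RDKit's built-in `TautomerEnumerator` (an assumption on my part: it uses the RDKit port, not the MolVS class shown above). The 2-hydroxypyridine / 2-pyridone pair is an illustrative example that does not appear in this article.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

enumerator = rdMolStandardize.TautomerEnumerator()

# Two tautomers of the same compound (2-hydroxypyridine and 2-pyridone).
inputs = ["Oc1ccccn1", "O=c1cccc[nH]1"]

canonical = []
for smi in inputs:
    mol = Chem.MolFromSmiles(smi)
    # Canonicalize enumerates the tautomers and returns the highest-scoring one.
    canonical.append(Chem.MolToSmiles(enumerator.Canonicalize(mol)))

# Both inputs should give the same canonical tautomer.
print(canonical)
```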
MolVS : LargestFragmentChooser
Reference: https://molvs.readthedocs.io/en/latest/api.html#molvs-fragment
Roughly speaking, when the input contains multiple fragments (molecules), the largest one is returned.
Let's try it.
from rdkit import Chem
from molvs.fragment import LargestFragmentChooser
fragment_chooser = LargestFragmentChooser()
old_smiles = "O=S(=O)(Cc1[nH]c(-c2ccc(Cl)s2)c[s+]1)c1cccs1.[Br-]"
print("prev:" + old_smiles)
mol = Chem.MolFromSmiles(old_smiles)
mol2 = fragment_chooser(mol)
print("after:" + Chem.MolToSmiles(mol2))
In the code above, LargestFragmentChooser is applied to a salt made up of a bromide ion and an organic cation. As shown below, the result is the molecule with the bromide ion removed.
prev:O=S(=O)(Cc1[nH]c(-c2ccc(Cl)s2)c[s+]1)c1cccs1.[Br-]
after:O=S(=O)(Cc1[nH]c(-c2ccc(Cl)s2)c[s+]1)c1cccs1
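To make the idea concrete, here is a simplified sketch using only core RDKit calls. It ranks fragments by heavy-atom count alone, which is my own simplification and not necessarily the exact selection rule MolVS uses. Sodium acetate is an illustrative input.

```python
from rdkit import Chem

# Sodium acetate written as two disconnected ions (illustrative input).
mol = Chem.MolFromSmiles("CC(=O)[O-].[Na+]")

# Split the molecule into its disconnected fragments, each as its own Mol.
fragments = Chem.GetMolFrags(mol, asMols=True)

# Simplification: pick the fragment with the most heavy atoms.
largest = max(fragments, key=lambda frag: frag.GetNumHeavyAtoms())
largest_smiles = Chem.MolToSmiles(largest)
print(largest_smiles)  # CC(=O)[O-]
```

Note that the charge on the remaining fragment is kept; removing the counterion does not neutralize the molecule.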
MolVS : Uncharger
Reference: https://molvs.readthedocs.io/en/latest/api.html#molvs-charge
It attempts to neutralize ionized acids and bases in the molecule. Let's try it.
from rdkit import Chem
from molvs.charge import Uncharger
uncharger = Uncharger()
mol = Chem.MolFromSmiles("c1cccc[nH+]1")
print("prev:" + Chem.MolToSmiles(mol))
mol2 = uncharger(mol)
print("after:" + Chem.MolToSmiles(mol2))
The input is a molecule containing an ionized base (protonated pyridine); when Uncharger is applied, it is neutralized as shown below.
prev:c1cc[nH+]cc1
after:c1ccncc1
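The example above neutralizes a protonated base. As a complementary sketch, the following neutralizes an ionized acid. It assumes RDKit's built-in `Uncharger` from `rdMolStandardize` rather than the MolVS class, and the acetate input is an illustrative choice.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

uncharger = rdMolStandardize.Uncharger()

# Acetate anion: an ionized carboxylic acid (illustrative input).
mol = Chem.MolFromSmiles("CC(=O)[O-]")

# Add a hydrogen to the charged oxygen to neutralize it.
neutral = uncharger.uncharge(mol)
neutral_smiles = Chem.MolToSmiles(neutral)
print(neutral_smiles)  # CC(=O)O
```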
Two steps that I could not cover this time are "MolVS: reionization" and "MolVS: Disconnect metals". I omitted them because I could not think of suitable example compounds. See Resources for details.
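For readers who want a starting point, a possible sketch of metal disconnection is given below. It assumes RDKit's built-in `MetalDisconnector` from `rdMolStandardize`, not the MolVS class, and the sodium benzoate input drawn with a covalent Na-O bond is my own illustrative choice.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# Sodium benzoate drawn with a covalent Na-O bond (illustrative input).
mol = Chem.MolFromSmiles("[Na]OC(=O)c1ccccc1")

# Break the metal-oxygen bond and assign charges to both sides.
disconnector = rdMolStandardize.MetalDisconnector()
disconnected = disconnector.Disconnect(mol)
result = Chem.MolToSmiles(disconnected)
print(result)
```

The result should contain two fragments, a sodium cation and a benzoate anion, which could then be passed to the fragment and charge steps described earlier.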