RDKit usage memo


Environment and packages to install



Create a virtual environment for RDKit with conda

$ conda create -c rdkit -n (Virtual environment name) rdkit

If import rdkit fails at this point with an error saying that some library is missing, install that library (libxrender1 in my case)

$ sudo apt install libxrender1

In addition, install matplotlib and scikit-learn with $ conda install XXXX, where XXXX is the package name (and tabulate, which I used for writing this article)
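A quick way to confirm that the environment is usable is to try importing each package and print whatever error comes back. The following is only a minimal sketch: check_imports is a hypothetical helper name, and the module list is an assumption for illustration (rdkit.Chem.Draw is included on the guess that a missing drawing library such as libxrender1 would show up there).

```python
import importlib


def check_imports(module_names):
    """Try to import each module; return {name: error message} for failures."""
    problems = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:  # also raised when a shared library is missing
            problems[name] = str(exc)
    return problems


# Hypothetical list for this memo; scikit-learn is imported under the name "sklearn"
modules = ["rdkit", "rdkit.Chem.Draw", "matplotlib", "sklearn", "tabulate"]
for name, error in check_imports(modules).items():
    print(f"{name}: {error}")
```

If nothing is printed, every import succeeded. Otherwise the message usually names the missing package or library.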

Trying it out

Display of molecules

from rdkit import Chem
from rdkit.Chem import Draw

# Convert a molecule written in SMILES notation into an RDKit Mol object
molecule_1 = Chem.MolFromSmiles('Cc1ccccc1')

# Render the Mol object as an image (for display)

[Output result] 001_DrawTest.png
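The drawing call itself is not shown above. One way to produce such a PNG, offered as a sketch rather than as the code that was actually used, is Draw.MolToFile. The image size and the None check are additions for illustration, and the file name simply reuses the one from the output.

```python
from rdkit import Chem
from rdkit.Chem import Draw

# Toluene, the same SMILES string as above
molecule_1 = Chem.MolFromSmiles('Cc1ccccc1')
if molecule_1 is None:
    raise ValueError('SMILES could not be parsed')

# Write the 2D depiction to a PNG file (the size in pixels is an arbitrary choice)
output_path = '001_DrawTest.png'
Draw.MolToFile(molecule_1, output_path, size=(300, 300))
```

In a Jupyter notebook, Draw.MolToImage(molecule_1) returns a PIL image that is displayed inline instead of being written to disk.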

Creating a fingerprint

Get the SMILES strings and solubility data of the compounds by following this article: Introduction to Chemical Informatics with RDKit and Scikit-learn

   smile     XXXX      logS
0  O=C(C)N   60-35-5   1.58
1  NNC       60-34-4   1.34
2  O=C(C)O   64-19-7   1.22
3  N1CCCC1   123-75-1  1.15
4  O=C(N)NO  127-07-1  1.12
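The code that follows reads the SMILES column from a DataFrame called df, but how df was loaded is not shown. A minimal sketch with pandas, using only the five rows above typed in as whitespace-separated text (an assumption about the layout, not the actual data file), could look like this. The XXXX header is kept exactly as it appears in the table.

```python
import io

import pandas as pd

# The five rows shown above, as whitespace-separated text
raw = """smile XXXX logS
O=C(C)N 60-35-5 1.58
NNC 60-34-4 1.34
O=C(C)O 64-19-7 1.22
N1CCCC1 123-75-1 1.15
O=C(N)NO 127-07-1 1.12
"""

# sep=r"\s+" splits each line on any run of whitespace
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df)
```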

Convert the SMILES strings to Mol objects, then compute a fingerprint.
- radius: how far out from each atom of interest, counted in bonds, the neighboring atoms are taken into account.

smiles = df['smile']
molecules = [Chem.MolFromSmiles(smile) for smile in smiles]

# Create a single fingerprint as a trial
from rdkit.Chem import AllChem
molecule_1 = molecules[0]

fingerprint = AllChem.GetMorganFingerprintAsBitVect(mol=molecule_1, radius=2, nBits=2048)
# -> An object of class rdkit.DataStructs.cDataStructs.ExplicitBitVect is created

# -> Written out as a bit string, it looks something like 000000000000000100000000000000100
# -> After that, you can feed it into whatever machine learning model you like.
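To make the last two comments concrete, here is a hedged sketch that prints one fingerprint as a bit string and stacks the fingerprints of several molecules into a NumPy array of the shape scikit-learn expects. It uses the same GetMorganFingerprintAsBitVect call as above. The SMILES list is retyped from the table, and the names to_array, bit_string and X are hypothetical.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

# SMILES strings retyped from the table above
smiles = ['O=C(C)N', 'NNC', 'O=C(C)O', 'N1CCCC1', 'O=C(N)NO']
molecules = [Chem.MolFromSmiles(smile) for smile in smiles]

fingerprints = [
    AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
    for mol in molecules
]

# Text form: a string of 2048 characters, each '0' or '1'
bit_string = fingerprints[0].ToBitString()
print(bit_string[:64], '...')


def to_array(fingerprint):
    """Copy an ExplicitBitVect into a 1-D NumPy array of 0/1 values."""
    arr = np.zeros((0,), dtype=np.int8)
    DataStructs.ConvertToNumpyArray(fingerprint, arr)  # resizes arr in place
    return arr


# Numeric form: one row per molecule, one column per bit
X = np.array([to_array(fp) for fp in fingerprints])
print(X.shape)
```

X can then be paired with the logS column as the target of a regression model. Newer RDKit releases may print a deprecation warning for GetMorganFingerprintAsBitVect and point to rdFingerprintGenerator instead.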

