Environment and packages to install
Create a virtual environment for RDKit with conda
$ conda create -c rdkit -n (Virtual environment name) rdkit
If you try import rdkit right after this, you may get an error saying that some shared library is missing. In that case, install the missing library (libxrender1 in my case):
$ sudo apt install libxrender1
In addition, install matplotlib and scikit-learn with $ conda install XXXX, replacing XXXX with the package name. (I also installed tabulate, which I used when writing this article.)
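Before going further, it can help to confirm that the packages are visible from the new environment. The following is a minimal sketch using only the standard library. The helper name `check_packages` is hypothetical, and the import names (for example `sklearn` for scikit-learn) are assumed to be the usual ones.

```python
from importlib.util import find_spec

# Import names of the packages used in this article.
# Note: scikit-learn is imported under the name "sklearn".
REQUIRED = ["rdkit", "matplotlib", "sklearn", "tabulate"]


def check_packages(names):
    """Return {import_name: True/False} depending on whether the package can be located."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(REQUIRED).items():
        print(f"{name}: {'OK' if found else 'missing'}")
```

This only checks that each package can be located. A missing system library such as libxrender1 would still show up only when the package is actually imported.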
from rdkit import Chem
from rdkit.Chem import Draw
# Convert a molecule written in SMILES notation (toluene here) into an RDKit Mol object
molecule_1 = Chem.MolFromSmiles('Cc1ccccc1')
# Render the Mol object as an image (shown inline when run in a notebook)
Draw.MolToImage(molecule_1)
[Output result]
The SMILES and solubility data for the compounds were obtained by following the article "Introduction to Chemical Informatics with RDKit and Scikit-learn". The first few rows look like this:
| | smile | XXXX | logS |
|---|---|---|---|
| 0 | O=C(C)N | 60-35-5 | 1.58 |
| 1 | NNC | 60-34-4 | 1.34 |
| 2 | O=C(C)O | 64-19-7 | 1.22 |
| 3 | N1CCCC1 | 123-75-1 | 1.15 |
| 4 | O=C(N)NO | 127-07-1 | 1.12 |
Convert the molecules stored as SMILES strings into Mol objects, then compute a fingerprint.
- radius: how many bonds away from the atom of interest should be taken into account.
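To make the radius parameter concrete, here is a small sketch in plain Python that lists which atoms lie within a given number of bonds of a starting atom. It only illustrates the neighborhood idea and is not RDKit's fingerprint algorithm. The bond list for toluene (Cc1ccccc1) is written by hand, and the function name `atoms_within_radius` is hypothetical.

```python
from collections import deque

# Hand-written bond list for toluene (SMILES: Cc1ccccc1).
# Atom 0 is the methyl carbon; atoms 1-6 form the aromatic ring.
TOLUENE_BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]


def atoms_within_radius(bonds, start, radius):
    """Return the sorted atom indices reachable from `start` in at most `radius` bonds."""
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        if distance[atom] == radius:
            continue  # do not expand beyond the requested radius
        for nxt in neighbors.get(atom, ()):
            if nxt not in distance:
                distance[nxt] = distance[atom] + 1
                queue.append(nxt)
    return sorted(distance)


for r in range(3):
    print(r, atoms_within_radius(TOLUENE_BONDS, start=0, radius=r))
# 0 [0]
# 1 [0, 1]
# 2 [0, 1, 2, 6]
```

With radius=2, the value used in this article, the environment around the methyl carbon covers the carbon itself, the ring atom it is attached to, and that ring atom's two ring neighbors.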
# df is the DataFrame holding the table above; take its SMILES column
smiles = df['smile']
molecules = [Chem.MolFromSmiles(smile) for smile in smiles]
# Create a single fingerprint as a trial
from rdkit.Chem import AllChem
molecule_1 = molecules[0]
fingerprint = AllChem.GetMorganFingerprintAsBitVect(mol=molecule_1, radius=2, nBits=2048)
# -> an object of class rdkit.DataStructs.cDataStructs.ExplicitBitVect is created
fingerprint.ToBitString()
# -> returns a string of 0s and 1s such as 000000000000000100000000000000100
# -> from here, you can feed the bits into whatever machine learning model you like
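As a rough sketch of that last step, the bit string can be turned into a list of 0/1 integers, which is the row format most machine learning libraries expect. The bit strings below are short made-up placeholders rather than real fingerprints, and the helper name `bitstring_to_features` is hypothetical. In practice each string would come from fingerprint.ToBitString().

```python
def bitstring_to_features(bitstring):
    """Convert a string such as '00100' into a list of ints [0, 0, 1, 0, 0]."""
    if set(bitstring) - {"0", "1"}:
        raise ValueError("bit string must contain only '0' and '1'")
    return [int(ch) for ch in bitstring]


# Placeholder bit strings (real ones would be nBits = 2048 characters long).
example_bitstrings = ["00010010", "01000010", "00000011"]

# One row per molecule, one column per fingerprint bit.
X = [bitstring_to_features(s) for s in example_bitstrings]

print(X[0])               # [0, 0, 0, 1, 0, 0, 1, 0]
print(len(X), len(X[0]))  # 3 8
```

The resulting rows, paired with the logS values as the target, could then be passed to the fit method of a scikit-learn regressor.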