This post introduces RDKit, a library that is indispensable for chemoinformatics, and summarizes its basic usage from Python.
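To use RDKit, it is recommended to install Anaconda and then install RDKit with conda. Recent RDKit releases are distributed through the conda-forge channel; in that case, use -c conda-forge in place of -c rdkit.
$ conda install -c rdkit rdkit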
To use it, import it as follows.
from rdkit import Chem
For example, to save the structure of a compound given as a SMILES string to a PNG file, do the following.
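from rdkit import Chem
from rdkit.Chem import Draw  # Draw must be imported explicitly
smiles = 'CCO'  # example (ethanol); replace with the SMILES of your compound
molecule = Chem.MolFromSmiles(smiles)
Draw.MolToFile(molecule, 'molecule.png')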
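Chem.MolFromSmiles returns None when it cannot parse the input, so it can help to check the result before drawing or calculating anything. The following is a minimal sketch of such a check; the helper name parse_smiles and the example SMILES strings are assumptions made for illustration, not part of RDKit.

```python
from rdkit import Chem


def parse_smiles(smiles):
    """Return an RDKit Mol, or raise ValueError if the SMILES cannot be parsed.

    parse_smiles is a hypothetical helper name chosen for this example.
    """
    molecule = Chem.MolFromSmiles(smiles)
    if molecule is None:
        # MolFromSmiles signals a parse failure by returning None
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return molecule


ethanol = parse_smiles("CCO")  # ethanol, used only as an example
print(ethanol.GetNumAtoms())   # counts heavy atoms; implicit hydrogens are not included
```

Raising an exception here is one possible design choice; when processing a long list of compounds, skipping or logging the unparsable entries may be more convenient.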
The molecule object can also be created from a mol file.
from rdkit import Chem
from rdkit.Chem import Draw  # Draw must be imported explicitly
molecule = Chem.MolFromMolFile('compound.mol')  # replace with the path to your mol file
Draw.MolToFile(molecule, 'molecule.png')
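To try the mol file route without having a mol file at hand, one option is to write a molecule out with Chem.MolToMolFile and read it back. The sketch below assumes ethanol as the example compound and uses a temporary directory; the file name ethanol.mol is arbitrary.

```python
import os
import tempfile

from rdkit import Chem

original = Chem.MolFromSmiles("CCO")  # ethanol, used only as an example

with tempfile.TemporaryDirectory() as tmp_dir:
    mol_path = os.path.join(tmp_dir, "ethanol.mol")  # arbitrary file name
    Chem.MolToMolFile(original, mol_path)      # write the molecule as a mol file
    reloaded = Chem.MolFromMolFile(mol_path)   # read it back while the file still exists

if reloaded is None:
    # MolFromMolFile also returns None when the file cannot be parsed
    raise ValueError("Could not read the mol file")

# Compare canonical SMILES to confirm the structure survived the round trip
print(Chem.MolToSmiles(original) == Chem.MolToSmiles(reloaded))
```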
To calculate the descriptors of compounds read from SMILES:
from rdkit import Chem
from rdkit.Chem import Descriptors  # Descriptors must be imported explicitly
from rdkit.ML.Descriptors import MoleculeDescriptors
smiles_list = ['CCO', 'c1ccccc1']  # examples; replace with the SMILES of your target compounds
# Collect the names of all available descriptors
target_descriptors = []
for desc in Descriptors.descList:
    target_descriptors.append(desc[0])  # desc is a (descriptor name, function) tuple
print(len(target_descriptors))
print(target_descriptors)
descriptor_calculator = MoleculeDescriptors.MolecularDescriptorCalculator(target_descriptors)
# Calculate every descriptor for each compound
descriptors = []
for smiles in smiles_list:
    molecule = Chem.MolFromSmiles(smiles)
    descriptors.append(descriptor_calculator.CalcDescriptors(molecule))
print(descriptors)
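CalcDescriptors returns a plain tuple of values, so pairing each value with its descriptor name makes the result easier to read. The sketch below assumes a small hand-picked subset of descriptor names (MolWt, NumHDonors, NumHAcceptors) and a hypothetical helper named describe; neither the subset nor the helper comes from the original procedure.

```python
from rdkit import Chem
from rdkit.ML.Descriptors import MoleculeDescriptors

# A small subset chosen for illustration; any names from Descriptors.descList work
names = ["MolWt", "NumHDonors", "NumHAcceptors"]
calculator = MoleculeDescriptors.MolecularDescriptorCalculator(names)


def describe(smiles):
    """Return a {descriptor name: value} dict, or None for an unparsable SMILES.

    describe is a hypothetical helper name chosen for this example.
    """
    molecule = Chem.MolFromSmiles(smiles)
    if molecule is None:
        return None
    values = calculator.CalcDescriptors(molecule)  # same order as `names`
    return dict(zip(names, values))


result = describe("CCO")  # ethanol, used only as an example
print(result)
```

A list of such dictionaries can be passed to a tabular library later if a table of compounds by descriptors is needed.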
This post covered the basics of using RDKit from Python: installing it, drawing a compound from SMILES or a mol file, and calculating descriptors. With these steps, you can calculate the descriptors of a compound in a few lines of code.