Introduction

This article explains RDKit, an open-source toolkit that is indispensable for chemoinformatics, and summarizes its basic usage in Python.

Installation and import

The recommended way to install RDKit is with conda (for example, through an Anaconda or Miniconda installation), using the conda-forge channel:

$ conda install -c conda-forge rdkit

To use RDKit, import its Chem module:

from rdkit import Chem
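A quick way to confirm that the installation works is to print the installed version and parse a simple molecule. This is a minimal check; the SMILES `CCO` (ethanol) is only an example.

```python
import rdkit
from rdkit import Chem

# Print the installed RDKit version string
print(rdkit.__version__)

# Parse ethanol as a smoke test; a successful parse returns a Mol object
molecule = Chem.MolFromSmiles('CCO')
# GetNumAtoms() counts heavy atoms only by default (hydrogens are implicit)
print(molecule.GetNumAtoms())  # 3
```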

Reading and writing molecules

For example, to draw a compound given as a SMILES string and save the structure as a PNG file:

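from rdkit import Chem
from rdkit.Chem import Draw  # Draw is not loaded by "from rdkit import Chem" alone


molecule = Chem.MolFromSmiles('CCO')  # example SMILES (ethanol); replace with your compound's SMILES
Draw.MolToFile(molecule, 'molecule.png')  # save the 2D structure image as a PNG file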
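`Chem.MolFromSmiles` does not raise an exception for an invalid SMILES string; it returns `None` and logs an error, so passing the result straight to a drawing function fails later with a less obvious message. The sketch below checks for that case and also shows the writing direction with `Chem.MolToSmiles`, which outputs a canonical SMILES. The helper name `parse_smiles` is hypothetical, and the SMILES strings are only examples.

```python
from rdkit import Chem


def parse_smiles(smiles):
    """Hypothetical helper: return a Mol, or raise ValueError for an invalid SMILES."""
    molecule = Chem.MolFromSmiles(smiles)
    if molecule is None:
        raise ValueError(f'Invalid SMILES: {smiles}')
    return molecule


# Two different SMILES strings for ethanol give the same canonical SMILES
print(Chem.MolToSmiles(parse_smiles('OCC')))    # CCO
print(Chem.MolToSmiles(parse_smiles('C(O)C')))  # CCO

# 'C1CC' has an unclosed ring, so parsing fails
try:
    parse_smiles('C1CC')
except ValueError as error:
    print(error)
```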

A molecule can also be read from a MOL file:

from rdkit import Chem
from rdkit.Chem import Draw


molecule = Chem.MolFromMolFile('compound.mol')  # path to your compound's MOL file
Draw.MolToFile(molecule, 'molecule.png')  # save the 2D structure image as a PNG file
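Writing works in the opposite direction: `Chem.MolToMolFile` saves a molecule as a MOL file that `Chem.MolFromMolFile` can read back. The following is a minimal round-trip sketch; the file name `ethanol.mol` and the use of a temporary directory are assumptions made so the example leaves no files behind.

```python
import os
import tempfile

from rdkit import Chem

molecule = Chem.MolFromSmiles('CCO')  # example compound (ethanol)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'ethanol.mol')
    # Write the molecule to a MOL file
    Chem.MolToMolFile(molecule, path)
    # Read it back; MolFromMolFile also returns None if parsing fails
    restored = Chem.MolFromMolFile(path)

# The Mol object stays in memory after the temporary directory is removed
print(Chem.MolToSmiles(restored))  # CCO
```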

Calculating compound descriptors

To calculate all available descriptors for compounds read from SMILES:

from rdkit import Chem
from rdkit.Chem import Descriptors
from rdkit.ML.Descriptors import MoleculeDescriptors


smiles_list = ['CCO', 'c1ccccc1']  # example SMILES; replace with your target compounds
target_descriptors = []
for desc in Descriptors.descList:
    target_descriptors.append(desc[0])  # desc is a (descriptor name, calculation function) tuple
print(len(target_descriptors))
print(target_descriptors)

descriptor_calculator = MoleculeDescriptors.MolecularDescriptorCalculator(target_descriptors)
descriptors = []
for smiles in smiles_list:
    molecule = Chem.MolFromSmiles(smiles)
    # CalcDescriptors returns a tuple of values in the same order as target_descriptors
    descriptors.append(descriptor_calculator.CalcDescriptors(molecule))
print(descriptors)
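Each entry of `Descriptors.descList` pairs a descriptor name with the function that computes it, so a single descriptor can also be called directly, for example `Descriptors.MolWt` or `Descriptors.MolLogP`. The sketch below also pairs names with values via `zip` and skips SMILES that fail to parse, since `Chem.MolFromSmiles` returns `None` for them and the calculator cannot handle `None`. The chosen descriptor subset and the example SMILES list, including the deliberately invalid `'C1CC'`, are assumptions for illustration.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors
from rdkit.ML.Descriptors import MoleculeDescriptors

# Individual descriptor functions can be called directly on a Mol
ethanol = Chem.MolFromSmiles('CCO')
print(Descriptors.MolWt(ethanol))    # average molecular weight, about 46.07
print(Descriptors.MolLogP(ethanol))  # Crippen logP estimate

# Example input; 'C1CC' is deliberately invalid (unclosed ring)
smiles_list = ['CCO', 'C1CC', 'c1ccccc1']
names = ['MolWt', 'NumHDonors', 'RingCount']  # example subset of descriptor names
calculator = MoleculeDescriptors.MolecularDescriptorCalculator(names)

results = {}
for smiles in smiles_list:
    molecule = Chem.MolFromSmiles(smiles)
    if molecule is None:
        # Skip compounds that RDKit could not parse
        continue
    # Map each descriptor name to its value for this compound
    results[smiles] = dict(zip(names, calculator.CalcDescriptors(molecule)))

print(results)
```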

Summary

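This article covered the basics of using RDKit from Python: installing it, reading molecules from SMILES strings and MOL files, saving structure images, and calculating descriptors. With these steps, you can calculate descriptors for your own compounds with little effort.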
