Create a command to search for similar compounds from the target database with RDKit and check the processing time

Introduction

I wanted to know how long it takes to search a target database (just an SDF file) for compounds similar to a query compound with RDKit, so I wrote a command to do it.

Source

Similarity is commonly calculated by generating a fingerprint for each compound and scoring pairs of fingerprints with the Tanimoto coefficient. A fingerprint is a bit vector that encodes features of the chemical structure, and there are various methods for generating one. Here I used MACCS keys, a widely used fingerprint with a small number of bits.
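
To make the score concrete: for two bit-vector fingerprints, the Tanimoto coefficient is the number of bits set in both divided by the number of bits set in either. The following is a minimal plain-Python sketch of that definition, not RDKit's implementation. The function name `tanimoto` and the toy bit positions are made up for illustration, and returning 0.0 for two empty fingerprints is an assumption of this sketch.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Assumption of this sketch: two empty fingerprints score 0.0
        return 0.0
    return len(a & b) / union


# Toy fingerprints (hypothetical bit positions, not real MACCS keys)
fp_query = {1, 5, 42, 160}
fp_target = {1, 5, 99, 160, 163}
score = tanimoto(fp_query, fp_target)
print(score)  # 3 shared bits / 6 distinct bits = 0.5
```

The script below computes the same kind of score with RDKit's MACCS keys fingerprint and `DataStructs.TanimotoSimilarity`.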

import argparse
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-query", type=str, required=True)
    parser.add_argument("-target_db", type=str, required=True)
    args = parser.parse_args()

    #Read query (MolFromMolFile returns None if the file cannot be parsed)
    query_mol = Chem.MolFromMolFile(args.query)
    if query_mol is None:
        parser.error("could not parse query MOL file: " + args.query)

    #Loading SDF
    target_sdf_sup = Chem.SDMolSupplier(args.target_db)

    #FingerPrint calculation(query)
    query_fp = AllChem.GetMACCSKeysFingerprint(query_mol)

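    #FingerPrint calculation(target)
    #The supplier yields None for records it cannot parse, so skip those
    target_fps = [
        (i, AllChem.GetMACCSKeysFingerprint(mol))
        for i, mol in enumerate(target_sdf_sup)
        if mol is not None
    ]

    #Tanimoto similarity of the query against each target, printed with the SDF record index
    for i, target_fp in target_fps:
        result = DataStructs.TanimotoSimilarity(query_fp, target_fp)
        print(i, result)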


if __name__ == "__main__":
    main()
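
The script prints a score for every record in SDF order. For an actual search you usually want only the closest hits. Here is a minimal sketch of that post-processing step, assuming the scores have been collected as (index, score) pairs instead of being printed. The helper name `top_hits` and its `limit` and `threshold` parameters are illustrative and not part of the command above.

```python
import heapq


def top_hits(scored, limit=10, threshold=0.0):
    """Return up to `limit` (index, score) pairs with score >= threshold, best first."""
    kept = (pair for pair in scored if pair[1] >= threshold)
    return heapq.nlargest(limit, kept, key=lambda pair: pair[1])


# Hypothetical scores in the (record index, Tanimoto score) form the script prints
scored = [(0, 0.31), (1, 0.87), (2, 0.55), (3, 0.92), (4, 0.12)]
hits = top_hits(scored, limit=3, threshold=0.5)
for index, score in hits:
    print(index, score)
```

`heapq.nlargest` avoids sorting the whole list when only a handful of hits is needed, which matters more as the target database grows.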

How to use

It is used like this. The help message comes courtesy of argparse.

usage: StructureSimilaritySearch.py [-h] -query QUERY -target_db TARGET_DB

optional arguments:
  -h, --help            show this help message and exit
  -query QUERY(mol)
  -target_db TARGET_DB(sdf)

Processing time

As usual, I searched against the 1024 training compounds of the solubility dataset that comes with RDKit, using an arbitrarily chosen query compound. The results came back in about 1 second. For a database on the order of 10,000 compounds, the script looks usable as it is.
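
The figure above is a rough wall-clock impression. One way to measure it more precisely is to wrap each stage in `time.perf_counter()`. The sketch below shows the pattern with a stand-in workload so that it runs without RDKit. `timed` and `fake_search` are hypothetical names; in the real script the timed call would be the fingerprint generation or the similarity loop.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_search(n_records):
    """Stand-in for the search loop: one dummy score per record."""
    return [i / n_records for i in range(n_records)]


scores, seconds = timed(fake_search, 10_000)
print(f"{len(scores)} records in {seconds:.4f} s")
```

Timing the fingerprint generation and the similarity loop separately would show which of the two dominates the total.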
