Convert SDF to CSV quickly

Introduction

I wrote a script that quickly converts SDF, a file format for compound data, to CSV.

Specification

- Read the properties in the SDF and output them as CSV columns.
- Compounds do not necessarily share the same set of properties; a property that a compound lacks is left empty.
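
To make the second point concrete, here is a minimal sketch that uses only the standard library. The two records are hypothetical plain dictionaries standing in for SDF property blocks, not data read from a real file, and `csv.DictWriter` is used here in place of the pandas step in the script below.

```python
import csv
import io

# Hypothetical records standing in for the property blocks of two compounds.
# The first has no SMILES property and the second has no SOL property.
records = [
    {"ID": "5", "SOL": "-3.68"},
    {"ID": "10", "SMILES": "CC(C)CC(C)C"},
]

# Union of all property names, kept in order of first appearance.
columns = list(dict.fromkeys(name for rec in records for name in rec))

# restval="" writes an empty field wherever a record lacks a property.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
writer.writeheader()
writer.writerows(records)

csv_text = buffer.getvalue()
print(csv_text)
```

Each row has one field per column in the header, and a missing property shows up as an empty field between the commas.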

Source

SDF2CSVConvert.py


import pandas as pd
from rdkit import Chem
import argparse
from collections import defaultdict


def main():

    parser = argparse.ArgumentParser()
    parser.add_argument("-input", type=str, required=True)
    parser.add_argument("-output", type=str, required=True)
    parser.add_argument("-save_name", action='store_true', help="store header line as _Name")
    args = parser.parse_args()

    # First pass: collect every property name that appears in the SDF
    sdf_sup = Chem.SDMolSupplier(args.input)
    Props = []
    if args.save_name:
        Props.append("_Name")

    for mol in sdf_sup:
        # SDMolSupplier yields None for records it cannot parse
        if mol is None:
            continue
        for name in mol.GetPropNames():
            if name not in Props:
                Props.append(name)

    # Dictionary mapping each property name to its column of values
    param_dict = defaultdict(list)

    # Second pass: re-create the supplier and read each compound's values
    sdf_sup = Chem.SDMolSupplier(args.input)
    for mol in sdf_sup:
        if mol is None:
            continue
        # Store the value, or None when the compound lacks the property
        for name in Props:
            if mol.HasProp(name):
                param_dict[name].append(mol.GetProp(name))
            else:
                param_dict[name].append(None)

    # Build the DataFrame in one step and write it out as CSV
    df = pd.DataFrame(data=param_dict)
    df.to_csv(args.output, index=False)


if __name__ == "__main__":
    main()

Commentary

The SDF is read twice. The first pass collects the property names found across all compounds. The second pass reads each compound's value for every property, storing None when a compound lacks that property. Finally, the dictionary of property columns is passed to pandas and written out as CSV. With -save_name, the first line of each SDF record (the molecule title) is also saved, as the property "_Name". See the source for the other arguments.
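
As a quick check of how the arguments behave, the sketch below rebuilds the same parser and parses an argument list directly instead of reading the real command line. `build_parser` is a helper name introduced only for this illustration, and the file names are hypothetical.

```python
import argparse


def build_parser():
    # Same three options as SDF2CSVConvert.py
    parser = argparse.ArgumentParser()
    parser.add_argument("-input", type=str, required=True)
    parser.add_argument("-output", type=str, required=True)
    parser.add_argument("-save_name", action="store_true",
                        help="store header line as _Name")
    return parser


# Hypothetical file names, passed as a list instead of via sys.argv
with_name = build_parser().parse_args(
    ["-input", "solubility.sdf", "-output", "solubility.csv", "-save_name"]
)
without_name = build_parser().parse_args(
    ["-input", "solubility.sdf", "-output", "solubility.csv"]
)

print(with_name.input, with_name.output, with_name.save_name)
print(without_name.save_name)
```

On the command line, the first call would correspond to something like `python SDF2CSVConvert.py -input solubility.sdf -output solubility.csv -save_name`. Because -save_name is a store_true flag, leaving it off simply omits the "_Name" column.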

Output example

Converting the solubility data distributed with RDKit (run with -save_name) produces output like this.

_Name,ID,NAME,SOL,SMILES,SOL_classification
3-methylpentane,5,3-methylpentane,-3.68,CCC(C)CC,(A) low
"2,4-dimethylpentane",10,"2,4-dimethylpentane",-4.26,CC(C)CC(C)C,(A) low
1-pentene,15,1-pentene,-2.68,CCCC=C,(B) medium
cyclohexene,20,cyclohexene,-2.59,C1CC=CCC1,(B) medium
"1,4-pentadiene",25,"1,4-pentadiene",-2.09,C=CCC=C,(B) medium
cycloheptatriene,30,cycloheptatriene,-2.15,C1=CC=CC=CC1,(B) medium
1-octyne,35,1-octyne,-3.66,CCCCCCC#C,(A) low
ethylbenzene,40,ethylbenzene,-2.77,c1ccccc1CC,(B) medium
"1,3,5-trimethylbenzene",45,"1,3,5-trimethylbenzene",-3.4,c1c(C)cc(C)cc1C,(A) low
indane,50,indane,-3.04,c(c(ccc1)CC2)(c1)C2,(A) low
isobutylbenzene,55,isobutylbenzene,-4.12,c1ccccc1CC(C)C,(A) low
n-hexylbenzene,60,n-hexylbenzene,-5.21,c1ccccc1CCCCCC,(A) low
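
Note that names containing commas, such as "2,4-dimethylpentane", are quoted in the output. A CSV-aware reader treats them as a single field, as this small sketch with the standard csv module suggests. The sample text is copied from the first rows above, and a plain `split(",")` is included only to show what would go wrong without a CSV parser.

```python
import csv
import io

# First rows of the output above, pasted in as a string
sample = (
    "_Name,ID,NAME,SOL,SMILES,SOL_classification\n"
    "3-methylpentane,5,3-methylpentane,-3.68,CCC(C)CC,(A) low\n"
    '"2,4-dimethylpentane",10,"2,4-dimethylpentane",-4.26,CC(C)CC(C)C,(A) low\n'
)

# DictReader uses the header line as keys and handles quoted fields
rows = list(csv.DictReader(io.StringIO(sample)))
names = [row["_Name"] for row in rows]
print(names)

# A naive split breaks the quoted name into two pieces
naive_fields = sample.splitlines()[2].split(",")
print(len(naive_fields), "fields from split, 6 expected")
```

So when reading the CSV back, use a CSV parser (the csv module or pandas) rather than splitting lines on commas.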
