Avoiding hard-coded functions with SymPy

Introduction

Using SymPy, a computer algebra library, I wrote a script that applies functions defined in an external file to a column of a pandas DataFrame and writes the result to a CSV file.

Command specifications

The command takes three arguments: -train is the input CSV file, -function is the function definition file, and -output is the output CSV file.

$ python command/calculate_function.py -h
usage: calculate_function.py [-h] -train TRAIN -function FUNCTION -output
                             OUTPUT

optional arguments:
  -h, --help          show this help message and exit
  -train TRAIN        input csv file.
  -function FUNCTION  input function file.
  -output OUTPUT      output csv file.

How to specify the function

The function definition file looks like this.

exp,cos(x),NewExp
exp,exp(x),ExpExp
exp,sin(x),SinExp

The first column is the name of the column to be calculated, and the second column is the function expression, in which x stands for the values of that column. The third column is the name of the column that stores the calculation result.
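
As a minimal sketch of how such a file can be read, the snippet below parses the three example rows with the standard csv module. The FunctionDef tuple and the read_definitions helper are hypothetical names introduced here for illustration; they are not part of the script.

```python
import csv
import io
from typing import NamedTuple


class FunctionDef(NamedTuple):
    """One row of the function definition file (hypothetical helper type)."""
    source_column: str   # column to be calculated
    expression: str      # function expression written in terms of x
    result_column: str   # column that stores the result


def read_definitions(stream):
    """Parse definition rows from a file-like object, skipping blank lines."""
    return [FunctionDef(*row) for row in csv.reader(stream) if row]


# The example definition file, held in memory instead of on disk
example = io.StringIO("exp,cos(x),NewExp\nexp,exp(x),ExpExp\nexp,sin(x),SinExp\n")
definitions = read_definitions(example)
for d in definitions:
    print(d.result_column, "=", d.expression, "applied to", d.source_column)
```

Naming the three fields keeps the later steps readable, since row[0], row[1], and row[2] are easy to mix up.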

Let's try it

The source looks like this. exec is used to evaluate the function expression as Python source, so only use function definition files you trust.

calculate_function.py


import argparse
import csv

import pandas as pd
from sympy import *  # provides Symbol, lambdify, and function names such as cos, exp, sin


def main():

    parser = argparse.ArgumentParser()
    parser.add_argument("-train", type=str, required=True, help="input csv file.")
    parser.add_argument("-function", type=str, required=True, help="input function file.")
    parser.add_argument("-output", type=str, required=True, help="output csv file.")

    args = parser.parse_args()

    df = pd.read_csv(args.train, index_col=0)

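    # Read the function definitions: target column, expression in x, result column
    with open(args.function, 'r', newline='') as file:
        for row in csv.reader(file):
            if not row:
                continue  # skip blank lines
            # Evaluate the expression with exec in an explicit namespace;
            # SymPy names such as cos and exp come from the wildcard import
            namespace = {'x': Symbol('x')}
            exec('f = ' + row[1], globals(), namespace)
            # Turn the symbolic expression into a NumPy-backed function
            func = lambdify(namespace['x'], namespace['f'], 'numpy')
            df[row[2]] = func(df[row[0]])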

    df.to_csv(args.output)


if __name__ == "__main__":
    main()
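
For comparison, here is a minimal sketch of the same loop without exec, under the assumption that sympify (which parses a string into a SymPy expression) is an acceptable substitute. This is not the method used in the script above; apply_definitions is a hypothetical helper name, and the two-row data frame is only a stand-in for the input file. Note that sympify also evaluates its input internally, so it should not be given untrusted strings either.

```python
import pandas as pd
from sympy import Symbol, lambdify, sympify


def apply_definitions(df, definitions):
    """Add one result column per (source column, expression, result column) triple."""
    x = Symbol("x")
    out = df.copy()
    for source_column, expression, result_column in definitions:
        expr = sympify(expression)          # e.g. "cos(x)" becomes the SymPy expression cos(x)
        func = lambdify(x, expr, "numpy")   # vectorised NumPy callable
        out[result_column] = func(out[source_column])
    return out


# Stand-in for the input file: two exp values indexed by compound ID
df = pd.DataFrame({"exp": [3.54, -1.18]}, index=["CHEMBL596271", "CHEMBL1951080"])
result = apply_definitions(df, [("exp", "cos(x)", "NewExp"), ("exp", "exp(x)", "ExpExp")])
print(result)
```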

Run

Input file

CMPD_CHEMBLID,exp,smiles
CHEMBL596271,3.54,Cn1c(CN2CCN(CC2)c3ccc(Cl)cc3)nc4ccccc14
CHEMBL1951080,-1.18,COc1cc(OC)c(cc1NC(=O)CSCC(=O)O)S(=O)(=O)N2C(C)CCc3ccccc23
CHEMBL1771,3.69,COC(=O)[C@@H](N1CCc2sccc2C1)c3ccccc3Cl
CHEMBL234951,3.37,OC[C@H](O)CN1C(=O)C(Cc2ccccc12)NC(=O)c3cc4cc(Cl)sc4[nH]3
CHEMBL565079,3.1,Cc1cccc(C[C@H](NC(=O)c2cc(nn2C)C(C)(C)C)C(=O)NCC#N)c1
CHEMBL317462,3.14,OC1(CN2CCC1CC2)C#Cc3ccc(cc3)c4ccccc4

The function file is the one shown in "How to specify the function" above.

Output result

CMPD_CHEMBLID,exp,smiles,NewExp,ExpExp,SinExp
CHEMBL596271,3.54,Cn1c(CN2CCN(CC2)c3ccc(Cl)cc3)nc4ccccc14,-0.9216800341052034,34.46691919085739,-0.3879509179417303
CHEMBL1951080,-1.18,COc1cc(OC)c(cc1NC(=O)CSCC(=O)O)S(=O)(=O)N2C(C)CCc3ccccc23,0.38092482436688185,0.30727873860113125,-0.9246060124080203
CHEMBL1771,3.69,COC(=O)[C@@H](N1CCc2sccc2C1)c3ccccc3Cl,-0.8533559001656995,40.044846957286715,-0.5213287903544065
CHEMBL234951,3.37,OC[C@H](O)CN1C(=O)C(Cc2ccccc12)NC(=O)c3cc4cc(Cl)sc4[nH]3,-0.9740282491988521,29.07852705779708,-0.22642652177388314
CHEMBL565079,3.1,Cc1cccc(C[C@H](NC(=O)c2cc(nn2C)C(C)(C)C)C(=O)NCC#N)c1,-0.9991351502732795,22.197951281441636,0.04158066243329049
CHEMBL317462,3.14,OC1(CN2CCC1CC2)C#Cc3ccc(cc3)c4ccccc4,-0.9999987317275395,23.103866858722185,0.0015926529164868282

The three new columns NewExp, ExpExp, and SinExp hold cos, exp, and sin of the exp column, as specified in the function file.
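
As a quick sanity check, the first output row can be reproduced with the standard math module alone. This is only a sketch for verifying the numbers; the variable names are chosen here for illustration.

```python
import math

value = 3.54  # exp value of CHEMBL596271 in the input file

new_exp = math.cos(value)   # compare with the NewExp column
exp_exp = math.exp(value)   # compare with the ExpExp column
sin_exp = math.sin(value)   # compare with the SinExp column
print(new_exp, exp_exp, sin_exp)
```

The printed values should agree with the first row of the output file up to floating-point rounding.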

Conclusion

SymPy can also express more complicated formulas, such as conditional branching, so I would like to write about that in a future post.
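
As a rough sketch of what conditional branching could look like, assuming SymPy's Piecewise is the kind of formula meant here, the snippet below clips negative values to 0. The ClippedExp column name and the definition row are hypothetical. Because the expression itself contains commas, the row has to quote it so that the csv module keeps it as one field.

```python
import csv
import io

import numpy as np
from sympy import Symbol, lambdify, sympify

# A definition row whose expression contains commas must be quoted in the CSV
line = 'exp,"Piecewise((0, x < 0), (x, True))",ClippedExp\n'
source_column, expression, result_column = next(csv.reader(io.StringIO(line)))

x = Symbol("x")
expr = sympify(expression)          # Piecewise: 0 where x < 0, otherwise x
func = lambdify(x, expr, "numpy")   # vectorised NumPy callable

values = np.array([3.54, -1.18, 3.69])  # first three exp values from the input file
clipped = func(values)
print(result_column, clipped)
```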
