Pandas reads CSV into a DataFrame easily enough, but going from CSV back into a DB, apparently only whole-column INSERT is supported? So I wrote a function that applies a DataFrame to the DB as an UPDATE.
A while back I started using Pandas DataFrame in earnest and learned that to_csv could dump DB values straight into a CSV file. "Wow, awesome!", I thought.
But what about the opposite direction? Say you export from the DB to CSV, edit the values in some external tool, and want to reflect the edits back into the DB.
If you think about it, the sticking point is the primary key. Presumably that's why the common OSS libraries don't support this. Honestly, you just end up writing the SQL yourself.
So I wrote a niche bit of processing.
It takes a Pandas DataFrame, a primary key column, and the columns to update, and applies the updates to the DB.
The primary key is a single string; the update targets are a list of strings. The idea is to build and commit an UPDATE statement whose WHERE clause pins down the matching record by the primary key value, with the other columns going into the SET clause.
Well... that's it. It's a quick-and-dirty version, so there are various restrictions:
・The primary key consists of a single column
・Column types are fixed (the primary key is a string, the others are float)
・The DB is PostgreSQL
・Etc.
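The description above boils down to string-building: one SET fragment per target column, plus a WHERE condition on the key. A minimal sketch of just that part (the table and column names are made up; %s is the psycopg2 placeholder style):

```python
def build_update_sql(table, column_key, columns):
    # One "col = %s" fragment per update target, keyed on column_key.
    set_clause = ', '.join(f'{c} = %s' for c in columns)
    return f'UPDATE {table} SET {set_clause} WHERE {column_key} = %s'

sql = build_update_sql('RDKIT', 'samplename', ['molwt', 'qed'])
print(sql)
# UPDATE RDKIT SET molwt = %s, qed = %s WHERE samplename = %s
```

The values themselves are then passed separately to cursor.execute(), so the SQL string never contains data.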
import psycopg2

def get_connection_string():
    hostname = 'localhost'
    port = '5432'
    database = 'db01'
    username = 'user01'
    password = 'manager'
    return f'postgresql://{username}:{password}@{hostname}:{port}/{database}'

def get_cur():
    conn = psycopg2.connect(get_connection_string())
    cur = conn.cursor()
    return conn, cur
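One gotcha with building the URL by hand: if the username or password contains characters like @ or /, the URL is misparsed. A variant hedged with urllib.parse.quote_plus (the password here is a made-up example):

```python
from urllib.parse import quote_plus

def get_connection_string_safe(username, password, hostname='localhost',
                               port='5432', database='db01'):
    # quote_plus percent-encodes characters such as '@' and '/' that would
    # otherwise be taken as URL structure.
    return (f'postgresql://{quote_plus(username)}:{quote_plus(password)}'
            f'@{hostname}:{port}/{database}')

print(get_connection_string_safe('user01', 'p@ss/word'))
# postgresql://user01:p%40ss%2Fword@localhost:5432/db01
```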
# Update
# df         : data to be updated
# column_key : primary key column
# columns    : columns to be updated
def update_data(df, column_key, columns):
    conn, cur = get_cur()
    for i in range(len(df)):
        paras = list()
        col_sqls = list()
        for column in columns:
            paras.append(float(df.at[i, column]))
            col_sqls.append(f' {column} = %s ')
        # Pass the key value as a parameter too, instead of embedding it in
        # the SQL string (avoids quoting problems and SQL injection).
        paras.append(df.at[i, column_key])
        sql = f'''
            UPDATE
                RDKIT
            SET
                {','.join(col_sqls)}
            WHERE
                {column_key} = %s
        '''
        print(sql)
        print(paras)
        cur.execute(sql, paras)
        conn.commit()
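Running the function requires a live PostgreSQL instance, but the same row-by-row UPDATE loop can be exercised against an in-memory SQLite table (sqlite3 uses ? placeholders where psycopg2 uses %s; the table and values here are made up for illustration):

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE rdkit (samplename TEXT PRIMARY KEY, molwt REAL, qed REAL)')
conn.executemany('INSERT INTO rdkit VALUES (?, 0.0, 0.0)', [('mol1',), ('mol2',)])

df = pd.DataFrame({'samplename': ['mol1', 'mol2'],
                   'molwt': [180.16, 46.07],
                   'qed': [0.55, 0.41]})

columns = ['molwt', 'qed']
for i in range(len(df)):
    paras = [float(df.at[i, c]) for c in columns]
    paras.append(df.at[i, 'samplename'])  # key value passed as a parameter
    conn.execute('UPDATE rdkit SET molwt = ?, qed = ? WHERE samplename = ?', paras)
conn.commit()

print(conn.execute('SELECT samplename, molwt, qed FROM rdkit ORDER BY samplename').fetchall())
# [('mol1', 180.16, 0.55), ('mol2', 46.07, 0.41)]
```

Committing once per row is simple but slow for large DataFrames; a single commit after the loop (or cursor.executemany) is the usual optimization.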
# Write from CSV to DB
def test05():
    import pandas as pd
    df = pd.read_csv('abc.csv')
    # df = pd.read_csv('edited.csv')
    column_key = 'samplename'
    columns = ['maxestateindex', 'maxabsestateindex', 'heavyatommolwt', 'exactmolwt', 'qed', 'molwt']
    update_data(df, column_key, columns)
# Export from DB to CSV
def test06():
    import pandas as pd
    sql = '''
        SELECT
            SampleName
            , MaxEStateIndex, MaxAbsEStateIndex, HeavyAtomMolWt, ExactMolWt, qed, MolWt
        FROM
            RDKIT
    '''
    conn, cur = get_cur()
    df = pd.read_sql(sql=sql, con=conn)
    df.to_csv('exported.csv')
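The export direction can likewise be tried without a server; here sqlite3 stands in for PostgreSQL and the row is made up. Passing index=False to to_csv keeps the DataFrame index out of the file, which makes the CSV easier to round-trip back through the update function:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE rdkit (samplename TEXT PRIMARY KEY, molwt REAL)')
conn.execute("INSERT INTO rdkit VALUES ('mol1', 180.16)")
conn.commit()

# read_sql accepts a DBAPI2 sqlite3 connection directly.
df = pd.read_sql('SELECT samplename, molwt FROM rdkit', con=conn)
csv_text = df.to_csv(index=False)  # no index column in the output
print(csv_text)
# samplename,molwt
# mol1,180.16
```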
def main():
    test05()
    # test06()

if __name__ == '__main__':
    main()
Hmm... still not great. There's a lingering sense of defeat. I'll think it over some more another day.