Pandas reads CSV into a DataFrame easily enough, but going from CSV back into a DB, apparently only whole-column INSERT is supported? So I wrote a function that applies a DataFrame to the DB as an UPDATE.
A while back I started using Pandas DataFrame in earnest and learned that to_csv could dump DB values straight into a CSV file. "Wow, awesome!", I thought.
But what about the opposite direction? Say you export from the DB to CSV, edit the values in some external tool, and want to reflect the edits back into the DB.
If you think about it, the sticking point is the primary key. Presumably that's why the common OSS libraries don't support this. Honestly, you just end up writing the SQL yourself.
So I wrote a niche bit of processing.
It takes a Pandas DataFrame, a primary key column, and the columns to update, and applies the updates to the DB.
The primary key is a single string; the update targets are a list of strings. The idea is to build and commit an UPDATE statement whose WHERE clause pins down the matching record by the primary key value, with the other columns going into the SET clause.
Well... that's it. It's a quick-and-dirty version, so there are various restrictions:
・The primary key consists of a single column
・Column types are fixed (the primary key is a string, the others are float)
・The DB is PostgreSQL
・Etc.
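The description above boils down to string-building: one SET fragment per target column, plus a WHERE condition on the key. A minimal sketch of just that part (the table and column names are made up; %s is the psycopg2 placeholder style):

```python
def build_update_sql(table, column_key, columns):
    # One "col = %s" fragment per update target, keyed on column_key.
    set_clause = ', '.join(f'{c} = %s' for c in columns)
    return f'UPDATE {table} SET {set_clause} WHERE {column_key} = %s'

sql = build_update_sql('RDKIT', 'samplename', ['molwt', 'qed'])
print(sql)
# UPDATE RDKIT SET molwt = %s, qed = %s WHERE samplename = %s
```

The values themselves are then passed separately to cursor.execute(), so the SQL string never contains data.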
import psycopg2

def get_connection_string():
    hostname = 'localhost'
    port = '5432'
    database = 'db01'
    username = 'user01'
    password = 'manager'
    return f'postgresql://{username}:{password}@{hostname}:{port}/{database}'

def get_cur():
    conn = psycopg2.connect(get_connection_string())
    cur = conn.cursor()
    return conn, cur
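One gotcha with building the URL by hand: if the username or password contains characters like @ or /, the URL is misparsed. A variant hedged with urllib.parse.quote_plus (the password here is a made-up example):

```python
from urllib.parse import quote_plus

def get_connection_string_safe(username, password, hostname='localhost',
                               port='5432', database='db01'):
    # quote_plus percent-encodes characters such as '@' and '/' that would
    # otherwise be taken as URL structure.
    return (f'postgresql://{quote_plus(username)}:{quote_plus(password)}'
            f'@{hostname}:{port}/{database}')

print(get_connection_string_safe('user01', 'p@ss/word'))
# postgresql://user01:p%40ss%2Fword@localhost:5432/db01
```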
# Update
# df         : data to be updated
# column_key : primary key column
# columns    : columns to be updated
def update_data(df, column_key, columns):
    conn, cur = get_cur()
    for i in range(len(df)):
        paras = list()
        col_sqls = list()
        for column in columns:
            paras.append(float(df.at[i, column]))
            col_sqls.append(f' {column} = %s ')
        # Pass the key value as a parameter too, instead of embedding it in
        # the SQL string (avoids quoting problems and SQL injection).
        paras.append(df.at[i, column_key])
        sql = f'''
            UPDATE
                RDKIT
            SET
                {','.join(col_sqls)}
            WHERE
                {column_key} = %s
        '''
        print(sql)
        print(paras)
        cur.execute(sql, paras)
        conn.commit()
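Running the function requires a live PostgreSQL instance, but the same row-by-row UPDATE loop can be exercised against an in-memory SQLite table (sqlite3 uses ? placeholders where psycopg2 uses %s; the table and values here are made up for illustration):

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE rdkit (samplename TEXT PRIMARY KEY, molwt REAL, qed REAL)')
conn.executemany('INSERT INTO rdkit VALUES (?, 0.0, 0.0)', [('mol1',), ('mol2',)])

df = pd.DataFrame({'samplename': ['mol1', 'mol2'],
                   'molwt': [180.16, 46.07],
                   'qed': [0.55, 0.41]})

columns = ['molwt', 'qed']
for i in range(len(df)):
    paras = [float(df.at[i, c]) for c in columns]
    paras.append(df.at[i, 'samplename'])  # key value passed as a parameter
    conn.execute('UPDATE rdkit SET molwt = ?, qed = ? WHERE samplename = ?', paras)
conn.commit()

print(conn.execute('SELECT samplename, molwt, qed FROM rdkit ORDER BY samplename').fetchall())
# [('mol1', 180.16, 0.55), ('mol2', 46.07, 0.41)]
```

Committing once per row is simple but slow for large DataFrames; a single commit after the loop (or cursor.executemany) is the usual optimization.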
# Write from CSV to DB
def test05():
    import pandas as pd
    df = pd.read_csv('abc.csv')
    # df = pd.read_csv('edited.csv')
    column_key = 'samplename'
    columns = ['maxestateindex', 'maxabsestateindex', 'heavyatommolwt', 'exactmolwt', 'qed', 'molwt']
    update_data(df, column_key, columns)
# Export from DB to CSV
def test06():
    import pandas as pd
    sql = '''
        SELECT
            SampleName
            , MaxEStateIndex, MaxAbsEStateIndex, HeavyAtomMolWt, ExactMolWt, qed, MolWt
        FROM
            RDKIT
    '''
    conn, cur = get_cur()
    df = pd.read_sql(sql=sql, con=conn)
    df.to_csv('exported.csv')
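The export direction can likewise be tried without a server; here sqlite3 stands in for PostgreSQL and the row is made up. Passing index=False to to_csv keeps the DataFrame index out of the file, which makes the CSV easier to round-trip back through the update function:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE rdkit (samplename TEXT PRIMARY KEY, molwt REAL)')
conn.execute("INSERT INTO rdkit VALUES ('mol1', 180.16)")
conn.commit()

# read_sql accepts a DBAPI2 sqlite3 connection directly.
df = pd.read_sql('SELECT samplename, molwt FROM rdkit', con=conn)
csv_text = df.to_csv(index=False)  # no index column in the output
print(csv_text)
# samplename,molwt
# mol1,180.16
```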
def main():
    test05()
    # test06()

if __name__ == '__main__':
    main()
Hmm... still not great. There's a lingering sense of defeat. I'll think it over some more another day.