This is Miyano from estie CTO (@estie_mynfire).
Since estie constructs real estate data from various resources, pandas is used for data shaping. for that reason ** Data created with pandas-> DataBase ** It is important to carry out the flow smoothly.
Until now, there were only replace and append, so I had to do my best on the pandas side to mold and then update. (This is very annoying)
Meanwhile, the other day, pandas had the long-awaited Pull request for adding Sql Upsert function, so I tried to move it. This should explode your work efficiency ...! !!
It means to do Insert and Update. There are two main functions of Upsert in Sql.
Therefore, Python engineers have a desire to update tables with confidence ** directly from pandas.
'''
Attributes:
df (pd.DataFrame): any DataFrame
tablename (str):table name
'''
import pandas as pd
from sqlalchemy import create_engine
con = create_engine('mysql+mysqlconnector://[user]:[pass]@[host]:[port]/[schema]')
# upsert_keep ->Basically do nothing(?)There is a possibility of a bug, so I will look at it in the future.
df.to_sql(tablename, con, if_exist='upsert_keep', index=False)
# upsert_overwrite ->Update those that exist and Insert if they do not exist. As intended
df.to_sql(tablename, con, if_exist='upsert_overwrite', index=False)
Env From here, I will write a concrete story. The environment is as follows.
Setting
Environment
git clone https://github.com/V0RT3X4/pandas.git
cd pandas
git checkout -b sql-upsert
git pull origin sql-upsert
git rev-parse HEAD #Current commit hash value
# d0eb251075883902280cba6cd0dd9a1a1c4a69a4
Installation from sources
pip install cython
# Successfully installed cython-0.29.14
python setup.py build_ext --inplace --force #It takes a lot of time. Let's wait patiently.
Mysql server
Set up a mysql server and create a users table for local testing.
users table (** id column is the primary key **)
id | name | age |
---|---|---|
0 | taro | 20 |
1 | jiro | 17 |
2 | saburo | 18 |
mysql.server start --skip-grant-tables #Start a mysql server that allows you to log in without a password
mysql #Login
mysql> CREATE DATABASE testdb;
mysql> USE testdb;
mysql> CREATE TABLE users (
id INT NOT NULL,
name VARCHAR(256),
age INT
);
mysql> ALTER TABLE users ADD PRIMARY KEY (id);
mysql> INSERT INTO users (id, name, age) VALUES
(0, 'taro', 20),
(1, 'jiro', 19),
(2, 'saburo', 18);
From here, I will try Sql Upsert using pandas and sqlalchemy.
Db connect
First of all, from the connection with DB. You will need sqlalchemy, so if you don't have it, please do pip install sqlalchemy
from sqlalchemy import create_engine
import pandas as pd
con = create_engine('mysql+mysqlconnector://@localhost/testdb')
# format: 'mysql+mysqlconnector://[user]:[pass]@[host]:[port]/[schema]'
users = pd.read_sql('SELECT * FROM users;', con)
users
# id name age
#0 0 taro 20
#1 1 jiro 19
#2 2 saburo 18
Upsert
First, rewrite the DataFrame
users.loc[users.id==0, 'name'] = 'syo'
users = pd.concat([users, pd.DataFrame({'id': [3], 'name': ['shiro'], 'age': [28]})])
# id name age
#0 0 syo 20
#1 1 jiro 19
#2 2 saburo 18
#0 3 shiro 28
The basic syntax of the to_sql method is
df.to_sql(tablename, con, if_exist, index=False)
# df: pd.DataFrame
# tablename:table name
# if_exist: {'fail', 'replace', 'append', 'upsert_overwrite', 'upsert_keep'}
# * fail: Raise a ValueError.
# * replace: Drop the table before inserting new values.
# * append: Insert new values to the existing table.
# * upsert_overwrite: Overwrite matches in database with incoming data.
# * upsert_keep: Keep matches in database instead of incoming data.
(Check pandas / core / generic.py for other arguments!)
upsert_keep
3 shiro 28
, but it wasn't.users.to_sql('users', con, if_exists='upsert_keep', index=False)
pd.read_sql('SELECT * FROM users;', con) #Verification
# id name age
#0 0 taro 20
#1 1 jiro 19
#2 2 saburo 18
upsert_overwrite
users.to_sql('users', con, if_exists='upsert_overwrite', index=False)
pd.read_sql('SELECT * FROM users;', con) #Verification
# id name age
#0 0 syo 20
#1 1 jiro 19
#2 2 saburo 18
#3 3 shiro 28
It seems necessary to look around the _upsert_keep_processing method of pandas / io / sql.py. As soon as the cause is known, we will update the pull request and article.
At estie, we are always looking for engineers who are enthusiastic about new technologies and full-stack engineers! https://www.wantedly.com/companies/company_6314859/projects
estie -> https://www.estie.jp estiepro -> https://pro.estie.jp Company site-> https://www.estie.co.jp
Recommended Posts