PostgreSQL - For those who want to INSERT at high speed

Let's improve the performance of mass inserts

For mass inserts into PostgreSQL, use a utility that inserts multiple records in a single statement. Alternatively, review how frequently you commit, to cut unnecessary processing costs.

Let's measure several patterns and compare their performance. In psycopg2, one of the libraries for accessing PostgreSQL from Python, the function execute_values serves as the utility for inserting many rows at once. We will also vary the commit interval and measure each variant.
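The reason a multi-row insert is faster is that many rows travel in one statement and one round trip instead of one statement per row. As a rough illustration of the statement shape that execute_values produces, here is a pure-Python sketch; the quoting helper below is a simplified stand-in and does not perform psycopg2's real, safe escaping:

```python
def build_multirow_insert(template, rows):
    """Illustrate the shape of a multi-row VALUES statement.
    psycopg2.extras.execute_values escapes values safely; this sketch
    only shows the resulting statement structure, not safe quoting."""
    values = ", ".join(
        "(" + ", ".join("'%s'" % v for v in row) + ")" for row in rows
    )
    return template % values

# One statement carries all three rows at once.
query = "INSERT INTO test01 (text_val) VALUES %s;"
sql = build_multirow_insert(query, [("a",), ("b",), ("c",)])
```

This is why pattern 3 in the measurements below wins: the per-statement overhead is paid once, not once per row.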

Preparation for confirmation

Test table

Test DDL


CREATE TABLE test01
(key serial PRIMARY KEY,
 text_val text);

Test program

For simplicity, the program asks interactively for the number of INSERTs and the commit interval. The part that builds the parameters is included in the measured section so that each pattern is timed over as similar a range as possible. Before each run, the table is reset with TRUNCATE and its SEQUENCE is reinitialized.

execute_values defaults to a page_size of 100 rows. To insert more rows than that in a single statement, pass a page_size argument at least as large as the number of rows; otherwise the data is split into multiple statements of page_size rows each.
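The page_size behaviour described above amounts to chunking the parameter list into pages and sending one multi-row statement per page. A minimal pure-Python sketch of that chunking:

```python
def paginate(params, page_size=100):
    """Split a parameter list into pages, mirroring how execute_values
    groups rows: each page becomes one multi-row INSERT statement."""
    return [params[i:i + page_size] for i in range(0, len(params), page_size)]

# 250 rows with the default page_size of 100 -> pages of 100, 100, and 50 rows,
# i.e. three statements; with page_size=250 everything fits in one statement.
pages = paginate([('row',)] * 250)
```

This is why the test program passes page_size=test_cnt: it forces all rows into a single statement regardless of the input size.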

Test program


import psycopg2
from psycopg2.extras import execute_values
from time import time

_text = 'test' * 300

conn = psycopg2.connect('dbname=dbname host=localhost user=username password=password')
cur = conn.cursor()

# Reset the table and its sequence
reset_query_tbl = "TRUNCATE TABLE test01;"
cur.execute(reset_query_tbl)
reset_query_seq = "SELECT setval ('test01_key_seq', 1, false);"
cur.execute(reset_query_seq)
count_query = "SELECT count(key) FROM test01;"
cur.execute(count_query)
(count,) = cur.fetchone()
conn.commit()
print('■ State before execution')
print('Number of data: ', count)
print()

# Test settings
print('■ INSERT method')
print('Select the INSERT method (0/1/2/3)')
print(' 0: INSERT one row at a time & commit after every row.')
print(' 1: INSERT one row at a time & commit every N rows.')
print(' 2: INSERT one row at a time & commit once at the end.')
print(' 3: INSERT all rows at once & commit once.')
test_type = int(input('INSERT method>'))
test_cnt  = int(input('Number of INSERTs>'))
if test_type == 1:
    commit_cnt  = int(input('COMMIT interval>'))

# Test execution & results
print()
print('■ Execution result')
if test_type==0:

    query = "INSERT INTO test01 (text_val) VALUES ('%s');" % _text
    st = time()
    for _ in range(test_cnt):
        cur.execute(query)
        conn.commit()

    elapsed_time = time() - st
    print('elapsed time(s): ', elapsed_time)

if test_type==1:
    query = "INSERT INTO test01 (text_val) VALUES ('%s');" % _text
    st = time()
    for i in range(test_cnt):
        cur.execute(query)
        if (i + 1) % commit_cnt == 0:
            conn.commit()

    conn.commit()

    elapsed_time = time() - st
    print('elapsed time(s): ', elapsed_time)

if test_type==2:

    query = "INSERT INTO test01 (text_val) VALUES ('%s');" % _text
    st = time()
    for _ in range(test_cnt):
        cur.execute(query)

    conn.commit()

    elapsed_time = time() - st
    print('elapsed time(s): ', elapsed_time)

if test_type==3:

    query = "INSERT INTO test01 (text_val) VALUES %s;"
    st = time()

    params = []
    for _ in range(test_cnt):
        params += [(_text,)]

    # execute_values returns None unless fetch=True, so no need to keep a result
    execute_values(cur, query, params, page_size=test_cnt)
    conn.commit()
    elapsed_time = time() - st
    print('elapsed time(s): ', elapsed_time)

cur.execute(count_query)
(count,) = cur.fetchone()
print('Number of data: ', count)

cur.close()
conn.close()

Performance results

The results are for reference only; of course, they depend on the environment and machine specifications.

Environment: Windows 10 64-bit / 4 logical processors / 8 GB of memory

| INSERT method | Number of INSERTs | COMMIT count | Processing time (s) |
|---|---|---|---|
| One row at a time (execute) & commit every row | 100,000 | 100,000 | 40.9 |
| One row at a time (execute) & commit every 10 rows | 100,000 | 10,000 | 25.4 |
| One row at a time (execute) & commit every 100 rows | 100,000 | 1,000 | 24.1 |
| One row at a time (execute) & commit every 1,000 rows | 100,000 | 100 | 27.2 |
| One row at a time (execute) & commit every 10,000 rows | 100,000 | 10 | 25.9 |
| One row at a time (execute) & commit once | 100,000 | 1 | 24.4 |
| All at once (execute_values) & commit once | 100,000 | 1 | 8.8 |

Conclusion

execute_values is fast. Use it if you can.
