A while back, a colleague mentioned that exporting table data to CSV was taking a long time, so I offered some advice. I had never actually measured it myself, though, so this post records a quick verification of how much faster it can be.
- The contents of the table are output as-is
- PostgreSQL 12.1

For test data, we prepared 10 million rows of the following shape, differing only in `id`:
```
-[ RECORD 1 ]--------
id     | 1
data1  | aiueo
data2  | kakikukeko
data3  | sasshisuseso
data4  | tachitsuteto
data5  | naninuneno
data6  | hahihuheho
data7  | mamimumemo
data8  | yayuyo
data9  | rarirurero
data10 | wawon
```
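For reference, data of this shape can be generated with `generate_series`. A sketch of the setup I assume (the column types are guesses from the record above; the post does not show the actual DDL):

```sql
-- Hypothetical setup: 10 million rows of the shape shown above.
create table demo (
    id bigint primary key,
    data1 text, data2 text, data3 text, data4 text, data5 text,
    data6 text, data7 text, data8 text, data9 text, data10 text
);

insert into demo
select
    g,
    'aiueo', 'kakikukeko', 'sasshisuseso', 'tachitsuteto', 'naninuneno',
    'hahihuheho', 'mamimumemo', 'yayuyo', 'rarirurero', 'wawon'
from generate_series(1, 10000000) as g;
```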
Output method 1 fetches all the rows and converts them to CSV format in the program.
SQL
```sql
select * from demo
```
Output code
csv_out.py
```python
import psycopg2
import time

def get_connection():
    return psycopg2.connect("host=localhost port=5432 dbname=sampledb user=postgres password=postgres")

path = "csv1.csv"

with get_connection() as conn:
    with conn.cursor() as cur:
        start = time.time()
        cur.execute("select * from demo")
        exec_time = time.time() - start
        print(exec_time)

        start = time.time()
        with open(path, "w") as f:
            i = 1
            for row in cur:
                csv = ''
                for data in row:
                    csv = csv + str(data) + ','
                f.write(csv[:-1] + '\n')
                if i % 1000 == 0:
                    f.flush()
                i += 1
        exec_time = time.time() - start
        print(exec_time)
```
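As an aside, the row-formatting step in method 1 (building `csv` by repeated `+` concatenation and trimming the trailing comma) can be written with `str.join`, which avoids both the trailing-comma trim and the intermediate strings. A minimal sketch (the helper name `format_row` is mine, not from the original code):

```python
def format_row(row):
    # Join the column values with commas; str() covers non-string types
    # such as the integer id column (and renders None as 'None', matching
    # the original str(data) behavior).
    return ','.join(str(col) for col in row)

print(format_row((1, 'aiueo', 'kakikukeko')))  # -> 1,aiueo,kakikukeko
```

Like the original loop, this does no CSV quoting; if fields may contain commas or newlines, the standard `csv` module is the safer choice.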
Output method 2 does the conversion to CSV format in SQL, concatenating the columns into one string per row.
SQL
```sql
select
    id     || ',' ||
    data1  || ',' ||
    data2  || ',' ||
    data3  || ',' ||
    data4  || ',' ||
    data5  || ',' ||
    data6  || ',' ||
    data7  || ',' ||
    data8  || ',' ||
    data9  || ',' ||
    data10 as data
from demo
```
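One caveat on the `||` operator used here: in PostgreSQL, concatenating a NULL yields NULL, so a single NULL column turns the entire row into NULL. If the columns can be NULL, `concat_ws` ignores NULL arguments instead; a sketch:

```sql
select concat_ws(',', id,
                 data1, data2, data3, data4, data5,
                 data6, data7, data8, data9, data10) as data
from demo
```

Note that `concat_ws` drops the separator along with the NULL, so if fixed column positions matter, keep the `||` form and wrap nullable columns in `coalesce(colN, '')`.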
Output code
csv_out2.py
```python
import psycopg2
import time

def get_connection():
    return psycopg2.connect("host=localhost port=5432 dbname=sampledb user=postgres password=postgres")

path = "csv2.csv"

with get_connection() as conn:
    with conn.cursor() as cur:
        start = time.time()
        cur.execute("select id || ',' || data1 || ',' || data2 || ',' || data3 || ',' || data4 || ',' || data5 || ',' || data6 || ',' || data7 || ',' || data8 || ',' || data9 || ',' || data10 as data from demo")
        exec_time = time.time() - start
        print(exec_time)

        start = time.time()
        with open(path, "w") as f:
            i = 1
            for row in cur:
                for data in row:
                    f.write(data + '\n')
                if i % 1000 == 0:
                    f.flush()
                i += 1
        exec_time = time.time() - start
        print(exec_time)
```
The results are as follows.
| Processing | Output method 1 | Output method 2 |
|---|---|---|
| SQL execution | 11.68 s | 13.35 s |
| File output | 56.95 s | 15.05 s |
Output method 2 was faster overall: about 28.4 s in total against 68.6 s, roughly a 2.4× speedup.
Personally I expected an even bigger gap; the modest difference may be because the number of output columns is small. The more columns you output, the more effective this approach should be.
If you can connect to the DB server and simply need the table as CSV, a dedicated export command or PostgreSQL's `COPY TO` should be faster still. In the maintenance and development of business systems, however, dumping table data as-is from a program is a fairly common task, and this method is useful in those cases.
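For reference, a minimal sketch of the `COPY TO` route (the file path is an assumption; the server-side form requires superuser privileges or membership in the `pg_write_server_files` role):

```sql
-- Server-side export: the file is written on the database server itself.
copy demo to '/tmp/csv3.csv' with (format csv);
```

From a client machine, psql's `\copy demo to 'csv3.csv' with (format csv)` performs the same export but writes the file locally.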