MongoDB has also reached its limit.
MongoDB Shortest Introduction (1): Install & start on EC2, then suddenly insert tens of thousands
[MongoDB Shortest Introduction (2): I searched through tens of thousands](http://qiita.com/tottoko/items/68e61daf6eafb03d4b4b)
[MongoDB Shortest Introduction (3): I tried to stay fast even with millions](http://qiita.com/tottoko/items/b6c9430b1e0ab7cb409c)
MongoDB gets tough once you are into tens of millions of records. With the 100-million wall coming into view, SQLite turned up in an unexpected place.
[Shock! NoSQL / KVS / easy DB (mongo, redis, sqlite): which was god-speed locally?](http://qiita.com/wan-liner/items/ee07442e5bd4afd5b1b5)
If no extra processing is needed, the one above ↑ was the strongest.
Strictly speaking, it is better to keep the file gzip-compressed and hand it to the CSV reader: decompressing in memory is physically faster than reading the uncompressed file from an ordinary disk. The exception, of course, is when reading the file really is faster than that (SSD? NVMe?).
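The claim above is easy to check for yourself. Below is a minimal timing sketch (Python 2, like the rest of this article); the plain file `honyahonya.csv` is an assumption, standing in for the same data as the gzipped file but uncompressed.

```python
# coding: utf-8
# Minimal timing sketch: read the same tab-separated data once from a plain
# file and once from a gzip-compressed file, and compare wall-clock time.
# The plain file name is a placeholder, not from the original setup.
import csv
import gzip
import time

def count_rows(fileobj):
    # Iterate the whole CSV so the entire file really gets read and parsed.
    reader = csv.reader(fileobj, delimiter="\t")
    n = 0
    for _ in reader:
        n += 1
    return n

start = time.time()
with open("honyahonya.csv", 'rb') as f:
    print("plain: %d rows in %.2f sec" % (count_rows(f), time.time() - start))

start = time.time()
with gzip.open("honyahonya.csv.gz", 'rb') as f:
    print("gzip : %d rows in %.2f sec" % (count_rows(f), time.time() - start))
```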
Now, a CSV has columns, so sometimes I want placeholders. To keep the SQL readable, I want named placeholders like `:column_name1`, which also escape / sanitize the values automatically.
data=dict(zip(keys,row))
`keys` is a tuple of column names, e.g. `('id', 'title')`. `row` is one row of the tab-separated CSV, e.g. the line `1111\taaaa` read as `['1111', 'aaaa']`. Now `data` is a dictionary like `{'id': '1111', 'title': 'aaaa'}`. It's magic.
cursor.execute('insert into tracks (id, title) values (:id, :title)', data)
So you can use placeholders just like with MySQL. What about when it is not a dictionary?
cursor.execute('insert into tracks (id, title) values (?, ?)', row)
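Here is a small self-contained sketch of both placeholder styles, using an in-memory database and a tiny `tracks` table just for illustration (the two-column schema here is assumed, not the article's real one):

```python
# coding: utf-8
# Sketch of both placeholder styles against a throwaway in-memory database.
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute('create table tracks (id text, title text)')

keys = ('id', 'title')
row = "1111\taaaa".split("\t")     # one tab-separated CSV line, already split

# Named placeholders: zip the column names with the row to get a dict.
data = dict(zip(keys, row))        # {'id': '1111', 'title': 'aaaa'}
cursor.execute('insert into tracks (id, title) values (:id, :title)', data)

# Positional placeholders: pass the row (any sequence) directly.
cursor.execute('insert into tracks (id, title) values (?, ?)', row)

conn.commit()
print(cursor.execute('select * from tracks').fetchall())
```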
One more piece of magic: `conn.text_factory = str`. Nobody told me about this, so I got yelled at:
sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings
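To see where that error comes from, here is a minimal Python 2 sketch (the in-memory database, the table `t`, and the sample string are just illustrations): without `text_factory = str`, binding a non-ASCII byte string raises the ProgrammingError above; with it, the insert goes through.

```python
# coding: utf-8
# Python 2 sketch: reproduce the 8-bit bytestring error and fix it with
# conn.text_factory = str. Table name and sample string are arbitrary.
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute('create table t (s text)')

utf8_bytes = 'あいう'    # in Python 2 this literal is a UTF-8 byte string (str)

try:
    cursor.execute('insert into t values (?)', (utf8_bytes,))
except sqlite3.ProgrammingError as e:
    print(e)             # "You must not use 8-bit bytestrings ..."

conn.text_factory = str  # the "magic" line: accept byte strings as-is
cursor.execute('insert into t values (?)', (utf8_bytes,))
print(cursor.execute('select s from t').fetchall())
```

With that sorted out, here is the whole loading script.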
# coding: utf-8
import sqlite3
import csv
import gzip

conn = sqlite3.connect("honyahonya.db")
conn.text_factory = str  # accept 8-bit (UTF-8) byte strings as-is
csr = conn.cursor()

# Create the target table if it does not exist yet (schema assumed from the insert below).
csr.execute('create table if not exists tracks (id text, title text, date text, article text)')

keys = (
    'id',
    'title',
    'date',
    'article'
)

with gzip.open("honyahonya.csv.gz", 'rb') as f:
    reader = csv.reader(f, delimiter="\t")
    for row in reader:
        # If you are confident in the data, it is faster to omit checking the number of columns.
        if len(row) == len(keys):
            d = dict(zip(keys, row))
            csr.execute('insert into tracks (id, title, date, article) values (:id, :title, :date, :article)', d)

# Commit once at the end; committing every row would be much slower.
conn.commit()
conn.close()
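As a quick sanity check after the load, something like the following works (the row count and the example id `'1111'` are just illustrations; the values are stored as the strings that came out of the CSV):

```python
# coding: utf-8
# Sanity-check sketch: count the loaded rows and look one up with a named
# placeholder. The id value '1111' is only an example.
import sqlite3

conn = sqlite3.connect("honyahonya.db")
conn.text_factory = str
csr = conn.cursor()

print(csr.execute('select count(*) from tracks').fetchone()[0])

for row in csr.execute('select id, title from tracks where id = :id', {'id': '1111'}):
    print(row)

conn.close()
```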