Work memorandum (pymongo) Part 2. Convenient operation (bulk_write)

About this article

This article continues my previous post (https://qiita.com/rsm223_rip/items/141eb146ad610215e5f7). This time I cover pymongo's bulk_write.


What is bulk_write?

Instead of building and sending one query for each write to the db, you generate a large number of write operations and send them all at once with the bulk_write function. Because the writes are batched, there are fewer network round trips and throughput improves, which makes it a very convenient operation.

See also: pymongo 3.9.0 document: Bulk Write Operations


What kind of operation is possible?

bulk_write accepts the following operation classes, which behave like the corresponding regular write methods.

  1. InsertOne (note: multiple inserts are also possible with insert_many)
  2. ReplaceOne
  3. UpdateOne, UpdateMany
  4. DeleteOne, DeleteMany

Method of operation

Basically, all you have to do is create an object for each operation, put it in a list and pass it to the bulk_write function.

Writing normally (in order)

main.py


from pprint import pprint
from pymongo import InsertOne, MongoClient
from pymongo.errors import BulkWriteError

client = MongoClient()
# Note: "Collection" is the database name and "table" is the collection name,
# so db actually refers to a collection
db = client["Collection"]["table"]

# Delete all documents in the collection (uncomment to start from empty)
# db.delete_many({})

# Insert 1000 documents, incrementing _id and x from 0 to 999
opList = [InsertOne({"_id": i, "x": i}) for i in range(0, 1000)]

# On success, details are available from the return value;
# on failure, from the BulkWriteError that is raised
try:
    result = db.bulk_write(opList)
    print("Completed normally")
    pprint(result.bulk_api_result)
except BulkWriteError as bwe:
    print("An exception occurred")
    pprint(bwe.details)
'''
{'writeErrors': [],
 'writeConcernErrors': [],
 'nInserted': 1000,
 'nUpserted': 0,
 'nMatched': 0,
 'nModified': 0,
 'nRemoved': 0,
 'upserted': []}
'''


Writing without regard to order

Just pass ordered=False to bulk_write. Even if some of the operations are invalid, all operations are attempted, and you can see the details from the return value or the exception.
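The difference between the two modes can be illustrated with a toy model in plain Python. This is not pymongo: simulate_bulk_insert is a hypothetical helper that only imitates duplicate-key failures, and it processes operations in list order even in unordered mode, whereas a real server is free to choose the execution order.

```python
def simulate_bulk_insert(existing_ids, new_ids, ordered=True):
    """Toy model of bulk inserts: returns (inserted_ids, failed_indexes).

    This is NOT pymongo; it only imitates the duplicate-key behavior.
    """
    stored = set(existing_ids)
    inserted, failed = [], []
    for index, _id in enumerate(new_ids):
        if _id in stored:
            failed.append(index)
            if ordered:
                break  # ordered mode stops at the first failed operation
            continue  # unordered mode keeps trying the remaining operations
        stored.add(_id)
        inserted.append(_id)
    return inserted, failed


existing = range(5)         # _id 0..4 are already stored
incoming = [5, 3, 6, 4, 7]  # 3 and 4 collide with existing documents

print(simulate_bulk_insert(existing, incoming, ordered=True))   # ([5], [1])
print(simulate_bulk_insert(existing, incoming, ordered=False))  # ([5, 6, 7], [1, 3])
```

In ordered mode the writes before the first failure are kept and everything after it is skipped. In unordered mode only the colliding documents are rejected.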

First, after running the script above, try inserting documents with _id from 500 to 1499. (Because the writes are executed in order by default, the very first insert fails with a duplicate key error, so nothing is inserted and an exception should be raised.)

main.py


# Insert 1000 documents, incrementing _id and x from 500 to 1499
opList = [InsertOne({"_id": i, "x": i}) for i in range(500, 1500)]

# On success, details are available from the return value;
# on failure, from the BulkWriteError that is raised
try:
    result = db.bulk_write(opList)
    print("Completed normally")
    pprint(result.bulk_api_result)
except BulkWriteError as bwe:
    print("An exception occurred")
    pprint(bwe.details)

# Output
# (The first insert failed, so none of the remaining writes were executed)
'''
An exception occurred
{'nInserted': 0,
 'nMatched': 0,
 'nModified': 0,
 'nRemoved': 0,
 'nUpserted': 0,
 'upserted': [],
 'writeConcernErrors': [],
 'writeErrors': [{'code': 11000,
                  'errmsg': 'E11000 duplicate key error collection: '
                            'Collection.table index: _id_ dup key: { _id: 500 '
                            '}',
                  'index': 0,
                  'keyPattern': {'_id': 1},
                  'keyValue': {'_id': 500},
                  'op': {'_id': 500, 'x': 500}}]}

'''

Next, pass ordered=False to bulk_write and run the batch write again.

main.py


# Insert 1000 documents, incrementing _id and x from 500 to 1499
opList = [InsertOne({"_id": i, "x": i}) for i in range(500, 1500)]

# On success, details are available from the return value;
# on failure, from the BulkWriteError that is raised
try:
    result = db.bulk_write(opList, ordered=False)
    print("Completed normally")
    pprint(result.bulk_api_result)
except BulkWriteError as bwe:
    print("An exception occurred")
    pprint(bwe.details)

# An exception is raised, but 500 documents (_id 1000 to 1499) are inserted,
# and the reason for each failed write is reported
'''
An exception occurred
{'nInserted': 500,
 'nMatched': 0,
 'nModified': 0,
 'nRemoved': 0,
 'nUpserted': 0,
 'upserted': [],
 'writeConcernErrors': [],
 'writeErrors': [{'code': 11000,
                  'errmsg': 'E11000 duplicate key error collection: '
                            'Collection.table index: _id_ dup key: { _id: 500 '
                            '}',
                  'index': 0,
                  'keyPattern': {'_id': 1},
                  'keyValue': {'_id': 500},
                  'op': {'_id': 500, 'x': 500}},
                 {'code': 11000,
                  'errmsg': 'E11000 duplicate key error collection: '
                            'Collection.table index: _id_ dup key: { _id: 501 '
                            '}',
                  'index': 1,
                  'keyPattern': {'_id': 1},
                  'keyValue': {'_id': 501},
                  'op': {'_id': 501, 'x': 501}},
                 {'code': 11000,
                  'errmsg': 'E11000 duplicate key error collection: '
                            'Collection.table index: _id_ dup key: { _id: 502 '
                            '}',


          ~~~~The following is omitted~~~~

'''
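With ordered=False the writeErrors list can get long, so it may help to condense it before printing. The helper below is a minimal sketch, not part of pymongo: summarize_write_errors is a hypothetical name, and it assumes the details dictionary has the shape shown in the output above.

```python
from collections import Counter


def summarize_write_errors(details):
    """Condense BulkWriteError.details into counts and failed positions."""
    errors = details.get("writeErrors", [])
    return {
        "inserted": details.get("nInserted", 0),
        "failed": len(errors),
        # Number of failures per error code (11000 = duplicate key)
        "by_code": dict(Counter(err["code"] for err in errors)),
        # 'index' is the position of the failed operation in the submitted list
        "failed_indexes": [err["index"] for err in errors],
    }


# Hand-written sample shaped like the output above, truncated to two errors
sample = {
    "nInserted": 500,
    "writeErrors": [
        {"code": 11000, "index": 0, "errmsg": "E11000 duplicate key error ..."},
        {"code": 11000, "index": 1, "errmsg": "E11000 duplicate key error ..."},
    ],
}
print(summarize_write_errors(sample))
```

In the except block of the scripts above, pprint(summarize_write_errors(bwe.details)) could replace pprint(bwe.details) when only the totals are of interest.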


That is how bulk_write works in pymongo. If there are requests, I will add more details.
