Hello everyone! I'm @hiroki_tanaka, the producer of Mayu Sakuma.
I maintain Rails applications, and the other day I discovered a lot of unwanted data in our production environment. So I looked into how to delete a large amount of data in Rails, and this post summarizes what I found.
First, Rails has two families of data deletion methods, destroy and delete. I would like to briefly summarize the differences between them.
destroy deletes one specified record via ActiveRecord.
Because the deletion goes through ActiveRecord, callback methods (such as before_destroy and after_destroy) run. Also, if the Model to be deleted has associated Models declared with dependent: :destroy, those associated records are deleted as well.
If the deletion is halted during execution (for example, a before_destroy callback aborts it) and the record cannot be deleted, destroy only returns false and does not raise an exception.
In contrast, destroy! raises an exception (ActiveRecord::RecordNotDestroyed).
Therefore, if you want to explicitly catch the error when deleting, it is better to use destroy!.
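To make the difference between the two return styles concrete, here is a minimal plain-Ruby sketch. FakeRecord and RecordNotDestroyed are hypothetical stand-ins that I made up so the example runs without Rails or a database; they only imitate the "return false" versus "raise" contract described above and are not ActiveRecord itself.

```ruby
# Hypothetical stand-ins (NOT ActiveRecord): they only imitate the
# destroy / destroy! contract so the sketch runs without Rails.
class RecordNotDestroyed < StandardError; end

class FakeRecord
  def initialize(deletable:)
    @deletable = deletable
    @destroyed = false
  end

  def destroyed?
    @destroyed
  end

  # Like destroy: returns false when a (simulated) callback halts the deletion.
  def destroy
    return false unless @deletable

    @destroyed = true
    self
  end

  # Like destroy!: raises instead of returning false.
  def destroy!
    destroy || raise(RecordNotDestroyed, "Failed to destroy the record")
  end
end

blocked = FakeRecord.new(deletable: false)

# No exception here, so the failure is easy to miss unless you check the result.
result = blocked.destroy

# The bang version forces the caller to deal with the failure.
raised = begin
  blocked.destroy!
  false
rescue RecordNotDestroyed
  true
end
```

In a real application you would call destroy! on a model instance and rescue ActiveRecord::RecordNotDestroyed in the same way.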
delete issues the SQL DELETE statement for the target record directly to the DB, skipping ActiveRecord's callback processing.
Callback methods therefore do not run.
Also, even if the Model to be deleted has associated Models declared with dependent: :destroy, those associated records are not deleted.
Like destroy, delete does not signal a failed deletion with an exception of its own; only database-level errors (for example, a foreign key violation) are raised.
Since delete has no delete! counterpart, I think it is better to simply use destroy! when you want to delete data and raise an error if it fails.
destroy / destroy! can delete only one record, but destroy_all can target multiple records and deletes all of the specified records.
Like destroy, destroy_all also goes through ActiveRecord, so callback methods and dependent: :destroy work.
Like destroy, destroy_all does not raise an exception when the deletion of a record fails partway through, so the failure can go unnoticed.
However, there is no destroy_all! method, so if you want to delete a large amount of data and properly raise an error when it fails, you need another approach. (The method is described later.)
delete_all deletes the specified multiple records without going through ActiveRecord's callback processing.
Like delete, callback methods and dependent: :destroy do not work.
Like delete, delete_all has no bang variant; it returns the number of rows it deleted rather than raising when nothing was deleted.
Personally, I don't think there are many situations where it is used, but because it skips ActiveRecord's per-record processing, it is faster than destroy and destroy_all.
Therefore, I think it can be used when you want to delete a large amount of data at once without worrying about exceptions, callbacks, or associated records.
Now for the main subject: how to delete a large amount of data.
The Model to be deleted this time has callbacks, and it has associated Models declared with dependent: :destroy.
As a requirement, the associated records also need to be deleted, and I want to explicitly catch any error that occurs during processing.
That rules out delete / delete_all / destroy_all at this point.
In addition, the deletion has to run in the background while the production application is running, so I don't want to use a method that puts an extreme load on the DB.
    animals = Animal.where(type: 'dog') # Extraction of data to be deleted
    animals.each do |animal|
      animal.destroy!
    end
The simplest approach, method (1), is code like this. However, it has two problems.
- The load is heavy because destroy! runs 100,000 times in a row.
- If an error occurs partway through and the process fails, the records deleted up to that point stay deleted and are not rolled back. (You cannot simply start over.)
Therefore, I think this method is fine if the DB has spare capacity, the application has clearly defined off-hours, and a partial deletion that is not rolled back on failure does not cause a data integrity problem.
    animals = Animal.where(type: 'dog') # Extraction of data to be deleted
    ActiveRecord::Base.transaction do
      animals.each do |animal|
        animal.destroy!
      end
    end
This wraps the deletion process of method (1) in a single transaction. Because it is one transaction, if an error occurs partway through, all the deletions are rolled back and you can start over. Therefore, this method lets you safely delete data even when data integrity is required.
One caveat: you must always use destroy! when you explicitly open a transaction. destroy does not raise an exception and just returns false when the deletion fails, so the process does not stop and the transaction is not rolled back.
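The caveat above can be illustrated with a small plain-Ruby sketch. FakeTable is a hypothetical in-memory stand-in that I invented for illustration; its transaction method only imitates "restore everything if the block raises" and is not how ActiveRecord is implemented. The row names are also made up.

```ruby
# Hypothetical in-memory stand-ins (NOT ActiveRecord), for illustration only.
class RecordNotDestroyed < StandardError; end

class FakeTable
  attr_reader :rows

  def initialize(rows)
    @rows = rows.dup
  end

  # Imitates a transaction: restore the snapshot if the block raises.
  def transaction
    snapshot = @rows.dup
    yield
  rescue StandardError
    @rows = snapshot
    raise
  end

  # Like destroy: returns false for a protected row instead of raising.
  def destroy(row)
    return false if row.start_with?("protected")

    @rows.delete(row)
    true
  end

  # Like destroy!: raises when destroy returns false.
  def destroy!(row)
    destroy(row) || raise(RecordNotDestroyed, "could not destroy #{row}")
  end
end

names = ["dog-1", "protected-dog", "dog-2"]

# With destroy, the failure is swallowed and the block finishes,
# so a partial deletion is kept.
soft = FakeTable.new(names)
soft.transaction { names.each { |name| soft.destroy(name) } }
soft_rows = soft.rows

# With destroy!, the exception leaves the block and everything is restored.
hard = FakeTable.new(names)
begin
  hard.transaction { names.each { |name| hard.destroy!(name) } }
rescue RecordNotDestroyed
  # The caller can log the error here and rerun the job later.
end
hard_rows = hard.rows
```

After running this, soft_rows contains only the protected row (a silent partial deletion), while hard_rows contains all three original rows again.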
    animals = Animal.where(type: 'dog') # Extraction of data to be deleted
    ActiveRecord::Base.transaction do
      animals.in_batches.each do |delete_target_animals|
        delete_target_animals.each(&:destroy!)
        sleep(0.1)
      end
    end
This is the method I adopted this time.
It uses the ActiveRecord::Relation#in_batches method to split the 100,000 records into batches of 1,000 (the default batch size), and calls destroy! on each record in each batch.
Then, each time a batch of 1,000 records has been deleted, the process pauses for 0.1 seconds with sleep(0.1) to reduce the load on the DB.
Also, since one large transaction wraps the whole thing, even if an error occurs during the deletion process, everything is rolled back and you can start over, so it is safe.
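If you want to reuse this pattern, the batch loop could be pulled into a small helper along these lines. This is only a sketch: destroy_in_batches! is a name I made up, and FakeAnimal and FakeRelation are hypothetical stand-ins so that the example runs without Rails. The helper only assumes that the object it receives responds to in_batches(of:), as ActiveRecord::Relation does.

```ruby
# Destroys every record in `relation` batch by batch, pausing between batches.
# `relation` only needs to respond to in_batches(of:).
# The helper does not open a transaction itself; wrap the call in
# ActiveRecord::Base.transaction if you need all-or-nothing behaviour.
def destroy_in_batches!(relation, batch_size: 1000, pause: 0.1)
  batches = 0
  relation.in_batches(of: batch_size).each do |batch|
    batch.each(&:destroy!) # raises on the first record that cannot be destroyed
    batches += 1
    sleep(pause)           # give the DB a short break between batches
  end
  batches # number of batches processed
end

# Hypothetical stand-ins so the sketch runs without Rails or a database.
class FakeAnimal
  attr_reader :destroyed

  def destroy!
    @destroyed = true
  end
end

class FakeRelation
  def initialize(records)
    @records = records
  end

  # Imitates in_batches(of:) by yielding arrays of at most `of` records.
  def in_batches(of: 1000)
    @records.each_slice(of)
  end
end

animals = Array.new(25) { FakeAnimal.new }
batch_count = destroy_in_batches!(FakeRelation.new(animals), batch_size: 10, pause: 0)
```

With 25 stand-in records and a batch size of 10, the helper processes three batches (10, 10, and 5 records). In a real application you would pass something like Animal.where(type: 'dog') instead of FakeRelation.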
    animals = Animal.where(type: 'dog') # Extraction of data to be deleted
    ActiveRecord::Base.transaction do
      # The of: option changes the batch size (here, 10,000 records per batch)
      animals.in_batches(of: 10000).each do |delete_target_animals|
        delete_target_animals.each(&:destroy!)
        sleep(0.1)
      end
    end
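As a rough way to reason about the batch size, the following sketch computes how much time is spent just sleeping for the 100,000 records mentioned earlier. total_pause_seconds is a hypothetical helper name, and the calculation ignores the time the deletions themselves take.

```ruby
# Total time spent sleeping when `total` records are processed in batches,
# with one pause after every batch. Deletion time itself is not included.
def total_pause_seconds(total:, batch_size:, pause:)
  batches = (total + batch_size - 1) / batch_size # round up: the last batch may be partial
  batches * pause
end

default_pause = total_pause_seconds(total: 100_000, batch_size: 1_000, pause: 0.1)
larger_pause  = total_pause_seconds(total: 100_000, batch_size: 10_000, pause: 0.1)
```

With the default batch size of 1,000 there are 100 batches, so about 10 seconds are spent sleeping; with of: 10000 there are only 10 batches, so about 1 second. A larger batch size finishes sooner but gives the DB fewer breaks, so the right value depends on how much load your DB can tolerate.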
I think there are various ways to delete large amounts of data, depending on your requirements. If you know of any best practices, I would love to hear about them (o._.)o