[RUBY] Do you really need that preload? ~ Utilization of delayed load ~

First look at the code below.

review = Review.preload(:user, :book).find_by(id: review_id)

What do you do when you see code like this? I don't have to add preload. I think I will point out.

In this article, I'd like to explain why this preload is unnecessary.

What is preload

If you add preload, you can get the specified related data at the same time. In the case of this example, when you get the review, you also get the related user and book at the same time.

Below is the result of executing with irb. When you get the review, user and book are also SELECTed, and you can see that SQL is not issued where you actually use it.

irb(main):011:0> review_id = 15
=> 15
irb(main):012:0> review = Review.preload(:user, :book).find_by(id: review_id)
  Review Load (0.8ms)  SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`id` = 15 LIMIT 1
  User Load (0.5ms)  SELECT `users`.* FROM `users` WHERE `users`.`id` = 1
  Book Load (0.8ms)  SELECT `books`.* FROM `books` WHERE `books`.`id` = 1
=> #<Review id: 15, content: "hogehoge", user_id: 1, book_id: 1, status: "draft", created_at: "2020-06-15 14:21:23", updated_at: "2020-06-15 14:21:23">
irb(main):013:0> review.user
=> #<User id: 1, name: "1234567890", created_at: "2019-12-12 05:43:52", updated_at: "2019-12-12 05:43:52">
irb(main):014:0> review.book
=> #<Book id: 1, title: "book1", created_at: "2020-06-15 14:21:15", updated_at: "2020-06-15 14:21:15">

When should preload be used?

It is mainly used as a countermeasure for N + 1. N + 1 will not be described in detail here, but it is an event in which the acquisition SQL of related data is issued one by one in a loop etc. as shown below.

irb(main):022:0> Review.all.each do |review|
irb(main):023:1*   review.book
irb(main):024:1> end
  Review Load (0.6ms)  SELECT `reviews`.* FROM `reviews`
  Book Load (0.3ms)  SELECT `books`.* FROM `books` WHERE `books`.`id` = 1 LIMIT 1
  Book Load (0.3ms)  SELECT `books`.* FROM `books` WHERE `books`.`id` = 2 LIMIT 1
  Book Load (0.4ms)  SELECT `books`.* FROM `books` WHERE `books`.`id` = 3 LIMIT 1
  Book Load (0.3ms)  SELECT `books`.* FROM `books` WHERE `books`.`id` = 4 LIMIT 1
  Book Load (2.7ms)  SELECT `books`.* FROM `books` WHERE `books`.`id` = 5 LIMIT 1

In the above, since the book is not acquired in advance, SQL is issued one by one at review.book. If you attach prealod and get the book in advance, it will be as follows.

irb(main):025:0> Review.all.preload(:book).each do |review|
irb(main):026:1*   review.book
irb(main):027:1> end
  Review Load (0.8ms)  SELECT `reviews`.* FROM `reviews`
  Book Load (0.7ms)  SELECT `books`.* FROM `books` WHERE `books`.`id` IN (1, 2, 3, 4, 5)

You can see that the book related to the review that was obtained in Review.all before entering the loop is obtained in one SQL. SQL is not issued during the loop because it can be acquired collectively before the loop. In general, issuing SQL is a costly process, so performing SQL once will improve performance. Even in the above example, if you look at the total execution time of SQL, you can see that there is a difference in performance.

Why don't you add it this time?

But what about the first example? Since we have only obtained one review, it cannot be N + 1 in the loop as before.

I think that preloading means that you may use it at least later. Let's look at the following example.

#Get user
#I will use user and review later, so preload it
review = Review.preload(:user, :book).find_by(id: review_id)

#use user
review.user

#use book
review.book

Since preload is attached, user and book are also acquired when review is acquired. The execution result is as follows.

irb(main):007:0> review = Review.preload(:user, :book).find_by(id: review_id)
  Review Load (0.8ms)  SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`id` = 36 LIMIT 1
  User Load (0.5ms)  SELECT `users`.* FROM `users` WHERE `users`.`id` = 1
  Book Load (0.4ms)  SELECT `books`.* FROM `books` WHERE `books`.`id` = 1
=> #<Review id: 36, content: "", user_id: 1, book_id: 1, status: "draft", created_at: "2020-06-30 15:20:01", updated_at: "2020-06-30 15:20:01">
irb(main):008:0> review.user
=> #<User id: 1, name: "1234567890", created_at: "2019-12-12 05:43:52", updated_at: "2019-12-12 05:43:52">
irb(main):009:0> review.book
=> #<Book id: 1, title: "book1", created_at: "2020-06-15 14:21:15", updated_at: "2020-06-15 14:21:15">

So what if you didn't have preload?

irb(main):010:0> review = Review.find_by(id: review_id)
  Review Load (0.7ms)  SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`id` = 36 LIMIT 1
=> #<Review id: 36, content: "", user_id: 1, book_id: 1, status: "draft", created_at: "2020-06-30 15:20:01", updated_at: "2020-06-30 15:20:01">
irb(main):011:0> review.user
  User Load (0.5ms)  SELECT `users`.* FROM `users` WHERE `users`.`id` = 1 LIMIT 1
=> #<User id: 1, name: "1234567890", created_at: "2019-12-12 05:43:52", updated_at: "2019-12-12 05:43:52">
irb(main):012:0> review.book
  Book Load (0.6ms)  SELECT `books`.* FROM `books` WHERE `books`.`id` = 1 LIMIT 1
=> #<Book id: 1, title: "book1", created_at: "2020-06-15 14:21:15", updated_at: "2020-06-15 14:21:15">

When you get the review, you don't get the user and book, the SQL is issued where you are using it. However, since there is only one review, the number of SQL issued is the same. In the case of this example, the efficiency is the same with or without preload.

But what about the following example?

#Get user
#I will use user and review later, so preload it
review = Review.preload(:user, :book).find_by(id: review_id)

#Use user under certain conditions
if hoge
  review.user
end

#Use book under certain conditions
if fuga
  review.book
end

Since preload is attached, user and book are also acquired when review is acquired. If hoge or fuga is true, both user and review are used, so the number of SQL is the same with or without preload.

But what about false? For example, if hoge is false, user is not used, so the user obtained by preload is not used. If fuga is false, book will not be used either.

What if I didn't add preload this time?

#Get user
review = Review.find_by(id: review_id)

#Use user under certain conditions
if hoge
  review.user
end

#Use book under certain conditions
if fuga
  review.book
end

When you get a review, you don't get a user or book. If hoge or fuga is true, review and book will be acquired at the place to be used. If false, it will not be retrieved.

In the case of this implementation, it can be obtained only when using it. By the way, an implementation that retrieves data when needed in this way is called lazy loading.

Did you know which one is more efficient? When preloading one model, the number of SQL is the same as when preload is not attached even if all the models acquired by preload are used. If there is even one pattern that is not used depending on the conditions, the number of SQL will increase.

If you want to get only one item like the first example, it is meaningless to preload it, but it will be inefficient, so be careful.

Finally

If you are new to Rails and somehow know preload and eager_load, you should just add it, right? I think many people think that.

Many reviewers point out N + 1, but I feel that few people point out useless preloads like this one (individual impression).

It is easy to think that there is nothing wrong with adding preload and eager_load that defeat N + 1, but there are patterns that become inefficient like this time, so those who were not aware of it should be aware of it. Let's go.

Recommended Posts

Do you really need that preload? ~ Utilization of delayed load ~
Do you need a memory-aware implementation of Java?
Do you need dotenv-rails?
Convenient use of Map that you do not know unexpectedly