[Ruby] Is that preload really needed? ~ Utilizing lazy loading ~


First, look at the code below.

review = Review.preload(:user, :book).find_by(id: review_id)

What do you do when you see code like this? I would point out that the preload is not needed.

In this article, I want to explain why this preload is unnecessary.

What is #preload?

Adding preload loads the specified associations together with the main record. In this case, fetching the review also fetches the associated user and book.

Below is the result of running this in irb. The user and book are selected at the moment the review is fetched, and no SQL is issued when they are actually used.

irb(main):011:0> review_id = 15
=> 15
irb(main):012:0> review = Review.preload(:user, :book).find_by(id: review_id)
  Review Load (0.8ms) SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`id` = 15 LIMIT 1
  User Load (0.5ms) SELECT `users`.* FROM `users` WHERE `users`.`id` = 1
  Book Load (0.8ms) SELECT `books`.* FROM `books` WHERE `books`.`id` = 1
=> #<Review id: 15, content: "hogehoge", user_id: 1, book_id: 1, status: "draft", created_at: "2020-06-15 14:21:23", updated_at: "2020-06-15 14:21:23">
irb(main):013:0> review.user
=> #<User id: 1, name: "1234567890", created_at: "2019-12-12 05:43:52", updated_at: "2019-12-12 05:43:52">
irb(main):014:0> review.book
=> #<Book id: 1, title: "book1", created_at: "2020-06-15 14:21:15", updated_at: "2020-06-15 14:21:15">

When should #preload be used?

It is mainly used to counter the N+1 problem. I will not describe N+1 in detail here, but it is the situation where SQL for fetching associated data is issued one record at a time inside a loop, as shown below.

irb(main):022:0> Review.all.each do |review|
irb(main):023:1* review.book
irb(main):024:1> end
  Review Load (0.6ms) SELECT `reviews`.* FROM `reviews`
  Book Load (0.3ms) SELECT `books`.* FROM `books` WHERE `books`.`id` = 1 LIMIT 1
  Book Load (0.3ms) SELECT `books`.* FROM `books` WHERE `books`.`id` = 2 LIMIT 1
  Book Load (0.4ms) SELECT `books`.* FROM `books` WHERE `books`.`id` = 3 LIMIT 1
  Book Load (0.3ms) SELECT `books`.* FROM `books` WHERE `books`.`id` = 4 LIMIT 1
  Book Load (2.7ms) SELECT `books`.* FROM `books` WHERE `books`.`id` = 5 LIMIT 1

In the above, the books are not fetched in advance, so SQL is issued once per review at review.book. With preload attached so that the books are fetched in advance, it looks like this.

irb(main):025:0> Review.all.preload(:book).each do |review|
irb(main):026:1* review.book
irb(main):027:1> end
  Review Load (0.8ms) SELECT `reviews`.* FROM `reviews`
  Book Load (0.7ms) SELECT `books`.* FROM `books` WHERE `books`.`id` IN (1, 2, 3, 4, 5)

You can see that the books associated with the reviews returned by Review.all were fetched with a single SQL statement before entering the loop. No SQL is issued inside the loop because everything was fetched in bulk beforehand. Issuing SQL is generally costly, so running one query instead of many improves performance. Even in this small example, the total SQL execution time differs: 4.6 ms (0.6 + 0.3 + 0.3 + 0.4 + 0.3 + 2.7) without preload versus 1.5 ms (0.8 + 0.7) with it.
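The difference in query counts can be reproduced without a database. The following is a minimal plain-Ruby sketch, not ActiveRecord itself: FakeDb and its methods are hypothetical names invented for this illustration, and each method call simply stands in for one SQL statement.

```ruby
# Plain-Ruby simulation (no Rails). FakeDb is a hypothetical stand-in that
# counts one "query" per method call against in-memory data.
class FakeDb
  attr_reader :query_count

  def initialize
    @books = { 1 => "book1", 2 => "book2", 3 => "book3", 4 => "book4", 5 => "book5" }
    @reviews = (1..5).map { |i| { id: i, book_id: i } }
    @query_count = 0
  end

  # Stands in for: SELECT * FROM reviews
  def all_reviews
    @query_count += 1
    @reviews
  end

  # Stands in for: SELECT * FROM books WHERE id = ? LIMIT 1
  def find_book(id)
    @query_count += 1
    @books[id]
  end

  # Stands in for: SELECT * FROM books WHERE id IN (...)
  def find_books(ids)
    @query_count += 1
    @books.slice(*ids)
  end
end

# N+1: one query for the reviews, then one more per review for its book.
n_plus_one_db = FakeDb.new
n_plus_one_db.all_reviews.each { |review| n_plus_one_db.find_book(review[:book_id]) }

# Preload-style: one query for the reviews, one batched query for all books.
batched_db = FakeDb.new
reviews = batched_db.all_reviews
books = batched_db.find_books(reviews.map { |review| review[:book_id] }.uniq)
reviews.each { |review| books[review[:book_id]] } # hash lookup only, no query

puts "N+1 queries:     #{n_plus_one_db.query_count}"
puts "batched queries: #{batched_db.query_count}"
```

With five reviews, the first approach counts six queries and the second counts two, matching the two irb sessions above.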

Why is preload unnecessary this time?

But what about the first example? Only one review is fetched, so an N+1 in a loop like the one above cannot occur.

Presumably the preload was added because the associations will be used at some point later. Let’s look at the following example.

# Get the review
# Preload because we will use user and book later
review = Review.preload(:user, :book).find_by(id: review_id)

# Use user
review.user

# Use book
review.book

Since preload is attached, user and book are fetched at the same time as the review. The execution result is as follows.

irb(main):007:0> review = Review.preload(:user, :book).find_by(id: review_id)
  Review Load (0.8ms) SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`id` = 36 LIMIT 1
  User Load (0.5ms) SELECT `users`.* FROM `users` WHERE `users`.`id` = 1
  Book Load (0.4ms) SELECT `books`.* FROM `books` WHERE `books`.`id` = 1
=> #<Review id: 36, content: "", user_id: 1, book_id: 1, status: "draft", created_at: "2020-06-30 15:20:01", updated_at: "2020-06-30 15:20:01">
irb(main):008:0> review.user
=> #<User id: 1, name: "1234567890", created_at: "2019-12-12 05:43:52", updated_at: "2019-12-12 05:43:52">
irb(main):009:0> review.book
=> #<Book id: 1, title: "book1", created_at: "2020-06-15 14:21:15", updated_at: "2020-06-15 14:21:15">

But what if you didn’t add preload?

irb(main):010:0> review = Review.find_by(id: review_id)
  Review Load (0.7ms) SELECT `reviews`.* FROM `reviews` WHERE `reviews`.`id` = 36 LIMIT 1
=> #<Review id: 36, content: "", user_id: 1, book_id: 1, status: "draft", created_at: "2020-06-30 15:20:01", updated_at: "2020-06-30 15:20:01">
irb(main):011:0> review.user
  User Load (0.5ms) SELECT `users`.* FROM `users` WHERE `users`.`id` = 1 LIMIT 1
=> #<User id: 1, name: "1234567890", created_at: "2019-12-12 05:43:52", updated_at: "2019-12-12 05:43:52">
irb(main):012:0> review.book
  Book Load (0.6ms) SELECT `books`.* FROM `books` WHERE `books`.`id` = 1 LIMIT 1
=> #<Book id: 1, title: "book1", created_at: "2020-06-15 14:21:15", updated_at: "2020-06-15 14:21:15">

When the review is fetched, the user and book are not; SQL is issued at the point where each is used. However, the number of SQL statements issued is the same, because there is only one review. In this case, efficiency is the same with or without preload.

But what about the next example?

# Get the review
# Preload because we will use user and book later
review = Review.preload(:user, :book).find_by(id: review_id)

# Use user under certain conditions
if hoge
  review.user
end

# Use book under certain conditions
if fuga
  review.book
end

Since preload is attached, user and book are fetched along with the review. If hoge and fuga are both true, both user and book are used, so the number of SQL statements is the same with or without preload.

But what if they are false? If hoge is false, user is never used, so the user fetched by preload is wasted. Likewise, if fuga is false, the book is wasted.

What happens without preload this time?

# Get the review
review = Review.find_by(id: review_id)

# Use user under certain conditions
if hoge
  review.user
end

# Use book under certain conditions
if fuga
  review.book
end

When the review is fetched, neither the user nor the book is fetched. If hoge or fuga is true, the user or book is fetched at the point where it is used. If false, it is never fetched.

With this implementation, data is fetched only when it is used. Fetching data at the moment it is needed, as here, is called lazy loading.
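To make the idea of lazy loading concrete, here is a minimal plain-Ruby sketch. It is an illustration under assumptions, not how ActiveRecord is implemented: LazyReview is a hypothetical class, and the "query" is just a string appended to a log.

```ruby
# Hypothetical stand-in for a record with one lazily loaded association.
class LazyReview
  attr_reader :queries

  def initialize(user_id)
    @user_id = user_id
    @queries = []        # log of simulated SQL statements
    @user_loaded = false
  end

  # Loads the user on first access and reuses the cached value afterwards.
  def user
    unless @user_loaded
      @queries << "SELECT users.* FROM users WHERE users.id = #{@user_id} LIMIT 1"
      @user = { id: @user_id, name: "1234567890" } # pretend database row
      @user_loaded = true
    end
    @user
  end
end

lazy_review = LazyReview.new(1)
queries_before_access = lazy_review.queries.size

lazy_review.user # first access issues the simulated query
lazy_review.user # second access reuses the cached row
queries_after_two_accesses = lazy_review.queries.size

puts "before: #{queries_before_access}, after two accesses: #{queries_after_two_accesses}"
```

Nothing is queried when the object is created, and repeated access costs only one query. If user is never called, no query is issued at all.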

Can you see which one is more efficient? When preloading onto a single record, even if every preloaded association is used, the number of SQL statements is the same as without preload. If there is even one code path where an association goes unused, preload issues more SQL than lazy loading would.
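The comparison can be tabulated for every combination of hoge and fuga. The sketch below is plain arithmetic based on the query counts seen in the irb sessions above (one query for the review, plus one per association loaded); query_count is a hypothetical helper written for this illustration.

```ruby
# Counts the queries for a single review:
# with preload, review + user + book are always loaded (3 queries);
# without it, each association costs a query only when it is actually used.
def query_count(preload:, use_user:, use_book:)
  review_query = 1
  return review_query + 2 if preload

  review_query + (use_user ? 1 : 0) + (use_book ? 1 : 0)
end

# Every combination of the two conditions from the example.
results = [true, false].product([true, false]).map do |hoge, fuga|
  {
    hoge: hoge,
    fuga: fuga,
    with_preload: query_count(preload: true, use_user: hoge, use_book: fuga),
    without_preload: query_count(preload: false, use_user: hoge, use_book: fuga)
  }
end

results.each do |row|
  puts format("hoge=%-5s fuga=%-5s preload=%d lazy=%d",
              row[:hoge], row[:fuga], row[:with_preload], row[:without_preload])
end
```

Preload never comes out ahead for a single record: it ties only when both conditions are true and issues more queries in every other case.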

Note that preloading is meaningless and rather inefficient when fetching only one record, as in the first example.

Finally

If you are new to Rails and have only a vague understanding of preload and eager_load, you may think you should just attach them for now. I suspect many people do.

Many reviewers point out N+1, but I don’t think many point out a useless preload like this one (personal impression).

It’s easy to assume that adding preload or eager_load, which prevent N+1, can never hurt, but as shown here there are patterns where they make things less efficient. If you were not aware of this, keep it in mind from now on.