As a portfolio project, I am currently developing an article posting service, and one of its features displays daily / weekly / monthly rankings on screen (hereafter, the ranking display feature). This article describes the table design (plus a little extra) that went into building it.

I wrote this article to document my thinking process, in case it is useful to my future self, or to anyone else who gets the chance to develop a similar feature.

As a premise, the ranking display feature shows five rankings: daily, weekly, monthly, overall, and trend. Each ranking is determined as follows.
Type | Evaluation criteria |
---|---|
Daily ranking | Total "likes" for the day |
Weekly ranking | Total "likes" over 1 week (7 days) |
Monthly ranking | Total "likes" over 1 month (30 days) |
Overall ranking | Total "likes" over the entire period |
Trend ranking | Score computed from daily "likes" and daily accesses |
Since the rankings do not need to reflect the very latest state at all times, it is acceptable to run the aggregation once per fixed interval.

This chapter describes the thinking process behind the table design for the ranking display feature. Below, the "number of likes" and "number of accesses" are collectively referred to as "count data".

Worrying about performance from the start only complicates the thinking, so I set it aside at first. Simpler is always better, so I began by asking whether the tables could be kept simple.

From the feature overview, the following are the candidate tables related to count data, but are separate tables really necessary for the weekly / monthly / overall data?

- Daily count data
- Weekly count data
- Monthly count data
- Count data for the entire period

The minimum unit of data needed to aggregate the weekly / monthly / overall counts is the daily count data. As long as each day's counts are recorded, the program can compute the count for any period from them. Based on this idea, the feature itself can be realized with the following table.
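As a minimal sketch of this idea (all names here are illustrative, not the actual schema), the totals for any period can be derived in application code from nothing but per-day counts:

```java
import java.util.List;

public class PeriodTotals {
    // Given per-day like counts ordered from newest to oldest,
    // sum the most recent `days` entries.
    static int totalForPeriod(List<Integer> dailyCounts, int days) {
        return dailyCounts.stream()
                .limit(days)
                .mapToInt(Integer::intValue)
                .sum();
    }

    public static void main(String[] args) {
        // 10 days of fake counts, newest first
        List<Integer> daily = List.of(3, 5, 2, 0, 7, 1, 4, 2, 2, 6);
        System.out.println(totalForPeriod(daily, 1));             // daily total
        System.out.println(totalForPeriod(daily, 7));             // weekly total
        System.out.println(totalForPeriod(daily, daily.size()));  // overall total
    }
}
```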
Figure 1. Article table
The data in the table in Figure 1 falls into the following two largely independent use cases.
Data type | Phase where the data is used | Update trigger |
---|---|---|
Count data | Displaying the ranking | Pressing the like button / accessing the article page |
Everything else | Displaying the article page | Creating / editing the article |
With the single table in Figure 1, the non-count columns cannot be updated while the count data is being updated, since both live in the same row. By splitting the table as follows, both kinds of data can be updated concurrently.
Figure 2. Tables after splitting off the count data
From here, we examine performance based on the tables in Figure 2. Needless to say, the tables in Figure 2 still have performance problems. I saw two major problems and considered a design to solve each.

**(Problem 1) The ranking is aggregated every time the page is displayed, so the aggregation time is large.**
**(Problem 2) Daily count data is stored per article, so the growing number of records degrades performance.**

Holding per-day count data makes it possible to compute the weekly / monthly / overall counts, but the flip side is that the counts for each period must be aggregated every time a user accesses a page that displays a ranking.

To solve this, I decided to keep the aggregated counts for 1 day / 1 week / 1 month / the entire period in a separate table, so that displaying a ranking only requires reading that table (the aggregation itself runs at fixed intervals as a scheduled job).
Figure 3. After adding the count data aggregation table to the tables in Figure 2.

With this, per-view aggregation is no longer necessary: the rating table handles additions and updates of the daily count data, while the aggregate_points table handles reads of the count data (when displaying a ranking).
In the table in Figure 1, the per-day count data is held per article, so the number of records grows in proportion to (number of posted articles) × (number of days in service). This eats DB storage and slows down data searches. I therefore considered two measures to reduce the number of records.

**(Measure 1) Keep at most 30 days of records per article.** The count data for the entire period can be kept up to date simply by continuing to add each day's counts to it, so there is no problem discarding count data older than 30 days (the all-time counts are held in a separate table, as in Figure 2).

**(Measure 2) Consolidate the past 30 days into one record.** Even if the table in Figure 1 is trimmed to the past 30 days, the record count is still (total posted articles) × 30 (300,000 records for 10,000 articles), so it can hardly be called reduced. Moreover, to get the monthly total, the article's 30 records must be fetched every time with SQL like the following, so the more data there is, the longer the search takes.
```sql
SELECT * FROM rating WHERE article_id = [id of the article];
```
In that case, I thought, if the 30 days of data were combined into a single set from the start, the work of fetching 30 rows could be saved and the record count could be held to the total number of posted articles.

The current day's count data is updated frequently, but the rest of the history only ever has the newest day appended and the oldest day removed. So if the past 30 days are stored in a format that lets the application perform those operations easily, there is no problem saving them as a single record.
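A minimal sketch of those two operations, assuming an in-memory deque stands in for the stored record (class and field names are illustrative):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ThirtyDayWindow {
    static final int MAX_DAYS = 30;

    // One day's counts; field names are illustrative.
    record DayCount(String date, int favoriteCount) {}

    // Newest day at the head, oldest at the tail.
    final Deque<DayCount> window = new ArrayDeque<>();

    // Called once per day by the scheduled job: append today's counts,
    // then evict the oldest day once the window exceeds 30 entries.
    void rollOver(DayCount today) {
        window.addFirst(today);
        if (window.size() > MAX_DAYS) {
            window.removeLast();
        }
    }

    public static void main(String[] args) {
        ThirtyDayWindow w = new ThirtyDayWindow();
        for (int day = 1; day <= 35; day++) {
            w.rollOver(new DayCount("day-" + day, day)); // fake dates and counts
        }
        System.out.println(w.window.size());                      // capped at 30
        System.out.println(w.window.peekLast().favoriteCount());  // oldest surviving day
    }
}
```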
Some DBs can store JSON or XML data (PostgreSQL, used here, supports both), and JSON/XML is easy to handle on the application side, so the 30 days of data are stored as a single record in JSON or XML format. There is a concern that JSON/XML storage degrades performance, but the data is only modified at fixed intervals by the scheduled job, so the impact on overall performance is negligible.

Figure 4. After applying the record-reduction measures to the tables in Figure 2.

- Example of the past 30 days of count data in JSON format
```json
{
  "rating_info": [
    {
      "favorite_count": "12",
      "date": "2019-04-17"
    },
    {
      "favorite_count": "15",
      "date": "2019-04-16"
    },
    ... (abridged)
  ]
}
```
- Example of the past 30 days of count data in XML format
```xml
<rating_info>
  <day_rating>
    <favorite_count>12</favorite_count>
    <date>2019-04-17</date>
  </day_rating>
  <day_rating>
    <favorite_count>15</favorite_count>
    <date>2019-04-16</date>
  </day_rating>
</rating_info>
```
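To make "easy to handle on the application side" concrete, here is a stdlib-only sketch that renders the 30-day history into the JSON shape shown above. A real implementation would likely use a JSON library such as Jackson; everything here is illustrative:

```java
import java.util.List;
import java.util.stream.Collectors;

public class RatingJson {
    // One day's counts; field names mirror the JSON example above.
    record DayCount(String date, String favoriteCount) {}

    // Render the past days (newest first) into the stored JSON string.
    // Counts are kept as strings to match the example above.
    static String toJson(List<DayCount> days) {
        String entries = days.stream()
                .map(d -> String.format(
                        "{\"favorite_count\":\"%s\",\"date\":\"%s\"}",
                        d.favoriteCount(), d.date()))
                .collect(Collectors.joining(","));
        return "{\"rating_info\":[" + entries + "]}";
    }

    public static void main(String[] args) {
        List<DayCount> days = List.of(
                new DayCount("2019-04-17", "12"),
                new DayCount("2019-04-16", "15"));
        System.out.println(toJson(days));
    }
}
```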
Combining the results considered in Section 3.2 gives the following table.
Figure 5. Final table
As a result, the tables accessed by each user operation could be separated as follows, so by creating an index on each table keyed by id, performance can be maintained even when the tables hold a large amount of data.
Table | Role | Read trigger | Update trigger | Read frequency | Update frequency |
---|---|---|---|---|---|
daily_rating | Counts for the current day | Scheduled job | Displaying the article page / pressing the like button | Once a day (*1) | High (*2) |
history_rating | History of the past 30 days | Scheduled job | Scheduled job | Once a day (*1) | Once a day (*1) |
aggregate_points | Weekly / monthly / overall count data | Displaying the ranking page | Scheduled job | High | Once a day (*1) |
(*1) Assuming the scheduled job runs once every 24 hours. (*2) Records are also added/deleted when articles are added/deleted.
Apologies for suddenly jumping to the programming side. This project uses Java + Spring Framework, and since Spring Framework provides a feature for executing tasks on a schedule, we use that.
```java
@Scheduled(cron = "0 0 0 * * *", zone = "Asia/Tokyo")
public void updateAggregateData() {
    // ...
}
```
From left to right, the cron fields are: seconds (0-59), minutes (0-59), hours (0-23), day of month (1-31), month (1-12), and day of week (0-7, where 0 and 7 are Sunday, 1 is Monday, 2 is Tuesday, 3 is Wednesday, 4 is Thursday, 5 is Friday, and 6 is Saturday). So the expression in the code above runs the task every day at 00:00:00.
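A few illustrative Spring cron expressions following those field rules (not part of this project; just examples):

```text
0 0 0 * * *    → every day at 00:00:00 (the expression used above)
0 0 */6 * * *  → every 6 hours, on the hour
0 30 4 * * 1   → every Monday at 04:30:00
```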
To close, some loosely organized final thoughts.

As I wrote this up, I noticed the article's arc: at the start I removed the weekly / monthly count data on the grounds that it can be derived from the daily data, then, once performance entered the picture, I reversed course and brought it back. The lesson: there is no harm in putting the weekly / monthly count data into a table from the start.

This article was also written for the personal purpose of properly documenting what I, a developer who works overwhelmingly by feel and mental images, had figured out in my head. I checked for serious mistakes, but it took me just under 24 hours to write this much, which reconfirmed that I am not good at writing documents (as a student I turned out 6-8 pages of Word material of this size every week for 3 years, but that heyday is long past and I have lost the knack). Output remains a challenge for me.

It is no exaggeration to say I am a beginner at table design and DBs, so I expect there is a lot to learn from reading material such as thorough guides to DB design and SQL antipatterns. I plan to build practical skill through the development of the article posting service behind this article, and of the table- and column-heavy service planned next, and to post what I learn here from time to time.
That's all.
(1) Splitting a table with a large number of columns into a 1:1 relationship using a primary key
(2) How to periodically execute tasks with Spring Boot
(3) WWW SQL Designer (used to create the ER diagrams)