Introduction

I made a ranking site that aggregates the Amazon products linked from articles posted on note. For technical books there is already Tech Book Rank, which is based on Qiita, but I also want to read business books and practical books, and I thought note would have that kind of information.

Note Station | Popular business books, practical books, and technical books are updated daily in ranking format

I originally made it for personal use, as a reference when buying books and as a way to study ranking processing, but it turned out to be more useful than I expected, so I decided to publish it.

Since books fall into many categories, you can filter by category down to the second level. Showing the full Amazon category path would be far too fine-grained, so the resolution is deliberately coarse.

Products other than books also turn up in the results; I display them as they are, without filtering them out.
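As a rough sketch of what "down to the second level" could look like in Ruby: the helper below trims a category path to its first two levels and groups products by the result. The `" > "` separator, the method names, and the sample records are assumptions for illustration, not the site's actual data format.

```ruby
# Hypothetical helper: keep only the first `depth` levels of a category path.
# The " > " separator is an assumption about how the path is stored.
def coarse_category(path, depth: 2, separator: " > ")
  path.split(separator).first(depth).join(separator)
end

# Group hypothetical product records by their coarse category.
def group_by_coarse_category(products)
  products.group_by { |product| coarse_category(product[:category]) }
end

# Made-up sample data.
products = [
  { title: "Book A", category: "Books > Business > Marketing > Branding" },
  { title: "Book B", category: "Books > Business > Management" },
  { title: "Book C", category: "Books > Computers > Programming" }
]

grouped = group_by_coarse_category(products)
grouped.each do |category, items|
  puts "#{category}: #{items.map { |item| item[:title] }.join(', ')}"
end
```

Trimming at read time like this keeps the full path available in case a finer filter is wanted later.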

What isn't done yet

The details are tracked as TODOs, but mainly I want to improve the ranking itself.

In particular, I want to stop the Kindle and physical editions of the same book from being ranked separately. I already have the data needed to match them up, but the score is calculated in MySQL, and I am worried that the query will become very heavy if I add the matching step.

Also, the top of the monthly and yearly rankings look much the same, which narrows what you can discover, so I would like to do something about that.

Beyond that, a single article often introduces several books, so I would like to show related books alongside the ranking. Even on Amazon you have to visit each product page one by one to see related products, so showing them at a small size in the list view should feel like browsing a shelf in a bookstore.
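One option, shown here only as a sketch, would be to leave the MySQL score query as it is and merge the Kindle and print editions in Ruby afterwards. The `work_key` field is a hypothetical identifier shared by all editions of one book, and adding the edition scores together is an assumption, not a decided rule.

```ruby
# Merge per-edition scores into one entry per book.
# `work_key` is a hypothetical identifier shared by all editions of a book;
# how it is derived from the matching data is outside this sketch.
def merge_editions(scored_rows)
  scored_rows
    .group_by { |row| row[:work_key] }
    .map do |work_key, editions|
      best = editions.max_by { |row| row[:score] }
      {
        work_key: work_key,
        title: best[:title],                       # title of the highest-scoring edition
        score: editions.sum { |row| row[:score] }, # assumption: edition scores are added
        asins: editions.map { |row| row[:asin] }
      }
    end
    .sort_by { |row| -row[:score] }
end

# Made-up sample rows, as they might come back from the scoring query.
rows = [
  { asin: "KINDLE-1", work_key: "w1", title: "Book A (Kindle)", score: 4.0 },
  { asin: "PAPER-1",  work_key: "w1", title: "Book A",          score: 5.0 },
  { asin: "PAPER-2",  work_key: "w2", title: "Book B",          score: 7.5 }
]

merge_editions(rows).each do |row|
  puts format("%-10s %.1f", row[:title], row[:score])
end
```

Because the ranking holds few rows compared with the raw data, a merge at this stage should be cheap, though it does mean the top N has to be cut after merging rather than in SQL.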
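The related-books idea could be approximated by counting which books are linked from the same article. The sketch below assumes each article is a hash with an `:asins` array; that shape, the method name, and the `limit` parameter are all hypothetical.

```ruby
# Build a "mentioned together" table: for each book, the books that most
# often appear in the same article. Article hashes are a hypothetical shape.
def related_books(articles, limit: 3)
  counts = Hash.new { |hash, key| hash[key] = Hash.new(0) }

  articles.each do |article|
    # Every ordered pair of distinct books in one article counts once.
    article[:asins].uniq.permutation(2) do |book, other|
      counts[book][other] += 1
    end
  end

  # Keep the most frequent companions; ties are broken by ASIN for stable output.
  counts.transform_values do |others|
    others.sort_by { |asin, count| [-count, asin] }.first(limit).map(&:first)
  end
end

# Made-up sample data.
articles = [
  { id: 1, asins: %w[A B C] },
  { id: 2, asins: %w[A B] },
  { id: 3, asins: %w[C] }
]

related = related_books(articles)
related.each { |asin, others| puts "#{asin}: #{others.join(', ')}" }
```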

About ranking

I'm not an expert, so I can't implement difficult theory or processing, but I paid attention to the following points.

- The score is basically calculated from the number of articles that mention a book, the number of likes on those articles, and their posting dates.
  - Articles lose weight as time passes after posting.
  - The rate of decay changes with the aggregation period.
- Some users put a link to what seems to be their own book in every post, so I try to keep the score down in that case.
  - Example: for 10 articles, the score is higher when 10 different people wrote them than when 1 person wrote all of them.
- The user's own standing (such as the number of followers) is not used; no per-user weighting is applied.
- Information from social media (Twitter, Hatena Bookmark, etc.) is taken into account to some extent.

You can see a book like this.
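The points listed above can be sketched in Ruby roughly as follows. This is not the site's actual formula (the real score is computed in MySQL). The half-life values, the logarithmic damping of likes and social counts, the 1, 1/2, 1/3 weighting for repeated posts by one author, and all field names are assumptions chosen only to illustrate the ideas.

```ruby
require "date"

# Illustrative half-lives per aggregation period (assumed values).
def half_life_days(period)
  { weekly: 3.0, monthly: 14.0, yearly: 120.0 }.fetch(period)
end

# Exponential decay: weight halves every `half_life` days.
def decay(age_days, half_life)
  0.5**(age_days / half_life)
end

# Score of one article mentioning a book: 1 for the mention itself, plus
# log-damped likes and social counts, decayed by the article's age.
def article_score(article, today:, period:)
  age_days = (today - article[:posted_on]).to_f
  base = 1.0 + Math.log(1 + article[:likes]) + 0.5 * Math.log(1 + article[:social_count])
  base * decay(age_days, half_life_days(period))
end

# Score of a book: articles are grouped by author, and each additional
# article from the same author counts for less (1, 1/2, 1/3, ...).
# Follower counts are deliberately not used anywhere.
def book_score(articles, today:, period:)
  articles.group_by { |article| article[:user_id] }.sum do |_user_id, posts|
    posts
      .map { |article| article_score(article, today: today, period: period) }
      .sort.reverse
      .each_with_index.sum { |score, index| score / (index + 1) }
  end
end

# Made-up data: ten posts by ten authors versus ten posts by one author.
today = Date.new(2020, 6, 1)
post = ->(user_id) { { user_id: user_id, likes: 0, social_count: 0, posted_on: today } }
many_authors = (1..10).map { |id| post.call(id) }
one_author = Array.new(10) { post.call(1) }

puts book_score(many_authors, today: today, period: :monthly)
puts book_score(one_author, today: today, period: :monthly)
```

With these assumptions, ten posts by ten different authors score higher than ten posts by a single author, which is the behaviour described in the example above.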

List of technologies used

The technology I am using is nothing new. I wanted to build it as a SPA, but since there is only one page I built it the classic way. Partly as a learning exercise, I'm thinking of turning it into a SPA soon.

I wanted to spend as little money as possible, so I use ConoHa's VPS instead of AWS or GCP.

- Crawler
  - Ruby 2.7
  - MySQL 8
  - RabbitMQ
- Website
  - Bootstrap 4
  - SQLite
  - Nginx + Puma + Sinatra
  - Cloudflare
- Datadog

Note that the server that crawls and aggregates is separate from the server that serves the ranking. The ranking is generated once a day, and the website performs no writes. So, after the ranking is generated in MySQL, it is dumped to a SQLite file and transferred to the website's VPS. (The pre-aggregation data in MySQL has a large number of records and a large data size, so I do not want the website to query it.)

Dumping the ranking to SQLite is possible because the number of records is small, and it keeps memory and disk usage down, so the site can run on a small VPS. For now, I think it is working reasonably well (response times under 50 ms).

Please give it a try.
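The dump step might look something like the sketch below. It assumes the ranking rows have already been fetched from MySQL as an array of hashes, and that `db` behaves like `SQLite3::Database` from the sqlite3 gem (only `#execute` and `#transaction` are used). The table name, column names, and file name are hypothetical.

```ruby
# Write the day's ranking rows into a SQLite database.
# `db` is expected to behave like SQLite3::Database from the sqlite3 gem;
# the table and column names here are hypothetical.
def dump_ranking(db, rows)
  db.execute(<<~SQL)
    CREATE TABLE IF NOT EXISTS rankings (
      period   TEXT    NOT NULL,
      position INTEGER NOT NULL,
      asin     TEXT    NOT NULL,
      title    TEXT    NOT NULL,
      score    REAL    NOT NULL,
      PRIMARY KEY (period, position)
    )
  SQL

  # A single transaction keeps the bulk insert fast and all-or-nothing.
  db.transaction do
    rows.each do |row|
      db.execute(
        "INSERT INTO rankings (period, position, asin, title, score) VALUES (?, ?, ?, ?, ?)",
        row.values_at(:period, :position, :asin, :title, :score)
      )
    end
  end

  rows.size
end

# Usage with the sqlite3 gem (not run here):
#   require "sqlite3"
#   db = SQLite3::Database.new("ranking.sqlite3")
#   dump_ranking(db, rows_fetched_from_mysql)
#   # then copy ranking.sqlite3 to the web server, for example with scp or rsync
```

Writing to a fresh file each day and swapping it in on the web server would keep the site read-only, in line with the setup described above.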

Note Station | Popular business books, practical books, and technical books are updated daily in ranking format

