I wrote this article because there were few tutorials that actually implement recommendations using sample data.
There are methods that use machine learning and similar techniques to create recommendations, but this article shows how to create recommendations with statistics-based methods.
I will explain using Python and an open dataset.
What is a recommendation in the first place? It is **the provider recommending products, services, etc. that the customer may be interested in**.
Below is an example from ● MAZON: on the product page of a certain "comforter", a **"comforter cover"** and a **"mattress"** are shown as recommended products.
A "comforter", a "comforter cover" and a "mattress" certainly seem related, and some people probably buy them together.
This is exactly the aim: by making recommendations, you can make customers aware of other products that are often bought together with the one they are viewing.
Recommendations can be broadly divided into **"content-based"** and **"transaction-based"** recommendations.
Each has its advantages and disadvantages, but because the two can be used in combination, each can compensate for the other's weaknesses.
Recommendations based on association analysis, the theme of this article, are **"transaction-based recommendations"**.
Recommending a "comforter cover" and a "mattress" for a "comforter" in the ● MAZON example above is also a transaction-based recommendation.
**Transaction-based recommendations basically surface products that are "bought together".**
Association analysis reveals how strongly products X and Y are related, for example "when product X is bought, product Y tends to be bought at the same time (or next)". This is exactly what we want to do in a recommendation.
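To make "bought at the same time" concrete, here is a minimal Python sketch that counts how often each pair of products appears in the same basket. The `transactions` list is hypothetical data made up for illustration, not the open dataset used in this article.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets (one set of products per customer), made up for illustration.
transactions = [
    {"comforter", "comforter cover"},
    {"comforter", "comforter cover", "mattress"},
    {"comforter", "mineral water"},
    {"mineral water", "snacks"},
    {"comforter", "mattress"},
]

# Count how often each unordered pair of products appears in the same basket.
pair_counts = Counter()
for basket in transactions:
    # Sorting makes ("a", "b") and ("b", "a") the same key.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

for (x, y), count in pair_counts.most_common():
    print(f"{x} + {y}: {count}")
```

Raw co-occurrence counts like these are the starting point; the measures below turn them into rates that can be compared across products.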
Association analysis is a statistical approach, and detailed explanations and the underlying theory are very well organized on the site linked here. Please refer to that site for the details; this article explains the concepts intuitively, without going into the theory.
There are two methods for assessing relevance in association analysis:

1. **Method using confidence**
2. **Method using lift**
In this article, we will use method 2, **the one using the lift value**. Since 2 builds on 1, I will explain 1 first.
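For reference, the commonly used definitions are shown below, with $N$ transactions in total and $P(\cdot)$ meaning the share of transactions that contain the given products. This is a standard formulation, not necessarily the exact notation used on the linked site.

$$
\begin{aligned}
P(X) &= \frac{\text{number of transactions containing } X}{N} \\
\mathrm{confidence}(X \Rightarrow Y) &= P(Y \mid X) = \frac{P(X, Y)}{P(X)} \\
\mathrm{lift}(X \Rightarrow Y) &= \frac{\mathrm{confidence}(X \Rightarrow Y)}{P(Y)} = \frac{P(X, Y)}{P(X)\,P(Y)}
\end{aligned}
$$

The last line shows why lift builds on confidence: it is confidence divided by how often $Y$ is bought overall.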
Simply put, method 1 (confidence) finds a product Y that tends to be bought at the same time as (or right after) product X. See the figure below.
As a small example, suppose we extract the data of customers who bought the comforter mentioned above.
Looking at this, out of the 6 customers, after purchasing the comforter, 2 bought a comforter cover, 2 bought mineral water, and 1 bought some other product.
Looking at this result alone, **the comforter cover and mineral water are the products most related to the comforter.**
This is the idea behind **confidence**.
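The confidence idea can be sketched in Python as follows. The six baskets are hypothetical, built only to mirror the counts in the example: the "other product" is labeled `pillow`, and the sixth customer, whose purchase the text does not describe, is assumed to have bought only the comforter.

```python
# Hypothetical baskets mirroring the small example: six customers who bought the comforter.
comforter_buyers = [
    {"comforter", "comforter cover"},
    {"comforter", "comforter cover"},
    {"comforter", "mineral water"},
    {"comforter", "mineral water"},
    {"comforter", "pillow"},  # "other product": hypothetical label
    {"comforter"},            # assumption: this customer bought only the comforter
]

def confidence(transactions, x, y):
    """Share of baskets containing x that also contain y, i.e. P(y | x)."""
    with_x = [t for t in transactions if x in t]
    if not with_x:
        return 0.0
    return sum(1 for t in with_x if y in t) / len(with_x)

for item in ["comforter cover", "mineral water", "pillow"]:
    print(item, round(confidence(comforter_buyers, "comforter", item), 3))
```

Both the comforter cover and mineral water get a confidence of 2/6 (about 0.333), which matches the reading of the figure.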
There is one caveat with the confidence approach (method 1). See the figure below.
Suppose we collect data on a few more customers in addition to these six.
In this example, mineral water is a popular product to begin with and is bought frequently whether or not the comforter is bought. In that case, it is questionable to say that mineral water is specifically related to the comforter.
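This can be checked numerically by comparing each product's confidence with its overall purchase rate (its support). The sketch below extends the hypothetical six baskets with six made-up customers who did not buy the comforter; all items and counts are assumptions for illustration.

```python
# Hypothetical data: the six comforter buyers plus six customers who did not buy it.
transactions = [
    {"comforter", "comforter cover"},
    {"comforter", "comforter cover"},
    {"comforter", "mineral water"},
    {"comforter", "mineral water"},
    {"comforter", "pillow"},
    {"comforter"},
    {"mineral water"},
    {"mineral water", "snacks"},
    {"mineral water"},
    {"snacks"},
    {"mineral water", "bread"},
    {"bread"},
]

def support(transactions, item):
    """Fraction of all baskets that contain the item (its overall purchase rate)."""
    return sum(1 for t in transactions if item in t) / len(transactions)

def confidence(transactions, x, y):
    """Share of baskets containing x that also contain y, i.e. P(y | x)."""
    with_x = [t for t in transactions if x in t]
    return sum(1 for t in with_x if y in t) / len(with_x) if with_x else 0.0

for item in ["comforter cover", "mineral water"]:
    print(f"{item}: confidence={confidence(transactions, 'comforter', item):.3f}, "
          f"overall rate={support(transactions, item):.3f}")
```

With this made-up data, mineral water's confidence (0.333) is actually below its overall purchase rate (0.500), so comforter buyers are no more likely than anyone else to buy it, while the comforter cover's confidence is twice its overall rate (0.167).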
On the other hand, since mineral water is often bought anyway, you might think it should be recommended alongside product X as well.
That is not necessarily a bad idea, but personally I think as follows.
**With the idea of lift, we can exclude such generally popular products and find related products Y that are characteristic of product X.**
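Lift turns that comparison into a single number: lift(X ⇒ Y) = confidence(X ⇒ Y) / P(Y). A value above 1 means Y is bought with X more often than its overall rate would suggest, and a value around 1 means no particular relationship. This minimal sketch reuses the same hypothetical twelve baskets.

```python
# Same hypothetical baskets as in the previous sketch.
transactions = [
    {"comforter", "comforter cover"}, {"comforter", "comforter cover"},
    {"comforter", "mineral water"}, {"comforter", "mineral water"},
    {"comforter", "pillow"}, {"comforter"},
    {"mineral water"}, {"mineral water", "snacks"}, {"mineral water"},
    {"snacks"}, {"mineral water", "bread"}, {"bread"},
]

def lift(transactions, x, y):
    """lift(x => y) = P(y | x) / P(y), computed as n_xy * n / (n_x * n_y)."""
    n = len(transactions)
    n_x = sum(1 for t in transactions if x in t)
    n_y = sum(1 for t in transactions if y in t)
    n_xy = sum(1 for t in transactions if x in t and y in t)
    if n_x == 0 or n_y == 0:
        return 0.0
    return n_xy * n / (n_x * n_y)

for item in ["comforter cover", "mineral water"]:
    print(f"lift(comforter => {item}) = {lift(transactions, 'comforter', item):.3f}")
```

Here the comforter cover gets a lift of 2.000 while mineral water drops to 0.667, so the popular item no longer looks related to the comforter.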
In practice, as explained on the site linked here, the flow is to calculate lift values from the transaction data and then use them to identify pairs of highly related products X and Y.
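That flow can be sketched in plain Python as below. This is not the method from the linked site or any particular library; the data is the same hypothetical set as above, and `min_count` (a minimum number of co-purchases, used to skip pairs seen only once) is an arbitrary threshold chosen for illustration.

```python
from collections import Counter
from itertools import combinations

# Same hypothetical baskets as in the earlier sketches.
transactions = [
    {"comforter", "comforter cover"}, {"comforter", "comforter cover"},
    {"comforter", "mineral water"}, {"comforter", "mineral water"},
    {"comforter", "pillow"}, {"comforter"},
    {"mineral water"}, {"mineral water", "snacks"}, {"mineral water"},
    {"snacks"}, {"mineral water", "bread"}, {"bread"},
]

def lift_table(transactions, min_count=2):
    """Return (x, y, count, lift) for pairs bought together at least min_count times, highest lift first."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rows = []
    for (x, y), n_xy in pair_counts.items():
        if n_xy < min_count:
            continue
        # Lift is symmetric in x and y: P(x, y) / (P(x) * P(y)).
        value = n_xy * n / (item_counts[x] * item_counts[y])
        rows.append((x, y, n_xy, value))
    return sorted(rows, key=lambda row: row[3], reverse=True)

for x, y, count, value in lift_table(transactions):
    print(f"{x} + {y}: bought together {count} times, lift {value:.3f}")
```

Because lift is symmetric, unordered pairs are enough here. For larger datasets, libraries such as mlxtend (`apriori` and `association_rules`) compute the same metrics.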
A follow-up article has been added here.