What we'll build this time

Let's build the "Customers who bought this item also bought" feature that you often see on Amazon.

[Image: Screenshot 2015-04-23 5.53.32.png]

Underlying techniques and implementation approach

This is generally called a recommendation feature. There are two main ways to implement recommendations: "collaborative filtering" and "content-based filtering".

In content-based filtering, attribute tags are attached to each product in advance. For example, to recommend products for "The Old Man and the Sea" (Hemingway) from the example above, you could tag each book with its author, and other books by Hemingway would then be shown as recommendations.
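As a rough illustration of the content-based idea (we won't use it in the rest of this article), here is a minimal sketch. The catalog contents and the `recommend_by_tag` helper are made up for this example.

```python
# Hypothetical catalog: product title -> attribute tags (illustrative values only).
catalog = {
    "The Old Man and the Sea": {"author": "Hemingway"},
    "A Farewell to Arms": {"author": "Hemingway"},
    "The Great Gatsby": {"author": "Fitzgerald"},
}


def recommend_by_tag(title, tag):
    """Return the other products that share the given tag value with `title`."""
    value = catalog[title][tag]
    return sorted(
        other for other, tags in catalog.items()
        if other != title and tags.get(tag) == value
    )


print(recommend_by_tag("The Old Man and the Sea", "author"))
# > ['A Farewell to Arms']
```

No purchase history is needed here; the quality of the recommendations depends entirely on how well the products are tagged.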

Collaborative filtering recommends products that were bought by other customers who bought this product.

This time, we will implement "collaborative filtering".

I use Redis and Python.

Redis is a key-value store (KVS). We use its Sorted Set data type.

Redis installation procedure

MacPorts: http://blog.katsuma.tv/2010/03/start_redis.html
Homebrew: http://qiita.com/items/3d2a2fc683ae19302071

Reasons to use Redis

Computing the recommended products on every request is not realistic given the amount of computation involved, so the results need to be computed in advance and **stored in a form that is easy to retrieve**. (Anything that makes storing and retrieving this easy will work just as well as Redis.)

What is Sorted Set?

A collection of unique members, each with a score, that Redis keeps sorted by score automatically (on the Redis side) as data is added.

[Image: Screenshot 2015-04-23 4.26.57.png]
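To get a feel for this behavior without a running Redis server, here is a pure-Python stand-in. The `TinySortedSet` class is hypothetical and only mimics the two operations used later in this article (adding a member with a score, and reading a range from the highest score down); it is not how Redis implements Sorted Sets.

```python
class TinySortedSet:
    """Toy stand-in for a Redis Sorted Set: unique members, each with a score."""

    def __init__(self):
        self._scores = {}

    def zadd(self, member, score):
        # Adding an existing member just updates its score.
        self._scores[member] = score

    def zrevrange(self, start, stop):
        # Highest score first; ties are broken by member name, also descending.
        # `stop` is inclusive. Only non-negative indexes are handled here.
        ordered = sorted(
            self._scores,
            key=lambda member: (self._scores[member], member),
            reverse=True,
        )
        return ordered[start:stop + 1]


s = TinySortedSet()
s.zadd("A", 0.2)
s.zadd("B", 0.5)
s.zadd("D", 0.33)
s.zadd("C", 0.17)
print(s.zrevrange(0, 2))
# > ['B', 'D', 'A']
```

The members come back ordered by score no matter what order they were added in, which is exactly what we want for a "top N most similar products" lookup.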

Implementation of collaborative filtering

This can be implemented as long as the similarity between product X and every other product can be computed as a number.

[Image: Screenshot 2015-04-23 4.29.36.png]

Similarity formula

There are many options, but the Jaccard index is a common choice. With the sample data below, the similarity between product X and product A is 1/5. The numerator 1 is the number of customers who purchased both product X and product A (the intersection). The denominator 5 is the number of customers who purchased either product X or product A (the union).

[Image: Screenshot 2015-04-23 4.36.07.png]
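Written out as a formula, it looks like this. The symbols C_X and C_A are notation introduced here for the sets of customer IDs that bought product X and product A, using the values from the sample data (X: 1, 3, 5 and A: 2, 4, 5).

```latex
J(X, A) = \frac{|C_X \cap C_A|}{|C_X \cup C_A|}
        = \frac{|\{5\}|}{|\{1, 2, 3, 4, 5\}|}
        = \frac{1}{5}
```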

Sample data used this time

[Image: Screenshot 2015-04-23 5.23.01.png]

Implementation

# -*- coding: utf-8 -*-
from __future__ import absolute_import
from __future__ import unicode_literals


def jaccard(e1, e2):
    """
    Calculate the Jaccard index of two lists of customer IDs.
    :param e1: list of int
    :param e2: list of int
    :rtype: float
    """
    set_e1 = set(e1)
    set_e2 = set(e2)
    return float(len(set_e1 & set_e2)) / float(len(set_e1 | set_e2))


def get_key(k):
    return 'JACCARD:PRODUCT:{}'.format(k)

# IDs of the customers who purchased each product (e.g. customers 1, 3, and 5 bought product X)
product_x = [1, 3, 5]
product_a = [2, 4, 5]
product_b = [1, 2, 3]
product_c = [2, 3, 4, 7]
product_d = [3]
product_e = [4, 6, 7]

# Product data: product name -> list of customer IDs
products = {
    'X': product_x,
    'A': product_a,
    'B': product_b,
    'C': product_c,
    'D': product_d,
    'E': product_e,
}

# redis
import redis
# decode_responses=True makes redis-py return str instead of bytes on Python 3
r = redis.Redis(host='localhost', port=6379, db=10, decode_responses=True)

# Calculate the Jaccard index and record it in a Redis Sorted Set for each product
for key in products:
    base_customers = products[key]
    for key2 in products:
        if key == key2:
            continue
        target_customers = products[key2]
        # Calculate the Jaccard index
        j = jaccard(base_customers, target_customers)
        # Record in the Redis Sorted Set
        # (redis-py 3.0+ takes a {member: score} mapping; older versions used zadd(name, member, score))
        r.zadd(get_key(key), {key2: j})

# Example 1: people who bought product X also bought these products.
print(r.zrevrange(get_key('X'), 0, 2))
# > ['B', 'D', 'A']

# Example 2: people who bought product E also bought these products.
print(r.zrevrange(get_key('E'), 0, 2))
# > ['C', 'A', 'X']

Let's look at the values in Redis

Let's check

Products B, D, and A are recommended to customers who bought product X. Their similarities to X are 0.5, 0.33, and 0.2 respectively, so the recommendations look correct.

[Image: Screenshot 2015-04-23 5.21.23.png]
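The same check can be done without Redis. This sketch recomputes the similarity of every product to product X from the sample data in the implementation and ranks the results. The `similarity_to` helper is a name made up for this example.

```python
# Sample data from the implementation: product -> IDs of customers who bought it.
products = {
    'X': [1, 3, 5],
    'A': [2, 4, 5],
    'B': [1, 2, 3],
    'C': [2, 3, 4, 7],
    'D': [3],
    'E': [4, 6, 7],
}


def jaccard(e1, e2):
    """Jaccard index: size of the intersection divided by size of the union."""
    s1, s2 = set(e1), set(e2)
    return len(s1 & s2) / float(len(s1 | s2))


def similarity_to(base):
    """Return (product, similarity) pairs for `base`, most similar first."""
    scores = [
        (name, jaccard(products[base], customers))
        for name, customers in products.items()
        if name != base
    ]
    # Sort by score, then by name, both descending.
    return sorted(scores, key=lambda pair: (pair[1], pair[0]), reverse=True)


for name, score in similarity_to('X'):
    print('{}: {:.2f}'.format(name, score))
# > B: 0.50
# > D: 0.33
# > A: 0.20
# > C: 0.17
# > E: 0.00
```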

Problems with this method

Every product is compared with every other product, so as the number of customers and products grows, the amount of computation explodes and this approach stops working.

Solution

Build an inverted index, following the approach Amazon describes: http://www.cs.umd.edu/~samir/498/Amazon-Recommendations.pdf
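Here is a minimal sketch of one way to do this with the same sample data. It is my own reading of the idea, not code from the paper, and the names `similar_products` and `bought_by` are made up for the example. The data is first inverted into customer -> products, and co-purchases are then counted only for pairs of products that actually share a customer. Pairs with no customer in common (Jaccard index 0) are never touched.

```python
from collections import defaultdict
from itertools import combinations

# Sample data from the implementation: product -> IDs of customers who bought it.
products = {
    'X': [1, 3, 5],
    'A': [2, 4, 5],
    'B': [1, 2, 3],
    'C': [2, 3, 4, 7],
    'D': [3],
    'E': [4, 6, 7],
}


def similar_products(products):
    """Return {product: {other: jaccard}} for pairs sharing at least one customer."""
    # Inverted index: customer ID -> set of products that customer bought.
    bought_by = defaultdict(set)
    for name, customers in products.items():
        for customer in set(customers):
            bought_by[customer].add(name)

    # Count co-purchases by walking each customer's purchases once.
    both = defaultdict(int)
    for basket in bought_by.values():
        for p1, p2 in combinations(sorted(basket), 2):
            both[(p1, p2)] += 1

    # Size of the union = size of P1 + size of P2 - size of the intersection.
    sizes = {name: len(set(customers)) for name, customers in products.items()}
    result = defaultdict(dict)
    for (p1, p2), n in both.items():
        score = n / float(sizes[p1] + sizes[p2] - n)
        result[p1][p2] = score
        result[p2][p1] = score
    return result


scores = similar_products(products)
top3 = sorted(scores['X'], key=lambda p: scores['X'][p], reverse=True)[:3]
print(top3)
# > ['B', 'D', 'A']
```

Product E never appears in `scores['X']` because no customer bought both, so no work is spent on that pair. The resulting scores could be written to the Redis Sorted Sets in the same way as before.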

I implemented collaborative filtering (recommendation) with redis and python