My posting frequency has dropped off since the latter half of this year, so this is the first update in a while. My last article is more than three months old, my most-stocked article is now shown as "more than a year ago", and I'm even starting to forget Qiita notation (≒ Markdown). A sorry state of affairs...
This time, I'll implement collaborative filtering (recommendation) in Python.
That said, I wrote a Ruby version in January of this year, so please refer to that article as well.
We'll use the same data as in the previous article.
We asked people who eat lunch at Company A's cafeteria to rate their impressions of the menus over the past 10 days, with 5.0 as a perfect score.
The scores for each menu were as follows. Menus a person has never eaten are marked with --.
name | curry | ramen | Fried rice | sushi | beef bowl | Udon |
---|---|---|---|---|---|---|
Mr. Yamada | 2.5 | 3.5 | 3.0 | 3.5 | 2.5 | 3.0 |
Mr. Tanaka | 3.0 | 3.5 | 1.5 | 5.0 | 3.5 | 3.0 |
Mr. Sato | 2.5 | 3.0 | -- | 3.5 | -- | 4.0 |
Mr. Nakamura | -- | 3.5 | 3.0 | 4.0 | 2.5 | 4.5 |
Mr. Kawamura | 3.0 | 4.0 | 2.0 | 3.0 | 2.0 | 3.0 |
Mr. Suzuki | 3.0 | 4.0 | -- | 5.0 | 3.5 | 3.0 |
Mr. Shimobayashi | -- | 4.5 | -- | 4.0 | 1.0 | -- |
Everyone's tastes differ, and even for the same menu the scores run high or low depending on the person.
First, let's put the data into a form Python can handle and save it as recommendation_data.py.
dataset = {
    'Yamada': {'curry': 2.5,
               'ramen': 3.5,
               'Fried rice': 3.0,
               'sushi': 3.5,
               'beef bowl': 2.5,
               'Udon': 3.0},
    'Tanaka': {'curry': 3.0,
               'ramen': 3.5,
               'Fried rice': 1.5,
               'sushi': 5.0,
               'Udon': 3.0,
               'beef bowl': 3.5},
    'Sato': {'curry': 2.5,
             'ramen': 3.0,
             'sushi': 3.5,
             'Udon': 4.0},
    'Nakamura': {'ramen': 3.5,
                 'Fried rice': 3.0,
                 'Udon': 4.5,
                 'sushi': 4.0,
                 'beef bowl': 2.5},
    'Kawamura': {'curry': 3.0,
                 'ramen': 4.0,
                 'Fried rice': 2.0,
                 'sushi': 3.0,
                 'Udon': 3.0,
                 'beef bowl': 2.0},
    'Suzuki': {'curry': 3.0,
               'ramen': 4.0,
               'Udon': 3.0,
               'sushi': 5.0,
               'beef bowl': 3.5},
    'Shimobayashi': {'ramen': 4.5,
                     'beef bowl': 1.0,
                     'sushi': 4.0}}
Let's load the data from recommendation_data.py above and display a few values in Python.
from recommendation_data import dataset
from math import sqrt

print("Evaluation of Mr. Yamada's curry: {}".format(
    dataset['Yamada']['curry']))
print("Evaluation of Mr. Yamada's udon: {}\n".format(
    dataset['Yamada']['Udon']))
print("Evaluation of Mr. Sato's curry: {}".format(
    dataset['Sato']['curry']))
print("Evaluation of Mr. Sato's udon: {}\n".format(
    dataset['Sato']['Udon']))
print("Suzuki's rating: {}\n".format(dataset['Suzuki']))

#=> Evaluation of Mr. Yamada's curry: 2.5
#=> Evaluation of Mr. Yamada's udon: 3.0
#=> Evaluation of Mr. Sato's curry: 2.5
#=> Evaluation of Mr. Sato's udon: 4.0
#=> Suzuki's rating: {'sushi': 5.0, 'Udon': 3.0, 'curry': 3.0, 'beef bowl': 3.5, 'ramen': 4.0}
There are various similarity measures. Below is code that scores similarity based on Euclidean distance; the distance is converted to a value in (0, 1], where larger means more similar.
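Concretely, over the set of items $I$ that both users have rated, the code computes:

```math
sim(p_1, p_2) = \frac{1}{1 + \sqrt{\sum_{i \in I} (r_{p_1,i} - r_{p_2,i})^2}}
```

For example, Mr. Yamada and Mr. Suzuki share curry, ramen, sushi, beef bowl, and udon, so the sum of squared differences is (2.5 - 3.0)² + (3.5 - 4.0)² + (3.5 - 5.0)² + (2.5 - 3.5)² + (3.0 - 3.0)² = 3.75, and 1 / (1 + √3.75) ≈ 0.3405, which matches the output below.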
def similarity_score(person1, person2):
    # Returns a similarity score based on the Euclidean distance
    # between person1 and person2: 1 / (1 + distance)
    both_viewed = {}  # Items rated by both users
    for item in dataset[person1]:
        if item in dataset[person2]:
            both_viewed[item] = 1

    # Return 0 if they have no items in common
    if len(both_viewed) == 0:
        return 0

    # Sum of squared rating differences over the common items
    sum_of_euclidean_distance = []
    for item in dataset[person1]:
        if item in dataset[person2]:
            sum_of_euclidean_distance.append(
                pow(dataset[person1][item] - dataset[person2][item], 2))
    total_of_euclidean_distance = sum(sum_of_euclidean_distance)

    return 1 / (1 + sqrt(total_of_euclidean_distance))

print("Similarity between Mr. Yamada and Mr. Suzuki (Euclidean distance)",
      similarity_score('Yamada', 'Suzuki'))

#=> Similarity between Mr. Yamada and Mr. Suzuki (Euclidean distance) 0.3405424265831667
Below is the code to compute the Pearson correlation coefficient. It is often said to give better results than Euclidean distance when the data is not normalized, for example when some users consistently rate higher or lower than others.
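With $x$ and $y$ denoting the two users' ratings over their $n$ common items, the code below evaluates the usual computational form of the correlation coefficient:

```math
r = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}
```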
def pearson_correlation(person1, person2):
    # Collect the items rated by both users
    both_rated = {}
    for item in dataset[person1]:
        if item in dataset[person2]:
            both_rated[item] = 1

    number_of_ratings = len(both_rated)

    # Return 0 if they have no items in common
    if number_of_ratings == 0:
        return 0

    # Sum each user's ratings over the common items
    person1_preferences_sum = sum(
        [dataset[person1][item] for item in both_rated])
    person2_preferences_sum = sum(
        [dataset[person2][item] for item in both_rated])

    # Sum the squares of each user's ratings
    person1_square_preferences_sum = sum(
        [pow(dataset[person1][item], 2) for item in both_rated])
    person2_square_preferences_sum = sum(
        [pow(dataset[person2][item], 2) for item in both_rated])

    # Sum the products of the two users' ratings for each item
    product_sum_of_both_users = sum(
        [dataset[person1][item] * dataset[person2][item] for item in both_rated])

    # Pearson score calculation
    numerator_value = product_sum_of_both_users - \
        (person1_preferences_sum * person2_preferences_sum / number_of_ratings)
    denominator_value = sqrt(
        (person1_square_preferences_sum -
         pow(person1_preferences_sum, 2) / number_of_ratings) *
        (person2_square_preferences_sum -
         pow(person2_preferences_sum, 2) / number_of_ratings))
    if denominator_value == 0:
        return 0
    return numerator_value / denominator_value

print("Similarity between Mr. Yamada and Mr. Tanaka (Pearson correlation coefficient)",
      pearson_correlation('Yamada', 'Tanaka'))

#=> Similarity between Mr. Yamada and Mr. Tanaka (Pearson correlation coefficient) 0.39605901719066977
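A quick check by hand: Mr. Yamada and Mr. Tanaka share all six menus, so n = 6, Σx = 18.0, Σy = 19.5, Σxy = 59.5, Σx² = 55.0, and Σy² = 69.75. The numerator is 59.5 - (18.0 × 19.5) / 6 = 1.0 and the denominator is √((55.0 - 18.0²/6)(69.75 - 19.5²/6)) = √6.375 ≈ 2.5249, so r ≈ 0.3961, matching the output above.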
Next, let's find the top three people whose food preferences are most similar to Mr. Yamada's.
def most_similar_users(person, number_of_users):
    # Return the most similar users and their similarity scores
    scores = [(pearson_correlation(person, other_person), other_person)
              for other_person in dataset if other_person != person]

    # Sort so that the highest similarity comes first
    scores.sort(reverse=True)
    return scores[:number_of_users]

print("Top 3 people most similar to Mr. Yamada",
      most_similar_users('Yamada', 3))

#=> Top 3 people most similar to Mr. Yamada [(0.9912407071619299, 'Shimobayashi'), (0.7470178808339965, 'Suzuki'), (0.5940885257860044, 'Kawamura')]
Finally, let's recommend menus to Mr. Shimobayashi.
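The predicted score for a menu $i$ that the target user $u$ has not yet eaten is the similarity-weighted average of the other users' ratings, taken over the users $v$ whose similarity to $u$ is positive:

```math
\hat{r}_{u,i} = \frac{\sum_{v} sim(u, v) \, r_{v,i}}{\sum_{v} sim(u, v)}
```

Dividing by the sum of similarities normalizes the prediction so that menus rated by many similar users are not favored simply for having more raters.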
def user_recommendations(person):
    # Predict scores from a similarity-weighted average of the
    # other users' ratings
    totals = {}
    simSums = {}
    for other in dataset:
        # Don't compare the user with themselves
        if other == person:
            continue
        sim = pearson_correlation(person, other)

        # Ignore similarities of zero or below
        if sim <= 0:
            continue
        for item in dataset[other]:
            # Only score menus the user hasn't eaten yet
            if item not in dataset[person] or dataset[person][item] == 0:
                # similarity * score
                totals.setdefault(item, 0)
                totals[item] += dataset[other][item] * sim
                # Sum of similarities
                simSums.setdefault(item, 0)
                simSums[item] += sim

    # Create the normalized list
    rankings = [(total / simSums[item], item)
                for item, total in list(totals.items())]
    rankings.sort(reverse=True)

    # Return only the recommended items
    recommendations_list = [
        recommend_item for score, recommend_item in rankings]
    return recommendations_list

print("Recommended menu for Mr. Shimobayashi",
      user_recommendations('Shimobayashi'))

#=> Recommended menu for Mr. Shimobayashi ['Udon', 'curry', 'Fried rice']
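The function above throws away the predicted scores and returns only the menu names. If you also want to see the predicted ratings themselves, one minimal variant (a sketch, not from the original source) is to return the (score, menu) pairs before they are stripped:

```python
def user_recommendations_with_scores(person):
    # Same computation as user_recommendations, but keeps the
    # similarity-weighted average scores in the result.
    totals, sim_sums = {}, {}
    for other in dataset:
        if other == person:
            continue
        sim = pearson_correlation(person, other)
        if sim <= 0:
            continue
        for item, rating in dataset[other].items():
            if item not in dataset[person]:
                totals[item] = totals.get(item, 0) + rating * sim
                sim_sums[item] = sim_sums.get(item, 0) + sim
    return sorted(((total / sim_sums[item], item)
                   for item, total in totals.items()), reverse=True)

print(user_recommendations_with_scores('Shimobayashi'))
# The menus come out in the same order as above, each paired with
# its predicted score on the 5.0 scale.
```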
The source code for this article is here.
Collaborative filtering methods can be broadly divided into item-based and user-based approaches. For a commentary with concrete code, read Chapter 2 of "Programming Collective Intelligence"; this article follows that chapter.
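This article's code is user-based. To get a feel for the item-based direction, one common trick (a minimal sketch) is to transpose the dataset so that menus play the role of users; the same similarity functions can then compare menus, provided they are rewritten to take the dictionary as an argument instead of reading the global dataset:

```python
def transpose_dataset(data):
    # Flip {person: {menu: score}} into {menu: {person: score}} so that
    # similarity can be computed between menus instead of between people.
    transposed = {}
    for person, ratings in data.items():
        for menu, score in ratings.items():
            transposed.setdefault(menu, {})[person] = score
    return transposed

print(transpose_dataset(dataset)['curry'])
#=> {'Yamada': 2.5, 'Tanaka': 3.0, 'Sato': 2.5, 'Kawamura': 3.0, 'Suzuki': 3.0}
```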
If you want a systematic overview of recommender-system algorithms, I personally recommend the following survey articles by Toshihiro Kamishima in the Journal of the Japanese Society for Artificial Intelligence; they are clear and easy to follow.
Toshihiro Kamishima: Recommender system algorithms (1), Journal of the Japanese Society for Artificial Intelligence, vol. 22, no. 6, pp. 826-837, 2007.
Toshihiro Kamishima: Recommender system algorithms (2), Journal of the Japanese Society for Artificial Intelligence, vol. 23, no. 1, pp. 89-103, 2008.
Toshihiro Kamishima: Recommender system algorithms (3), Journal of the Japanese Society for Artificial Intelligence, vol. 23, no. 2, pp. 248-263, 2008.
If you don't have access to the journal, the explanatory material on recommender systems on the author's site covers almost the same content.
In writing this article, I referred to the papers above, Programming Collective Intelligence, and the following article: "Collaborative Filtering Recommendation Engine Implementation in Python", http://dataaspirant.com/2015/05/25/collaborative-filtering-recommendation-engine-implementation-in-python/