My posting frequency has dropped off since the latter half of this year, so this is the first update in a while. My last article is more than three months old, my most-stocked article is now shown as "more than a year ago", and I'm even starting to forget Qiita notation (≒ Markdown). A sorry state of affairs...
This time, I'll implement collaborative filtering (recommendation) in Python.
That said, I wrote a Ruby version in January of this year, so please refer to that article as well.
We'll use the same data as in the previous article.
We asked people who eat lunch at Company A's cafeteria to rate their impressions of the menus over the past 10 days, with 5.0 as a perfect score.
The scores for each menu were as follows. Menus a person has never eaten are marked with --.
name | curry | ramen | Fried rice | sushi | beef bowl | Udon |
---|---|---|---|---|---|---|
Mr. Yamada | 2.5 | 3.5 | 3.0 | 3.5 | 2.5 | 3.0 |
Mr. Tanaka | 3.0 | 3.5 | 1.5 | 5.0 | 3.5 | 3.0 |
Mr. Sato | 2.5 | 3.0 | -- | 3.5 | -- | 4.0 |
Mr. Nakamura | -- | 3.5 | 3.0 | 4.0 | 2.5 | 4.5 |
Mr. Kawamura | 3.0 | 4.0 | 2.0 | 3.0 | 2.0 | 3.0 |
Mr. Suzuki | 3.0 | 4.0 | -- | 5.0 | 3.5 | 3.0 |
Mr. Shimobayashi | -- | 4.5 | -- | 4.0 | 1.0 | -- |
Everyone's tastes differ, and even for the same menu the scores run high or low depending on the person.
First, let's put the data into a form Python can handle and save it as recommendation_data.py.
dataset = {
    'Yamada': {'curry': 2.5,
               'ramen': 3.5,
               'Fried rice': 3.0,
               'sushi': 3.5,
               'beef bowl': 2.5,
               'Udon': 3.0},
    'Tanaka': {'curry': 3.0,
               'ramen': 3.5,
               'Fried rice': 1.5,
               'sushi': 5.0,
               'Udon': 3.0,
               'beef bowl': 3.5},
    'Sato': {'curry': 2.5,
             'ramen': 3.0,
             'sushi': 3.5,
             'Udon': 4.0},
    'Nakamura': {'ramen': 3.5,
                 'Fried rice': 3.0,
                 'Udon': 4.5,
                 'sushi': 4.0,
                 'beef bowl': 2.5},
    'Kawamura': {'curry': 3.0,
                 'ramen': 4.0,
                 'Fried rice': 2.0,
                 'sushi': 3.0,
                 'Udon': 3.0,
                 'beef bowl': 2.0},
    'Suzuki': {'curry': 3.0,
               'ramen': 4.0,
               'Udon': 3.0,
               'sushi': 5.0,
               'beef bowl': 3.5},
    'Shimobayashi': {'ramen': 4.5,
                     'beef bowl': 1.0,
                     'sushi': 4.0}}
Let's load the data from recommendation_data.py above and display a few values in Python.
from recommendation_data import dataset
from math import sqrt

print("Evaluation of Mr. Yamada's curry: {}".format(
    dataset['Yamada']['curry']))
print("Evaluation of Mr. Yamada's udon: {}\n".format(
    dataset['Yamada']['Udon']))
print("Evaluation of Mr. Sato's curry: {}".format(
    dataset['Sato']['curry']))
print("Evaluation of Mr. Sato's udon: {}\n".format(
    dataset['Sato']['Udon']))
print("Suzuki's rating: {}\n".format(dataset['Suzuki']))

#=> Evaluation of Mr. Yamada's curry: 2.5
#=> Evaluation of Mr. Yamada's udon: 3.0
#=> Evaluation of Mr. Sato's curry: 2.5
#=> Evaluation of Mr. Sato's udon: 4.0
#=> Suzuki's rating: {'sushi': 5.0, 'Udon': 3.0, 'curry': 3.0, 'beef bowl': 3.5, 'ramen': 4.0}
There are various similarity measures. Below is code that scores similarity based on Euclidean distance; the distance is converted to a value in (0, 1], where larger means more similar.
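Concretely, over the set of items $I$ that both users have rated, the code computes:

```math
sim(p_1, p_2) = \frac{1}{1 + \sqrt{\sum_{i \in I} (r_{p_1,i} - r_{p_2,i})^2}}
```

For example, Mr. Yamada and Mr. Suzuki share curry, ramen, sushi, beef bowl, and udon, so the sum of squared differences is (2.5 - 3.0)² + (3.5 - 4.0)² + (3.5 - 5.0)² + (2.5 - 3.5)² + (3.0 - 3.0)² = 3.75, and 1 / (1 + √3.75) ≈ 0.3405, which matches the output below.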
def similarity_score(person1, person2):
    # Returns a similarity score based on the Euclidean distance
    # between person1 and person2: 1 / (1 + distance)
    both_viewed = {}  # Items rated by both users
    for item in dataset[person1]:
        if item in dataset[person2]:
            both_viewed[item] = 1

    # Return 0 if they have no items in common
    if len(both_viewed) == 0:
        return 0

    # Sum of squared rating differences over the common items
    sum_of_euclidean_distance = []
    for item in dataset[person1]:
        if item in dataset[person2]:
            sum_of_euclidean_distance.append(
                pow(dataset[person1][item] - dataset[person2][item], 2))
    total_of_euclidean_distance = sum(sum_of_euclidean_distance)

    return 1 / (1 + sqrt(total_of_euclidean_distance))

print("Similarity between Mr. Yamada and Mr. Suzuki (Euclidean distance)",
      similarity_score('Yamada', 'Suzuki'))

#=> Similarity between Mr. Yamada and Mr. Suzuki (Euclidean distance) 0.3405424265831667
Below is the code to compute the Pearson correlation coefficient. It is often said to give better results than Euclidean distance when the data is not normalized, for example when some users consistently rate higher or lower than others.
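With $x$ and $y$ denoting the two users' ratings over their $n$ common items, the code below evaluates the usual computational form of the correlation coefficient:

```math
r = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}
```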
def pearson_correlation(person1, person2):
    # Collect the items rated by both users
    both_rated = {}
    for item in dataset[person1]:
        if item in dataset[person2]:
            both_rated[item] = 1

    number_of_ratings = len(both_rated)

    # Return 0 if they have no items in common
    if number_of_ratings == 0:
        return 0

    # Sum each user's ratings over the common items
    person1_preferences_sum = sum(
        [dataset[person1][item] for item in both_rated])
    person2_preferences_sum = sum(
        [dataset[person2][item] for item in both_rated])

    # Sum the squares of each user's ratings
    person1_square_preferences_sum = sum(
        [pow(dataset[person1][item], 2) for item in both_rated])
    person2_square_preferences_sum = sum(
        [pow(dataset[person2][item], 2) for item in both_rated])

    # Sum the products of the two users' ratings for each item
    product_sum_of_both_users = sum(
        [dataset[person1][item] * dataset[person2][item] for item in both_rated])

    # Pearson score calculation
    numerator_value = product_sum_of_both_users - \
        (person1_preferences_sum * person2_preferences_sum / number_of_ratings)
    denominator_value = sqrt(
        (person1_square_preferences_sum -
         pow(person1_preferences_sum, 2) / number_of_ratings) *
        (person2_square_preferences_sum -
         pow(person2_preferences_sum, 2) / number_of_ratings))
    if denominator_value == 0:
        return 0
    return numerator_value / denominator_value

print("Similarity between Mr. Yamada and Mr. Tanaka (Pearson correlation coefficient)",
      pearson_correlation('Yamada', 'Tanaka'))

#=> Similarity between Mr. Yamada and Mr. Tanaka (Pearson correlation coefficient) 0.39605901719066977
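A quick check by hand: Mr. Yamada and Mr. Tanaka share all six menus, so n = 6, Σx = 18.0, Σy = 19.5, Σxy = 59.5, Σx² = 55.0, and Σy² = 69.75. The numerator is 59.5 - (18.0 × 19.5) / 6 = 1.0 and the denominator is √((55.0 - 18.0²/6)(69.75 - 19.5²/6)) = √6.375 ≈ 2.5249, so r ≈ 0.3961, matching the output above.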
Next, let's find the top three people whose food preferences are most similar to Mr. Yamada's.
def most_similar_users(person, number_of_users):
    # Return the most similar users and their similarity scores
    scores = [(pearson_correlation(person, other_person), other_person)
              for other_person in dataset if other_person != person]

    # Sort so that the highest similarity comes first
    scores.sort(reverse=True)
    return scores[:number_of_users]

print("Top 3 people most similar to Mr. Yamada",
      most_similar_users('Yamada', 3))

#=> Top 3 people most similar to Mr. Yamada [(0.9912407071619299, 'Shimobayashi'), (0.7470178808339965, 'Suzuki'), (0.5940885257860044, 'Kawamura')]
Finally, let's recommend menus to Mr. Shimobayashi.
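The predicted score for a menu $i$ that the target user $u$ has not yet eaten is the similarity-weighted average of the other users' ratings, taken over the users $v$ whose similarity to $u$ is positive:

```math
\hat{r}_{u,i} = \frac{\sum_{v} sim(u, v) \, r_{v,i}}{\sum_{v} sim(u, v)}
```

Dividing by the sum of similarities normalizes the prediction so that menus rated by many similar users are not favored simply for having more raters.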
def user_recommendations(person):
    # Predict scores from a similarity-weighted average of the
    # other users' ratings
    totals = {}
    simSums = {}
    for other in dataset:
        # Don't compare the user with themselves
        if other == person:
            continue
        sim = pearson_correlation(person, other)

        # Ignore similarities of zero or below
        if sim <= 0:
            continue
        for item in dataset[other]:
            # Only score menus the user hasn't eaten yet
            if item not in dataset[person] or dataset[person][item] == 0:
                # similarity * score
                totals.setdefault(item, 0)
                totals[item] += dataset[other][item] * sim
                # Sum of similarities
                simSums.setdefault(item, 0)
                simSums[item] += sim

    # Create the normalized list
    rankings = [(total / simSums[item], item)
                for item, total in list(totals.items())]
    rankings.sort(reverse=True)

    # Return only the recommended items
    recommendations_list = [
        recommend_item for score, recommend_item in rankings]
    return recommendations_list

print("Recommended menu for Mr. Shimobayashi",
      user_recommendations('Shimobayashi'))

#=> Recommended menu for Mr. Shimobayashi ['Udon', 'curry', 'Fried rice']
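The function above throws away the predicted scores and returns only the menu names. If you also want to see the predicted ratings themselves, one minimal variant (a sketch, not from the original source) is to return the (score, menu) pairs before they are stripped:

```python
def user_recommendations_with_scores(person):
    # Same computation as user_recommendations, but keeps the
    # similarity-weighted average scores in the result.
    totals, sim_sums = {}, {}
    for other in dataset:
        if other == person:
            continue
        sim = pearson_correlation(person, other)
        if sim <= 0:
            continue
        for item, rating in dataset[other].items():
            if item not in dataset[person]:
                totals[item] = totals.get(item, 0) + rating * sim
                sim_sums[item] = sim_sums.get(item, 0) + sim
    return sorted(((total / sim_sums[item], item)
                   for item, total in totals.items()), reverse=True)

print(user_recommendations_with_scores('Shimobayashi'))
# The menus come out in the same order as above, each paired with
# its predicted score on the 5.0 scale.
```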
The source code for this article is here.
Collaborative filtering methods can be broadly divided into item-based and user-based approaches. For a commentary with concrete code, read Chapter 2 of "Programming Collective Intelligence"; this article follows that chapter.
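This article's code is user-based. To get a feel for the item-based direction, one common trick (a minimal sketch) is to transpose the dataset so that menus play the role of users; the same similarity functions can then compare menus, provided they are rewritten to take the dictionary as an argument instead of reading the global dataset:

```python
def transpose_dataset(data):
    # Flip {person: {menu: score}} into {menu: {person: score}} so that
    # similarity can be computed between menus instead of between people.
    transposed = {}
    for person, ratings in data.items():
        for menu, score in ratings.items():
            transposed.setdefault(menu, {})[person] = score
    return transposed

print(transpose_dataset(dataset)['curry'])
#=> {'Yamada': 2.5, 'Tanaka': 3.0, 'Sato': 2.5, 'Kawamura': 3.0, 'Suzuki': 3.0}
```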
If you want a systematic overview of recommender-system algorithms, I personally recommend the following survey articles by Toshihiro Kamishima in the Journal of the Japanese Society for Artificial Intelligence; they are clear and easy to follow.
Toshihiro Kamishima: Recommender system algorithms (1), Journal of the Japanese Society for Artificial Intelligence, vol. 22, no. 6, pp. 826-837, 2007.
Toshihiro Kamishima: Recommender system algorithms (2), Journal of the Japanese Society for Artificial Intelligence, vol. 23, no. 1, pp. 89-103, 2008.
Toshihiro Kamishima: Recommender system algorithms (3), Journal of the Japanese Society for Artificial Intelligence, vol. 23, no. 2, pp. 248-263, 2008.
If you don't have access to the journal, the explanatory material on recommender systems on the author's site covers almost the same content.
In writing this article, I referred to the papers above, Programming Collective Intelligence, and the following article: "Collaborative Filtering Recommendation Engine Implementation in Python", http://dataaspirant.com/2015/05/25/collaborative-filtering-recommendation-engine-implementation-in-python/