I built a recommender system at a hackathon, so I am writing it up here as a memo.
The system recommends snacks that go well with sake. I wanted to build a machine learning model, but I lacked the knowledge and time, so I settled on collaborative filtering.
Articles I referred to:
・ Making a recommendation system with Python (user-based collaborative filtering)
・ Notes for implementing simple collaborative filtering in Python
・ Building a recommender system using collaborative filtering
Collaborative filtering is the technique behind recommendation messages such as **"Recommended for you"** and **"People who bought this product also bought this product"**. You will know it from Amazon and YouTube.
There are two broad approaches: user-based and item-based (product-based). User-based collaborative filtering estimates **the target user's unknown ratings** from **other users' known ratings**.
Similarity can be measured with cross-tabulation, Euclidean distance, the Pearson correlation coefficient, and so on. This memo uses Euclidean distance.
Similarity is defined so that **users who give the same product similarly high (or low) ratings are highly similar**. It is calculated from each user's ratings of the products.
・ The more similar two users are, the smaller the Euclidean distance between their ratings.
・ Take the reciprocal so that similar users get higher scores.
・ When the similarity is at its maximum the Euclidean distance is 0, so add 1 to the denominator to avoid dividing by zero.
score = \frac{1}{1 + \text{Euclidean distance}}
・ The score is greater than 0 and at most 1, and the larger the score, the higher the similarity.
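The score described above can be sketched as a small function. This is only an illustration under assumptions: the name euclidean_similarity and the sample rating lists are made up for this example, and -1 is assumed to mark an unrated item.

```python
import math


def euclidean_similarity(ratings_a, ratings_b, missing=-1):
    """Return 1 / (1 + Euclidean distance) over the items both users rated."""
    squared_diffs = [
        (a - b) ** 2
        for a, b in zip(ratings_a, ratings_b)
        if a != missing and b != missing  # compare only items rated by both
    ]
    return 1 / (1 + math.sqrt(sum(squared_diffs)))


# Hypothetical ratings, for illustration only
identical = euclidean_similarity([5.0, 4.0, 1.0], [5.0, 4.0, 1.0])
different = euclidean_similarity([5.0, 4.0, 1.0], [1.0, 1.0, 5.0])
print(identical)  # distance 0, so the score is 1.0
print(different)  # a larger distance gives a score closer to 0
```

One caveat of this sketch: if two users have no rated items in common, the distance is 0 and the score comes out as 1, so such pairs may need to be excluded separately.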
For a detailed explanation, see the "User-based collaborative filtering" section of "Building a recommender system using collaborative filtering".
**(1) From the Excel data (the sample data), extract the users who rated the target sake 5.**
**(2) Compute the similarity between the target user's snack ratings (known parts only) and each extracted sample row.**
**(3) Predict the target user's unknown snack ratings from the similarities.**
The data used here consists of sake ratings and snack ratings, each on a scale of 1 to 5. Since this is still at the testing stage, I prepared only four sample rows.
The target user's data is defined as follows. The goal is to predict this user's ratings for the items they have not rated yet.
target_data = [-1, -1, 5.0, -1, 4.0, -1, -1, 1.0, -1, -1, -1]  # -1 means no rating (None)
Here the user has selected sake3, and the expected snack ratings are predicted from the user's known ratings of snacks 2 and 5.
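As a quick check of which entries are known, the following minimal sketch splits target_data into rated and unrated column indices. It assumes that -1 marks an unrated item; the names known and unknown are only illustrative.

```python
target_data = [-1, -1, 5.0, -1, 4.0, -1, -1, 1.0, -1, -1, -1]

# Column index -> rating for the items the target user has rated
known = {i: v for i, v in enumerate(target_data) if v != -1}
# Column indices whose ratings have to be predicted
unknown = [i for i, v in enumerate(target_data) if v == -1]

print(known)    # {2: 5.0, 4: 4.0, 7: 1.0}
print(unknown)  # [0, 1, 3, 5, 6, 8, 9, 10]
```

Index 2 is the selected sake (rated 5.0), and indices 4 and 7 are the two rated snacks.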
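import pprint

import numpy as np
import xlrd  # note: xlrd 2.0 and later reads only .xls; reading .xlsx needs xlrd < 2.0

def findSameSakeList(sheet, userSakeReputation):
    """Return the rows (users) whose rating in column userSakeReputation is 5."""
    SameSakeList = []  # two-dimensional list: one row per user who rated the selected sake 5
    sampleLen = len(sheet.col_values(0)) - 1  # number of data rows (row 0 is skipped)
    for i in range(sampleLen):
        row = sheet.row_values(i + 1)
        if row[userSakeReputation] == 5:
            SameSakeList.append(row)
    return SameSakeList

sake_number = 2  # column index of the selected sake
wb = xlrd.open_workbook(r'C:\Users\daisuke\Desktop\voyage\testdata.xlsx')
sheet = wb.sheet_by_index(0)
samePersonList = findSameSakeList(sheet, sake_number)  # rows of users who rated the selected sake 5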
def get_similarities(samePersonList, target_data):
    """Return [row index, similarity] pairs, sorted from most to least similar."""
    similarities = []
    sampleLen = len(samePersonList)
    for j in range(sampleLen):  # j: row index within samePersonList
        distance_list = []
        for i, value in enumerate(target_data):
            if value == -1:  # skip items the target user has not rated
                continue
            distance = value - samePersonList[j][i]
            distance_list.append(pow(distance, 2))
        # similarity = 1 / (1 + Euclidean distance)
        similarities.append([j, 1 / (1 + np.sqrt(sum(distance_list)))])
    return sorted(similarities, key=lambda s: s[1], reverse=True)
Predicting the ratings
For each sample user, weight their rating of each snack by their similarity to the target user:
weighted rating = similarity × sample user's rating
Then sum the weighted ratings over all sample users and normalize by the total similarity.
normalized score = \frac{\text{sum of the weighted ratings over all sample users}}{\text{sum of the similarities of the users who rated the item}}
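The normalization formula above can be sketched with NumPy as follows. This is an illustration, not the code used at the hackathon: the function name predict_normalized and the sample values are hypothetical, and it assumes every sample user has rated every item, so the denominator is the same for all columns.

```python
import numpy as np


def predict_normalized(sample_rows, similarities):
    """Similarity-weighted average of each column.

    sample_rows:  list of rating rows, one per sample user
    similarities: list of [row_index, similarity] pairs
    """
    ratings = np.array(sample_rows, dtype=float)
    weights = np.zeros(len(sample_rows))
    for index, sim in similarities:
        weights[index] = sim
    # Numerator: sum over users of (similarity x rating), per column
    weighted_total = weights @ ratings
    # Denominator: sum of the similarities
    return weighted_total / weights.sum()


# Hypothetical sample rows and similarities, for illustration only
rows = [[5.0, 4.0, 2.0], [5.0, 2.0, 4.0]]
sims = [[0, 0.75], [1, 0.25]]
scores = predict_normalized(rows, sims)
print(scores)
```

Because the weights are divided by their sum, the result stays on the original 1 to 5 rating scale. If some sample users had unrated items, the denominator would have to be computed per column from the users who actually rated that item.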
def predict(samePersonList, similarities):
    """Weight every sample row by its similarity and rank the columns by the result."""
    # Note: this takes the plain mean of the weighted rows (it divides by the number of
    # sample users), not by the sum of the similarities as in the formula above.
    weighted_rows = [list(row) for row in samePersonList]  # copy so the caller's list is not modified
    for index, value in similarities:
        # multiply each rating by the row's similarity and round to 5 decimal places
        weighted_rows[index] = [round(i * value, 5) for i in samePersonList[index]]
    mean_scores = list(np.mean(np.array(weighted_rows), axis=0))
    predict_list = [[index, value] for index, value in enumerate(mean_scores)]
    return sorted(predict_list, key=lambda s: s[1], reverse=True)
samePersonList = findSameSakeList(sheet, sake_number)  # rows of users who rated the selected sake 5
similarities = get_similarities(samePersonList, target_data)
ranking = predict(samePersonList, similarities)
pprint.pprint(ranking)
# Ranked output as [column index, score] pairs
#[[2, 1.225635],
#[5, 1.100635],
#[3, 0.850635],
#[1, 0.745125],
#[4, 0.725635],
#[8, 0.620125],
#[9, 0.6103799999999999],
#[7, 0.605505],
#[0, 0.48538],
#[6, 0.125],
#[10, 0.0]]
Note that the sake columns are also given predicted ratings.
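Since the ranking mixes sake and snack columns, one possible post-processing step is to keep only the snack columns the target user has not rated yet. The sketch below assumes that the first three columns (indices 0 to 2) hold sake ratings, which is a guess based on sake_number = 2 corresponding to sake3. NUM_SAKE_COLUMNS and snack_ranking are hypothetical names, not part of the original code.

```python
NUM_SAKE_COLUMNS = 3  # assumption: columns 0 to 2 are sake, the rest are snacks

target_data = [-1, -1, 5.0, -1, 4.0, -1, -1, 1.0, -1, -1, -1]

# The ranking printed above, as [column index, score] pairs
ranking = [
    [2, 1.225635], [5, 1.100635], [3, 0.850635], [1, 0.745125],
    [4, 0.725635], [8, 0.620125], [9, 0.6103799999999999],
    [7, 0.605505], [0, 0.48538], [6, 0.125], [10, 0.0],
]

# Keep snack columns only, and drop snacks the user has already rated
snack_ranking = [
    [index, score]
    for index, score in ranking
    if index >= NUM_SAKE_COLUMNS and target_data[index] == -1
]
print(snack_ranking)
```

Under that column assumption, the first entry of snack_ranking is the snack to recommend.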