Overview

I decided to make a recommender system at the hackathon, so make a note of it as a memorandum.

Create a system that recommends snacks that match sake. I wanted to make it a machine learning model, but I decided to use collaborative filtering because I lacked knowledge and time.

reference

Making a recommendation system with Python (user-based collaborative filtering) Notes for implementing simple collaborative filtering in Python Construction of recommender system using collaborative filtering

Collaborative filtering

Collaborative filtering (recommendation system) is a word ** "Recommended for you" "People who bought this product also bought this product" ** That's it. You are familiar with amazon and youtube.

It is roughly divided into two methods, user base and item (product) base. User-based emphasis filtering estimates ** target user ratings (unknown) ** from ** other users' known ratings **.

Crosstab, Euclidean distance, Pearson correlation coefficient,

How to calculate similarity

Similarity is defined as ** "high (low) evaluation of the same product has high similarity" **. Calculate similarity using the evaluation value for each user's product

・ The more similar, the smaller the Euclidean distance value. ・ Take the reciprocal so that similar ones get higher scores ・ If the similarity is maximum, the Euclidean distance is 0, so add 1

score= \frac{1}{(1+Euclidean distance)}

・ Score returns a value from 0 to 1, and the larger the score, the higher the similarity.

For a detailed explanation, see User-based collaborative filtering in Building a recommender system using collaborative filtering.

Implementation flow

** (1) Extract people with a target liquor rating of 5 from Excel data (sample snack data) (2) Obtain the similarity between the evaluation of target snacks (only known parts) and sample snack data. ③ Predict the evaluation value of the unknown target snack from the similarity **

Data preparation

The data used here is the evaluation of sake and the evaluation of snacks from 1 to 5. Since it is in the testing stage, we have prepared only four data.

The data of unknown users is defined as follows. Find the expected value for other values for this user.

target_data = [-1, -1, 5.0, -1, 4.0, -1, -1, 1.0, -1, -1, -1] #-1 is N ONE

The expected value of the snack evaluation when sake3 is selected is predicted using the evaluation values of snacks 2 and 5.

(1) Extract people with a target liquor rating of 5 from Excel data (sample snack data)

def findSameSakeList(sheet, userSakeReputation):
    SameSakeList = [] #Two-dimensional list of data with a liquor rating of 5 in the image
    sampleLen = len(sheet.col_values(0))-1
    for i in range(sampleLen):
        row = sheet.row_values(i+1)
        if row[userSakeReputation] == 5:
            SameSakeList.append(row)
        else:
            pass
    return SameSakeList

sake_number = 2
wb = xlrd.open_workbook(r'C:\Users\daisuke\Desktop\voyage\testdata.xlsx')
sheet = wb.sheet_by_index(0)
samePersonList = findSameSakeList(sheet, sake_number) #Applicable

(2) Obtain the similarity between the evaluation of target snacks (only known parts) and sample snack data.

def get_similarities(samePersonList, target_data):
    similarities = []
    sampleLen = len(samePersonList)

    for j in range(sampleLen):#row number of sheet
        distance_list = []
        for i, value in enumerate(target_data):
            if value == -1:
                pass
            else:
                distance = value - samePersonList[j][i]
                distance_list.append(pow(distance, 2))

        similarities.append([j, 1/(1+np.sqrt(sum(distance_list)))])

    return sorted(similarities, key=lambda s: s[1], reverse=True)

③ Predict the evaluation value of the unknown target snack from the similarity

Forecast of evaluation value

weighted evaluation value of the corresponding snack of the sample user=Similarity × sample evaluation value

After that, the total evaluation value is taken and normalized.

Normalized score= \frac{Total score for all users of the weighted evaluation value obtained above}{Total similarity of evaluators}

def predict(samePersonList, similarities):#Calculate the predicted evaluation value by multiplying the similarity by the evaluation value for all same persons.
    predict_list = []
    for index, value in similarities:
        samePersonList[index] = [round(i*value,5) for i in samePersonList[index]] #Round decimals with round

    np_samePerson = np.array(samePersonList)
    np_samePerson = list(np.mean(np_samePerson, axis=0))

    for index, value in enumerate(np_samePerson):
        predict_list.append([index, value])
    return sorted(predict_list, key= lambda s: s[1], reverse=True)


samePersonList = findSameSakeList(sheet, sake_number    ) #A list of values with a rating of 5 for the selected liquor
similarities = get_similarities(samePersonList, target_data)
ranking = predict(samePersonList, similarities)
pprint.pprint(ranking)

 #Rank and output normalized socre
 #[[2, 1.225635],
 #[5, 1.100635],
 #[3, 0.850635],
 #[1, 0.745125],
 #[4, 0.725635],
 #[8, 0.620125],
 #[9, 0.6103799999999999],
 #[7, 0.605505],
 #[0, 0.48538],
 #[6, 0.125],
 #[10, 0.0]]

Sake is also given an expected evaluation

Make a recommender system with python