Notes for implementing simple collaborative filtering in Python

About this article

Let's write a simple collaborative filtering algorithm in Python. Collaborative filtering is the mechanism behind recommendations of the form "people who viewed this also viewed this".

It is a very simple algorithm, so it is not something you would use in production, but it makes it easy to understand how collaborative filtering algorithms work.

If you actually write the code in this article, you will see that the logic behind "people who viewed this also viewed this" is not that esoteric a concept.

Useful site for studying collaborative filtering

The code used in this article is based on this site. Those who are comfortable reading English may read the original site.

Here are some other sites that are useful for studying the concepts behind recommendation systems. Coursera's lecture is especially recommended.

The basic concept of collaborative filtering

Consider an algorithm that recommends movies to a user A. In simplified form, the algorithm does the following.

Step ①: Calculate the degree of similarity between user A and each of the other users.
↓
Step ②: From the movies that other users have watched, extract the set of movies that user A has not seen yet.
↓
Step ③: Return a list of highly recommended movies from that set.
In this selection, a movie gets a higher weight the more similar to user A the users who watched it are.
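
Step ② boils down to a set difference. The following is a minimal sketch with made-up data; the names `watched_by_a`, `watched_by_others`, and `unseen_candidates` are hypothetical and are not part of this article's code.

```python
# Hypothetical toy data: movies watched by user A and by two other users
watched_by_a = {"Movie 1", "Movie 2"}
watched_by_others = {
    "user B": {"Movie 1", "Movie 3"},
    "user C": {"Movie 2", "Movie 3", "Movie 4"},
}

def unseen_candidates(target_movies, others):
    """Return the movies watched by at least one other user but not by the target user."""
    candidates = set()
    for movies in others.values():
        candidates |= movies - target_movies
    return candidates

print(sorted(unseen_candidates(watched_by_a, watched_by_others)))
# ['Movie 3', 'Movie 4']
```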

Preparation

Importing the required packages

package.py


from math import sqrt

Data preparation

The data used here is a dictionary that maps each of several movie lovers to the movies they watched and the review scores they gave.

dataset.py


dataset={
 'Lisa Rose': {
 'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5, 'Just My Luck': 3.0, 'Superman Returns': 3.5,'You, Me and Dupree': 2.5, 'The Night Listener': 3.0
  },
 'Gene Seymour': {
 'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5, 'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0, 'You, Me and Dupree': 3.5
  },
 'Michael Phillips': {
 'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0, 'Superman Returns': 3.5, 'The Night Listener': 4.0
  },
 'Claudia Puig': {
 'Snakes on a Plane': 3.5, 'Just My Luck': 3.0, 'The Night Listener': 4.5, 'Superman Returns': 4.0, 'You, Me and Dupree': 2.5
  },
 'Mick LaSalle': {
 'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0, 'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0, 'You, Me and Dupree': 2.0
  },
 'Jack Matthews': {
 'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0, 'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5
  },
 'Toby': {
 'Snakes on a Plane':4.5, 'You, Me and Dupree':1.0, 'Superman Returns':4.0
  }
}
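
Because the dictionary is keyed by user, a question about a single movie requires scanning all users. As an illustration only, the sketch below inverts a small excerpt of the data into a movie-to-raters mapping; `sample` and `invert_ratings` are names introduced here, not part of this article's code.

```python
# Excerpt of the data above, repeated so that this snippet runs on its own
sample = {
    'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0},
    'Toby': {'Snakes on a Plane': 4.5, 'Superman Returns': 4.0},
}

def invert_ratings(ratings):
    """Turn {user: {movie: score}} into {movie: {user: score}}."""
    by_movie = {}
    for user, movies in ratings.items():
        for movie, score in movies.items():
            by_movie.setdefault(movie, {})[user] = score
    return by_movie

by_movie = invert_ratings(sample)
print(by_movie['Snakes on a Plane'])
# {'Michael Phillips': 3.0, 'Toby': 4.5}
```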

"Calculation of similarity" between users

In collaborative filtering, the similarity between users is calculated first. The key design decision here is how to define what it means for two users to be similar or dissimilar.

There are countless possible definitions, and the choice is up to the designer. Here, we define two users as more similar the closer the scores they give to the same movies.

In this case, a function that calculates the similarity between two users (person1, person2) can be implemented as follows:

similarity.py


def get_similairty(person1, person2):

  # Take the set of movies that both users have watched
  set_person1 = set(dataset[person1].keys())
  set_person2 = set(dataset[person2].keys())
  set_both = set_person1.intersection(set_person2)

  if len(set_both) == 0:  # If there is no movie in common, set the similarity to 0
    return 0

  list_destance = []

  for item in set_both:
    # Square of the difference between the two review scores for the same movie.
    # The larger this value, the more the two users' tastes differ, i.e. the less similar they are.
    distance = pow(dataset[person1][item] - dataset[person2][item], 2)
    list_destance.append(distance)

  # Convert the distance into a similarity: 1 at distance 0, approaching 0 as the distance grows
  return 1 / (1 + sqrt(sum(list_destance)))

Here, the following value is defined as the similarity: Similarity = `1 / (1 + sqrt(sum(list_destance)))` ... (1)

Note that sum(list_destance) is the square of the distance between the two users in the review score space. The larger this distance, the less similar the users are, so (1), which decreases as the distance grows, expresses the degree of similarity. When the distance is 0, the similarity is 1, and when the distance is extremely large, the similarity approaches 0.
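
To see how formula (1) behaves at the extremes, here is a small sketch that applies it directly to a few sums of squared differences. The helper name `distance_to_similarity` is hypothetical and only used for this illustration.

```python
from math import sqrt

def distance_to_similarity(squared_diff_sum):
    """Apply formula (1) to a sum of squared score differences."""
    return 1 / (1 + sqrt(squared_diff_sum))

for total in (0, 1, 100, 10000):
    print(total, round(distance_to_similarity(total), 4))
# 0 1.0
# 1 0.5
# 100 0.0909
# 10000 0.0099
```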

get_similairty('Lisa Rose','Jack Matthews')
0.3405424265831667
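
This value can be checked by hand. The sketch below repeats the two users' scores from the dataset and prints the intermediate quantities; the variable names are introduced here for illustration only.

```python
from math import sqrt

# Scores copied from the dataset above
lisa = {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
        'Superman Returns': 3.5, 'You, Me and Dupree': 2.5, 'The Night Listener': 3.0}
jack = {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0, 'The Night Listener': 3.0,
        'Superman Returns': 5.0, 'You, Me and Dupree': 3.5}

# Five movies in common; 'Just My Luck' is excluded because Jack has not rated it
common = sorted(set(lisa) & set(jack))
squared_diffs = [(lisa[m] - jack[m]) ** 2 for m in common]
total = sum(squared_diffs)  # 0.25 + 0.25 + 2.25 + 0.0 + 1.0 = 3.75
similarity = 1 / (1 + sqrt(total))

print(total)                 # 3.75
print(round(similarity, 4))  # 0.3405
```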

Implement the recommendation function

The design of the recommendation and the ideas behind its implementation are written in the comments.

recomend.py


def get_recommend(person, top_N):

  totals = {}   # Sum of "similarity x review score" per movie
  simSums = {}  # Sum of similarities per movie

  # Get the list of users other than the target user and loop over them
  # -> for each of them, compute the similarity to the target user and the
  #    recommendation score of each movie the target user has not seen yet
  list_others = [name for name in dataset if name != person]

  for other in list_others:
    # Get the set of movies that the target user has not seen yet
    set_other = set(dataset[other])
    set_person = set(dataset[person])
    set_new_movie = set_other.difference(set_person)

    # Calculate the similarity between this user and the target user (sim is between 0 and 1)
    sim = get_similairty(person, other)

    # Loop over the movies the target user has not seen yet
    for item in set_new_movie:

      # Accumulate "similarity x review score" over all users as the recommendation score
      totals.setdefault(item, 0)
      totals[item] += dataset[other][item] * sim

      # Also accumulate the similarities; the score above is divided by this sum
      simSums.setdefault(item, 0)
      simSums[item] += sim

  # Skip movies whose similarity sum is 0 to avoid a division by zero
  rankings = [(total / simSums[item], item) for item, total in totals.items() if simSums[item] > 0]
  rankings.sort()
  rankings.reverse()

  return [i[1] for i in rankings][:top_N]

Result

get_recommend('Toby',2)

['The Night Listener', 'Lady in the Water']
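
The ranking score behind this result is a similarity-weighted average of the other users' review scores. As a rough illustration with made-up numbers (the helper `weighted_score` and the values passed to it are hypothetical, not taken from the dataset):

```python
def weighted_score(sim_and_scores):
    """Similarity-weighted average of review scores.

    sim_and_scores is a list of (similarity, review score) pairs.
    """
    total = sum(sim * score for sim, score in sim_and_scores)
    sim_sum = sum(sim for sim, _ in sim_and_scores)
    return total / sim_sum

# A very similar user gave 4.0, a less similar user gave 1.0
print(weighted_score([(1.0, 4.0), (0.5, 1.0)]))  # 3.0
```

Dividing by the sum of similarities keeps a movie from ranking high merely because many users have rated it.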
