I need to build recommendations for work, so I surveyed Python recommendation libraries and wrote up what I found as a memo. It is mainly a brief introduction to each library and does not explain the algorithms, so please refer to other documents as needed.
I had never worked with recommender systems before, but lately I feel I really need to study them seriously.
crab
HP: http://muricoca.github.io/crab/
GitHub: https://github.com/muricoca/crab
This was the first library I found when searching for collaborative filtering implementations in Python. It reportedly supports both item-based and user-based collaborative filtering, but the master branch on GitHub was last updated 4 years ago, so it does not seem to be actively used anymore. Because of its dependencies on other libraries, I could not get it to work in a modern environment.
Conference presentation slides (SciPy 2011): http://conference.scipy.org/scipy2011/slides/caraciolo_crab_recommendation.pdf
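Since crab would not run in my environment, the following is not crab's API. It is a minimal, library-free sketch of user-based collaborative filtering, assuming a toy `ratings` dictionary (hypothetical data): a user's rating for an unseen item is predicted as a similarity-weighted average of other users' ratings, with Pearson correlation over co-rated items as the similarity. For simplicity it only uses positively correlated neighbours and does not mean-center the ratings, which many implementations do.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 4, "b": 2, "c": 5, "d": 4},
    "carol": {"a": 1, "b": 5, "d": 2},
}

def pearson(u, v):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = set(ratings[u]) & set(ratings[v])
    if len(common) < 2:
        return 0.0
    xs = [ratings[u][i] for i in common]
    ys = [ratings[v][i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def predict(user, item):
    """Similarity-weighted average of neighbours' ratings for `item`.

    Only users who rated `item` and correlate positively with `user` are used;
    returns None when no such neighbour exists.
    """
    num = den = 0.0
    for other in ratings:
        if other == user or item not in ratings[other]:
            continue
        s = pearson(user, other)
        if s <= 0:
            continue
        num += s * ratings[other][item]
        den += s
    return num / den if den else None

print(predict("alice", "d"))
```

Item-based filtering follows the same pattern with the roles swapped: similarities are computed between items (columns) instead of users (rows).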
python-recsys
HP: http://ocelma.net/software/python-recsys/build/html/index.html
GitHub: https://github.com/ocelma/python-recsys
It supports singular value decomposition (SVD) and neighborhood-based collaborative filtering. A computed model can be saved to a file and reused, and many evaluation methods are provided, so unless you are chasing top accuracy, it is the easiest of these libraries to use.
However, it does not support Non-negative Matrix Factorization (NMF), which has become mainstream in recent years; if you need NMF, you will have to implement it yourself using nimfa, introduced below.
Since I also needed to compute item-to-item similarity for this project, I chose this library.
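To make "similarity between items" concrete, here is a minimal sketch in plain Python (not python-recsys's API) of item-to-item cosine similarity. It assumes a small hypothetical user-item rating matrix `R` where rows are users, columns are items, and 0 means "not rated"; each item is represented by its column of ratings.

```python
from math import sqrt

# Hypothetical rating matrix: rows are users, columns are items; 0 = not rated
items = ["a", "b", "c", "d"]
R = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
]

def column(j):
    """Ratings of item j by every user."""
    return [row[j] for row in R]

def cosine(x, y):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = sqrt(sum(a * a for a in x))
    ny = sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0

def most_similar(item, n=2):
    """Return the n items whose rating columns are closest (by cosine) to `item`."""
    j = items.index(item)
    scores = [(items[k], cosine(column(j), column(k)))
              for k in range(len(items)) if k != j]
    return sorted(scores, key=lambda p: p[1], reverse=True)[:n]

for name, score in most_similar("a"):
    print(f"{name}: {score:.3f}")
```

Treating unrated entries as 0 is a simplification; adjusted cosine (subtracting each user's mean rating first) is a common alternative.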
nimfa
HP: http://nimfa.biolab.si
GitHub: https://github.com/marinkaz/nimfa
I could not find a recommendation library built on NMF, even though NMF has become popular recently. However, nimfa provides the matrix factorization routines that form the core of such an implementation, so building one on top of it seems feasible without much difficulty. Its selection of algorithms is quite rich: there are more than 10 factorization methods alone.
**References on NMF**
[Matrix Factorization Techniques for Recommender Systems](http://www2.research.att.com/~volinsky/papers/ieeecomputer.pdf)
[Basics of non-negative matrix factorization (NMF) and its application to data/signal analysis](http://www.kecl.ntt.co.jp/icl/signal/sawada/mypaper/829-833_9_02.pdf)
[Non-negative Matrix Factorization](http://d.hatena.ne.jp/a_bicky/20100325/1269479839)
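To give a feel for the kind of computation nimfa provides, here is a minimal sketch of NMF using the Lee and Seung multiplicative update rules for the Frobenius-norm objective, written in plain Python rather than with nimfa's API. It factorizes a non-negative matrix V (n × m) into W (n × k) and H (k × m) so that V ≈ WH. The toy matrix, rank `k`, iteration count, and the choice to treat missing ratings as zeros are all assumptions for illustration; a real recommender would usually mask unobserved entries instead.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def nmf(V, k, iters=500, eps=1e-9, seed=0):
    """Factorize non-negative V (n x m) into W (n x k), H (k x m) via multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H); eps avoids division by zero
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return W, H

def frobenius_error(V, W, H):
    """||V - WH||_F, the quantity the updates above drive down."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rw in zip(V, WH) for v, x in zip(rv, rw)) ** 0.5

# Hypothetical user x item ratings, 0 = not rated (treated as 0 here)
V = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
]
W, H = nmf(V, k=2)
print(f"reconstruction error: {frobenius_error(V, W, H):.3f}")
```

Because each update multiplies by a non-negative ratio, W and H stay non-negative without any explicit projection, which is the main appeal of the multiplicative rules.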
Spark + MLlib
MLlib - Collaborative Filtering
Spark + MLlib also includes a recommendation implementation, so if your data is too large to handle by scaling up a single machine and you need to distribute the processing, this is the one to use. MLlib implements matrix factorization with a technique called Alternating Least Squares (ALS), and a Python API is also provided.
**References**
[Spark and Matrix Factorization](http://stanford.edu/~rezab/slides/reza_codeneuro.pdf)
[Implementation of recommendation system in Dataproc using Spark's MLlib](http://qiita.com/kndt84/items/b975ac9e6552f5289ec9)
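As a rough single-machine illustration of the idea behind ALS (plain Python, not MLlib's API), the sketch below alternates between solving for all user factors with the item factors fixed and vice versa. Each step is a small ridge-regression solve (VᵀV + λI)x = Vᵀr over only the observed ratings of that user or item. The toy ratings, rank `k`, regularization `lam`, and iteration count are hypothetical choices; MLlib's implementation differs in details such as how regularization is scaled and how the work is partitioned across the cluster.

```python
import random

def solve(A, b):
    """Solve A x = b for a small square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def update(fixed, observed, k, lam):
    """Ridge-regression solve for one factor vector from (index, rating) pairs."""
    A = [[lam if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for idx, r in observed:
        f = fixed[idx]
        for i in range(k):
            b[i] += r * f[i]
            for j in range(k):
                A[i][j] += f[i] * f[j]
    return solve(A, b)

def als(ratings, n_users, n_items, k=2, lam=0.1, iters=50, seed=0):
    """ratings: list of (user, item, rating) triples; returns user and item factors."""
    rng = random.Random(seed)
    V = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    for _ in range(iters):
        U = [update(V, by_user[u], k, lam) for u in range(n_users)]  # items fixed
        V = [update(U, by_item[i], k, lam) for i in range(n_items)]  # users fixed
    return U, V

def predict(U, V, u, i):
    return sum(a * b for a, b in zip(U[u], V[i]))

# Hypothetical (user, item, rating) triples
ratings = [(0, 0, 5), (0, 1, 3), (0, 3, 1), (1, 0, 4), (1, 3, 1),
           (2, 0, 1), (2, 1, 1), (2, 3, 5), (3, 2, 5), (3, 3, 4)]
U, V = als(ratings, n_users=4, n_items=4)
print(round(predict(U, V, 1, 1), 2))  # predicted rating for an unobserved pair
```

Each per-user (and per-item) solve depends only on that user's ratings and the fixed factors of the other side, which is what makes the method easy to parallelize.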
To sum up: if you want a simple way to implement recommendations in Python, python-recsys is probably the quickest option. However, it does not support NMF, which is popular these days, so if you want NMF, it is better to implement it yourself with nimfa.
If you need to handle more data than scaling up can accommodate, Spark + MLlib is the better choice, since it includes a recommendation implementation and provides a Python API. I have tested it separately and will cover it in another article.