Non-negative Matrix Factorization (NMF) with scikit-learn

NMF is a dimensionality reduction method that is said to improve the accuracy of recommendations. It is also easy to use NMF with the machine learning library scikit-learn. The purpose of this article is to try NMF on a concrete example and understand it intuitively. The article "What is Matrix Factorization" is also easy to understand.

What is NMF

For a rough idea of Matrix Factorization as a recommendation technique, the following material (in English) is easy to understand: Matrix Factorization Techniques For Recommender Systems

I will briefly explain the idea, borrowing the figure from that slide. Consider the Netflix Prize setting, where matrix factorization methods became popular: you are given a matrix recording how many points each user gave to each movie (the matrix on the left of the figure). Rated movies get a score from 1 to 5, and unrated ones are entered as 0. These 0 entries are the heart of the problem: we want to predict their true values. This rating matrix is decomposed into two matrices, "feature × user" and "feature × movie". Roughly speaking, NMF is a method for doing this matrix factorization well.

[Figure: a rating matrix decomposed into a user-feature matrix and a movie-feature matrix]
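To make the shapes concrete, here is a toy NumPy sketch (the dimensions and values are placeholders for illustration, not taken from the figure):

import numpy as np

# Toy illustration of the shapes involved (placeholder values):
# 5 users x 4 movies, decomposed with K = 2 latent features
P = np.ones((5, 2))   # user-feature matrix (users x K)
Q = np.ones((4, 2))   # movie-feature matrix (movies x K)
R_hat = P @ Q.T       # predicted rating matrix (users x movies)
print(R_hat.shape)    # -> (5, 4)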

NMF formulation

Let $\mathbf{R}$ be the rating matrix, $\mathbf{P}$ the user-feature matrix, and $\mathbf{Q}$ the movie-feature matrix. We decompose $\mathbf{R}$ so that the error between $\mathbf{R}$ and $\mathbf{P} \times \mathbf{Q}^T$ is small.

\mathbf{R} \approx \mathbf{P} \times \mathbf{Q}^T = \hat{\mathbf{R}}

The error can be expressed as $e_{ij}^2$:


e_{ij}^2 = (r_{ij} - \hat{r}_{ij})^2 = (r_{ij} - \sum_{k=1}^K{p_{ik}q_{kj}})^2

Here, the elements of the approximation $\hat{\mathbf{R}}$ are given by:

\hat{r}_{ij} = p_i^T q_j = \sum_{k=1}^K{p_{ik}q_{kj}}
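As a quick illustration, the per-entry error can be computed directly. A minimal NumPy sketch, assuming P (users × K) and Q (movies × K) are given factor matrices:

import numpy as np

# Minimal sketch of the elementwise squared error e_{ij}^2;
# P (users x K) and Q (movies x K) are assumed to be given.
def squared_error(R, P, Q, i, j):
    r_hat_ij = P[i] @ Q[j]            # \hat{r}_{ij} = sum_k p_{ik} q_{kj}
    return (R[i, j] - r_hat_ij) ** 2  # e_{ij}^2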
In scikit-learn, NMF minimizes the following regularized objective (the regularization terms vanish when alpha = 0, which is the default):

0.5 * ||X - WH||_Fro^2
+ alpha * l1_ratio * ||vec(W)||_1
+ alpha * l1_ratio * ||vec(H)||_1
+ 0.5 * alpha * (1 - l1_ratio) * ||W||_Fro^2
+ 0.5 * alpha * (1 - l1_ratio) * ||H||_Fro^2

||A||_Fro^2 = \sum_{i,j} A_{ij}^2 (Frobenius norm)
||vec(A)||_1 = \sum_{i,j} abs(A_{ij}) (Elementwise L1 norm)
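To make this objective concrete, here is a sketch that evaluates it with NumPy for given factor matrices W and H (it mirrors the formula above; it is not scikit-learn's internal implementation):

import numpy as np

# Sketch of the regularized NMF objective shown above,
# evaluated for given factor matrices W and H.
def nmf_objective(X, W, H, alpha=0.0, l1_ratio=0.0):
    fro = 0.5 * np.sum((X - W @ H) ** 2)                         # 0.5 * ||X - WH||_Fro^2
    l1 = alpha * l1_ratio * (np.abs(W).sum() + np.abs(H).sum())  # elementwise L1 penalties
    l2 = 0.5 * alpha * (1 - l1_ratio) * (np.sum(W ** 2) + np.sum(H ** 2))  # Frobenius penalties
    return fro + l1 + l2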

Implementation with scikit-learn

Calling it is easy. For example, when the rating matrix is R and the feature dimension is 2, the code looks like this:

from sklearn.decomposition import NMF
model = NMF(n_components=2, init='random', random_state=0)  # specify the feature dimension with n_components
P = model.fit_transform(R)  # fit the model; P is the user-feature matrix
Q = model.components_       # Q^T, the (feature x movie) matrix
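The two factors can be multiplied back together to get the predicted rating matrix, and the fitted model can project new rows into the feature space. A small sketch (new_user is a hypothetical example, assuming R has four columns as in the concrete example below):

import numpy as np

R_hat = np.dot(P, Q)  # predicted rating matrix \hat{R} = P × Q^T

# A hypothetical new user with the same number of columns as R; 0 = unrated
new_user = np.array([[3, 0, 0, 2]])
p_new = model.transform(new_user)  # project into the learned feature space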

Concrete example

Suppose R is the 5×4 matrix below. The numbers are taken from the example on this site.

    D1  D2  D3  D4
U1   5   3   -   1
U2   4   -   -   1
U3   1   1   -   5
U4   1   -   -   4
U5   -   1   5   4
from sklearn.decomposition import NMF
import numpy as np

# 0 entries mean "unrated" and are the unknowns to predict
R = np.array([
        [5, 3, 0, 1],
        [4, 0, 0, 1],
        [1, 1, 0, 5],
        [1, 0, 0, 4],
        [0, 1, 5, 4],
        ]
    )

#Try changing the feature dimension k from 1 to 3
for k in range(1,4):
    model = NMF(n_components=k, init='random', random_state=0)
    P = model.fit_transform(R)
    Q = model.components_
    print("****************************")
    print("k:",k)
    print("P is")
    print(P)
    print("Q^T is")
    print(Q)
    print("P×Q^T is")
    print(np.dot(P,Q))
    print("R-P×Q^T is")
    print(model.reconstruction_err_)  # Frobenius norm of the residual R - P×Q^T

The output result is as follows.

****************************
k: 1
P is
[[ 0.95446553]
 [ 0.64922245]
 [ 1.12694515]
 [ 0.87376625]
 [ 1.18572187]]
Q^T is
[[ 1.96356213  1.0846958   1.24239928  3.24322704]]
P×Q^T is
[[ 1.87415238  1.03530476  1.18582729  3.09554842]
 [ 1.27478862  0.70420887  0.8065935   2.10557581]
 [ 2.21282681  1.22239267  1.40011583  3.65493896]
 [ 1.71569432  0.94777058  1.08556656  2.83382232]
 [ 2.32823857  1.28614754  1.47314     3.84556523]]
R-P×Q^T is
7.511859871919941
****************************
k: 2
P is
[[ 0.          1.69547254]
 [ 0.          1.13044637]
 [ 1.38593123  0.42353044]
 [ 1.08595617  0.3165454 ]
 [ 2.01891156  0.        ]]
Q^T is
[[ 0.          0.32199795  1.40668938  2.58501889]
 [ 3.09992487  1.17556787  0.          0.85823043]]
P×Q^T is
[[ 5.25583751  1.99314304  0.          1.45510614]
 [ 3.50429883  1.32891643  0.          0.97018348]
 [ 1.31291255  0.9441558   1.94957474  3.94614513]
 [ 0.98126695  0.72179626  1.52760301  3.0788861 ]
 [ 0.          0.65008539  2.83998144  5.21892451]]
R-P×Q^T is
4.2765298341191516
****************************
k: 3
P is
[[ 0.08750151  2.03662182  0.52066139]
 [ 0.          1.32090927  0.59992585]
 [ 0.04491053  0.44753619  3.4215759 ]
 [ 0.          0.3221337   2.75625987]
 [ 2.6249395   0.          0.91559966]]
Q^T is
[[ 0.          0.38636917  1.90213535  1.02176621]
 [ 2.62017626  1.02934221  0.          0.08629708]
 [ 0.          0.02388804  0.          1.43875694]]
P×Q^T is
[[ 5.33630814  2.14262626  0.16643971  1.0142658 ]
 [ 3.4610151   1.37399871  0.          0.97713811]
 [ 1.17262369  0.55975468  0.08542591  5.00732522]
 [ 0.84404708  0.39742747  0.          3.99338722]
 [ 0.          1.03606758  4.99299022  3.99939987]]
R-P×Q^T is
1.8626938982246695

The observed matrix R was a 5×4 matrix. Trying different values of k, you can see that P is a 5×k matrix and Q^T is a k×4 matrix.

Also, P×Q^T fills in estimates for the entries that were 0 in R. For example, when k = 2, the completed $\hat{\mathbf{R}}$ looks as follows (the entries that were 0 in R are the predicted values):

      D1    D2    D3    D4
U1  5.25  1.99  0.00  1.45
U2  3.50  1.32  0.00  0.97
U3  1.31  0.94  1.94  3.94
U4  0.98  0.72  1.52  3.07
U5  0.00  0.65  2.83  5.21

Deciding what value k should take is a difficult problem, and it is said to depend on the input data R.

In this case, how can we determine which k is better? (If you know, please let me know.)
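One possible heuristic (my own sketch, not from the original article): hold out a few of the known ratings, fit NMF on the rest, and pick the k that best predicts the held-out entries. A minimal sketch using the R defined above:

import numpy as np
from sklearn.decomposition import NMF

rng = np.random.RandomState(0)
known = np.argwhere(R > 0)  # indices of the observed ratings
held_out = known[rng.choice(len(known), size=3, replace=False)]

R_train = R.astype(float)   # astype returns a copy, so R stays intact
for i, j in held_out:
    R_train[i, j] = 0       # mask the held-out ratings

for k in range(1, 4):
    model = NMF(n_components=k, init='random', random_state=0)
    R_hat = model.fit_transform(R_train) @ model.components_
    mse = np.mean([(R[i, j] - R_hat[i, j]) ** 2 for i, j in held_out])
    print(k, mse)  # prefer the k with the smallest held-out error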
