Implementing a latent feature model with the IBP (Indian buffet process)
The goal of this post is to implement a latent feature model using the IBP. First, let me explain the latent feature model (a linear factor model) with a familiar recommendation example. Suppose you have a matrix X with people along the rows and movies along the columns, where entry (i, j) is person i's rating of movie j. By decomposing this well as X = UV, as in the figure below, you can recover latent features, for example adventure, mystery, and romance (this is just an image). Here U expresses how much each person likes each of those components, and V expresses which components each movie has. (In an actual recommender system this is used to predict missing values.) The same idea appears elsewhere: in natural language processing the matrix consists of documents and words, and in speech processing the same kind of model is used in the context of sound source separation.
But how many latent features should there be? With a deterministic model you can pick the number of latent features by hand, but with a Bayesian model it becomes difficult to decide how many there are. The motivation, then, is to use an infinite-dimensional stochastic process (the IBP) to determine the number of latent features automatically. This may be hard to grasp at first, so the figure below shows the result of actually predicting Z. The left is the true Z and the right is the Z predicted by Gibbs sampling. At first glance they may look different, but since the model is exchangeable in the positions (columns) of the features, you can see that Z is recovered after 33 steps.
In this article, I will give a slightly more formal explanation and then move on to the implementation. After that, as theory, I will touch on the beta process, which is closely related to the IBP. Finally, I will write about application examples.
You can implement this just by following Dr. Sato's nonparametric Bayes book [1] as it is. It is easy to understand, so I recommend reading it. The code is here: https://github.com/Ma-sa-ue/practice/tree/master/machine%20learning(python)/nonparabayes
Indian buffet process
As the name suggests, it models how customers serve themselves at an Indian buffet restaurant. A sample from it looks like the figure below.
You can also get the above by generating a matrix consisting of 0s and 1s as follows.
Intuitively, it describes how people take their favorite dishes in turn: people are the rows and dishes are the columns. Customers come in one after another; each customer takes an existing dish with probability proportional to its popularity, and then asks for a few new dishes of their own. $\alpha$ is a hyperparameter controlling the degree of dispersion: the larger it is, the more different dishes appear. Stated this way it looks like exchangeability might be a problem, but the important point is that the IBP is in fact the distribution over (equivalence classes of) binary matrices obtained from a beta-Bernoulli generative model.
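A minimal sketch of this generative process (one-parameter IBP; the function name and defaults are mine, not from the linked repo):

```python
import numpy as np

def sample_ibp(n_customers, alpha, rng=None):
    """Draw a binary matrix Z from the one-parameter IBP(alpha).

    Rows are customers (people), columns are dishes (features).
    """
    rng = np.random.default_rng() if rng is None else rng
    dish_counts = []   # dish_counts[k] = how many customers took dish k so far
    rows = []
    for i in range(1, n_customers + 1):
        # existing dish k is taken with probability m_k / i
        row = [1 if rng.random() < m / i else 0 for m in dish_counts]
        # each customer also asks for Poisson(alpha / i) brand-new dishes
        n_new = rng.poisson(alpha / i)
        row += [1] * n_new
        dish_counts = [m + z for m, z in zip(dish_counts, row)] + [1] * n_new
        rows.append(row)
    Z = np.zeros((n_customers, len(dish_counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

Z = sample_ibp(20, alpha=3.0, rng=np.random.default_rng(0))
print(Z.shape)  # a larger alpha gives more columns (dishes) on average
```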
When $x_{k}$ is a D-dimensional vector
First, consider the following situation: in the figure above, the left is Y, the middle is Z, and the right is X, and note that only Y is observed. The policy is to Gibbs-sample Z and X in turn from their conditional distributions. Repeating the Gibbs sweeps gives the result below; since the features are exchangeable, you can see that they are predicted well. (With real data it is rarely this clean.)
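As a toy sketch of this linear-Gaussian observation model (variable names and sizes are mine; the actual sampler from the book also resamples Z entry by entry and the number of columns, which is omitted here), we can generate synthetic Y and do the conjugate update for X given Z:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, D = 50, 4, 36          # objects, latent features, observed dimensions
sigma_x, sigma_y = 1.0, 0.3  # prior std of feature values, observation noise std

Z_true = (rng.random((N, K)) < 0.5).astype(float)   # binary ownership matrix (in the full model Z ~ IBP)
X_true = sigma_x * rng.standard_normal((K, D))      # latent feature values
Y = Z_true @ X_true + sigma_y * rng.standard_normal((N, D))  # observations

# One Gibbs-style step for X given Z and Y (conjugate Gaussian update):
# posterior mean M = (Z'Z + (sigma_y/sigma_x)^2 I)^{-1} Z'Y
precision = Z_true.T @ Z_true + (sigma_y / sigma_x) ** 2 * np.eye(K)
M = np.linalg.solve(precision, Z_true.T @ Y)
print(np.abs(M - X_true).max())  # posterior mean should be close to the truth
```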
I will write about sampling from the beta process and the relationship between the IBP and the beta process. For details, see [4] and [6].
Lévy process
The Lévy process is a stochastic process with stationary, independent increments. It is a fairly general class and includes various stochastic processes. The Lévy–Itô decomposition splits a Lévy process into a Brownian motion with drift and a jump process; the jump process is the main subject here. Jump processes, i.e. processes that move by jumps, include the Poisson process, the beta process, and the gamma process. As a major premise, a Lévy process that only jumps upward (increases monotonically) corresponds to a random measure, and a Lévy process is characterized by its Lévy measure. This correspondence between the Lévy measure and the random measure (Lévy process) is important when you want to sample: you can construct the random measure (Lévy process) from a Poisson process whose mean measure is the Lévy measure.
Poisson process
Consider simulating from a Poisson process. There are basically two ways. The first is to sample arrivals one after another, as if people come in order. The second is to first decide how many people come in total and then place them according to the intensity function (mean measure). The latter is the one used when sampling from the beta process. See [7] and [8] for details.
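A small sketch of both methods for a homogeneous Poisson process on [0, T] (names and constants are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
rate, T = 2.0, 10.0   # constant intensity on [0, T]

# Method 1: arrivals come one after another (exponential waiting times)
times, t = [], 0.0
while True:
    t += rng.exponential(1.0 / rate)
    if t > T:
        break
    times.append(t)

# Method 2: decide the total number first, then scatter the points.
# N ~ Poisson(mean measure of [0, T]); locations are i.i.d. from the normalized intensity.
n = rng.poisson(rate * T)
points = np.sort(rng.uniform(0.0, T, size=n))

print(len(times), n)   # both counts fluctuate around rate * T = 20
```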
Beta process
Consider the beta process on $\Omega$.
In the beta process, it holds (infinitesimally) that $B(d\omega) \sim \mathrm{Beta}(c\,\mu(d\omega),\, c(1-\mu(d\omega)))$.
At this time, with $B_{0}$ the base measure, the Lévy measure corresponding to the beta process is $\nu(d\omega, dp) = c\, p^{-1}(1-p)^{c-1}\, dp\, B_{0}(d\omega)$.
Following the above, the algorithm looks like this:
For example, if $\mu_{k}$ is uniform, $\Omega = [0,1]$, and $\int_{\Omega} \mu_{k}(d\omega) = \gamma$, you can draw the following graph.
Also, if you actually sample several times with $\gamma = 10.0$, the average of $B(\Omega)$ comes out to about 10.
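A minimal sketch of such a sampler, assuming the round-based construction obtained from the IBP connection (round i contributes Poisson($\gamma c/(c+i-1)$) atoms with Beta($1, c+i-1$) weights) and a uniform base measure on [0, 1]; the truncation level and names are mine:

```python
import numpy as np

def sample_beta_process(gamma, c=1.0, n_rounds=1000, rng=None):
    """Truncated draw from a beta process on Omega = [0, 1] with uniform base measure."""
    rng = np.random.default_rng() if rng is None else rng
    locations, weights = [np.empty(0)], [np.empty(0)]
    for i in range(1, n_rounds + 1):
        k = rng.poisson(gamma * c / (c + i - 1))       # number of new atoms this round
        if k > 0:
            locations.append(rng.uniform(0.0, 1.0, size=k))  # omega ~ B_0 / gamma (uniform here)
            weights.append(rng.beta(1.0, c + i - 1, size=k)) # atom weights
    return np.concatenate(locations), np.concatenate(weights)

gamma = 10.0
totals = []
for seed in range(20):
    _, w = sample_beta_process(gamma, rng=np.random.default_rng(seed))
    totals.append(w.sum())   # B(Omega) for one draw
print(np.mean(totals))       # fluctuates around gamma = 10
```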
Details are described in [4]. In that paper, the sampling method for the beta process is also derived from its relation with the IBP.
According to de Finetti's theorem, the exchangeable observation sequence $Z_{1}, \ldots, Z_{n}$ can be written as conditionally i.i.d. draws given a latent random measure $B$, i.e. $p(Z_{1}, \ldots, Z_{n}) = \int \prod_{i=1}^{n} p(Z_{i} \mid B)\, dP(B)$; for the IBP, this de Finetti mixing measure is the beta process (each $Z_{i}$ given $B$ is a Bernoulli process).
The IBP, which is a one-parameter model, can be further generalized by giving the beta-Bernoulli distribution two parameters. This corresponds to varying both $c$ and $\gamma$ in the beta process. See the sketch below.
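As a sketch, assuming the usual two-parameter version (customer i takes existing dish k with probability $m_k/(\beta+i-1)$ and orders Poisson($\alpha\beta/(\beta+i-1)$) new dishes; $\beta = 1$ recovers the one-parameter IBP), the sampler above changes only slightly:

```python
import numpy as np

def sample_ibp_two_param(n_customers, alpha, beta, rng=None):
    """Two-parameter IBP sketch: beta controls how strongly customers share dishes."""
    rng = np.random.default_rng() if rng is None else rng
    dish_counts, rows = [], []
    for i in range(1, n_customers + 1):
        row = [1 if rng.random() < m / (beta + i - 1) else 0 for m in dish_counts]
        n_new = rng.poisson(alpha * beta / (beta + i - 1))
        row += [1] * n_new
        dish_counts = [m + z for m, z in zip(dish_counts, row)] + [1] * n_new
        rows.append(row)
    Z = np.zeros((n_customers, len(dish_counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z
```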
These are application examples of latent feature models via matrix factorization rather than of the IBP specifically, but I list a variety of them. This is a light summary of the review [3] by Griffiths.
Linear factor models such as probabilistic PCA, factor analysis and sparse coding are summarized below. http://qiita.com/masasora/items/431afb71eae60a1c826b
There are already many pages ([10], [11], [12]) that summarize the literature well, but I will explain briefly. There are roughly three approaches to recommender systems. The first is to use basic statistics as they are, for example simply recommending stores or products that already sell well. The second is collaborative filtering, which comes in user-based and item-based variants. In the user-based version, to predict a person A's rating of a product B, you take the ratings of B by people similar to A and use them to predict A's rating of B. The third is matrix factorization. The most basic version decomposes X by SVD as $X = U \Sigma V^{T}$; in reality, however, there are missing values and negative ratings would be strange, so this is extended in many ways (at which point it is no longer plain SVD). Although it depends on the application, the third approach is generally the strongest in terms of raw prediction performance, and you can also use the decomposed matrices to find similar users and similar movies. Returning to the main subject, the IBP can be used in this third approach, as in the sketch below.
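A minimal numpy sketch of the plain SVD variant described above, on made-up ratings (here the zeros are treated as observed values; handling genuinely missing entries needs the extensions just mentioned):

```python
import numpy as np

# toy ratings matrix: rows = users, columns = movies
X = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

# rank-2 truncated SVD: X ~= U_k diag(s_k) Vt_k
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
X_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.round(X_hat, 1))     # low-rank reconstruction of the ratings

# user-user similarity from the low-rank factors
user_factors = U[:, :k] * s[:k]
print(np.round(user_factors @ user_factors.T, 1))  # users with similar taste get large inner products
```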
A typical application in audio processing is the sound source separation problem. Note that in this case the prior on X is not Gaussian; for example, a Laplace or t distribution is used.
The first application is the decomposition of a matrix (Z) representing which genes are expressed in which cells: $z_{ik}$ indicates whether gene i is expressed in cell k. Another application is the study of protein interactions: $z_{ik}$ indicates whether protein $i$ belongs to complex $k$. You can find similar proteins using the decomposed matrix (that is, the larger $\sum_{k} z_{ik} z_{jk}$ is, the more similar proteins i and j are), as in the toy sketch below.
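Since that similarity is just an inner product of rows of Z, a toy sketch (with made-up data) is:

```python
import numpy as np

rng = np.random.default_rng(3)
Z = (rng.random((6, 4)) < 0.4).astype(int)   # z[i, k] = 1 if protein i belongs to complex k (toy data)

# similarity(i, j) = sum_k z_ik * z_jk = number of complexes shared by proteins i and j
similarity = Z @ Z.T
print(similarity)
```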