"[Joint multi-modal representations for e-commerce catalog search driven by visual attributes](https://kddfashion2016.mybluemix.net/kddfashion_finalSubmissions/Joint%20multi-modal%]" from IBM Research, India at the KDD2016 workshop. 20representations% 20for% 20e-commerce% 20catalog% 20search% 20driven% 20by% 20visual% 20attributes.pdf) ”is to be implemented by chainer.
The original paper for the method is here; the implementation accompanying that paper is written in Theano.
From a quick read, the gist of the paper is this: given text and images as pairs, a neural network (the method is called "Correlational Neural Network", abbreviated CorrNet) finds a common representation space for the two, which is useful for search.
I think CCA is the usual choice for finding a common space between two different modalities, but in practice scikit-learn's CCA takes a long time to train as soon as the data gets large and can become unusable due to a MemoryError, so this time I quickly implemented CorrNet with Chainer instead.
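For reference, the baseline I have in mind is something like the snippet below (the toy data and dimensions are my own choices); as noted above, `fit` becomes slow and memory-hungry as the number of rows grows:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Toy paired data: 10,000 samples of two modalities.
X = np.random.rand(10000, 784).astype(np.float32)  # e.g. flattened images
Y = np.random.rand(10000, 10).astype(np.float32)   # e.g. label features

cca = CCA(n_components=10)
cca.fit(X, Y)                   # this step blows up in time/memory at scale
X_c, Y_c = cca.transform(X, Y)  # projections into the common space
```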
The code, written as a Jupyter notebook in Python 3, is **here**.
The key point seems to be training the network so that the correlation coefficient between the two modalities becomes large in the common space.
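If I'm reading the paper right, the correlation term that gets maximized over a batch of $N$ pairs $(x_i, y_i)$, with hidden representations $h(\cdot)$, is the usual empirical correlation:

$$
\mathrm{corr}\bigl(h(X), h(Y)\bigr) =
\frac{\sum_{i=1}^{N} \bigl(h(x_i) - \overline{h(X)}\bigr)\bigl(h(y_i) - \overline{h(Y)}\bigr)}
{\sqrt{\sum_{i=1}^{N} \bigl(h(x_i) - \overline{h(X)}\bigr)^{2} \, \sum_{i=1}^{N} \bigl(h(y_i) - \overline{h(Y)}\bigr)^{2}}}
$$

where $\overline{h(X)}$ and $\overline{h(Y)}$ are the batch means of the hidden representations.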
(Figure: conceptual diagram of the model)
The loss function consists of three parts: the loss for reconstructing both modalities when both are given, the loss for reconstructing both when only one is given, and a term that makes the correlation in the hidden layer high.
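Here is a minimal sketch of what this three-part loss can look like in Chainer. The layer sizes, sigmoid activation, squared-error reconstruction, and the correlation weight `lam` are my own choices for the MNIST experiment below, not necessarily the paper's exact settings:

```python
import chainer
import chainer.functions as F
import chainer.links as L

class CorrNet(chainer.Chain):
    """One-hidden-layer CorrNet: two encoders into a shared space, two decoders out."""

    def __init__(self, x_dim=784, y_dim=10, h_dim=50):
        super(CorrNet, self).__init__()
        with self.init_scope():
            self.enc_x = L.Linear(x_dim, h_dim)  # image -> common space
            self.enc_y = L.Linear(y_dim, h_dim)  # label -> common space
            self.dec_x = L.Linear(h_dim, x_dim)  # common space -> image
            self.dec_y = L.Linear(h_dim, y_dim)  # common space -> label

    def encode(self, x=None, y=None):
        # Hidden representation from whichever modalities are given.
        h = 0
        if x is not None:
            h = h + self.enc_x(x)
        if y is not None:
            h = h + self.enc_y(y)
        return F.sigmoid(h)

    def decode(self, h):
        return self.dec_x(h), self.dec_y(h)

    def recon_loss(self, h, x, y):
        # Reconstruct *both* modalities from h, whatever h was built from.
        rx, ry = self.decode(h)
        return F.mean_squared_error(rx, x) + F.mean_squared_error(ry, y)

    def __call__(self, x, y, lam=1.0):
        h_xy = self.encode(x, y)   # both modalities given
        h_x = self.encode(x=x)     # image only
        h_y = self.encode(y=y)     # label only
        loss = (self.recon_loss(h_xy, x, y)
                + self.recon_loss(h_x, x, y)
                + self.recon_loss(h_y, x, y))
        # Correlation term: center the unimodal hidden activations over the
        # batch, then subtract the correlation so that minimizing the loss
        # maximizes it.
        n = x.shape[0]
        cx = h_x - F.broadcast_to(F.sum(h_x, axis=0) / n, h_x.shape)
        cy = h_y - F.broadcast_to(F.sum(h_y, axis=0) / n, h_y.shape)
        corr = F.sum(cx * cy) / (F.sqrt(F.sum(cx * cx) * F.sum(cy * cy)) + 1e-8)
        return loss - lam * corr
```

The three `recon_loss` calls correspond to the three reconstruction losses above, and subtracting `lam * corr` is what pushes the hidden-layer correlation up during training.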
Since I wanted to try it out easily, I used MNIST and learned a common space between the 28x28 images and the label information in one-hot-vector format.
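Pulling that pairing out of Chainer's built-in MNIST loader only needs a one-hot conversion on top (the variable names here are my own):

```python
import numpy as np
import chainer

# get_mnist() yields (flattened 784-dim float image, integer label) pairs.
train, _ = chainer.datasets.get_mnist()
images = np.stack([img for img, _ in train]).astype(np.float32)  # (60000, 784)
labels = np.array([lbl for _, lbl in train])                     # (60000,)
onehot = np.eye(10, dtype=np.float32)[labels]                    # (60000, 10)
```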
I will try different data in the future.
It looks like the images can be reconstructed successfully.
An image that looks like something halfway between a 0 and an 8 is generated, as expected.
(Plots: overall and detail views of the training curves)
Training is indeed progressing so that the correlation in the hidden layer becomes high!
It turns out you can reconstruct the image from the label information alone!
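With the sketch class from earlier, both of these results are just "encode the label, decode the image"; for instance (purely illustrative, reusing the hypothetical `CorrNet` from above):

```python
import numpy as np

model = CorrNet()  # the sketch from earlier; training on (images, onehot) omitted

# Reconstruct an image from a single one-hot label ("3").
label_3 = np.eye(10, dtype=np.float32)[[3]]           # shape (1, 10)
img_from_label, _ = model.decode(model.encode(y=label_3))

# A half-0 / half-8 label vector gives the in-between image shown above.
blend = np.zeros((1, 10), dtype=np.float32)
blend[0, 0] = blend[0, 8] = 0.5
img_blend, _ = model.decode(model.encode(y=blend))
```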
In any case, Chainer is much easier to write than Theano and can output training logs, which is convenient ^^