This post classifies and extracts the relationship between two words with supervised learning. Specifically, given training data such as (sports, baseball, has-a), predictions like the following can be made (each with a probability).
Blueberries is-a fruit
Animal has-a guinea pig
Animal has-a Cotton-top tamarin
Sports is-a sports
Climbing is-a sports
Rodeo is-a sports
Animal has-a Eurasian otter
Sports has-a freediving
Horse racing is-a sports
Sports has-a golf
The only relationships handled this time are has-a and is-a. One use for this: when a person searches for "sports", you can present articles that mention "baseball" or "soccer", which are in a has-a relationship with "sports", even if the word "sports" itself does not appear in the article. Moreover, the scores can be calculated in advance.
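As a rough illustration of the search use case above, the following sketch expands a query with words that have a precomputed has-a score. The score table, the article list, the `search` function, and the threshold are all hypothetical values made up for this example, not output from the actual model.

```python
# Hypothetical precomputed has-a scores: category -> {member: probability}.
HAS_A = {"sports": {"baseball": 0.95, "soccer": 0.90, "golf": 0.85}}

# Hypothetical articles; none of them contains the word "sports".
ARTICLES = [
    "Baseball season opens this weekend",
    "Soccer transfer window closes",
    "New blueberry recipes",
]

def search(query, threshold=0.8):
    """Return (score, article) pairs for the query and its has-a members."""
    # The query itself matches with score 1.0; related words use their score.
    terms = {query: 1.0}
    for member, score in HAS_A.get(query, {}).items():
        if score >= threshold:
            terms[member] = score

    results = []
    for article in ARTICLES:
        text = article.lower()
        matched = [score for term, score in terms.items() if term in text]
        if matched:
            # Rank an article by its best-scoring matched term.
            results.append((max(matched), article))
    return sorted(results, reverse=True)

hits = search("sports")
for score, article in hits:
    print(score, article)
```

Because the scores are looked up rather than computed at query time, the relationship model only has to run once, offline.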
This time, I learn the relationship between words using Word2Vec. Word2Vec is said to be able to capture analogies; the famous example is king - man + woman = queen. This can also be written as king - man = queen - woman. In other words, the difference between two word vectors represents their relationship, and in this example the relationship between king and man can be regarded as the same as the relationship between queen and woman.
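The analogy arithmetic can be checked on a tiny example. This is a minimal sketch using hand-made 2-D vectors, not vectors learned by Word2Vec; the values and the `analogy` helper are assumptions chosen so that the arithmetic is easy to follow.

```python
import math

# Hypothetical 2-D toy vectors (NOT learned by Word2Vec).
# The axes can be read loosely as "royalty" and "gender".
vectors = {
    "king":  [0.9, 0.8],
    "queen": [0.9, -0.8],
    "man":   [0.1, 0.8],
    "woman": [0.1, -0.8],
    "apple": [-0.7, 0.0],
}

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def analogy(a, b, c):
    """Return the word closest to a - b + c, excluding the three input words."""
    target = add(sub(vectors[a], vectors[b]), vectors[c])
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

result = analogy("king", "man", "woman")
print(result)

# The two difference vectors encode the same relationship.
diff_king_man = sub(vectors["king"], vectors["man"])
diff_queen_woman = sub(vectors["queen"], vectors["woman"])
print(diff_king_man, diff_queen_woman)
```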
I wrote above that the difference vector of the representations learned by Word2Vec expresses the relationship well, but that is not always the case. In addition, **the relationships that the user wants to handle and the relationships learned by Word2Vec do not always match**. Therefore, this time I use ordinary supervised learning. The user can tell the algorithm which relationships they want to handle through the training data. The important point is that any relationship can be communicated to the algorithm through training data.
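To make the supervised setup concrete, here is a minimal sketch under several assumptions: the embeddings are hypothetical 2-D toy values standing in for Word2Vec vectors, the feature is the difference vector of the two words, and the classifier is a small hand-written logistic regression. The linked post may use different features and a different classifier.

```python
import math

# Hypothetical 2-D toy embeddings standing in for Word2Vec vectors.
EMB = {
    "sports": [1.0, 0.2], "animal": [0.9, -0.3], "fruit": [1.1, 0.0],
    "baseball": [0.1, 0.3], "golf": [0.0, 0.1], "otter": [0.2, -0.4],
    "blueberry": [0.1, 0.05], "climbing": [0.05, 0.25],
}

# Training triples in the same shape as (sports, baseball, has-a).
TRAIN = [
    ("sports", "baseball", "has-a"),
    ("animal", "otter", "has-a"),
    ("golf", "sports", "is-a"),
    ("blueberry", "fruit", "is-a"),
]

def features(a, b):
    """Difference vector of the two embeddings (an assumed feature choice)."""
    return [x - y for x, y in zip(EMB[a], EMB[b])]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(triples, lr=0.5, epochs=500):
    """Fit binary logistic regression by batch gradient descent (1 = has-a)."""
    data = [(features(a, b), 1.0 if rel == "has-a" else 0.0)
            for a, b, rel in triples]
    w, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + bias) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        bias -= lr * grad_b / len(data)
    return w, bias

def predict(model, a, b):
    """Return (label, probability of that label) for the ordered pair (a, b)."""
    w, bias = model
    z = sum(wi * xi for wi, xi in zip(w, features(a, b))) + bias
    p_has_a = sigmoid(z)
    return ("has-a", p_has_a) if p_has_a >= 0.5 else ("is-a", 1.0 - p_has_a)

model = train(TRAIN)
print(predict(model, "sports", "climbing"))  # pair not seen in training
print(predict(model, "climbing", "sports"))  # same pair, reversed order
```

Changing the labels in `TRAIN` is all it takes to teach the classifier a different set of relationships, which is the point of using training data here.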
The details, along with the code, are written up in [Word-to-word relationship classification using word2vec](https://nktmemo.wordpress.com/2015/10/27/Word-to-word relationship classification using word2vec /).