We have released a trained fastText model. You can download it from:
Information about embedding vectors is summarized in the following repository, so please check it out as well: awesome-embedding-models
Motivation
In the article below, I included the link to the trained vectors that icoxfog417 published on GitHub. However, downloading the published vectors required Git LFS, and their location was hard to find. So this time I trained a model myself and published it so that it can be downloaded easily.
How to make
For how to use fastText, I referred to the article below. It is a good article that explains both the theory and the usage of fastText.
The data used for training is the Wikipedia dump from 2017/01/01.
The hyperparameters were set as follows; all other hyperparameters use the default settings.
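For reference, here is a minimal training sketch using the fasttext Python package. The corpus file name and the parameter values shown are assumptions for illustration only, not the actual settings used for the published model; the command-line fastText tool works the same way and also writes the text-format model.vec file directly.

import fasttext

# Hypothetical sketch: train a skip-gram model on a preprocessed
# Wikipedia text file. The file name and the parameter values below
# are assumptions, not the settings used for the published model.
model = fasttext.train_unsupervised(
    'wiki_20170101.txt',  # assumed name of the preprocessed corpus
    model='skipgram',
    dim=300,              # assumed vector dimension
)
model.save_model('model.bin')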
How to use
After downloading the data, you can load it as follows (using gensim).
import gensim

# Load the published vectors, which are in word2vec text format
model = gensim.models.KeyedVectors.load_word2vec_format('model.vec', binary=False)
Related words can be found as follows.
>>> model.most_similar(positive=['Japanese'])
[('Korean', 0.7338133454322815),
('Chinese', 0.717720627784729),
('American', 0.6725355982780457),
('Japanese woman', 0.6723321676254272),
('Foreigner', 0.6420464515686035),
('Filipino', 0.6264426708221436),
('Westerners', 0.621786892414093),
('Asian', 0.6192302703857422),
('Taiwanese', 0.6034690141677856),
('Nikkei', 0.5906497240066528)]
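Individual vectors and word-pair similarities are also available through the same gensim KeyedVectors API; a small sketch, using words taken from the output above:

# Cosine similarity between two words in the vocabulary
score = model.similarity('Japanese', 'Korean')

# The raw vector for a single word, as a numpy array
vec = model['Japanese']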
Good NLP Life!