Let's run word2vec with Chainer and watch the learning progress

This time, let's take a look at the learning process of word2vec using Chainer, a framework for running machine learning and deep learning.

First of all, the default word2vec example is set up so that training stops once the epoch count (the number of passes over the training data) reaches 10; you then type a query against the trained data yourself and get back the 5 nearest words together with their similarity values.

As it stands, I can't see how word2vec's training progresses at all, so, as an amateur who studied Python just for this purpose, I rewrote the example a little.

  1. Turn search.py into a class -> my_search.py
  2. Rename train_word2vec.py -> my_train_word2vec.py
  3. Add import my_search as S at the beginning of my_train_word2vec.py
  4. Turn the model-saving logic into a method
def save_to_model(model, index2word, word2index):
    # move the model to the CPU and pickle it together with the vocabulary mappings
    model.to_cpu()
    with open('model.pickle', 'wb') as f:
        obj = (model, index2word, word2index)
        pickle.dump(obj, f)
    return
  5. In the epoch loop, save the model and search for nearby words each epoch (a sketch of what my_search.py might look like follows the snippet below)

save_to_model(model, index2word, word2index)
print(S.MySearch.search("Silicon Valley"))  # called as a class method
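
The original post does not show the contents of my_search.py, so here is a minimal sketch of what the class-based version might look like. Only the class name MySearch and the classmethod search are taken from the call above; everything else (loading the pickle written by save_to_model, an EmbedID link named embed, the cosine-similarity ranking, the top-5 default) is an assumption, loosely modeled on the original example's search script.

# my_search.py -- a hypothetical sketch, not the author's actual file
import pickle

import numpy


class MySearch:

    @classmethod
    def search(cls, query, n_result=5):
        # load the model and vocabulary pickled by save_to_model()
        with open('model.pickle', 'rb') as f:
            model, index2word, word2index = pickle.load(f)

        if query not in word2index:
            return '%s is not found' % query

        # embedding matrix, one row per word (assumes an EmbedID link named embed)
        w = model.embed.W.data
        norm = numpy.sqrt((w * w).sum(1))
        w = w / norm.reshape((norm.shape[0], 1))

        # cosine similarity between the query word and every word in the vocabulary
        similarity = w.dot(w[word2index[query]])

        # print the n_result closest words, skipping the query itself
        count = 0
        for i in (-similarity).argsort():
            if index2word[i] == query:
                continue
            print('%s: %f' % (index2word[i], similarity[i]))
            count += 1
            if count == n_result:
                break

Since search() prints its results and returns nothing, print(S.MySearch.search(...)) also prints None at the end of each epoch, which is the None line you see in the logs below.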

Also, as a corpus this time, I borrowed 15 articles from the "Shinfu Silicon Valley" column (Nikkei Sangyo Shimbun) for the experiments; my apologies for that.

First, the experimental results:

First run (10 epochs):

The closest word to "Silicon Valley" was "D". I don't know why, but "D" stayed at the top from beginning to end. The similarity values all fall between 0.2 and 0.4 (I'm guessing because k is 0.333). With so little training and such a small corpus, it's probably inevitable that things barely move. By epoch 9 the neighbors are "D", "Hardship", "IT", "Run", and "To do", which smells a little like Silicon Valley.

==========
epoch: 0
accumulates loss: 1112031.500000
query:Silicon valley
D: 0.385320752859
Come off: 0.316111475229
Hardship: 0.311353355646
IT: 0.308985322714
maybe: 0.293527036905
None

==========
epoch: 1
accumulates loss: 982020.395020
query:Silicon valley
D: 0.380901038647
Come off: 0.319994270802
IT: 0.315405249596
Hardship: 0.310255050659
maybe: 0.294104635715
None

==========
epoch: 2
accumulates loss: 902829.900146
query:Silicon valley
D: 0.376115381718
Come off: 0.320046186447
IT: 0.31905066967
Hardship: 0.311782300472
maybe: 0.296858221292
None

==========
epoch: 3
accumulates loss: 820047.656860
query:Silicon valley
D: 0.371634662151
IT: 0.320495575666
Come off: 0.318237453699
Hardship: 0.313952356577
maybe: 0.302201360464
None

==========
epoch: 4
accumulates loss: 681908.571655
query:Silicon valley
D: 0.368631154299
IT: 0.320828229189
Come off: 0.316797375679
Hardship: 0.316728383303
maybe: 0.306283533573
None

==========
epoch: 5
accumulates loss: 641440.961914
query:Silicon valley
D: 0.365578979254
IT: 0.320439100266
Hardship: 0.3194886446
Come off: 0.315234780312
Run: 0.309817075729
None

==========
epoch: 6
accumulates loss: 586475.438599
query:Silicon valley
D: 0.363178402185
Hardship: 0.321959197521
IT: 0.319732785225
Run: 0.315447598696
Come off: 0.313367664814
None

==========
epoch: 7
accumulates loss: 556348.893921
query:Silicon valley
D: 0.361127972603
Hardship: 0.324909359217
IT: 0.319623440504
Run: 0.31960016489
To do: 0.318533718586
None

==========
epoch: 8
100000 words, 77.92 sec, 1283.30 words/sec
accumulates loss: 517327.874512
query:Silicon valley
D: 0.359653770924
Hardship: 0.327609688044
To do: 0.326554596424
Run: 0.321017146111
IT: 0.318472921848
None

==========
epoch: 9
accumulates loss: 551470.435913
query:Silicon valley
D: 0.358295291662
To do: 0.334549129009
Hardship: 0.328947871923
Run: 0.324358165264
IT: 0.31878477335
None

Second and third runs:

In the second run the final neighbors were "Low profit and high sales", "site", "Thinking", "Su", and "various"; for the third run I won't paste the data, but it ended with "confinement", "i", "DECODED", "re", and "discrimination". The second run seems to be trying to tell me something, but...

==========
epoch: 0
accumulates loss: 1155921.383301
query:Silicon valley
site: 0.34277588129
Low profit and high sales: 0.338559865952
tool: 0.291590571404
various: 0.288147270679
Thinking: 0.280256956816
None

==========
epoch: 1
accumulates loss: 921329.687744
query:Silicon valley
Low profit and high sales: 0.344960749149
site: 0.34360229969
various: 0.292381823063
tool: 0.289981007576
Thinking: 0.287175774574
None

==========
epoch: 2
accumulates loss: 891724.701904
query:Silicon valley
Low profit and high sales: 0.349293321371
site: 0.343631505966
various: 0.295914918184
Thinking: 0.291843622923
tool: 0.288331329823
None

==========
epoch: 3
accumulates loss: 757185.654785
query:Silicon valley
Low profit and high sales: 0.352725356817
site: 0.344897687435
various: 0.297841370106
Thinking: 0.295309871435
tool: 0.286360681057
None

==========
epoch: 4
accumulates loss: 678935.693481
query:Silicon valley
Low profit and high sales: 0.355262964964
site: 0.347212970257
Thinking: 0.299321830273
various: 0.298689037561
Su: 0.285281300545
None

==========
epoch: 5
accumulates loss: 610247.023926
query:Silicon valley
Low profit and high sales: 0.35762360692
site: 0.348474025726
Thinking: 0.300522983074
various: 0.300092220306
Su: 0.289157003164
None

==========
epoch: 6
accumulates loss: 600056.776855
query:Silicon valley
Low profit and high sales: 0.360702127218
site: 0.350107192993
Thinking: 0.303010463715
various: 0.300860673189
Su: 0.292713105679
None

==========
epoch: 7
accumulates loss: 589747.635376
query:Silicon valley
Low profit and high sales: 0.364328920841
site: 0.351830333471
Thinking: 0.304481714964
various: 0.299699604511
Su: 0.295893192291
None

==========
epoch: 8
100000 words, 77.42 sec, 1291.68 words/sec
accumulates loss: 523010.348755
query:Silicon valley
Low profit and high sales: 0.367006063461
site: 0.353862285614
Thinking: 0.305754393339
Su: 0.299977868795
various: 0.298767507076
None

==========
epoch: 9
accumulates loss: 508688.538574
query:Silicon valley
Low profit and high sales: 0.370497822762
site: 0.355607360601
Thinking: 0.306706368923
Su: 0.303147226572
various: 0.297139495611
None

Conclusion and discussion

This time I ran the same data through three times, and two things stood out.

1. The words judged close at the very beginning stay close from start to finish.

The words that came up at epoch 0 stayed in the list right through to the end. In human terms, whatever you learn first tends to become a fixed impression, so perhaps it's the same kind of thing.

2. The three runs reached completely different conclusions.

Even though every run trained on exactly the same data, the words that finally came out as neighbors were completely different each time. In human terms, even people who learn the same material end up thinking in different ways.

Partway through I also realized that the loss value is still far too large, and that in this word2vec setup the nearest words tend to land at a similarity of about 0.2 to 0.4.
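
For a sense of scale on those 0.2 to 0.4 figures: the cosine similarity of two unrelated random vectors sits near 0, so 0.2 to 0.4 is a real but still fairly weak association. A quick sanity check along those lines (the 100-dimensional size is only an assumed embedding width, not a value from the post):

import numpy

def cosine(a, b):
    # cosine similarity between two vectors
    return a.dot(b) / (numpy.linalg.norm(a) * numpy.linalg.norm(b))

rng = numpy.random.RandomState(0)
dim = 100  # assumed embedding size

# unrelated random vectors: similarity hovers near 0 (spread of roughly 0.1 at 100 dimensions)
sims = [cosine(rng.randn(dim), rng.randn(dim)) for _ in range(1000)]
print(numpy.mean(numpy.abs(sims)))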

Getting different results every time feels very much like deep learning. As with human learning there is no single right answer, so I'd like to keep training the machine on more data and make it smarter.
