This time, let's take a look at the learning process of word2vec using Chainer, a framework for machine learning and deep learning.
First of all, in the default word2vec example, training stops once the epoch count (the number of passes over the training data) reaches 10; you then query the trained data yourself and get back the 5 nearest words together with their similarity scores.
That way I can't see how word2vec's training progresses at all, so I rewrote it a little, as an amateur who studied Python just for this purpose.
import pickle

def save_to_model(model, index2word, word2index):
    model.to_cpu()  # bring the model back to the CPU before pickling
    with open('model.pickle', 'wb') as f:
        obj = (model, index2word, word2index)
        pickle.dump(obj, f)
    return

save_to_model(model, index2word, word2index)
print(S.MySearch.search("Silicon Valley"))  # called as a class method
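`S.MySearch.search` itself is not shown in this post, so below is only a minimal sketch of what such a class-method search might look like. It assumes `S` is simply the module holding a `MySearch` class, that the trained embedding matrix can be read from `model.embed.W.data` (as in the Chainer word2vec example), and that "nearness" means cosine similarity; the real implementation may differ.

```python
import pickle
import numpy as np

class MySearch:
    # Hypothetical reconstruction -- the real S.MySearch is not shown in this post.
    # It loads whatever save_to_model() pickled and prints the top-5 words
    # closest to the query by cosine similarity.
    @classmethod
    def search(cls, query, top_n=5):
        with open('model.pickle', 'rb') as f:
            model, index2word, word2index = pickle.load(f)
        if query not in word2index:
            return '"%s" is not in the vocabulary' % query
        w = model.embed.W.data                    # (vocab_size, n_units) embeddings
        v = w[word2index[query]]
        # cosine similarity between the query vector and every word vector
        sim = w.dot(v) / (np.linalg.norm(w, axis=1) * np.linalg.norm(v) + 1e-8)
        print('query:%s' % query)
        for i in (-sim).argsort()[1:top_n + 1]:   # index 0 is the query word itself
            print('%s: %f' % (index2word[i], sim[i]))
```

Presumably `save_to_model()` and this search call are run at the end of every epoch, which is what produces the per-epoch blocks below; the stray `None` lines in the logs are just `print()` showing the (empty) return value of `search()`.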
Also, as the corpus this time, I borrowed 15 articles from "Shinfu Silicon Valley" (Nikkei Sangyo Shimbun) for the experiment. Please bear with the small sample.
The closest word to "Silicon Valley" was "D". I don't know why, but "D" stayed at the top from beginning to end. The similarity scores all fall between 0.2 and 0.4 (I guess this is because k is 0.333). With so little training and such a small corpus, it's no surprise that they barely move. By epoch 9 the results are "D", "Hardship", "IT", "Run", and "To do", which has a faint whiff of a __Silicon Valley feeling__.
==========
epoch: 0
accumulates loss: 1112031.500000
query:Silicon valley
D: 0.385320752859
Come off: 0.316111475229
Hardship: 0.311353355646
IT: 0.308985322714
maybe: 0.293527036905
None
==========
epoch: 1
accumulates loss: 982020.395020
query:Silicon valley
D: 0.380901038647
Come off: 0.319994270802
IT: 0.315405249596
Hardship: 0.310255050659
maybe: 0.294104635715
None
==========
epoch: 2
accumulates loss: 902829.900146
query:Silicon valley
D: 0.376115381718
Come off: 0.320046186447
IT: 0.31905066967
Hardship: 0.311782300472
maybe: 0.296858221292
None
==========
epoch: 3
accumulates loss: 820047.656860
query:Silicon valley
D: 0.371634662151
IT: 0.320495575666
Come off: 0.318237453699
Hardship: 0.313952356577
maybe: 0.302201360464
None
==========
epoch: 4
accumulates loss: 681908.571655
query:Silicon valley
D: 0.368631154299
IT: 0.320828229189
Come off: 0.316797375679
Hardship: 0.316728383303
maybe: 0.306283533573
None
==========
epoch: 5
accumulates loss: 641440.961914
query:Silicon valley
D: 0.365578979254
IT: 0.320439100266
Hardship: 0.3194886446
Come off: 0.315234780312
Run: 0.309817075729
None
==========
epoch: 6
accumulates loss: 586475.438599
query:Silicon valley
D: 0.363178402185
Hardship: 0.321959197521
IT: 0.319732785225
Run: 0.315447598696
Come off: 0.313367664814
None
==========
epoch: 7
accumulates loss: 556348.893921
query:Silicon valley
D: 0.361127972603
Hardship: 0.324909359217
IT: 0.319623440504
Run: 0.31960016489
To do: 0.318533718586
None
==========
epoch: 8
100000 words, 77.92 sec, 1283.30 words/sec
accumulates loss: 517327.874512
query:Silicon valley
D: 0.359653770924
Hardship: 0.327609688044
To do: 0.326554596424
Run: 0.321017146111
IT: 0.318472921848
None
==========
epoch: 9
accumulates loss: 551470.435913
query:Silicon valley
D: 0.358295291662
To do: 0.334549129009
Hardship: 0.328947871923
Run: 0.324358165264
IT: 0.31878477335
None
The second run finally gave "Low profit and high sales", "site", "Thinking", "Su", and "various"; the third run (I won't paste the data) ended up with "confinement", "i", "DECODED", "re", and "discrimination". The second run seems to be trying to tell me something, but...
==========
epoch: 0
accumulates loss: 1155921.383301
query:Silicon valley
site: 0.34277588129
Low profit and high sales: 0.338559865952
tool: 0.291590571404
various: 0.288147270679
Thinking: 0.280256956816
None
==========
epoch: 1
accumulates loss: 921329.687744
query:Silicon valley
Low profit and high sales: 0.344960749149
site: 0.34360229969
various: 0.292381823063
tool: 0.289981007576
Thinking: 0.287175774574
None
==========
epoch: 2
accumulates loss: 891724.701904
query:Silicon valley
Low profit and high sales: 0.349293321371
site: 0.343631505966
various: 0.295914918184
Thinking: 0.291843622923
tool: 0.288331329823
None
==========
epoch: 3
accumulates loss: 757185.654785
query:Silicon valley
Low profit and high sales: 0.352725356817
site: 0.344897687435
various: 0.297841370106
Thinking: 0.295309871435
tool: 0.286360681057
None
==========
epoch: 4
accumulates loss: 678935.693481
query:Silicon valley
Low profit and high sales: 0.355262964964
site: 0.347212970257
Thinking: 0.299321830273
various: 0.298689037561
Su: 0.285281300545
None
==========
epoch: 5
accumulates loss: 610247.023926
query:Silicon valley
Low profit and high sales: 0.35762360692
site: 0.348474025726
Thinking: 0.300522983074
various: 0.300092220306
Su: 0.289157003164
None
==========
epoch: 6
accumulates loss: 600056.776855
query:Silicon valley
Low profit and high sales: 0.360702127218
site: 0.350107192993
Thinking: 0.303010463715
various: 0.300860673189
Su: 0.292713105679
None
==========
epoch: 7
accumulates loss: 589747.635376
query:Silicon valley
Low profit and high sales: 0.364328920841
site: 0.351830333471
Thinking: 0.304481714964
various: 0.299699604511
Su: 0.295893192291
None
==========
epoch: 8
100000 words, 77.42 sec, 1291.68 words/sec
accumulates loss: 523010.348755
query:Silicon valley
Low profit and high sales: 0.367006063461
site: 0.353862285614
Thinking: 0.305754393339
Su: 0.299977868795
various: 0.298767507076
None
==========
epoch: 9
accumulates loss: 508688.538574
query:Silicon valley
Low profit and high sales: 0.370497822762
site: 0.355607360601
Thinking: 0.306706368923
Su: 0.303147226572
various: 0.297139495611
None
This time I ran the same data three times, and in both of the runs shown above, the words that appeared at epoch 0 stayed the same from start to finish. In human terms, something you've learned once does tend to leave a fixed impression, so maybe it's a similar feeling?
Even though every run was trained on the same data, the words that were finally output as nearest neighbors were completely different. In human terms, even people who learn (acquire) the same thing end up thinking in different ways.
Partway through, I also noticed that the loss value is still far too large, and that in this word2vec setup, nearby words tend to land at similarity scores of roughly 0.2–0.4.
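For reference, what those 0.2–0.4 figures measure is, I assume, the cosine similarity between word vectors (as in the standard Chainer example), so they indicate only a weak alignment. A toy calculation with made-up vectors:

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) between two word vectors: 1.0 = same direction, 0.0 = unrelated
    return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([0.9, 0.1, 0.3])   # made-up "word" vectors, just for illustration
b = np.array([0.4, 0.8, 0.1])
print(cosine_similarity(a, b))  # ~0.55 -- still far from a confident match
```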
Getting different results every time feels very deep-learning-like. Just as with human learning, there is no single right answer, so I'd like the machine to keep studying and get smarter.