Somehow I got this warning... It seems I can get rid of it by playing with passes or iterations, but I'm really nervous about adjusting numerical-calculation parameters when I don't know what they actually do.
Where the warning appears
model_lda = LdaModel(corpus=corpus, num_topics=30, id2word=corpus.id2word)
WARNING:gensim.models.ldamodel:too few updates, training might not converge; consider increasing the number of passes or iterations to improve accuracy
Let's look at the source code. The culprit is the update method, which __init__ runs at the very end.
In the update method, near line 616
if updates_per_pass * passes < 10:
    logger.warning("too few updates, training might not converge; consider "
                   "increasing the number of passes or iterations to improve accuracy")
passes is the init parameter passes of LdaModel, used as is. Its default is 1. As for updates_per_pass... hmm...
In the update method, line 607
updates_per_pass = max(1, lencorpus / updateafter)
lencorpus is assigned len(corpus) near line 585 of the update method. In short, it is the number of documents. The corpus that produced this warning has 4019 documents. As for updateafter...
In the update method, around line 599
if update_every:
    updatetype = "online"
    updateafter = min(lencorpus, update_every * self.numworkers * chunksize)
else:
    updatetype = "batch"
    updateafter = lencorpus
If update_every is not passed to the update method, it takes the value of the init parameter update_every, whose default is 1. So if you haven't changed anything, the update type is online. self.numworkers is 1 as long as the init parameter distributed remains False.
chunksize is ...
In the update method, line 595
chunksize = min(lencorpus, self.chunksize)
self.chunksize is the same as the init parameter chunksize. The default is 2000.
In other words...
updateafter = min(4019, 1 * 1 * 2000) = 2000
updates_per_pass = max(1, 4019 / 2000) ≒ 2
So the left-hand side of the if condition is 2 * 1 = 2, which is less than 10. Out.
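The arithmetic above can be replayed in plain Python. This is only a sketch that mirrors the lines quoted from the update method; the helper names updates_per_pass and warns, and their keyword defaults, are my own assumptions for illustration and are not part of gensim.

```python
def updates_per_pass(lencorpus, chunksize=2000, update_every=1, numworkers=1):
    """Hypothetical helper mirroring the quoted lines of the update method."""
    chunksize = min(lencorpus, chunksize)
    if update_every:
        # "online": update after every update_every * numworkers chunks
        updateafter = min(lencorpus, update_every * numworkers * chunksize)
    else:
        # "batch": a single update per pass over the whole corpus
        updateafter = lencorpus
    return max(1, lencorpus / updateafter)


def warns(lencorpus, passes=1, **kwargs):
    """True when the quoted 'too few updates' condition would hold."""
    return updates_per_pass(lencorpus, **kwargs) * passes < 10


print(updates_per_pass(4019))  # roughly 2 with the defaults
print(warns(4019))             # True: 2 * 1 is below 10
```

Writing it this way makes it easy to see that only three things move the left-hand side: the corpus size, the chunk size, and passes.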
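Countermeasures
・Increase passes. In this case, passes = 5 is enough to avoid the warning.
・Reduce updateafter, i.e. decrease update_every or chunksize. Since update_every is already 1 by default (and 0 switches to batch mode, where updateafter becomes lencorpus), in practice this means chunksize. In this case, if you change only chunksize, setting it to about 400 avoids the warning.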
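To double-check those two numbers, here is a small sketch that computes the smallest passes value that clears the threshold for a given chunk size. min_passes is a hypothetical helper of mine, built only from the source lines quoted in this post; it is not a gensim function.

```python
import math


def min_passes(lencorpus, chunksize=2000, update_every=1, numworkers=1, threshold=10):
    """Smallest passes with updates_per_pass * passes >= threshold (hypothetical helper)."""
    chunksize = min(lencorpus, chunksize)
    if update_every:
        updateafter = min(lencorpus, update_every * numworkers * chunksize)
    else:
        updateafter = lencorpus
    per_pass = max(1, lencorpus / updateafter)
    return math.ceil(threshold / per_pass)


print(min_passes(4019))                 # 5 with the default chunksize of 2000
print(min_passes(4019, chunksize=400))  # 1, so the default passes is enough
```

In the actual call, this would presumably mean passing passes=5 or chunksize=400 to LdaModel alongside corpus, num_topics and id2word. Whether the resulting topics are any better is a separate question that this check does not answer.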
I'm worn out from chasing these parameters, so I'll look into them further another day.