This article is a memo I wrote while reading "Preprocessing in Natural Language".
Numbers are also normalized, so that values such as 99 and 1.235 are all replaced with 0. That makes sense: the specific numeric values usually have nothing to do with what you want to analyze with natural language processing.
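Here is a minimal sketch of number normalization with Python's `re` module. The regex below is my own assumption: it treats an integer or a decimal such as `1.235` as a single number, so each number collapses to one `0`. Some implementations instead replace every digit, which would turn `1.235` into `0.000`.

```python
import re

# Assumed pattern: an integer ("99") or a decimal ("1.235") counts as one number.
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def normalize_numbers(text: str) -> str:
    """Replace every number in text with the single token "0"."""
    return NUMBER_PATTERN.sub("0", text)


print(normalize_numbers("It costs 99 yen, about 1.235 dollars."))
# prints "It costs 0 yen, about 0 dollars."
```

Collapsing each number to one token shrinks the vocabulary, because every number maps to the same word.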
This applies beyond number normalization, but it would have been nice to have code samples. Also, the explanation of how to choose stop words, both by meaning (a predefined dictionary) and by frequency, is just right for a review.
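The two ways of choosing stop words can be sketched in Python as follows. The small `STOP_WORDS_DICT` set is a hypothetical stand-in for a real stop word dictionary. The frequency-based selection simply takes the `top_n` most common tokens across a tokenized corpus, and `top_n` is an assumed parameter you would tune.

```python
from collections import Counter

# Hypothetical meaning-based (dictionary) stop word list, for illustration only.
STOP_WORDS_DICT = {"the", "a", "an", "is", "of", "and", "to"}


def stop_words_by_frequency(docs, top_n):
    """Return the top_n most frequent tokens across tokenized documents."""
    counts = Counter(token for doc in docs for token in doc)
    return {word for word, _ in counts.most_common(top_n)}


def remove_stop_words(tokens, stop_words):
    """Drop every token that appears in the stop word set."""
    return [t for t in tokens if t not in stop_words]


docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]
# Combine both approaches: dictionary words plus the single most frequent token.
stop_words = STOP_WORDS_DICT | stop_words_by_frequency(docs, 1)
print(remove_stop_words(["the", "cat", "sat"], stop_words))  # prints ['cat', 'sat']
```

In practice, the frequency-based list should be checked by hand, because frequent words are not always meaningless for the task.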
The difference in execution time is large. Isn't that more important than the difference in accuracy?