Ciao ... †
Natural language processing always comes with preprocessing, and preprocessing has never been fast. So I compared Japanese character-conversion modules on Python 3.
I measured full-width / half-width conversion and hiragana-to-katakana conversion, each on both short and long target strings.
- jaconv (a module I wrote; recently renamed from jctconv)
- mohayonao's code
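To make the two kinds of conversion concrete, here is a minimal sketch that uses only the standard library. It is not how jaconv or any of the other benchmarked modules is implemented. The function names `hira2kata` and `han2zen_ascii` are hypothetical, and the half-width side covers ASCII only (no half-width katakana).

```python
# Stdlib-only sketch of the two conversions; NOT the implementation of any
# module in the benchmark. Function names are hypothetical.

# Hiragana U+3041..U+3096 sit exactly 0x60 below their katakana counterparts.
HIRA_TO_KATA = {cp: cp + 0x60 for cp in range(0x3041, 0x3097)}

# Printable ASCII U+0021..U+007E maps onto the full-width forms U+FF01..U+FF5E.
# The space is a special case and maps to the ideographic space U+3000.
HAN_TO_ZEN_ASCII = {cp: cp + 0xFEE0 for cp in range(0x21, 0x7F)}
HAN_TO_ZEN_ASCII[0x20] = 0x3000


def hira2kata(text: str) -> str:
    """Convert hiragana to katakana; other characters pass through unchanged."""
    return text.translate(HIRA_TO_KATA)


def han2zen_ascii(text: str) -> str:
    """Convert half-width ASCII (including the space) to full-width forms."""
    return text.translate(HAN_TO_ZEN_ASCII)


print(hira2kata("ともえまみ"))
print(han2zen_ascii("abc 123"))
```

A single `str.translate` call with a precomputed table keeps the per-character work inside the interpreter's C code, which is one common way to write this kind of conversion in pure Python.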
Test case | jaconv | cnvk | mojimoji | zenhan | rfZenHan | mohayonao | nkf
---|---|---|---|---|---|---|---
Short string, half-width → full-width | 27.1 µs | 96.4 µs | 5.04 µs | 75.8 µs | 222 µs | | 23 µs
Long string, half-width → full-width | 89.9 ms | 38.6 ms | 23.1 ms | 360 ms | 237 ms | | 95.4 ms
Short string, hiragana → katakana | 18.1 µs | 79.1 µs | | | | 25.4 µs | 23.2 µs
Long string, hiragana → katakana | 51.6 ms | 41.8 ms | | | | 246 ms | 98.6 ms
mojimoji is fast because it is written in Cython. Among the pure-Python modules, jaconv performs well on short strings, and cnvk seems to do well on long strings.
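The post does not show how the timings were taken, so the following is only a sketch of one way to run a short-versus-long comparison with the standard `timeit` module. The converter `hira2kata` is a hypothetical stdlib stand-in, the helper `best_time` is my own, and the input strings and repeat counts are assumptions; swap in the function from the module you want to measure.

```python
import timeit

# Hypothetical stand-in converter (stdlib only); replace it with the function
# under test, e.g. one exported by the module being benchmarked.
_HIRA_TO_KATA = {cp: cp + 0x60 for cp in range(0x3041, 0x3097)}


def hira2kata(text: str) -> str:
    return text.translate(_HIRA_TO_KATA)


def best_time(func, text, number=1000, repeat=5):
    """Return the best per-call time in seconds over `repeat` timing runs."""
    timings = timeit.repeat(lambda: func(text), number=number, repeat=repeat)
    return min(timings) / number


# Assumed inputs: the post does not say which strings it used.
short_text = "ともえまみ"
long_text = short_text * 20_000  # 100,000 characters

for label, text, number in (("short", short_text, 1000), ("long", long_text, 10)):
    seconds = best_time(hira2kata, text, number=number)
    print(f"{label}: {seconds * 1e6:.2f} µs per call")
```

Taking the minimum over several runs reduces noise from other processes. Absolute numbers depend on the machine and Python version, so only relative comparisons on the same setup are meaningful.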