As you would expect from a Java library, kuromoji is OS-independent and available through Maven, so it is very easy to install and use.
The title of this post mentions Win10 and Eclipse, but kuromoji does not depend on either.
Maven
This is all the setup required. The dictionary (model) data is downloaded along with the dependencies.
<!-- https://mvnrepository.com/artifact/com.atilika.kuromoji/kuromoji -->
<dependency>
    <groupId>com.atilika.kuromoji</groupId>
    <artifactId>kuromoji</artifactId>
    <version>0.9.0</version>
    <type>pom</type>
</dependency>
<dependency>
    <groupId>com.atilika.kuromoji</groupId>
    <artifactId>kuromoji-ipadic</artifactId>
    <version>0.9.0</version>
</dependency>
Code
This is the sample code from the kuromoji site, used essentially as is.
// https://www.atilika.com/ja/kuromoji/
package hello.kuromoji;

import com.atilika.kuromoji.ipadic.Token;
import com.atilika.kuromoji.ipadic.Tokenizer;
import java.util.List;

public class KuromojiExample {
    public static void main(String[] args) {
        // Build a tokenizer backed by the bundled IPADIC dictionary
        Tokenizer tokenizer = new Tokenizer();
        // kuromoji analyzes Japanese text, so the input must be Japanese
        List<Token> tokens = tokenizer.tokenize("お寿司が食べたい。カレーも食べたい。");
        for (Token token : tokens) {
            // Surface form, a tab, then the comma-separated dictionary features
            System.out.println(token.getSurface() + "\t" + token.getAllFeatures());
        }
    }
}
Output
お	接頭詞,名詞接続,*,*,*,*,お,オ,オ
寿司	名詞,一般,*,*,*,*,寿司,スシ,スシ
が	助詞,格助詞,一般,*,*,*,が,ガ,ガ
食べ	動詞,自立,*,*,一段,連用形,食べる,タベ,タベ
たい	助動詞,*,*,*,特殊・タイ,基本形,たい,タイ,タイ
。	記号,句点,*,*,*,*,。,。,。
カレー	名詞,一般,*,*,*,*,カレー,カレー,カレー
も	助詞,係助詞,*,*,*,*,も,モ,モ
食べ	動詞,自立,*,*,一段,連用形,食べる,タベ,タベ
たい	助動詞,*,*,*,特殊・タイ,基本形,たい,タイ,タイ
。	記号,句点,*,*,*,*,。,。,。
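Each output line is the surface form, a tab, and then nine comma-separated IPADIC features: four part-of-speech levels, conjugation type, conjugation form, base form, reading, and pronunciation. As a rough illustration of how to read that layout, here is a minimal sketch that splits one such line into named fields using only the standard library. The class name `FeatureLineParser` and the column labels are made up for this example and are not part of kuromoji.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FeatureLineParser {
    // Descriptive labels (chosen for this sketch) for the nine IPADIC feature columns,
    // in the order they appear after the tab.
    static final String[] COLUMNS = {
        "posLevel1", "posLevel2", "posLevel3", "posLevel4",
        "conjugationType", "conjugationForm",
        "baseForm", "reading", "pronunciation"
    };

    // Splits "surface<TAB>f1,f2,...,f9" into a map keyed by column label.
    // If a line carries fewer than nine features, only those present are stored.
    static Map<String, String> parse(String line) {
        String[] parts = line.split("\t", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected surface<TAB>features: " + line);
        }
        String[] features = parts[1].split(",", -1);
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("surface", parts[0]);
        for (int i = 0; i < COLUMNS.length && i < features.length; i++) {
            fields.put(COLUMNS[i], features[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> fields = parse("食べ\t動詞,自立,*,*,一段,連用形,食べる,タベ,タベ");
        System.out.println(fields.get("surface") + " -> base form " + fields.get("baseForm"));
    }
}
```

In the example line, the surface form 食べ is the continuative form (連用形) of the ichidan verb whose base form is 食べる. An asterisk marks a feature that does not apply to the token.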
Kuromoji is very easy to use.
kuromoji | Atilika https://www.atilika.com/ja/kuromoji/