I wanted to do for Chinese what MeCab does for Japanese morphological analysis, so I used FNLP.
OS: Windows 7 64bit, Language: Java 8, IDE: Eclipse 4.8.0
This is the Chinese counterpart of MeCab-like morphological analysis of English with OpenNLP.
Applying MeCab to a Japanese sentence yields "morphemes", "parts of speech", and "basic forms", and I want the same for Chinese. The open source "Fudan NLP (FNLP)" is used to obtain the "morphemes" and "parts of speech" from Chinese text.
This article deals only with Simplified Chinese sentences.
On that basis, it assumes that every Chinese morpheme is already in its "basic form", so the basic form is not looked up separately.
Specifying fnlp-core from the Maven repository directly in pom.xml causes an error, so build the FNLP source code once to create the fnlp-core-2.1-SNAPSHOT.jar file.
Create a Maven project and place the created fnlp-core-2.1-SNAPSHOT.jar file in the dic folder.
Add the following to pom.xml:
<dependency>
    <groupId>net.sf.trove4j</groupId>
    <artifactId>trove4j</artifactId>
    <version>3.0.3</version>
</dependency>
<dependency>
    <groupId>commons-cli</groupId>
    <artifactId>commons-cli</artifactId>
    <version>1.2</version>
</dependency>
<!-- FNLP core is taken from the locally built jar, not from a remote repository -->
<dependency>
    <groupId>org.fnlp</groupId>
    <artifactId>core</artifactId>
    <version>2.1</version>
    <scope>system</scope>
    <systemPath>${project.basedir}/dic/fnlp-core-2.1-SNAPSHOT.jar</systemPath>
</dependency>
Also, download the three model files (pos.m, seg.m, dep.m) published at https://github.com/xpqiu/fnlp/releases and place them in the dic folder.
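Before loading the analyzer, it can help to confirm that the three model files really are in the dic folder. The following is a minimal sketch using only the Java standard library; the class name `ModelFileCheck` and the method `missingModels` are hypothetical helpers written for this article and are not part of FNLP.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of FNLP): reports which model files are missing.
final class ModelFileCheck {
    // File names as listed in this article.
    static final String[] MODEL_FILES = {"seg.m", "pos.m", "dep.m"};

    // Returns the names of the model files that are not present in modelDir.
    static List<String> missingModels(Path modelDir) {
        List<String> missing = new ArrayList<>();
        for (String name : MODEL_FILES) {
            if (!Files.isRegularFile(modelDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // "./dic" is the folder used throughout this article.
        List<String> missing = missingModels(Paths.get("./dic"));
        if (missing.isEmpty()) {
            System.out.println("All model files found.");
        } else {
            System.out.println("Missing model files in ./dic: " + missing);
        }
    }
}
```

Running this check first turns a missing download into a clear message rather than a stack trace from the model loader.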
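import java.util.Arrays;
import org.fnlp.nlp.cn.CNFactory;
import org.fnlp.util.exception.LoadModelException;

public class SegmentExample {
    public static void main(String[] args) {
        CNFactory factory;
        // Specify the folder containing the model files and load the morphological analyzer
        try {
            factory = CNFactory.getInstance("./dic");
        } catch (LoadModelException lme) {
            lme.printStackTrace();
            return; // the analyzer cannot be used without the models
        }
        String message = "今天天气真好啊!";
        // tokens[0] holds the morphemes (the segmented words)
        String[][] tokens = factory.tag(message);
        System.out.println(Arrays.asList(tokens[0]));
    }
}
>> [今天, 天气, 真, 好, 啊, !]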
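import java.util.Arrays;
import org.fnlp.nlp.cn.CNFactory;
import org.fnlp.util.exception.LoadModelException;

public class PosExample {
    public static void main(String[] args) {
        CNFactory factory;
        // Specify the folder containing the model files and load the morphological analyzer
        try {
            factory = CNFactory.getInstance("./dic");
        } catch (LoadModelException lme) {
            lme.printStackTrace();
            return; // the analyzer cannot be used without the models
        }
        String message = "今天天气真好啊!";
        // tokens[1] holds the part-of-speech tag for each morpheme in tokens[0]
        String[][] tokens = factory.tag(message);
        System.out.println(Arrays.asList(tokens[1]));
    }
}
>> [Time phrase, Noun, Adverb, Predicate, Interjection, Punctuation]
(FNLP prints the tag names in Chinese; they are shown here in English translation.)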
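Since the first row of the `tag()` result holds the morphemes and the second row holds the matching parts of speech, the two can be combined into MeCab-style entries. The sketch below assumes that two-row layout and, as stated earlier in this article, treats the basic form as identical to the morpheme itself. The class `Morpheme` and the method `fromTagged` are hypothetical names introduced for illustration and are not part of FNLP.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of FNLP): one analyzed token.
final class Morpheme {
    final String surface;   // the token as it appears in the sentence
    final String pos;       // the part-of-speech tag
    final String baseForm;  // assumed equal to the surface form, as in this article

    Morpheme(String surface, String pos) {
        this.surface = surface;
        this.pos = pos;
        this.baseForm = surface;
    }

    // Pairs row 0 (morphemes) with row 1 (tags) of a two-row tag()-style result.
    static List<Morpheme> fromTagged(String[][] tagged) {
        if (tagged == null || tagged.length < 2
                || tagged[0].length != tagged[1].length) {
            throw new IllegalArgumentException("expected two rows of equal length");
        }
        List<Morpheme> result = new ArrayList<>();
        for (int i = 0; i < tagged[0].length; i++) {
            result.add(new Morpheme(tagged[0][i], tagged[1][i]));
        }
        return result;
    }

    @Override
    public String toString() {
        // Tab-separated: surface, part of speech, basic form
        return surface + "\t" + pos + "\t" + baseForm;
    }
}
```

With the code from this article, `Morpheme.fromTagged(factory.tag(message))` would give one entry per token, and printing each entry produces a line containing the morpheme, its part of speech, and its basic form.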