NLP4J [006-034] 100 language processing knocks with NLP4J # 34 "A B" directly code the process to extract "A B" I wrote in. The following part.
// Find "A no B" (noun + particle + noun)
String meishi_a = null;
String no = null;
for (Keyword kwd : kwds) {
    if (meishi_a == null && kwd.getFacet().equals("noun")) {
        // First noun found
        meishi_a = kwd.getLex();
    } //
    else if (meishi_a != null && no == null && kwd.getLex().equals("of")) {
        // Particle found right after the first noun
        no = kwd.getLex();
    } //
    else if (meishi_a != null && no != null && kwd.getFacet().equals("noun")) {
        // Second noun found: print the phrase and start over
        System.err.println(meishi_a + no + kwd.getLex());
        meishi_a = null;
        no = null;
    } //
    else {
        // Anything else breaks the sequence
        meishi_a = null;
        no = null;
    }
}
With this approach, the keyword extraction (annotation) logic is embedded in the calling code and cannot be reused.
Annotator
For this reason, NLP4J provides the Annotator, a mechanism for adding annotations as independent components. The mechanism is simple: implement the interface nlp4j.DocumentAnnotator.
Rewritten as an Annotator, the logic above looks as follows. Instead of just printing "A no B" as a string and stopping there, it adds the phrase to the document as a new keyword. The identifier (facet) "word_nn_no_nn" is set on it so that this type of keyword can be told apart from the others.
package nlp4j.annotator;

import java.util.ArrayList;

import nlp4j.AbstractDocumentAnnotator;
import nlp4j.Document;
import nlp4j.DocumentAnnotator;
import nlp4j.Keyword;
import nlp4j.impl.DefaultKeyword;

/**
 * Extracts "noun + no + noun" sequences as keywords with the facet "word_nn_no_nn".
 * @author Hiroki Oya
 */
public class Nokku34Annotator extends AbstractDocumentAnnotator implements DocumentAnnotator {
    @Override
    public void annotate(Document doc) throws Exception {
        // Collect new keywords here and add them after the loop
        ArrayList<Keyword> newkwds = new ArrayList<>();
        Keyword meishi_a = null;
        Keyword no = null;
        for (Keyword kwd : doc.getKeywords()) {
            if (meishi_a == null && kwd.getFacet().equals("noun")) {
                meishi_a = kwd;
            } //
            else if (meishi_a != null && no == null && kwd.getLex().equals("of")) {
                no = kwd;
            } //
            else if (meishi_a != null && no != null && kwd.getFacet().equals("noun")) {
                // Build the combined keyword on the new object kw, leaving kwd unchanged
                Keyword kw = new DefaultKeyword();
                kw.setLex(meishi_a.getLex() + no.getLex() + kwd.getLex());
                kw.setFacet("word_nn_no_nn");
                kw.setBegin(meishi_a.getBegin()); // starts where the first noun starts
                kw.setEnd(kwd.getEnd()); // ends where the second noun ends
                kw.setStr(meishi_a.getStr() + no.getStr() + kwd.getStr());
                kw.setReading(meishi_a.getReading() + no.getReading() + kwd.getReading());
                newkwds.add(kw);
                meishi_a = null;
                no = null;
            } //
            else {
                meishi_a = null;
                no = null;
            }
        }
        doc.addKeywords(newkwds);
    }
}
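The matching logic inside annotate is a small three-state scan: waiting for a first noun, waiting for the particle, waiting for a second noun. The sketch below isolates that scan so it can be run without NLP4J or the Yahoo! API. The Token class and the extract method are hypothetical stand-ins introduced only for this illustration (they are not part of NLP4J), and the sample tokens are made-up English words. Spaces are inserted between the parts for readability, whereas the annotator above concatenates them directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-alone sketch of the noun + "of" + noun scan.
// Token is a hypothetical stand-in for nlp4j.Keyword, not an NLP4J class.
class NounNoNounSketch {

    static final class Token {
        final String lex;   // base form of the word
        final String facet; // part of speech
        Token(String lex, String facet) {
            this.lex = lex;
            this.facet = facet;
        }
    }

    // Returns every "noun of noun" phrase found by the same branch order as the annotator.
    static List<String> extract(List<Token> tokens) {
        List<String> found = new ArrayList<>();
        Token first = null;
        Token particle = null;
        for (Token t : tokens) {
            if (first == null && t.facet.equals("noun")) {
                first = t;
            } else if (first != null && particle == null && t.lex.equals("of")) {
                particle = t;
            } else if (first != null && particle != null && t.facet.equals("noun")) {
                found.add(first.lex + " " + particle.lex + " " + t.lex);
                first = null;
                particle = null;
            } else {
                // Any other token breaks the sequence
                first = null;
                particle = null;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(
                new Token("palm", "noun"),
                new Token("of", "particle"),
                new Token("hand", "noun"),
                new Token("is", "verb"),
                new Token("warm", "adjective"));
        System.out.println(extract(tokens)); // prints [palm of hand]
    }
}
```

One property of this branch order is worth knowing: a token that breaks the sequence is discarded rather than reconsidered as a new first noun. Two nouns in a row followed by the particle and another noun therefore produce no match, because the second noun only resets the state.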
The logic for extracting "A no B" keywords is now separated out and defined on its own. The calling side looks like this.
package nlp4j.nokku.chap4;

import java.util.List;

import nlp4j.Document;
import nlp4j.DocumentAnnotator;
import nlp4j.DocumentAnnotatorPipeline;
import nlp4j.Keyword;
import nlp4j.crawler.Crawler;
import nlp4j.crawler.TextFileLineSeparatedCrawler;
import nlp4j.impl.DefaultDocumentAnnotatorPipeline;
import nlp4j.annotator.Nokku34Annotator;
import nlp4j.yhoo_jp.YJpMaAnnotator; // assumed package; adjust to your NLP4J version

public class Nokku34b {
    public static void main(String[] args) throws Exception {
        // Use the text file crawler provided by NLP4J
        Crawler crawler = new TextFileLineSeparatedCrawler();
        crawler.setProperty("file", "src/test/resources/nlp4j.crawler/neko_short_utf8.txt");
        crawler.setProperty("encoding", "UTF-8");
        crawler.setProperty("target", "text");
        // Crawl the documents
        List<Document> docs = crawler.crawlDocuments();
        // Define the NLP pipeline (several processing steps connected in sequence)
        DocumentAnnotatorPipeline pipeline = new DefaultDocumentAnnotatorPipeline();
        {
            // Annotator that uses the Yahoo! Japan morphological analysis API
            DocumentAnnotator annotator = new YJpMaAnnotator();
            pipeline.add(annotator);
        }
        {
            // Extract "noun + no + noun" as keywords with the facet "word_nn_no_nn"
            Nokku34Annotator annotator = new Nokku34Annotator(); // <- only this part is specific to knock 34
            pipeline.add(annotator); // <- only this part is specific to knock 34
        }
        // Run the annotation
        pipeline.annotate(docs);
        // Print only the keywords added by Nokku34Annotator
        for (Document doc : docs) {
            for (Keyword kwd : doc.getKeywords("word_nn_no_nn")) {
                System.err.println(kwd.getStr());
            }
        }
    }
}
On the calling side, the process of extracting "A no B" is now only two lines!
// Extract "noun + no + noun" as keywords with the facet "word_nn_no_nn"
Nokku34Annotator annotator = new Nokku34Annotator();
pipeline.add(annotator);
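The reason a new step costs only two lines is the pipeline pattern itself: each step receives the document, reads what earlier steps added, and appends its own results. The following is a minimal sketch of that pattern under stated assumptions. Doc, Step, Pipeline and run are hypothetical types written for this illustration, not NLP4J's classes, and the whitespace tokenizer and bigram step are placeholders for real annotators.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative pipeline pattern only; none of these types are NLP4J classes.
class PipelineSketch {

    // Hypothetical document: raw text plus the keywords collected so far
    static final class Doc {
        final String text;
        final List<String> keywords = new ArrayList<>();
        Doc(String text) {
            this.text = text;
        }
    }

    // One processing step
    interface Step {
        void annotate(Doc doc);
    }

    // Runs the registered steps in the order they were added
    static final class Pipeline {
        private final List<Step> steps = new ArrayList<>();
        void add(Step step) {
            steps.add(step);
        }
        void annotate(Doc doc) {
            for (Step step : steps) {
                step.annotate(doc);
            }
        }
    }

    static List<String> run(String text) {
        Pipeline pipeline = new Pipeline();
        // Step 1: naive whitespace tokenizer standing in for morphological analysis
        pipeline.add(doc -> {
            for (String word : doc.text.split("\\s+")) {
                doc.keywords.add(word);
            }
        });
        // Step 2: builds on step 1 by adding adjacent word pairs
        pipeline.add(doc -> {
            // Copy first: appending to a list while iterating over it would fail
            List<String> words = new ArrayList<>(doc.keywords);
            for (int i = 0; i + 1 < words.size(); i++) {
                doc.keywords.add(words.get(i) + "_" + words.get(i + 1));
            }
        });
        Doc doc = new Doc(text);
        pipeline.annotate(doc);
        return doc.keywords;
    }

    public static void main(String[] args) {
        System.out.println(run("the cat sleeps")); // prints [the, cat, sleeps, the_cat, cat_sleeps]
    }
}
```

Step 2 copies the keyword list before appending to it. Nokku34Annotator does something similar when it collects results in newkwds and calls doc.addKeywords only after its loop, which plausibly serves the same purpose of not modifying the list being iterated.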
By defining your own Annotators in this way, you can keep extending your natural language processing.
Result
His palm
On the palm
Student's face
Should face
In the middle of the face
In the hole
With NLP4J, you can easily process natural language in Java!
https://www.nlp4j.org/