Points to be improved

In the earlier article, NLP4J [006-034] 100 language processing knocks with NLP4J #34 "A's B", I wrote the process that extracts "A's B" (noun + の + noun) directly in the main program. This is the relevant part:

// Find "A の B" ("A's B"): noun + の + noun
// "名詞" is the part-of-speech facet for nouns; "の" is the particle between A and B
String meishi_a = null;
String no = null;

for (Keyword kwd : kwds) {
	if (meishi_a == null && kwd.getFacet().equals("名詞")) {
		meishi_a = kwd.getLex();
	} //
	else if (meishi_a != null && no == null && kwd.getLex().equals("の")) {
		no = kwd.getLex();
	} //
	else if (meishi_a != null && no != null && kwd.getFacet().equals("名詞")) {
		System.err.println(meishi_a + no + kwd.getLex());
		meishi_a = null;
		no = null;
	} //
	else {
		meishi_a = null;
		no = null;
	}
}

When the keyword extraction (annotation) is written inline like this, the logic cannot be reused.
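As a minimal sketch of the scan on its own, the loop can be pulled out into a method that works on plain token objects. The `NounNoNounSketch` class, its `Token` type, and the `extract` signature are hypothetical stand-ins for illustration and are not part of NLP4J. The demo uses romanized placeholders ("noun", "no"); with the real analyzer output you would pass 名詞 and の instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NounNoNounSketch {

	/** Hypothetical stand-in for a keyword: part-of-speech facet plus lemma (lex). */
	static final class Token {
		final String facet;
		final String lex;

		Token(String facet, String lex) {
			this.facet = facet;
			this.lex = lex;
		}
	}

	/**
	 * Same three-state scan as the loop above: wait for a noun, then the particle,
	 * then a second noun. Any other token resets the state.
	 */
	static List<String> extract(List<Token> tokens, String nounFacet, String particle) {
		List<String> found = new ArrayList<>();
		String nounA = null;
		String no = null;
		for (Token t : tokens) {
			if (nounA == null && t.facet.equals(nounFacet)) {
				nounA = t.lex;
			} else if (nounA != null && no == null && t.lex.equals(particle)) {
				no = t.lex;
			} else if (nounA != null && no != null && t.facet.equals(nounFacet)) {
				found.add(nounA + no + t.lex);
				nounA = null;
				no = null;
			} else {
				nounA = null;
				no = null;
			}
		}
		return found;
	}

	public static void main(String[] args) {
		List<Token> tokens = Arrays.asList(
				new Token("noun", "shosei"),
				new Token("particle", "no"),
				new Token("noun", "kao"),
				new Token("particle", "wo"),
				new Token("verb", "miru"));
		System.out.println(extract(tokens, "noun", "no")); // prints [shoseinokao]
	}
}
```

A method like this can be called from several programs, but each caller still has to wire it in by hand and decide what to do with the returned strings.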

Annotator

NLP4J therefore provides the Annotator, a mechanism for adding annotations as an independent component. The mechanism is simple: implement the interface nlp4j.DocumentAnnotator.

If you turn the logic above into an Annotator, it looks like the following. Instead of just printing "A's B" as a string, the Annotator adds it to the document as a new keyword. The keyword is given an identifier (a facet), "word_nn_no_nn", so that this type of keyword can be told apart from the others.

package nlp4j.annotator;
import java.util.ArrayList;
import nlp4j.AbstractDocumentAnnotator;
import nlp4j.Document;
import nlp4j.DocumentAnnotator;
import nlp4j.Keyword;
import nlp4j.impl.DefaultKeyword;

/**
 * Extracts "noun + の + noun" ("A's B") as a keyword with the facet "word_nn_no_nn".
 * @author Hiroki Oya
 */
public class Nokku34Annotator extends AbstractDocumentAnnotator implements DocumentAnnotator {
	@Override
	public void annotate(Document doc) throws Exception {
		ArrayList<Keyword> newkwds = new ArrayList<>();
		Keyword meishi_a = null;
		Keyword no = null;
		// "名詞" is the part-of-speech facet for nouns; "の" is the particle between A and B
		for (Keyword kwd : doc.getKeywords()) {
			if (meishi_a == null && kwd.getFacet().equals("名詞")) {
				meishi_a = kwd;
			} //
			else if (meishi_a != null && no == null && kwd.getLex().equals("の")) {
				no = kwd;
			} //
			else if (meishi_a != null && no != null && kwd.getFacet().equals("名詞")) {
				// Build a new keyword spanning A, の and B; the existing keywords are left unchanged
				Keyword kw = new DefaultKeyword();
				kw.setLex(meishi_a.getLex() + no.getLex() + kwd.getLex());
				kw.setFacet("word_nn_no_nn");
				kw.setBegin(meishi_a.getBegin());
				kw.setEnd(kwd.getEnd());
				kw.setStr(meishi_a.getStr() + no.getStr() + kwd.getStr());
				kw.setReading(meishi_a.getReading() + no.getReading() + kwd.getReading());
				newkwds.add(kw);
				meishi_a = null;
				no = null;
			} //
			else {
				meishi_a = null;
				no = null;
			}
		}
		doc.addKeywords(newkwds);
	}
}

The logic that extracts the "A's B" keyword is now defined separately from the main program.

Using Annotator

package nlp4j.nokku.chap4;

import java.util.List;
import nlp4j.Document;
import nlp4j.DocumentAnnotator;
import nlp4j.DocumentAnnotatorPipeline;
import nlp4j.Keyword;
import nlp4j.crawler.Crawler;
import nlp4j.crawler.TextFileLineSeparatedCrawler;
import nlp4j.impl.DefaultDocumentAnnotatorPipeline;
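import nlp4j.annotator.Nokku34Annotator;
import nlp4j.yhoo_jp.YJpMaAnnotator; // needed for YJpMaAnnotator below; package name assumed from the NLP4J Yahoo! Japan module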

public class Nokku34b {

	public static void main(String[] args) throws Exception {

		//Use the text file crawler provided by NLP4J
		Crawler crawler = new TextFileLineSeparatedCrawler();
		crawler.setProperty("file", "src/test/resources/nlp4j.crawler/neko_short_utf8.txt");
		crawler.setProperty("encoding", "UTF-8");
		crawler.setProperty("target", "text");

		// Crawl the documents
		List<Document> docs = crawler.crawlDocuments();

		// Define the NLP pipeline (several processing steps connected in sequence)
		DocumentAnnotatorPipeline pipeline = new DefaultDocumentAnnotatorPipeline();
		{
			// Annotator that uses the Yahoo! Japan morphological analysis API
			DocumentAnnotator annotator = new YJpMaAnnotator();
			pipeline.add(annotator);
		}
		{
			// Extract "noun + の + noun" as a keyword with the facet "word_nn_no_nn"
			Nokku34Annotator annotator = new Nokku34Annotator(); // ← the only part specific to Knock #34
			pipeline.add(annotator); // ← the only part specific to Knock #34
		}
		// Run the annotation processing
		pipeline.annotate(docs);

		for (Document doc : docs) {
			for (Keyword kwd : doc.getKeywords("word_nn_no_nn")) {
				System.err.println(kwd.getStr());
			}
		}
	}
}

Extracting "A's B" now takes only two lines in the main program!

 // Extract "noun + の + noun" as a keyword with the facet "word_nn_no_nn"
 Nokku34Annotator annotator = new Nokku34Annotator();
 pipeline.add(annotator);

In the same way, you can define as many Annotators of your own as you need and extend your natural language processing further.
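To illustrate why chaining Annotators scales, here is a minimal, NLP4J-free sketch of the same idea. Every type in it (`Doc`, `Annotator`, `Pipeline`) is a hypothetical stand-in and not the NLP4J API. The point is only that each stage reads keywords left by earlier stages and adds its own under a new facet.

```java
import java.util.ArrayList;
import java.util.List;

class PipelineSketch {

	/** Hypothetical stand-in for a document: raw text plus {facet, lex} keyword pairs. */
	static final class Doc {
		final String text;
		final List<String[]> keywords = new ArrayList<>();

		Doc(String text) {
			this.text = text;
		}

		/** Returns the lex of every keyword carrying the given facet, in insertion order. */
		List<String> lexesOf(String facet) {
			List<String> out = new ArrayList<>();
			for (String[] kw : keywords) {
				if (kw[0].equals(facet)) {
					out.add(kw[1]);
				}
			}
			return out;
		}
	}

	/** Hypothetical stand-in for the annotator contract: read a document, add keywords to it. */
	interface Annotator {
		void annotate(Doc doc);
	}

	/** Runs the annotators in the order they were added. */
	static final class Pipeline {
		private final List<Annotator> annotators = new ArrayList<>();

		void add(Annotator annotator) {
			annotators.add(annotator);
		}

		void annotate(Doc doc) {
			for (Annotator annotator : annotators) {
				annotator.annotate(doc);
			}
		}
	}

	/** Stage 1: split the text on whitespace and tag each piece with the facet "token". */
	static final Annotator TOKENIZER = doc -> {
		for (String word : doc.text.trim().split("\\s+")) {
			doc.keywords.add(new String[] { "token", word });
		}
	};

	/** Stage 2: combine neighbouring "token" keywords into "bigram" keywords. */
	static final Annotator BIGRAMS = doc -> {
		List<String> tokens = doc.lexesOf("token");
		for (int i = 0; i + 1 < tokens.size(); i++) {
			doc.keywords.add(new String[] { "bigram", tokens.get(i) + " " + tokens.get(i + 1) });
		}
	};

	public static void main(String[] args) {
		Pipeline pipeline = new Pipeline();
		pipeline.add(TOKENIZER);
		pipeline.add(BIGRAMS);

		Doc doc = new Doc("I am a cat");
		pipeline.annotate(doc);
		System.out.println(doc.lexesOf("bigram")); // prints [I am, am a, a cat]
	}
}
```

Because the second stage depends only on the "token" facet, it could be swapped for or combined with other stages without touching the first one, which is the same separation the Nokku34Annotator gets from the morphological analysis step.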

Result

His palm
On the palm
Student's face
Should face
In the middle of the face
In the hole

Summary

With NLP4J, you can easily process natural language in Java!

Project URL

https://www.nlp4j.org/

[JAVA] NLP4J [006-034b] Try to make an Annotator of 100 language processing knock # 34 "A's B" with NLP4J
