[JAVA] NLP4J [006-034b] Try to make an Annotator of 100 language processing knock # 34 "A's B" with NLP4J

Points to be improved

NLP4J [006-034] 100 language processing knocks with NLP4J # 34 "A B" directly code the process to extract "A B" I wrote in. The following part.

//Find "A to B"
String meishi_a = null;
String no = null;

for (Keyword kwd : kwds) {
	if (meishi_a == null && kwd.getFacet().equals("noun")) {
		meishi_a = kwd.getLex();
	} //
	else if (meishi_a != null && no == null && kwd.getLex().equals("of")) {
		no = kwd.getLex();
	} //
	else if (meishi_a != null && no != null && kwd.getFacet().equals("noun")) {
		System.err.println(meishi_a + no + kwd.getLex());
		meishi_a = null;
		no = null;
	} //
	else {
		meishi_a = null;
		no = null;
	}
}

Written this way, the keyword extraction (annotation) logic is buried inside one program and cannot be reused elsewhere.

Annotator

Therefore, NLP4J provides Annotator, which is a mechanism to add Annotation independently. The mechanism is simple: implement Interface nlp4j.DocumentAnnotator.

If you prepare the above logic as Annotator code, it will be as follows. "A's B" is simply output as a string and added as a new keyword, not the end. An identifier (= facet: facet) called "word_nn_no_nn" is set so that the type of keyword can be identified.

package nlp4j.annotator;
import java.util.ArrayList;
import nlp4j.AbstractDocumentAnnotator;
import nlp4j.Document;
import nlp4j.DocumentAnnotator;
import nlp4j.Keyword;
import nlp4j.impl.DefaultKeyword;

/**
 * Extracts "noun + の + noun" as a keyword with the facet "word_nn_no_nn".
 * @author Hiroki Oya
 */
public class Nokku34Annotator extends AbstractDocumentAnnotator implements DocumentAnnotator {
	@Override
	public void annotate(Document doc) throws Exception {
		ArrayList<Keyword> newkwds = new ArrayList<>();
		Keyword meishi_a = null;
		Keyword no = null;
		for (Keyword kwd : doc.getKeywords()) {
			// Assumes the analyzer reports Japanese values: facet "名詞" (noun), lex "の".
			if (meishi_a == null && kwd.getFacet().equals("名詞")) {
				meishi_a = kwd;
			} //
			else if (meishi_a != null && no == null && kwd.getLex().equals("の")) {
				no = kwd;
			} //
			else if (meishi_a != null && no != null && kwd.getFacet().equals("名詞")) {
				// Fill in the NEW keyword (kw); the loop variable kwd must stay unchanged.
				Keyword kw = new DefaultKeyword();
				kw.setLex(meishi_a.getLex() + no.getLex() + kwd.getLex());
				kw.setFacet("word_nn_no_nn");
				kw.setBegin(meishi_a.getBegin());
				kw.setEnd(kwd.getEnd());
				kw.setStr(meishi_a.getStr() + no.getStr() + kwd.getStr());
				kw.setReading(meishi_a.getReading() + no.getReading() + kwd.getReading());
				newkwds.add(kw);
				meishi_a = null;
				no = null;
			} //
			else {
				meishi_a = null;
				no = null;
			}
		}
		doc.addKeywords(newkwds);
	}
}

Now we have separated and defined the logic to extract the keyword "A to B".
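The scanning logic can also be tried without NLP4J or the morphological analysis API. The following is only a minimal sketch under stated assumptions: `Token` and `extract` are hypothetical stand-ins (not NLP4J classes), the noun facet and the particle are passed in as parameters, and the sample tokens are romanized placeholders for Japanese words.

```java
import java.util.ArrayList;
import java.util.List;

public class NounParticleNounSketch {

    /** Hypothetical stand-in for a keyword: surface form, facet, character offsets. */
    static final class Token {
        final String lex;
        final String facet;
        final int begin;
        final int end;

        Token(String lex, String facet, int begin, int end) {
            this.lex = lex;
            this.facet = facet;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Scans the tokens once and returns one merged token per "noun, particle, noun" run. */
    static List<Token> extract(List<Token> tokens, String nounFacet, String particle) {
        List<Token> found = new ArrayList<>();
        Token a = null; // noun A, once seen
        Token p = null; // particle, once seen directly after A
        for (Token t : tokens) {
            if (a == null && t.facet.equals(nounFacet)) {
                a = t;
            } else if (a != null && p == null && t.lex.equals(particle)) {
                p = t;
            } else if (a != null && p != null && t.facet.equals(nounFacet)) {
                // Build a new token; the input tokens are left untouched.
                found.add(new Token(a.lex + p.lex + t.lex, "word_nn_no_nn", a.begin, t.end));
                a = null;
                p = null;
            } else {
                // Anything else breaks the pattern.
                a = null;
                p = null;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Romanized placeholders; a real analyzer would supply Japanese tokens.
        List<Token> tokens = List.of(
                new Token("kare", "noun", 0, 4),
                new Token("no", "particle", 4, 6),
                new Token("tenohira", "noun", 6, 14),
                new Token("ni", "particle", 14, 16));
        for (Token t : extract(tokens, "noun", "no")) {
            System.out.println(t.lex + " [" + t.begin + "," + t.end + ")"); // prints karenotenohira [0,14)
        }
    }
}
```

As in the annotator, the merged token takes its begin offset from A and its end offset from B, and the three surface forms are joined without a separator, which suits Japanese text.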

Using the Annotator

package nlp4j.nokku.chap4;

import java.util.List;
import nlp4j.Document;
import nlp4j.DocumentAnnotator;
import nlp4j.DocumentAnnotatorPipeline;
import nlp4j.Keyword;
import nlp4j.crawler.Crawler;
import nlp4j.crawler.TextFileLineSeparatedCrawler;
import nlp4j.impl.DefaultDocumentAnnotatorPipeline;
import nlp4j.annotator.Nokku34Annotator;
import nlp4j.yhoo_jp.YJpMaAnnotator; // assumed package name; check your NLP4J version

public class Nokku34b {

	public static void main(String[] args) throws Exception {

		//Use the text file crawler provided by NLP4J
		Crawler crawler = new TextFileLineSeparatedCrawler();
		crawler.setProperty("file", "src/test/resources/nlp4j.crawler/neko_short_utf8.txt");
		crawler.setProperty("encoding", "UTF-8");
		crawler.setProperty("target", "text");

		//Crawl the documents
		List<Document> docs = crawler.crawlDocuments();

		//Definition of NLP pipeline (process by connecting multiple processes as a pipeline)
		DocumentAnnotatorPipeline pipeline = new DefaultDocumentAnnotatorPipeline();
		{
			// Annotator that uses the Yahoo! Japan morphological analysis API
			DocumentAnnotator annotator = new YJpMaAnnotator();
			pipeline.add(annotator);
		}
		{
			// Extract "noun + の + noun" as a keyword with the facet "word_nn_no_nn".
			Nokku34Annotator annotator = new Nokku34Annotator(); // <- the only part specific to knock #34
			pipeline.add(annotator); // <- the only part specific to knock #34
		}
		//Execution of annotation processing
		pipeline.annotate(docs);

		for (Document doc : docs) {
			for (Keyword kwd : doc.getKeywords("word_nn_no_nn")) {
				System.err.println(kwd.getStr());
			}
		}
	}
}

The process of extracting "B of A" is now only two lines!

 //"Noun noun" to "word"_nn_no_Extract as the "nn" keyword.
 Nokku34Annotator annotator = new Nokku34Annotator();
 pipeline.add(annotator);

In this way, you can define as many Annotators of your own as you need and keep extending your natural language processing.
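The idea of chaining several small annotators can be illustrated without NLP4J. The sketch below is not the NLP4J API: `Doc`, `Kw`, `Annotator`, `WordAnnotator`, `LongWordAnnotator` and `run` are all hypothetical names made up for this example. It only shows the pattern, in which each annotator reads the keywords added by earlier steps and appends new ones under its own facet.

```java
import java.util.ArrayList;
import java.util.List;

public class PipelineSketch {

    /** Hypothetical keyword: a facet (type tag) plus the matched text. */
    static final class Kw {
        final String facet;
        final String lex;
        Kw(String facet, String lex) { this.facet = facet; this.lex = lex; }
    }

    /** Hypothetical document: raw text plus the keywords added so far. */
    static final class Doc {
        final String text;
        final List<Kw> keywords = new ArrayList<>();
        Doc(String text) { this.text = text; }

        /** Returns the text of every keyword carrying the given facet. */
        List<String> lexes(String facet) {
            List<String> out = new ArrayList<>();
            for (Kw k : keywords) {
                if (k.facet.equals(facet)) out.add(k.lex);
            }
            return out;
        }
    }

    /** Hypothetical annotator contract: read the document, add keywords to it. */
    interface Annotator {
        void annotate(Doc doc);
    }

    /** Step 1: split on whitespace and tag every piece as "word". */
    static final class WordAnnotator implements Annotator {
        public void annotate(Doc doc) {
            for (String w : doc.text.trim().split("\\s+")) {
                if (!w.isEmpty()) doc.keywords.add(new Kw("word", w));
            }
        }
    }

    /** Step 2: builds on step 1 and tags words of at least minLength characters as "long_word". */
    static final class LongWordAnnotator implements Annotator {
        private final int minLength;
        LongWordAnnotator(int minLength) { this.minLength = minLength; }

        public void annotate(Doc doc) {
            // Collect first, then add: the keyword list must not change while it is iterated.
            List<Kw> added = new ArrayList<>();
            for (Kw k : doc.keywords) {
                if (k.facet.equals("word") && k.lex.length() >= minLength) {
                    added.add(new Kw("long_word", k.lex));
                }
            }
            doc.keywords.addAll(added);
        }
    }

    /** Runs the annotators in order on one document. */
    static void run(List<Annotator> pipeline, Doc doc) {
        for (Annotator a : pipeline) a.annotate(doc);
    }

    public static void main(String[] args) {
        Doc doc = new Doc("I am a cat without a name");
        run(List.of(new WordAnnotator(), new LongWordAnnotator(4)), doc);
        System.out.println(doc.lexes("long_word")); // prints [without, name]
    }
}
```

Because each step only adds keywords under its own facet, a later step (or the final output loop) can pick out exactly the keywords it cares about, just as the main program above filters on "word_nn_no_nn".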

Result (English glosses of the Japanese phrases the program prints)

His palm
On the palm
Student's face
Should face
In the middle of the face
In the hole

Summary

With NLP4J, you can easily process natural language in Java!

Project URL

https://www.nlp4j.org/

