Parsing the COTOHA API in Java


Three steps are required to use a REST API such as the COTOHA API.

  1. Build an HTTP request
  2. Send the HTTP request (handling any errors that occur)
  3. Receive the HTTP response

Of these, the previous article (Try using COTOHA API parsing in Java) covered steps 1 (build) and 2 (send), but left the response as raw JSON. In this article I map that JSON onto a Java class.
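As a reference, steps 1 through 3 can be sketched with the JDK's built-in java.net.http client (Java 11+). This is a minimal illustration, not the code from the previous article; the endpoint URL and token handling are assumptions you would replace with your own COTOHA account values:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CotohaRequestSketch {

	// Assumed endpoint for illustration; take the base URL and access token
	// from your own COTOHA account settings.
	static final String PARSE_URL = "https://api.ce-cotoha.com/api/dev/nlp/v1/parse";

	// 1. Build the HTTP request
	static HttpRequest buildRequest(String accessToken, String jsonBody) {
		return HttpRequest.newBuilder() //
				.uri(URI.create(PARSE_URL)) //
				.header("Content-Type", "application/json;charset=UTF-8") //
				.header("Authorization", "Bearer " + accessToken) //
				.POST(HttpRequest.BodyPublishers.ofString(jsonBody)) //
				.build();
	}

	// 2. Send it, and 3. receive the response body as a JSON string.
	// IOException / InterruptedException thrown by send() are the error
	// cases to handle.
	static String call(String accessToken, String jsonBody) throws Exception {
		HttpClient client = HttpClient.newHttpClient();
		HttpResponse<String> response = client.send( //
				buildRequest(accessToken, jsonBody), //
				HttpResponse.BodyHandlers.ofString());
		return response.body();
	}
}
```

The returned string is the JSON that the rest of this article parses.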

Java class

NLP4J, which I develop as a hobby, already has a model class for dependencies, DefaultKeywordWithDependency (/nlp4j/blob/master/nlp4j/nlp4j-core/src/main/java/nlp4j/impl/), so I will map the response onto that class. (Because the result is a tree, it cannot simply be mapped to a flat POJO class.)



The JSON of the parsing result looks like the following. The parsed result is tree-structured data, but as you can see, the JSON itself is not shaped like a tree.

	{
		"result": [
			{
				"chunk_info": {
					"id": 0, "head": 2, "dep": "D", "chunk_head": 0, "chunk_func": 1,
					"links": []
				},
				"tokens": [
					{
						"id": 0, "form": "today", "kana": "today", "lemma": "today", "pos": "noun",
						"features": ["Date and time"],
						"dependency_labels": [{"token_id": 1, "label": "case"}],
						"attributes": {}
					},
					{
						"id": 1, "form": "Is", "kana": "C", "lemma": "Is", "pos": "Conjunctive particles",
						"features": [],
						"attributes": {}
					}
				]
			}
		],
		"status": 0,
		"message": ""
	}
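As the snippet shows, each chunk_info carries its own id and the id of its head, so the tree has to be reassembled from parent pointers on the client side. A minimal stdlib-only sketch of that reassembly (a toy Chunk type for illustration, not the COTOHA or NLP4J model classes):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration: COTOHA returns the dependency tree as a flat list of
// (id, head) pairs, so the client must reassemble the tree itself.
public class FlatToTree {

	static class Chunk {
		final int id;
		final int head; // id of the chunk this one depends on; -1 for the root
		final List<Chunk> children = new ArrayList<>();

		Chunk(int id, int head) {
			this.id = id;
			this.head = head;
		}
	}

	// Links every chunk to its head and returns the root (head == -1).
	static Chunk assemble(Chunk[] flat) {
		Map<Integer, Chunk> byId = new HashMap<>();
		for (Chunk c : flat) {
			byId.put(c.id, c);
		}
		Chunk root = null;
		for (Chunk c : flat) {
			if (c.head == -1) {
				root = c; // no head: this chunk is the root
			} else {
				byId.get(c.head).children.add(c); // attach to its head
			}
		}
		return root;
	}
}
```

The parser class below does essentially this, except that it works at the token level and keeps NLP4J keyword objects in the tree nodes.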

Below is the parser class. (All of the code will be published on the Maven Repository and GitHub.)

package nlp4j.cotoha;

import java.lang.invoke.MethodHandles;
import java.util.ArrayList;
import java.util.HashMap;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import com.google.gson.Gson;
import com.google.gson.JsonArray;
import com.google.gson.JsonObject;

import nlp4j.Keyword;
import nlp4j.impl.DefaultKeyword;
import nlp4j.impl.DefaultKeywordWithDependency;

/**
 * COTOHA API Parsing V1 Response JSON Parser
 *
 * @author Hiroki Oya
 * @since
 */
public class CotohaNlpV1ResponseHandler {

	static private final Logger logger = LogManager.getLogger(MethodHandles.lookup().lookupClass());

	/**
	 * Keywords extracted as the roots of the syntax trees
	 */
	ArrayList<DefaultKeywordWithDependency> roots = new ArrayList<>();

	/**
	 * List of keywords
	 */
	ArrayList<Keyword> keywords = new ArrayList<>();

	/**
	 * Keywords built from the chunk links (dependency sources)
	 */
	ArrayList<Keyword> chunkLinkKeywords = new ArrayList<>();

	/**
	 * @return chunk-link keywords
	 */
	public ArrayList<Keyword> getChunkLinkKeywords() {
		return chunkLinkKeywords;
	}

	ArrayList<String> chunkLinks = new ArrayList<>();

	JsonArray arrChunkLinks = new JsonArray();

	/**
	 * Map: token_id --> Keyword
	 */
	HashMap<String, DefaultKeywordWithDependency> mapTokenidKwd = new HashMap<>();

	/**
	 * Map: id --> Keyword
	 */
	HashMap<String, DefaultKeywordWithDependency> mapIdKwd = new HashMap<>();

	/**
	 * token id --> sentence index
	 */
	HashMap<Integer, Integer> idSentenceMap = new HashMap<>();

	/**
	 * Dependency pattern keywords
	 */
	ArrayList<DefaultKeyword> patternKeywords = new ArrayList<>();

	/**
	 * @return chunk links as a JSON array
	 */
	public JsonArray getArrChunkLinks() {
		return arrChunkLinks;
	}

	/**
	 * @return chunk links
	 */
	public ArrayList<String> getChunkLinks() {
		return chunkLinks;
	}

	/**
	 * @return map of id to keyword
	 */
	public HashMap<String, DefaultKeywordWithDependency> getIdMapKwd() {
		return mapIdKwd;
	}

	/**
	 * @return mapping of morpheme id to sentence number
	 */
	public HashMap<Integer, Integer> getIdSentenceMap() {
		return idSentenceMap;
	}

	/**
	 * @return word sequence
	 */
	public ArrayList<Keyword> getKeywords() {
		return keywords;
	}

	/**
	 * @return map of token_id to keyword
	 */
	public HashMap<String, DefaultKeywordWithDependency> getMapKwd() {
		return mapTokenidKwd;
	}

	/**
	 * @return dependency pattern keywords
	 */
	public ArrayList<DefaultKeyword> getPatternKeywords() {
		return patternKeywords;
	}

	/**
	 * @return extracted dependency root keywords
	 */
	public ArrayList<DefaultKeywordWithDependency> getRoots() {
		return roots;
	}

	/**
	 * @param json COTOHA API Parsing Response JSON
	 */
	public void parse(String json) {
		// JSON Parser
		Gson gson = new Gson();
		JsonObject result = gson.fromJson(json, JsonObject.class);
		// order of appearance in the text
		int sequence = 0;
		// {
		// "result":[
		// _{"chunk_info":{...},"tokens":[{...},{...},{...}]},
		// _{"chunk_info":{...},"tokens":[{...},{...},{...}]},
		// _{"chunk_info":{...},"tokens":[{...},{...},{...}]}
		// ]
		// }
		// objects that combine chunk_info and tokens
		JsonArray arrChunkTokens = result.getAsJsonArray("result");
		int idxBegin = 0;
		int idxSentence = 0;
		// FOR EACH (chunk_tokens)
		for (int idxChunkTokens = 0; idxChunkTokens < arrChunkTokens.size(); idxChunkTokens++) {
			JsonObject chunk_token = arrChunkTokens.get(idxChunkTokens).getAsJsonObject();
			// 1. chunk_info: chunk (clause) information object
			JsonObject chunk_info = chunk_token.get("chunk_info").getAsJsonObject();
			logger.debug("chunk_info: " + chunk_info);
			int chunk_head = -1;
			// chunk number (0-origin)
			String chunk_id = "" + chunk_info.get("id").getAsInt();
			// number of the chunk this one depends on
			chunk_head = chunk_info.get("head").getAsInt();
			// array of dependency-source information
			JsonArray links = chunk_info.get("links").getAsJsonArray();
			for (int n = 0; n < links.size(); n++) {
				JsonObject link = links.get(n).getAsJsonObject();
				int link_link = link.get("link").getAsInt();
				String link_label = link.get("label").getAsString();
				chunkLinks.add(chunk_id + "/" + link_label + "/" + link_link);
			}

			// 2. tokens: morpheme information objects
			JsonArray tokens = chunk_token.get("tokens").getAsJsonArray();

			// FOR EACH (tokens) morpheme information object
			for (int idxTokens = 0; idxTokens < tokens.size(); idxTokens++) {

				JsonObject token = tokens.get(idxTokens).getAsJsonObject();
				logger.debug("token: " + token);

				// X-Y style ID: which morpheme of which chunk
				String token_id = idxChunkTokens + "-" + idxTokens;
				logger.debug("token_id: " + token_id);
				String token_pos = token.get("pos") != null ? token.get("pos").getAsString() : null;
				String token_lemma = token.get("lemma") != null ? token.get("lemma").getAsString() : null;
				String token_form = token.get("form") != null ? token.get("form").getAsString() : null;
				String token_kana = token.get("kana") != null ? token.get("kana").getAsString() : null;
				// Is this the last of the tokens? If so, the dependency target is chunk_head
				boolean isLastOfTokens = (idxTokens == tokens.size() - 1);
				if (isLastOfTokens) {
					logger.debug("last token: chunk_head: " + chunk_head);
				}
				// dependency keyword (defined in nlp4j)
				DefaultKeywordWithDependency kw = new DefaultKeywordWithDependency();
				// sequence number in order of appearance in the text
				kw.setSequence(sequence);
				sequence++;
				// start position
				kw.setBegin(idxBegin);
				// lemma: dictionary form
				if (token_lemma != null) {
					kw.setLex(token_lemma);
				} else {
					logger.warn("lemma is null");
				}
				int intId = token.get("id").getAsInt();
				String id = "" + intId;
				idSentenceMap.put(intId, idxSentence);
				// whether this token ends a sentence
				boolean isLastOfSentence = (chunk_head == -1 && idxTokens == tokens.size() - 1) //
						|| (token_pos != null && token_pos.equals("Kuten"));
				// IF (end of sentence)
				if (isLastOfSentence) {
					// increment the sentence number
					idxSentence++;
				}
				// set facet: part of speech
				kw.setFacet(token_pos);
				// set str: surface form
				kw.setStr(token_form);
				kw.setEnd(idxBegin + kw.getStr().length());
				idxBegin += kw.getStr().length();

				// set reading
				kw.setReading(token_kana);
				mapTokenidKwd.put(token_id, kw);
				mapIdKwd.put(id, kw);
				keywords.add(kw);

				// dependency_labels: array of dependency information
				if (token.get("dependency_labels") != null) {
					// array of dependency information
					JsonArray arrDependency = token.get("dependency_labels").getAsJsonArray();
					for (int n = 0; n < arrDependency.size(); n++) {
						// dependency information
						JsonObject objDependency = arrDependency.get(n).getAsJsonObject();
						String dependency_token_id = "" + objDependency.get("token_id").getAsInt();
						// set the dependency key on the keyword
						kw.setDependencyKey(dependency_token_id);
					}
				}
			} // END OF FOR EACH (tokens)
		} // END OF FOR EACH (chunk_tokens)

		// <Assembling the tree>

		// FOR EACH (chunk_tokens)
		for (int idxChunkTokens = 0; idxChunkTokens < arrChunkTokens.size(); idxChunkTokens++) {
			JsonObject chunk_token = arrChunkTokens.get(idxChunkTokens).getAsJsonObject();
			// 2. tokens
			JsonArray tokens = chunk_token.get("tokens").getAsJsonArray();
			for (int idxTokens = 0; idxTokens < tokens.size(); idxTokens++) {
				JsonObject token = tokens.get(idxTokens).getAsJsonObject();
				String id = "" + token.get("id").getAsInt();
				DefaultKeywordWithDependency kw = mapIdKwd.get(id);
				// dependency_labels
				if (token.get("dependency_labels") != null) {
					JsonArray arr_dependency_labels = token.get("dependency_labels").getAsJsonArray();
					for (int n = 0; n < arr_dependency_labels.size(); n++) {
						JsonObject dependency_label = arr_dependency_labels.get(n).getAsJsonObject();
						String childID = "" + dependency_label.get("token_id").getAsInt();
						String labelDependency = dependency_label.get("label").getAsString();

						// check whether the dependency straddles sentences
						int sentence1 = idSentenceMap.get(token.get("id").getAsInt());
						int sentence2 = idSentenceMap.get(dependency_label.get("token_id").getAsInt());

						// only link dependencies within the same sentence
						if (mapIdKwd.get(childID) != null && (sentence1 == sentence2)) {
							// parent and child are reversed between Japanese and English
							DefaultKeywordWithDependency kw1Child = mapIdKwd.get(childID);
							DefaultKeywordWithDependency kw2Parent = kw;
							kw1Child.setRelation(labelDependency);
							kw2Parent.addChild(kw1Child);

							if (kw1Child.getBegin() < kw2Parent.getBegin()) {
								DefaultKeyword kwd = new DefaultKeyword();
								kwd.setLex(kw1Child.getLex() + " ... " + kw2Parent.getLex());
								kwd.setFacet(labelDependency);
								kwd.setBegin(kw1Child.getBegin());
								kwd.setEnd(kw2Parent.getEnd());
								patternKeywords.add(kwd);
							} else {
								DefaultKeyword kwd = new DefaultKeyword();
								kwd.setLex(kw2Parent.getLex() + " ... " + kw1Child.getLex());
								kwd.setFacet(labelDependency);
								kwd.setBegin(kw2Parent.getBegin());
								kwd.setEnd(kw1Child.getEnd());
								patternKeywords.add(kwd);
							}

						} //
					}
				}
			}
		} // END OF FOR EACH (chunk_tokens)

		for (String link : chunkLinks) {
			String id1 = link.split("/")[0];
			String relation = link.split("/")[1];
			String id2 = link.split("/")[2];
			Keyword kwd1 = mapTokenidKwd.get(id1 + "-0");
			Keyword kwd2 = mapTokenidKwd.get(id2 + "-0");
			String lex1 = kwd1.getLex();
			String lex2 = kwd2.getLex();
			DefaultKeyword kwd = new DefaultKeyword();
			kwd.setLex(lex2 + " ... " + lex1);
			kwd.setStr(lex2 + " ... " + lex1);
			kwd.setFacet(relation);
			kwd.setBegin(kwd1.getBegin());
			kwd.setEnd(kwd2.getEnd());
			chunkLinkKeywords.add(kwd);
		}

		// </Assembling the tree>

		for (String key : mapIdKwd.keySet()) {
			DefaultKeywordWithDependency kw = mapIdKwd.get(key);
			// IF (root keyword: no parent)
			if (kw.getParent() == null) {
				roots.add(kw);
			}
		}
	} // end of parse()
}


Regarding COTOHA API parsing: if you analyze two sentences such as "It's a nice day today. I'm going to school tomorrow.", it seems that dependencies spanning the two sentences are returned. (Please point it out if this understanding is wrong.)

On the analysis demo page, the two sentences are handled separately, so it seems that the demo splits the text on punctuation before running the parsing.

Therefore, this parser counts the number of sentences in advance (the idSentenceMap in the class above) and ignores any dependency that straddles two sentences.
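In isolation, that cross-sentence filter is just an equality check on the sentence index recorded for each token id. A simplified sketch (not the actual handler class):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the cross-sentence check: a dependency between two
// token ids is kept only when both ids belong to the same sentence.
public class SentenceFilter {

	final Map<Integer, Integer> idSentenceMap = new HashMap<>();

	// Record which sentence a token id belongs to
	// (as the first pass of the parser does).
	void put(int tokenId, int sentenceIndex) {
		idSentenceMap.put(tokenId, sentenceIndex);
	}

	// True when the dependency stays inside one sentence and should be linked.
	boolean sameSentence(int fromId, int toId) {
		return idSentenceMap.get(fromId).equals(idSentenceMap.get(toId));
	}
}
```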


Using the Parser

As a test case, I parse a JSON file containing a saved response from the COTOHA parsing API and print the result as text. (The code is scheduled to be published on GitHub at a later date.)

// FileUtils is org.apache.commons.io.FileUtils
File file = new File("src/test/resources/nlp_v1_parse_002.json");
String json = FileUtils.readFileToString(file, "UTF-8");

CotohaNlpV1ResponseHandler handler = new CotohaNlpV1ResponseHandler();
handler.parse(json);

// print each extracted root as a dependency tree
for (DefaultKeywordWithDependency root : handler.getRoots()) {
	System.err.println(root.toStringAsDependencyTree());
}

// word keywords
for (Keyword kwd : handler.getKeywords()) {
	System.err.println(kwd.getLex() + " (" + "word." + kwd.getFacet() + ")");
	System.err.println("\t" + kwd);
}

// dependency pattern keywords
for (Keyword kwd : handler.getPatternKeywords()) {
	System.err.println(kwd.getLex() + " (" + "pattern." + kwd.getFacet() + ")");
	System.err.println("\t" + kwd);
}

// chunk-link keywords
for (Keyword kwd : handler.getChunkLinkKeywords()) {
	System.err.println(kwd.getLex() + " (" + "pattern." + kwd.getFacet() + ")");
	System.err.println("\t" + kwd);
}


The output looks like the following. The raw JSON was hard to read, so I print the result as a tree; the dependency structure becomes much easier to understand.

today[relation=nmod, sequence=0, dependencyKey=1, hasChildren=true, hasParent=false, facet=noun, lex=today, str=today, reading=today, begin=0, end=2]
Is(word.Conjunctive particles)
Is[relation=case, sequence=1, dependencyKey=null, hasChildren=false, hasParent=false, facet=Conjunctive particles, lex=Is, str=Is, reading=C, begin=2, end=3]
Good(word.Adjective stem)
Good[relation=amod, sequence=2, dependencyKey=3, hasChildren=true, hasParent=false, facet=Adjective stem, lex=Good, str=I, reading=I, begin=3, end=4]
I(word.Adjective suffix)
I[relation=aux, sequence=3, dependencyKey=null, hasChildren=false, hasParent=false, facet=Adjective suffix, lex=I, str=I, reading=I, begin=4, end=5]
weather[relation=null, sequence=4, dependencyKey=6, hasChildren=true, hasParent=true, facet=noun, lex=weather, str=weather, reading=weather, begin=5, end=7]
is[relation=cop, sequence=5, dependencyKey=null, hasChildren=false, hasParent=false, facet=Judgment, lex=is, str=is, reading=death, begin=7, end=9]
。 (word.Kuten)
	。 [relation=punct, sequence=6, dependencyKey=null, hasChildren=false, hasParent=false, facet=Kuten, lex=。, str=。, reading=, begin=9, end=10]
tomorrow[relation=nmod, sequence=7, dependencyKey=8, hasChildren=true, hasParent=false, facet=noun, lex=tomorrow, str=tomorrow, reading=Ass, begin=10, end=12]
Is(word.Conjunctive particles)
Is[relation=case, sequence=8, dependencyKey=null, hasChildren=false, hasParent=false, facet=Conjunctive particles, lex=Is, str=Is, reading=C, begin=12, end=13]
school[relation=nmod, sequence=9, dependencyKey=10, hasChildren=true, hasParent=false, facet=noun, lex=school, str=school, reading=Gakkou, begin=13, end=15]
To(word.Case particles)
To[relation=case, sequence=10, dependencyKey=null, hasChildren=false, hasParent=false, facet=Case particles, lex=To, str=To, reading=D, begin=15, end=16]
go(word.Verb stem)
go[relation=null, sequence=11, dependencyKey=14, hasChildren=true, hasParent=true, facet=Verb stem, lex=go, str=line, reading=I, begin=16, end=17]
Ki(word.Verb conjugation ending)
Ki[relation=aux, sequence=12, dependencyKey=null, hasChildren=false, hasParent=false, facet=Verb conjugation ending, lex=Ki, str=Ki, reading=Ki, begin=17, end=18]
Masu(word.Verb suffix)
Masu[relation=aux, sequence=13, dependencyKey=null, hasChildren=false, hasParent=false, facet=Verb suffix, lex=Masu, str=Masu, reading=trout, begin=18, end=20]
。 (word.Kuten)
	。 [relation=punct, sequence=14, dependencyKey=null, hasChildren=false, hasParent=false, facet=Kuten, lex=。, str=。, reading=, begin=20, end=21]
today...Is[sequence=-1, facet=case, lex=today...Is, str=null, reading=null, count=-1, begin=0, end=3, correlation=0.0]
Good...I[sequence=-1, facet=aux, lex=Good...I, str=null, reading=null, count=-1, begin=3, end=5, correlation=0.0][sequence=-1, facet=nmod,, str=null, reading=null, count=-1, begin=0, end=7, correlation=0.0][sequence=-1, facet=amod,, str=null, reading=null, count=-1, begin=3, end=7, correlation=0.0][sequence=-1, facet=cop,, str=null, reading=null, count=-1, begin=5, end=9, correlation=0.0]
weather... 。 (pattern.punct)
weather... 。 [sequence=-1, facet=punct, lex=weather... 。, str=null, reading=null, count=-1, begin=5, end=10, correlation=0.0]
tomorrow...Is[sequence=-1, facet=case, lex=tomorrow...Is, str=null, reading=null, count=-1, begin=10, end=13, correlation=0.0]
school...To[sequence=-1, facet=case, lex=school...To, str=null, reading=null, count=-1, begin=13, end=16, correlation=0.0]
tomorrow...go[sequence=-1, facet=nmod, lex=tomorrow...go, str=null, reading=null, count=-1, begin=10, end=17, correlation=0.0]
school...go[sequence=-1, facet=nmod, lex=school...go, str=null, reading=null, count=-1, begin=13, end=17, correlation=0.0]
go...Ki[sequence=-1, facet=aux, lex=go...Ki, str=null, reading=null, count=-1, begin=16, end=18, correlation=0.0]
go...Masu[sequence=-1, facet=aux, lex=go...Masu, str=null, reading=null, count=-1, begin=16, end=20, correlation=0.0]
go... 。 (pattern.punct)
go... 。 [sequence=-1, facet=punct, lex=go... 。, str=null, reading=null, count=-1, begin=16, end=21, correlation=0.0]
---[sequence=-1, facet=time,,, reading=null, count=-1, begin=5, end=2, correlation=0.0][sequence=-1, facet=adjectivals,,, reading=null, count=-1, begin=5, end=4, correlation=0.0]
weather...go[sequence=-1, facet=manner, lex=weather...go, str=weather...go, reading=null, count=-1, begin=16, end=7, correlation=0.0]
tomorrow...go[sequence=-1, facet=time, lex=tomorrow...go, str=tomorrow...go, reading=null, count=-1, begin=16, end=12, correlation=0.0]
school...go[sequence=-1, facet=goal, lex=school...go, str=school...go, reading=null, count=-1, begin=16, end=15, correlation=0.0]


Being easy to handle as a Java class also means it is easy to use in business applications. Parsing the output of a parser is a bit of a hassle, but in real-world work, **this kind of groundwork is what wins the game**.



That's all.
