Try using the COTOHA API parser in Java

What is the COTOHA API?

It is a service from the NTT Group that offers a range of natural language processing and speech processing APIs, such as parsing, anaphora resolution, keyword extraction, speech recognition, and summarization.

COTOHA API | Natural language processing and speech recognition API platform utilizing Japan's largest Japanese dictionary developed by NTT Communications https://api.ce-cotoha.com/contents/index.html

Some may wonder, "Isn't NTT a telephone company?" But NTT's laboratories have been researching computer processing of Japanese for decades. (When I was a student, I was indebted to them too: I visited the NTT Communication Science Laboratories and read their papers.)

Cloud technology has advanced recently, and processing that used to run on a local computer can now be provided as an API service over the Internet. It does not appear to be disclosed which NTT company or department develops the underlying logic, but the API itself appears to be provided by NTT Communications.

Prerequisite knowledge

Roughly: URLs, HTTP requests (POST), HTTP responses, the curl command, JSON, Gson, Java, and Maven.

What is parsing?

The COTOHA API page describes it as follows:

Parsing: The parsing API takes sentences written in Japanese as input and analyzes and outputs their structure and meaning. The input sentence is decomposed into clauses and morphemes, and semantic information such as dependency relations between clauses, dependency relations between morphemes, and part-of-speech information is added.

"Sentences written in Japanese (= natural sentences)" means a sentence like this:

今日は学校へ走って行きました。 (Kyō wa gakkō e hashitte ikimashita: "I ran to school today.")

Divided into clauses, it looks like this:

今日は / 学校へ / 走って / 行きました。

(In the API, a clause is called a "chunk".) Divided into morphemes, it looks like this:

今日 / は / 学校 / へ / 走 / っ / て / 行 / き / まし / た / 。

(In the API, a morpheme is called a "token".) Morphological analysis outputs the dictionary form 行く (iku, "to go") for the 行 part, and a part of speech for each morpheme.

Machine learning is very popular these days. In practice, though, it is still somewhat difficult to process natural sentences as-is, without morphological analysis or parsing, and good results are hard to get that way. Even when applying machine learning, I think it is better to apply it to the values obtained from morphological analysis and parsing. Japanese in particular has no spaces between words and a relatively free word order, so there may be situations where "simple machine learning" is hard to apply.

Using the API

Now, let's call the COTOHA API.

The preparation steps are as follows.

  1. Register for an account (https://api.ce-cotoha.com/)
  2. Obtain "Client ID" and "Client secret" as access information on the API portal (https://api.ce-cotoha.com/home).
  3. Get an "access token" from the program
  4. Access various APIs using "access token"

If you follow the guide, step 1 (account registration) should go without any problems.

In step 2, when you access the API portal, a screen like the following is displayed. Copy the "Client ID" and "Client secret" and keep them somewhere safe.

[Screenshot: API portal showing the Client ID and Client secret]

The "Client ID" and "Client secret" are equivalent to a user ID and password. Modern APIs avoid sending a user ID and password with every request, so you first obtain an "access token" and then reuse it. A COTOHA API access token is valid for up to 24 hours, so the token obtained at the start of a session can be reused for that period.

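The reuse idea in the previous paragraph can be sketched as a small cache. This is a minimal sketch under assumptions. The class `TokenCache`, its `Token` holder, and the `fetcher` callback are hypothetical names and are not part of any COTOHA SDK. In real code, the fetcher would be the token request shown later. The caller passes the current time in epoch milliseconds, so the example runs without a network.

```java
import java.util.function.Supplier;

/** Hypothetical helper: fetches an access token once and reuses it until shortly before it expires. */
public class TokenCache {

    /** A token value plus the time it expires, in epoch milliseconds. */
    public static final class Token {
        final String value;
        final long expiresAtMillis;

        public Token(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Supplier<Token> fetcher;  // e.g. code that POSTs the client ID/secret to the token endpoint
    private final long marginMillis;        // refresh this long before the token expires
    private Token cached;

    public TokenCache(Supplier<Token> fetcher, long marginMillis) {
        this.fetcher = fetcher;
        this.marginMillis = marginMillis;
    }

    /** Returns the cached token, calling the fetcher only if there is none or it is about to expire. */
    public synchronized String get(long nowMillis) {
        if (cached == null || nowMillis >= cached.expiresAtMillis - marginMillis) {
            cached = fetcher.get();
        }
        return cached.value;
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;
        long[] now = {0L};  // fake clock so the example runs without a network
        int[] calls = {0};
        TokenCache cache = new TokenCache(() -> {
            calls[0]++;
            return new Token("token-" + calls[0], now[0] + day);  // pretend each token lives 24 hours
        }, 60_000L);

        String first = cache.get(now[0]);   // no token yet: fetches token-1
        now[0] = day / 2;                   // 12 hours later
        String second = cache.get(now[0]);  // still valid: reuses token-1
        now[0] = day;                       // 24 hours later
        String third = cache.get(now[0]);   // expired: fetches token-2
        System.out.println(first + " " + second + " " + third + " fetches=" + calls[0]);
        // prints: token-1 token-1 token-2 fetches=2
    }
}
```

The safety margin is a design choice. Refreshing slightly early avoids sending a token that expires while the request is in flight.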
Find out how to get an "access token"

As with calling any API service, first read the specification to find out how to access it.

Get access token | Reference | COTOHA API https://api.ce-cotoha.com/contents/reference/accesstoken.html

Looking at it, the reference reads as follows:

[Screenshot: request example from the access token reference]

This is not valid curl command syntax... a typo right off the bat!

Incorrect documentation is common in the IT industry, so don't let it throw you. (lol) "The code is the specification." (← words of wisdom)

Still, the intent is clear: send a POST request like the one below with the curl command.

$ curl -X POST -H "Content-Type:application/json;charset=UTF-8" -d '{"grantType":"client_credentials","clientId": "[client id]","clientSecret":"[client secret]"}' "[Access Token Publish URL]"

Replace [client id], [client secret], and [Access Token Publish URL] with the values shown on the portal.

Get an "access token" from the program

There are several ways to send an HTTP request in Java. This time I'll use a library called OkHttp. Since the request body is JSON, I'll also use the well-known Gson library.
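For comparison, Java 11 and later also ship an HTTP client in the JDK (`java.net.http.HttpClient`), so the same request can be built without extra dependencies. The sketch below is a minimal illustration under assumptions. The class `JdkHttpSketch` and its helpers are hypothetical. The JSON body is assembled by hand, with minimal escaping that handles only backslashes and quotes. Running `main` only builds the request and prints its method and URI. Actually sending it through `send` requires network access and real credentials. The rest of this article uses OkHttp.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class JdkHttpSketch {

    /** Wraps a value in a JSON string literal, escaping only backslash and double quote (minimal). */
    static String jsonString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds a JSON POST request; adds "Authorization: Bearer ..." when accessToken is not null. */
    static HttpRequest jsonPost(String url, String json, String accessToken) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json;charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(json));  // body encoded as UTF-8
        if (accessToken != null) {
            builder.header("Authorization", "Bearer " + accessToken);
        }
        return builder.build();
    }

    /** Sends the request and returns the response body (needs network access and valid credentials). */
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.err.println(response.statusCode());
        return response.body();
    }

    public static void main(String[] args) {
        // Same fields as the curl example: grantType, clientId, clientSecret
        String body = "{\"grantType\":\"client_credentials\","
                + "\"clientId\":" + jsonString("[client id]") + ","
                + "\"clientSecret\":" + jsonString("[client secret]") + "}";
        HttpRequest request = jsonPost("https://api.ce-cotoha.com/v1/oauth/accesstokens", body, null);
        System.out.println(request.method() + " " + request.uri());
        // With real credentials: System.out.println(send(request));
    }
}
```

For the parse call, the same helper would take the parse URL, a body such as `{"sentence":"...","type":"default"}`, and the access token, which adds the `Authorization: Bearer` header.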

Maven

<!-- https://mvnrepository.com/artifact/com.squareup.okhttp3/okhttp -->
<dependency>
	<groupId>com.squareup.okhttp3</groupId>
	<artifactId>okhttp</artifactId>
	<version>3.14.2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
	<groupId>com.google.code.gson</groupId>
	<artifactId>gson</artifactId>
	<version>2.8.6</version>
</dependency>

If you write the code in Java + OkHttp, it will look like the following.

	import java.io.IOException;

	import com.google.gson.JsonObject;

	import okhttp3.MediaType;
	import okhttp3.OkHttpClient;
	import okhttp3.Request;
	import okhttp3.RequestBody;
	import okhttp3.Response;

	public class AccessTokenSample {
		public static void main(String[] args) throws IOException {
			// Values shown on the API portal
			String url = "https://api.ce-cotoha.com/v1/oauth/accesstokens";
			String clientId = "[client id]";
			String clientSecret = "[client secret]";

			// Request body:
			// {
			//   "grantType": "client_credentials",
			//   "clientId": "[client id]",
			//   "clientSecret": "[client secret]"
			// }
			JsonObject jsonObj = new JsonObject();
			jsonObj.addProperty("grantType", "client_credentials");
			jsonObj.addProperty("clientId", clientId);
			jsonObj.addProperty("clientSecret", clientSecret);

			OkHttpClient client = new OkHttpClient();
			MediaType JSON = MediaType.get("application/json; charset=utf-8");
			// OkHttp 3.x argument order: create(MediaType, String)
			RequestBody body = RequestBody.create(JSON, jsonObj.toString());
			Request request = new Request.Builder() //
					.url(url) //
					.post(body) //
					.build();
			try (Response response = client.newCall(request).execute()) {
				int responseCode = response.code();
				String originalResponseBody = response.body().string();
				System.err.println(responseCode); // 201 on success
				System.err.println(originalResponseBody);
				// {
				//   "access_token": "xxx",
				//   "token_type": "bearer",
				//   "expires_in": "86399",
				//   "scope": "",
				//   "issued_at": "1581590104700"
				// }
			}
		}
	}

The output should look like the following. The actual access token appears where "xxx" is shown here. Copying it by hand isn't very elegant programmatically, but let's copy it and use it for now.

{
    "access_token": "xxx",
    "token_type": "bearer",
    "expires_in": "86399",
    "scope": "",
    "issued_at": "1581590104700"
}
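The response also tells you when the token stops being valid, so the expiry time can be computed rather than tracked by hand. This is a minimal sketch under an assumption drawn from the sample values: `issued_at` is epoch milliseconds (13 digits) and `expires_in` is seconds (86399 is just under 24 hours). Both fields arrive as JSON strings. With Gson you would read them via `getAsString()` before parsing. The class `TokenExpiry` is hypothetical.

```java
import java.time.Instant;

/** Hypothetical helper for the token response's issued_at and expires_in fields. */
public class TokenExpiry {

    /**
     * Assumes issued_at is epoch milliseconds and expires_in is seconds, as the sample values suggest.
     * Both are JSON strings in the response, so they are parsed here.
     */
    static long expiresAtMillis(String issuedAt, String expiresIn) {
        return Long.parseLong(issuedAt) + Long.parseLong(expiresIn) * 1000L;
    }

    public static void main(String[] args) {
        // Values from the sample response
        long issuedAt = Long.parseLong("1581590104700");
        long expiresAt = expiresAtMillis("1581590104700", "86399");
        System.out.println("issued:  " + Instant.ofEpochMilli(issuedAt));   // 2020-02-13T10:35:04.700Z
        System.out.println("expires: " + Instant.ofEpochMilli(expiresAt));  // 2020-02-14T10:35:03.700Z
    }
}
```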

Read the specs before calling the parsing API

Read the following carefully.

API Reference-Parsing https://api.ce-cotoha.com/contents/reference/apireference.html

There are several ways to call it, but the following curl command example seems to be the simplest.

$ curl -X POST -H "Content-Type:application/json;charset=UTF-8" -H "Authorization:Bearer [Access Token]" -d '{"sentence":"犬は歩く。","type": "default"}' "[API Base URL]/nlp/v1/parse"

(犬は歩く。 means "The dog walks.")

Try calling the parsing API

The curl command is a simple line, but in Java it looks like this:

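	import java.io.IOException;

	import com.google.gson.JsonObject;

	import okhttp3.MediaType;
	import okhttp3.OkHttpClient;
	import okhttp3.Request;
	import okhttp3.RequestBody;
	import okhttp3.Response;

	public class ParseSample {
		public static void main(String[] args) throws IOException {
			// [API Base URL] from the portal + "/nlp/v1/parse"
			String url = "https://api.ce-cotoha.com/api/dev" + "/nlp/v1/parse";
			String sentence = "今日は良い天気です。"; // "It is good weather today."
			String type = "default";
			String access_token = "xxx"; // the access token obtained earlier

			// Request body: {"sentence": "...", "type": "default"}
			JsonObject jsonObj = new JsonObject();
			jsonObj.addProperty("sentence", sentence);
			jsonObj.addProperty("type", type);

			OkHttpClient client = new OkHttpClient();
			MediaType JSON = MediaType.get("application/json; charset=utf-8");
			RequestBody body = RequestBody.create(JSON, jsonObj.toString());
			// The access token goes in the Authorization header as "Bearer <token>"
			Request request = new Request.Builder() //
					.addHeader("Authorization", "Bearer " + access_token) //
					.url(url) //
					.post(body) //
					.build();

			try (Response response = client.newCall(request).execute()) {
				String originalResponseBody = response.body().string();
				System.err.println(originalResponseBody);
			}
		}
	}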

Result

So, what does the result look like? It should be something like the following.

Reading the JSON alongside the specification, you can see that information on clauses and morphemes is output in considerable detail. (The results are considerably more detailed than those of other companies' parsing APIs.)

Next, this JSON output will be parsed and turned into Java objects. That part is a general Java technique rather than an API call, so I'll cover it in the next article. → Continued article: Parsing COTOHA API parsing in Java

{
	"result": [
		{
			"chunk_info": {
				"id": 0,
				"head": 2,
				"dep": "D",
				"chunk_head": 0,
				"chunk_func": 1,
				"links": []
			},
			"tokens": [
				{
					"id": 0,
					"form": "today",
					"kana": "today",
					"lemma": "today",
					"pos": "noun",
					"features": ["Date and time"],
					"dependency_labels": [
						{
							"token_id": 1,
							"label": "case"
						}
					],
					"attributes": {
						
					}
				},
				{
					"id": 1,
					"form": "Is",
					"kana": "C",
					"lemma": "Is",
					"pos": "Conjunctive particles",
					"features": [],
					"attributes": {
						
					}
				}
			]
		},
		{
			"chunk_info": {
				"id": 1,
				"head": 2,
				"dep": "D",
				"chunk_head": 0,
				"chunk_func": 1,
				"links": []
			},
			"tokens": [
				{
					"id": 2,
					"form": "I",
					"kana": "I",
					"lemma": "Good",
					"pos": "Adjective stem",
					"features": [
						"Step"
					],
					"dependency_labels": [
						{
							"token_id": 3,
							"label": "aux"
						}
					],
					"attributes": {
						
					}
				},
				{
					"id": 3,
					"form": "I",
					"kana": "I",
					"lemma": "I",
					"pos": "Adjective suffix",
					"features": [
						"Attributive form"
					],
					"attributes": {
						
					}
				}
			]
		},
		{
			"chunk_info": {
				"id": 2,
				"head": -1,
				"dep": "O",
				"chunk_head": 0,
				"chunk_func": 1,
				"links": [
					{
						"link": 0,
						"label": "time"
					},
					{
						"link": 1,
						"label": "adjectivals"
					}
				],
				"predicate": []
			},
			"tokens": [
				{
					"id": 4,
					"form": "weather",
					"kana": "weather",
					"lemma": "weather",
					"pos": "noun",
					"features": [],
					"dependency_labels": [
						{
							"token_id": 0,
							"label": "nmod"
						},
						{
							"token_id": 2,
							"label": "amod"
						},
						{
							"token_id": 5,
							"label": "cop"
						},
						{
							"token_id": 6,
							"label": "punct"
						}
					],
					"attributes": {
						
					}
				},
				{
					"id": 5,
					"form": "is",
					"kana": "death",
					"lemma": "is",
					"pos": "Judgment",
					"features": [
						"stop"
					],
					"attributes": {
						
					}
				},
				{
					"id": 6,
					"form": "。",
					"kana": "",
					"lemma": "。",
					"pos": "Kuten",
					"features": [],
					"attributes": {
						
					}
				}
			]
		}
	],
	"status": 0,
	"message": ""
}
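To see how the chunks fit together, you can follow the `chunk_info.id` and `chunk_info.head` fields. In the output above, chunks 0 and 1 both have `head` 2, and chunk 2 has `head` -1, meaning it modifies no other chunk. The sketch below only illustrates that relationship. The `Chunk` class is hypothetical and holds just those two fields plus an English label for printing. The values are copied by hand from the sample output rather than read from the JSON.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChunkTree {

    /** Hypothetical holder for the fields used here from one "chunk_info" entry. */
    static final class Chunk {
        final int id;       // chunk_info.id
        final int head;     // chunk_info.head: id of the chunk this one modifies, -1 if none
        final String label; // human-readable label for printing (English gloss here)

        Chunk(int id, int head, String label) {
            this.id = id;
            this.head = head;
            this.label = label;
        }
    }

    /** Returns one "modifier -> head" line per chunk; a chunk whose head id is not found (e.g. -1) points to ROOT. */
    static List<String> edges(List<Chunk> chunks) {
        Map<Integer, Chunk> byId = new LinkedHashMap<>();
        for (Chunk c : chunks) {
            byId.put(c.id, c);
        }
        List<String> lines = new ArrayList<>();
        for (Chunk c : chunks) {
            Chunk head = byId.get(c.head);
            lines.add(c.label + " -> " + (head == null ? "ROOT" : head.label));
        }
        return lines;
    }

    public static void main(String[] args) {
        // id/head pairs copied from the sample output: chunks 0 and 1 both depend on chunk 2.
        List<Chunk> chunks = List.of(
                new Chunk(0, 2, "today"),
                new Chunk(1, 2, "good"),
                new Chunk(2, -1, "weather"));
        edges(chunks).forEach(System.out::println);
        // prints:
        // today -> weather
        // good -> weather
        // weather -> ROOT
    }
}
```

Looking chunks up by `id` instead of by list position keeps the sketch correct even if the chunks were not listed in id order.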

Link

COTOHA API Portal

That's all.
