I created the following class so that the COTOHA API can be used easily. (It will be released to the Maven repository at a later date.)
CotohaNlpService.java https://github.com/oyahiroki/nlp4j/blob/master/nlp4j/nlp4j-cotoha/src/main/java/nlp4j/cotoha/CotohaNlpService.java
With this class, the morphological analysis result can be obtained as follows. (The variables required for the API call are set as environment variables.) Separately, I created a class that formats the JSON responses and saves them to a text file, one per line, so the number of API calls can be kept down.
CotohaNlpService service = new CotohaNlpService();
DefaultNlpServiceResponse response = service.nlpV1Parse("I ran to school today.");
// Print the raw JSON of the response. There is also a method that returns the result as keyword objects.
System.err.println(response.getOriginalResponseBody());
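The idea of saving responses to a line-separated text file can be sketched roughly as follows. This is not the actual nlp4j class: `ResponseFileCache`, its `get` method, and the `text<TAB>json` line layout are all assumptions made for illustration, and the API call is replaced by a stand-in function.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper, not part of nlp4j: keeps one "text<TAB>json" pair per line. */
final class ResponseFileCache {
    private final Path file;
    private final Map<String, String> cache = new HashMap<>();

    ResponseFileCache(Path file) {
        this.file = file;
        if (!Files.exists(file)) {
            return;
        }
        try {
            // Load the responses saved by earlier runs.
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                int tab = line.indexOf('\t');
                if (tab > 0) {
                    cache.put(line.substring(0, tab), line.substring(tab + 1));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the cached JSON for text, calling api only on a cache miss. */
    String get(String text, Function<String, String> api) {
        // Collapse whitespace so the key never contains a tab or a line break.
        String key = text.replaceAll("\\s+", " ");
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        // Valid JSON has no raw CR/LF inside strings, so removing them is safe.
        String json = api.apply(text).replaceAll("[\\r\\n]+", "");
        try {
            Files.write(file, (key + "\t" + json + "\n").getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        cache.put(key, json);
        return json;
    }

    public static void main(String[] args) throws IOException {
        ResponseFileCache cache = new ResponseFileCache(Files.createTempFile("cotoha", ".tsv"));
        // Stand-in for the real API call; replace the lambda with the actual request.
        Function<String, String> fakeApi = text -> "{\n  \"length\": " + text.length() + "\n}";
        System.out.println(cache.get("I ran to school today.", fakeApi)); // calls fakeApi
        System.out.println(cache.get("I ran to school today.", fakeApi)); // served from the cache
    }
}
```

Because the file is reloaded in the constructor, a second run over the same texts would not need to call the API at all.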
I downloaded data from the Ministry of Land, Infrastructure, Transport and Tourism "Automobile Recall / Defect Information" site and used 100 cases. Copying the HTML by hand would of course be tedious, so I [handled it with a crawler](https://github.com/oyahiroki/nlp4j/blob/master/nlp4j/nlp4j-webcrawler/src/main/java/nlp4j/webcrawler/mlit/MlitCarInfoCrawler.java) ^_^;
Ministry of Land, Infrastructure, Transport and Tourism "Automobile Recall / Defect Information" http://carinf.mlit.go.jp/jidosha/carinf/opn/index.html
The results of calling the API 100 times are as follows.
The horizontal axis is the trial number and the vertical axis is the processing time in milliseconds. The time includes JSON parsing on the client side, but that should only account for a few ms. The first call took about 2000 ms, but that includes acquiring the access token. A call usually finishes in about 200-300 ms, but it sometimes slows down suddenly: in the raw data below, 20 of the 100 calls took more than 1000 ms, and the slowest took 9501 ms. I suspect the DEV environment is a likely cause of the slowness, and of course the result also depends on the network environment.
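Per-call times like these could be collected with a loop along the following lines. This is only a sketch of the measurement, not the code actually used: `CallTimer` and `timeEach` are hypothetical names, the second sample sentence is made up, and the API call is replaced by a stand-in function so the example runs without credentials.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper for illustration; not part of nlp4j. */
final class CallTimer {
    /** Calls api once per text and returns each call's wall-clock time in milliseconds. */
    static List<Long> timeEach(List<String> texts, Function<String, String> api) {
        List<Long> elapsedMs = new ArrayList<>();
        for (String text : texts) {
            long start = System.nanoTime();
            api.apply(text); // the request plus any client-side handling
            elapsedMs.add((System.nanoTime() - start) / 1_000_000L);
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList("I ran to school today.", "The brake lamp may not light.");
        // Stand-in for the real request; swap in the actual API call here.
        Function<String, String> fakeApi = text -> "{\"length\": " + text.length() + "}";
        List<Long> times = timeEach(texts, fakeApi);
        // Same "length,time" layout as the raw data in this post.
        System.out.println("length,time");
        for (int i = 0; i < texts.size(); i++) {
            System.out.println(texts.get(i).length() + "," + times.get(i));
        }
    }
}
```

`System.nanoTime()` is used instead of `System.currentTimeMillis()` because it is meant for measuring elapsed time and is not affected by system clock adjustments.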
Next is a graph of string length against processing time. There appears to be no correlation between the length of the string and the processing time. (I expect this would change for extremely long strings.)
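The "no correlation" impression from the graph could be checked numerically with the Pearson correlation coefficient, where a value near 0 means no linear relationship. The following is a minimal sketch; `Correlation` and `pearson` are names chosen for this example, and the `main` method uses only the first five rows of the raw data, so the full 100 rows would have to be loaded for a meaningful result.

```java
/** Hypothetical helper for illustration; not part of nlp4j. */
final class Correlation {
    /** Pearson correlation coefficient of two equally long samples. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two samples of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0;
        double meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double cov = 0;
        double varX = 0;
        double varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        // The result is NaN if either sample is constant (zero variance).
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // First five rows of the raw data: string length and processing time in ms.
        double[] length = {61, 73, 54, 79, 58};
        double[] time = {2001, 337, 310, 349, 274};
        System.out.println("r = " + pearson(length, time));
    }
}
```

Note that the first row includes the token acquisition, so it may be reasonable to leave it out when computing the coefficient.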
Below is the raw data (string length in characters, processing time in ms).
length,time
61,2001
73,337
54,310
79,349
58,274
51,269
41,660
21,263
38,283
74,295
52,4472
70,1138
68,3074
31,243
39,251
15,219
11,258
14,259
62,293
66,276
27,272
18,220
63,278
62,428
68,284
50,288
43,250
45,264
70,273
58,250
157,593
88,280
66,264
26,272
38,1514
8,237
42,256
53,1472
42,2668
35,230
32,235
36,241
116,325
17,254
102,309
59,268
21,220
43,278
64,249
32,246
31,247
27,252
70,3698
61,340
51,233
23,225
20,226
60,310
50,1685
72,281
37,270
45,253
13,224
54,243
64,302
52,1876
90,3251
30,9501
73,2323
70,3689
70,1304
61,303
67,262
17,3032
128,302
63,272
33,238
32,257
106,3906
57,261
103,299
82,270
71,268
158,803
41,255
36,284
62,304
36,234
38,1778
19,1478
90,345
22,239
62,310
72,2555
66,256
25,927
33,242
39,283
24,237
42,247
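For anyone who wants to summarize this data, the following sketch parses `length,time` rows and computes the median time and the number of slow calls. `TimingSummary` and its methods are hypothetical names for this example, and the `main` method embeds only the first five rows, so the whole list above would need to be pasted in or read from a file.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of nlp4j. */
final class TimingSummary {
    /** Parses "length,time" rows and returns the times in ms; the header line is skipped. */
    static long[] parseTimes(String csv) {
        return Arrays.stream(csv.split("\\R"))
                .map(String::trim)
                .filter(line -> !line.isEmpty() && Character.isDigit(line.charAt(0)))
                .mapToLong(line -> Long.parseLong(line.split(",")[1].trim()))
                .toArray();
    }

    /** Median of the values; the input array is left unchanged. */
    static double median(long[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Number of values strictly greater than thresholdMs. */
    static long countAbove(long[] values, long thresholdMs) {
        return Arrays.stream(values).filter(v -> v > thresholdMs).count();
    }

    public static void main(String[] args) {
        // Only the first five rows of the raw data are embedded here.
        String csv = "length,time\n61,2001\n73,337\n54,310\n79,349\n58,274\n";
        long[] times = parseTimes(csv);
        System.out.println("median ms: " + median(times));
        System.out.println("calls over 1000 ms: " + countAbove(times, 1000));
    }
}
```

The median is used rather than the mean because a few very slow calls would otherwise dominate the summary.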