NLP for Java (NLP4J) (2)

I wrote a library that lets you do text mining with a simple Java program like the one below, and I plan to release it as open source soon. It is aimed at people who want to do natural language processing and text mining.

Morphological analysis is delegated to Yahoo! Japan's Web service. The library then uses the analysis results to output the keywords that are characteristic of a set of documents.

Processing and Input

		List<Document> docs = new ArrayList<Document>();
		{
			docs.add(createDocument("Toyota", "I am making a hybrid car."));
			docs.add(createDocument("Toyota", "We sell hybrid cars."));
			docs.add(createDocument("Toyota", "I'm making a car."));
			docs.add(createDocument("Toyota", "I sell cars."));
			docs.add(createDocument("Nissan", "I'm making an EV."));
			docs.add(createDocument("Nissan", "I sell EVs."));
			docs.add(createDocument("Nissan", "I sell cars."));
			docs.add(createDocument("Nissan", "We are affiliated with Renault."));
			docs.add(createDocument("Nissan", "I sell light cars."));
			docs.add(createDocument("Honda", "I'm making a car."));
			docs.add(createDocument("Honda", "I sell cars."));
			docs.add(createDocument("Honda", "I'm making a motorcycle."));
			docs.add(createDocument("Honda", "I sell motorcycles."));
			docs.add(createDocument("Honda", "I sell light cars."));
			docs.add(createDocument("Honda", "I am making a light car."));
		}
		// Morphological analysis (calls Yahoo! Japan's Web service)
		Annotator annotator = new YJpMaAnnotator();
		annotator.annotate(docs);
		// Build the keyword index
		Index index = new SimpleDocumentIndex();
		index.addDocuments(docs);
		// For each item, get the nouns that co-occur with it most strongly
		for (String item : new String[] { "Nissan", "Toyota", "Honda" }) {
			List<Keyword> kwds = index.getKeywords("noun", "item=" + item);
			System.out.println("Keywords(noun) for " + item);
			for (Keyword kwd : kwds) {
				System.out.println(String.format("%.1f,%s", kwd.getCorrelation(), kwd.getLex()));
			}
		}
	}

Output: the keywords characteristic of Nissan, in descending order of correlation score

Keywords(noun) for Nissan
3.0,EV
3.0,Renault
3.0,Alliance
1.0,Light car
0.6,Automobile

The results for Toyota and Honda:

Keywords(noun) for Toyota
3.8,hybrid
3.8,car
1.5,Automobile
Keywords(noun) for Honda
2.5,bike
1.7,Light car
1.0,Automobile
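
The scores above are consistent with a lift-style ratio: the share of an item's documents that contain a keyword, divided by the share of all documents that contain it. The following is only a sketch under that assumption and is not taken from the NLP4J source. The class `LiftSketch`, its `Doc` type, and the helper methods are hypothetical names, and the nouns are assigned by hand (an assumed tokenization, labeled as in the output above) instead of coming from a morphological analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class LiftSketch {

    /** One document: its "item" attribute plus the nouns found in its text. */
    static final class Doc {
        final String item;
        final Set<String> nouns;

        Doc(String item, String... nouns) {
            this.item = item;
            this.nouns = new LinkedHashSet<>(Arrays.asList(nouns));
        }
    }

    /** The 15 sample documents; nouns are assigned by hand (assumed tokenization). */
    static List<Doc> sampleDocs() {
        return Arrays.asList(
            new Doc("Toyota", "hybrid", "car"),
            new Doc("Toyota", "hybrid", "car"),
            new Doc("Toyota", "Automobile"),
            new Doc("Toyota", "Automobile"),
            new Doc("Nissan", "EV"),
            new Doc("Nissan", "EV"),
            new Doc("Nissan", "Automobile"),
            new Doc("Nissan", "Renault", "Alliance"),
            new Doc("Nissan", "Light car"),
            new Doc("Honda", "Automobile"),
            new Doc("Honda", "Automobile"),
            new Doc("Honda", "bike"),
            new Doc("Honda", "bike"),
            new Doc("Honda", "Light car"),
            new Doc("Honda", "Light car"));
    }

    /** Lift of a keyword for an item: P(keyword | item) / P(keyword). */
    static double lift(List<Doc> docs, String item, String keyword) {
        long itemDocs = 0, keywordDocs = 0, both = 0;
        for (Doc d : docs) {
            boolean inItem = d.item.equals(item);
            boolean hasKeyword = d.nouns.contains(keyword);
            if (inItem) itemDocs++;
            if (hasKeyword) keywordDocs++;
            if (inItem && hasKeyword) both++;
        }
        if (itemDocs == 0 || keywordDocs == 0) return 0.0;
        // Same ratio as (both / itemDocs) / (keywordDocs / docs.size()), using one division.
        return (double) (both * docs.size()) / (double) (itemDocs * keywordDocs);
    }

    /** Lines of "score,keyword" for one item, highest score first. */
    static List<String> ranking(List<Doc> docs, String item) {
        Set<String> candidates = new LinkedHashSet<>();
        for (Doc d : docs) {
            if (d.item.equals(item)) candidates.addAll(d.nouns);
        }
        List<String> sorted = new ArrayList<>(candidates);
        // Stable sort: keywords with equal scores keep their first-seen order.
        sorted.sort((a, b) -> Double.compare(lift(docs, item, b), lift(docs, item, a)));
        List<String> lines = new ArrayList<>();
        for (String kw : sorted) {
            lines.add(String.format(Locale.ROOT, "%.1f,%s", lift(docs, item, kw), kw));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Doc> docs = sampleDocs();
        for (String item : new String[] { "Nissan", "Toyota", "Honda" }) {
            System.out.println("Keywords(noun) for " + item);
            for (String line : ranking(docs, item)) {
                System.out.println(line);
            }
        }
    }
}
```

For example, "EV" appears in 2 of Nissan's 5 documents and in 2 of all 15 documents, so its score is (2/5) / (2/15) = 3.0, while "Automobile" appears in 1 of Nissan's 5 documents and 5 of all 15, giving (1/5) / (5/15) = 0.6. A score above 1.0 means the keyword is more frequent for that item than in the collection as a whole.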
