Topic Analysis (LDA) in Java

About this page

I wanted to practice JavaFX and self-contained packaging, and I wanted to build a tool I would actually use, so I chose LDA as the subject. This page summarizes how to run LDA in Java. For JavaFX and self-contained packaging, see the link at the end of this article.

What is LDA?

LDA is a general-purpose machine learning method used in natural language processing to estimate topics from a given set of documents. As of 2019 deep learning is the popular approach, but before that LDA was widely used in the NLP field to improve accuracy. For those who want to read about the theory, the following page is easy to read and recommended.

Excerpt from "Latent Dirichlet Allocation (LDA): A Yurufuwa (Gentle) Introduction"

LDA is a type of language model that assumes a document consists of multiple topics. In Japanese it is called the "latent Dirichlet allocation method". If words are what appears on the surface, topics are latent because, unlike words, they do not appear on the surface. I suppose it is called "latent Dirichlet allocation" because a Dirichlet distribution is assumed as the prior over the distribution of those latent elements. (Omitted) Roughly speaking, the Dirichlet distribution is a probability distribution over probability distributions. For example, if there are three topics, "sports", "economy", and "politics", it assigns a probability to each possible topic distribution: say 0.1 to the case where the generation probabilities are (sports, economy, politics) = (0.3, 0.2, 0.5), and 0.2 to the case where they are (sports, economy, politics) = (0.1, 0.2, 0.7).
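To make "a distribution over distributions" a little more concrete, here is a minimal sketch that draws a few topic distributions from a Dirichlet distribution. It is not taken from the quoted article or from LDA4j: the class name `DirichletSketch`, the parameter values, and the seed are assumptions chosen for illustration, and the Gamma sampler uses the Marsaglia-Tsang method because the Java standard library does not ship one.

```java
import java.util.Random;

public class DirichletSketch {

    /** Draws one sample from Gamma(shape, 1) using the Marsaglia-Tsang method. */
    public static double sampleGamma(double shape, Random rnd) {
        if (shape < 1.0) {
            // For shape < 1, use Gamma(shape) = Gamma(shape + 1) * U^(1/shape)
            return sampleGamma(shape + 1.0, rnd) * Math.pow(rnd.nextDouble(), 1.0 / shape);
        }
        double d = shape - 1.0 / 3.0;
        double c = 1.0 / Math.sqrt(9.0 * d);
        while (true) {
            double x = rnd.nextGaussian();
            double v = 1.0 + c * x;
            if (v <= 0) {
                continue; // reject and retry
            }
            v = v * v * v;
            double u = rnd.nextDouble();
            if (Math.log(u) < 0.5 * x * x + d * (1.0 - v + Math.log(v))) {
                return d * v;
            }
        }
    }

    /** Draws one probability vector from Dirichlet(alpha) by normalizing Gamma draws. */
    public static double[] sampleDirichlet(double[] alpha, Random rnd) {
        double[] theta = new double[alpha.length];
        double sum = 0.0;
        for (int i = 0; i < alpha.length; i++) {
            theta[i] = sampleGamma(alpha[i], rnd);
            sum += theta[i];
        }
        for (int i = 0; i < theta.length; i++) {
            theta[i] /= sum;
        }
        return theta;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);           // fixed seed, illustrative only
        double[] alpha = {1.0, 1.0, 1.0};      // assumed prior for (sports, economy, politics)
        for (int s = 0; s < 3; s++) {
            double[] theta = sampleDirichlet(alpha, rnd);
            System.out.printf("(sports, economy, politics) = (%.2f, %.2f, %.2f)%n",
                    theta[0], theta[1], theta[2]);
        }
    }
}
```

Each printed line is itself a probability distribution over the three topics, and the Dirichlet parameters control which of those distributions are more likely to be drawn.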

Implementation policy

For implementing LDA, the Python library gensim seems to be the well-known choice. Reference: Introduction to gensim

To keep the later JavaFX app simple, I decided not to bridge Python and Java. A Java implementation of LDA was published on GitHub, so I decided to borrow it. Thanks.

Searching for LDA4j turned up two projects; this time I adopted the module by hankcs. The choice was mostly a matter of taste. (As the input document set, breakbee/LDA4J takes a single file with one document per line, while hankcs/LDA4j takes multiple files with one document per file. I personally preferred the per-file format.)
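If your documents already sit in a single file with one document per line, the following sketch shows one way to convert them to the one-file-per-document layout described above. It is only an illustration: the class name `CorpusSplitter`, the file names, and the paths are hypothetical, and the text of each document is copied unchanged (no tokenization is done here).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class CorpusSplitter {

    /**
     * Writes every non-blank line of the input file to its own file in outputDir
     * and returns the number of files written.
     */
    public static int split(Path input, Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        List<String> lines = Files.readAllLines(input, StandardCharsets.UTF_8);
        int count = 0;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            Path out = outputDir.resolve(String.format("doc_%05d.txt", count));
            Files.write(out, line.getBytes(StandardCharsets.UTF_8));
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical paths; adjust to your own data layout.
        int n = split(Paths.get("data/all_documents.txt"), Paths.get("data/split"));
        System.out.println(n + " documents written");
    }
}
```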

Let's run

Environment               Service / version
Execution environment     Windows 10
Development environment   Eclipse 4.1.0
Development language      Java 8

Pull the module into Eclipse in whatever way suits you: fork and clone it, or download the Zip from GitHub, then import the project. After that, I created my own runner class. (It is essentially what the README says.)


import java.io.IOException;
import java.util.Map;

import com.hankcs.lda.Corpus;
import com.hankcs.lda.LdaGibbsSampler;
import com.hankcs.lda.LdaUtil;

public class MainRunner {
	public static void main(String[] args) {
		try {
			// 1. Load corpus from disk
			Corpus corpus = Corpus.load("data/mini");
			// 2. Create a LDA sampler
			LdaGibbsSampler ldaGibbsSampler = new LdaGibbsSampler(corpus.getDocument(), corpus.getVocabularySize());
			// 3. Train it (10 topics), as in the LDA4j README
			ldaGibbsSampler.gibbs(10);
			// 4. The phi matrix is a LDA model, you can use LdaUtil to explain it.
			double[][] phi = ldaGibbsSampler.getPhi();
			Map<String, Double>[] topicMap = LdaUtil.translate(phi, corpus.getVocabulary(), 10);
			// Print the top words of each topic to the console
			LdaUtil.explain(topicMap);
		} catch (IOException e) {
			// Loading the corpus failed
			e.printStackTrace();
		}
	}
}
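For readers curious about what a Gibbs sampler for LDA does internally, here is a minimal sketch of collapsed Gibbs sampling on a toy corpus. This is not LDA4j's code: the class name `TinyLdaGibbs`, the hyperparameter values, and the toy word IDs are assumptions for illustration, and there is no burn-in or sample averaging, so treat it as a rough picture rather than a faithful reimplementation.

```java
import java.util.Random;

public class TinyLdaGibbs {

    /**
     * Runs collapsed Gibbs sampling for LDA and returns phi, where phi[k][w] is
     * the estimated probability of word w under topic k.
     * docs[m][n] is the vocabulary ID of the n-th word in document m.
     */
    public static double[][] run(int[][] docs, int V, int K,
                                 double alpha, double beta, int iterations, long seed) {
        Random rnd = new Random(seed);
        int M = docs.length;
        int[][] z = new int[M][];      // topic assigned to each word occurrence
        int[][] nw = new int[V][K];    // count of word w assigned to topic k
        int[][] nd = new int[M][K];    // count of words in document m assigned to topic k
        int[] nwsum = new int[K];      // total words assigned to topic k

        // Random initial topic assignment
        for (int m = 0; m < M; m++) {
            z[m] = new int[docs[m].length];
            for (int n = 0; n < docs[m].length; n++) {
                int k = rnd.nextInt(K);
                z[m][n] = k;
                nw[docs[m][n]][k]++;
                nd[m][k]++;
                nwsum[k]++;
            }
        }

        double[] p = new double[K];
        for (int it = 0; it < iterations; it++) {
            for (int m = 0; m < M; m++) {
                for (int n = 0; n < docs[m].length; n++) {
                    int w = docs[m][n];
                    int old = z[m][n];
                    // Remove the current assignment from the counts
                    nw[w][old]--; nd[m][old]--; nwsum[old]--;
                    // Unnormalized conditional probability of each topic
                    double total = 0.0;
                    for (int k = 0; k < K; k++) {
                        p[k] = (nw[w][k] + beta) / (nwsum[k] + V * beta) * (nd[m][k] + alpha);
                        total += p[k];
                    }
                    // Sample a new topic in proportion to p
                    double u = rnd.nextDouble() * total;
                    int k = 0;
                    double acc = p[0];
                    while (u >= acc && k < K - 1) {
                        k++;
                        acc += p[k];
                    }
                    z[m][n] = k;
                    nw[w][k]++; nd[m][k]++; nwsum[k]++;
                }
            }
        }

        // Estimate phi from the final counts
        double[][] phi = new double[K][V];
        for (int k = 0; k < K; k++) {
            for (int w = 0; w < V; w++) {
                phi[k][w] = (nw[w][k] + beta) / (nwsum[k] + V * beta);
            }
        }
        return phi;
    }

    public static void main(String[] args) {
        // Toy corpus: 3 documents over a 4-word vocabulary (IDs 0..3)
        int[][] docs = {{0, 1, 0, 1}, {2, 3, 2, 3}, {0, 1, 2, 3}};
        double[][] phi = run(docs, 4, 2, 0.5, 0.1, 200, 7L);
        for (int k = 0; k < phi.length; k++) {
            System.out.println("topic " + k + " : " + java.util.Arrays.toString(phi[k]));
        }
    }
}
```

Each row of the returned matrix is a probability distribution over the vocabulary, which plays the same role as the phi matrix in the runner above.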

Run MainRunner from Run ⇒ Run Configurations ⇒ Java Application. You should see output like the following on the console. It estimates the specified number of topics (10) for the set of documents stored in data/mini.

Sampling 1000 iterations with burn-in of 100 (B/S=20).
topic 0 :

topic 1 :
Beautiful country=0.007753386939328633

~ Omitted ~

topic 9 :
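The "word=probability" lines in this output are the highest-probability words of each topic. As a rough picture of how such a listing can be derived from a phi matrix, here is a small sketch that picks the top words per topic. It is not LDA4j's implementation of LdaUtil.translate: the class name `TopWordsSketch`, the toy matrix, and the vocabulary are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopWordsSketch {

    /** For each topic (row of phi), returns the `limit` most probable words in descending order. */
    public static List<Map<String, Double>> topWords(double[][] phi, String[] vocabulary, int limit) {
        List<Map<String, Double>> result = new ArrayList<>();
        for (double[] row : phi) {
            // Sort word IDs by descending probability within this topic
            Integer[] ids = new Integer[row.length];
            for (int i = 0; i < ids.length; i++) {
                ids[i] = i;
            }
            Arrays.sort(ids, (a, b) -> Double.compare(row[b], row[a]));

            // LinkedHashMap keeps the insertion (ranking) order
            Map<String, Double> top = new LinkedHashMap<>();
            for (int i = 0; i < Math.min(limit, ids.length); i++) {
                top.put(vocabulary[ids[i]], row[ids[i]]);
            }
            result.add(top);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy phi matrix: 2 topics over a 3-word vocabulary (illustrative values)
        double[][] phi = {{0.1, 0.7, 0.2}, {0.5, 0.2, 0.3}};
        String[] vocabulary = {"ball", "market", "vote"};
        List<Map<String, Double>> topics = topWords(phi, vocabulary, 2);
        for (int k = 0; k < topics.size(); k++) {
            System.out.println("topic " + k + " :");
            for (Map.Entry<String, Double> e : topics.get(k).entrySet()) {
                System.out.println(e.getKey() + "=" + e.getValue());
            }
            System.out.println();
        }
    }
}
```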

A story that I actually used as a tool
