OCR in Java (character recognition from images)

What to do

Extract text from images using Tess4J, an open-source OCR library.

Maven: copy the dependency snippet from mvnrepository and paste it into pom.xml.


Maven then downloads tess4j-4.3.1.jar.

If Maven cannot be used, download the JAR manually instead.
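For reference, the dependency entry for this version would look roughly like the fragment below. The coordinates are the ones Tess4J is published under on Maven Central as far as I know; the version is taken from the JAR name above, so adjust it if you use a different release.

```xml
<!-- Goes inside the <dependencies> element of pom.xml -->
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>4.3.1</version>
</dependency>
```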

Japanese recognition file

Download the Japanese recognition file (jpn.traineddata) from the GitHub repository.
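A missing or misplaced language file is an easy mistake to make, so it can help to check for it before running OCR. The following is a minimal sketch using only the JDK; the class name TrainedDataCheck and its helper methods are hypothetical, and the directory C:\work is the one assumed in the code later in this post.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class TrainedDataCheck {
    // Builds the expected path of the language file, e.g. <dataDir>/jpn.traineddata
    static Path trainedDataPath(String dataDir, String language) {
        return Paths.get(dataDir, language + ".traineddata");
    }

    // True only if the language file exists as a regular file
    static boolean isAvailable(String dataDir, String language) {
        return Files.isRegularFile(trainedDataPath(dataDir, language));
    }

    public static void main(String[] args) {
        String dataDir = "C:\\work"; // assumed location, same as the OCR code in this post
        if (isAvailable(dataDir, "jpn")) {
            System.out.println("Found " + trainedDataPath(dataDir, "jpn"));
        } else {
            System.out.println("jpn.traineddata not found in " + dataDir);
        }
    }
}
```

The same directory string is what gets passed to setDatapath, so running this check first gives a clearer message than waiting for the OCR call to fail.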



import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class OcrTrial {
	public static void main(String[] args) throws IOException, TesseractException {
		//Load image
		File file = new File("C:\\work\\INPUT.JPG");
		BufferedImage img = ImageIO.read(file);

		ITesseract tesseract = new Tesseract();
		tesseract.setDatapath("C:\\work"); // Directory containing the language file (jpn.traineddata)
		tesseract.setLanguage("jpn"); // Use Japanese as the recognition language

		// Run OCR and print the recognized text
		String str = tesseract.doOCR(img);
		System.out.println(str);
	}
}

Image file used as input


Output result



There was one mistake: the correct word is "pictogram" (〇), but it was recognized as "Pivot Gram" (×).

The recognition rate seems to be high when the characters in the image are clearly legible.

Next time
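If legibility matters this much, one thing worth trying is converting the image to pure black and white before passing it to OCR. The sketch below is not part of the original experiment and I have not measured its effect. The class name BinarizeSketch and the threshold of 128 are assumptions for illustration, and the code uses only the JDK.

```java
import java.awt.image.BufferedImage;

class BinarizeSketch {
    // Returns a copy in which each pixel is black if its luminance is
    // below the threshold (0-255) and white otherwise.
    static BufferedImage binarize(BufferedImage src, int threshold) {
        int width = src.getWidth();
        int height = src.getHeight();
        BufferedImage out = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Weighted luminance (ITU-R BT.601 coefficients)
                int luminance = (299 * r + 587 * g + 114 * b) / 1000;
                out.setRGB(x, y, luminance < threshold ? 0x000000 : 0xFFFFFF);
            }
        }
        return out;
    }
}
```

In the OCR code above, this would be used as tesseract.doOCR(BinarizeSketch.binarize(img, 128)). A fixed threshold is the simplest choice; images with uneven lighting would probably need something adaptive.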

- [ ] Try various images

