This is a sequel to "I tried OCR processing a PDF file with Java". This time I mainly write about tess4j 4.1.
Searching for tess4j 4.1 turns up little information, so I will write down how to get it running and what happened when I ran it. If you rely only on the information on the net, you will get a run-time error.
Here is what changed from "I tried OCR processing a PDF file with Java". The Gradle dependency becomes:
compile group: 'net.sourceforge.tess4j', name: 'tess4j', version: '4.1.1'
The module dependency is declared as shown above. In addition, the file tessdata/configs/api_config is added.
Without this file, a run-time error occurs. Also, jpn.traineddata is overwritten with the trained data downloaded from GitHub.
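Since a missing file only shows up as a run-time error, it can help to check the tessdata layout before starting OCR. The following is a minimal sketch using only the Java standard library. The class name TessdataCheck and its methods are hypothetical helpers and not part of tess4j, and it assumes the tessdata directory sits in the working directory unless another path is passed.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of tess4j): checks the tessdata layout before OCR starts.
final class TessdataCheck {

    // The two files this article relies on, relative to the tessdata directory.
    static final String[] REQUIRED = { "configs/api_config", "jpn.traineddata" };

    // Returns the required files that are missing under the given tessdata directory.
    static List<String> missingFiles(Path tessdataDir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!Files.isRegularFile(tessdataDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: tessdata is in the working directory unless a path is given.
        Path dir = Paths.get(args.length > 0 ? args[0] : "tessdata");
        List<String> missing = missingFiles(dir);
        if (missing.isEmpty()) {
            System.out.println("tessdata looks complete: " + dir.toAbsolutePath());
        } else {
            System.out.println("Missing under " + dir.toAbsolutePath() + ": " + missing);
        }
    }
}
```

This only confirms that the files exist. It does not verify that jpn.traineddata is actually the 4.x version of the trained data.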
After that, just run it from Gradle with the run command.
I compared the execution results of the 3 series and the 4 series on Windows 10 Pro, Core i5 2.2 GHz, 16 GB memory, using the conversion of "2016 Spring Information Security Supporter Examination 2 pm". The 4 series took about 2.5 minutes and the 3 series about 8 minutes, so the 4 series is overwhelmingly faster (roughly 3 times).
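A comparison like this can be reproduced with a simple wall-clock measurement around the conversion. The sketch below is not the code used for the numbers above. OcrTimer is a hypothetical helper, and the sleeping lambda is only a placeholder where the actual OCR call would go.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: measures how long a single task takes.
final class OcrTimer {

    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Callable<?> task) throws Exception {
        long start = System.nanoTime();
        task.call();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder workload; replace it with the real PDF conversion.
        long ms = timeMillis(() -> { Thread.sleep(50); return null; });
        System.out.printf("elapsed: %.1f s%n", ms / 1000.0);
    }
}
```

A single run like this includes JVM warm-up and file loading, so it is only suitable for rough comparisons on the order of minutes, as in this article.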
In the 3 series, the misconversion rate was extremely high when Japanese and alphanumeric characters were mixed, but in the 4 series this has improved dramatically. For example, the 3 series produced:
Q-What are the characteristics of Pus S? By the bell chief,The number of stages is decided.
In the 4 series, the same part is converted to:
Q (1) What are the characteristics of AES? By the key size,The number of stages is decided.
The text is now converted properly into meaningful characters.