Extract PDF text content with Java

In your daily work, you may need to extract the text contained in a large PDF document. Free Spire.PDF for Java provides a quick and convenient way to do this. This article introduces the Java code used in the process.

** Basic steps: **

** 1. ** Download the Free Spire.PDF for Java package and unzip it.

** 2. ** Import the Spire.Pdf.jar file in the lib folder into your Java application as a dependency, or install the JAR from the Maven repository (see below for the pom.xml configuration).

** 3. ** In your Java application, create a new Java class (named ExtractText here), then enter and run the corresponding Java code.

** Configure the pom.xml file: **


** The PDF source document is: ** sample.jpg

** Java code: **

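import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;

import java.io.FileWriter;
import java.io.IOException;

public class ExtractText {

    public static void main(String[] args) {

        //Create a PdfDocument instance
        PdfDocument doc = new PdfDocument();
        //Load PDF file ("sample.pdf" is a placeholder path; replace it with your own document)
        doc.loadFromFile("sample.pdf");

        //Create a StringBuilder instance
        StringBuilder sb = new StringBuilder();

        PdfPageBase page;
        //Traverse the PDF pages, get the text for each page and add it to the StringBuilder object
        for (int i = 0; i < doc.getPages().getCount(); i++) {
            page = doc.getPages().get(i);
            //extractText(true) is the assumed Free Spire.PDF call here; check it against your library version
            sb.append(page.extractText(true));
        }

        //Writes the text of the StringBuilder object to a text file; the writer is closed automatically
        try (FileWriter writer = new FileWriter("ExtractText.txt")) {
            writer.write(sb.toString());
        } catch (IOException e) {
            e.printStackTrace();
        }

        //Release the resources held by the document
        doc.close();
    }
}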

** Extract results: ** text.jpg
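
** Optional: writing the text as UTF-8 **

The code above writes the collected text with FileWriter, which on older Java versions uses the platform default encoding. If the PDF contains non-ASCII text, it may be safer to set the encoding explicitly. The following is only a minimal sketch using the standard library: the class name PageTextWriter and the method writePages are hypothetical and not part of Free Spire.PDF, and the strings in the list stand in for the text obtained from each PDF page.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class PageTextWriter {

    //Hypothetical helper: joins the per-page strings and writes them to the target file as UTF-8
    static Path writePages(List<String> pageTexts, Path target) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (String text : pageTexts) {
            //One line separator after each page keeps the pages apart in the output
            sb.append(text).append(System.lineSeparator());
        }
        //Files.write creates the file, or overwrites it if it already exists
        return Files.write(target, sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        //Stand-in strings; in the article's code these would come from each PDF page
        List<String> pages = Arrays.asList("Text of page 1", "Text of page 2");
        Path out = writePages(pages, Paths.get("ExtractText.txt"));
        System.out.println("Wrote " + Files.size(out) + " bytes to " + out);
    }
}
```

Because the page text is passed in as a plain list, this part can be tried without the PDF library installed.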
