How to Read PDF File in Java

Updated August 25, 2024

Reading PDF documents is a common requirement in Java projects, from business applications to data analytics. The IronPDF library makes it straightforward to add PDF processing capabilities to your Java projects.

How to Read PDF Files in Java

Install a Java library to read PDF files
Load an existing PDF document using the fromFile method
Render a new PDF from an HTML string, file, or web URL
Utilize the extractAllText method to read text from the opened file
Print the extracted text to the console or save it to a text file in Java

IronPDF: Import Java PDF Library

IronPDF is a Java PDF library for software developers who need to produce high-quality PDFs quickly from HTML. The library also provides document manipulation tools that give dynamic control over page layout, content, and formatting.

Let's see how to read a PDF file stored at a path in a Java program using the IronPDF library.

Read PDFs using IronPDF

The first step is to install IronPDF using Maven; more details can be found in the installation guide.

Install IronPDF in Maven

Here are the steps to install IronPDF in a Maven project:

Open your Maven project in your preferred IDE.

In the pom.xml file, add the IronPDF library dependency in the dependencies section.

<dependency>
   <groupId>com.ironsoftware</groupId>
   <artifactId>ironpdf</artifactId>
   <version>2024.9.1</version>
</dependency>

Save the pom.xml file and let Maven download and install the IronPDF library.

Once the installation is complete, you should be able to import and use IronPDF's classes in your project.

Java Code to Read PDF Document

Here is code you can use to read an existing PDF file using the IronPDF library.

import com.ironsoftware.ironpdf.*;
import java.io.IOException;
import java.nio.file.Paths;

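public class Test {
    public static void main(String[] args) throws IOException {
        // Load the existing PDF from disk
        PdfDocument pdf = PdfDocument.fromFile(Paths.get("C:\\sample.pdf"));
        // Extract the text of the whole document as a single String
        String text = pdf.extractAllText();
        // Print the extracted text to the console
        System.out.println(text);
    }
}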

In this program, the PdfDocument class from the IronPDF library is used to read the contents of a PDF file. The first line of the program imports the required classes from the IronPDF library. The second and third lines import the IOException and Paths classes from the Java standard library.

The program defines a public class named Test. Inside the class, there is a public static method named main that takes an array of strings as an argument.

The main method uses the fromFile method of the PdfDocument class to load a PDF file located at "C:\sample.pdf". This method returns a PdfDocument object that represents the PDF file.
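Because fromFile depends on the path pointing at a real file, it can help to check the path before loading. The following is a minimal sketch using only the JDK; the PdfPathCheck class and its method names are hypothetical helpers for illustration and are not part of IronPDF.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper, not part of IronPDF: pre-flight checks on a PDF path.
final class PdfPathCheck {

    // True when the file name ends in ".pdf", ignoring case.
    static boolean hasPdfExtension(Path path) {
        Path name = path.getFileName();
        return name != null
                && name.toString().toLowerCase(Locale.ROOT).endsWith(".pdf");
    }

    // True when the path is an existing, readable regular file with a .pdf extension.
    static boolean isReadablePdf(Path path) {
        return hasPdfExtension(path)
                && Files.isRegularFile(path)
                && Files.isReadable(path);
    }

    public static void main(String[] args) {
        // Same sample path as the program above.
        Path candidate = Paths.get("C:\\sample.pdf");
        if (isReadablePdf(candidate)) {
            System.out.println("Ready to load: " + candidate);
        } else {
            System.out.println("Cannot read a PDF at: " + candidate);
        }
    }
}
```

When the check passes, the same Path can be handed to PdfDocument.fromFile. The extension test is only a naming convention check and does not confirm that the file contents are a valid PDF.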

Once the PDF file is loaded, the program calls the [extractAllText](/java/object-reference/api/com/ironsoftware/ironpdf/PdfDocument.html#extractAllText()) method of the PdfDocument class to extract all the text from the PDF file. This method returns a String that contains all the text in the PDF file.

The extracted text is then stored in a String variable named "text". This variable can be used to process or display the contents of the PDF file.
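As one example of processing the extracted text, the sketch below searches it for lines containing a keyword. It uses only the JDK (Java 11 or later for String.lines and String.strip); the ExtractedTextSearch class and the sample text are assumptions made for illustration, with the sample string standing in for the value returned by extractAllText.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper, not part of IronPDF: searches already-extracted text.
final class ExtractedTextSearch {

    // Returns every line of the text that contains the keyword, ignoring case.
    static List<String> linesContaining(String text, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        return text.lines()
                .map(String::strip)
                .filter(line -> line.toLowerCase(Locale.ROOT).contains(needle))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for the String returned by pdf.extractAllText().
        String text = "Invoice 1042\nTotal due: 310.00\nThank you for your order";
        System.out.println(linesContaining(text, "total"));
        // prints [Total due: 310.00]
    }
}
```

In a real program, the text variable from the Test class would be passed to linesContaining in place of the sample string.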

Finally, the program prints the extracted text to the console using the System.out.println method.

Figure 1: Program Output
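Instead of printing to the console, the extracted text can also be saved to a text file, as the steps at the top of this article mention. Here is a minimal sketch using the JDK's Files.writeString (Java 11 or later); the TextFileWriter class, the output file name, and the sample string are assumptions for illustration and are not part of IronPDF.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of IronPDF: writes extracted text to disk.
final class TextFileWriter {

    // Writes the text as UTF-8, creating the file or overwriting an existing one.
    // Returns the path that was written.
    static Path saveText(String text, Path target) {
        try {
            return Files.writeString(target, text, StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Rethrow unchecked so callers are not forced to declare IOException.
            throw new UncheckedIOException("Could not write " + target, e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the String returned by pdf.extractAllText().
        String text = "Hello World";
        Path written = saveText(text, Paths.get("extracted.txt"));
        System.out.println("Saved text to " + written.toAbsolutePath());
    }
}
```

Wrapping the IOException in an UncheckedIOException is a design choice for this sketch; since the main method in the Test class already declares throws IOException, calling Files.writeString directly there would work just as well.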

Conclusion

IronPDF is a solid option for reading PDF files in Java, whether they sit in a single directory or across several paths. It offers high performance and many features that make working with PDFs easier. Its syntax is straightforward, and its API lets developers quickly write the code they need for their projects.

IronPDF's licensing plans start from $749. Overall, IronPDF is a strong option for any Java developer looking to work with PDFs in their applications.
