How to Read a PDF File in Java
This article explores how to build a PDF reader that opens a PDF file programmatically from your software application. IronPDF for Java is one library that can open and read PDF files by filename in Java programs.
How to Read PDF Files in Java
- Download the IronPDF Java Library
- Use the `fromFile` method to load existing PDF documents
- Call the `extractAllText` method to extract embedded text in PDFs
- Extract text from a specific page with the `extractTextFromPage` method
- Retrieve text from PDFs rendered from a URL
IronPDF
The IronPDF Java Library builds on the existing IronPDF library for .NET. It can extract and parse content from PDF documents, including text and images, which makes it a versatile alternative to other class libraries such as Apache PDFBox. It also provides options to customize PDF pages, such as page layout, margins, headers and footers, and page orientation.
In addition to this, IronPDF also supports conversion from other file formats, protecting PDFs with a password, digital signing, merging, and splitting PDF documents.
How to Read PDF Files in Java
Prerequisites
To use IronPDF to make a Java PDF reader, it is necessary to ensure that the following components are installed on the computer:
- JDK - Java Development Kit is required for building and running Java programs. If it is not installed, download it from the Oracle Website.
- IDE - Integrated Development Environment is software that helps write, edit, and debug a program. Download any IDE for Java, e.g., Eclipse, NetBeans, IntelliJ.
- Maven - Maven is an automation tool that helps download libraries from the Central Repository. Download it from the Apache Maven Website.
- IronPDF - Finally, IronPDF is required to read the PDF file in Java. It needs to be added as a dependency in your Java Maven project. Include the IronPDF artifact along with the slf4j dependency in the `pom.xml` file as shown in the example below:
<!-- Add Maven dependencies for IronPDF -->
<dependencies>
<!-- IronPDF Dependency -->
<dependency>
<groupId>com.ironsoftware</groupId>
<artifactId>ironpdf</artifactId>
<version>your-version-here</version>
</dependency>
<!-- SLF4J Dependency necessary for logging -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.7.32</version>
</dependency>
</dependencies>
Adding Necessary Imports
Firstly, add the following code on top of the Java source file to reference all the required methods from IronPDF:
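import com.ironsoftware.ironpdf.*;                   // License, PdfDocument
import com.ironsoftware.ironpdf.edit.PageSelection;  // Page selection for per-page extraction
import java.io.IOException;                          // Declared by main in the complete example
import java.nio.file.Paths;                          // Builds the path passed to fromFile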
Next, configure IronPDF with a valid license key to use its methods. Invoke the `setLicenseKey` method in the main method.
License.setLicenseKey("Your license key");
// Set your IronPDF license key - required for full version
Note: You can get a free trial license key to create, read, and print PDFs.
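Hard-coding the key as shown above is fine for a quick test. As a minimal sketch of one alternative, the key can be read from an environment variable so it stays out of source control. The variable name `IRONPDF_LICENSE_KEY` and the `LicenseKeyLookup` class are assumptions made for this example, not part of IronPDF, and the IronPDF call is left as a comment so the sketch compiles without the library.

```java
import java.util.Map;
import java.util.Optional;

// Sketch only: looks up a license key from the environment.
// IRONPDF_LICENSE_KEY is a hypothetical name chosen for this example.
class LicenseKeyLookup {
    static final String ENV_NAME = "IRONPDF_LICENSE_KEY";

    // Returns the trimmed key, or an empty Optional if the variable is missing or blank.
    static Optional<String> resolve(Map<String, String> env) {
        String raw = env.get(ENV_NAME);
        if (raw == null || raw.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(raw.trim());
    }

    public static void main(String[] args) {
        Optional<String> key = resolve(System.getenv());
        if (key.isPresent()) {
            // With IronPDF on the classpath, the call from the article would go here:
            // License.setLicenseKey(key.get());
            System.out.println("License key found (" + key.get().length() + " characters)");
        } else {
            System.out.println("No license key set; " + ENV_NAME + " is missing or blank");
        }
    }
}
```

Passing the environment in as a `Map` keeps the lookup easy to exercise without changing the real process environment.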
Read Existing PDF File in Java
Reading a PDF requires an existing PDF file, either one you already have or one you create. This article uses an already created PDF file. Extracting text from the document is a simple two-step process:
// Load the PDF document from file
PdfDocument pdf = PdfDocument.fromFile(Paths.get("assets/sample.pdf"));
// Extract all text from the PDF
String text = pdf.extractAllText();
// Print the extracted text
System.out.println(text);
In the above code, `fromFile` opens a PDF document. The `Paths.get` method converts the path string into a `Path` object that identifies the file to load. Then, [`extractAllText`](/java/object-reference/api/com/ironsoftware/ironpdf/PdfDocument.html#extractAllText()) reads all the text in the document.
The extracted text is printed to the console.
Figure: Reading PDF Text Output
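Because the load step depends on the file being present at the given path, it can help to check the path first. The following is a minimal sketch using only the Java standard library. The `PdfPathCheck` class and its method names are hypothetical helpers for illustration. The check looks only at the file name and file system attributes, not at the file contents, and the IronPDF call is left as a comment.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Sketch only: pre-flight checks before handing a path to a PDF loader.
class PdfPathCheck {
    // True if the file name ends with ".pdf", ignoring case.
    static boolean hasPdfExtension(Path path) {
        Path fileName = path.getFileName(); // null for a root path such as "/"
        return fileName != null
                && fileName.toString().toLowerCase(Locale.ROOT).endsWith(".pdf");
    }

    // True if the path is an existing, readable regular file with a .pdf name.
    static boolean isLoadablePdf(Path path) {
        return Files.isRegularFile(path) && Files.isReadable(path) && hasPdfExtension(path);
    }

    public static void main(String[] args) {
        Path path = Paths.get(args.length > 0 ? args[0] : "assets/sample.pdf");
        if (isLoadablePdf(path)) {
            // With IronPDF on the classpath, the call from the article would go here:
            // PdfDocument pdf = PdfDocument.fromFile(path);
            System.out.println("Ready to load: " + path.toAbsolutePath());
        } else {
            System.out.println("Not a readable PDF file: " + path.toAbsolutePath());
        }
    }
}
```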
Read Text from a Specific Page
IronPDF can also read content from a specific page in a PDF. The `extractTextFromPage` method uses a `PageSelection` object to accept the page or range of pages from which text will be read.
In the following example, the text is extracted from the second page of the PDF document. `PageSelection.singlePage` takes the index of the page to extract (the index starts from 0).
// Load the PDF document from file
PdfDocument pdf = PdfDocument.fromFile(Paths.get("assets/sample.pdf"));
// Extract text from the second page (page index based, starts at 0, so 1 means second page)
String text = pdf.extractTextFromPage(PageSelection.singlePage(1));
// Print the extracted text from the specified page
System.out.println(text);
Figure: Reading PDF Text Output
Other methods available in the `PageSelection` class for extracting text from various pages include [`firstPage`](/java/object-reference/api/com/ironsoftware/ironpdf/edit/PageSelection.html#firstPage()), [`lastPage`](/java/object-reference/api/com/ironsoftware/ironpdf/edit/PageSelection.html#lastPage()), `pageRange`, and [`allPages`](/java/object-reference/api/com/ironsoftware/ironpdf/edit/PageSelection.html#allPages()).
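Since the page index starts at 0 while readers usually think in page numbers starting at 1, an off-by-one mistake is easy to make. The sketch below shows one way to convert a 1-based page number into the 0-based index expected by `PageSelection.singlePage`. The `PageNumbers` class is a hypothetical helper, and the page count is assumed to be supplied by the caller. The IronPDF call is left as a comment.

```java
// Sketch only: converts a human-facing page number (1-based) to a 0-based index.
class PageNumbers {
    // Returns pageNumber - 1 after checking it lies within 1..pageCount.
    static int toZeroBasedIndex(int pageNumber, int pageCount) {
        if (pageCount < 1) {
            throw new IllegalArgumentException("Document has no pages");
        }
        if (pageNumber < 1 || pageNumber > pageCount) {
            throw new IllegalArgumentException(
                    "Page " + pageNumber + " is outside 1.." + pageCount);
        }
        return pageNumber - 1;
    }

    public static void main(String[] args) {
        // The page count of 5 is an assumed value for this example.
        int index = toZeroBasedIndex(2, 5);
        // With IronPDF on the classpath, the call from the article would go here:
        // String text = pdf.extractTextFromPage(PageSelection.singlePage(index));
        System.out.println("Page 2 maps to index " + index);
    }
}
```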
Read Text from a Newly-Generated PDF File
Text can also be extracted from a newly generated PDF file, rendered from either an HTML file or a URL. The following sample code generates a PDF from a URL and extracts all the text from the web page.
// Generate PDF from a URL
PdfDocument pdf = PdfDocument.renderUrlAsPdf("https://unsplash.com/");
// Extract all text from the generated PDF
String text = pdf.extractAllText();
// Print the extracted text from the URL
System.out.println("Text extracted from the website: " + text);
Figure: Read from a New File
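Once the text is available as a `String`, searching it is ordinary string processing. The following sketch, which uses only the Java standard library, returns the lines of extracted text that contain a keyword. The `TextSearch` class is a hypothetical helper, and the sample text stands in for the value returned by `extractAllText`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch only: case-insensitive keyword search over extracted text.
class TextSearch {
    // Returns the trimmed lines of text that contain the keyword, ignoring case.
    static List<String> linesContaining(String text, String keyword) {
        List<String> matches = new ArrayList<>();
        if (text == null || keyword == null || keyword.isEmpty()) {
            return matches;
        }
        String needle = keyword.toLowerCase(Locale.ROOT);
        // "\\R" matches any line break sequence.
        for (String line : text.split("\\R")) {
            if (line.toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(line.trim());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Placeholder text; in the article's code this would be pdf.extractAllText().
        String text = "Free photos\nBrowse images\nfree to use";
        for (String line : linesContaining(text, "free")) {
            System.out.println("Match: " + line);
        }
    }
}
```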
IronPDF can also be used to extract images from PDF files.
The complete code is as follows:
import com.ironsoftware.ironpdf.License;
import com.ironsoftware.ironpdf.PdfDocument;
import com.ironsoftware.ironpdf.edit.PageSelection;
import java.io.IOException;
import java.nio.file.Paths;

public class Main {
    public static void main(String[] args) throws IOException {
        // Set the IronPDF license key for commercial use
        License.setLicenseKey("YOUR LICENSE KEY HERE");

        // Read text from a specific page in an existing PDF
        PdfDocument pdf = PdfDocument.fromFile(Paths.get("assets/sample.pdf"));
        String text = pdf.extractTextFromPage(PageSelection.singlePage(1));
        System.out.println(text);

        // Read all text from a PDF generated from a URL
        pdf = PdfDocument.renderUrlAsPdf("https://unsplash.com/");
        text = pdf.extractAllText();
        System.out.println("Text extracted from the website: " + text);
    }
}
Summary
This article explained how to open and read PDFs in Java using IronPDF.
IronPDF can also create PDFs from HTML or a URL and convert other file formats to PDF.
Try IronPDF for 30 days with a free trial and find out how well it works for you in production. Commercial licensing options for IronPDF start from $749.
Frequently Asked Questions
How can I create a PDF reader in Java?
You can create a PDF reader in Java using IronPDF by utilizing the `fromFile` method to load PDF documents and then using methods like `extractAllText` to parse and manipulate the content.
What are the steps to install prerequisites for using IronPDF in Java?
To use IronPDF in Java, you need to install the Java Development Kit (JDK), set up an Integrated Development Environment (IDE) such as Eclipse or IntelliJ, configure Maven for dependency management, and include the IronPDF library in your project.
How do I extract text from a PDF file in Java?
To extract text from a PDF file in Java using IronPDF, you can use the `extractAllText` method to retrieve the entire document's text or `extractTextFromPage` to extract text from a specific page.
Can I generate a PDF from a URL in Java?
Yes, with IronPDF, you can generate a PDF from a URL by using the `renderUrlAsPdf` method, which converts web content into a PDF format.
Does IronPDF support adding password protection to PDFs in Java?
Yes, IronPDF supports adding password protection to PDFs, along with other features such as digital signing and merging or splitting documents.
What file formats can IronPDF convert to PDF in Java?
IronPDF can convert various file formats to PDF, including HTML and other document formats, providing flexible options for PDF generation and manipulation.
Is there a trial version available for IronPDF in Java?
Yes, IronPDF offers a 30-day free trial, allowing you to test its features and evaluate its performance in your Java applications before purchasing a license.
How can I extract text from a specific page in a PDF document using a Java library?
Using IronPDF, you can extract text from a specific page in a PDF with the `extractTextFromPage` method, which takes a `PageSelection` identifying the page or page range. Page indexes start from 0.