USING IRONPDF FOR JAVA

How to Read a PDF File in Java

Darrius Serrant

Updated: July 28, 2025

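This article explains how to build a PDF reader that opens PDF files programmatically in your software application. IronPDF for Java is a library that opens and reads PDF files by file name from within a Java program.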


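How to Read a PDF File in Java

Download the IronPDF Java library
Use the fromFile method to load an existing PDF document
Call the extractAllText method to extract the embedded text from the PDF
Use the extractTextFromPage method to extract text from a specific page
Retrieve text from a PDF rendered from a URL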

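IronPDF

The IronPDF Java library is built on top of the already successful .NET Framework. This makes IronPDF a versatile tool for working with PDF documents compared with other class libraries such as Apache PDFBox. It can extract and parse content and load both text and images. It also provides options to customize PDF pages, such as page layout, margins, headers and footers, and page orientation.

In addition, IronPDF supports converting from other file formats, protecting PDFs with passwords, applying digital signatures, and merging and splitting PDF documents.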

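How to Read a PDF File in Java

Prerequisites

To build a Java PDF reader with IronPDF, make sure the following components are installed on your computer:

JDK - The Java Development Kit is required to build and run Java programs. If it is not installed, download it from the Oracle website.
IDE - An integrated development environment is software that helps you write, edit, and debug programs. Download any Java IDE, such as Eclipse, NetBeans, or IntelliJ.
Maven - Maven is a build automation tool that downloads libraries from the central repository. Download it from the Apache Maven website.
IronPDF - Finally, IronPDF is needed to read PDF files in Java. Add it as a dependency to your Java Maven project by including the IronPDF artifact together with the slf4j dependency in the pom.xml file, as shown in the following example: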

<!-- Add Maven dependencies for IronPDF -->
<dependencies>
    <!-- IronPDF Dependency -->
    <dependency>
        <groupId>com.ironsoftware</groupId>
        <artifactId>ironpdf</artifactId>
        <version>your-version-here</version>
    </dependency>

    <!-- SLF4J Dependency necessary for logging -->
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>1.7.32</version>
    </dependency>
</dependencies>

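Adding the Necessary Imports

First, add the following code at the top of the Java source file to import the classes used in this article:

import com.ironsoftware.ironpdf.*;                   // core IronPDF classes (PdfDocument, License)
import com.ironsoftware.ironpdf.edit.PageSelection;  // page selection, used for single-page extraction
import java.nio.file.Paths;                          // builds the file path passed to fromFile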

Next, configure IronPDF with a valid license key to use its methods. Call the setLicenseKey method in the main method.

License.setLicenseKey("Your license key");
// Set your IronPDF license key - required for full version

Note: You can obtain a trial license key to create, read, and print PDFs.
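Hard-coding the key in source is fine for a demo, but it is easy to commit by accident. The following is a minimal sketch of one alternative: reading the key from an environment variable. The class name LicenseKeyResolver and the variable name IRONPDF_LICENSE_KEY are assumptions made for illustration; neither is defined by IronPDF.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper (not part of IronPDF): finds a license key in a set of environment variables. */
final class LicenseKeyResolver {
    // Assumed variable name, chosen for this sketch only.
    static final String ENV_NAME = "IRONPDF_LICENSE_KEY";

    /** Returns the trimmed key, or an empty Optional if the variable is missing or blank. */
    static Optional<String> resolve(Map<String, String> env) {
        String value = env.get(ENV_NAME);
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(value.trim());
    }

    public static void main(String[] args) {
        Optional<String> key = resolve(System.getenv());
        // In a real program, a resolved key would be passed to License.setLicenseKey(...).
        System.out.println(key.isPresent() ? "License key found" : "No license key set");
    }
}
```

Passing the environment in as a Map keeps the lookup easy to exercise without changing the real process environment.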

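Read an Existing PDF File in Java

To read a PDF file, you need an existing PDF file, or you can create one. This article uses a PDF file that has already been created. The code is simple: extracting text from the document is a two-step process: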

// Load the PDF document from file
PdfDocument pdf = PdfDocument.fromFile(Paths.get("assets/sample.pdf"));
// Extract all text from the PDF
String text = pdf.extractAllText();
// Print the extracted text
System.out.println(text);

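In the code above, fromFile opens a PDF document. The Paths.get method converts the path string into a Path object that identifies the file to load. Then, [extractAllText](/java/object-reference/api/com/ironsoftware/ironpdf/PdfDocument.html#extractAllText()) reads all of the text in the document.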

The output is as follows:

Figure 1: Output of reading the PDF text
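A relative path such as assets/sample.pdf is resolved against the program's working directory, so a wrong working directory is a common reason a load fails. The following is a minimal sketch of a check that could run before the path is handed to fromFile. The class PdfPathCheck and its method are hypothetical helpers written for this illustration, and the .pdf extension test is a convenience, not a validation of the file contents.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

/** Hypothetical helper (not part of IronPDF): validates a PDF path before it is loaded. */
final class PdfPathCheck {
    /** Returns the absolute, normalized path, or throws if it is not a readable .pdf file. */
    static Path requireReadablePdf(String location) {
        Path path = Paths.get(location).toAbsolutePath().normalize();
        if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
            throw new IllegalArgumentException("Not a readable file: " + path);
        }
        String name = path.getFileName().toString().toLowerCase(Locale.ROOT);
        if (!name.endsWith(".pdf")) {
            throw new IllegalArgumentException("Not a .pdf file: " + path);
        }
        return path;
    }

    public static void main(String[] args) throws IOException {
        // An empty temporary file stands in for a real document in this demo.
        Path temp = Files.createTempFile("sample", ".pdf");
        try {
            System.out.println("Would load: " + requireReadablePdf(temp.toString()));
        } finally {
            Files.deleteIfExists(temp);
        }
    }
}
```

The returned Path can then be passed to PdfDocument.fromFile in place of the Paths.get call.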

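Read Text from a Specific Page

IronPDF can also read content from a specific page of a PDF. The extractTextFromPage method takes a PageSelection object that specifies the page or range of pages to read text from.

The following example extracts text from the second page of the PDF document. PageSelection.singlePage accepts the index of the page to extract (the index starts at 0).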

// Load the PDF document from file
PdfDocument pdf = PdfDocument.fromFile(Paths.get("assets/sample.pdf"));
// Extract text from the second page (page index based, starts at 0, so 1 means second page)
String text = pdf.extractTextFromPage(PageSelection.singlePage(1));
// Print the extracted text from the specified page
System.out.println(text);

Figure 2: Output of reading the PDF text
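Because the page index is zero-based while people usually count pages from 1, an off-by-one error is easy to make. The following is a minimal sketch of a conversion helper. PageIndex is a hypothetical class written for this illustration, and the page count is assumed to be supplied by the caller.

```java
/** Hypothetical helper (not part of IronPDF): converts 1-based page numbers to 0-based indexes. */
final class PageIndex {
    /**
     * Converts a page number as a reader would say it (1 = first page)
     * into the zero-based index expected by PageSelection.singlePage.
     * pageCount is the total number of pages, supplied by the caller.
     */
    static int toZeroBased(int pageNumber, int pageCount) {
        if (pageNumber < 1 || pageNumber > pageCount) {
            throw new IndexOutOfBoundsException(
                    "Page " + pageNumber + " is outside 1.." + pageCount);
        }
        return pageNumber - 1;
    }

    public static void main(String[] args) {
        int index = toZeroBased(2, 10);
        System.out.println("Page 2 maps to index " + index);
    }
}
```

The result can be passed directly to PageSelection.singlePage, as in the example above.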

Other methods in the PageSelection class that can be used to extract text from different pages include [firstPage](/java/object-reference/api/com/ironsoftware/ironpdf/edit/PageSelection.html#firstPage()), [lastPage](/java/object-reference/api/com/ironsoftware/ironpdf/edit/PageSelection.html#lastPage()), pageRange, and [allPages](/java/object-reference/api/com/ironsoftware/ironpdf/edit/PageSelection.html#allPages()).

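Read Text from a Newly Generated PDF File

You can also search the text of a new PDF file generated from an HTML file or a URL. The following example code generates a PDF from a URL and extracts all of the text from the website.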

// Generate PDF from a URL
PdfDocument pdf = PdfDocument.renderUrlAsPdf("https://unsplash.com/");
// Extract all text from the generated PDF
String text = pdf.extractAllText();
// Print the extracted text from the URL
System.out.println("Text extracted from the website: " + text);

Figure 3: Reading from a new file
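Once the text has been extracted it is an ordinary Java String, so searching it needs nothing beyond the standard library. The following is a minimal sketch of a case-insensitive line search. TextSearch is a hypothetical helper, and the sample string in main is made-up placeholder text, not real output from any website.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper (not part of IronPDF): searches extracted text line by line. */
final class TextSearch {
    /** Returns the trimmed lines of text that contain keyword, ignoring case. */
    static List<String> linesContaining(String text, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        // \R matches any line break sequence (\n, \r\n, and so on).
        for (String line : text.split("\\R")) {
            if (line.toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(line.trim());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Placeholder text standing in for the result of extractAllText().
        String extracted = "Free photos\nLog in\nPhoto of the day";
        for (String line : linesContaining(extracted, "photo")) {
            System.out.println(line);
        }
    }
}
```

In the example above, the String returned by extractAllText would take the place of the placeholder text.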

IronPDF can also be used to extract images from PDF files.

The complete code is as follows:

import com.ironsoftware.ironpdf.License;
import com.ironsoftware.ironpdf.PdfDocument;
import com.ironsoftware.ironpdf.edit.PageSelection;

import java.io.IOException;
import java.nio.file.Paths;

public class Main {
    public static void main(String[] args) throws IOException {
        // Set the IronPDF license key for commercial use
        License.setLicenseKey("YOUR LICENSE KEY HERE");

        // Read text from a specific page in an existing PDF
        PdfDocument pdf = PdfDocument.fromFile(Paths.get("assets/sample.pdf"));
        String text = pdf.extractTextFromPage(PageSelection.singlePage(1));
        System.out.println(text);

        // Read all text from a PDF generated from a URL
        pdf = PdfDocument.renderUrlAsPdf("https://unsplash.com/");
        text = pdf.extractAllText();
        System.out.println("Text extracted from the website: " + text);
    }
}
