USING IRONPDF FOR JAVA

How to Extract Data from a PDF in Java

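Q: How do I extract text from a PDF in Java?

With IronPDF for Java, load the document through the PdfDocument class and call the extractAllText method to retrieve its text.

Q: Can I extract data from a URL and convert it to a PDF in Java?

Yes. IronPDF for Java can convert a URL to a PDF at runtime, and you can then extract data from the result through the PdfDocument class.

Q: How do I extract table data from a PDF in Java?

Load the PDF file with IronPDF for Java and call the extractAllText method; the table contents are returned as part of the extracted text.

Q: How do I troubleshoot problems when extracting data from a PDF with Java?

Make sure all prerequisites are met: an up-to-date Java version, a compatible IDE, and the IronPDF library. Then check that Maven is integrated correctly and that the library dependency in the pom.xml file is right.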

Darrius Serrant

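Updated: July 28, 2025

This tutorial shows you how to use IronPDF for Java to extract data from PDF files. Environment setup, library import, reading the input file, and extracting the required data are all explained with code examples.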

2. IronPDF Java PDF Library

IronPDF is a software library that gives developers the ability to generate, edit, and extract data from PDF files in their Java applications. It allows you to create PDFs from HTML documents, images, and more, as well as merge multiple PDFs, split PDF files, and manipulate existing PDFs. IronPDF can also secure PDFs with password protection and add digital signatures, among other features.

IronPDF for Java is developed and maintained by Iron Software. One of its best-received features is extracting text and data from PDF files as well as from HTML and URLs.

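3. Prerequisites

To extract data from PDF files with IronPDF, you need the following:

Java installation: Make sure Java is installed on your system and its path is set in the environment variables. If you have not installed Java yet, see the download page on the Java website for instructions.
Java IDE: Install a Java IDE such as Eclipse or IntelliJ. You can download Eclipse from the Eclipse download page and IntelliJ from the IntelliJ download page.
IronPDF library: Download the IronPDF library and add it to your project as a dependency. See the IronPDF installation instructions page for details.
Maven installation: Maven should be installed and integrated with your IDE before you start working with PDFs. See the Maven installation tutorial on JetBrains for installing and integrating Maven.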
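
As a quick sanity check on the Java installation prerequisite, the following minimal sketch prints which Java runtime your project is actually using. It relies only on standard system properties; the class name `EnvironmentInfo` is a hypothetical name chosen for this illustration.

```java
// Hypothetical helper for illustration: reports the Java runtime in use.
class EnvironmentInfo {

    // Builds a one-line description from standard JVM system properties.
    static String describe() {
        String version = System.getProperty("java.version");
        String home = System.getProperty("java.home");
        return "Java " + version + " at " + home;
    }

    public static void main(String[] args) {
        // Run this from the IDE to confirm which JDK the project is bound to.
        System.out.println(describe());
    }
}
```

If the reported version or location is not the one you expected, the IDE is probably configured with a different JDK than the one on your system path.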

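4. IronPDF for Java Installation

Installing IronPDF for Java is straightforward once all requirements are met. This guide uses JetBrains IntelliJ IDEA to demonstrate the installation and run the sample code.

The steps are as follows:

Open IntelliJ IDEA: Launch JetBrains IntelliJ IDEA on your system.
Create a Maven project: In IntelliJ IDEA, create a new Maven project. This provides a suitable environment for installing IronPDF for Java.

Figure 1: Creating a new Maven project in IntelliJ

A new window will appear. Enter the project name and click Finish.

Figure 2: Naming the Maven project and clicking Finish

Once you click Finish, the new project opens with a pom.xml file. This file is used to add the IronPDF Java Maven dependency.

Figure 3: The pom.xml file

Add the following dependency to the pom.xml file, or download the JAR file from the IronPDF library page on Sonatype Central.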

<dependency>
    <groupId>com.ironsoftware</groupId>
    <artifactId>ironpdf</artifactId>
    <version>1.0.0</version> <!-- replace with the latest version -->
</dependency>

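After you add the dependency to the pom.xml file, a small icon appears in the top-right corner of the file.

Figure 4: Click the floating icon to install Maven dependencies automatically

Click that icon to install the Maven dependency for IronPDF for Java. Depending on your internet connection speed, this should complete within a few minutes.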
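
One way to confirm that the dependency resolved is to check whether the IronPDF class used later in this tutorial can be found on the classpath. The sketch below is only an illustration: `DependencyCheck` and `isOnClasspath` are hypothetical names, and the check uses nothing beyond the standard `Class.forName` call, so it compiles even when IronPDF is missing.

```java
// Hypothetical helper for illustration: tests whether a class can be located.
class DependencyCheck {

    // Returns true if the named class is visible to this class loader.
    // The class is looked up without being initialized.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, DependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Fully qualified name taken from the import used in this tutorial.
        String ironPdfClass = "com.ironsoftware.ironpdf.PdfDocument";
        System.out.println(ironPdfClass + " available: " + isOnClasspath(ironPdfClass));
    }
}
```

If this prints `false`, reload the Maven project and re-check the dependency entry in pom.xml before running the extraction examples.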

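5. Extracting Data

IronPDF is a Java library for creating, editing, and extracting data from PDF files. It provides a simple API for extracting text from PDF files, URLs, and tables.

5.1. Extracting Data from a PDF File

With IronPDF for Java, you can easily extract text data from a PDF file. The sample code below loads a PDF file and extracts its text.

Figure 5: PDF input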

// Import the necessary IronPDF package for working with PDF documents
import com.ironsoftware.ironpdf.PdfDocument;

import java.io.IOException;
import java.nio.file.Paths;

public class Main {
    public static void main(String[] args) throws IOException {
        // Load the PDF document from the specified file
        PdfDocument pdf = PdfDocument.fromFile(Paths.get("business plan.pdf"));

        // Extract all text from the PDF document
        String text = pdf.extractAllText();

        // Print the extracted text to the console
        System.out.println("Text extracted from the PDF: " + text);
    }
}

The source code produces the following output:

> Text extracted from the PDF:
> 
> CRAFT-ARENA
> 
> Muhammad Waleed Butt
> 
> Hassan Khan
> 
> ABOUT US
> 
> Craft-Arena is a partnership based business that will help local crafters of Pakistan to sell their handicrafts at good prices and helps them earn a good living.
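
Because extractAllText returns one plain string, picking out a specific piece of data is ordinary string processing. The following sketch assumes the text has the shape shown in the output above, with a heading on one line and its content on the next non-blank line. `TextSections`, `nonBlankLines`, and `lineAfterHeading` are hypothetical names, and the sample string stands in for the real extraction result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical helper for illustration: locates the line that follows a heading.
class TextSections {

    // Splits text on any line break and drops empty or whitespace-only lines.
    static List<String> nonBlankLines(String text) {
        List<String> lines = new ArrayList<>();
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (!line.isEmpty()) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Returns the first non-blank line after the heading (case-insensitive match).
    static Optional<String> lineAfterHeading(String text, String heading) {
        List<String> lines = nonBlankLines(text);
        for (int i = 0; i < lines.size() - 1; i++) {
            if (lines.get(i).equalsIgnoreCase(heading)) {
                return Optional.of(lines.get(i + 1));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by pdf.extractAllText().
        String text = "CRAFT-ARENA\n\nMuhammad Waleed Butt\n\nHassan Khan\n\n"
                + "ABOUT US\n\nCraft-Arena is a partnership based business.";
        System.out.println(lineAfterHeading(text, "ABOUT US").orElse("(heading not found)"));
    }
}
```

Real PDFs may wrap a paragraph across several lines, so treat the "next line" rule as a starting point and adjust it to the layout of your documents.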

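5.2. Extracting Data from a URL

IronPDF for Java converts a URL to a PDF at runtime and extracts text from it. The following example shows the source code for extracting text from a URL.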

// Import the necessary IronPDF package for working with PDF documents
import com.ironsoftware.ironpdf.PdfDocument;

import java.io.IOException;

public class Main {
    public static void main(String[] args) throws IOException {
        // Convert a URL to a PDF and load it into a PdfDocument
        PdfDocument pdf = PdfDocument.renderUrlAsPdf("https://ironpdf.com/java/");

        // Extract all text from the PDF document
        String text = pdf.extractAllText();

        // Print the extracted text to the console
        System.out.println("Text extracted from the URLs: " + text);
    }
}

Figure 6: Extracted web page data
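
When the URL comes from user input or configuration, it can help to validate it before handing it to the renderer. This is a minimal sketch using only `java.net.URI` from the standard library. `UrlGuard` and `isHttpUrl` are hypothetical names, and restricting input to http and https is an assumption made for this example, not a documented IronPDF requirement.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration: basic syntactic check for web URLs.
class UrlGuard {

    // Accepts only well-formed absolute URLs with an http or https scheme and a host.
    static boolean isHttpUrl(String candidate) {
        if (candidate == null) {
            return false;
        }
        try {
            URI uri = new URI(candidate);
            String scheme = uri.getScheme();
            boolean webScheme = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return webScheme && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String url = "https://ironpdf.com/java/";
        if (isHttpUrl(url)) {
            System.out.println("URL looks valid, safe to pass to the renderer: " + url);
        } else {
            System.out.println("Rejected URL: " + url);
        }
    }
}
```

The check is purely syntactic: it does not confirm that the page exists or is reachable, so network errors still need to be handled when rendering.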

5.3. Extracting Table Data

Extracting table data from a PDF with IronPDF for Java is straightforward: all you need is a PDF that contains a table and the following code.

Figure 7: Sample PDF table input

// Import the necessary IronPDF package for working with PDF documents
import com.ironsoftware.ironpdf.PdfDocument;

import java.io.IOException;
import java.nio.file.Paths;

public class Main {
    public static void main(String[] args) throws IOException {
        // Load the PDF document from the specified file
        PdfDocument pdf = PdfDocument.fromFile(Paths.get("table.pdf"));

        // Extract all text from the PDF document, including table data
        String text = pdf.extractAllText();

        // Print the extracted table data to the console
        System.out.print("Text extracted from the Marked tables: " + text);
    }
}

> Test Case Description Expected Result Actual Result Status
> 
> 1 Test login functionality User should be able to log in with valid credentials
> 
> User log in successfully Pass
> 
> 2 Test search functionality Search results should be relevant and accurate
> 
> Search is accurate and provide relevant products Pass
> 
> 3 Test checkout process User should be able to complete a purchase successfully
> 
> User can purchase successfully Pass
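
The extracted table arrives as flat text, so rebuilding rows is left to your own code. The sketch below shows one heuristic that fits the output above: a line that starts with a number opens a new row, following lines continue it, and the last word of the row is its status. `TableRowParser` and its methods are hypothetical names, and the heuristic is an assumption tied to this sample; it would misread a continuation line that happens to start with a digit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration: regroups flat extracted text into table rows.
class TableRowParser {

    // A row starts with a numeric id followed by whitespace and the rest of the line.
    private static final Pattern ROW_START = Pattern.compile("^(\\d+)\\s+(.*)$");

    // Returns one String[] per row: {id, body, status}.
    static List<String[]> parse(String extractedText) {
        List<String[]> rows = new ArrayList<>();
        String currentId = null;
        StringBuilder current = new StringBuilder();
        for (String raw : extractedText.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            Matcher m = ROW_START.matcher(line);
            if (m.matches()) {
                // A new numbered row begins, so flush the previous one first.
                if (currentId != null) {
                    rows.add(toRow(currentId, current.toString()));
                }
                currentId = m.group(1);
                current = new StringBuilder(m.group(2));
            } else if (currentId != null) {
                // Continuation of the current row.
                current.append(' ').append(line);
            }
            // Lines before the first numbered row (the header) are skipped.
        }
        if (currentId != null) {
            rows.add(toRow(currentId, current.toString()));
        }
        return rows;
    }

    // Splits the trailing word off as the status column.
    private static String[] toRow(String id, String text) {
        int lastSpace = text.lastIndexOf(' ');
        if (lastSpace < 0) {
            return new String[] {id, "", text};
        }
        return new String[] {id, text.substring(0, lastSpace), text.substring(lastSpace + 1)};
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by pdf.extractAllText().
        String text = "Test Case Description Expected Result Actual Result Status\n"
                + "1 Test login functionality User should be able to log in with valid credentials\n"
                + "User log in successfully Pass\n"
                + "2 Test search functionality Search results should be relevant and accurate\n"
                + "Search is accurate and provide relevant products Pass\n";
        for (String[] row : parse(text)) {
            System.out.println(row[0] + " | " + row[2] + " | " + row[1]);
        }
    }
}
```

Splitting the body further into description, expected result, and actual result is not possible from this text alone, because the column boundaries are lost during extraction.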