

How to Read PDF Files in C++

Curtis Chau

Updated: July 28, 2025

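PDF (Portable Document Format) files are widely used for document exchange, and being able to read their contents programmatically is valuable in many applications. Several C++ libraries work with PDFs, including Poppler, MuPDF, Xpdf, and QPDF. The Haru Free PDF Library is often listed alongside them, although it targets PDF generation rather than reading.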

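This article shows how to read PDF files in C++ using the Xpdf command-line tools. Xpdf provides a set of tools for working with PDF documents, including text extraction. By calling Xpdf from a C++ program, we can extract the text of a PDF document and process it programmatically.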

Xpdf - Command-Line Tools

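Xpdf is an open-source software suite that provides tools and libraries for working with PDF (Portable Document Format) files. The suite includes several command-line utilities and a C++ library that together cover PDF tasks such as parsing, rendering, and text extraction. Its key components include pdfimages, pdftops, pdfinfo, and pdftotext. Here, we use pdftotext to read PDF documents.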

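pdftotext is a command-line tool that extracts the text content of a PDF file and writes it out as plain text. It is especially useful when you need the textual information in a PDF document for further processing or analysis. Its -f (first page) and -l (last page) options also let you restrict extraction to a range of pages.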
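
To illustrate page selection, here is a minimal sketch of a command builder that limits extraction to a page range. The function name buildPageRangeCommand and its parameters are hypothetical, introduced only for this example; -f and -l are pdftotext's first-page and last-page options.

```cpp
#include <string>

// Hypothetical helper (not part of Xpdf): builds a pdftotext command line
// that extracts only pages firstPage..lastPage (1-based, inclusive).
// Paths are wrapped in double quotes so that spaces do not split them.
std::string buildPageRangeCommand(const std::string& pdfPath,
                                  const std::string& txtPath,
                                  int firstPage, int lastPage) {
    return "pdftotext -f " + std::to_string(firstPage) +
           " -l " + std::to_string(lastPage) +
           " \"" + pdfPath + "\" \"" + txtPath + "\"";
}
```

The returned string can be passed to std::system in the same way as the command built later in this article.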

Prerequisites

To build a PDF reader project that extracts text, the following prerequisites must be met:

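A C++ compiler, such as GCC or Clang, is installed on your system. Any integrated development environment (IDE) that supports C++ can be used.
The Xpdf command-line tools are installed on your system. Xpdf is a set of PDF tools that can be downloaded from the Xpdf website. Add Xpdf's bin directory to the PATH environment variable so that the tools can be run from any location.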

Steps to Read a PDF File in C++

Step 1: Add the Necessary Headers

First, add the necessary headers at the top of the main.cpp file:

#include <cstdlib>  // For system call
#include <iostream> // For basic input and output
#include <fstream>  // For file stream operations
#include <string>   // For std::string

Step 2: Write the C++ Code

Next, write the C++ code that calls the Xpdf command-line tool to extract the text content of a PDF document. We will use the following input.pdf file:

How to Read PDF Files in C++: Figure 1

The code example is as follows:

#include <cstdlib>
#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main() {
    // Specify the input and output file paths
    string pdfPath = "input.pdf";
    string outputFilePath = "output.txt";

    // Construct the command to run pdftotext
    // (paths are quoted so that names containing spaces are passed intact)
    string command = "pdftotext \"" + pdfPath + "\" \"" + outputFilePath + "\"";
    int status = system(command.c_str());

    // Check if the command executed successfully
    if (status == 0) {
        cout << "Text extraction successful." << endl;
    } else {
        cout << "Text extraction failed." << endl;
        return 1; // Exit the program with error code
    }

    // Open the output file to read the extracted text
    ifstream outputFile(outputFilePath);
    if (outputFile.is_open()) {
        string textContent;
        string line;
        while (getline(outputFile, line)) {
            textContent += line + "\n"; // Append each line to the textContent
        }
        outputFile.close();

        // Display the extracted text
        cout << "Text content extracted from PDF document:" << endl;
        cout << textContent << endl;
    } else {
        cout << "Failed to open output file." << endl;
        return 1; // Exit the program with error code
    }

    return 0; // Exit the program successfully
}

Code Explanation

In the code above, the pdfPath variable holds the path to the input PDF file. Be sure to replace it with the actual path to your own PDF document.

The outputFilePath variable holds the path of the text file that Xpdf will generate.

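The code runs the pdftotext command with the system function, passing the input PDF path and the output text file path as command-line arguments. The status variable captures the value that system returns for the command.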
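
For more informative error messages, the status can be decoded rather than only compared with 0. The sketch below makes two assumptions: that the exit codes are those listed in the EXIT CODES section of the Xpdf pdftotext man page, and that on POSIX systems the value returned by system is a wait status that must be unpacked with WEXITSTATUS. Both helper names are hypothetical.

```cpp
#include <cstdlib>
#include <string>
#ifndef _WIN32
#include <sys/wait.h> // WIFEXITED / WEXITSTATUS on POSIX systems
#endif

// Hypothetical helper: converts the value returned by std::system into
// the child's exit code, or -1 if the command did not exit normally.
int exitCodeFromSystem(int status) {
#ifdef _WIN32
    return status; // on Windows, system returns the exit code directly
#else
    return (status != -1 && WIFEXITED(status)) ? WEXITSTATUS(status) : -1;
#endif
}

// Hypothetical helper: maps a pdftotext exit code to a short description,
// following the codes documented in the Xpdf pdftotext man page.
std::string describePdftotextExit(int exitCode) {
    switch (exitCode) {
        case 0:  return "no error";
        case 1:  return "error opening the PDF file";
        case 2:  return "error opening the output file";
        case 3:  return "error related to PDF permissions";
        case 99: return "other error";
        default: return "unrecognized exit code";
    }
}
```

In the program above, the failure branch could print describePdftotextExit(exitCodeFromSystem(status)) instead of a generic message.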

If pdftotext succeeds (status is 0), we open the output text file with ifstream. We then read the text line by line and store it in the textContent string.
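
As an alternative to the line-by-line loop, the whole file can be read in a single step. This is only a sketch, assuming C++17 for std::optional; readWholeFile is a hypothetical name. Unlike the loop above, it preserves the file's bytes exactly and does not append a newline to a final line that lacks one.

```cpp
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: returns the full contents of the file at `path`,
// or std::nullopt if the file cannot be opened.
std::optional<std::string> readWholeFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return std::nullopt;
    std::ostringstream buffer;
    buffer << in.rdbuf(); // copy the entire stream buffer into memory
    return buffer.str();
}
```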

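Finally, we print the text read from the generated output file to the console. If you do not need to keep the output text file, or want to free disk space, delete it at the end of the program, just before main returns, with the following call (std::remove is declared in <cstdio>):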

remove(outputFilePath.c_str());

Step 3: Compile and Run the Program

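Compile the C++ code and run the resulting executable. As long as the directory containing pdftotext is on the system PATH, the command will be found and executed. The program generates the output text file from the PDF document and then prints the extracted text to the console.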

The output is as follows:

How to Read PDF Files in C++: Figure 2
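
As a variation on the program above, the intermediate text file can be avoided by reading the tool's output through a pipe. pdftotext writes to standard output when the output file name is given as "-". The sketch below assumes popen is available (POSIX) or _popen (Windows); runAndCapture is a hypothetical helper, not part of Xpdf.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

#ifdef _WIN32
#define GR_POPEN _popen
#define GR_PCLOSE _pclose
#else
#define GR_POPEN popen
#define GR_PCLOSE pclose
#endif

// Hypothetical helper: runs `command` through the shell and returns
// everything it writes to standard output.
std::string runAndCapture(const std::string& command) {
    FILE* pipe = GR_POPEN(command.c_str(), "r");
    if (!pipe) throw std::runtime_error("could not start: " + command);

    std::string output;
    char buffer[4096];
    std::size_t n;
    while ((n = std::fread(buffer, 1, sizeof buffer, pipe)) > 0) {
        output.append(buffer, n); // accumulate each chunk read from the pipe
    }
    GR_PCLOSE(pipe);
    return output;
}
```

With this helper, a call such as runAndCapture("pdftotext \"input.pdf\" -") would return the extracted text directly as a string.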

Reading PDF Files in C#

The IronPDF Library

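IronPDF is a popular C# PDF library that provides powerful features for working with PDF documents. It lets developers create, edit, modify, and read PDF documents programmatically.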

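Reading a PDF document with IronPDF is straightforward. The library offers methods and properties for extracting text, images, metadata, and other data from PDF pages. The extracted information can then be processed, analyzed, or displayed within an application.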

The following code example reads a PDF file with IronPDF:

// Import necessary namespaces
using IronPdf; // For PDF functionalities
using IronSoftware.Drawing; // For handling images
using System.Collections.Generic; // For using the List

// Example of extracting text and images from PDF using IronPDF

// Open a 128-bit encrypted PDF
var pdf = PdfDocument.FromFile("encrypted.pdf", "password");

// Get all text from the PDF
string text = pdf.ExtractAllText();

// Extract all images from the PDF
var allImages = pdf.ExtractAllImages();

// Iterate over each page to extract text and images
for (var index = 0; index < pdf.PageCount; index++) {
    int pageNumber = index + 1;
    text = pdf.ExtractTextFromPage(index);
    List<AnyBitmap> images = pdf.ExtractBitmapsFromPage(index);
    // Perform actions with text and images...
}

For more details on reading PDF documents, see the IronPDF C# PDF reading guide.