Reading PDF Files in C#


using IronPdf;
using IronSoftware.Drawing;
using System.Collections.Generic;

// Extract image and text content from PDF documents

// Open a 128-bit encrypted PDF by supplying its password
var pdf = PdfDocument.FromFile("encrypted.pdf", "password");

// Get all text to put in a search index
string text = pdf.ExtractAllText();

// Get all Images
var allImages = pdf.ExtractAllImages();

// Or find the exact text and images on each page of the document
for (var index = 0; index < pdf.PageCount; index++)
{
    // Page indexes are zero-based; pageNumber is the human-readable number
    int pageNumber = index + 1;
    string pageText = pdf.ExtractTextFromPage(index);
    List<AnyBitmap> images = pdf.ExtractBitmapsFromPage(index);
    //...
}

Install-Package IronPdf

The PdfDocument.ExtractAllText method in the IronPDF C# PDF library is well suited to everyday PDF text-reading tasks. It handles differences in whitespace and encoding across source PDF documents.

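PdfDocument.ExtractTextFromPage reads the text of one specific page of the PDF. In the example above, it is called in a loop to retrieve the text content of each page in turn; restricting the loop bounds limits it to a particular range of pages.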

IronPDF can also extract the original images from a PDF. To do so, use any of the following methods of the PdfDocument class:

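ExtractAllImages: returns all images embedded in the PDF as IronSoftware.Drawing.AnyBitmap objects.
ExtractAllRawImages: retrieves all embedded images as a list of raw byte arrays (byte[]).
ExtractImagesFromPage: extracts the images contained on the page at the given index.
ExtractImagesFromPages: the same as ExtractImagesFromPage, but for a specific range of pages or a list of individual pages.
ExtractRawImagesFromPages: works the same way as the previous two methods, but returns the extracted images as byte arrays instead of IronSoftware.Drawing.AnyBitmap objects.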
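
As a minimal sketch of how the image methods listed above might be used, the following saves the extracted images to disk. The file name "sample.pdf", the "extracted" output folder, and the output file names are placeholders, not part of the original example. It also assumes that AnyBitmap exposes a SaveAs(string) method and that raw images come back in whatever format they were embedded in, which is why the raw bytes are written with a neutral .bin extension.

```csharp
using System.IO;
using IronPdf;

// Hypothetical input file; replace with a real path.
var pdf = PdfDocument.FromFile("sample.pdf");

// Hypothetical output folder for the extracted images.
Directory.CreateDirectory("extracted");

// ExtractAllImages: every embedded image as an AnyBitmap.
int imageCount = 0;
foreach (var image in pdf.ExtractAllImages())
{
    // Assumes AnyBitmap.SaveAs(string) picks the format from the extension.
    image.SaveAs(Path.Combine("extracted", $"image-{imageCount}.png"));
    imageCount++;
}

// ExtractImagesFromPage: only the images on one page (zero-based index).
int firstPageCount = 0;
foreach (var image in pdf.ExtractImagesFromPage(0))
{
    image.SaveAs(Path.Combine("extracted", $"page1-image-{firstPageCount}.png"));
    firstPageCount++;
}

// ExtractAllRawImages: the embedded images as raw byte arrays.
int rawCount = 0;
foreach (byte[] raw in pdf.ExtractAllRawImages())
{
    // The embedded format is not known here, so keep a neutral extension.
    File.WriteAllBytes(Path.Combine("extracted", $"raw-{rawCount}.bin"), raw);
    rawCount++;
}
```

The AnyBitmap variants are convenient when the images will be re-encoded or processed further, while the raw variants avoid decoding and preserve the embedded bytes as they are.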