Read PDF Files in C#

using IronPdf;
using IronSoftware.Drawing;
using System.Collections.Generic;

// Extract image and text content from PDF documents

// Open a 128-bit encrypted PDF
var pdf = PdfDocument.FromFile("encrypted.pdf", "password");

// Get all text to put in a search index
string text = pdf.ExtractAllText();

// Get all Images
var allImages = pdf.ExtractAllImages();

// Or even find the precise text and images for each page in the document
for (var index = 0; index < pdf.PageCount; index++)
{
    int pageNumber = index + 1;
    text = pdf.ExtractTextFromPage(index);
    List<AnyBitmap> images = pdf.ExtractBitmapsFromPage(index);
    //...
}

Imports IronPdf
Imports IronSoftware.Drawing
Imports System.Collections.Generic

' Extract image and text content from PDF documents

' Open a 128-bit encrypted PDF
Dim pdf = PdfDocument.FromFile("encrypted.pdf", "password")

' Get all text to put in a search index
Dim text As String = pdf.ExtractAllText()

' Get all images
Dim allImages = pdf.ExtractAllImages()

' Or even find the precise text and images for each page in the document
For index = 0 To pdf.PageCount - 1
	Dim pageNumber As Integer = index + 1
	text = pdf.ExtractTextFromPage(index)
	Dim images As List(Of AnyBitmap) = pdf.ExtractBitmapsFromPage(index)
	'...
Next index

Install-Package IronPdf

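The PdfDocument.ExtractAllText method in the IronPDF C# PDF library is well suited to standard PDF text-reading tasks. It returns the text of the whole document and handles the whitespace and encoding differences of the original PDF for you.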
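
As a minimal sketch of the search-index use case mentioned in the sample, the snippet below saves the extracted text to disk and runs a simple keyword check. The file names "report.pdf" and "report.txt" and the keyword "invoice" are placeholders chosen for illustration, not values from the IronPDF documentation.

```csharp
using System;
using System.IO;
using IronPdf;

// "report.pdf" is a placeholder path; substitute a real, unencrypted PDF.
var pdf = PdfDocument.FromFile("report.pdf");

// Pull the text of every page into a single string.
string text = pdf.ExtractAllText();

// Save the text so that an external indexer could pick it up later.
File.WriteAllText("report.txt", text);

// Case-insensitive check for a hypothetical keyword.
bool found = text.IndexOf("invoice", StringComparison.OrdinalIgnoreCase) >= 0;
Console.WriteLine(found ? "Keyword found." : "Keyword not found.");
```

A real search index would normally tokenize the text and store it per document or per page; the keyword check here only shows that the extracted string can be handled like any other .NET string.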

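PdfDocument.ExtractTextFromPage reads the text of a single page, identified by its zero-based index. In the example above, it is called inside a loop to extract the text of each page in turn; the same pattern works for any specific range of pages.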
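
To restrict extraction to a range of pages, the loop bounds can be narrowed. The sketch below assumes a placeholder file "report.pdf" and a hypothetical range of pages 2 to 4 (as a reader would number them), converted to zero-based indexes and clamped to the document length.

```csharp
using System;
using System.Text;
using IronPdf;

// "report.pdf" is a placeholder path used for illustration.
var pdf = PdfDocument.FromFile("report.pdf");

// Hypothetical range, expressed in one-based page numbers.
int firstPage = 2;
int lastPage = 4;

// Convert to zero-based indexes and clamp to the pages that exist.
int startIndex = Math.Max(firstPage - 1, 0);
int endIndex = Math.Min(lastPage - 1, pdf.PageCount - 1);

var builder = new StringBuilder();
for (int index = startIndex; index <= endIndex; index++)
{
    // Label each block with the one-based page number.
    builder.AppendLine($"--- Page {index + 1} ---");
    builder.AppendLine(pdf.ExtractTextFromPage(index));
}

Console.WriteLine(builder.ToString());
```

If the document has fewer pages than the requested start page, the loop body never runs and the output is empty.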

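IronPDF can also extract the original images from a PDF file. To do so, use any of the following methods of the PdfDocument class:

ExtractAllImages: returns every image embedded in the PDF as IronSoftware.Drawing.AnyBitmap objects.
ExtractAllRawImages: extracts every embedded image as a list of raw byte arrays (byte[]).
ExtractImagesFromPage: extracts the images contained in a single page, specified by its index.
ExtractImagesFromPages: the same as ExtractImagesFromPage, but for a range of pages or a list of individual page numbers.
ExtractRawImagesFromPage and ExtractRawImagesFromPages: work like the previous two methods, but return the extracted images as byte arrays rather than IronSoftware.Drawing.AnyBitmap objects.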
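
The methods listed above might be combined as in the following sketch, which saves the decoded images of the first page and then writes out the raw bytes of every embedded image. The file name "report.pdf", the "images" output folder, and the ".bin" extension are assumptions for illustration. The sketch also assumes that AnyBitmap.SaveAs picks the output format from the file extension and that ExtractAllRawImages yields byte[] items, as described in the list.

```csharp
using System.IO;
using IronPdf;
using IronSoftware.Drawing;

// "report.pdf" and the "images" folder are placeholders.
var pdf = PdfDocument.FromFile("report.pdf");
Directory.CreateDirectory("images");

// Decoded images from the first page (index 0), saved as PNG files.
int imageCount = 0;
foreach (AnyBitmap image in pdf.ExtractImagesFromPage(0))
{
    image.SaveAs(Path.Combine("images", $"page1_{imageCount}.png"));
    imageCount++;
}

// Raw bytes of every embedded image in the document.
// The original encoding is not known here, so a neutral extension is used.
int rawCount = 0;
foreach (byte[] raw in pdf.ExtractAllRawImages())
{
    File.WriteAllBytes(Path.Combine("images", $"raw_{rawCount}.bin"), raw);
    rawCount++;
}
```

The AnyBitmap route is convenient when the images need to be re-encoded or processed further, while the raw route keeps the embedded bytes untouched.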