Reading PDF Files in C#

using IronPdf;
using IronSoftware.Drawing;
using System.Collections.Generic;

// Extracting Image and Text content from Pdf Documents

// open a 128 bit encrypted PDF
var pdf = PdfDocument.FromFile("encrypted.pdf", "password");

// Get all text to put in a search index
string text = pdf.ExtractAllText();

// Get all Images
var allImages = pdf.ExtractAllImages();

// Or even find the precise text and images for each page in the document
for (var index = 0; index < pdf.PageCount; index++)
{
    int pageNumber = index + 1;
    text = pdf.ExtractTextFromPage(index);
    List<AnyBitmap> images = pdf.ExtractBitmapsFromPage(index);
    //...
}

Install-Package IronPdf

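The PdfDocument.ExtractAllText method in the IronPDF C# PDF library is well suited to ordinary PDF text-reading tasks. It handles whitespace and encoding inconsistencies in the source PDF document.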

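PdfDocument.ExtractTextFromPage reads the text of one specific page in a PDF. In the example above it is called iteratively, once per zero-based page index, which is also how text can be collected from a specific range of pages.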
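The sample on this page loops over every page. As a minimal sketch of restricting the loop to a page range, the snippet below uses only the calls already shown (PdfDocument.FromFile, PageCount, ExtractTextFromPage). The file name `report.pdf` and the page numbers are hypothetical, and the single-argument form of FromFile is assumed to be available for unencrypted files.

```csharp
using System;
using System.Text;
using IronPdf;

// Hypothetical input file and page range, chosen only for illustration
var pdf = PdfDocument.FromFile("report.pdf");
int firstPage = 2; // 1-based, inclusive
int lastPage = 5;  // 1-based, inclusive

// Convert to zero-based indexes and clamp to the pages that actually exist
int startIndex = Math.Max(firstPage - 1, 0);
int endIndex = Math.Min(lastPage - 1, pdf.PageCount - 1);

var rangeText = new StringBuilder();
for (int index = startIndex; index <= endIndex; index++)
{
    // ExtractTextFromPage takes a zero-based page index, as in the sample above
    rangeText.AppendLine(pdf.ExtractTextFromPage(index));
}

Console.WriteLine(rangeText.ToString());
```

Clamping the indexes means an out-of-range request yields less text, possibly none, rather than an out-of-range page index being passed to the library.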

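IronPDF can also extract the original images from a PDF. To do so, use any of the following methods of the PdfDocument class:

ExtractAllImages : Returns all images embedded in the PDF as IronSoftware.Drawing.AnyBitmap objects.
ExtractAllRawImages : Retrieves all embedded images as a list of raw byte arrays (byte[]).
ExtractImagesFromPage : Extracts the images contained on the page at the given index.
ExtractImagesFromPages : Same as ExtractImagesFromPage, but for a specific page range or a list of individual pages.
ExtractRawImagesFromPages : Works the same way as the previous two methods, but returns the extracted images as byte arrays rather than IronSoftware.Drawing.AnyBitmap objects.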
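As a rough sketch of what to do with the extracted images, the snippet below writes every embedded image to disk. ExtractAllImages is taken from the sample on this page. The input file `report.pdf`, the output folder `extracted-images`, and the file naming scheme are hypothetical. AnyBitmap.SaveAs with a file path is assumed to choose the output format from the file extension, so check this against the IronSoftware.Drawing version in use.

```csharp
using System.IO;
using IronPdf;
using IronSoftware.Drawing;

// Hypothetical input file and output folder, chosen only for illustration
var pdf = PdfDocument.FromFile("report.pdf");
string outputFolder = "extracted-images";
Directory.CreateDirectory(outputFolder);

int counter = 0;
foreach (AnyBitmap image in pdf.ExtractAllImages())
{
    counter++;
    string path = Path.Combine(outputFolder, $"image-{counter}.png");

    // Assumption: SaveAs infers the image format from the ".png" extension
    image.SaveAs(path);
}
```

When the original encoded bytes are needed rather than decoded bitmaps, the raw-image methods listed above are the alternative.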