iText 7 Extract Text from PDF vs IronPDF (Code Example Tutorial)

Regan Pun

February 2, 2023

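In this tutorial, we will learn how to read data from a PDF (Portable Document Format) file in C#, with examples using two different tools.

Many parser libraries and readers available online can extract text and images from PDFs. We will extract information from a PDF file using two widely used libraries, and then compare them to see which is the better fit for this task.

We will compare iText 7 and IronPDF. Before moving on, we will introduce both libraries.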

iText 7

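The iText 7 library is the successor to iTextSharp. It is used in .NET and Java applications. It comes with a document engine (like Adobe Acrobat Reader), high-level and low-level programming capabilities, event listeners, and PDF editing features. iText 7 can create, edit, and enhance the pages of PDF documents. Other features include adding passwords, creating encoding strategies, and saving permission options to a PDF document. It is also used to add or change content or canvas images, append PDF elements (dictionaries, etc.), create watermarks and bookmarks, change font sizes, and sign sensitive data.

iText 7 allows us to build custom PDF processing applications in .NET for web, mobile, desktop, kernel, or cloud apps.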

IronPDF

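IronPDF is a library developed by Iron Software that helps C# and Java software engineers create, edit, and extract PDF content. It is commonly used to generate PDFs from HTML, web pages, or images, and to read PDFs and extract their text. Other features include adding headers and footers, signatures, attachments, passwords, and other security settings. It supports multithreading and asynchronous operation for better performance.

IronPDF is cross-platform and supports .NET 5, .NET 6, and .NET 7, as well as .NET Core, .NET Standard, and .NET Framework. It is also compatible with Windows, macOS, Linux, Docker, Azure, and AWS.

Now, let's see a demonstration of both.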

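Extract Text from a PDF File using iText 7

We will extract text from a sample PDF file, stored at D:/TestDocument.pdf.

Write the following source code to extract text using iText 7.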

using System;
using System.Text;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

// Assign the PDF location to a string and create a StringBuilder for the extracted text
string pdfPath = @"D:/TestDocument.pdf";
var pageText = new StringBuilder();

// Open the PDF with a PdfReader wrapped in a PdfDocument
using (PdfDocument document = new PdfDocument(new PdfReader(pdfPath)))
{
    int pageCount = document.GetNumberOfPages();
    for (int page = 1; page <= pageCount; page++)
    {
        // A fresh LocationTextExtractionStrategy collects the text of one page
        var strategy = new LocationTextExtractionStrategy();
        var parser = new PdfCanvasProcessor(strategy);
        // Process the current page; iText 7 pages are 1-indexed
        parser.ProcessPageContent(document.GetPage(page));
        pageText.Append(strategy.GetResultantText());
    }
    Console.WriteLine(pageText.ToString());
}

' VB.NET version of the same example
' Assign the PDF location to a string and create a StringBuilder for the extracted text
Dim pdfPath As String = "D:/TestDocument.pdf"
Dim pageText = New StringBuilder()
' Open the PDF with a PdfReader wrapped in a PdfDocument
Using document As New PdfDocument(New PdfReader(pdfPath))
    Dim pageCount = document.GetNumberOfPages()
    For page As Integer = 1 To pageCount
        ' A fresh LocationTextExtractionStrategy collects the text of one page
        Dim strategy As New LocationTextExtractionStrategy()
        Dim parser As New PdfCanvasProcessor(strategy)
        ' Process the current page; iText 7 pages are 1-indexed
        parser.ProcessPageContent(document.GetPage(page))
        pageText.Append(strategy.GetResultantText())
    Next page
    Console.WriteLine(pageText.ToString())
End Using

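Extracted text output

Now, let's extract text from the same PDF using IronPDF.

Extract Text from a PDF File using IronPDF

The following source code demonstrates extracting text from a PDF using IronPDF.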

using System;
using IronPdf;
// Load the PDF and extract the text of every page in one call
var pdf = PdfDocument.FromFile(@"D:/TestDocument.pdf");
string text = pdf.ExtractAllText();
Console.WriteLine(text);

Dim pdf = PdfDocument.FromFile("D:/TestDocument.pdf")
Dim text As String = pdf.ExtractAllText()
Console.WriteLine(text)

Text extracted using IronPDF
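When the text is needed page by page rather than as one string, the IronPDF example can be adapted into a loop. The following is a minimal sketch, assuming the same sample file path as above and that the installed IronPDF version exposes the `PageCount` property and the `ExtractTextFromPage` method, which takes a zero-based page index. The page separator line is purely illustrative.

```csharp
using System;
using IronPdf;

// Sample path reused from the example above; adjust it to your own file
var pdf = PdfDocument.FromFile(@"D:/TestDocument.pdf");

// IronPDF page indexes are zero-based, unlike the 1-based pages in iText 7
for (int pageIndex = 0; pageIndex < pdf.PageCount; pageIndex++)
{
    // Extract the text of a single page
    string pageText = pdf.ExtractTextFromPage(pageIndex);

    // Illustrative separator so each page's text is easy to tell apart
    Console.WriteLine($"--- Page {pageIndex + 1} ---");
    Console.WriteLine(pageText);
}
```

This keeps the per-page structure that the iText 7 loop produces, which can be useful when only certain pages are of interest.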

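Comparison

With IronPDF, extracting text from a PDF takes just two lines of code. With iText 7, on the other hand, we need to write about 10 lines of code to accomplish the same task.

IronPDF provides a convenient text extraction method out of the box, whereas with iText 7 we have to write our own logic to perform the same task.

IronPDF is efficient in terms of both performance and code readability.

For this test document, both libraries are equal in terms of accuracy: each produced 100% accurate output.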

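Conclusion

iText 7 requires a commercial license for closed-source use (it is otherwise available under the AGPL). IronPDF is free for development and also offers a free trial for commercial use.

For a more in-depth comparison of IronPDF and iText 7, read the IronPDF vs. iText 7 comparison article.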

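Regan Pun

Software Engineer

Regan graduated from the University of Reading with a bachelor's degree in Electronic Engineering. Before joining Iron Software, his roles kept him focused on single tasks; what he enjoys most at Iron Software is the range of work he can take on, whether that is adding value to sales, technical support, product development, or marketing. He enjoys understanding how developers use the Iron Software libraries, and uses that knowledge to continually improve the documentation and develop the products.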