C# 中的批次 PDF 處理:大規模自動化文件工作流程
透過 IronPDF 在 C# 中進行批次 PDF 處理,可讓 .NET 開發人員大規模自動化文件工作流程——從並行 HTML 轉 PDF 轉換、批次合併/分割,到具備內建錯誤處理、重試邏輯及檢查點功能的非同步 PDF 處理流程。 IronPDF 具備執行緒安全的 Chromium 引擎,並採用 IDisposable 為基礎的記憶體管理機制,使其專為高吞吐量的 PDF 自動化處理而設計,無論您是在本地端、Azure Functions、AWS Lambda 還是 Kubernetes 上執行。
TL;DR:快速入門指南
本教學涵蓋 C# 中的可擴展 PDF 自動化技術——從並行轉換與批次操作,到雲端部署及具彈性的管線模式。
- 適用對象:負責處理大量文件工作流程的 .NET 開發人員與架構師——例如文件遷移專案、每日報表生成流程、合規性整改審查,或無法進行順序處理的檔案數位化工作。
- 您將開發的內容:使用
Parallel.ForEach進行並行 HTML 轉 PDF 轉換、批次合併與分割操作、搭配SemaphoreSlim進行並發控制、採用"失敗時跳過"與重試邏輯的錯誤處理、用於崩潰恢復的檢查點/恢復模式,以及針對 Azure Functions、AWS Lambda 和 Kubernetes 的雲端部署配置。 - 支援環境:.NET 6 以上、.NET Framework 4.6.2 以上、.NET Standard 2.0。所有渲染均採用 IronPDF 內建的 Chromium 引擎 — 無需依賴無頭瀏覽器或外部服務。
- 何時適用此方法:當您需要處理的 PDF 數量超出依序執行的處理能力時——例如大規模的文件遷移、時間窗口緊迫的排程批次工作,或是文件負載不定的多租戶平台。
- 技術上的重要性:IronPDF 的
ChromePdfRenderer具備線程安全特性,且每個渲染操作皆為無狀態,這意味著多個線程可以安全地共用單一渲染器實例。 結合 .NET 的任務並行函式庫(Task Parallel Library)以及IDisposable上的PdfDocument,您將獲得可預測的記憶體行為與 CPU 飽和度,同時避免競態條件或記憶體洩漏。
只需幾行程式碼,即可批次將整個目錄中的 HTML 檔案轉換為 PDF:
-
using NuGet 套件管理員安裝 https://www.nuget.org/packages/IronPdf
PM > Install-Package IronPdf -
請複製並執行此程式碼片段。
using IronPdf; using System.IO; using System.Threading.Tasks; var renderer = new ChromePdfRenderer(); var htmlFiles = Directory.GetFiles("input/", "*.html"); Parallel.ForEach(htmlFiles, htmlFile => { var pdf = renderer.RenderHtmlFileAsPdf(htmlFile); pdf.SaveAs($"output/{Path.GetFileNameWithoutExtension(htmlFile)}.pdf"); }); -
部署至您的生產環境進行測試
立即透過免費試用,在您的專案中開始使用 IronPDF
購買 IronPDF 或註冊 30 天試用版後,請在應用程式啟動時輸入您的授權金鑰。
IronPdf.License.LicenseKey = "KEY";
IronPdf.License.LicenseKey = "KEY";
Imports IronPdf
IronPdf.License.LicenseKey = "KEY"
立即透過免費試用,在您的專案中開始使用 IronPDF。
目錄
- 問題理解
- 基礎架構
- 核心功能
- 韌性
- 效能
- 部署
- 綜合整理
當您需要處理數千份 PDF 檔案時
PDF 批次處理並非小眾需求——它是企業文件管理中不可或缺的常規環節。 各行各業皆會面臨此類需求,且它們都有一個共同點:無法逐一處理。
文件遷移專案是最常見的觸發因素之一。 當組織從一個文件管理系統遷移至另一個系統時,往往需要將數千(有時甚至數百萬)份文件進行轉換、重新格式化或重新標記。 一家正從舊有理賠系統遷移的保險公司,可能需要將 500,000 份基於 TIFF 的理賠文件轉換為可搜尋的 PDF 檔案。 一家律師事務所若要遷移至新的案件管理平台,可能需要將分散的往來信件整合至統一的案件檔案中。 這些雖屬一次性工作,但範圍龐大且不容許任何錯誤。
每日報表生成是同一問題的常態化版本。 無論是為數千名客戶生成日終投資組合報告的金融機構、為每個出貨貨櫃生成裝貨單的物流公司,還是跨數百個部門製作每日病患摘要的醫療系統——這些機構所產生的 PDF 輸出量級之大,若採用順序處理,將遠超出可接受的時間範圍。 當 10,000 份報告必須在早上 6 點前準備就緒,而數據卻要到午夜才最終確定時,您沒有六個小時的時間逐一處理這些報告。
檔案數位化處於資料遷移與合規要求的交匯點。 擁有數十年紙本記錄的政府機關、大學及企業,面臨必須將文件數位化並以符合標準的格式(通常為 PDF/A)進行歸檔的強制要求。 其數量之龐大令人咋舌——僅美國國家檔案與記錄管理局(NARA)接收供永久保存的聯邦檔案就達數百萬頁——且該流程必須足夠可靠,以免數年後才發現資料缺漏。
合規整改通常是最迫切的驅動因素。 當稽核發現您的文件檔案庫不符合新實施的標準時——例如,您儲存的發票未符合電子發票法規的 PDF/A-3 標準,或您的醫療紀錄缺乏《第 508 條》所要求的無障礙標籤——您需要將現有的整個檔案庫依據新標準進行處理。 工作壓力大、時程緊迫,且翻譯量取決於您的檔案庫中包含的內容。
在上述每種情境中,核心挑戰皆相同:如何可靠且高效地處理大量 PDF 操作,同時確保在發生錯誤時不會耗盡記憶體,或留下未完成的工作?

IronPDF 批次處理架構
在深入探討具體操作之前,了解 IronPDF 是如何設計來處理並發工作負載的,以及在基於 IronPDF 構建批次處理管道時應做出的架構決策,這一點至關重要。
安裝 IronPDF
透過 NuGet 安裝 IronPDF:
Install-Package IronPdf
Install-Package IronPdf
或使用 .NET CLI:
dotnet add package IronPdf
dotnet add package IronPdf
IronPDF 支援 .NET Framework 4.6.2 以上版本、.NET Core、.NET 5 至 .NET 10,以及 .NET Standard 2.0。它可在 Windows、Linux、macOS 及 Docker 容器上運行,因此既適用於本地端批次作業,也適用於雲原生部署。
若需進行生產環境的批次處理,請在應用程式啟動時,於任何 PDF 操作開始前,將您的授權金鑰設定為 License.LicenseKey。 這確保所有執行緒中的每個渲染呼叫都能存取完整的功能集,且不會出現按檔案設定的水印。
並行控制與執行緒安全性
IronPDF 基於 Chromium 的渲染引擎具有線程安全特性。 您可以在不同執行緒中建立多個 ChromePdfRenderer 實例,或共用單一實例 — IronPDF 會處理內部同步。 針對批次處理,官方建議使用 .NET Core 內建的 Parallel.ForEach,該功能會自動將工作負載分配至所有可用的 CPU 核心。
話雖如此,"執行緒安全"並不等同於"可使用無限執行緒"。每項並發的 PDF 渲染操作都會消耗記憶體(Chromium 引擎需要工作空間來進行 DOM 解析、CSS 版面配置及圖像光柵化),若在記憶體受限的系統上啟動過多並行操作,將導致效能下降或引發 OutOfMemoryException。 適當的並發程度取決於您的硬體:一台配備 64 GB 記憶體的 16 核心伺服器,可輕鬆處理 8 至 12 個並發渲染任務; 配備 8 GB 記憶體的 4 核心虛擬機器可能僅能支援 2 至 4 個執行緒。請透過 ParallelOptions.MaxDegreeOfParallelism 進行控制 — 建議先將其設定為可用 CPU 核心數的大約一半作為起點,再根據實際觀察到的記憶體壓力進行調整。
大規模記憶體管理
記憶體管理是批次 PDF 處理中最關鍵的考量因素。 每個 PdfDocument 物件都會在記憶體中保留 PDF 的完整二進位內容,若未妥善釋放這些物件,記憶體佔用量將隨處理的檔案數量呈線性增長。
關鍵規則:務必始終使用 using 語句,或在 PdfDocument 物件上明確呼叫 Dispose()。 IronPDF 的 PdfDocument 實作 IDisposable,而未正確釋放資源是批次處理情境中記憶體問題最常見的成因。 處理迴圈的每次迭代都應建立一個 PdfDocument 物件,執行其任務,並進行釋放 — 除非有特定理由且具備足夠記憶體來處理,否則切勿將 PdfDocument 物件累積在清單或集合中。
除了釋放記憶體外,針對大量批次處理,請考慮以下記憶體管理策略:
請分段處理,而非一次載入所有內容。 若需處理 50,000 個檔案,請勿將其全數列舉至清單中再逐一迭代——應以 100 或 500 個為一組進行批次處理,讓垃圾回收程式能在各批次之間回收記憶體。
對於極大規模的批次處理,請在各區塊之間強制執行垃圾回收。 雖然通常應讓垃圾回收器自行管理,但批次處理是少數幾種情境之一,此時在區塊邊界之間呼叫 GC.Collect() 可防止記憶體壓力累積。
使用 GC.GetTotalMemory() 或進程層級指標來監控記憶體消耗。 若記憶體使用量超過閾值(例如可用 RAM 的 80%),請暫停處理以讓垃圾回收機制(GC)來得及處理。
進度報告與記錄
當批次工作需要數小時才能完成時,掌握其進度狀況並非可有可無——而是至關重要。 至少應記錄每個檔案的開始與完成時間、追蹤成功/失敗次數,並提供預估剩餘時間。 執行並行操作時,請使用 Interlocked.Increment 作為線程安全的計數器,並以固定間隔(每 50 或 100 個檔案)進行記錄,而非針對每個檔案分別記錄,以避免輸出過載。 使用 System.Diagnostics.Stopwatch 追蹤已用時間,並計算每秒處理檔案數,以得出具參考價值的預計完成時間 (ETA)。
針對生產環境的批次工作,建議將進度寫入持久化儲存體(資料庫、檔案或訊息佇列),以便監控儀表板能在無需直接連線至批次程序的情況下,顯示即時狀態。
常見批次操作
在架構基礎就緒後,讓我們逐步了解最常見的批次操作及其在 IronPDF 中的實現方式。
批次將 HTML 轉換為 PDF
HTML 轉 PDF 是最常見的批次處理操作。 無論是從範本生成發票、將 HTML 文件庫轉換為 PDF,還是從網頁應用程式渲染動態報表,其運作模式皆相同:遍歷輸入資料、渲染每一項,並儲存輸出結果。
輸入內容(5 個 HTML 檔案)
INV-2026-001
INV-2026-002
INV-2026-003
INV-2026-004
INV-2026-005
該實作使用 ChromePdfRenderer 與 Parallel.ForEach 同時處理所有 HTML 檔案,並透過 MaxDegreeOfParallelism 控制並行處理,以在吞吐量與記憶體消耗之間取得平衡。 每個檔案皆透過 RenderHtmlFileAsPdf 進行渲染並儲存至輸出目錄,並透過執行緒安全的 Interlocked 計數器追蹤進度。
:path=/static-assets/pdf/content-code-examples/tutorials/batch-pdf-processing-csharp/batch-processing-html-to-pdf.cs
using IronPdf;
using System;
using System.IO;
using System.Threading.Tasks;
using System.Threading;
// Configure paths
string inputFolder = "input/";
string outputFolder = "output/";
Directory.CreateDirectory(outputFolder);
string[] htmlFiles = Directory.GetFiles(inputFolder, "*.html");
Console.WriteLine($"Found {htmlFiles.Length} HTML files to convert");
// Create renderer instance (thread-safe, can be shared)
var renderer = new ChromePdfRenderer();
// Track progress
int processed = 0;
int failed = 0;
// Process in parallel with controlled concurrency
var options = new ParallelOptions
{
MaxDegreeOfParallelism = Environment.ProcessorCount / 2
};
Parallel.ForEach(htmlFiles, options, htmlFile =>
{
try
{
string fileName = Path.GetFileNameWithoutExtension(htmlFile);
string outputPath = Path.Combine(outputFolder, $"{fileName}.pdf");
using var pdf = renderer.RenderHtmlFileAsPdf(htmlFile);
pdf.SaveAs(outputPath);
Interlocked.Increment(ref processed);
Console.WriteLine($"[OK] {fileName}.pdf");
}
catch (Exception ex)
{
Interlocked.Increment(ref failed);
Console.WriteLine($"[ERROR] {Path.GetFileName(htmlFile)}: {ex.Message}");
}
});
Console.WriteLine($"\nComplete: {processed} succeeded, {failed} failed");
Imports IronPdf
Imports System
Imports System.IO
Imports System.Threading.Tasks
Imports System.Threading
' Configure paths
Dim inputFolder As String = "input/"
Dim outputFolder As String = "output/"
Directory.CreateDirectory(outputFolder)
Dim htmlFiles As String() = Directory.GetFiles(inputFolder, "*.html")
Console.WriteLine($"Found {htmlFiles.Length} HTML files to convert")
' Create renderer instance (thread-safe, can be shared)
Dim renderer As New ChromePdfRenderer()
' Track progress
Dim processed As Integer = 0
Dim failed As Integer = 0
' Process in parallel with controlled concurrency
Dim options As New ParallelOptions With {
.MaxDegreeOfParallelism = Environment.ProcessorCount \ 2
}
Parallel.ForEach(htmlFiles, options, Sub(htmlFile)
Try
Dim fileName As String = Path.GetFileNameWithoutExtension(htmlFile)
Dim outputPath As String = Path.Combine(outputFolder, $"{fileName}.pdf")
Using pdf = renderer.RenderHtmlFileAsPdf(htmlFile)
pdf.SaveAs(outputPath)
End Using
Interlocked.Increment(processed)
Console.WriteLine($"[OK] {fileName}.pdf")
Catch ex As Exception
Interlocked.Increment(failed)
Console.WriteLine($"[ERROR] {Path.GetFileName(htmlFile)}: {ex.Message}")
End Try
End Sub)
Console.WriteLine($"\nComplete: {processed} succeeded, {failed} failed")
輸出
每張 HTML 發票都會渲染為對應的 PDF 檔案。 上圖顯示的是 INV-2026-001.pdf — 這是 5 個批次輸出結果之一。
針對基於範本的生成(例如:發票、報告),您通常會在渲染前將資料合併至 HTML 範本中。 操作方式很簡單:先載入一次 HTML 範本,使用 string.Replace 注入每筆記錄的資料(客戶名稱、總計、日期),然後將填入資料的 HTML 傳遞給平行迴圈內的 RenderHtmlAsPdf。 IronPDF 亦提供 RenderHtmlAsPdfAsync,適用於您希望使用 async/await 取代 Parallel.ForEach 的情境 —— 我們將在後續章節中詳細探討 async 模式。
批次 PDF 合併
將多份 PDF 合併為單一文件,在法律(合併案件檔案文件)、金融(將月報合併為季報)及出版工作流程中十分常見。
:path=/static-assets/pdf/content-code-examples/tutorials/batch-pdf-processing-csharp/batch-processing-merge.cs
using IronPdf;
using System;
using System.IO;
using System.Linq;
using System.Collections.Generic;
string inputFolder = "documents/";
string outputFolder = "merged/";
Directory.CreateDirectory(outputFolder);
// Group PDFs by prefix (e.g., "invoice-2026-01-*.pdf" -> one merged file)
var pdfFiles = Directory.GetFiles(inputFolder, "*.pdf");
var groups = pdfFiles
.GroupBy(f => Path.GetFileName(f).Split('-').Take(3).Aggregate((a, b) => $"{a}-{b}"))
.Where(g => g.Count() > 1);
Console.WriteLine($"Found {groups.Count()} groups to merge");
foreach (var group in groups)
{
string groupName = group.Key;
var filesToMerge = group.OrderBy(f => f).ToList();
Console.WriteLine($"Merging {filesToMerge.Count} files into {groupName}.pdf");
try
{
// Load all PDFs for this group
var pdfDocs = new List<PdfDocument>();
foreach (string filePath in filesToMerge)
{
pdfDocs.Add(PdfDocument.FromFile(filePath));
}
// Merge all documents
using var merged = PdfDocument.Merge(pdfDocs);
merged.SaveAs(Path.Combine(outputFolder, $"{groupName}-merged.pdf"));
// Dispose source documents
foreach (var doc in pdfDocs)
{
doc.Dispose();
}
Console.WriteLine($" [OK] Created {groupName}-merged.pdf ({merged.PageCount} pages)");
}
catch (Exception ex)
{
Console.WriteLine($" [ERROR] {groupName}: {ex.Message}");
}
}
Console.WriteLine("\nMerge complete");
Imports IronPdf
Imports System
Imports System.IO
Imports System.Linq
Imports System.Collections.Generic
Module Program
Sub Main()
Dim inputFolder As String = "documents/"
Dim outputFolder As String = "merged/"
Directory.CreateDirectory(outputFolder)
' Group PDFs by prefix (e.g., "invoice-2026-01-*.pdf" -> one merged file)
Dim pdfFiles = Directory.GetFiles(inputFolder, "*.pdf")
Dim groups = pdfFiles _
.GroupBy(Function(f) Path.GetFileName(f).Split("-"c).Take(3).Aggregate(Function(a, b) $"{a}-{b}")) _
.Where(Function(g) g.Count() > 1)
Console.WriteLine($"Found {groups.Count()} groups to merge")
For Each group In groups
Dim groupName As String = group.Key
Dim filesToMerge = group.OrderBy(Function(f) f).ToList()
Console.WriteLine($"Merging {filesToMerge.Count} files into {groupName}.pdf")
Try
' Load all PDFs for this group
Dim pdfDocs As New List(Of PdfDocument)()
For Each filePath As String In filesToMerge
pdfDocs.Add(PdfDocument.FromFile(filePath))
Next
' Merge all documents
Using merged = PdfDocument.Merge(pdfDocs)
merged.SaveAs(Path.Combine(outputFolder, $"{groupName}-merged.pdf"))
End Using
' Dispose source documents
For Each doc In pdfDocs
doc.Dispose()
Next
Console.WriteLine($" [OK] Created {groupName}-merged.pdf ({merged.PageCount} pages)")
Catch ex As Exception
Console.WriteLine($" [ERROR] {groupName}: {ex.Message}")
End Try
Next
Console.WriteLine(vbCrLf & "Merge complete")
End Sub
End Module
在合併大量檔案時,請注意記憶體使用:PdfDocument.Merge 方法會將所有來源文件同時載入記憶體中。 若需合併數百個大型 PDF 檔案,建議分階段進行:先將 10 至 20 個檔案組合成中間文件,再將這些中間文件進行合併。
批次分割 PDF
將多頁 PDF 分割為單一頁面(或頁面範圍)是合併操作的逆向操作。 常見於郵件處理流程中,需將掃描的一批文件拆分為獨立記錄;以及在PRINT工作流程中,需將複合文件拆解為獨立文件。
輸入
以下程式碼示範如何透過 CopyPage 在並行迴圈中擷取個別頁面,並為每個頁面建立獨立的 PDF 檔案。 另一個 SplitByRange 輔助函式示範了如何擷取頁面範圍而非單一頁面,這對於將大型文件分割成較小的區段非常有用。
:path=/static-assets/pdf/content-code-examples/tutorials/batch-pdf-processing-csharp/batch-processing-split.cs
using IronPdf;
using System;
using System.IO;
using System.Threading.Tasks;
string inputFolder = "multipage/";
string outputFolder = "split/";
Directory.CreateDirectory(outputFolder);
string[] pdfFiles = Directory.GetFiles(inputFolder, "*.pdf");
Console.WriteLine($"Found {pdfFiles.Length} PDFs to split");
var options = new ParallelOptions
{
MaxDegreeOfParallelism = Environment.ProcessorCount / 2
};
Parallel.ForEach(pdfFiles, options, pdfFile =>
{
string baseName = Path.GetFileNameWithoutExtension(pdfFile);
try
{
using var pdf = PdfDocument.FromFile(pdfFile);
int pageCount = pdf.PageCount;
Console.WriteLine($"Splitting {baseName}.pdf ({pageCount} pages)");
// Extract each page as a separate PDF
for (int i = 0; i < pageCount; i++)
{
using var singlePage = pdf.CopyPage(i);
string outputPath = Path.Combine(outputFolder, $"{baseName}-page-{i + 1:D3}.pdf");
singlePage.SaveAs(outputPath);
}
Console.WriteLine($" [OK] Created {pageCount} files from {baseName}.pdf");
}
catch (Exception ex)
{
Console.WriteLine($" [ERROR] {baseName}: {ex.Message}");
}
});
// Alternative: Extract page ranges instead of individual pages
void SplitByRange(string inputFile, string outputFolder, int pagesPerChunk)
{
using var pdf = PdfDocument.FromFile(inputFile);
string baseName = Path.GetFileNameWithoutExtension(inputFile);
int totalPages = pdf.PageCount;
int chunkNumber = 1;
for (int startPage = 0; startPage < totalPages; startPage += pagesPerChunk)
{
int endPage = Math.Min(startPage + pagesPerChunk - 1, totalPages - 1);
using var chunk = pdf.CopyPages(startPage, endPage);
chunk.SaveAs(Path.Combine(outputFolder, $"{baseName}-chunk-{chunkNumber:D3}.pdf"));
chunkNumber++;
}
}
Console.WriteLine("\nSplit complete");
Imports IronPdf
Imports System
Imports System.IO
Imports System.Threading.Tasks
Module Program
Sub Main()
Dim inputFolder As String = "multipage/"
Dim outputFolder As String = "split/"
Directory.CreateDirectory(outputFolder)
Dim pdfFiles As String() = Directory.GetFiles(inputFolder, "*.pdf")
Console.WriteLine($"Found {pdfFiles.Length} PDFs to split")
Dim options As New ParallelOptions With {
.MaxDegreeOfParallelism = Environment.ProcessorCount \ 2
}
Parallel.ForEach(pdfFiles, options, Sub(pdfFile)
Dim baseName As String = Path.GetFileNameWithoutExtension(pdfFile)
Try
Using pdf = PdfDocument.FromFile(pdfFile)
Dim pageCount As Integer = pdf.PageCount
Console.WriteLine($"Splitting {baseName}.pdf ({pageCount} pages)")
' Extract each page as a separate PDF
For i As Integer = 0 To pageCount - 1
Using singlePage = pdf.CopyPage(i)
Dim outputPath As String = Path.Combine(outputFolder, $"{baseName}-page-{i + 1:D3}.pdf")
singlePage.SaveAs(outputPath)
End Using
Next
Console.WriteLine($" [OK] Created {pageCount} files from {baseName}.pdf")
End Using
Catch ex As Exception
Console.WriteLine($" [ERROR] {baseName}: {ex.Message}")
End Try
End Sub)
' Alternative: Extract page ranges instead of individual pages
Sub SplitByRange(inputFile As String, outputFolder As String, pagesPerChunk As Integer)
Using pdf = PdfDocument.FromFile(inputFile)
Dim baseName As String = Path.GetFileNameWithoutExtension(inputFile)
Dim totalPages As Integer = pdf.PageCount
Dim chunkNumber As Integer = 1
For startPage As Integer = 0 To totalPages - 1 Step pagesPerChunk
Dim endPage As Integer = Math.Min(startPage + pagesPerChunk - 1, totalPages - 1)
Using chunk = pdf.CopyPages(startPage, endPage)
chunk.SaveAs(Path.Combine(outputFolder, $"{baseName}-chunk-{chunkNumber:D3}.pdf"))
chunkNumber += 1
End Using
Next
End Using
End Sub
Console.WriteLine(vbCrLf & "Split complete")
End Sub
End Module
輸出
第 2 頁已擷取為獨立 PDF 檔案 (annual-report-page-2.pdf)
IronPDF 的 CopyPage 和 CopyPages 方法會建立新的 PdfDocument 物件,其中包含指定的頁面。 請記得在儲存後,將原始文件及每個擷取的頁面文件一併刪除。
批次壓縮
當儲存成本成為考量因素,或您需要在頻寬受限的連線環境下傳輸 PDF 檔案時,批次壓縮能大幅減少檔案儲存空間的佔用。 IronPDF 提供兩種壓縮方式:CompressImages 用於降低影像品質/大小,以及 CompressStructTree 用於移除結構性元資料。 較新的 CompressAndSaveAs API(於 2025.12 版本推出)透過結合多種優化技術,提供更卓越的壓縮效果。
:path=/static-assets/pdf/content-code-examples/tutorials/batch-pdf-processing-csharp/batch-processing-compression.cs
using IronPdf;
using System;
using System.IO;
using System.Threading.Tasks;
using System.Threading;
string inputFolder = "originals/";
string outputFolder = "compressed/";
Directory.CreateDirectory(outputFolder);
string[] pdfFiles = Directory.GetFiles(inputFolder, "*.pdf");
Console.WriteLine($"Found {pdfFiles.Length} PDFs to compress");
long totalOriginalSize = 0;
long totalCompressedSize = 0;
int processed = 0;
var options = new ParallelOptions
{
MaxDegreeOfParallelism = Environment.ProcessorCount / 2
};
Parallel.ForEach(pdfFiles, options, pdfFile =>
{
string fileName = Path.GetFileName(pdfFile);
string outputPath = Path.Combine(outputFolder, fileName);
try
{
long originalSize = new FileInfo(pdfFile).Length;
Interlocked.Add(ref totalOriginalSize, originalSize);
using var pdf = PdfDocument.FromFile(pdfFile);
// Apply compression with JPEG quality setting (0-100, lower = more compression)
pdf.CompressAndSaveAs(outputPath, 60);
long compressedSize = new FileInfo(outputPath).Length;
Interlocked.Add(ref totalCompressedSize, compressedSize);
Interlocked.Increment(ref processed);
double reduction = (1 - (double)compressedSize / originalSize) * 100;
Console.WriteLine($"[OK] {fileName}: {originalSize / 1024}KB → {compressedSize / 1024}KB ({reduction:F1}% reduction)");
}
catch (Exception ex)
{
Console.WriteLine($"[ERROR] {fileName}: {ex.Message}");
}
});
double totalReduction = (1 - (double)totalCompressedSize / totalOriginalSize) * 100;
Console.WriteLine($"\nCompression complete:");
Console.WriteLine($" Files processed: {processed}");
Console.WriteLine($" Total original: {totalOriginalSize / 1024 / 1024}MB");
Console.WriteLine($" Total compressed: {totalCompressedSize / 1024 / 1024}MB");
Console.WriteLine($" Overall reduction: {totalReduction:F1}%");
Imports IronPdf
Imports System
Imports System.IO
Imports System.Threading.Tasks
Imports System.Threading
Module Program
Sub Main()
Dim inputFolder As String = "originals/"
Dim outputFolder As String = "compressed/"
Directory.CreateDirectory(outputFolder)
Dim pdfFiles As String() = Directory.GetFiles(inputFolder, "*.pdf")
Console.WriteLine($"Found {pdfFiles.Length} PDFs to compress")
Dim totalOriginalSize As Long = 0
Dim totalCompressedSize As Long = 0
Dim processed As Integer = 0
Dim options As New ParallelOptions With {
.MaxDegreeOfParallelism = Environment.ProcessorCount \ 2
}
Parallel.ForEach(pdfFiles, options, Sub(pdfFile)
Dim fileName As String = Path.GetFileName(pdfFile)
Dim outputPath As String = Path.Combine(outputFolder, fileName)
Try
Dim originalSize As Long = New FileInfo(pdfFile).Length
Interlocked.Add(totalOriginalSize, originalSize)
Using pdf = PdfDocument.FromFile(pdfFile)
' Apply compression with JPEG quality setting (0-100, lower = more compression)
pdf.CompressAndSaveAs(outputPath, 60)
End Using
Dim compressedSize As Long = New FileInfo(outputPath).Length
Interlocked.Add(totalCompressedSize, compressedSize)
Interlocked.Increment(processed)
Dim reduction As Double = (1 - CDbl(compressedSize) / originalSize) * 100
Console.WriteLine($"[OK] {fileName}: {originalSize \ 1024}KB → {compressedSize \ 1024}KB ({reduction:F1}% reduction)")
Catch ex As Exception
Console.WriteLine($"[ERROR] {fileName}: {ex.Message}")
End Try
End Sub)
Dim totalReduction As Double = (1 - CDbl(totalCompressedSize) / totalOriginalSize) * 100
Console.WriteLine(vbCrLf & "Compression complete:")
Console.WriteLine($" Files processed: {processed}")
Console.WriteLine($" Total original: {totalOriginalSize \ 1024 \ 1024}MB")
Console.WriteLine($" Total compressed: {totalCompressedSize \ 1024 \ 1024}MB")
Console.WriteLine($" Overall reduction: {totalReduction:F1}%")
End Sub
End Module
關於壓縮的幾點注意事項:JPEG 品質設定低於 60 時,多數圖片會產生可見的壓縮失真。 ShrinkImage 選項在某些設定下可能會導致顯示失真 — 執行完整批次處理前,請先使用具代表性的樣本進行測試。 此外,移除結構樹 (CompressStructTree) 會影響壓縮 PDF 中的文字選取與搜尋功能,因此僅在不需要這些功能時才使用。
批次格式轉換 (PDF/A, PDF/UA)
將現有檔案轉換為符合標準的格式——例如用於長期歸檔的 PDF/A 或用於無障礙存取的 PDF/UA——是價值最高的批次處理作業之一。 IronPDF 支援完整的 PDF/A 版本(包括於 2025.11 版新增的 PDF/A-4)以及 PDF/UA 合規性(包括於 2025.12 版新增的 PDF/UA-2)。
輸入
此範例會先使用 PdfDocument.FromFile 載入每個 PDF 檔案,接著透過 SaveAsPdfA 並搭配 PdfAVersions.PdfA3b 參數,將其轉換為 PDF/A-3b 格式。 另一種 ConvertToPdfUA 函式示範了如何使用 SaveAsPdfUA 進行符合無障礙規範的轉換,儘管 PDF/UA 要求原始文件必須具備正確的結構標籤。
:path=/static-assets/pdf/content-code-examples/tutorials/batch-pdf-processing-csharp/batch-processing-format-conversion.cs
using IronPdf;
using System;
using System.IO;
using System.Threading.Tasks;
using System.Threading;
string inputFolder = "originals/";
string outputFolder = "pdfa-archive/";
Directory.CreateDirectory(outputFolder);
string[] pdfFiles = Directory.GetFiles(inputFolder, "*.pdf");
Console.WriteLine($"Found {pdfFiles.Length} PDFs to convert to PDF/A-3b");
int converted = 0;
int failed = 0;
var options = new ParallelOptions
{
MaxDegreeOfParallelism = Environment.ProcessorCount / 2
};
Parallel.ForEach(pdfFiles, options, pdfFile =>
{
string fileName = Path.GetFileName(pdfFile);
string outputPath = Path.Combine(outputFolder, fileName);
try
{
using var pdf = PdfDocument.FromFile(pdfFile);
// Convert to PDF/A-3b for long-term archival
pdf.SaveAsPdfA(outputPath, PdfAVersions.PdfA3b);
Interlocked.Increment(ref converted);
Console.WriteLine($"[OK] {fileName} → PDF/A-3b");
}
catch (Exception ex)
{
Interlocked.Increment(ref failed);
Console.WriteLine($"[ERROR] {fileName}: {ex.Message}");
}
});
Console.WriteLine($"\nConversion complete: {converted} succeeded, {failed} failed");
// Alternative: Convert to PDF/UA for accessibility compliance
void ConvertToPdfUA(string inputFolder, string outputFolder)
{
Directory.CreateDirectory(outputFolder);
string[] files = Directory.GetFiles(inputFolder, "*.pdf");
Parallel.ForEach(files, pdfFile =>
{
string fileName = Path.GetFileName(pdfFile);
try
{
using var pdf = PdfDocument.FromFile(pdfFile);
// PDF/UA requires proper tagging - ensure source is well-structured
pdf.SaveAsPdfUA(Path.Combine(outputFolder, fileName));
Console.WriteLine($"[OK] {fileName} → PDF/UA");
}
catch (Exception ex)
{
Console.WriteLine($"[ERROR] {fileName}: {ex.Message}");
}
});
}
Imports IronPdf
Imports System
Imports System.IO
Imports System.Threading.Tasks
Imports System.Threading
Module Program
Sub Main()
Dim inputFolder As String = "originals/"
Dim outputFolder As String = "pdfa-archive/"
Directory.CreateDirectory(outputFolder)
Dim pdfFiles As String() = Directory.GetFiles(inputFolder, "*.pdf")
Console.WriteLine($"Found {pdfFiles.Length} PDFs to convert to PDF/A-3b")
Dim converted As Integer = 0
Dim failed As Integer = 0
Dim options As New ParallelOptions With {
.MaxDegreeOfParallelism = Environment.ProcessorCount \ 2
}
Parallel.ForEach(pdfFiles, options, Sub(pdfFile)
Dim fileName As String = Path.GetFileName(pdfFile)
Dim outputPath As String = Path.Combine(outputFolder, fileName)
Try
Using pdf = PdfDocument.FromFile(pdfFile)
' Convert to PDF/A-3b for long-term archival
pdf.SaveAsPdfA(outputPath, PdfAVersions.PdfA3b)
Interlocked.Increment(converted)
Console.WriteLine($"[OK] {fileName} → PDF/A-3b")
End Using
Catch ex As Exception
Interlocked.Increment(failed)
Console.WriteLine($"[ERROR] {fileName}: {ex.Message}")
End Try
End Sub)
Console.WriteLine($"{vbCrLf}Conversion complete: {converted} succeeded, {failed} failed")
End Sub
' Alternative: Convert to PDF/UA for accessibility compliance
Sub ConvertToPdfUA(inputFolder As String, outputFolder As String)
Directory.CreateDirectory(outputFolder)
Dim files As String() = Directory.GetFiles(inputFolder, "*.pdf")
Parallel.ForEach(files, Sub(pdfFile)
Dim fileName As String = Path.GetFileName(pdfFile)
Try
Using pdf = PdfDocument.FromFile(pdfFile)
' PDF/UA requires proper tagging - ensure source is well-structured
pdf.SaveAsPdfUA(Path.Combine(outputFolder, fileName))
Console.WriteLine($"[OK] {fileName} → PDF/UA")
End Using
Catch ex As Exception
Console.WriteLine($"[ERROR] {fileName}: {ex.Message}")
End Try
End Sub)
End Sub
End Module
輸出

生成的 PDF 檔案在外觀上與原始檔完全一致(位元對位元),但現在已包含符合 PDF/A-3b 標準的元資料,以供歸檔系統使用。
格式轉換對於合規整改專案尤為重要,此類專案通常發生於組織發現其現有檔案庫不符合監管標準之時。 批次處理流程雖簡單明瞭,但驗證步驟至關重要——在視為完成之前,務必確認每個轉換後的檔案確實通過了合規性檢查。 我們將在下方的"容錯性"章節中詳細探討驗證機制。
建構具韌性的批次處理管線
一個在 100 個檔案上運作完美,卻在 50,000 個檔案中的第 4,327 個檔案時當機的批次處理流程,是毫無用處的。 韌性——即優雅處理錯誤、重試暫時性失敗,以及在系統崩潰後恢復運作的能力——正是區分生產級管道與原型系統的關鍵。
錯誤處理與失敗時跳過
最基本的容錯模式是"失敗時跳過":若單一檔案處理失敗,應記錄錯誤並繼續處理下一份檔案,而非中止整個批次作業。 這聽起來顯而易見,但在使用 Parallel.ForEach 時卻意外容易被忽略——任何並行任務中的未處理例外都會以 AggregateException 的形式傳播,並終止迴圈。
以下範例同時展示了"失敗時跳過"與"重試"的邏輯——透過 try-catch 語法包覆每個檔案以實現優雅的錯誤處理,並在內部重試迴圈中採用指數退避機制,以處理如 IOException 和 OutOfMemoryException 這類暫時性例外狀況:
:path=/static-assets/pdf/content-code-examples/tutorials/batch-pdf-processing-csharp/batch-processing-error-handling-retry.cs
using IronPdf;
using System;
using System.IO;
using System.Threading.Tasks;
using System.Threading;
using System.Collections.Concurrent;
string inputFolder = "input/";
string outputFolder = "output/";
string errorLogPath = "error-log.txt";
Directory.CreateDirectory(outputFolder);
string[] htmlFiles = Directory.GetFiles(inputFolder, "*.html");
var renderer = new ChromePdfRenderer();
var errorLog = new ConcurrentBag<string>();
int processed = 0;
int failed = 0;
int retried = 0;
const int maxRetries = 3;
var options = new ParallelOptions
{
MaxDegreeOfParallelism = Environment.ProcessorCount / 2
};
Parallel.ForEach(htmlFiles, options, htmlFile =>
{
string fileName = Path.GetFileNameWithoutExtension(htmlFile);
string outputPath = Path.Combine(outputFolder, $"{fileName}.pdf");
int attempt = 0;
bool success = false;
while (attempt < maxRetries && !success)
{
attempt++;
try
{
using var pdf = renderer.RenderHtmlFileAsPdf(htmlFile);
pdf.SaveAs(outputPath);
success = true;
Interlocked.Increment(ref processed);
if (attempt > 1)
{
Interlocked.Increment(ref retried);
Console.WriteLine($"[OK] {fileName}.pdf (succeeded on attempt {attempt})");
}
else
{
Console.WriteLine($"[OK] {fileName}.pdf");
}
}
catch (Exception ex) when (IsTransientException(ex) && attempt < maxRetries)
{
// Transient error - wait and retry with exponential backoff
int delayMs = (int)Math.Pow(2, attempt) * 500;
Console.WriteLine($"[RETRY] {fileName}: {ex.Message} (attempt {attempt}, waiting {delayMs}ms)");
Thread.Sleep(delayMs);
}
catch (Exception ex)
{
// Non-transient error or max retries exceeded
Interlocked.Increment(ref failed);
string errorMessage = $"{DateTime.Now:yyyy-MM-dd HH:mm:ss} | {fileName} | {ex.GetType().Name} | {ex.Message}";
errorLog.Add(errorMessage);
Console.WriteLine($"[ERROR] {fileName}: {ex.Message}");
}
}
});
// Write error log
if (errorLog.Count > 0)
{
File.WriteAllLines(errorLogPath, errorLog);
}
Console.WriteLine($"\nBatch complete:");
Console.WriteLine($" Processed: {processed}");
Console.WriteLine($" Failed: {failed}");
Console.WriteLine($" Retried: {retried}");
if (failed > 0)
{
Console.WriteLine($" Error log: {errorLogPath}");
}
// Helper to identify transient exceptions worth retrying
bool IsTransientException(Exception ex)
{
return ex is IOException ||
ex is OutOfMemoryException ||
ex.Message.Contains("timeout", StringComparison.OrdinalIgnoreCase) ||
ex.Message.Contains("locked", StringComparison.OrdinalIgnoreCase);
}
Imports IronPdf
Imports System
Imports System.IO
Imports System.Threading
Imports System.Collections.Concurrent
Module Program
Sub Main()
Dim inputFolder As String = "input/"
Dim outputFolder As String = "output/"
Dim errorLogPath As String = "error-log.txt"
Directory.CreateDirectory(outputFolder)
Dim htmlFiles As String() = Directory.GetFiles(inputFolder, "*.html")
Dim renderer As New ChromePdfRenderer()
Dim errorLog As New ConcurrentBag(Of String)()
Dim processed As Integer = 0
Dim failed As Integer = 0
Dim retried As Integer = 0
Const maxRetries As Integer = 3
Dim options As New ParallelOptions With {
.MaxDegreeOfParallelism = Environment.ProcessorCount \ 2
}
Parallel.ForEach(htmlFiles, options, Sub(htmlFile)
Dim fileName As String = Path.GetFileNameWithoutExtension(htmlFile)
Dim outputPath As String = Path.Combine(outputFolder, $"{fileName}.pdf")
Dim attempt As Integer = 0
Dim success As Boolean = False
While attempt < maxRetries AndAlso Not success
attempt += 1
Try
Using pdf = renderer.RenderHtmlFileAsPdf(htmlFile)
pdf.SaveAs(outputPath)
success = True
Interlocked.Increment(processed)
If attempt > 1 Then
Interlocked.Increment(retried)
Console.WriteLine($"[OK] {fileName}.pdf (succeeded on attempt {attempt})")
Else
Console.WriteLine($"[OK] {fileName}.pdf")
End If
End Using
Catch ex As Exception When IsTransientException(ex) AndAlso attempt < maxRetries
' Transient error - wait and retry with exponential backoff
Dim delayMs As Integer = CInt(Math.Pow(2, attempt)) * 500
Console.WriteLine($"[RETRY] {fileName}: {ex.Message} (attempt {attempt}, waiting {delayMs}ms)")
Thread.Sleep(delayMs)
Catch ex As Exception
' Non-transient error or max retries exceeded
Interlocked.Increment(failed)
Dim errorMessage As String = $"{DateTime.Now:yyyy-MM-dd HH:mm:ss} | {fileName} | {ex.GetType().Name} | {ex.Message}"
errorLog.Add(errorMessage)
Console.WriteLine($"[ERROR] {fileName}: {ex.Message}")
End Try
End While
End Sub)
' Write error log
If errorLog.Count > 0 Then
File.WriteAllLines(errorLogPath, errorLog)
End If
Console.WriteLine($"{vbCrLf}Batch complete:")
Console.WriteLine($" Processed: {processed}")
Console.WriteLine($" Failed: {failed}")
Console.WriteLine($" Retried: {retried}")
If failed > 0 Then
Console.WriteLine($" Error log: {errorLogPath}")
End If
End Sub
' Helper to identify transient exceptions worth retrying
Function IsTransientException(ex As Exception) As Boolean
Return TypeOf ex Is IOException OrElse
TypeOf ex Is OutOfMemoryException OrElse
ex.Message.Contains("timeout", StringComparison.OrdinalIgnoreCase) OrElse
ex.Message.Contains("locked", StringComparison.OrdinalIgnoreCase)
End Function
End Module
批次處理完成後,請檢視錯誤日誌以了解哪些檔案處理失敗及其原因。 常見的失敗原因包括:原始檔案損毀、受密碼保護的 PDF 檔案、原始內容中包含不支援的功能,以及處理超大文件時發生記憶體不足的情況。
針對暫時性失敗的重試邏輯
某些失敗是暫時的——只要重新嘗試,便會成功。 這些情況包括檔案系統爭用(其他程序已鎖定該檔案)、暫時性記憶體壓力(垃圾回收機制尚未跟上)以及在 HTML 內容中載入外部資源時的網路超時。 上述程式碼範例採用指數級延遲策略來處理這些情況——從短暫的延遲開始,並在每次重試時將延遲時間加倍,且設有最大重試次數上限(通常為 3 次)。
關鍵在於區分可重試與不可重試的錯誤。 若出現 IOException(檔案鎖定)或 OutOfMemoryException(暫時性壓力),建議重新嘗試。 ArgumentException(無效輸入)或持續出現的渲染錯誤並非 — 重試無濟於事,只會浪費時間與資源。
崩潰後恢復的檢查點機制
當批次工作在數小時內處理 50,000 個檔案時,即使在第 35,000 個檔案處發生當機,也不應意味著必須從頭開始。 檢查點功能 — 記錄哪些檔案已成功處理 — 讓您能夠從上次中斷之處繼續進行。
:path=/static-assets/pdf/content-code-examples/tutorials/batch-pdf-processing-csharp/batch-processing-checkpointing.cs
using IronPdf;
using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using System.Threading;
using System.Collections.Generic;
string inputFolder = "input/";
string outputFolder = "output/";
string checkpointPath = "checkpoint.txt";
string errorLogPath = "errors.txt";
Directory.CreateDirectory(outputFolder);
// Load checkpoint - files already processed successfully
var completedFiles = new HashSet<string>();
if (File.Exists(checkpointPath))
{
completedFiles = new HashSet<string>(File.ReadAllLines(checkpointPath));
Console.WriteLine($"Resuming from checkpoint: {completedFiles.Count} files already processed");
}
// Get files to process (excluding already completed)
string[] allFiles = Directory.GetFiles(inputFolder, "*.html");
string[] filesToProcess = allFiles
.Where(f => !completedFiles.Contains(Path.GetFileName(f)))
.ToArray();
Console.WriteLine($"Files to process: {filesToProcess.Length} (skipping {completedFiles.Count} already done)");
var renderer = new ChromePdfRenderer();
var checkpointLock = new object();
int processed = 0;
int failed = 0;
var options = new ParallelOptions
{
MaxDegreeOfParallelism = Environment.ProcessorCount / 2
};
Parallel.ForEach(filesToProcess, options, htmlFile =>
{
string fileName = Path.GetFileName(htmlFile);
string baseName = Path.GetFileNameWithoutExtension(htmlFile);
string outputPath = Path.Combine(outputFolder, $"{baseName}.pdf");
try
{
using var pdf = renderer.RenderHtmlFileAsPdf(htmlFile);
pdf.SaveAs(outputPath);
// Record success in checkpoint (thread-safe)
lock (checkpointLock)
{
File.AppendAllText(checkpointPath, fileName + Environment.NewLine);
}
Interlocked.Increment(ref processed);
Console.WriteLine($"[OK] {baseName}.pdf");
}
catch (Exception ex)
{
Interlocked.Increment(ref failed);
// Log error for review
lock (checkpointLock)
{
File.AppendAllText(errorLogPath,
$"{DateTime.Now:yyyy-MM-dd HH:mm:ss} | {fileName} | {ex.Message}{Environment.NewLine}");
}
Console.WriteLine($"[ERROR] {fileName}: {ex.Message}");
}
});
Console.WriteLine($"\nBatch complete:");
Console.WriteLine($" Newly processed: {processed}");
Console.WriteLine($" Failed: {failed}");
Console.WriteLine($" Total completed: {completedFiles.Count + processed}");
Console.WriteLine($" Checkpoint saved to: {checkpointPath}");
Imports IronPdf
Imports System
Imports System.IO
Imports System.Linq
Imports System.Threading.Tasks
Imports System.Threading
Imports System.Collections.Generic
Module Program
Sub Main()
Dim inputFolder As String = "input/"
Dim outputFolder As String = "output/"
Dim checkpointPath As String = "checkpoint.txt"
Dim errorLogPath As String = "errors.txt"
Directory.CreateDirectory(outputFolder)
' Load checkpoint - files already processed successfully
Dim completedFiles As New HashSet(Of String)()
If File.Exists(checkpointPath) Then
completedFiles = New HashSet(Of String)(File.ReadAllLines(checkpointPath))
Console.WriteLine($"Resuming from checkpoint: {completedFiles.Count} files already processed")
End If
' Get files to process (excluding already completed)
Dim allFiles As String() = Directory.GetFiles(inputFolder, "*.html")
Dim filesToProcess As String() = allFiles _
.Where(Function(f) Not completedFiles.Contains(Path.GetFileName(f))) _
.ToArray()
Console.WriteLine($"Files to process: {filesToProcess.Length} (skipping {completedFiles.Count} already done)")
Dim renderer As New ChromePdfRenderer()
Dim checkpointLock As New Object()
Dim processed As Integer = 0
Dim failed As Integer = 0
Dim options As New ParallelOptions With {
.MaxDegreeOfParallelism = Environment.ProcessorCount \ 2
}
Parallel.ForEach(filesToProcess, options, Sub(htmlFile)
Dim fileName As String = Path.GetFileName(htmlFile)
Dim baseName As String = Path.GetFileNameWithoutExtension(htmlFile)
Dim outputPath As String = Path.Combine(outputFolder, $"{baseName}.pdf")
Try
Using pdf = renderer.RenderHtmlFileAsPdf(htmlFile)
pdf.SaveAs(outputPath)
End Using
' Record success in checkpoint (thread-safe)
SyncLock checkpointLock
File.AppendAllText(checkpointPath, fileName & Environment.NewLine)
End SyncLock
Interlocked.Increment(processed)
Console.WriteLine($"[OK] {baseName}.pdf")
Catch ex As Exception
Interlocked.Increment(failed)
' Log error for review
SyncLock checkpointLock
File.AppendAllText(errorLogPath,
$"{DateTime.Now:yyyy-MM-dd HH:mm:ss} | {fileName} | {ex.Message}{Environment.NewLine}")
End SyncLock
Console.WriteLine($"[ERROR] {fileName}: {ex.Message}")
End Try
End Sub)
Console.WriteLine(vbCrLf & "Batch complete:")
Console.WriteLine($" Newly processed: {processed}")
Console.WriteLine($" Failed: {failed}")
Console.WriteLine($" Total completed: {completedFiles.Count + processed}")
Console.WriteLine($" Checkpoint saved to: {checkpointPath}")
End Sub
End Module
檢查點檔案用作已完成工作的永久記錄。 當處理流程啟動時,它會讀取檢查點檔案,並跳過任何已成功處理過的檔案。 當檔案處理完成時,其路徑會被追加至檢查點檔案中。此方法簡單、基於檔案,且無需任何外部依賴項。
對於更複雜的場景,建議考慮使用資料庫表或分散式快取(如 Redis)作為檢查點儲存區,特別是在多個工作執行緒跨不同機器並行處理檔案的情況下。
處理前後的驗證
驗證是韌性管道的關鍵環節。預處理驗證能在問題輸入浪費處理時間之前將其篩除;後處理驗證則確保輸出符合您的品質與合規要求。
輸入
此實作將處理迴圈封裝在 PreValidate 與 PostValidate 這兩項輔助函式之中。 預驗證會在處理前檢查檔案大小、內容類型及基本的 HTML 結構。 後驗證階段會確認輸出 PDF 的頁數是否正確且檔案大小合理,將通過驗證的檔案移至獨立資料夾,同時將失敗檔案導向拒收資料夾以供人工審查。
:path=/static-assets/pdf/content-code-examples/tutorials/batch-pdf-processing-csharp/batch-processing-validation.cs
using IronPdf;
using System;
using System.IO;
using System.Threading.Tasks;
using System.Threading;
using System.Collections.Concurrent;
string inputFolder = "input/";
string outputFolder = "output/";
string validatedFolder = "validated/";
string rejectedFolder = "rejected/";
Directory.CreateDirectory(outputFolder);
Directory.CreateDirectory(validatedFolder);
Directory.CreateDirectory(rejectedFolder);
string[] inputFiles = Directory.GetFiles(inputFolder, "*.html");
var renderer = new ChromePdfRenderer();
int preValidationFailed = 0;
int processingFailed = 0;
int postValidationFailed = 0;
int succeeded = 0;
var options = new ParallelOptions
{
MaxDegreeOfParallelism = Environment.ProcessorCount / 2
};
Parallel.ForEach(inputFiles, options, inputFile =>
{
string fileName = Path.GetFileNameWithoutExtension(inputFile);
string outputPath = Path.Combine(outputFolder, $"{fileName}.pdf");
// Pre-validation: Check input file
if (!PreValidate(inputFile))
{
Interlocked.Increment(ref preValidationFailed);
Console.WriteLine($"[SKIP] {fileName}: Failed pre-validation");
return;
}
try
{
// Process
using var pdf = renderer.RenderHtmlFileAsPdf(inputFile);
pdf.SaveAs(outputPath);
// Post-validation: Check output file
if (PostValidate(outputPath))
{
// Move to validated folder
string validatedPath = Path.Combine(validatedFolder, $"{fileName}.pdf");
File.Move(outputPath, validatedPath, overwrite: true);
Interlocked.Increment(ref succeeded);
Console.WriteLine($"[OK] {fileName}.pdf (validated)");
}
else
{
// Move to rejected folder for manual review
string rejectedPath = Path.Combine(rejectedFolder, $"{fileName}.pdf");
File.Move(outputPath, rejectedPath, overwrite: true);
Interlocked.Increment(ref postValidationFailed);
Console.WriteLine($"[REJECT] {fileName}.pdf: Failed post-validation");
}
}
catch (Exception ex)
{
Interlocked.Increment(ref processingFailed);
Console.WriteLine($"[ERROR] {fileName}: {ex.Message}");
}
});
Console.WriteLine($"\nValidation summary:");
Console.WriteLine($" Succeeded: {succeeded}");
Console.WriteLine($" Pre-validation failed: {preValidationFailed}");
Console.WriteLine($" Processing failed: {processingFailed}");
Console.WriteLine($" Post-validation failed: {postValidationFailed}");
// Pre-validation: Quick checks on input file
bool PreValidate(string filePath)
{
try
{
var fileInfo = new FileInfo(filePath);
// Check file exists and is readable
if (!fileInfo.Exists) return false;
// Check file is not empty
if (fileInfo.Length == 0) return false;
// Check file is not too large (e.g., 50MB limit)
if (fileInfo.Length > 50 * 1024 * 1024) return false;
// Quick content check - must be valid HTML
string content = File.ReadAllText(filePath);
if (string.IsNullOrWhiteSpace(content)) return false;
if (!content.Contains("<html", StringComparison.OrdinalIgnoreCase) &&
!content.Contains("<!DOCTYPE", StringComparison.OrdinalIgnoreCase))
{
return false;
}
return true;
}
catch
{
return false;
}
}
// Post-validation: Verify output PDF meets requirements
bool PostValidate(string pdfPath)
{
try
{
using var pdf = PdfDocument.FromFile(pdfPath);
// Check PDF has at least one page
if (pdf.PageCount < 1) return false;
// Check file size is reasonable (not just header, not corrupted)
var fileInfo = new FileInfo(pdfPath);
if (fileInfo.Length < 1024) return false;
return true;
}
catch
{
return false;
}
}
Imports IronPdf
Imports System
Imports System.IO
Imports System.Threading.Tasks
Imports System.Threading
Imports System.Collections.Concurrent
Module Program
Sub Main()
Dim inputFolder As String = "input/"
Dim outputFolder As String = "output/"
Dim validatedFolder As String = "validated/"
Dim rejectedFolder As String = "rejected/"
Directory.CreateDirectory(outputFolder)
Directory.CreateDirectory(validatedFolder)
Directory.CreateDirectory(rejectedFolder)
Dim inputFiles As String() = Directory.GetFiles(inputFolder, "*.html")
Dim renderer As New ChromePdfRenderer()
Dim preValidationFailed As Integer = 0
Dim processingFailed As Integer = 0
Dim postValidationFailed As Integer = 0
Dim succeeded As Integer = 0
Dim options As New ParallelOptions With {
.MaxDegreeOfParallelism = Environment.ProcessorCount \ 2
}
Parallel.ForEach(inputFiles, options, Sub(inputFile)
Dim fileName As String = Path.GetFileNameWithoutExtension(inputFile)
Dim outputPath As String = Path.Combine(outputFolder, $"{fileName}.pdf")
' Pre-validation: Check input file
If Not PreValidate(inputFile) Then
Interlocked.Increment(preValidationFailed)
Console.WriteLine($"[SKIP] {fileName}: Failed pre-validation")
Return
End If
Try
' Process
Using pdf = renderer.RenderHtmlFileAsPdf(inputFile)
pdf.SaveAs(outputPath)
' Post-validation: Check output file
If PostValidate(outputPath) Then
' Move to validated folder
Dim validatedPath As String = Path.Combine(validatedFolder, $"{fileName}.pdf")
File.Move(outputPath, validatedPath, overwrite:=True)
Interlocked.Increment(succeeded)
Console.WriteLine($"[OK] {fileName}.pdf (validated)")
Else
' Move to rejected folder for manual review
Dim rejectedPath As String = Path.Combine(rejectedFolder, $"{fileName}.pdf")
File.Move(outputPath, rejectedPath, overwrite:=True)
Interlocked.Increment(postValidationFailed)
Console.WriteLine($"[REJECT] {fileName}.pdf: Failed post-validation")
End If
End Using
Catch ex As Exception
Interlocked.Increment(processingFailed)
Console.WriteLine($"[ERROR] {fileName}: {ex.Message}")
End Try
End Sub)
Console.WriteLine(vbCrLf & "Validation summary:")
Console.WriteLine($" Succeeded: {succeeded}")
Console.WriteLine($" Pre-validation failed: {preValidationFailed}")
Console.WriteLine($" Processing failed: {processingFailed}")
Console.WriteLine($" Post-validation failed: {postValidationFailed}")
End Sub
' Pre-validation: Quick checks on input file
Function PreValidate(filePath As String) As Boolean
Try
Dim fileInfo As New FileInfo(filePath)
' Check file exists and is readable
If Not fileInfo.Exists Then Return False
' Check file is not empty
If fileInfo.Length = 0 Then Return False
' Check file is not too large (e.g., 50MB limit)
If fileInfo.Length > 50 * 1024 * 1024 Then Return False
' Quick content check - must be valid HTML
Dim content As String = File.ReadAllText(filePath)
If String.IsNullOrWhiteSpace(content) Then Return False
If Not content.Contains("<html", StringComparison.OrdinalIgnoreCase) AndAlso
Not content.Contains("<!DOCTYPE", StringComparison.OrdinalIgnoreCase) Then
Return False
End If
Return True
Catch
Return False
End Try
End Function
' Post-validation: Verify output PDF meets requirements
Function PostValidate(pdfPath As String) As Boolean
Try
Using pdf = PdfDocument.FromFile(pdfPath)
' Check PDF has at least one page
If pdf.PageCount < 1 Then Return False
' Check file size is reasonable (not just header, not corrupted)
Dim fileInfo As New FileInfo(pdfPath)
If fileInfo.Length < 1024 Then Return False
Return True
End Using
Catch
Return False
End Try
End Function
End Module
輸出


所有 5 個檔案均通過驗證,並已移至"已驗證"資料夾。
預處理驗證應快速完成——您只需檢查明顯錯誤的輸入,而非執行完整處理。 後處理驗證可更為徹底,特別是針對合規轉換,其輸出必須符合特定標準(如 PDF/A、PDF/UA)。 任何未能通過後處理驗證的檔案,應標記為需人工審查,而非直接接受。
非同步與並行處理模式
IronPDF 同時支援 Parallel.ForEach(基於執行緒的並行處理)與 async/await(非同步 I/O)。 了解何時使用各工具——以及如何有效結合它們——是最大化工作效率的關鍵。
任務並行函式庫整合
Parallel.ForEach 是針對 CPU 密集型批次作業最簡單且最有效的方法。 IronPDF 的渲染引擎會大量消耗 CPU 資源(HTML 解析、CSS 版面配置、影像光柵化),而 Parallel.ForEach 會自動將這些工作分配至所有可用的處理核心。
:path=/static-assets/pdf/content-code-examples/tutorials/batch-pdf-processing-csharp/batch-processing-tpl.cs
using IronPdf;
using System;
using System.IO;
using System.Threading.Tasks;
using System.Threading;
using System.Diagnostics;
string inputFolder = "input/";
string outputFolder = "output/";
Directory.CreateDirectory(outputFolder);
string[] htmlFiles = Directory.GetFiles(inputFolder, "*.html");
var renderer = new ChromePdfRenderer();
Console.WriteLine($"Processing {htmlFiles.Length} files with {Environment.ProcessorCount} CPU cores");
int processed = 0;
var stopwatch = Stopwatch.StartNew();
// Configure parallelism based on system resources
// Rule of thumb: ProcessorCount / 2 for memory-intensive operations
var options = new ParallelOptions
{
MaxDegreeOfParallelism = Math.Max(1, Environment.ProcessorCount / 2)
};
Console.WriteLine($"Max parallelism: {options.MaxDegreeOfParallelism}");
// Use Parallel.ForEach for CPU-bound batch operations
Parallel.ForEach(htmlFiles, options, htmlFile =>
{
string fileName = Path.GetFileNameWithoutExtension(htmlFile);
string outputPath = Path.Combine(outputFolder, $"{fileName}.pdf");
try
{
// Render HTML to PDF
using var pdf = renderer.RenderHtmlFileAsPdf(htmlFile);
pdf.SaveAs(outputPath);
int current = Interlocked.Increment(ref processed);
// Progress reporting every 10 files
if (current % 10 == 0)
{
double elapsed = stopwatch.Elapsed.TotalSeconds;
double rate = current / elapsed;
double remaining = (htmlFiles.Length - current) / rate;
Console.WriteLine($"Progress: {current}/{htmlFiles.Length} ({rate:F1} files/sec, ~{remaining:F0}s remaining)");
}
}
catch (Exception ex)
{
Console.WriteLine($"[ERROR] {fileName}: {ex.Message}");
}
});
stopwatch.Stop();
double totalRate = processed / stopwatch.Elapsed.TotalSeconds;
Console.WriteLine($"\nComplete:");
Console.WriteLine($" Files processed: {processed}/{htmlFiles.Length}");
Console.WriteLine($" Total time: {stopwatch.Elapsed.TotalSeconds:F1}s");
Console.WriteLine($" Average rate: {totalRate:F1} files/sec");
Console.WriteLine($" Time per file: {stopwatch.Elapsed.TotalMilliseconds / processed:F0}ms");
// Memory monitoring helper (call between chunks for large batches)
void CheckMemoryPressure()
{
const long memoryThreshold = 4L * 1024 * 1024 * 1024; // 4 GB
long currentMemory = GC.GetTotalMemory(forceFullCollection: false);
if (currentMemory > memoryThreshold)
{
Console.WriteLine($"Memory pressure detected ({currentMemory / 1024 / 1024}MB), forcing GC...");
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
}
}
Imports IronPdf
Imports System
Imports System.IO
Imports System.Threading.Tasks
Imports System.Threading
Imports System.Diagnostics
Module Program
Sub Main()
Dim inputFolder As String = "input/"
Dim outputFolder As String = "output/"
Directory.CreateDirectory(outputFolder)
Dim htmlFiles As String() = Directory.GetFiles(inputFolder, "*.html")
Dim renderer As New ChromePdfRenderer()
Console.WriteLine($"Processing {htmlFiles.Length} files with {Environment.ProcessorCount} CPU cores")
Dim processed As Integer = 0
Dim stopwatch As Stopwatch = Stopwatch.StartNew()
' Configure parallelism based on system resources
' Rule of thumb: ProcessorCount / 2 for memory-intensive operations
Dim options As New ParallelOptions With {
.MaxDegreeOfParallelism = Math.Max(1, Environment.ProcessorCount \ 2)
}
Console.WriteLine($"Max parallelism: {options.MaxDegreeOfParallelism}")
' Use Parallel.ForEach for CPU-bound batch operations
Parallel.ForEach(htmlFiles, options, Sub(htmlFile)
Dim fileName As String = Path.GetFileNameWithoutExtension(htmlFile)
Dim outputPath As String = Path.Combine(outputFolder, $"{fileName}.pdf")
Try
' Render HTML to PDF
Using pdf = renderer.RenderHtmlFileAsPdf(htmlFile)
pdf.SaveAs(outputPath)
End Using
Dim current As Integer = Interlocked.Increment(processed)
' Progress reporting every 10 files
If current Mod 10 = 0 Then
Dim elapsed As Double = stopwatch.Elapsed.TotalSeconds
Dim rate As Double = current / elapsed
Dim remaining As Double = (htmlFiles.Length - current) / rate
Console.WriteLine($"Progress: {current}/{htmlFiles.Length} ({rate:F1} files/sec, ~{remaining:F0}s remaining)")
End If
Catch ex As Exception
Console.WriteLine($"[ERROR] {fileName}: {ex.Message}")
End Try
End Sub)
stopwatch.Stop()
Dim totalRate As Double = processed / stopwatch.Elapsed.TotalSeconds
Console.WriteLine(vbCrLf & "Complete:")
Console.WriteLine($" Files processed: {processed}/{htmlFiles.Length}")
Console.WriteLine($" Total time: {stopwatch.Elapsed.TotalSeconds:F1}s")
Console.WriteLine($" Average rate: {totalRate:F1} files/sec")
Console.WriteLine($" Time per file: {stopwatch.Elapsed.TotalMilliseconds / processed:F0}ms")
' Memory monitoring helper (call between chunks for large batches)
CheckMemoryPressure()
End Sub
Sub CheckMemoryPressure()
Const memoryThreshold As Long = 4L * 1024 * 1024 * 1024 ' 4 GB
Dim currentMemory As Long = GC.GetTotalMemory(forceFullCollection:=False)
If currentMemory > memoryThreshold Then
Console.WriteLine($"Memory pressure detected ({currentMemory \ 1024 \ 1024}MB), forcing GC...")
GC.Collect()
GC.WaitForPendingFinalizers()
GC.Collect()
End If
End Sub
End Module
MaxDegreeOfParallelism 選項至關重要。 若未設定此參數,TPL 將嘗試使用所有可用核心,若每次渲染皆需大量資源,可能會導致記憶體負荷過高。請根據系統可用 RAM 除以典型單次渲染的記憶體消耗量來設定此值(對於複雜的 HTML,通常每同時進行一次渲染約需 100–300 MB)。
控制並發 (SemaphoreSlim)
當您需要比 Parallel.ForEach 所提供的更精細的並發控制時 —— 例如,在混合使用非同步 I/O 與 CPU 密集型渲染時 —— SemaphoreSlim 可讓您明確控制同時執行的操作數量。 模式很簡單:建立一個 SemaphoreSlim 並設定您所需的並發限制(例如 4 個並發渲染),在每次渲染前呼叫 WaitAsync,並在後續的 finally 區塊中呼叫 Release。 接著使用 Task.WhenAll 啟動所有任務。
當您的處理流程同時包含 I/O 密集型步驟(從 Blob 儲存讀取檔案、將結果寫入資料庫)與 CPU 密集型步驟(渲染 PDF)時,此模式特別有用。 信號量會限制受 CPU 限制的渲染並發性,同時允許受 I/O 限制的步驟不受限速地繼續進行。
Async/Await 最佳實務
IronPDF 提供其渲染方法的非同步版本,包括 RenderUrlAsPdfAsync 及 RenderHtmlFileAsPdfAsync。 這些工具非常適合用於網路應用程式(在這種情況下,阻塞請求執行緒是不可接受的),以及將 PDF 渲染與非同步 I/O 操作結合的處理流程。
批次處理的幾項重要非同步最佳實踐:
請勿使用 Task.Run 來封裝 IronPDF 的同步方法 — 應改用原生的非同步變體。 將同步方法包裹在 Task.Run 中,不僅會浪費線程池中的線程,還會增加開銷,卻毫無益處。
請勿在非同步任務中使用 .Result 或 .Wait() —— 此舉會阻塞呼叫執行緒,並可能在 UI 或 ASP.NET 環境中導致死鎖。 請務必使用 await。
請將 Task.WhenAll 呼叫進行批次處理,而非等待所有任務同時完成。 若您有 10,000 個任務,並同時對所有任務呼叫 Task.WhenAll,您將啟動 10,000 個並發操作。 請改用 .Chunk(10) 或類似方法將其分組處理,並依序等待各組的處理結果。
避免記憶體耗盡
記憶體耗盡是批次 PDF 處理中最常見的失敗模式。 防禦性策略是在每次渲染前使用 GC.GetTotalMemory() 監控記憶體使用量,並在消耗量超過閾值(例如 4 GB 或可用 RAM 的 80%)時觸發垃圾回收。 請先呼叫 GC.Collect(),接著呼叫 GC.WaitForPendingFinalizers() 以及第二個 GC.Collect(),以盡可能釋放記憶體,然後再繼續執行。 這雖會增加一點停頓,但能避免 OutOfMemoryException 在第 30,000 個檔案處導致整個批次崩潰的災難性後果。
將此與 TPL 章節中的 MaxDegreeOfParallelism 限流機制,以及記憶體管理章節中的 using 釋放模式結合,您便擁有了一套針對記憶體問題的三層防禦機制:限制並發性、積極釋放資源,以及透過安全閥進行監控。
批次工作的雲端部署
現代的批次處理越來越多地在雲端運行,您可以在雲端中根據工作負載需求彈性擴展運算資源,並僅需為實際使用的資源付費。 IronPDF 可在所有主要雲端平台上運行 — 以下是針對各平台架構批次處理流程的方法。
Azure Functions 與 Durable Functions
Azure Durable Functions 提供內建的扇出/扇入 (fan-out/fan-in) 模式協調功能,使其成為批次 PDF/A 處理的理想選擇。 協調器函式會將工作分配給多個活動函式實例,每個實例負責處理部分檔案。 您的協調器會在扇出迴圈中呼叫 CallActivityAsync,每個活動函式會實例化一個 ChromePdfRenderer,處理其負責的檔案區塊,而協調器則負責彙整結果。
Azure Functions 的關鍵注意事項:預設用量方案針對每次函式呼叫設有 5 分鐘的超時限制,且記憶體容量有限。 若需進行批次處理,請選用 Premium 或 Dedicated 方案,該方案支援更長的超時設定及更大的記憶體容量。 IronPDF 需要完整的 .NET 執行環境(非精簡版),因此請確保您的功能應用程式已設定為 .NET 8+ 並使用正確的執行環境識別碼。
AWS Lambda 與 Step Functions
AWS Step Functions 提供與 Azure Durable Functions 相似的協調功能。 狀態機的每個步驟都會呼叫一個 Lambda 函式,用以處理一組檔案。 您的 Lambda 處理程序會接收一批 S3 物件金鑰,載入每個 PDF 檔案(使用 PdfDocument.FromFile),套用您的處理流程(壓縮、格式轉換等),並將結果寫回輸出 S3 儲存桶。
AWS Lambda 的最大執行時間為 15 分鐘,且儲存空間有限(預設為 512 MB,可配置至 10 GB)。 對於大型批次工作,請使用 Step Functions 將工作負載分割為多個區塊,並在各自的 Lambda 呼叫中處理每個區塊。 請將中間結果儲存至 S3,而非本地儲存空間。
Kubernetes 工作排程
對於自行運作 Kubernetes 叢集的組織而言,批次 PDF 處理功能與 Kubernetes Jobs 及 CronJobs 非常契合。 每個 pod 會執行一個批次處理程序,該程序會從佇列(Azure Service Bus、RabbitMQ 或 SQS)中提取檔案,使用 IronPDF 進行處理,並將結果寫入物件儲存。 工作器迴圈遵循前文所述的相同模式:從佇列中取出訊息,使用 ChromePdfRenderer.RenderHtmlAsPdf() 或 PdfDocument.FromFile() 處理文件,上傳結果,並確認訊息。 將處理流程封裝在相同的 try-catch 中,並採用彈性模式中的重試邏輯,同時使用 SemaphoreSlim 來控制每個 Pod 的並發量。
IronPDF 提供官方 Docker 支援,並可在 Linux 容器上運行。 請使用 IronPdf NuGet 套件,並搭配您容器作業系統對應的原生執行階段套件(例如,針對基於 Linux 的映像檔請使用 IronPdf.Linux)。 針對 Kubernetes,請設定符合 IronPDF 記憶體需求的資源請求與限制(通常每個 Pod 需 512 MB 至 2 GB,視並發量而定)。 Horizontal Pod Autoscaler 可根據佇列深度調整工作節點的數量,而檢查點機制則確保即使 Pod 被驅逐,也不會遺失任何工作進度。

成本優化策略
若未審慎規劃資源分配,雲端批次處理的費用可能會相當高昂。 以下是影響最大的策略:
選擇合適的運算資源。PDF 渲染主要消耗 CPU 和記憶體資源,而非 GPU 資源。請使用運算最佳化執行個體(Azure 上的 C 系列、AWS 上的 C 型),而非通用型或記憶體最佳化執行個體。 您將獲得更佳的每渲染次數成本效益。
對於可容忍中斷的批次工作負載,請使用按需/預留型實例。 PDF 批次處理本質上具有可續傳特性(得益於檢查點機制),使其成為按次計費模式的理想選擇,此模式通常比隨選服務提供 60% 至 90% 的折扣。
若時程允許,請於非尖峰時段處理。 許多雲端服務供應商在夜間及週末提供更優惠的價格或更高的即時可用性。
盡早壓縮,一次儲存。將壓縮作業納入處理流程中,而非作為獨立步驟執行。從一開始就儲存壓縮後的 PDF 檔案,可降低檔案存檔期間的持續儲存成本。
對儲存空間進行分層。經常被存取的已處理 PDF 檔案應存放於熱儲存區; 鮮少被存取的歸檔 PDF 檔案應移至冷儲存或歸檔層級(Azure Cool/Archive、AWS S3 Glacier)。 僅此一項即可將儲存成本降低 50–80%。
實際應用流程範例
讓我們透過一套完整的、生產級批次處理流程,將所有環節串聯起來,以展示完整的作業流程:匯入 → 驗證 → 處理 → 歸檔 → 報告。
此範例會處理一個包含 HTML 發票範本的目錄,將其渲染為 PDF,壓縮輸出檔案,轉換為符合歸檔規範的 PDF/A-3b 格式,驗證結果,並在最後產生一份摘要報告。
使用上述批次轉換範例中的相同 5 張 HTML 發票...
:path=/static-assets/pdf/content-code-examples/tutorials/batch-pdf-processing-csharp/batch-processing-full-pipeline.cs
using IronPdf;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using System.Threading;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Text.Json;
// Configuration
var config = new PipelineConfig
{
InputFolder = "input/",
OutputFolder = "output/",
ArchiveFolder = "archive/",
ErrorFolder = "errors/",
CheckpointPath = "pipeline-checkpoint.json",
ReportPath = "pipeline-report.json",
MaxConcurrency = Math.Max(1, Environment.ProcessorCount / 2),
MaxRetries = 3,
JpegQuality = 70
};
// Initialize folders
Directory.CreateDirectory(config.OutputFolder);
Directory.CreateDirectory(config.ArchiveFolder);
Directory.CreateDirectory(config.ErrorFolder);
// Load checkpoint for resume capability
var checkpoint = LoadCheckpoint(config.CheckpointPath);
var results = new ConcurrentBag<ProcessingResult>();
var stopwatch = Stopwatch.StartNew();
// Get files to process
string[] allFiles = Directory.GetFiles(config.InputFolder, "*.html");
string[] filesToProcess = allFiles
.Where(f => !checkpoint.CompletedFiles.Contains(Path.GetFileName(f)))
.ToArray();
Console.WriteLine($"Pipeline starting:");
Console.WriteLine($" Total files: {allFiles.Length}");
Console.WriteLine($" Already processed: {checkpoint.CompletedFiles.Count}");
Console.WriteLine($" To process: {filesToProcess.Length}");
Console.WriteLine($" Concurrency: {config.MaxConcurrency}");
var renderer = new ChromePdfRenderer();
var checkpointLock = new object();
var options = new ParallelOptions
{
MaxDegreeOfParallelism = config.MaxConcurrency
};
Parallel.ForEach(filesToProcess, options, inputFile =>
{
var result = new ProcessingResult
{
FileName = Path.GetFileName(inputFile),
StartTime = DateTime.UtcNow
};
try
{
// Stage: Pre-validation
if (!ValidateInput(inputFile))
{
result.Status = "PreValidationFailed";
result.Error = "Input file failed validation";
results.Add(result);
return;
}
string baseName = Path.GetFileNameWithoutExtension(inputFile);
string tempPath = Path.Combine(config.OutputFolder, $"{baseName}.pdf");
string archivePath = Path.Combine(config.ArchiveFolder, $"{baseName}.pdf");
// Stage: Process with retry
PdfDocument pdf = null;
int attempt = 0;
bool success = false;
while (attempt < config.MaxRetries && !success)
{
attempt++;
try
{
pdf = renderer.RenderHtmlFileAsPdf(inputFile);
success = true;
}
catch (Exception ex) when (IsTransient(ex) && attempt < config.MaxRetries)
{
Thread.Sleep((int)Math.Pow(2, attempt) * 500);
}
}
if (!success || pdf == null)
{
result.Status = "ProcessingFailed";
result.Error = "Max retries exceeded";
results.Add(result);
return;
}
using (pdf)
{
// Stage: Compress and convert to PDF/A-3b for archival
pdf.SaveAsPdfA(tempPath, PdfAVersions.PdfA3b);
}
// Stage: Post-validation
if (!ValidateOutput(tempPath))
{
File.Move(tempPath, Path.Combine(config.ErrorFolder, $"{baseName}.pdf"), overwrite: true);
result.Status = "PostValidationFailed";
result.Error = "Output file failed validation";
results.Add(result);
return;
}
// Stage: Archive
File.Move(tempPath, archivePath, overwrite: true);
// Update checkpoint
lock (checkpointLock)
{
checkpoint.CompletedFiles.Add(result.FileName);
SaveCheckpoint(config.CheckpointPath, checkpoint);
}
result.Status = "Success";
result.OutputSize = new FileInfo(archivePath).Length;
result.EndTime = DateTime.UtcNow;
results.Add(result);
Console.WriteLine($"[OK] {baseName}.pdf ({result.OutputSize / 1024}KB)");
}
catch (Exception ex)
{
result.Status = "Error";
result.Error = ex.Message;
result.EndTime = DateTime.UtcNow;
results.Add(result);
Console.WriteLine($"[ERROR] {result.FileName}: {ex.Message}");
}
});
stopwatch.Stop();
// Generate report
var report = new PipelineReport
{
TotalFiles = allFiles.Length,
ProcessedThisRun = results.Count,
Succeeded = results.Count(r => r.Status == "Success"),
PreValidationFailed = results.Count(r => r.Status == "PreValidationFailed"),
ProcessingFailed = results.Count(r => r.Status == "ProcessingFailed"),
PostValidationFailed = results.Count(r => r.Status == "PostValidationFailed"),
Errors = results.Count(r => r.Status == "Error"),
TotalDuration = stopwatch.Elapsed,
AverageFileTime = results.Any() ? TimeSpan.FromMilliseconds(stopwatch.Elapsed.TotalMilliseconds / results.Count) : TimeSpan.Zero
};
string reportJson = JsonSerializer.Serialize(report, new JsonSerializerOptions { WriteIndented = true });
File.WriteAllText(config.ReportPath, reportJson);
Console.WriteLine($"\n=== Pipeline Complete ===");
Console.WriteLine($"Succeeded: {report.Succeeded}");
Console.WriteLine($"Failed: {report.PreValidationFailed + report.ProcessingFailed + report.PostValidationFailed + report.Errors}");
Console.WriteLine($"Duration: {report.TotalDuration.TotalMinutes:F1} minutes");
Console.WriteLine($"Report: {config.ReportPath}");
// Helper methods
bool ValidateInput(string path)
{
try
{
var info = new FileInfo(path);
if (!info.Exists || info.Length == 0 || info.Length > 50 * 1024 * 1024) return false;
string content = File.ReadAllText(path);
return content.Contains("<html", StringComparison.OrdinalIgnoreCase) ||
content.Contains("<!DOCTYPE", StringComparison.OrdinalIgnoreCase);
}
catch { return false; }
}
bool ValidateOutput(string path)
{
try
{
using var pdf = PdfDocument.FromFile(path);
return pdf.PageCount > 0 && new FileInfo(path).Length > 1024;
}
catch { return false; }
}
bool IsTransient(Exception ex) =>
ex is IOException || ex is OutOfMemoryException ||
ex.Message.Contains("timeout", StringComparison.OrdinalIgnoreCase);
Checkpoint LoadCheckpoint(string path)
{
if (File.Exists(path))
{
string json = File.ReadAllText(path);
return JsonSerializer.Deserialize<Checkpoint>(json) ?? new Checkpoint();
}
return new Checkpoint();
}
void SaveCheckpoint(string path, Checkpoint cp) =>
File.WriteAllText(path, JsonSerializer.Serialize(cp));
ata classes
s PipelineConfig
public string InputFolder { get; set; } = "";
public string OutputFolder { get; set; } = "";
public string ArchiveFolder { get; set; } = "";
public string ErrorFolder { get; set; } = "";
public string CheckpointPath { get; set; } = "";
public string ReportPath { get; set; } = "";
public int MaxConcurrency { get; set; }
public int MaxRetries { get; set; }
public int JpegQuality { get; set; }
s Checkpoint
public HashSet<string> CompletedFiles { get; set; } = new();
s ProcessingResult
public string FileName { get; set; } = "";
public string Status { get; set; } = "";
public string Error { get; set; } = "";
public long OutputSize { get; set; }
public DateTime StartTime { get; set; }
public DateTime EndTime { get; set; }
s PipelineReport
public int TotalFiles { get; set; }
public int ProcessedThisRun { get; set; }
public int Succeeded { get; set; }
public int PreValidationFailed { get; set; }
public int ProcessingFailed { get; set; }
public int PostValidationFailed { get; set; }
public int Errors { get; set; }
public TimeSpan TotalDuration { get; set; }
public TimeSpan AverageFileTime { get; set; }
Imports IronPdf
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports System.Threading.Tasks
Imports System.Threading
Imports System.Collections.Concurrent
Imports System.Diagnostics
Imports System.Text.Json
' Configuration
Dim config As New PipelineConfig With {
.InputFolder = "input/",
.OutputFolder = "output/",
.ArchiveFolder = "archive/",
.ErrorFolder = "errors/",
.CheckpointPath = "pipeline-checkpoint.json",
.ReportPath = "pipeline-report.json",
.MaxConcurrency = Math.Max(1, Environment.ProcessorCount \ 2),
.MaxRetries = 3,
.JpegQuality = 70
}
' Initialize folders
Directory.CreateDirectory(config.OutputFolder)
Directory.CreateDirectory(config.ArchiveFolder)
Directory.CreateDirectory(config.ErrorFolder)
' Load checkpoint for resume capability
Dim checkpoint As Checkpoint = LoadCheckpoint(config.CheckpointPath)
Dim results As New ConcurrentBag(Of ProcessingResult)()
Dim stopwatch As Stopwatch = Stopwatch.StartNew()
' Get files to process
Dim allFiles As String() = Directory.GetFiles(config.InputFolder, "*.html")
Dim filesToProcess As String() = allFiles.
Where(Function(f) Not checkpoint.CompletedFiles.Contains(Path.GetFileName(f))).
ToArray()
Console.WriteLine("Pipeline starting:")
Console.WriteLine($" Total files: {allFiles.Length}")
Console.WriteLine($" Already processed: {checkpoint.CompletedFiles.Count}")
Console.WriteLine($" To process: {filesToProcess.Length}")
Console.WriteLine($" Concurrency: {config.MaxConcurrency}")
Dim renderer As New ChromePdfRenderer()
Dim checkpointLock As New Object()
Dim options As New ParallelOptions With {
.MaxDegreeOfParallelism = config.MaxConcurrency
}
Parallel.ForEach(filesToProcess, options, Sub(inputFile)
Dim result As New ProcessingResult With {
.FileName = Path.GetFileName(inputFile),
.StartTime = DateTime.UtcNow
}
Try
' Stage: Pre-validation
If Not ValidateInput(inputFile) Then
result.Status = "PreValidationFailed"
result.Error = "Input file failed validation"
results.Add(result)
Return
End If
Dim baseName As String = Path.GetFileNameWithoutExtension(inputFile)
Dim tempPath As String = Path.Combine(config.OutputFolder, $"{baseName}.pdf")
Dim archivePath As String = Path.Combine(config.ArchiveFolder, $"{baseName}.pdf")
' Stage: Process with retry
Dim pdf As PdfDocument = Nothing
Dim attempt As Integer = 0
Dim success As Boolean = False
While attempt < config.MaxRetries AndAlso Not success
attempt += 1
Try
pdf = renderer.RenderHtmlFileAsPdf(inputFile)
success = True
Catch ex As Exception When IsTransient(ex) AndAlso attempt < config.MaxRetries
Thread.Sleep(CInt(Math.Pow(2, attempt)) * 500)
End Try
End While
If Not success OrElse pdf Is Nothing Then
result.Status = "ProcessingFailed"
result.Error = "Max retries exceeded"
results.Add(result)
Return
End If
Using pdf
' Stage: Compress and convert to PDF/A-3b for archival
pdf.SaveAsPdfA(tempPath, PdfAVersions.PdfA3b)
End Using
' Stage: Post-validation
If Not ValidateOutput(tempPath) Then
File.Move(tempPath, Path.Combine(config.ErrorFolder, $"{baseName}.pdf"), overwrite:=True)
result.Status = "PostValidationFailed"
result.Error = "Output file failed validation"
results.Add(result)
Return
End If
' Stage: Archive
File.Move(tempPath, archivePath, overwrite:=True)
' Update checkpoint
SyncLock checkpointLock
checkpoint.CompletedFiles.Add(result.FileName)
SaveCheckpoint(config.CheckpointPath, checkpoint)
End SyncLock
result.Status = "Success"
result.OutputSize = New FileInfo(archivePath).Length
result.EndTime = DateTime.UtcNow
results.Add(result)
Console.WriteLine($"[OK] {baseName}.pdf ({result.OutputSize \ 1024}KB)")
Catch ex As Exception
result.Status = "Error"
result.Error = ex.Message
result.EndTime = DateTime.UtcNow
results.Add(result)
Console.WriteLine($"[ERROR] {result.FileName}: {ex.Message}")
End Try
End Sub)
stopwatch.Stop()
' Generate report
Dim report As New PipelineReport With {
.TotalFiles = allFiles.Length,
.ProcessedThisRun = results.Count,
.Succeeded = results.Count(Function(r) r.Status = "Success"),
.PreValidationFailed = results.Count(Function(r) r.Status = "PreValidationFailed"),
.ProcessingFailed = results.Count(Function(r) r.Status = "ProcessingFailed"),
.PostValidationFailed = results.Count(Function(r) r.Status = "PostValidationFailed"),
.Errors = results.Count(Function(r) r.Status = "Error"),
.TotalDuration = stopwatch.Elapsed,
.AverageFileTime = If(results.Any(), TimeSpan.FromMilliseconds(stopwatch.Elapsed.TotalMilliseconds / results.Count), TimeSpan.Zero)
}
Dim reportJson As String = JsonSerializer.Serialize(report, New JsonSerializerOptions With {.WriteIndented = True})
File.WriteAllText(config.ReportPath, reportJson)
Console.WriteLine(vbCrLf & "=== Pipeline Complete ===")
Console.WriteLine($"Succeeded: {report.Succeeded}")
Console.WriteLine($"Failed: {report.PreValidationFailed + report.ProcessingFailed + report.PostValidationFailed + report.Errors}")
Console.WriteLine($"Duration: {report.TotalDuration.TotalMinutes:F1} minutes")
Console.WriteLine($"Report: {config.ReportPath}")
' Helper methods
Function ValidateInput(path As String) As Boolean
Try
Dim info As New FileInfo(path)
If Not info.Exists OrElse info.Length = 0 OrElse info.Length > 50 * 1024 * 1024 Then Return False
Dim content As String = File.ReadAllText(path)
Return content.Contains("<html", StringComparison.OrdinalIgnoreCase) OrElse
content.Contains("<!DOCTYPE", StringComparison.OrdinalIgnoreCase)
Catch
Return False
End Try
End Function
Function ValidateOutput(path As String) As Boolean
Try
Using pdf = PdfDocument.FromFile(path)
Return pdf.PageCount > 0 AndAlso New FileInfo(path).Length > 1024
End Using
Catch
Return False
End Try
End Function
Function IsTransient(ex As Exception) As Boolean
Return TypeOf ex Is IOException OrElse TypeOf ex Is OutOfMemoryException OrElse
ex.Message.Contains("timeout", StringComparison.OrdinalIgnoreCase)
End Function
Function LoadCheckpoint(path As String) As Checkpoint
If File.Exists(path) Then
Dim json As String = File.ReadAllText(path)
Return JsonSerializer.Deserialize(Of Checkpoint)(json) OrElse New Checkpoint()
End If
Return New Checkpoint()
End Function
Sub SaveCheckpoint(path As String, cp As Checkpoint)
File.WriteAllText(path, JsonSerializer.Serialize(cp))
End Sub
' Data classes
Class PipelineConfig
Public Property InputFolder As String = ""
Public Property OutputFolder As String = ""
Public Property ArchiveFolder As String = ""
Public Property ErrorFolder As String = ""
Public Property CheckpointPath As String = ""
Public Property ReportPath As String = ""
Public Property MaxConcurrency As Integer
Public Property MaxRetries As Integer
Public Property JpegQuality As Integer
End Class
Class Checkpoint
Public Property CompletedFiles As HashSet(Of String) = New HashSet(Of String)()
End Class
Class ProcessingResult
Public Property FileName As String = ""
Public Property Status As String = ""
Public Property Error As String = ""
Public Property OutputSize As Long
Public Property StartTime As DateTime
Public Property EndTime As DateTime
End Class
Class PipelineReport
Public Property TotalFiles As Integer
Public Property ProcessedThisRun As Integer
Public Property Succeeded As Integer
Public Property PreValidationFailed As Integer
Public Property ProcessingFailed As Integer
Public Property PostValidationFailed As Integer
Public Property Errors As Integer
Public Property TotalDuration As TimeSpan
Public Property AverageFileTime As TimeSpan
End Class
輸出

顯示批次處理結果的管線報告。
此處理流程整合了本教學中涵蓋的所有模式:具受控並發性的平行處理、採用"失敗時跳過"的單檔錯誤處理、針對暫時性錯誤的重試邏輯、用於崩潰後恢復的檢查點機制、前後處理驗證、透過顯式釋放進行的記憶體管理,以及包含最終摘要報告的全面性記錄。
此處理流程的輸出包含一個壓縮目錄,內含符合 PDF/A-3b 標準的歸檔檔案、用於恢復處理能力的檢查點檔案、無法處理檔案的錯誤日誌,以及包含處理統計資料的摘要報告。 這正是您處理任何嚴謹的批次 PDF 處理工作負載時所需的解決方案。
後續步驟
大規模批次處理 PDF 並非僅僅在迴圈中呼叫渲染方法那麼簡單。這需要針對並發、記憶體管理、錯誤處理及部署進行周詳的架構規劃——並搭配合適的函式庫,才能讓一切順利運作。 IronPDF 提供執行緒安全的渲染引擎、非同步 API 介面、壓縮工具及格式轉換功能,這些構成了任何 .NET 批次 PDF 處理流程的基礎。
無論您是正在建置能在日出前生成數千份 PDF 的夜間報告生成器、將舊有文件檔案庫遷移至符合 PDF/A 標準,還是要在 Kubernetes 上架設雲原生處理服務,本教學中的模式都能為您提供一個經過驗證的建構框架。 透過受控的並發進行平行處理,可維持高吞吐量。 當個別檔案發生問題時,"失敗跳過"與"重試"邏輯能確保處理流程持續運作。 檢查點功能可確保您不會遺失任何進度。 此外,雲端部署模式讓您能根據工作負載彈性擴展運算資源。
準備好開始建構了嗎? 立即下載 IronPDF 並體驗免費試用版——同一個函式庫即可處理從單一檔案渲染到數十萬檔案的批次處理流程。 若您對特定使用情境的擴展、部署或架構有任何疑問,歡迎聯繫我們的工程支援團隊——我們曾協助各類規模的團隊建置批次處理管道,並樂意協助您正確建置。
常見問題
什麼是 C# 中的批次 PDF 處理?
C# 中的批次 PDF 處理,是指利用 C# 程式語言同時自動處理大量 PDF 文件。此方法非常適合用於大規模自動化文件工作流程。
IronPDF 如何協助進行批次 PDF 處理?
IronPDF 提供強大的工具與函式庫,可簡化 C# 中的批次 PDF 處理流程。它支援平行處理,能同時高效處理數千份 PDF 檔案。
使用 IronPDF 進行並行處理有哪些好處?
透過 IronPDF 進行平行處理,可實現更快、更高效的 PDF 批次處理。此方法能最大化資源利用率,並顯著縮短處理時間。
IronPDF 能否部署於雲端平台進行批次處理?
是的,IronPDF 可部署於 Azure Functions、AWS Lambda 和 Kubernetes 等雲端平台,實現可擴展且靈活的批次 PDF 處理。
IronPDF 在批次處理 PDF 時如何處理錯誤?
IronPDF 內建錯誤處理與重試邏輯功能,可確保批次 PDF 處理過程的可靠性。這些功能有助於在無需人工介入的情況下管理並修正錯誤。
在 IronPDF 的 PDF 處理中,重試邏輯扮演什麼角色?
IronPDF 中的重試邏輯可確保暫時性問題不會中斷批次處理工作流程。若發生錯誤,IronPDF 會自動嘗試重新處理失敗的文件。
為什麼 C# 是進行 PDF 批次處理的合適語言?
C# 是一種功能強大的程式語言,擁有豐富的函式庫與框架,使其成為批次處理 PDF 的理想選擇。它能與 IronPDF 無縫整合,實現高效的文件自動化處理。
IronPDF 如何確保處理 PDF 文件時的安全性?
IronPDF 透過提供加密與密碼保護功能,支援 PDF 文件的資安處理,確保處理後的文件能維持機密性與安全性。
企業中批量處理 PDF 的應用場景有哪些?
企業會運用批次 PDF 處理功能來執行諸如批量生成發票、文件數位化及大規模報告分發等任務。IronPDF 透過自動化與簡化文件工作流程,協助實現這些應用場景。
IronPDF 能否處理不同的 PDF 格式和版本?
是的,IronPDF 專為處理各種 PDF 格式與版本而設計,確保在批次處理任務中具備相容性與靈活性。

