AI-Powered PDF Processing in C# with IronPDF: Summarize, Extract, and Analyze Documents

Ahmad Sohail
Updated: February 4, 2026

With AI-powered PDF processing in C# using IronPDF, .NET developers can summarize documents, extract structured data, and build question-answering systems directly on top of existing PDF workflows. The IronPdf.Extensions.AI package, built on Microsoft Semantic Kernel, connects seamlessly to Azure OpenAI and OpenAI models. Whether you are building legal discovery tools, financial analysis pipelines, or a document intelligence platform, IronPDF handles PDF extraction and context preparation so you can focus on the AI logic.

TL;DR: Quick Start Guide

This tutorial shows how to connect IronPDF to AI services in C# .NET for document summarization, data extraction, and intelligent querying.

- Who it's for: .NET developers building document intelligence applications: legal discovery systems, financial analysis tools, compliance review platforms, or any application that needs to extract meaning from large volumes of PDF documents.
- What you'll build: single-document summarization, structured JSON data extraction with custom schemas, question answering over document content, RAG pipelines for long documents, and batch AI processing workflows across document libraries.
- What you need: any .NET 6+ environment with an Azure OpenAI or OpenAI API key. The AI extension integrates with Microsoft Semantic Kernel, which automatically handles context-window management, chunking, and orchestration.
- When to use this approach: when your application needs to process PDFs beyond plain text extraction: understanding contract obligations, summarizing research papers, extracting financial tables as structured data, or answering user questions about document content at scale.
- Why it matters technically: raw text extraction loses document structure. Tables collapse, multi-column layouts break, and semantic relationships disappear. IronPDF prepares documents for AI consumption by preserving structure and managing token limits, so models receive clean, well-organized input.

Generate a PDF document summary in just a few lines of code.

Get started now. Install IronPDF with the NuGet Package Manager:

PM > Install-Package IronPdf

Copy and run this code:

await IronPdf.AI.PdfAIEngine.Summarize("contract.pdf", "summary.txt", azureEndpoint, azureApiKey);

Deploy it to your production environment for testing. Start using IronPDF in your project today with a free trial!
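The quick-start call above needs an endpoint and API key. Rather than hard-coding them, you can supply them from the environment. The sketch below is a minimal, hypothetical variant of the same one-liner; the environment-variable names are illustrative, not part of the IronPDF API:

```csharp
using System;

// Illustrative environment-variable names; adjust to your own configuration.
string azureEndpoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")
    ?? throw new InvalidOperationException("AZURE_OPENAI_ENDPOINT is not set");
string azureApiKey = Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY")
    ?? throw new InvalidOperationException("AZURE_OPENAI_API_KEY is not set");

// Same quick-start call as above, with credentials supplied by the environment
await IronPdf.AI.PdfAIEngine.Summarize("contract.pdf", "summary.txt", azureEndpoint, azureApiKey);
```

Failing fast when a variable is missing keeps misconfigured deployments from sending requests with empty credentials.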
Free 30-Day Trial

After purchasing or registering for a 30-day trial of IronPDF, add your license key at the start of your application:

IronPdf.License.LicenseKey = "KEY";

Table of Contents

- The AI + PDF Opportunity
- IronPDF's Built-in AI Integration
- Document Summarization
- Intelligent Data Extraction
- Question Answering over Documents
- Batch AI Processing
- Real-World Use Cases
- Troubleshooting and Technical Support

The AI + PDF Opportunity

Why PDFs Are the Largest Untapped Data Source

PDF files are among the largest repositories of structured business knowledge in the modern enterprise. Professional documents, including contracts, financial statements, compliance reports, legal briefs, and research papers, are stored primarily as PDFs. These documents contain essential business intelligence: contract terms that define obligations and liabilities, financial metrics that drive investment decisions, regulatory requirements that ensure compliance, and research findings that guide strategy.

Traditional PDF processing approaches, however, have serious limitations. Basic text extraction tools can pull raw characters off the page but lose critical context: table structures collapse into garbled text, multi-column layouts become meaningless, and the semantic relationships between sections disappear.

The breakthrough comes from AI's ability to understand context and structure. Modern LLMs do not just read text; they understand how a document is organized, recognize patterns such as contract clauses or financial tables, and can extract meaning from complex layouts. GPT-5's unified reasoning system with real-time routing, and Claude Sonnet 4.5's enhanced agentic capabilities, both deliver significantly lower hallucination rates than earlier models, making reliable professional document analysis possible.

How LLMs Understand Document Structure

Large language models bring advanced natural-language processing to PDF analysis. GPT-5's hybrid architecture features multiple sub-models (main, mini, thinking, nano) with a real-time router that dynamically selects the best variant for the complexity of the task: simple questions route to faster models, while complex reasoning tasks engage the full model. Claude Opus 4.6 particularly excels at long-running agentic tasks, with agent teams that can coordinate segmented jobs directly, and a 1-million-token context window that can process entire document libraries without chunking.

[Image: how AI models analyze PDF document structure and identify elements]

This contextual understanding lets LLMs perform tasks that require genuine comprehension. When analyzing a contract, an LLM can identify not just the clauses containing the word "termination," but the specific conditions under which termination is permitted, the notice requirements involved, and the resulting liabilities. The technical foundation is the Transformer architecture that powers modern language models, with GPT-5's context window supporting up to 272,000 input tokens and Claude Sonnet 4.5's 200K-token window providing comprehensive document coverage.

IronPDF's Built-in AI Integration

Installing IronPDF and the AI Extension

Getting started with AI-powered PDF processing requires the IronPDF core library, the AI extension package, and the Microsoft Semantic Kernel dependencies. Install them with the NuGet Package Manager:

PM > Install-Package IronPdf
PM > Install-Package IronPdf.Extensions.AI
PM > Install-Package Microsoft.SemanticKernel
PM > Install-Package Microsoft.SemanticKernel.Plugins.Memory

These packages work together to provide a complete solution. IronPDF handles all PDF-related operations (text extraction, page rendering, format conversion), while the AI extension manages the integration with language models through Microsoft Semantic Kernel.

Note: the Semantic Kernel packages contain experimental APIs. Add <NoWarn>$(NoWarn);SKEXP0001;SKEXP0010;SKEXP0050</NoWarn> to a PropertyGroup in your .csproj file to suppress the compiler warnings.

Configuring Your OpenAI/Azure API Keys

Before using the AI features, you need to configure access to an AI service provider. IronPDF's AI extension supports both OpenAI and Azure OpenAI. Azure OpenAI is usually the preferred choice for enterprise applications because it offers enhanced security features, compliance certifications, and the ability to keep data within specific geographic regions. To configure Azure OpenAI, obtain the Azure endpoint URL, an API key, and the deployment names for your chat and embedding models from the Azure portal.

Initializing the AI Engine

IronPDF's AI extension uses Microsoft Semantic Kernel under the hood. Before using any AI feature, you must initialize the kernel with your Azure OpenAI credentials and configure the memory store used for document processing.

:path=/static-assets/pdf/content-code-examples/tutorials/ai-powered-pdf-processing-csharp/configure-azure-credentials.cs

// Initialize IronPDF AI with Azure OpenAI credentials
using IronPdf;
using IronPdf.AI;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Connectors.OpenAI;

// Azure OpenAI configuration
string azureEndpoint = "https://your-resource.openai.azure.com/";
string apiKey = "your-azure-api-key";
string chatDeployment = "gpt-4o";
string embeddingDeployment = "text-embedding-ada-002";

// Initialize Semantic Kernel with Azure OpenAI
var kernel = Kernel.CreateBuilder()
    .AddAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey)
    .AddAzureOpenAIChatCompletion(chatDeployment, azureEndpoint, apiKey)
    .Build();

// Create memory store for document embeddings
var memory = new MemoryBuilder()
    .WithMemoryStore(new VolatileMemoryStore())
    .WithAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey)
    .Build();

// Initialize IronPDF AI
IronDocumentAI.Initialize(kernel, memory);

Console.WriteLine("IronPDF AI initialized successfully with Azure OpenAI");

The initialization process creates two key components:

- Kernel: handles chat completions and text-embedding generation through Azure OpenAI
- Memory: stores document embeddings for semantic search and retrieval operations

Once initialized with IronDocumentAI.Initialize(), the AI features are available throughout your application. For production applications, storing credentials in environment variables or Azure Key Vault is strongly recommended.

How IronPDF Prepares PDF Documents for AI Context

One of the most challenging aspects of AI-driven PDF processing is preparing documents that a language model can work with. Although GPT-5 supports up to 272,000 input tokens and Claude Opus 4.6 now offers a 1-million-token context window, a single legal contract or financial report can still easily exceed the limits of older models. IronPDF's AI extension handles this complexity through intelligent document preparation. When you call an AI method, IronPDF first extracts text from the PDF while preserving structural information: identifying paragraphs, keeping table structures intact, and maintaining the relationships between sections. For documents that exceed context limits, IronPDF applies strategic chunking at semantic breakpoints, the natural divisions in a document's structure such as section headings, page breaks, or paragraph boundaries.

Document Summarization

Single-Document Summarization

Document summarization delivers immediate value by condensing lengthy documents into digestible insights. The Summarize method handles the entire workflow: extracting the text, preparing it for AI consumption, requesting a summary from the language model, and saving the result.

Input

The code loads a PDF with PdfDocument.FromFile(), calls pdf.Summarize() to generate a concise summary, then saves the result to a text file.

:path=/static-assets/pdf/content-code-examples/tutorials/ai-powered-pdf-processing-csharp/single-document-summary.cs

// Summarize a PDF document using IronPDF AI
using IronPdf;
using IronPdf.AI;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Connectors.OpenAI;

// Azure OpenAI configuration
string azureEndpoint = "https://your-resource.openai.azure.com/";
string apiKey = "your-azure-api-key";
string chatDeployment = "gpt-4o";
string embeddingDeployment = "text-embedding-ada-002";

// Initialize Semantic Kernel
var kernel = Kernel.CreateBuilder()
    .AddAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey)
    .AddAzureOpenAIChatCompletion(chatDeployment, azureEndpoint, apiKey)
    .Build();

var memory = new MemoryBuilder()
    .WithMemoryStore(new VolatileMemoryStore())
    .WithAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey)
    .Build();

IronDocumentAI.Initialize(kernel, memory);

// Load and summarize PDF
var pdf = PdfDocument.FromFile("sample-report.pdf");
string summary = await pdf.Summarize();

Console.WriteLine("Document Summary:");
Console.WriteLine(summary);

File.WriteAllText("report-summary.txt", summary);
Console.WriteLine("\nSummary saved to report-summary.txt");

Console output:
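The same single-document pattern can be fanned out across a folder of PDFs. The sketch below is a minimal, hypothetical batch wrapper around pdf.Summarize(); it assumes IronDocumentAI has already been initialized as shown above, and the folder path, output naming, and unbounded concurrency are illustrative choices, not IronPDF defaults:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using IronPdf;

async Task SummarizeFolderAsync(string folder)
{
    // One task per PDF; requests run concurrently
    var tasks = Directory.GetFiles(folder, "*.pdf").Select(async path =>
    {
        try
        {
            var pdf = PdfDocument.FromFile(path);
            string summary = await pdf.Summarize();
            // e.g. reports/q1.pdf -> reports/q1.summary.txt
            File.WriteAllText(Path.ChangeExtension(path, ".summary.txt"), summary);
        }
        catch (Exception ex)
        {
            // Keep processing the rest of the folder if one document fails
            Console.WriteLine($"Failed on {path}: {ex.Message}");
        }
    });
    await Task.WhenAll(tasks);
}

await SummarizeFolderAsync("reports");
```

For large folders, consider throttling (for example with SemaphoreSlim) so you stay within your AI provider's rate limits.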
[Image: console output showing the PDF document summary result in C#]

The summary generation process uses sophisticated prompting to ensure high-quality results. In 2026, both GPT-5 and Claude Sonnet 4.5 feature significantly improved instruction following, ensuring summaries capture the key information while remaining concise and readable. For a more detailed explanation of summarization techniques and advanced options, see our how-to guide.

Multi-Document Synthesis

Many real-world scenarios require synthesizing information across multiple documents. A legal team might need to find common clauses across a contract portfolio, or a financial analyst might want to compare metrics across quarterly reports. The multi-document synthesis approach processes each document separately to extract its key information, then aggregates those insights for a final synthesis.

This example iterates over several PDFs, calls pdf.Summarize() on each, then uses pdf.Query() with the combined summaries to produce a unified synthesis report.

:path=/static-assets/pdf/content-code-examples/tutorials/ai-powered-pdf-processing-csharp/multi-document-synthesis.cs

// Synthesize insights across multiple related documents (e.g., quarterly reports into annual summary)
using IronPdf;
using IronPdf.AI;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Connectors.OpenAI;

// Azure OpenAI configuration
string azureEndpoint = "https://your-resource.openai.azure.com/";
string apiKey = "your-azure-api-key";
string chatDeployment = "gpt-4o";
string embeddingDeployment = "text-embedding-ada-002";

// Initialize Semantic Kernel
var kernel = Kernel.CreateBuilder()
    .AddAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey)
    .AddAzureOpenAIChatCompletion(chatDeployment, azureEndpoint, apiKey)
    .Build();

var memory = new MemoryBuilder()
    .WithMemoryStore(new VolatileMemoryStore())
    .WithAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey)
    .Build();

IronDocumentAI.Initialize(kernel, memory);

// Define documents to synthesize
string[] documentPaths = { "Q1-report.pdf", "Q2-report.pdf", "Q3-report.pdf", "Q4-report.pdf" };
var documentSummaries = new List<string>();

// Summarize each document
foreach (string path in documentPaths)
{
    var pdf = PdfDocument.FromFile(path);
    string summary = await pdf.Summarize();
    documentSummaries.Add($"=== {Path.GetFileName(path)} ===\n{summary}");
    Console.WriteLine($"Processed: {path}");
}

// Combine and synthesize across all documents
string combinedSummaries = string.Join("\n\n", documentSummaries);
var synthesisDoc = PdfDocument.FromFile(documentPaths[0]);
string synthesisQuery = @"Based on the quarterly summaries below, provide an annual synthesis:
1. Overall trends across quarters
2. Key achievements and challenges
3. Year-over-year patterns

Summaries:
" + combinedSummaries;

string synthesis = await synthesisDoc.Query(synthesisQuery);

Console.WriteLine("\n=== Annual Synthesis ===");
Console.WriteLine(synthesis);
File.WriteAllText("annual-synthesis.txt", synthesis);

This pattern scales effectively to large document sets. By processing documents in parallel and managing intermediate results, you can analyze hundreds of documents while maintaining a coherent synthesis.

Executive Summary Generation

Executive summaries require a different approach from standard summarization. Rather than simply condensing content, an executive summary should identify the most important business information, highlight key decisions or recommendations, and present the findings in a form suited to leadership review.

The code uses pdf.Query() with a structured prompt that requests key decisions, critical findings, financial impact, and risk assessment, expressed in business language.

:path=/static-assets/pdf/content-code-examples/tutorials/ai-powered-pdf-processing-csharp/executive-summary.cs

// Generate executive summary from strategic documents for C-suite leadership
using IronPdf;
using IronPdf.AI;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Connectors.OpenAI;

// Azure OpenAI configuration
string azureEndpoint = "https://your-resource.openai.azure.com/";
string apiKey = "your-azure-api-key";
string chatDeployment = "gpt-4o";
string embeddingDeployment = "text-embedding-ada-002";

// Initialize Semantic Kernel
var kernel = Kernel.CreateBuilder()
    .AddAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey)
    .AddAzureOpenAIChatCompletion(chatDeployment, azureEndpoint, apiKey)
    .Build();

var memory = new MemoryBuilder()
    .WithMemoryStore(new VolatileMemoryStore())
    .WithAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey)
    .Build();

IronDocumentAI.Initialize(kernel, memory);

var pdf = PdfDocument.FromFile("strategic-plan.pdf");

string executiveQuery = @"Create an executive summary for C-suite leadership. Include:

**Key Decisions Required:**
- List any decisions needing executive approval

**Critical Findings:**
- Top 3-5 most important findings (bullet points)

**Financial Impact:**
- Revenue/cost implications if mentioned

**Risk Assessment:**
- High-priority risks identified

**Recommended Actions:**
- Immediate next steps

Keep under 500 words. Use business language appropriate for board presentation.";

string executiveSummary = await pdf.Query(executiveQuery);

File.WriteAllText("executive-summary.txt", executiveSummary);
Console.WriteLine("Executive summary saved to executive-summary.txt");

The resulting executive summary prioritizes actionable information over comprehensive coverage, giving decision-makers exactly the information they need without excessive detail.

Intelligent Data Extraction

Extracting Structured Data as JSON

One of the most powerful applications of AI-driven PDF processing is extracting structured data from unstructured documents. The key to successful structured extraction in 2026 is using JSON mode with structured-output schemas. GPT-5 introduces improved structured outputs, while Claude Sonnet 4.5 provides enhanced tool orchestration for reliable data extraction.

Input

The code calls pdf.Query() with a JSON-schema prompt, then uses JsonSerializer.Deserialize() to parse and validate the extracted invoice data.

:path=/static-assets/pdf/content-code-examples/tutorials/ai-powered-pdf-processing-csharp/extract-invoice-json.cs

// Extract structured invoice data as JSON from PDF
using IronPdf;
using IronPdf.AI;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using System.Text.Json;

// Azure OpenAI configuration
string azureEndpoint = "https://your-resource.openai.azure.com/";
string apiKey = "your-azure-api-key";
string chatDeployment = "gpt-4o";
string embeddingDeployment = "text-embedding-ada-002";

// Initialize Semantic Kernel
var kernel = Kernel.CreateBuilder()
    .AddAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey)
    .AddAzureOpenAIChatCompletion(chatDeployment, azureEndpoint, apiKey)
    .Build();

var memory = new MemoryBuilder()
    .WithMemoryStore(new VolatileMemoryStore())
    .WithAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey)
    .Build();

IronDocumentAI.Initialize(kernel, memory);

var pdf = PdfDocument.FromFile("sample-invoice.pdf");

// Define JSON schema for extraction
string extractionQuery = @"Extract invoice data and return as JSON with this exact structure:
{
  ""invoiceNumber"": ""string"",
  ""invoiceDate"": ""YYYY-MM-DD"",
  ""dueDate"": ""YYYY-MM-DD"",
  ""vendor"": {
    ""name"": ""string"",
    ""address"": ""string"",
    ""taxId"": ""string or null""
  },
  ""customer"": {
    ""name"": ""string"",
    ""address"": ""string""
  },
  ""lineItems"": [
    {
      ""description"": ""string"",
      ""quantity"": number,
      ""unitPrice"": number,
      ""total"": number
    }
  ],
  ""subtotal"": number,
  ""taxRate"": number,
  ""taxAmount"": number,
  ""total"": number,
  ""currency"": ""string""
}

Return ONLY valid JSON, no additional text.";

string jsonResponse = await pdf.Query(extractionQuery);

// Parse and save JSON
try
{
    var invoiceData = JsonSerializer.Deserialize<JsonElement>(jsonResponse);
    string formattedJson = JsonSerializer.Serialize(invoiceData, new JsonSerializerOptions { WriteIndented = true });
    Console.WriteLine("Extracted Invoice Data:");
    Console.WriteLine(formattedJson);
    File.WriteAllText("invoice-data.json", formattedJson);
}
catch (JsonException)
{
    Console.WriteLine("Unable to parse JSON response");
    File.WriteAllText("invoice-raw-response.txt", jsonResponse);
}

A partial screenshot of the generated JSON file:

[Image: invoice data extracted from the PDF as structured JSON]

Modern AI models in 2026 support structured-output modes that guarantee valid JSON responses conforming to the provided schema, eliminating the need for complex error handling around malformed responses.

Contract Clause Identification

Legal contracts contain specific clauses of particular importance: termination provisions, liability limitations, indemnification requirements, intellectual-property assignments, and confidentiality obligations. AI-driven clause identification automates this analysis while maintaining high accuracy.

This example uses pdf.Query() with a clause-focused JSON schema to extract the contract type, the parties, key dates, and the individual clauses with risk levels.

:path=/static-assets/pdf/content-code-examples/tutorials/ai-powered-pdf-processing-csharp/contract-clause-analysis.cs

// Analyze contract clauses and identify key terms, risks, and critical dates
using IronPdf;
using IronPdf.AI;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using System.Text.Json;

// Azure OpenAI configuration
string azureEndpoint = "https://your-resource.openai.azure.com/";
string apiKey = "your-azure-api-key";
string chatDeployment = "gpt-4o";
string embeddingDeployment = "text-embedding-ada-002";

// Initialize Semantic Kernel
var kernel = Kernel.CreateBuilder()
    .AddAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey)
    .AddAzureOpenAIChatCompletion(chatDeployment, azureEndpoint, apiKey)
    .Build();

var memory = new MemoryBuilder()
    .WithMemoryStore(new VolatileMemoryStore())
    .WithAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey)
    .Build();

IronDocumentAI.Initialize(kernel, memory);

var pdf = PdfDocument.FromFile("contract.pdf");

// Define JSON schema for contract analysis
string
clauseQuery = @"Analyze this contract and identify key clauses. Return JSON:
{
  ""contractType"": ""string"",
  ""parties"": [""string""],
  ""effectiveDate"": ""string"",
  ""clauses"": [
    {
      ""type"": ""Termination|Liability|Indemnification|Confidentiality|IP|Payment|Warranty|Other"",
      ""title"": ""string"",
      ""summary"": ""string"",
      ""riskLevel"": ""Low|Medium|High"",
      ""keyTerms"": [""string""]
    }
  ],
  ""criticalDates"": [
    {
      ""description"": ""string"",
      ""date"": ""string""
    }
  ],
  ""overallRiskAssessment"": ""Low|Medium|High"",
  ""recommendations"": [""string""]
}

Focus on: termination rights, liability caps, indemnification, IP ownership, confidentiality, payment terms.
Return ONLY valid JSON.";

string analysisJson = await pdf.Query(clauseQuery);

try
{
    var analysis = JsonSerializer.Deserialize<JsonElement>(analysisJson);
    string formatted = JsonSerializer.Serialize(analysis, new JsonSerializerOptions { WriteIndented = true });
    Console.WriteLine("Contract Clause Analysis:");
    Console.WriteLine(formatted);
    File.WriteAllText("contract-analysis.json", formatted);

    // Display high-risk clauses
    Console.WriteLine("\n=== High Risk Clauses ===");
    foreach (var clause in analysis.GetProperty("clauses").EnumerateArray())
    {
        if (clause.GetProperty("riskLevel").GetString() == "High")
        {
            Console.WriteLine($"- {clause.GetProperty("type")}: {clause.GetProperty("summary")}");
        }
    }
}
catch (JsonException)
{
    Console.WriteLine("Unable to parse contract analysis");
    File.WriteAllText("contract-analysis-raw.txt", analysisJson);
}

This capability transforms contract review from a sequential, manual process into an automated, scalable workflow. Legal teams can quickly identify high-risk clauses across hundreds of contracts.

Financial Data Parsing

Financial documents contain critical quantitative data embedded in complex narrative and tables. AI parsing excels with financial documents because it understands context: distinguishing historical results from forward projections, recognizing whether figures are stated in thousands or millions, and understanding the relationships between different metrics.

The code uses pdf.Query() with a financial JSON schema to extract income-statement data, balance-sheet metrics, and forward guidance into structured output.

:path=/static-assets/pdf/content-code-examples/tutorials/ai-powered-pdf-processing-csharp/financial-data-extraction.cs

// Extract financial metrics from annual reports and earnings documents
using IronPdf;
using IronPdf.AI;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using System.Text.Json;

// Azure OpenAI configuration
string azureEndpoint = "https://your-resource.openai.azure.com/";
string apiKey = "your-azure-api-key";
string chatDeployment = "gpt-4o";
string embeddingDeployment = "text-embedding-ada-002";

// Initialize Semantic Kernel
var kernel = Kernel.CreateBuilder()
    .AddAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey)
    .AddAzureOpenAIChatCompletion(chatDeployment, azureEndpoint, apiKey)
    .Build();

var memory = new MemoryBuilder()
    .WithMemoryStore(new VolatileMemoryStore())
    .WithAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey)
    .Build();

IronDocumentAI.Initialize(kernel, memory);

var pdf = PdfDocument.FromFile("annual-report.pdf");

// Define JSON schema for financial extraction (numbers in millions)
string financialQuery = @"Extract financial metrics from this document. Return JSON:
{
  ""reportPeriod"": ""string"",
  ""company"": ""string"",
  ""currency"": ""string"",
  ""incomeStatement"": {
    ""revenue"": number,
    ""costOfRevenue"": number,
    ""grossProfit"": number,
    ""operatingExpenses"": number,
    ""operatingIncome"": number,
    ""netIncome"": number,
    ""eps"": number
  },
  ""balanceSheet"": {
    ""totalAssets"": number,
    ""totalLiabilities"": number,
    ""shareholdersEquity"": number,
    ""cash"": number,
    ""totalDebt"": number
  },
  ""keyMetrics"": {
    ""revenueGrowthYoY"": ""string"",
    ""grossMargin"": ""string"",
    ""operatingMargin"": ""string"",
    ""netMargin"": ""string"",
    ""debtToEquity"": number
  },
  ""guidance"": {
    ""nextQuarterRevenue"": ""string"",
    ""fullYearRevenue"": ""string"",
    ""notes"": ""string""
  }
}

Use null for unavailable data. Numbers in millions unless stated.
Return ONLY valid JSON.";

string financialJson = await pdf.Query(financialQuery);

try
{
    var financials = JsonSerializer.Deserialize<JsonElement>(financialJson);
    string formatted = JsonSerializer.Serialize(financials, new JsonSerializerOptions { WriteIndented = true });
    Console.WriteLine("Extracted Financial Data:");
    Console.WriteLine(formatted);
    File.WriteAllText("financial-data.json", formatted);
}
catch (JsonException)
{
    Console.WriteLine("Unable to parse financial data");
    File.WriteAllText("financial-raw.txt", financialJson);
}

The extracted structured data can feed directly into financial models, time-series databases, or analytics platforms, enabling automated tracking of metrics across reporting periods.

Custom Extraction Prompts

Many organizations have unique extraction needs driven by their domain, document formats, or business processes. IronPDF's AI integration fully supports custom extraction prompts, letting you define precisely which information to extract and how to organize it.

This example demonstrates schema-driven extraction with pdf.Query() using a research-focused schema, deriving the key findings of an academic paper along with their confidence levels and limitations.

:path=/static-assets/pdf/content-code-examples/tutorials/ai-powered-pdf-processing-csharp/custom-research-extraction.cs

// Extract structured research metadata from academic papers
using IronPdf;
using IronPdf.AI;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using System.Text.Json;

// Azure OpenAI configuration
string azureEndpoint =
"https://your-resource.openai.azure.com/"; string apiKey = "your-azure-api-key"; string chatDeployment = "gpt-4o"; string embeddingDeployment = "text-embedding-ada-002"; // Initialize Semantic Kernel var kernel = Kernel.CreateBuilder() .AddAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) .AddAzureOpenAIChatCompletion(chatDeployment, azureEndpoint, apiKey) .Build(); var memory = new MemoryBuilder() .WithMemoryStore(new VolatileMemoryStore()) .WithAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) .Build(); IronDocumentAI.Initialize(kernel, memory); var pdf = PdfDocument.FromFile("research-paper.pdf"); // Define JSON schema for research paper extraction string researchQuery = @"Extract structured information from this research paper. Return JSON: { ""title"": ""string"", ""authors"": [""string""], ""institution"": ""string"", ""publicationDate"": ""string"", ""abstract"": ""string"", ""researchQuestion"": ""string"", ""methodology"": { ""type"": ""Quantitative|Qualitative|Mixed Methods"", ""approach"": ""string"", ""sampleSize"": ""string"", ""dataCollection"": ""string"" }, ""keyFindings"": [ { ""finding"": ""string"", ""significance"": ""string"", ""confidence"": ""High|Medium|Low"" } ], ""limitations"": [""string""], ""futureWork"": [""string""], ""keywords"": [""string""] } Focus on extracting verifiable claims and noting uncertainty. 
Return ONLY valid JSON."; string extractionResult = await pdf.Query(researchQuery); try { var research = JsonSerializer.Deserialize<JsonElement>(extractionResult); string formatted = JsonSerializer.Serialize(research, new JsonSerializerOptions { WriteIndented = true }); Console.WriteLine("Research Paper Extraction:"); Console.WriteLine(formatted); File.WriteAllText("research-extraction.json", formatted); // Display key findings with confidence levels Console.WriteLine("\n=== Key Findings ==="); foreach (var finding in research.GetProperty("keyFindings").EnumerateArray()) { string confidence = finding.GetProperty("confidence").GetString() ?? "Unknown"; Console.WriteLine($"[{confidence}] {finding.GetProperty("finding")}"); } } catch (JsonException) { Console.WriteLine("Unable to parse research extraction"); File.WriteAllText("research-raw.txt", extractionResult); } Imports IronPdf Imports IronPdf.AI Imports Microsoft.SemanticKernel Imports Microsoft.SemanticKernel.Memory Imports Microsoft.SemanticKernel.Connectors.OpenAI Imports System.Text.Json ' Azure OpenAI configuration Dim azureEndpoint As String = "https://your-resource.openai.azure.com/" Dim apiKey As String = "your-azure-api-key" Dim chatDeployment As String = "gpt-4o" Dim embeddingDeployment As String = "text-embedding-ada-002" ' Initialize Semantic Kernel Dim kernel = Kernel.CreateBuilder() _ .AddAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) _ .AddAzureOpenAIChatCompletion(chatDeployment, azureEndpoint, apiKey) _ .Build() Dim memory = New MemoryBuilder() _ .WithMemoryStore(New VolatileMemoryStore()) _ .WithAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) _ .Build() IronDocumentAI.Initialize(kernel, memory) Dim pdf = PdfDocument.FromFile("research-paper.pdf") ' Define JSON schema for research paper extraction Dim researchQuery As String = "Extract structured information from this research paper. 
Return JSON: { ""title"": ""string"", ""authors"": [""string""], ""institution"": ""string"", ""publicationDate"": ""string"", ""abstract"": ""string"", ""researchQuestion"": ""string"", ""methodology"": { ""type"": ""Quantitative|Qualitative|Mixed Methods"", ""approach"": ""string"", ""sampleSize"": ""string"", ""dataCollection"": ""string"" }, ""keyFindings"": [ { ""finding"": ""string"", ""significance"": ""string"", ""confidence"": ""High|Medium|Low"" } ], ""limitations"": [""string""], ""futureWork"": [""string""], ""keywords"": [""string""] } Focus on extracting verifiable claims and noting uncertainty. Return ONLY valid JSON." Dim extractionResult As String = Await pdf.Query(researchQuery) Try Dim research = JsonSerializer.Deserialize(Of JsonElement)(extractionResult) Dim formatted As String = JsonSerializer.Serialize(research, New JsonSerializerOptions With {.WriteIndented = True}) Console.WriteLine("Research Paper Extraction:") Console.WriteLine(formatted) File.WriteAllText("research-extraction.json", formatted) ' Display key findings with confidence levels Console.WriteLine(vbCrLf & "=== Key Findings ===") For Each finding In research.GetProperty("keyFindings").EnumerateArray() Dim confidence As String = finding.GetProperty("confidence").GetString() OrElse "Unknown" Console.WriteLine($"[{confidence}] {finding.GetProperty("finding")}") Next Catch ex As JsonException Console.WriteLine("Unable to parse research extraction") File.WriteAllText("research-raw.txt", extractionResult) End Try $vbLabelText $csharpLabel 自訂提示將人工智慧驅動的提取功能從通用工具轉變為根據您的特定需求量身定制的專業解決方案。 透過文件進行問答 建構PDF問答系統 問答系統使用戶能夠以對話的方式與 PDF 文件進行交互,用自然語言提出問題並獲得準確的、上下文相關的答案。 基本模式包括從 PDF 中提取文本,將其與用戶的問題結合成提示,並向 AI 請求答案。 輸入 程式碼呼叫pdf.Memorize()對文件進行索引以進行語義搜索,然後使用pdf.Query()進入互動式循環來回答使用者問題。 :path=/static-assets/pdf/content-code-examples/tutorials/ai-powered-pdf-processing-csharp/pdf-question-answering.cs // Interactive Q&A system for querying PDF documents using IronPdf; using IronPdf.AI; using 
Microsoft.SemanticKernel; using Microsoft.SemanticKernel.Memory; using Microsoft.SemanticKernel.Connectors.OpenAI; // Azure OpenAI configuration string azureEndpoint = "https://your-resource.openai.azure.com/"; string apiKey = "your-azure-api-key"; string chatDeployment = "gpt-4o"; string embeddingDeployment = "text-embedding-ada-002"; // Initialize Semantic Kernel var kernel = Kernel.CreateBuilder() .AddAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) .AddAzureOpenAIChatCompletion(chatDeployment, azureEndpoint, apiKey) .Build(); var memory = new MemoryBuilder() .WithMemoryStore(new VolatileMemoryStore()) .WithAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) .Build(); IronDocumentAI.Initialize(kernel, memory); var pdf = PdfDocument.FromFile("sample-legal-document.pdf"); // Memorize document to enable persistent querying await pdf.Memorize(); Console.WriteLine("PDF Q&A System - Type 'exit' to quit\n"); Console.WriteLine($"Document loaded and memorized: {pdf.PageCount} pages\n"); // Interactive Q&A loop while (true) { Console.Write("Your question: "); string? 
question = Console.ReadLine(); if (string.IsNullOrWhiteSpace(question) || question.ToLower() == "exit") break; string answer = await pdf.Query(question); Console.WriteLine($"\nAnswer: {answer}\n"); Console.WriteLine(new string('-', 50) + "\n"); } Console.WriteLine("Q&A session ended."); Imports IronPdf Imports IronPdf.AI Imports Microsoft.SemanticKernel Imports Microsoft.SemanticKernel.Memory Imports Microsoft.SemanticKernel.Connectors.OpenAI ' Azure OpenAI configuration Dim azureEndpoint As String = "https://your-resource.openai.azure.com/" Dim apiKey As String = "your-azure-api-key" Dim chatDeployment As String = "gpt-4o" Dim embeddingDeployment As String = "text-embedding-ada-002" ' Initialize Semantic Kernel Dim kernel = Kernel.CreateBuilder() _ .AddAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) _ .AddAzureOpenAIChatCompletion(chatDeployment, azureEndpoint, apiKey) _ .Build() Dim memory = New MemoryBuilder() _ .WithMemoryStore(New VolatileMemoryStore()) _ .WithAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) _ .Build() IronDocumentAI.Initialize(kernel, memory) Dim pdf = PdfDocument.FromFile("sample-legal-document.pdf") ' Memorize document to enable persistent querying Await pdf.Memorize() Console.WriteLine("PDF Q&A System - Type 'exit' to quit" & vbCrLf) Console.WriteLine($"Document loaded and memorized: {pdf.PageCount} pages" & vbCrLf) ' Interactive Q&A loop While True Console.Write("Your question: ") Dim question As String = Console.ReadLine() If String.IsNullOrWhiteSpace(question) OrElse question.ToLower() = "exit" Then Exit While End If Dim answer As String = Await pdf.Query(question) Console.WriteLine($"{vbCrLf}Answer: {answer}{vbCrLf}") Console.WriteLine(New String("-"c, 50) & vbCrLf) End While Console.WriteLine("Q&A session ended.") $vbLabelText $csharpLabel 控制台輸出 ! 
C# 中的 PDF 問答系統控制台輸出 2026 年實現有效問答的關鍵在於限制人工智慧只能根據文件內容進行回答。 GPT-5 的"安全完成"訓練方法和 Claude Sonnet 4.5 的改進對齊大大降低了幻覺發生率。 將長文檔分塊以適應上下文視窗 大多數現實世界的文檔都超出了人工智慧的上下文視窗。 有效的分塊策略對於處理這些文件至關重要。 分塊是指將文件分割成足夠小的片段,使其能夠適應上下文窗口,同時保持語義連貫性。 此程式碼遍歷pdf.Pages ,建立DocumentChunk對象,並配置maxChunkTokens和overlapTokens以實現上下文連續性。 :path=/static-assets/pdf/content-code-examples/tutorials/ai-powered-pdf-processing-csharp/semantic-document-chunking.cs // Split long documents into overlapping chunks for RAG systems using IronPdf; var pdf = PdfDocument.FromFile("long-document.pdf"); // Chunking configuration int maxChunkTokens = 4000; // Leave room for prompts and responses int overlapTokens = 200; // Overlap for context continuity int approxCharsPerToken = 4; // Rough estimate for tokenization int maxChunkChars = maxChunkTokens * approxCharsPerToken; int overlapChars = overlapTokens * approxCharsPerToken; var chunks = new List<DocumentChunk>(); var currentChunk = new System.Text.StringBuilder(); int chunkStartPage = 1; int currentPage = 1; for (int i = 0; i < pdf.PageCount; i++) { string pageText = pdf.Pages[i].Text; currentPage = i + 1; if (currentChunk.Length + pageText.Length > maxChunkChars && currentChunk.Length > 0) { chunks.Add(new DocumentChunk { Text = currentChunk.ToString(), StartPage = chunkStartPage, EndPage = currentPage - 1, ChunkIndex = chunks.Count }); // Create overlap with previous chunk for continuity string overlap = currentChunk.Length > overlapChars ? 
currentChunk.ToString().Substring(currentChunk.Length - overlapChars) : currentChunk.ToString(); currentChunk.Clear(); currentChunk.Append(overlap); chunkStartPage = currentPage - 1; } currentChunk.AppendLine($"\n--- Page {currentPage} ---\n"); currentChunk.Append(pageText); } if (currentChunk.Length > 0) { chunks.Add(new DocumentChunk { Text = currentChunk.ToString(), StartPage = chunkStartPage, EndPage = currentPage, ChunkIndex = chunks.Count }); } Console.WriteLine($"Document chunked into {chunks.Count} segments"); foreach (var chunk in chunks) { Console.WriteLine($" Chunk {chunk.ChunkIndex + 1}: Pages {chunk.StartPage}-{chunk.EndPage} ({chunk.Text.Length} chars)"); } // Save chunk metadata for RAG indexing File.WriteAllText("chunks-metadata.json", System.Text.Json.JsonSerializer.Serialize( chunks.Select(c => new { c.ChunkIndex, c.StartPage, c.EndPage, Length = c.Text.Length }), new System.Text.Json.JsonSerializerOptions { WriteIndented = true } )); public class DocumentChunk { public string Text { get; set; } = ""; public int StartPage { get; set; } public int EndPage { get; set; } public int ChunkIndex { get; set; } } Imports IronPdf Imports System.Text Imports System.Text.Json Imports System.IO ' Split long documents into overlapping chunks for RAG systems Dim pdf = PdfDocument.FromFile("long-document.pdf") ' Chunking configuration Dim maxChunkTokens As Integer = 4000 ' Leave room for prompts and responses Dim overlapTokens As Integer = 200 ' Overlap for context continuity Dim approxCharsPerToken As Integer = 4 ' Rough estimate for tokenization Dim maxChunkChars As Integer = maxChunkTokens * approxCharsPerToken Dim overlapChars As Integer = overlapTokens * approxCharsPerToken Dim chunks As New List(Of DocumentChunk)() Dim currentChunk As New StringBuilder() Dim chunkStartPage As Integer = 1 Dim currentPage As Integer = 1 For i As Integer = 0 To pdf.PageCount - 1 Dim pageText As String = pdf.Pages(i).Text currentPage = i + 1 If currentChunk.Length + 
pageText.Length > maxChunkChars AndAlso currentChunk.Length > 0 Then chunks.Add(New DocumentChunk With { .Text = currentChunk.ToString(), .StartPage = chunkStartPage, .EndPage = currentPage - 1, .ChunkIndex = chunks.Count }) ' Create overlap with previous chunk for continuity Dim overlap As String = If(currentChunk.Length > overlapChars, currentChunk.ToString().Substring(currentChunk.Length - overlapChars), currentChunk.ToString()) currentChunk.Clear() currentChunk.Append(overlap) chunkStartPage = currentPage - 1 End If currentChunk.AppendLine(vbCrLf & "--- Page " & currentPage & " ---" & vbCrLf) currentChunk.Append(pageText) Next If currentChunk.Length > 0 Then chunks.Add(New DocumentChunk With { .Text = currentChunk.ToString(), .StartPage = chunkStartPage, .EndPage = currentPage, .ChunkIndex = chunks.Count }) End If Console.WriteLine($"Document chunked into {chunks.Count} segments") For Each chunk In chunks Console.WriteLine($" Chunk {chunk.ChunkIndex + 1}: Pages {chunk.StartPage}-{chunk.EndPage} ({chunk.Text.Length} chars)") Next ' Save chunk metadata for RAG indexing File.WriteAllText("chunks-metadata.json", JsonSerializer.Serialize( chunks.Select(Function(c) New With {Key .ChunkIndex = c.ChunkIndex, Key .StartPage = c.StartPage, Key .EndPage = c.EndPage, Key .Length = c.Text.Length}), New JsonSerializerOptions With {.WriteIndented = True} )) Public Class DocumentChunk Public Property Text As String = "" Public Property StartPage As Integer Public Property EndPage As Integer Public Property ChunkIndex As Integer End Class $vbLabelText $csharpLabel PDF文件中固定分塊與語意分塊的比較 重疊的資料塊提供了跨越邊界的連續性,確保即使相關資訊跨越資料塊邊界,人工智慧也能獲得足夠的上下文。 RAG(檢索增強生成)模式 檢索增強生成代表了 2026 年人工智慧驅動的文檔分析的一種強大模式。 RAG 系統不是將整個文件輸入人工智慧,而是先檢索與給定查詢相關的部分,然後將這些部分用作生成答案的上下文。 RAG 工作流程分為三個主要階段:文件準備(分割和建立嵌入)、檢索(搜尋相關區塊)和產生(使用檢索到的區塊作為 AI 回應的上下文)。 程式碼透過對每個 PDF 呼叫pdf.Memorize()來建立多個 PDF 的索引,然後使用pdf.Query()從組合文件記憶體中檢索答案。 
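The retrieval stage can be illustrated with a minimal, self-contained sketch: score each chunk's embedding against the query embedding by cosine similarity and keep the best match. This is independent of IronPDF's built-in memory; the chunk texts and embedding values below are hypothetical stand-ins for vectors a real embedding model (such as text-embedding-ada-002) would return.

```csharp
using System;
using System.Linq;

// Hypothetical chunks with toy 3-dimensional embeddings; real embeddings
// have hundreds or thousands of dimensions.
var chunks = new (string Text, double[] Embedding)[]
{
    ("Termination requires 30 days written notice.", new[] { 0.9, 0.1, 0.0 }),
    ("The annual fee is payable in advance.",        new[] { 0.1, 0.9, 0.2 }),
};

// Hypothetical embedding of the query "How can the contract be terminated?"
double[] queryEmbedding = { 0.85, 0.15, 0.05 };

// Rank chunks by cosine similarity to the query and keep the best one.
var best = chunks
    .OrderByDescending(c => CosineSimilarity(c.Embedding, queryEmbedding))
    .First();
Console.WriteLine($"Best chunk: {best.Text}");

// Cosine similarity: dot product divided by the product of magnitudes.
static double CosineSimilarity(double[] a, double[] b)
{
    double dot = a.Zip(b, (x, y) => x * y).Sum();
    double magA = Math.Sqrt(a.Sum(x => x * x));
    double magB = Math.Sqrt(b.Sum(x => x * x));
    return dot / (magA * magB);
}
```

In a full RAG pipeline the top few chunks (not just one) are concatenated into the prompt, which is what IronPDF's Memorize()/Query() pair manages for you.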
:path=/static-assets/pdf/content-code-examples/tutorials/ai-powered-pdf-processing-csharp/rag-system-implementation.cs // Retrieval-Augmented Generation (RAG) system for querying across multiple indexed documents using IronPdf; using IronPdf.AI; using Microsoft.SemanticKernel; using Microsoft.SemanticKernel.Memory; using Microsoft.SemanticKernel.Connectors.OpenAI; // Azure OpenAI configuration string azureEndpoint = "https://your-resource.openai.azure.com/"; string apiKey = "your-azure-api-key"; string chatDeployment = "gpt-4o"; string embeddingDeployment = "text-embedding-ada-002"; // Initialize Semantic Kernel var kernel = Kernel.CreateBuilder() .AddAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) .AddAzureOpenAIChatCompletion(chatDeployment, azureEndpoint, apiKey) .Build(); var memory = new MemoryBuilder() .WithMemoryStore(new VolatileMemoryStore()) .WithAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) .Build(); IronDocumentAI.Initialize(kernel, memory); // Index all documents in folder string[] documentPaths = Directory.GetFiles("documents/", "*.pdf"); Console.WriteLine($"Indexing {documentPaths.Length} documents...\n"); // Memorize each document (creates embeddings for retrieval) foreach (string path in documentPaths) { var pdf = PdfDocument.FromFile(path); await pdf.Memorize(); Console.WriteLine($"Indexed: {Path.GetFileName(path)} ({pdf.PageCount} pages)"); } Console.WriteLine("\n=== RAG System Ready ===\n"); // Query across all indexed documents string query = "What are the key compliance requirements for data retention?"; Console.WriteLine($"Query: {query}\n"); var searchPdf = PdfDocument.FromFile(documentPaths[0]); string answer = await searchPdf.Query(query); Console.WriteLine($"Answer: {answer}"); // Interactive query loop Console.WriteLine("\n--- Enter questions (type 'exit' to quit) ---\n"); while (true) { Console.Write("Question: "); string? 
userQuery = Console.ReadLine(); if (string.IsNullOrWhiteSpace(userQuery) || userQuery.ToLower() == "exit") break; string response = await searchPdf.Query(userQuery); Console.WriteLine($"\nAnswer: {response}\n"); } Imports IronPdf Imports IronPdf.AI Imports Microsoft.SemanticKernel Imports Microsoft.SemanticKernel.Memory Imports Microsoft.SemanticKernel.Connectors.OpenAI Imports System.IO ' Azure OpenAI configuration Dim azureEndpoint As String = "https://your-resource.openai.azure.com/" Dim apiKey As String = "your-azure-api-key" Dim chatDeployment As String = "gpt-4o" Dim embeddingDeployment As String = "text-embedding-ada-002" ' Initialize Semantic Kernel Dim kernel = Kernel.CreateBuilder() _ .AddAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) _ .AddAzureOpenAIChatCompletion(chatDeployment, azureEndpoint, apiKey) _ .Build() Dim memory = New MemoryBuilder() _ .WithMemoryStore(New VolatileMemoryStore()) _ .WithAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) _ .Build() IronDocumentAI.Initialize(kernel, memory) ' Index all documents in folder Dim documentPaths As String() = Directory.GetFiles("documents/", "*.pdf") Console.WriteLine($"Indexing {documentPaths.Length} documents..." & vbCrLf) ' Memorize each document (creates embeddings for retrieval) For Each path As String In documentPaths Dim pdf = PdfDocument.FromFile(path) Await pdf.Memorize() Console.WriteLine($"Indexed: {Path.GetFileName(path)} ({pdf.PageCount} pages)") Next Console.WriteLine(vbCrLf & "=== RAG System Ready ===" & vbCrLf) ' Query across all indexed documents Dim query As String = "What are the key compliance requirements for data retention?" 
Console.WriteLine($"Query: {query}" & vbCrLf) Dim searchPdf = PdfDocument.FromFile(documentPaths(0)) Dim answer As String = Await searchPdf.Query(query) Console.WriteLine($"Answer: {answer}") ' Interactive query loop Console.WriteLine(vbCrLf & "--- Enter questions (type 'exit' to quit) ---" & vbCrLf) While True Console.Write("Question: ") Dim userQuery As String = Console.ReadLine() If String.IsNullOrWhiteSpace(userQuery) OrElse userQuery.ToLower() = "exit" Then Exit While End If Dim response As String = Await searchPdf.Query(userQuery) Console.WriteLine(vbCrLf & $"Answer: {response}" & vbCrLf) End While $vbLabelText $csharpLabel RAG 系統擅長處理大型文件集合-法律案件資料庫、技術文件庫、研究檔案。 透過僅檢索相關部分,它們在保持回應品質的同時,還能擴展到幾乎無限大的文件大小。 引用PDF頁面中的來源 對於專業應用而言,人工智慧的答案必須是可驗證的。 引用方法涉及在分塊和檢索過程中維護有關分塊來源的元資料。 每個資料塊不僅儲存文字內容,還儲存其來源頁碼、章節標題以及在文件中的位置。 輸入 程式碼使用pdf.Query()和引用說明,然後呼叫ExtractCitedPages()和正規表示式來解析頁面引用,並使用pdf.Pages[pageNum - 1].Text來驗證來源。 :path=/static-assets/pdf/content-code-examples/tutorials/ai-powered-pdf-processing-csharp/answer-with-citations.cs // Answer questions with page citations and source verification using IronPdf; using IronPdf.AI; using Microsoft.SemanticKernel; using Microsoft.SemanticKernel.Memory; using Microsoft.SemanticKernel.Connectors.OpenAI; using System.Text.RegularExpressions; // Azure OpenAI configuration string azureEndpoint = "https://your-resource.openai.azure.com/"; string apiKey = "your-azure-api-key"; string chatDeployment = "gpt-4o"; string embeddingDeployment = "text-embedding-ada-002"; // Initialize Semantic Kernel var kernel = Kernel.CreateBuilder() .AddAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) .AddAzureOpenAIChatCompletion(chatDeployment, azureEndpoint, apiKey) .Build(); var memory = new MemoryBuilder() .WithMemoryStore(new VolatileMemoryStore()) .WithAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) .Build(); IronDocumentAI.Initialize(kernel, memory); var pdf = 
PdfDocument.FromFile("sample-legal-document.pdf"); await pdf.Memorize(); string question = "What are the termination conditions in this agreement?"; // Request citations in query string citationQuery = $@"{question} IMPORTANT: Include specific page citations in your answer using the format (Page X) or (Pages X-Y). Only cite information that appears in the document."; string answerWithCitations = await pdf.Query(citationQuery); Console.WriteLine("Question: " + question); Console.WriteLine("\nAnswer with Citations:"); Console.WriteLine(answerWithCitations); // Extract cited page numbers using regex var citedPages = ExtractCitedPages(answerWithCitations); Console.WriteLine($"\nCited pages: {string.Join(", ", citedPages)}"); // Verify citations with page excerpts Console.WriteLine("\n=== Source Verification ==="); foreach (int pageNum in citedPages.Take(3)) { if (pageNum <= pdf.PageCount && pageNum > 0) { string pageText = pdf.Pages[pageNum - 1].Text; string excerpt = pageText.Length > 200 ? pageText.Substring(0, 200) + "..." 
: pageText; Console.WriteLine($"\nPage {pageNum} excerpt:\n{excerpt}"); } } // Extract page numbers from citation format (Page X) or (Pages X-Y) List<int> ExtractCitedPages(string text) { var pages = new HashSet<int>(); var matches = Regex.Matches(text, @"\(Pages?\s*(\d+)(?:\s*-\s*(\d+))?\)", RegexOptions.IgnoreCase); foreach (Match match in matches) { int startPage = int.Parse(match.Groups[1].Value); pages.Add(startPage); if (match.Groups[2].Success) { int endPage = int.Parse(match.Groups[2].Value); for (int p = startPage; p <= endPage; p++) pages.Add(p); } } return pages.OrderBy(p => p).ToList(); } Imports IronPdf Imports IronPdf.AI Imports Microsoft.SemanticKernel Imports Microsoft.SemanticKernel.Memory Imports Microsoft.SemanticKernel.Connectors.OpenAI Imports System.Text.RegularExpressions ' Azure OpenAI configuration Dim azureEndpoint As String = "https://your-resource.openai.azure.com/" Dim apiKey As String = "your-azure-api-key" Dim chatDeployment As String = "gpt-4o" Dim embeddingDeployment As String = "text-embedding-ada-002" ' Initialize Semantic Kernel Dim kernel = Kernel.CreateBuilder() _ .AddAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) _ .AddAzureOpenAIChatCompletion(chatDeployment, azureEndpoint, apiKey) _ .Build() Dim memory = New MemoryBuilder() _ .WithMemoryStore(New VolatileMemoryStore()) _ .WithAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) _ .Build() IronDocumentAI.Initialize(kernel, memory) Dim pdf = PdfDocument.FromFile("sample-legal-document.pdf") Await pdf.Memorize() Dim question As String = "What are the termination conditions in this agreement?" ' Request citations in query Dim citationQuery As String = $"{question} IMPORTANT: Include specific page citations in your answer using the format (Page X) or (Pages X-Y). Only cite information that appears in the document." 
Dim answerWithCitations As String = Await pdf.Query(citationQuery) Console.WriteLine("Question: " & question) Console.WriteLine(vbCrLf & "Answer with Citations:") Console.WriteLine(answerWithCitations) ' Extract cited page numbers using regex Dim citedPages = ExtractCitedPages(answerWithCitations) Console.WriteLine(vbCrLf & "Cited pages: " & String.Join(", ", citedPages)) ' Verify citations with page excerpts Console.WriteLine(vbCrLf & "=== Source Verification ===") For Each pageNum As Integer In citedPages.Take(3) If pageNum <= pdf.PageCount AndAlso pageNum > 0 Then Dim pageText As String = pdf.Pages(pageNum - 1).Text Dim excerpt As String = If(pageText.Length > 200, pageText.Substring(0, 200) & "...", pageText) Console.WriteLine(vbCrLf & "Page " & pageNum & " excerpt:" & vbCrLf & excerpt) End If Next ' Extract page numbers from citation format (Page X) or (Pages X-Y) Function ExtractCitedPages(text As String) As List(Of Integer) Dim pages = New HashSet(Of Integer)() Dim matches = Regex.Matches(text, "\((Pages?)\s*(\d+)(?:\s*-\s*(\d+))?\)", RegexOptions.IgnoreCase) For Each match As Match In matches Dim startPage As Integer = Integer.Parse(match.Groups(2).Value) pages.Add(startPage) If match.Groups(3).Success Then Dim endPage As Integer = Integer.Parse(match.Groups(3).Value) For p As Integer = startPage To endPage pages.Add(p) Next End If Next Return pages.OrderBy(Function(p) p).ToList() End Function $vbLabelText $csharpLabel 控制台輸出 控制台輸出顯示 AI 回答及其 PDF 頁面引用。 引用可以將人工智慧產生的答案從不透明的輸出轉化為透明、可驗證的資訊。 使用者可以查看原始資料來驗證答案,並增強對人工智慧輔助分析的信心。 批量人工智慧處理 大規模處理文件庫 企業文件處理通常涉及成千上萬甚至數百萬個PDF文件。 可擴展批量處理的基礎是並行化。 IronPDF 是線程安全的,允許並發處理 PDF 文件而不會相互幹擾。 此程式碼使用可配置maxConcurrency的SemaphoreSlim並行處理 PDF,對每個 PDF 呼叫pdf.Summarize() ,同時追蹤ConcurrentBag中的結果。 :path=/static-assets/pdf/content-code-examples/tutorials/ai-powered-pdf-processing-csharp/batch-document-processing.cs // Process multiple documents in parallel with rate limiting using IronPdf; using IronPdf.AI; using Microsoft.SemanticKernel; using 
Microsoft.SemanticKernel.Memory; using Microsoft.SemanticKernel.Connectors.OpenAI; using System.Collections.Concurrent; using System.Text; // Azure OpenAI configuration string azureEndpoint = "https://your-resource.openai.azure.com/"; string apiKey = "your-azure-api-key"; string chatDeployment = "gpt-4o"; string embeddingDeployment = "text-embedding-ada-002"; // Initialize Semantic Kernel var kernel = Kernel.CreateBuilder() .AddAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) .AddAzureOpenAIChatCompletion(chatDeployment, azureEndpoint, apiKey) .Build(); var memory = new MemoryBuilder() .WithMemoryStore(new VolatileMemoryStore()) .WithAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) .Build(); IronDocumentAI.Initialize(kernel, memory); // Configure parallel processing with rate limiting int maxConcurrency = 3; string inputFolder = "documents/"; string outputFolder = "summaries/"; Directory.CreateDirectory(outputFolder); string[] pdfFiles = Directory.GetFiles(inputFolder, "*.pdf"); Console.WriteLine($"Processing {pdfFiles.Length} documents...\n"); var results = new ConcurrentBag<ProcessingResult>(); var semaphore = new SemaphoreSlim(maxConcurrency); var tasks = pdfFiles.Select(async filePath => { await semaphore.WaitAsync(); var result = new ProcessingResult { FilePath = filePath }; try { var stopwatch = System.Diagnostics.Stopwatch.StartNew(); var pdf = PdfDocument.FromFile(filePath); string summary = await pdf.Summarize(); string outputPath = Path.Combine(outputFolder, Path.GetFileNameWithoutExtension(filePath) + "-summary.txt"); await File.WriteAllTextAsync(outputPath, summary); stopwatch.Stop(); result.Success = true; result.ProcessingTime = stopwatch.Elapsed; result.OutputPath = outputPath; Console.WriteLine($"[OK] {Path.GetFileName(filePath)} ({stopwatch.ElapsedMilliseconds}ms)"); } catch (Exception ex) { result.Success = false; result.ErrorMessage = ex.Message; Console.WriteLine($"[ERROR] 
{Path.GetFileName(filePath)}: {ex.Message}"); } finally { semaphore.Release(); results.Add(result); } }).ToArray(); await Task.WhenAll(tasks); // Generate processing report var successful = results.Where(r => r.Success).ToList(); var failed = results.Where(r => !r.Success).ToList(); var report = new StringBuilder(); report.AppendLine("=== Batch Processing Report ==="); report.AppendLine($"Successful: {successful.Count}"); report.AppendLine($"Failed: {failed.Count}"); if (successful.Any()) { var avgTime = TimeSpan.FromMilliseconds(successful.Average(r => r.ProcessingTime.TotalMilliseconds)); report.AppendLine($"Average processing time: {avgTime.TotalSeconds:F1}s"); } if (failed.Any()) { report.AppendLine("\nFailed documents:"); foreach (var fail in failed) report.AppendLine($" - {Path.GetFileName(fail.FilePath)}: {fail.ErrorMessage}"); } string reportText = report.ToString(); Console.WriteLine($"\n{reportText}"); File.WriteAllText(Path.Combine(outputFolder, "processing-report.txt"), reportText); class ProcessingResult { public string FilePath { get; set; } = ""; public bool Success { get; set; } public TimeSpan ProcessingTime { get; set; } public string OutputPath { get; set; } = ""; public string ErrorMessage { get; set; } = ""; } Imports IronPdf Imports IronPdf.AI Imports Microsoft.SemanticKernel Imports Microsoft.SemanticKernel.Memory Imports Microsoft.SemanticKernel.Connectors.OpenAI Imports System.Collections.Concurrent Imports System.Text ' Azure OpenAI configuration Dim azureEndpoint As String = "https://your-resource.openai.azure.com/" Dim apiKey As String = "your-azure-api-key" Dim chatDeployment As String = "gpt-4o" Dim embeddingDeployment As String = "text-embedding-ada-002" ' Initialize Semantic Kernel Dim kernel = Kernel.CreateBuilder() _ .AddAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) _ .AddAzureOpenAIChatCompletion(chatDeployment, azureEndpoint, apiKey) _ .Build() Dim memory = New MemoryBuilder() _ 
.WithMemoryStore(New VolatileMemoryStore()) _ .WithAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) _ .Build() IronDocumentAI.Initialize(kernel, memory) ' Configure parallel processing with rate limiting Dim maxConcurrency As Integer = 3 Dim inputFolder As String = "documents/" Dim outputFolder As String = "summaries/" Directory.CreateDirectory(outputFolder) Dim pdfFiles As String() = Directory.GetFiles(inputFolder, "*.pdf") Console.WriteLine($"Processing {pdfFiles.Length} documents..." & vbCrLf) Dim results = New ConcurrentBag(Of ProcessingResult)() Dim semaphore = New SemaphoreSlim(maxConcurrency) Dim tasks = pdfFiles.Select(Async Function(filePath) Await semaphore.WaitAsync() Dim result = New ProcessingResult With {.FilePath = filePath} Try Dim stopwatch = System.Diagnostics.Stopwatch.StartNew() Dim pdf = PdfDocument.FromFile(filePath) Dim summary As String = Await pdf.Summarize() Dim outputPath = Path.Combine(outputFolder, Path.GetFileNameWithoutExtension(filePath) & "-summary.txt") Await File.WriteAllTextAsync(outputPath, summary) stopwatch.Stop() result.Success = True result.ProcessingTime = stopwatch.Elapsed result.OutputPath = outputPath Console.WriteLine($"[OK] {Path.GetFileName(filePath)} ({stopwatch.ElapsedMilliseconds}ms)") Catch ex As Exception result.Success = False result.ErrorMessage = ex.Message Console.WriteLine($"[ERROR] {Path.GetFileName(filePath)}: {ex.Message}") Finally semaphore.Release() results.Add(result) End Try End Function).ToArray() Await Task.WhenAll(tasks) ' Generate processing report Dim successful = results.Where(Function(r) r.Success).ToList() Dim failed = results.Where(Function(r) Not r.Success).ToList() Dim report = New StringBuilder() report.AppendLine("=== Batch Processing Report ===") report.AppendLine($"Successful: {successful.Count}") report.AppendLine($"Failed: {failed.Count}") If successful.Any() Then Dim avgTime = TimeSpan.FromMilliseconds(successful.Average(Function(r) 
r.ProcessingTime.TotalMilliseconds)) report.AppendLine($"Average processing time: {avgTime.TotalSeconds:F1}s") End If If failed.Any() Then report.AppendLine(vbCrLf & "Failed documents:") For Each fail In failed report.AppendLine($" - {Path.GetFileName(fail.FilePath)}: {fail.ErrorMessage}") Next End If Dim reportText As String = report.ToString() Console.WriteLine(vbCrLf & reportText) File.WriteAllText(Path.Combine(outputFolder, "processing-report.txt"), reportText) Class ProcessingResult Public Property FilePath As String = "" Public Property Success As Boolean Public Property ProcessingTime As TimeSpan Public Property OutputPath As String = "" Public Property ErrorMessage As String = "" End Class

Robust error handling is essential at scale. Production systems add retry logic with exponential backoff, separate error logging for failed documents, and resumable processing.

Cost Management and Token Usage

AI APIs typically bill per token. In 2026, GPT-5 is priced at $1.25 per million input tokens and $10 per million output tokens, while Claude Sonnet 4.5 costs $3 per million input tokens and $15 per million output tokens. The primary cost-optimization strategy is to minimize unnecessary token usage.

OpenAI's Batch API offers a 50% discount on token costs in exchange for a longer turnaround time (up to 24 hours). For overnight processing or periodic analysis, batching yields substantial savings. The code extracts text with pdf.ExtractAllText(), builds JSONL batch requests, uploads them to the OpenAI files endpoint via HttpClient, and submits them to the Batch API.

:path=/static-assets/pdf/content-code-examples/tutorials/ai-powered-pdf-processing-csharp/batch-api-processing.cs
// Use OpenAI Batch API for 50% cost savings on large-scale document processing using IronPdf; using System.Text.Json; using System.Net.Http.Headers; string openAiApiKey = "your-openai-api-key"; string inputFolder = "documents/"; // Prepare batch requests in JSONL format var batchRequests = new List<string>(); string[] pdfFiles = Directory.GetFiles(inputFolder, "*.pdf"); Console.WriteLine($"Preparing batch for {pdfFiles.Length} documents...\n"); foreach (string filePath in pdfFiles) { var pdf = PdfDocument.FromFile(filePath); string pdfText = pdf.ExtractAllText(); // Truncate to stay within batch API limits if (pdfText.Length > 100000) pdfText = pdfText.Substring(0, 100000) + "\n[Truncated...]"; var request = new { custom_id =
Path.GetFileNameWithoutExtension(filePath), method = "POST", url = "/v1/chat/completions", body = new { model = "gpt-4o", messages = new[] { new { role = "system", content = "Summarize the following document concisely." }, new { role = "user", content = pdfText } }, max_tokens = 1000 } }; batchRequests.Add(JsonSerializer.Serialize(request)); } // Create JSONL file string batchFilePath = "batch-requests.jsonl"; File.WriteAllLines(batchFilePath, batchRequests); Console.WriteLine($"Created batch file with {batchRequests.Count} requests"); // Upload file to OpenAI using var httpClient = new HttpClient(); httpClient.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", openAiApiKey); using var fileContent = new MultipartFormDataContent(); fileContent.Add(new ByteArrayContent(File.ReadAllBytes(batchFilePath)), "file", "batch-requests.jsonl"); fileContent.Add(new StringContent("batch"), "purpose"); var uploadResponse = await httpClient.PostAsync("https://api.openai.com/v1/files", fileContent); var uploadResult = JsonSerializer.Deserialize<JsonElement>(await uploadResponse.Content.ReadAsStringAsync()); string fileId = uploadResult.GetProperty("id").GetString()!; Console.WriteLine($"Uploaded file: {fileId}"); // Create batch job (24-hour completion window for 50% discount) var batchJobRequest = new { input_file_id = fileId, endpoint = "/v1/chat/completions", completion_window = "24h" }; var batchResponse = await httpClient.PostAsync( "https://api.openai.com/v1/batches", new StringContent(JsonSerializer.Serialize(batchJobRequest), System.Text.Encoding.UTF8, "application/json") ); var batchResult = JsonSerializer.Deserialize<JsonElement>(await batchResponse.Content.ReadAsStringAsync()); string batchId = batchResult.GetProperty("id").GetString()!; Console.WriteLine($"\nBatch job created: {batchId}"); Console.WriteLine("Job will complete within 24 hours"); Console.WriteLine($"Check status: GET https://api.openai.com/v1/batches/{batchId}"); 
File.WriteAllText("batch-job-id.txt", batchId); Console.WriteLine("\nBatch ID saved to batch-job-id.txt"); Imports IronPdf Imports System.Text.Json Imports System.Net.Http.Headers Module Program Sub Main() Dim openAiApiKey As String = "your-openai-api-key" Dim inputFolder As String = "documents/" ' Prepare batch requests in JSONL format Dim batchRequests As New List(Of String)() Dim pdfFiles As String() = Directory.GetFiles(inputFolder, "*.pdf") Console.WriteLine($"Preparing batch for {pdfFiles.Length} documents..." & vbCrLf) For Each filePath As String In pdfFiles Dim pdf = PdfDocument.FromFile(filePath) Dim pdfText As String = pdf.ExtractAllText() ' Truncate to stay within batch API limits If pdfText.Length > 100000 Then pdfText = pdfText.Substring(0, 100000) & vbCrLf & "[Truncated...]" End If Dim request = New With { .custom_id = Path.GetFileNameWithoutExtension(filePath), .method = "POST", .url = "/v1/chat/completions", .body = New With { .model = "gpt-4o", .messages = New Object() { New With {.role = "system", .content = "Summarize the following document concisely."}, New With {.role = "user", .content = pdfText} }, .max_tokens = 1000 } } batchRequests.Add(JsonSerializer.Serialize(request)) Next ' Create JSONL file Dim batchFilePath As String = "batch-requests.jsonl" File.WriteAllLines(batchFilePath, batchRequests) Console.WriteLine($"Created batch file with {batchRequests.Count} requests") ' Upload file to OpenAI Using httpClient As New HttpClient() httpClient.DefaultRequestHeaders.Authorization = New AuthenticationHeaderValue("Bearer", openAiApiKey) Using fileContent As New MultipartFormDataContent() fileContent.Add(New ByteArrayContent(File.ReadAllBytes(batchFilePath)), "file", "batch-requests.jsonl") fileContent.Add(New StringContent("batch"), "purpose") Dim uploadResponse = Await httpClient.PostAsync("https://api.openai.com/v1/files", fileContent) Dim uploadResult = JsonSerializer.Deserialize(Of JsonElement)(Await 
uploadResponse.Content.ReadAsStringAsync()) Dim fileId As String = uploadResult.GetProperty("id").GetString() Console.WriteLine($"Uploaded file: {fileId}") ' Create batch job (24-hour completion window for 50% discount) Dim batchJobRequest = New With { .input_file_id = fileId, .endpoint = "/v1/chat/completions", .completion_window = "24h" } Dim batchResponse = Await httpClient.PostAsync( "https://api.openai.com/v1/batches", New StringContent(JsonSerializer.Serialize(batchJobRequest), System.Text.Encoding.UTF8, "application/json") ) Dim batchResult = JsonSerializer.Deserialize(Of JsonElement)(Await batchResponse.Content.ReadAsStringAsync()) Dim batchId As String = batchResult.GetProperty("id").GetString() Console.WriteLine(vbCrLf & $"Batch job created: {batchId}") Console.WriteLine("Job will complete within 24 hours") Console.WriteLine($"Check status: GET https://api.openai.com/v1/batches/{batchId}") File.WriteAllText("batch-job-id.txt", batchId) Console.WriteLine(vbCrLf & "Batch ID saved to batch-job-id.txt") End Using End Using End Sub End Module

Monitoring token usage in production is essential. Many organizations find that 80% of their documents can be handled by smaller, cheaper models, reserving the expensive models for complex cases.

Caching and Incremental Processing

For document collections that update incrementally, smart caching and incremental processing strategies can cut costs dramatically. Document-level caching stores results alongside a hash of the source PDF, preventing unnecessary reprocessing of unchanged files. The DocumentCacheManager class uses a SHA256-based ComputeFileHash() to detect changes and stores results in CacheEntry objects with LastAccessed timestamps.

:path=/static-assets/pdf/content-code-examples/tutorials/ai-powered-pdf-processing-csharp/incremental-caching.cs
// Cache AI processing results using file hashes to avoid reprocessing unchanged documents using IronPdf; using IronPdf.AI; using Microsoft.SemanticKernel; using Microsoft.SemanticKernel.Memory; using Microsoft.SemanticKernel.Connectors.OpenAI; using System.Security.Cryptography; using System.Text.Json; // Azure OpenAI configuration string azureEndpoint = "https://your-resource.openai.azure.com/"; string apiKey = "your-azure-api-key"; string chatDeployment = "gpt-4o"; string embeddingDeployment = "text-embedding-ada-002"; //
Initialize Semantic Kernel var kernel = Kernel.CreateBuilder() .AddAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) .AddAzureOpenAIChatCompletion(chatDeployment, azureEndpoint, apiKey) .Build(); var memory = new MemoryBuilder() .WithMemoryStore(new VolatileMemoryStore()) .WithAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) .Build(); IronDocumentAI.Initialize(kernel, memory); // Configure caching string cacheFolder = "ai-cache/"; string documentsFolder = "documents/"; Directory.CreateDirectory(cacheFolder); var cacheManager = new DocumentCacheManager(cacheFolder); // Process documents with caching string[] pdfFiles = Directory.GetFiles(documentsFolder, "*.pdf"); int cached = 0, processed = 0; foreach (string filePath in pdfFiles) { string fileName = Path.GetFileName(filePath); string fileHash = cacheManager.ComputeFileHash(filePath); var cachedResult = cacheManager.GetCachedResult(fileName, fileHash); if (cachedResult != null) { Console.WriteLine($"[CACHE HIT] {fileName}"); cached++; continue; } Console.WriteLine($"[PROCESSING] {fileName}"); var pdf = PdfDocument.FromFile(filePath); string summary = await pdf.Summarize(); cacheManager.CacheResult(fileName, fileHash, summary); processed++; } Console.WriteLine($"\nProcessing complete: {cached} cached, {processed} newly processed"); Console.WriteLine($"Cost savings: {(cached * 100.0 / Math.Max(1, cached + processed)):F1}% served from cache"); // Hash-based cache manager with JSON index class DocumentCacheManager { private readonly string _cacheFolder; private readonly string _indexPath; private Dictionary<string, CacheEntry> _index; public DocumentCacheManager(string cacheFolder) { _cacheFolder = cacheFolder; _indexPath = Path.Combine(cacheFolder, "cache-index.json"); _index = LoadIndex(); } private Dictionary<string, CacheEntry> LoadIndex() { if (File.Exists(_indexPath)) { string json = File.ReadAllText(_indexPath); return 
JsonSerializer.Deserialize<Dictionary<string, CacheEntry>>(json) ?? new(); } return new Dictionary<string, CacheEntry>(); } private void SaveIndex() { string json = JsonSerializer.Serialize(_index, new JsonSerializerOptions { WriteIndented = true }); File.WriteAllText(_indexPath, json); } // SHA256 hash to detect file changes public string ComputeFileHash(string filePath) { using var sha256 = SHA256.Create(); using var stream = File.OpenRead(filePath); byte[] hash = sha256.ComputeHash(stream); return Convert.ToHexString(hash); } public string? GetCachedResult(string fileName, string currentHash) { if (_index.TryGetValue(fileName, out var entry)) { if (entry.FileHash == currentHash && File.Exists(entry.CachePath)) { entry.LastAccessed = DateTime.UtcNow; SaveIndex(); return File.ReadAllText(entry.CachePath); } } return null; } public void CacheResult(string fileName, string fileHash, string result) { string cachePath = Path.Combine(_cacheFolder, $"{Path.GetFileNameWithoutExtension(fileName)}-{fileHash[..8]}.txt"); File.WriteAllText(cachePath, result); _index[fileName] = new CacheEntry { FileHash = fileHash, CachePath = cachePath, CreatedAt = DateTime.UtcNow, LastAccessed = DateTime.UtcNow }; SaveIndex(); } } class CacheEntry { public string FileHash { get; set; } = ""; public string CachePath { get; set; } = ""; public DateTime CreatedAt { get; set; } public DateTime LastAccessed { get; set; } } Imports IronPdf Imports IronPdf.AI Imports Microsoft.SemanticKernel Imports Microsoft.SemanticKernel.Memory Imports Microsoft.SemanticKernel.Connectors.OpenAI Imports System.Security.Cryptography Imports System.Text.Json ' Azure OpenAI configuration Dim azureEndpoint As String = "https://your-resource.openai.azure.com/" Dim apiKey As String = "your-azure-api-key" Dim chatDeployment As String = "gpt-4o" Dim embeddingDeployment As String = "text-embedding-ada-002" ' Initialize Semantic Kernel Dim kernel = Kernel.CreateBuilder() _ 
.AddAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) _ .AddAzureOpenAIChatCompletion(chatDeployment, azureEndpoint, apiKey) _ .Build() Dim memory = New MemoryBuilder() _ .WithMemoryStore(New VolatileMemoryStore()) _ .WithAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) _ .Build() IronDocumentAI.Initialize(kernel, memory) ' Configure caching Dim cacheFolder As String = "ai-cache/" Dim documentsFolder As String = "documents/" Directory.CreateDirectory(cacheFolder) Dim cacheManager = New DocumentCacheManager(cacheFolder) ' Process documents with caching Dim pdfFiles As String() = Directory.GetFiles(documentsFolder, "*.pdf") Dim cached As Integer = 0, processed As Integer = 0 For Each filePath As String In pdfFiles Dim fileName As String = Path.GetFileName(filePath) Dim fileHash As String = cacheManager.ComputeFileHash(filePath) Dim cachedResult = cacheManager.GetCachedResult(fileName, fileHash) If cachedResult IsNot Nothing Then Console.WriteLine($"[CACHE HIT] {fileName}") cached += 1 Continue For End If Console.WriteLine($"[PROCESSING] {fileName}") Dim pdf = PdfDocument.FromFile(filePath) Dim summary As String = Await pdf.Summarize() cacheManager.CacheResult(fileName, fileHash, summary) processed += 1 Next Console.WriteLine($"\nProcessing complete: {cached} cached, {processed} newly processed") Console.WriteLine($"Cost savings: {(cached * 100.0 / Math.Max(1, cached + processed)):F1}% served from cache") ' Hash-based cache manager with JSON index Class DocumentCacheManager Private ReadOnly _cacheFolder As String Private ReadOnly _indexPath As String Private _index As Dictionary(Of String, CacheEntry) Public Sub New(cacheFolder As String) _cacheFolder = cacheFolder _indexPath = Path.Combine(cacheFolder, "cache-index.json") _index = LoadIndex() End Sub Private Function LoadIndex() As Dictionary(Of String, CacheEntry) If File.Exists(_indexPath) Then Dim json As String = File.ReadAllText(_indexPath) Return 
If(JsonSerializer.Deserialize(Of Dictionary(Of String, CacheEntry))(json), New Dictionary(Of String, CacheEntry)()) End If Return New Dictionary(Of String, CacheEntry)() End Function Private Sub SaveIndex() Dim json As String = JsonSerializer.Serialize(_index, New JsonSerializerOptions With {.WriteIndented = True}) File.WriteAllText(_indexPath, json) End Sub ' SHA256 hash to detect file changes Public Function ComputeFileHash(filePath As String) As String Using sha256 = SHA256.Create() Using stream = File.OpenRead(filePath) Dim hash As Byte() = sha256.ComputeHash(stream) Return Convert.ToHexString(hash) End Using End Using End Function Public Function GetCachedResult(fileName As String, currentHash As String) As String Dim entry As CacheEntry = Nothing If _index.TryGetValue(fileName, entry) Then If entry.FileHash = currentHash AndAlso File.Exists(entry.CachePath) Then entry.LastAccessed = DateTime.UtcNow SaveIndex() Return File.ReadAllText(entry.CachePath) End If End If Return Nothing End Function Public Sub CacheResult(fileName As String, fileHash As String, result As String) Dim cachePath As String = Path.Combine(_cacheFolder, $"{Path.GetFileNameWithoutExtension(fileName)}-{fileHash.Substring(0, 8)}.txt") File.WriteAllText(cachePath, result) _index(fileName) = New CacheEntry With { .FileHash = fileHash, .CachePath = cachePath, .CreatedAt = DateTime.UtcNow, .LastAccessed = DateTime.UtcNow } SaveIndex() End Sub End Class Class CacheEntry Public Property FileHash As String = "" Public Property CachePath As String = "" Public Property CreatedAt As DateTime Public Property LastAccessed As DateTime End Class

GPT-5 and Claude Sonnet 4.5, available in 2026, also feature automatic prompt caching, which reduces effective token consumption by 50-90% for repeated patterns, a substantial saving for large-scale operations.

Real-World Use Cases

Legal Discovery and Contract Analysis

Traditional legal discovery requires teams of junior attorneys to manually review hundreds of thousands of pages of documents. AI-powered discovery transforms this process by rapidly identifying relevant documents, automating privilege review, and extracting key evidentiary facts.

IronPDF's AI integration supports sophisticated legal workflows: privilege detection, relevance scoring, issue identification, and key data extraction. Law firms report 70-80% reductions in discovery review time, allowing them to handle larger cases with smaller teams. In 2026, with the improved accuracy and lower hallucination rates of GPT-5 and Claude Sonnet 4.5, legal professionals can rely on AI-assisted analysis for increasingly consequential decisions.

Financial Statement Analysis
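One practical detail for the extraction examples in this section: chat models sometimes wrap their JSON answer in markdown code fences, which makes JsonSerializer.Deserialize throw. A small sanitizing step applied to the Query() result makes parsing more robust. This is a hedged sketch; StripCodeFences is an illustrative helper name, not an IronPDF API:

```csharp
using System;
using System.Text.Json;

// Strip optional ```json ... ``` fences that LLMs often add around a JSON payload.
static string StripCodeFences(string reply)
{
    var text = reply.Trim();
    if (text.StartsWith("```"))
    {
        int firstNewline = text.IndexOf('\n');
        if (firstNewline >= 0) text = text[(firstNewline + 1)..];
        int lastFence = text.LastIndexOf("```", StringComparison.Ordinal);
        if (lastFence >= 0) text = text[..lastFence];
    }
    return text.Trim();
}

// Example: a fenced reply parses cleanly after stripping.
string raw = "```json\n{ \"revenue\": 125.5 }\n```";
using var doc = JsonDocument.Parse(StripCodeFences(raw));
Console.WriteLine(doc.RootElement.GetProperty("revenue").GetDouble()); // prints 125.5
```

In the sector-analysis loop later in this section, this would become JsonSerializer.Deserialize<CompanyFinancials>(StripCodeFences(result)), so a fenced but otherwise valid reply no longer falls into the catch branch.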
Financial analysts spend substantial time extracting data from earnings reports, SEC filings, and analyst briefings. AI-driven financial document processing automates that extraction, letting analysts focus on interpreting the data rather than collecting it.

This example processes multiple 10-K filings using pdf.Query() and a CompanyFinancials JSON schema to extract and compare revenue, margins, and risk factors across companies.

:path=/static-assets/pdf/content-code-examples/tutorials/ai-powered-pdf-processing-csharp/financial-sector-analysis.cs
// Compare financial metrics across multiple company filings for sector analysis using IronPdf; using IronPdf.AI; using Microsoft.SemanticKernel; using Microsoft.SemanticKernel.Memory; using Microsoft.SemanticKernel.Connectors.OpenAI; using System.Text.Json; using System.Text; // Azure OpenAI configuration string azureEndpoint = "https://your-resource.openai.azure.com/"; string apiKey = "your-azure-api-key"; string chatDeployment = "gpt-4o"; string embeddingDeployment = "text-embedding-ada-002"; // Initialize Semantic Kernel var kernel = Kernel.CreateBuilder() .AddAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) .AddAzureOpenAIChatCompletion(chatDeployment, azureEndpoint, apiKey) .Build(); var memory = new MemoryBuilder() .WithMemoryStore(new VolatileMemoryStore()) .WithAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) .Build(); IronDocumentAI.Initialize(kernel, memory); // Analyze company filings string[] companyFilings = { "filings/company-a-10k.pdf", "filings/company-b-10k.pdf", "filings/company-c-10k.pdf" }; var sectorData = new List<CompanyFinancials>(); foreach (string filing in companyFilings) { Console.WriteLine($"Analyzing: {Path.GetFileName(filing)}"); var pdf = PdfDocument.FromFile(filing); // Define JSON schema for 10-K extraction (numbers in millions USD) string extractionQuery = @"Extract key financial metrics from this 10-K filing.
Return JSON: { ""companyName"": ""string"", ""fiscalYear"": ""string"", ""revenue"": number, ""revenueGrowth"": number, ""grossMargin"": number, ""operatingMargin"": number, ""netIncome"": number, ""eps"": number, ""totalDebt"": number, ""cashPosition"": number, ""employeeCount"": number, ""keyRisks"": [""string""], ""guidance"": ""string"" } Numbers in millions USD. Growth/margins as percentages. Return ONLY valid JSON."; string result = await pdf.Query(extractionQuery); try { var financials = JsonSerializer.Deserialize<CompanyFinancials>(result); if (financials != null) sectorData.Add(financials); } catch { Console.WriteLine($" Warning: Could not parse financials for {filing}"); } } // Generate sector comparison report var report = new StringBuilder(); report.AppendLine("=== Sector Analysis Report ===\n"); report.AppendLine("Revenue Comparison (millions USD):"); foreach (var company in sectorData.OrderByDescending(c => c.Revenue)) report.AppendLine($" {company.CompanyName}: ${company.Revenue:N0} ({company.RevenueGrowth:+0.0;-0.0}% YoY)"); report.AppendLine("\nProfitability Margins:"); foreach (var company in sectorData.OrderByDescending(c => c.OperatingMargin)) report.AppendLine($" {company.CompanyName}: {company.GrossMargin:F1}% gross, {company.OperatingMargin:F1}% operating"); report.AppendLine("\nFinancial Health (Debt vs Cash):"); foreach (var company in sectorData) { double netDebt = company.TotalDebt - company.CashPosition; string status = netDebt < 0 ? 
"Net Cash" : "Net Debt"; report.AppendLine($" {company.CompanyName}: {status} ${Math.Abs(netDebt):N0}M"); } string reportText = report.ToString(); Console.WriteLine($"\n{reportText}"); File.WriteAllText("sector-analysis-report.txt", reportText); // Save full JSON data string outputJson = JsonSerializer.Serialize(sectorData, new JsonSerializerOptions { WriteIndented = true }); File.WriteAllText("sector-analysis.json", outputJson); Console.WriteLine("Analysis saved to sector-analysis.json and sector-analysis-report.txt"); class CompanyFinancials { public string CompanyName { get; set; } = ""; public string FiscalYear { get; set; } = ""; public double Revenue { get; set; } public double RevenueGrowth { get; set; } public double GrossMargin { get; set; } public double OperatingMargin { get; set; } public double NetIncome { get; set; } public double Eps { get; set; } public double TotalDebt { get; set; } public double CashPosition { get; set; } public int EmployeeCount { get; set; } public List<string> KeyRisks { get; set; } = new(); public string Guidance { get; set; } = ""; } Imports IronPdf Imports IronPdf.AI Imports Microsoft.SemanticKernel Imports Microsoft.SemanticKernel.Memory Imports Microsoft.SemanticKernel.Connectors.OpenAI Imports System.Text.Json Imports System.Text Imports System.IO ' Azure OpenAI configuration Dim azureEndpoint As String = "https://your-resource.openai.azure.com/" Dim apiKey As String = "your-azure-api-key" Dim chatDeployment As String = "gpt-4o" Dim embeddingDeployment As String = "text-embedding-ada-002" ' Initialize Semantic Kernel Dim kernel = Kernel.CreateBuilder() _ .AddAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) _ .AddAzureOpenAIChatCompletion(chatDeployment, azureEndpoint, apiKey) _ .Build() Dim memory = New MemoryBuilder() _ .WithMemoryStore(New VolatileMemoryStore()) _ .WithAzureOpenAITextEmbeddingGeneration(embeddingDeployment, azureEndpoint, apiKey) _ .Build() IronDocumentAI.Initialize(kernel, 
memory) ' Analyze company filings Dim companyFilings As String() = { "filings/company-a-10k.pdf", "filings/company-b-10k.pdf", "filings/company-c-10k.pdf" } Dim sectorData = New List(Of CompanyFinancials)() For Each filing As String In companyFilings Console.WriteLine($"Analyzing: {Path.GetFileName(filing)}") Dim pdf = PdfDocument.FromFile(filing) ' Define JSON schema for 10-K extraction (numbers in millions USD) Dim extractionQuery As String = "Extract key financial metrics from this 10-K filing. Return JSON:" & vbCrLf & _ "{" & vbCrLf & _ " ""companyName"": ""string""," & vbCrLf & _ " ""fiscalYear"": ""string""," & vbCrLf & _ " ""revenue"": number," & vbCrLf & _ " ""revenueGrowth"": number," & vbCrLf & _ " ""grossMargin"": number," & vbCrLf & _ " ""operatingMargin"": number," & vbCrLf & _ " ""netIncome"": number," & vbCrLf & _ " ""eps"": number," & vbCrLf & _ " ""totalDebt"": number," & vbCrLf & _ " ""cashPosition"": number," & vbCrLf & _ " ""employeeCount"": number," & vbCrLf & _ " ""keyRisks"": [""string""]," & vbCrLf & _ " ""guidance"": ""string""" & vbCrLf & _ "}" & vbCrLf & _ "Numbers in millions USD. Growth/margins as percentages." & vbCrLf & _ "Return ONLY valid JSON." 
Dim result As String = Await pdf.Query(extractionQuery) Try Dim financials = JsonSerializer.Deserialize(Of CompanyFinancials)(result) If financials IsNot Nothing Then sectorData.Add(financials) End If Catch Console.WriteLine($" Warning: Could not parse financials for {filing}") End Try Next ' Generate sector comparison report Dim report = New StringBuilder() report.AppendLine("=== Sector Analysis Report ===" & vbCrLf) report.AppendLine("Revenue Comparison (millions USD):") For Each company In sectorData.OrderByDescending(Function(c) c.Revenue) report.AppendLine($" {company.CompanyName}: ${company.Revenue:N0} ({company.RevenueGrowth:+0.0;-0.0}% YoY)") Next report.AppendLine(vbCrLf & "Profitability Margins:") For Each company In sectorData.OrderByDescending(Function(c) c.OperatingMargin) report.AppendLine($" {company.CompanyName}: {company.GrossMargin:F1}% gross, {company.OperatingMargin:F1}% operating") Next report.AppendLine(vbCrLf & "Financial Health (Debt vs Cash):") For Each company In sectorData Dim netDebt As Double = company.TotalDebt - company.CashPosition Dim status As String = If(netDebt < 0, "Net Cash", "Net Debt") report.AppendLine($" {company.CompanyName}: {status} ${Math.Abs(netDebt):N0}M") Next Dim reportText As String = report.ToString() Console.WriteLine(vbCrLf & reportText) File.WriteAllText("sector-analysis-report.txt", reportText) ' Save full JSON data Dim outputJson As String = JsonSerializer.Serialize(sectorData, New JsonSerializerOptions With {.WriteIndented = True}) File.WriteAllText("sector-analysis.json", outputJson) Console.WriteLine("Analysis saved to sector-analysis.json and sector-analysis-report.txt") Public Class CompanyFinancials Public Property CompanyName As String = "" Public Property FiscalYear As String = "" Public Property Revenue As Double Public Property RevenueGrowth As Double Public Property GrossMargin As Double Public Property OperatingMargin As Double Public Property NetIncome As Double Public Property Eps As Double 
Public Property TotalDebt As Double Public Property CashPosition As Double Public Property EmployeeCount As Integer Public Property KeyRisks As List(Of String) = New List(Of String)() Public Property Guidance As String = "" End Class

Investment firms use AI-driven analysis to process thousands of documents per day, allowing analysts to monitor broader market coverage and react faster to emerging opportunities.

Research Paper Summarization

Academic research produces millions of papers every year. AI-powered summarization helps researchers quickly assess a paper's relevance, understand its key findings, and decide which papers deserve a detailed read.

An effective research summary must state the research question, explain the methodology, summarize the main findings with appropriate caveats, and place the results in context. Research institutions use AI summarization to maintain institutional knowledge bases that automatically ingest newly published papers. In 2026, GPT-5's improved scientific reasoning and Claude Sonnet 4.5's enhanced analytical capabilities bring new levels of accuracy to academic summarization.

Government Document Processing

Government agencies generate enormous volumes of documents: regulations, public comments, environmental impact statements, court filings, and audit reports. AI-driven document processing makes this information actionable through regulatory compliance analysis, environmental impact assessment, and legislative tracking.

Public comment analysis poses a particular challenge: a major regulatory proposal can draw hundreds of thousands of comments. AI systems can categorize comments by topic, identify common themes, detect coordinated campaigns, and extract the substantive arguments that warrant an agency response. The AI models available in 2026 bring unprecedented capability to government document processing, supporting democratic transparency and informed decision-making.

Troubleshooting and Support

Quick Fixes for Common Errors

- Slow first render? Normal. Chrome initialization takes 2-3 seconds; subsequent renders are much faster.
- Problems in the cloud? Use at least an Azure B1 tier or equivalent resources.
- Missing assets? Set a base path or embed assets as base64.
- Missing page elements? Add a RenderDelay so JavaScript has time to execute.
- Out of memory? Update to the latest IronPDF release, which resolves known performance issues.
- Form field problems? Make sure field names are unique and update to the latest version.

Get Help from IronPDF Engineers, 24/7

IronPDF provides round-the-clock engineering support. Stuck on HTML-to-PDF conversion or AI integration? Contact us for:

- Comprehensive troubleshooting guides
- Performance optimization strategies
- Engineering support requests

Next Steps

Now that you have seen AI-powered PDF processing, the next step is to explore IronPDF's broader feature set. The OpenAI integration guide goes deeper into summarization, querying, and memory patterns, while the text and image extraction tutorials show how to preprocess PDFs before AI analysis. For document assembly workflows, learn how to merge and split PDFs for batch processing.

When you are ready to go beyond AI features, the complete PDF editing tutorial covers watermarks, headers, footers, forms, and annotations. The ChatGPT C# tutorial demonstrates alternative AI integration patterns. Production deployment is covered in the Azure WebApps and Functions deployment guide, and the C# PDF creation tutorial covers generating PDFs from HTML, URLs, and raw content.

Ready to start? Begin your 30-day free trial today: test in production with no watermarks, under flexible licensing that scales with your team. If you have questions about AI integration or any other IronPDF feature, our engineering support team is happy to help.

Frequently Asked Questions

What are the benefits of AI-powered PDF processing in C#?

AI-powered PDF processing in C# enables advanced capabilities such as document summarization, data extraction to JSON, and building question-answering systems. It significantly improves the efficiency and accuracy of processing large volumes of documents.

How does IronPDF integrate AI to summarize documents?

IronPDF integrates AI by leveraging models such as GPT-5 and Claude to analyze and summarize documents, making it easier to surface insights and quickly understand long texts.

What role does the RAG pattern play in AI-powered PDF processing?

RAG (retrieval-augmented generation) is used in AI-powered PDF processing to improve the quality of information retrieval and generation, enabling more accurate, contextually relevant document analysis.

How can I extract structured data from PDFs using IronPDF?

IronPDF can extract structured data from PDFs into formats such as JSON, enabling seamless data integration and analysis across applications and systems.

Can IronPDF process large document libraries with AI?

Yes. IronPDF can process large document libraries efficiently with AI models, automating tasks such as summarization and data extraction, and it integrates well with both OpenAI and Azure OpenAI.

Which AI models does IronPDF support for PDF processing?

IronPDF supports advanced AI models such as GPT-5 and Claude for tasks like document summarization and building question-answering systems, enhancing overall processing capability.

How does IronPDF help build question-answering systems?

IronPDF helps build question-answering systems by processing and analyzing documents to extract relevant information, which can then be used to generate accurate responses to user queries.

What are the main use cases for AI-powered PDF processing in C#?

The main use cases include document summarization, structured data extraction, question-answering system development, and large-scale document processing using AI integrations such as OpenAI.

Can IronPDF be used with Azure OpenAI for document processing?

Yes. IronPDF integrates with Azure OpenAI to enhance document processing tasks, providing a scalable solution for summarizing, extracting data from, and analyzing PDF documents.

How does IronPDF use AI to improve document analysis?

IronPDF uses AI models to automate and enhance tasks such as summarization, data extraction, and information retrieval, improving both the efficiency and the accuracy of document processing.

Ahmad Sohail
Full Stack Developer

Ahmad is a full stack developer with a strong foundation in C#, Python, and web technologies. Before joining the Iron Software team, Ahmad worked on automation projects and API integrations, focusing on improving performance and developer experience. In his free time, he enjoys experimenting with UI/UX ideas, contributing to open-source tools, and occasionally diving into technical writing and documentation to make complex topics more approachable.