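C# PDF Redaction: Remove Sensitive Data and Sanitize Documents with IronPDF
Curtis Chau
Updated: February 3, 2026

PDF redaction with IronPDF in C# .NET permanently removes sensitive content from a document's internal structure instead of merely covering it visually, so no amount of copying, searching, or forensic analysis can recover the original data. This goes well beyond drawing black rectangles over text: IronPDF offers text redaction with regular-expression pattern matching, region-based redaction for signatures and images, metadata stripping, document sanitization to eliminate embedded scripts, and vulnerability scanning. Together these give .NET developers a complete toolkit for HIPAA-, GDPR-, and PCI DSS-compliant document protection workflows.

TL;DR: Quick-Start Guide

This tutorial covers how to permanently remove sensitive content from PDF documents in C# .NET, including text patterns, image regions, metadata, and embedded scripts.

- Who it's for: .NET developers handling sensitive documents in healthcare, legal, finance, or government.
- What you'll build: text redaction with regex pattern matching (Social Security Numbers, credit cards, emails), coordinate-based region redaction (signatures and photos), metadata cleanup, PDF sanitization (removing embedded scripts), and YARA-based vulnerability scanning.
- Where it runs: .NET 10, .NET 8 LTS, .NET Framework 4.6.2+, and .NET Standard 2.0. Everything runs locally with no external dependencies.
- When to use it: when you need to share documents for legal discovery, FOIA requests, or external distribution and must be sure the removed content is truly gone.
- Why it matters technically: a visual overlay leaves the original text recoverable from the PDF content stream. IronPDF's redaction removes the character data from the document structure itself, making recovery impossible.

A few lines of code are enough to remove sensitive text from a PDF file. First install IronPDF with the NuGet Package Manager:

    PM > Install-Package IronPdf

Then copy and run this code:

    using IronPdf;

    PdfDocument pdf = PdfDocument.FromFile("confidential-report.pdf");
    pdf.RedactTextOnAllPages("CONFIDENTIAL");
    pdf.SaveAs("redacted-report.pdf");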
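A quick way to sanity-check a redaction is to reopen the saved file, extract its text, and confirm that none of the redacted terms survive. The sketch below shows only the comparison step, run on a plain string so that it works without any PDF library. `RedactionCheck` and `FindSurvivingTerms` are hypothetical names introduced here for illustration; they are not part of IronPDF.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RedactionCheck
{
    // Hypothetical helper (not an IronPDF API): returns every term that
    // still occurs in the extracted text, ignoring letter case.
    public static List<string> FindSurvivingTerms(string extractedText, IEnumerable<string> terms)
    {
        return terms
            .Where(term => !string.IsNullOrEmpty(term))
            .Where(term => extractedText.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }

    public static void Main()
    {
        // Stand-in for the text extracted from the saved, redacted PDF.
        string extractedText = "Quarterly report. Status: ************. Prepared by finance.";

        List<string> survivors = FindSurvivingTerms(extractedText, new[] { "CONFIDENTIAL" });

        Console.WriteLine(survivors.Count == 0
            ? "No redacted terms found in extracted text."
            : "Still present: " + string.Join(", ", survivors));
    }
}
```

In a real project the input string would presumably come from calling ExtractAllText on the document loaded with PdfDocument.FromFile("redacted-report.pdf"). An empty result is a basic sanity check, not a forensic guarantee, because it only covers text that extraction can see.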
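After purchasing IronPDF or signing up for the 30-day trial, add your license key at application startup.

    IronPdf.License.LicenseKey = "KEY";

VB.NET:

    Imports IronPdf

    IronPdf.License.LicenseKey = "KEY"

Table of Contents

- TL;DR: Quick-Start Guide
  - Quick overview
- Redacting text from PDF documents
  - What's the difference between true redaction and a visual overlay?
  - How do I redact text across all pages of a PDF document?
  - How do I redact text only on specific pages?
  - How do I customize the appearance of redacted content?
- Pattern matching and automated redaction
  - How do I use regular expressions to find and redact sensitive patterns?
  - How do I build a reusable sensitive-data scanner?
- Region-based redaction
  - How do I redact specific regions in a PDF?
  - How do I redact multiple regions across different pages?
- Removing sensitive data from PDF metadata
  - How do I remove metadata that could leak sensitive information?
- PDF sanitization in .NET
  - How do I sanitize a PDF to remove embedded scripts and hidden threats?
  - How do I scan a PDF for security vulnerabilities?
- The complete workflow
  - How do I build a complete redaction and sanitization pipeline?
- Next steps

What's the difference between true redaction and a visual overlay?

Understanding the difference between true redaction and a visual overlay is essential for anyone who handles sensitive documents. Many tools and manual methods create the appearance of redaction without removing the underlying data. This false sense of security has led to many high-profile data leaks and compliance failures.

Visual overlay methods typically draw an opaque shape over the sensitive content. The text itself stays intact inside the PDF structure. A viewer sees a black rectangle, but the original characters still exist in the file's content stream. Selecting all the text on the page, using accessibility tools, or inspecting the raw PDF data reveals everything that was supposedly hidden. Court cases have been compromised when opposing counsel trivially recovered the text of redacted filings. Government agencies have accidentally released classified information that looked redacted but was fully recoverable.

True redaction works differently. When you use IronPDF's redaction methods, the library locates the specified text in the PDF's internal structure and removes it completely. The character data is deleted from the content stream. The visual representation is replaced by a redaction mark (usually a black rectangle), but the original content is gone from the file. No amount of selecting, copying, or forensic analysis can recover what has been permanently removed.

IronPDF implements true redaction by modifying the PDF at the structural level. The RedactTextOnAllPages method and its variants search page content, identify matching text, remove it from the document object model, and optionally draw a visual indicator where the content used to be. This approach aligns with guidance on secure document redaction from organizations such as NIST.

The practical implications are significant. If you need to share documents externally, file them for legal discovery, release records under freedom-of-information requests, or distribute reports while protecting personally identifiable information, only true redaction offers adequate protection. Visual overlays may be enough for internal drafts where you only want to draw attention away from certain parts, but they should never be relied on to protect actual data. For additional document security measures, see our guides on encrypting PDFs and digital signatures.

How do I redact text across an entire PDF document in C#?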
The most common redaction scenario is removing every occurrence of specific text throughout a document. You might need to remove a person's name from a report, strip an account number from financial statements, or delete internal reference codes before external distribution. IronPDF's RedactTextOnAllPages method makes this straightforward.

Input: an employee records document containing personal information, including a name, a Social Security Number, and an employee ID.

    using IronPdf;

    // Load the source document
    PdfDocument pdf = PdfDocument.FromFile("employee-records.pdf");

    // Redact an employee name from the entire document
    pdf.RedactTextOnAllPages("John Smith");

    // Redact a Social Security Number
    pdf.RedactTextOnAllPages("123-45-6789");

    // Redact an internal employee ID
    pdf.RedactTextOnAllPages("EMP-2024-0042");

    // Save the cleaned document
    pdf.SaveAs("employee-records-redacted.pdf");

VB.NET:

    Imports IronPdf

    ' Load the source document
    Dim pdf As PdfDocument = PdfDocument.FromFile("employee-records.pdf")

    ' Redact an employee name from the entire document
    pdf.RedactTextOnAllPages("John Smith")

    ' Redact a Social Security Number
    pdf.RedactTextOnAllPages("123-45-6789")

    ' Redact an internal employee ID
    pdf.RedactTextOnAllPages("EMP-2024-0042")

    ' Save the cleaned document
    pdf.SaveAs("employee-records-redacted.pdf")

This code loads a PDF containing employee information and removes three pieces of confidential data by calling RedactTextOnAllPages once per value. Each call searches every page of the document and permanently removes all matches of the employee name, Social Security Number, and internal identifier.

By default, a black rectangle is drawn where the redacted text appeared, and the actual characters are replaced with asterisks in the document structure. This gives visual confirmation that redaction took place while ensuring the original content is gone.

For longer documents or multiple redaction targets, you can loop over a list of terms:

    using IronPdf;
    using System.Collections.Generic;

    // Load the document once
    PdfDocument pdf = PdfDocument.FromFile("quarterly-report.pdf");

    // Define all terms that need redaction
    List<string> sensitiveTerms = new List<string>
    {
        "Project Titan",
        "Sarah Johnson",
        "Budget: $4.2M",
        "Q3-INTERNAL-2024",
        "sarah.johnson@company.com"
    };

    // Redact each term
    foreach (string term in sensitiveTerms)
    {
        pdf.RedactTextOnAllPages(term);
    }

    // Save the result
    pdf.SaveAs("quarterly-report-public.pdf");

VB.NET:

    Imports IronPdf
    Imports System.Collections.Generic

    ' Load the document once
    Dim pdf As PdfDocument = PdfDocument.FromFile("quarterly-report.pdf")

    ' Define all terms that need redaction
    Dim sensitiveTerms As New List(Of String) From {
        "Project Titan",
        "Sarah Johnson",
        "Budget: $4.2M",
        "Q3-INTERNAL-2024",
        "sarah.johnson@company.com"
    }

    ' Redact each term
    For Each term As String In sensitiveTerms
        pdf.RedactTextOnAllPages(term)
    Next

    ' Save the result
    pdf.SaveAs("quarterly-report-public.pdf")

This pattern works well when you have a known list of sensitive values to remove. The document is loaded once, all redactions are performed in memory, and the result is saved at the end. Each term is processed independently, so partial matches or formatting differences in one term do not affect the other redactions.

How do I redact text only on specific pages?

Sometimes you need finer control over where redaction happens. A document might have a cover page with information that should remain intact, or you may know that confidential data appears only in certain sections. IronPDF provides RedactTextOnPage for single-page redaction and RedactTextOnPages for targeting several specific pages.

Input: a multi-page contract bundle in which the client name and the financial terms each appear only on specific pages.

    using IronPdf;

    // Load the document
    PdfDocument pdf = PdfDocument.FromFile("contract-bundle.pdf");

    // Redact text only on page 1 (index 0)
    pdf.RedactTextOnPage(0, "Client Name: Acme Corporation");

    // Redact text on pages 3, 5, and 7 (indices 2, 4, 6)
    int[] financialPages = { 2, 4, 6 };
    pdf.RedactTextOnPages(financialPages, "Payment Terms: Net 30");

    // Other pages remain untouched except for the specific redactions applied
    pdf.SaveAs("contract-bundle-redacted.pdf");

VB.NET:

    Imports IronPdf

    ' Load the document
    Dim pdf As PdfDocument = PdfDocument.FromFile("contract-bundle.pdf")

    ' Redact text only on page 1 (index 0)
    pdf.RedactTextOnPage(0, "Client Name: Acme Corporation")

    ' Redact text on pages 3, 5, and 7 (indices 2, 4, 6)
    Dim financialPages As Integer() = {2, 4, 6}
    pdf.RedactTextOnPages(financialPages, "Payment Terms: Net 30")

    ' Other pages remain untouched except for the specific redactions applied
    pdf.SaveAs("contract-bundle-redacted.pdf")

This code demonstrates targeted redaction of a single page with RedactTextOnPage and of several specific pages with RedactTextOnPages. The client name is removed only from page 1 (index 0), and the payment terms are removed from pages 3, 5, and 7 (indices 2, 4, 6); all other pages are unchanged.

Page indices in IronPDF are zero-based: the first page has index 0, the second has index 1, and so on. This follows standard programming convention and matches how most developers think about array access.

Targeting specific pages also improves performance on large documents. Instead of scanning hundreds of pages for text that appears in only a few places, you tell the redaction engine exactly where to look. This matters for batch processing, where you may need to handle thousands of documents. For maximum throughput, consider async and multithreading techniques.

    using IronPdf;

    // Process a large document efficiently
    PdfDocument pdf = PdfDocument.FromFile("annual-report-500-pages.pdf");

    // We know from document structure that:
    // - Executive summary with names is on pages 1-3
    // - Financial data is on pages 45-60
    // - Appendix with employee info is on pages 480-495

    // Redact executive names from summary section
    for (int i = 0; i <= 2; i++)
    {
        pdf.RedactTextOnPage(i, "CEO: Robert Williams");
        pdf.RedactTextOnPage(i, "CFO: Maria Garcia");
    }

    // Redact specific financial figures from the financial section
    int[] financialSection = { 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59 };
    pdf.RedactTextOnPages(financialSection, "Net Revenue: $847M");

    // Redact employee identifiers from appendix
    for (int i = 479; i <= 494; i++)
    {
        pdf.RedactTextOnPage(i, "Employee ID:");
    }

    pdf.SaveAs("annual-report-public-release.pdf");

VB.NET:

    Imports IronPdf

    ' Process a large document efficiently
    Dim pdf As PdfDocument = PdfDocument.FromFile("annual-report-500-pages.pdf")

    ' We know from document structure that:
    ' - Executive summary with names is on pages 1-3
    ' - Financial data is on pages 45-60
    ' - Appendix with employee info is on pages 480-495

    ' Redact executive names from summary section
    For i As Integer = 0 To 2
        pdf.RedactTextOnPage(i, "CEO: Robert Williams")
        pdf.RedactTextOnPage(i, "CFO: Maria Garcia")
    Next

    ' Redact specific financial figures from the financial section
    Dim financialSection As Integer() = {44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59}
    pdf.RedactTextOnPages(financialSection, "Net Revenue: $847M")

    ' Redact employee identifiers from appendix
    For i As Integer = 479 To 494
        pdf.RedactTextOnPage(i, "Employee ID:")
    Next

    pdf.SaveAs("annual-report-public-release.pdf")

This targeted approach processes only the relevant parts of the 500-page document, which cuts execution time considerably compared with scanning every page for every redaction term.

How do I customize the appearance of redacted content?

IronPDF provides several parameters that control how redactions appear in the final document. You can adjust case sensitivity, whole-word matching, whether a visual rectangle is drawn, and what replacement text appears in place of the redacted content.

Input: a legal brief containing various sensitive terms, including classification labels, passwords, and internal reference codes, each needing different redaction handling.

    using IronPdf;

    // Load the document
    PdfDocument pdf = PdfDocument.FromFile("legal-brief.pdf");

    // Case-sensitive redaction: only matches exact case
    // "CLASSIFIED" will be redacted but "classified" or "Classified" will not
    pdf.RedactTextOnAllPages(
        "CLASSIFIED",
        CaseSensitive: true,
        OnlyMatchWholeWords: true,
        DrawRectangles: true,
        ReplacementText: "[REDACTED]"
    );

    // Case-insensitive redaction: matches regardless of case
    // Will redact "Secret", "SECRET", "secret", etc.
    pdf.RedactTextOnAllPages(
        "secret",
        CaseSensitive: false,
        OnlyMatchWholeWords: true,
        DrawRectangles: true,
        ReplacementText: "*****"
    );

    // Whole word disabled: matches partial strings too
    // Will redact "password", "passwords", "mypassword123", etc.
    pdf.RedactTextOnAllPages(
        "password",
        CaseSensitive: false,
        OnlyMatchWholeWords: false,
        DrawRectangles: true,
        ReplacementText: "XXXXX"
    );

    // No visual rectangle: text is removed but no black box appears
    // Useful when you want seamless removal without obvious redaction marks
    pdf.RedactTextOnAllPages(
        "internal-reference-code",
        CaseSensitive: true,
        OnlyMatchWholeWords: true,
        DrawRectangles: false,
        ReplacementText: ""
    );

    pdf.SaveAs("legal-brief-redacted.pdf");

VB.NET:

    Imports IronPdf

    ' Load the document
    Dim pdf As PdfDocument = PdfDocument.FromFile("legal-brief.pdf")

    ' Case-sensitive redaction: only matches exact case
    ' "CLASSIFIED" will be redacted but "classified" or "Classified" will not
    pdf.RedactTextOnAllPages(
        "CLASSIFIED",
        CaseSensitive:=True,
        OnlyMatchWholeWords:=True,
        DrawRectangles:=True,
        ReplacementText:="[REDACTED]"
    )

    ' Case-insensitive redaction: matches regardless of case
    ' Will redact "Secret", "SECRET", "secret", etc.
    pdf.RedactTextOnAllPages(
        "secret",
        CaseSensitive:=False,
        OnlyMatchWholeWords:=True,
        DrawRectangles:=True,
        ReplacementText:="*****"
    )

    ' Whole word disabled: matches partial strings too
    ' Will redact "password", "passwords", "mypassword123", etc.
    pdf.RedactTextOnAllPages(
        "password",
        CaseSensitive:=False,
        OnlyMatchWholeWords:=False,
        DrawRectangles:=True,
        ReplacementText:="XXXXX"
    )

    ' No visual rectangle: text is removed but no black box appears
    ' Useful when you want seamless removal without obvious redaction marks
    pdf.RedactTextOnAllPages(
        "internal-reference-code",
        CaseSensitive:=True,
        OnlyMatchWholeWords:=True,
        DrawRectangles:=False,
        ReplacementText:=""
    )

    pdf.SaveAs("legal-brief-redacted.pdf")

This code demonstrates four redaction configurations using the optional parameters of RedactTextOnAllPages: case-sensitive exact matching (replaced with "[REDACTED]"), case-insensitive matching (replaced with asterisks), partial-word matching (to catch variants of "password"), and removal without a visual rectangle for seamless content elimination.

Each parameter serves a different purpose:

- CaseSensitive determines whether matching takes letter case into account. Legal documents often use capitalized terms with specific meanings, so case-sensitive matching ensures you remove only exact matches. For general text with inconsistent capitalization, case-insensitive matching may be needed to catch every instance.
- OnlyMatchWholeWords controls whether the search matches complete words or partial strings. When redacting names you usually want whole-word matching, so that "Smith" does not accidentally redact part of "Blacksmith" or "Smithfield". When redacting patterns such as account-number prefixes, partial matching may be needed to catch variations.
- DrawRectangles specifies whether a black box appears where content was removed. Most regulatory and legal contexts require visible redaction marks as evidence that content was removed deliberately and not omitted by accident. Internal workflows may prefer invisible removal for cleaner output.
- ReplacementText defines the characters that take the place of the redacted content. Common choices include asterisks, a "REDACTED" label, or an empty string. If someone tries to select or copy from a redacted area, the replacement text is what appears in the document structure.

How do I use regular expressions to find and redact sensitive patterns?
Redacting known strings works when you have specific values to remove, but many kinds of confidential data follow predictable patterns instead of fixed values. Social Security Numbers, credit card numbers, email addresses, phone numbers, and dates all have recognizable formats that regular expressions can match. A pattern-based redaction system lets you remove private information from PDF content without knowing each specific value in advance.

IronPDF's text extraction, combined with its redaction methods, enables a powerful pattern-matching workflow: extract the text, identify matches with .NET regular expressions, then redact each value that was found.

    using IronPdf;
    using System.Text.RegularExpressions;
    using System.Collections.Generic;

    public class PatternRedactor
    {
        // Common patterns for sensitive data
        private static readonly Dictionary<string, string> SensitivePatterns = new Dictionary<string, string>
        {
            // US Social Security Number: 123-45-6789
            { "SSN", @"\b\d{3}-\d{2}-\d{4}\b" },

            // Credit Card Numbers: 13-16 digits in groups of four,
            // optionally separated by dashes or spaces
            { "CreditCard", @"\b(?:\d{4}[-\s]?){3}\d{1,4}\b" },

            // Email Addresses
            { "Email", @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b" },

            // US Phone Numbers: (123) 456-7890 or 123-456-7890
            { "Phone", @"(?:\(\d{3}\)\s?|\b\d{3}[-.])\d{3}[-.]?\d{4}\b" },

            // Dates: MM/DD/YYYY or MM-DD-YYYY
            { "Date", @"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b" },

            // IP Addresses
            { "IPAddress", @"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b" }
        };

        public void RedactPatterns(string inputPath, string outputPath, params string[] patternNames)
        {
            // Load the PDF
            PdfDocument pdf = PdfDocument.FromFile(inputPath);

            // Extract all text from the document
            string fullText = pdf.ExtractAllText();

            // Track unique matches to avoid duplicate redaction attempts
            HashSet<string> matchesToRedact = new HashSet<string>();

            // Find all matches for requested patterns
            foreach (string patternName in patternNames)
            {
                if (SensitivePatterns.TryGetValue(patternName, out string pattern))
                {
                    Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
                    MatchCollection matches = regex.Matches(fullText);

                    foreach (Match match in matches)
                    {
                        matchesToRedact.Add(match.Value);
                    }
                }
            }

            // Redact each unique match
            foreach (string sensitiveValue in matchesToRedact)
            {
                pdf.RedactTextOnAllPages(sensitiveValue);
            }

            // Save the redacted document
            pdf.SaveAs(outputPath);
        }
    }

    // Usage example
    class Program
    {
        static void Main()
        {
            PatternRedactor redactor = new PatternRedactor();

            // Redact SSNs, credit cards, and email addresses from a customer document
            redactor.RedactPatterns(
                "customer-data.pdf",
                "customer-data-safe.pdf",
                "SSN", "CreditCard", "Email"
            );
        }
    }

VB.NET:

    Imports IronPdf
    Imports System.Text.RegularExpressions
    Imports System.Collections.Generic

    Public Class PatternRedactor
        ' Common patterns for sensitive data
        Private Shared ReadOnly SensitivePatterns As New Dictionary(Of String, String) From {
            {"SSN", "\b\d{3}-\d{2}-\d{4}\b"},
            {"CreditCard", "\b(?:\d{4}[-\s]?){3}\d{1,4}\b"},
            {"Email", "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"},
            {"Phone", "(?:\(\d{3}\)\s?|\b\d{3}[-.])\d{3}[-.]?\d{4}\b"},
            {"Date", "\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"},
            {"IPAddress", "\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"}
        }

        Public Sub RedactPatterns(inputPath As String, outputPath As String, ParamArray patternNames As String())
            ' Load the PDF
            Dim pdf As PdfDocument = PdfDocument.FromFile(inputPath)

            ' Extract all text from the document
            Dim fullText As String = pdf.ExtractAllText()

            ' Track unique matches to avoid duplicate redaction attempts
            Dim matchesToRedact As New HashSet(Of String)()

            ' Find all matches for requested patterns
            For Each patternName As String In patternNames
                Dim pattern As String = Nothing
                If SensitivePatterns.TryGetValue(patternName, pattern) Then
                    Dim regex As New Regex(pattern, RegexOptions.IgnoreCase)
                    Dim matches As MatchCollection = regex.Matches(fullText)

                    For Each match As Match In matches
                        matchesToRedact.Add(match.Value)
                    Next
                End If
            Next

            ' Redact each unique match
            For Each sensitiveValue As String In matchesToRedact
                pdf.RedactTextOnAllPages(sensitiveValue)
            Next

            ' Save the redacted document
            pdf.SaveAs(outputPath)
        End Sub
    End Class

    ' Usage example
    Class Program
        Shared Sub Main()
            Dim redactor As New PatternRedactor()

            ' Redact SSNs, credit cards, and email addresses from a customer document
            redactor.RedactPatterns(
                "customer-data.pdf",
                "customer-data-safe.pdf",
                "SSN", "CreditCard", "Email"
            )
        End Sub
    End Class

This pattern-based approach scales well because you define a pattern once and can apply it to any document. Supporting a new data type only requires adding a new regex pattern to the dictionary.

How do I build a reusable sensitive-data scanner?

In production you often need to scan documents and report what confidential information they contain before deciding whether to redact. This supports compliance audits and allows human review of redaction decisions. The following classes provide scanning in addition to redaction.

    using IronPdf;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;
    using System.Linq;

    public class SensitiveDataMatch
    {
        public string PatternType { get; set; }
        public string Value { get; set; }
        public int PageNumber { get; set; }
    }

    public class ScanResult
    {
        public string FilePath { get; set; }
        public List<SensitiveDataMatch> Matches { get; set; } = new List<SensitiveDataMatch>();
        public bool ContainsSensitiveData => Matches.Count > 0;

        // Count of matches per pattern type
        public Dictionary<string, int> GetSummary()
        {
            return Matches.GroupBy(m => m.PatternType)
                          .ToDictionary(g => g.Key, g => g.Count());
        }
    }

    public class DocumentScanner
    {
        private readonly Dictionary<string, string> _patterns;

        public DocumentScanner()
        {
            _patterns = new Dictionary<string, string>
            {
                { "Social Security Number", @"\b\d{3}-\d{2}-\d{4}\b" },
                { "Credit Card", @"\b(?:\d{4}[-\s]?){3}\d{1,4}\b" },
                { "Email Address", @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b" },
                { "Phone Number", @"(?:\(\d{3}\)\s?|\b\d{3}[-.])\d{3}[-.]?\d{4}\b" },
                { "Date of Birth Pattern", @"\b(?:DOB|Date of Birth|Birth Date)[:\s]+\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b" }
            };
        }

        public ScanResult ScanDocument(string filePath)
        {
            ScanResult result = new ScanResult { FilePath = filePath };
            PdfDocument pdf = PdfDocument.FromFile(filePath);

            // Scan each page individually to track location
            for (int pageIndex = 0; pageIndex < pdf.PageCount; pageIndex++)
            {
                string pageText = pdf.ExtractTextFromPage(pageIndex);

                foreach (var pattern in _patterns)
                {
                    Regex regex = new Regex(pattern.Value, RegexOptions.IgnoreCase);
                    MatchCollection matches = regex.Matches(pageText);

                    foreach (Match match in matches)
                    {
                        result.Matches.Add(new SensitiveDataMatch
                        {
                            PatternType = pattern.Key,
                            Value = MaskValue(match.Value, pattern.Key),
                            PageNumber = pageIndex + 1
                        });
                    }
                }
            }

            return result;
        }

        // Partially mask values for safe storage
        private string MaskValue(string value, string patternType)
        {
            if (patternType == "Social Security Number" && value.Length >= 4)
            {
                return "XXX-XX-" + value.Substring(value.Length - 4);
            }
            if (patternType == "Credit Card" && value.Length >= 4)
            {
                return "****-****-****-" + value.Substring(value.Length - 4);
            }
            if (patternType == "Email Address")
            {
                int atIndex = value.IndexOf('@');
                if (atIndex > 2)
                {
                    return value.Substring(0, 2) + "***" + value.Substring(atIndex);
                }
            }
            return value.Length > 4 ? value.Substring(0, 2) + "***" : "****";
        }

        public void ScanAndRedact(string inputPath, string outputPath)
        {
            // First scan to identify sensitive data
            ScanResult scanResult = ScanDocument(inputPath);

            if (!scanResult.ContainsSensitiveData)
            {
                return;
            }

            // Load document for redaction
            PdfDocument pdf = PdfDocument.FromFile(inputPath);

            // Extract unique actual values (not masked) for redaction
            string fullText = pdf.ExtractAllText();
            HashSet<string> valuesToRedact = new HashSet<string>();

            foreach (var pattern in _patterns)
            {
                Regex regex = new Regex(pattern.Value, RegexOptions.IgnoreCase);
                foreach (Match match in regex.Matches(fullText))
                {
                    valuesToRedact.Add(match.Value);
                }
            }

            // Apply redactions
            foreach (string value in valuesToRedact)
            {
                pdf.RedactTextOnAllPages(value);
            }

            pdf.SaveAs(outputPath);
        }
    }

    // Usage
    class Program
    {
        static void Main()
        {
            DocumentScanner scanner = new DocumentScanner();

            // Scan only (for audit purposes)
            ScanResult result = scanner.ScanDocument("application-form.pdf");
            var summary = result.GetSummary();

            // Scan and redact in one operation
            scanner.ScanAndRedact("application-form.pdf", "application-form-redacted.pdf");
        }
    }

VB.NET:

    Imports IronPdf
    Imports System.Collections.Generic
    Imports System.Text.RegularExpressions
    Imports System.Linq

    Public Class SensitiveDataMatch
        Public Property PatternType As String
        Public Property Value As String
        Public Property PageNumber As Integer
    End Class

    Public Class ScanResult
        Public Property FilePath As String
        Public Property Matches As List(Of SensitiveDataMatch) = New List(Of SensitiveDataMatch)()

        Public ReadOnly Property ContainsSensitiveData As Boolean
            Get
                Return Matches.Count > 0
            End Get
        End Property

        ' Count of matches per pattern type
        Public Function GetSummary() As Dictionary(Of String, Integer)
            Return Matches.GroupBy(Function(m) m.PatternType) _
                          .ToDictionary(Function(g) g.Key, Function(g) g.Count())
        End Function
    End Class

    Public Class DocumentScanner
        Private ReadOnly _patterns As Dictionary(Of String, String)

        Public Sub New()
            _patterns = New Dictionary(Of String, String) From {
                {"Social Security Number", "\b\d{3}-\d{2}-\d{4}\b"},
                {"Credit Card", "\b(?:\d{4}[-\s]?){3}\d{1,4}\b"},
                {"Email Address", "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"},
                {"Phone Number", "(?:\(\d{3}\)\s?|\b\d{3}[-.])\d{3}[-.]?\d{4}\b"},
                {"Date of Birth Pattern", "\b(?:DOB|Date of Birth|Birth Date)[:\s]+\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"}
            }
        End Sub

        Public Function ScanDocument(filePath As String) As ScanResult
            Dim result As New ScanResult With {.FilePath = filePath}
            Dim pdf As PdfDocument = PdfDocument.FromFile(filePath)

            ' Scan each page individually to track location
            For pageIndex As Integer = 0 To pdf.PageCount - 1
                Dim pageText As String = pdf.ExtractTextFromPage(pageIndex)

                For Each pattern In _patterns
                    Dim regex As New Regex(pattern.Value, RegexOptions.IgnoreCase)
                    Dim matches As MatchCollection = regex.Matches(pageText)

                    For Each match As Match In matches
                        result.Matches.Add(New SensitiveDataMatch With {
                            .PatternType = pattern.Key,
                            .Value = MaskValue(match.Value, pattern.Key),
                            .PageNumber = pageIndex + 1
                        })
                    Next
                Next
            Next

            Return result
        End Function

        ' Partially mask values for safe storage
        Private Function MaskValue(value As String, patternType As String) As String
            If patternType = "Social Security Number" AndAlso value.Length >= 4 Then
                Return "XXX-XX-" & value.Substring(value.Length - 4)
            End If
            If patternType = "Credit Card" AndAlso value.Length >= 4 Then
                Return "****-****-****-" & value.Substring(value.Length - 4)
            End If
            If patternType = "Email Address" Then
                Dim atIndex As Integer = value.IndexOf("@"c)
                If atIndex > 2 Then
                    Return value.Substring(0, 2) & "***" & value.Substring(atIndex)
                End If
            End If
            Return If(value.Length > 4, value.Substring(0, 2) & "***", "****")
        End Function

        Public Sub ScanAndRedact(inputPath As String, outputPath As String)
            ' First scan to identify sensitive data
            Dim scanResult As ScanResult = ScanDocument(inputPath)

            If Not scanResult.ContainsSensitiveData Then
                Return
            End If

            ' Load document for redaction
            Dim pdf As PdfDocument = PdfDocument.FromFile(inputPath)

            ' Extract unique actual values (not masked) for redaction
            Dim fullText As String = pdf.ExtractAllText()
            Dim valuesToRedact As New HashSet(Of String)()

            For Each pattern In _patterns
                Dim regex As New Regex(pattern.Value, RegexOptions.IgnoreCase)
                For Each match As Match In regex.Matches(fullText)
                    valuesToRedact.Add(match.Value)
                Next
            Next

            ' Apply redactions
            For Each value As String In valuesToRedact
                pdf.RedactTextOnAllPages(value)
            Next

            pdf.SaveAs(outputPath)
        End Sub
    End Class

    ' Usage
    Module Program
        Sub Main()
            Dim scanner As New DocumentScanner()

            ' Scan only (for audit purposes)
            Dim result As ScanResult = scanner.ScanDocument("application-form.pdf")
            Dim summary = result.GetSummary()

            ' Scan and redact in one operation
            scanner.ScanAndRedact("application-form.pdf", "application-form-redacted.pdf")
        End Sub
    End Module

The scanner shows what confidential information exists before any modification takes place. This supports compliance workflows in which you need to document what was found and removed. The masking function ensures that log files and reports do not themselves become a source of data leakage.

How do I redact specific regions in a PDF?
Text redaction handles character-based content effectively, but PDFs often contain sensitive information that text matching cannot address. Signatures, photographs, handwritten annotations, stamps, and graphic elements call for a different approach. Region-based redaction lets you specify rectangular areas by coordinates and permanently obscure everything inside them.

IronPDF uses the RectangleF structure to define redaction regions. You specify the X and Y coordinates of the region's top-left corner, followed by its width and height. Coordinates are measured from the bottom-left corner of the page, which matches the coordinate system of the PDF specification.

Input: a signed agreement containing a handwritten signature and a photo ID that need to be redacted using coordinate-based region targeting.

    using IronPdf;
    using IronSoftware.Drawing;

    // Load a document with signature blocks and photos
    PdfDocument pdf = PdfDocument.FromFile("signed-agreement.pdf");

    // Define a region for a signature block
    // Located 100 points from left, 650 points from bottom
    // Width of 200 points, height of 50 points
    RectangleF signatureRegion = new RectangleF(100, 650, 200, 50);

    // Redact the signature region on all pages
    pdf.RedactRegionsOnAllPages(signatureRegion);

    // Define a region for a photo ID in the upper right
    RectangleF photoRegion = new RectangleF(450, 700, 100, 120);
    pdf.RedactRegionsOnAllPages(photoRegion);

    // Save the document with regions redacted
    pdf.SaveAs("signed-agreement-redacted.pdf");

VB.NET:

    Imports IronPdf
    Imports IronSoftware.Drawing

    ' Load a document with signature blocks and photos
    Dim pdf As PdfDocument = PdfDocument.FromFile("signed-agreement.pdf")

    ' Define a region for a signature block
    ' Located 100 points from left, 650 points from bottom
    ' Width of 200 points, height of 50 points
    Dim signatureRegion As New RectangleF(100, 650, 200, 50)

    ' Redact the signature region on all pages
    pdf.RedactRegionsOnAllPages(signatureRegion)

    ' Define a region for a photo ID in the upper right
    Dim photoRegion As New RectangleF(450, 700, 100, 120)
    pdf.RedactRegionsOnAllPages(photoRegion)

    ' Save the document with regions redacted
    pdf.SaveAs("signed-agreement-redacted.pdf")

This code uses RectangleF structures to define the rectangular regions to redact. The signature region sits at coordinates (100, 650) and measures 200 x 50 points; the photo region sits at (450, 700) and measures 100 x 120 points. RedactRegionsOnAllPages applies black rectangles over these regions on every page.

Finding the right coordinates usually takes some experimentation or measurement. PDF pages typically use a coordinate system in which one point equals 1/72 inch. A standard US Letter page is 612 points wide and 792 points tall. An A4 page is approximately 595 x 842 points. A PDF viewer that displays coordinates as the cursor moves can help, or you can retrieve the page dimensions programmatically:

    using IronPdf;
    using IronSoftware.Drawing;

    PdfDocument pdf = PdfDocument.FromFile("form-document.pdf");

    // Get dimensions of the first page
    var pageInfo = pdf.Pages[0];

    // Calculate regions relative to page dimensions
    // Redact the bottom quarter of the page where signatures appear
    float signatureAreaHeight = (float)(pageInfo.Height / 4);
    RectangleF bottomQuarter = new RectangleF(
        0,                        // Start at left edge
        0,                        // Start at bottom
        (float)pageInfo.Width,    // Full page width
        signatureAreaHeight       // Quarter of page height
    );

    pdf.RedactRegionsOnAllPages(bottomQuarter);

    // Redact a header area at the top containing letterhead with address
    float headerHeight = 100;
    RectangleF headerArea = new RectangleF(
        0,
        (float)(pageInfo.Height - headerHeight),  // Position from bottom
        (float)pageInfo.Width,
        headerHeight
    );

    pdf.RedactRegionsOnAllPages(headerArea);

    pdf.SaveAs("form-document-redacted.pdf");

VB.NET:

    Imports IronPdf
    Imports IronSoftware.Drawing

    Dim pdf As PdfDocument = PdfDocument.FromFile("form-document.pdf")

    ' Get dimensions of the first page
    Dim pageInfo = pdf.Pages(0)

    ' Calculate regions relative to page dimensions
    ' Redact the bottom quarter of the page where signatures appear
    Dim signatureAreaHeight As Single = CSng(pageInfo.Height / 4)
    Dim bottomQuarter As New RectangleF(0, 0, CSng(pageInfo.Width), signatureAreaHeight)

    pdf.RedactRegionsOnAllPages(bottomQuarter)

    ' Redact a header area at the top containing letterhead with address
    Dim headerHeight As Single = 100
    Dim headerArea As New RectangleF(0, CSng(pageInfo.Height - headerHeight), CSng(pageInfo.Width), headerHeight)

    pdf.RedactRegionsOnAllPages(headerArea)

    pdf.SaveAs("form-document-redacted.pdf")

How do I redact multiple regions across different pages?
Complex documents often need different regions redacted on different pages. A multi-page form may place its signature block differently from page to page, or individual pages may contain photos, stamps, or other graphic elements in different positions. IronPDF includes methods for redacting regions on specific pages.

C#:

    using IronPdf;
    using IronSoftware.Drawing;

    PdfDocument pdf = PdfDocument.FromFile("multi-page-application.pdf");

    // Define page-specific redaction regions
    // Page 1: Cover page with applicant photo
    RectangleF page1Photo = new RectangleF(450, 600, 120, 150);
    pdf.RedactRegionOnPage(0, page1Photo);

    // Page 2: Personal information section
    RectangleF page2InfoBlock = new RectangleF(50, 400, 250, 200);
    pdf.RedactRegionOnPage(1, page2InfoBlock);

    // Pages 3-5: Signature lines at the same position
    RectangleF signatureLine = new RectangleF(100, 100, 200, 40);
    int[] signaturePages = { 2, 3, 4 };
    pdf.RedactRegionOnPages(signaturePages, signatureLine);

    // Page 6: Multiple regions - notary stamp and witness signature
    RectangleF notaryStamp = new RectangleF(400, 150, 150, 150);
    RectangleF witnessSignature = new RectangleF(100, 150, 200, 40);
    pdf.RedactRegionOnPage(5, notaryStamp);
    pdf.RedactRegionOnPage(5, witnessSignature);

    pdf.SaveAs("multi-page-application-redacted.pdf");

VB.NET:

    Imports IronPdf
    Imports IronSoftware.Drawing

    Dim pdf As PdfDocument = PdfDocument.FromFile("multi-page-application.pdf")

    ' Define page-specific redaction regions
    ' Page 1: Cover page with applicant photo
    Dim page1Photo As New RectangleF(450, 600, 120, 150)
    pdf.RedactRegionOnPage(0, page1Photo)

    ' Page 2: Personal information section
    Dim page2InfoBlock As New RectangleF(50, 400, 250, 200)
    pdf.RedactRegionOnPage(1, page2InfoBlock)

    ' Pages 3-5: Signature lines at the same position
    Dim signatureLine As New RectangleF(100, 100, 200, 40)
    Dim signaturePages As Integer() = {2, 3, 4}
    pdf.RedactRegionOnPages(signaturePages, signatureLine)

    ' Page 6: Multiple regions - notary stamp and witness signature
    Dim notaryStamp As New RectangleF(400, 150, 150, 150)
    Dim witnessSignature As New RectangleF(100, 150, 200, 40)
    pdf.RedactRegionOnPage(5, notaryStamp)
    pdf.RedactRegionOnPage(5, witnessSignature)

    pdf.SaveAs("multi-page-application-redacted.pdf")

Documents with a consistent layout benefit from reusable region definitions:

C#:

    using IronPdf;
    using IronSoftware.Drawing;

    public class FormRegions
    {
        // Standard form regions based on common templates
        public static RectangleF HeaderLogo => new RectangleF(20, 720, 150, 60);
        public static RectangleF SignatureBlock => new RectangleF(72, 72, 200, 50);
        public static RectangleF DateField => new RectangleF(400, 72, 120, 20);
        public static RectangleF PhotoId => new RectangleF(480, 650, 100, 130);
        public static RectangleF AddressBlock => new RectangleF(72, 600, 250, 80);
    }

    class Program
    {
        static void Main()
        {
            PdfDocument pdf = PdfDocument.FromFile("standard-form.pdf");

            // Apply standard redactions using predefined regions
            pdf.RedactRegionsOnAllPages(FormRegions.SignatureBlock);
            pdf.RedactRegionsOnAllPages(FormRegions.DateField);
            pdf.RedactRegionOnPage(0, FormRegions.PhotoId);

            pdf.SaveAs("standard-form-redacted.pdf");
        }
    }

VB.NET:

    Imports IronPdf
    Imports IronSoftware.Drawing

    Public Class FormRegions
        ' Standard form regions based on common templates
        Public Shared ReadOnly Property HeaderLogo As RectangleF
            Get
                Return New RectangleF(20, 720, 150, 60)
            End Get
        End Property

        Public Shared ReadOnly Property SignatureBlock As RectangleF
            Get
                Return New RectangleF(72, 72, 200, 50)
            End Get
        End Property

        Public Shared ReadOnly Property DateField As RectangleF
            Get
                Return New RectangleF(400, 72, 120, 20)
            End Get
        End Property

        Public Shared ReadOnly Property PhotoId As RectangleF
            Get
                Return New RectangleF(480, 650, 100, 130)
            End Get
        End Property

        Public Shared ReadOnly Property AddressBlock As RectangleF
            Get
                Return New RectangleF(72, 600, 250, 80)
            End Get
        End Property
    End Class

    Module Program
        Sub Main()
            Dim pdf As PdfDocument = PdfDocument.FromFile("standard-form.pdf")

            ' Apply standard redactions using predefined regions
            pdf.RedactRegionsOnAllPages(FormRegions.SignatureBlock)
            pdf.RedactRegionsOnAllPages(FormRegions.DateField)
            pdf.RedactRegionOnPage(0, FormRegions.PhotoId)

            pdf.SaveAs("standard-form-redacted.pdf")
        End Sub
    End Module

How do I remove metadata that could leak sensitive information?

PDF metadata is a frequently overlooked source of information leaks. Every PDF file carries properties that can reveal sensitive information: author names and usernames, the software used to create the document, creation and modification timestamps, original file names, revision history, and custom properties added by various applications. Stripping or cleaning this metadata is essential before sharing a document externally. For a comprehensive overview of metadata operations, see our metadata how-to guide.

IronPDF exposes document metadata through the MetaData property, which lets you read existing values, modify them, or remove them entirely.

C#:

    using IronPdf;
    using System;

    // Load a document containing sensitive metadata
    PdfDocument pdf = PdfDocument.FromFile("internal-report.pdf");

    // Access current metadata properties
    string author = pdf.MetaData.Author;
    string title = pdf.MetaData.Title;
    string subject = pdf.MetaData.Subject;
    string keywords = pdf.MetaData.Keywords;
    string creator = pdf.MetaData.Creator;
    string producer = pdf.MetaData.Producer;
    DateTime? creationDate = pdf.MetaData.CreationDate;
    DateTime? modifiedDate = pdf.MetaData.ModifiedDate;

    // Get all metadata keys including custom properties
    var allKeys = pdf.MetaData.Keys();

VB.NET:

    Imports IronPdf

    ' Load a document containing sensitive metadata
    Dim pdf As PdfDocument = PdfDocument.FromFile("internal-report.pdf")

    ' Access current metadata properties
    Dim author As String = pdf.MetaData.Author
    Dim title As String = pdf.MetaData.Title
    Dim subject As String = pdf.MetaData.Subject
    Dim keywords As String = pdf.MetaData.Keywords
    Dim creator As String = pdf.MetaData.Creator
    Dim producer As String = pdf.MetaData.Producer
    Dim creationDate As DateTime? = pdf.MetaData.CreationDate
    Dim modifiedDate As DateTime? = pdf.MetaData.ModifiedDate

    ' Get all metadata keys including custom properties
    Dim allKeys = pdf.MetaData.Keys()

Remove sensitive metadata before distribution.

Input: an internal memo containing embedded metadata such as the author name, creation timestamp, and custom properties that could reveal sensitive organizational information.

C#:

    using IronPdf;
    using System;

    PdfDocument pdf = PdfDocument.FromFile("confidential-memo.pdf");

    // Replace identifying metadata with generic values
    pdf.MetaData.Author = "Organization Name";
    pdf.MetaData.Creator = "Document System";
    pdf.MetaData.Producer = "";
    pdf.MetaData.Title = "Public Document";
    pdf.MetaData.Subject = "";
    pdf.MetaData.Keywords = "";

    // Normalize dates to remove timing information
    pdf.MetaData.CreationDate = DateTime.Now;
    pdf.MetaData.ModifiedDate = DateTime.Now;

    // Remove specific custom metadata keys
    pdf.MetaData.RemoveMetaDataKey("OriginalFilename");
    pdf.MetaData.RemoveMetaDataKey("LastSavedBy");
    pdf.MetaData.RemoveMetaDataKey("Company");
    pdf.MetaData.RemoveMetaDataKey("Manager");

    // Remove custom properties added by applications
    // (the empty catch deliberately ignores a missing property)
    try
    {
        pdf.MetaData.CustomProperties.Remove("SourcePath");
    }
    catch { }

    pdf.SaveAs("confidential-memo-cleaned.pdf");

VB.NET:

    Imports IronPdf
    Imports System

    Dim pdf As PdfDocument = PdfDocument.FromFile("confidential-memo.pdf")

    ' Replace identifying metadata with generic values
    pdf.MetaData.Author = "Organization Name"
    pdf.MetaData.Creator = "Document System"
    pdf.MetaData.Producer = ""
    pdf.MetaData.Title = "Public Document"
    pdf.MetaData.Subject = ""
    pdf.MetaData.Keywords = ""

    ' Normalize dates to remove timing information
    pdf.MetaData.CreationDate = DateTime.Now
    pdf.MetaData.ModifiedDate = DateTime.Now

    ' Remove specific custom metadata keys
    pdf.MetaData.RemoveMetaDataKey("OriginalFilename")
    pdf.MetaData.RemoveMetaDataKey("LastSavedBy")
    pdf.MetaData.RemoveMetaDataKey("Company")
    pdf.MetaData.RemoveMetaDataKey("Manager")

    ' Remove custom properties added by applications
    Try
        pdf.MetaData.CustomProperties.Remove("SourcePath")
    Catch
    End Try

    pdf.SaveAs("confidential-memo-cleaned.pdf")

This code replaces identifying metadata fields with generic values, normalizes the timestamps to the current date, and removes custom metadata keys that applications may have added. The RemoveMetaDataKey method targets specific properties such as "OriginalFilename" and "LastSavedBy" that could reveal internal information.

Thorough metadata cleaning in batch operations calls for a systematic approach:

C#:

    using IronPdf;
    using System;
    using System.Collections.Generic;

    public class MetadataCleaner
    {
        private readonly string _defaultAuthor;
        private readonly string _defaultCreator;

        public MetadataCleaner(string organizationName)
        {
            _defaultAuthor = organizationName;
            _defaultCreator = $"{organizationName} Document System";
        }

        public void CleanMetadata(PdfDocument pdf)
        {
            // Replace standard metadata fields
            pdf.MetaData.Author = _defaultAuthor;
            pdf.MetaData.Creator = _defaultCreator;
            pdf.MetaData.Producer = "";
            pdf.MetaData.Subject = "";
            pdf.MetaData.Keywords = "";

            // Normalize timestamps
            DateTime now = DateTime.Now;
            pdf.MetaData.CreationDate = now;
            pdf.MetaData.ModifiedDate = now;

            // Get all keys and remove potentially sensitive ones
            List<string> keysToRemove = new List<string>();
            foreach (string key in pdf.MetaData.Keys())
            {
                // Keep only essential keys
                if (!IsEssentialKey(key))
                {
                    keysToRemove.Add(key);
                }
            }

            foreach (string key in keysToRemove)
            {
                pdf.MetaData.RemoveMetaDataKey(key);
            }
        }

        private bool IsEssentialKey(string key)
        {
            // Keep only the basic display properties
            string[] essentialKeys = { "Title", "Author", "CreationDate", "ModifiedDate" };
            foreach (string essential in essentialKeys)
            {
                if (key.Equals(essential, StringComparison.OrdinalIgnoreCase))
                {
                    return true;
                }
            }
            return false;
        }
    }

    // Usage
    class Program
    {
        static void Main()
        {
            MetadataCleaner cleaner = new MetadataCleaner("Acme Corporation");

            PdfDocument pdf = PdfDocument.FromFile("report.pdf");
            cleaner.CleanMetadata(pdf);
            pdf.SaveAs("report-clean.pdf");
        }
    }

VB.NET:

    Imports IronPdf
    Imports System
    Imports System.Collections.Generic

    Public Class MetadataCleaner
        Private ReadOnly _defaultAuthor As String
        Private ReadOnly _defaultCreator As String

        Public Sub New(organizationName As String)
            _defaultAuthor = organizationName
            _defaultCreator = $"{organizationName} Document System"
        End Sub

        Public Sub CleanMetadata(pdf As PdfDocument)
            ' Replace standard metadata fields
            pdf.MetaData.Author = _defaultAuthor
            pdf.MetaData.Creator = _defaultCreator
            pdf.MetaData.Producer = ""
            pdf.MetaData.Subject = ""
            pdf.MetaData.Keywords = ""

            ' Normalize timestamps
            Dim now As DateTime = DateTime.Now
            pdf.MetaData.CreationDate = now
            pdf.MetaData.ModifiedDate = now

            ' Get all keys and remove potentially sensitive ones
            Dim keysToRemove As New List(Of String)()
            For Each key As String In pdf.MetaData.Keys()
                ' Keep only essential keys
                If Not IsEssentialKey(key) Then
                    keysToRemove.Add(key)
                End If
            Next

            For Each key As String In keysToRemove
                pdf.MetaData.RemoveMetaDataKey(key)
            Next
        End Sub

        Private Function IsEssentialKey(key As String) As Boolean
            ' Keep only the basic display properties
            Dim essentialKeys As String() = {"Title", "Author", "CreationDate", "ModifiedDate"}
            For Each essential As String In essentialKeys
                If key.Equals(essential, StringComparison.OrdinalIgnoreCase) Then
                    Return True
                End If
            Next
            Return False
        End Function
    End Class

    ' Usage
    Class Program
        Shared Sub Main()
            Dim cleaner As New MetadataCleaner("Acme Corporation")

            Dim pdf As PdfDocument = PdfDocument.FromFile("report.pdf")
            cleaner.CleanMetadata(pdf)
            pdf.SaveAs("report-clean.pdf")
        End Sub
    End Class

How do I sanitize a PDF to remove embedded scripts and hidden threats?
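To make "embedded scripts and hidden threats" concrete, the sketch below looks through the raw bytes of a file for a handful of PDF name tokens that commonly accompany active content, such as `/JavaScript`, `/OpenAction`, and `/Launch`. It is a rough triage aid written in plain C# and is not an IronPDF feature: the `ActiveContentTriage` class and its marker list are assumptions made for this illustration. Names stored inside compressed object streams or written with hex escapes will not be found, so an empty result does not mean a file is safe.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical triage helper; NOT part of IronPDF.
public static class ActiveContentTriage
{
    // PDF name tokens that commonly accompany scripts, auto-run actions, and attachments.
    private static readonly string[] Markers =
        { "/JavaScript", "/JS", "/OpenAction", "/AA", "/Launch", "/EmbeddedFile" };

    // Returns each marker that occurs at least once as a complete name token.
    public static List<string> FindMarkers(byte[] pdfBytes)
    {
        // ISO-8859-1 maps every byte to exactly one char, so binary data is not mangled.
        string raw = Encoding.GetEncoding("ISO-8859-1").GetString(pdfBytes);
        var found = new List<string>();

        foreach (string marker in Markers)
        {
            int index = raw.IndexOf(marker, StringComparison.Ordinal);
            while (index >= 0)
            {
                int end = index + marker.Length;

                // Require a name boundary so "/JS" does not match a longer name such as "/JSON".
                if (end == raw.Length || !IsNameChar(raw[end]))
                {
                    found.Add(marker);
                    break;
                }
                index = raw.IndexOf(marker, end, StringComparison.Ordinal);
            }
        }
        return found;
    }

    // Whitespace and PDF delimiter characters end a name token.
    private static bool IsNameChar(char c)
    {
        return !char.IsWhiteSpace(c) && "()<>[]{}/%".IndexOf(c) < 0;
    }
}
```

A non-empty result is only a hint that the file deserves a closer look. The sanitization and scanning features described in this section remain the actual safeguards.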
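PDF sanitization addresses security concerns beyond visible content and metadata. PDF files can contain JavaScript code, embedded executables, form actions that trigger external connections, and other potentially malicious elements. These features exist for legitimate purposes such as interactive forms and multimedia content, but they also create attack vectors. Sanitizing a PDF removes these active elements while preserving the visual content. For more detail on sanitization methods, see our sanitize PDF how-to guide.

IronPDF's Cleaner class handles sanitization by converting the PDF to an image format and then converting it back again. The process removes JavaScript, embedded objects, form actions, and annotations while leaving the visual appearance unchanged. The library provides two sanitization methods with different characteristics.

Input: a PDF received from an external source that may contain JavaScript, embedded objects, or other potentially malicious active content.

C#:

    using IronPdf;

    // Load a PDF that may contain active content
    PdfDocument pdf = PdfDocument.FromFile("received-document.pdf");

    // Sanitize using SVG conversion
    // Faster processing, results in searchable text, slight layout variations possible
    PdfDocument sanitizedSvg = Cleaner.SanitizeWithSvg(pdf);
    sanitizedSvg.SaveAs("sanitized-svg.pdf");

    // Sanitize using Bitmap conversion
    // Slower processing, text becomes image (not searchable), exact visual reproduction
    PdfDocument sanitizedBitmap = Cleaner.SanitizeWithBitmap(pdf);
    sanitizedBitmap.SaveAs("sanitized-bitmap.pdf");

VB.NET:

    Imports IronPdf

    ' Load a PDF that may contain active content
    Dim pdf As PdfDocument = PdfDocument.FromFile("received-document.pdf")

    ' Sanitize using SVG conversion
    ' Faster processing, results in searchable text, slight layout variations possible
    Dim sanitizedSvg As PdfDocument = Cleaner.SanitizeWithSvg(pdf)
    sanitizedSvg.SaveAs("sanitized-svg.pdf")

    ' Sanitize using Bitmap conversion
    ' Slower processing, text becomes image (not searchable), exact visual reproduction
    Dim sanitizedBitmap As PdfDocument = Cleaner.SanitizeWithBitmap(pdf)
    sanitizedBitmap.SaveAs("sanitized-bitmap.pdf")

This code demonstrates the two sanitization methods provided by IronPDF's Cleaner class. SanitizeWithSvg converts the PDF through an intermediate SVG format, which preserves searchable text while removing active content. SanitizeWithBitmap first renders the pages as images, producing an exact visual copy in which the text becomes non-searchable graphics.

The SVG method is faster and keeps text searchable, which suits documents that must remain indexable or accessible. The bitmap method produces an exact visual copy but converts text to images, which prevents text selection and search. Choose based on what you need from the output document.

You can also apply rendering options during sanitization to adjust the output: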
C#:

    using IronPdf;

    // Load the potentially unsafe document
    PdfDocument pdf = PdfDocument.FromFile("untrusted-source.pdf");

    // Configure rendering options for sanitization
    var renderOptions = new ChromePdfRenderOptions
    {
        MarginTop = 10,
        MarginBottom = 10,
        MarginLeft = 10,
        MarginRight = 10
    };

    // Sanitize with custom options
    PdfDocument sanitized = Cleaner.SanitizeWithSvg(pdf, renderOptions);
    sanitized.SaveAs("untrusted-source-safe.pdf");

VB.NET:

    Imports IronPdf

    ' Load the potentially unsafe document
    Dim pdf As PdfDocument = PdfDocument.FromFile("untrusted-source.pdf")

    ' Configure rendering options for sanitization
    Dim renderOptions As New ChromePdfRenderOptions With {
        .MarginTop = 10,
        .MarginBottom = 10,
        .MarginLeft = 10,
        .MarginRight = 10
    }

    ' Sanitize with custom options
    Dim sanitized As PdfDocument = Cleaner.SanitizeWithSvg(pdf, renderOptions)
    sanitized.SaveAs("untrusted-source-safe.pdf")

High-security environments often need to combine sanitization with other protective measures:

C#:

    using IronPdf;
    using System;
    using System.Collections.Generic;

    public class SecureDocumentProcessor
    {
        public PdfDocument ProcessUntrustedDocument(string inputPath)
        {
            // Load the document
            PdfDocument original = PdfDocument.FromFile(inputPath);

            // Step 1: Sanitize to remove active content
            PdfDocument sanitized = Cleaner.SanitizeWithSvg(original);

            // Step 2: Clean metadata
            sanitized.MetaData.Author = "Processed Document";
            sanitized.MetaData.Creator = "Secure Processor";
            sanitized.MetaData.Producer = "";
            sanitized.MetaData.CreationDate = DateTime.Now;
            sanitized.MetaData.ModifiedDate = DateTime.Now;

            // Remove all custom metadata
            // Iterate over a copy of the keys so that removals cannot invalidate the enumeration
            foreach (string key in new List<string>(sanitized.MetaData.Keys()))
            {
                if (key != "Title" && key != "Author" &&
                    key != "CreationDate" && key != "ModifiedDate")
                {
                    sanitized.MetaData.RemoveMetaDataKey(key);
                }
            }

            return sanitized;
        }
    }

    // Usage
    class Program
    {
        static void Main()
        {
            SecureDocumentProcessor processor = new SecureDocumentProcessor();
            PdfDocument safe = processor.ProcessUntrustedDocument("email-attachment.pdf");
            safe.SaveAs("email-attachment-safe.pdf");
        }
    }

VB.NET:

    Imports IronPdf
    Imports System
    Imports System.Collections.Generic

    Public Class SecureDocumentProcessor
        Public Function ProcessUntrustedDocument(inputPath As String) As PdfDocument
            ' Load the document
            Dim original As PdfDocument = PdfDocument.FromFile(inputPath)

            ' Step 1: Sanitize to remove active content
            Dim sanitized As PdfDocument = Cleaner.SanitizeWithSvg(original)

            ' Step 2: Clean metadata
            sanitized.MetaData.Author = "Processed Document"
            sanitized.MetaData.Creator = "Secure Processor"
            sanitized.MetaData.Producer = ""
            sanitized.MetaData.CreationDate = DateTime.Now
            sanitized.MetaData.ModifiedDate = DateTime.Now

            ' Remove all custom metadata
            ' Iterate over a copy of the keys so that removals cannot invalidate the enumeration
            For Each key As String In New List(Of String)(sanitized.MetaData.Keys())
                If key <> "Title" AndAlso key <> "Author" AndAlso key <> "CreationDate" AndAlso key <> "ModifiedDate" Then
                    sanitized.MetaData.RemoveMetaDataKey(key)
                End If
            Next

            Return sanitized
        End Function
    End Class

    ' Usage
    Module Program
        Sub Main()
            Dim processor As New SecureDocumentProcessor()
            Dim safe As PdfDocument = processor.ProcessUntrustedDocument("email-attachment.pdf")
            safe.SaveAs("email-attachment-safe.pdf")
        End Sub
    End Module

How do I scan a PDF for security vulnerabilities?

Before processing or sanitizing a document, you may want to assess the threats it could contain. IronPDF's Cleaner.ScanPdf method checks the document against YARA rules, the pattern definitions commonly used in malware analysis and threat detection. The scan result identifies characteristics associated with malicious PDF files.

C#:

    using IronPdf;

    // Load the document to scan
    PdfDocument pdf = PdfDocument.FromFile("suspicious-document.pdf");

    // Scan using default YARA rules
    CleanerScanResult scanResult = Cleaner.ScanPdf(pdf);

    // Check the scan results
    bool threatsDetected = scanResult.IsDetected;
    int riskCount = scanResult.Risks.Count;

    // Process identified risks
    if (scanResult.IsDetected)
    {
        foreach (var risk in scanResult.Risks)
        {
            // Handle each identified risk
        }

        // Sanitize the document before use
        PdfDocument sanitized = Cleaner.SanitizeWithSvg(pdf);
        sanitized.SaveAs("suspicious-document-safe.pdf");
    }

VB.NET:

    Imports IronPdf

    ' Load the document to scan
    Dim pdf As PdfDocument = PdfDocument.FromFile("suspicious-document.pdf")

    ' Scan using default YARA rules
    Dim scanResult As CleanerScanResult = Cleaner.ScanPdf(pdf)

    ' Check the scan results
    Dim threatsDetected As Boolean = scanResult.IsDetected
    Dim riskCount As Integer = scanResult.Risks.Count

    ' Process identified risks
    If scanResult.IsDetected Then
        For Each risk In scanResult.Risks
            ' Handle each identified risk
        Next

        ' Sanitize the document before use
        Dim sanitized As PdfDocument = Cleaner.SanitizeWithSvg(pdf)
        sanitized.SaveAs("suspicious-document-safe.pdf")
    End If

You can supply custom YARA rule files to meet specialized detection needs. Organizations with specific threat models or compliance requirements often maintain their own rule sets that target particular vulnerability patterns.

C#:

    using IronPdf;

    PdfDocument pdf = PdfDocument.FromFile("incoming-document.pdf");

    // Scan with custom YARA rules
    string[] customYaraFiles = { "corporate-rules.yar", "industry-specific.yar" };
    CleanerScanResult result = Cleaner.ScanPdf(pdf, customYaraFiles);

    if (result.IsDetected)
    {
        // Document triggered custom rules and requires review or sanitization
        PdfDocument sanitized = Cleaner.SanitizeWithSvg(pdf);
        sanitized.SaveAs("incoming-document-safe.pdf");
    }

VB.NET:

    Imports IronPdf

    Dim pdf As PdfDocument = PdfDocument.FromFile("incoming-document.pdf")

    ' Scan with custom YARA rules
    Dim customYaraFiles As String() = {"corporate-rules.yar", "industry-specific.yar"}
    Dim result As CleanerScanResult = Cleaner.ScanPdf(pdf, customYaraFiles)

    If result.IsDetected Then
        ' Document triggered custom rules and requires review or sanitization
        Dim sanitized As PdfDocument = Cleaner.SanitizeWithSvg(pdf)
        sanitized.SaveAs("incoming-document-safe.pdf")
    End If

Integrating scanning into a document intake workflow helps automate security decisions:

C#:

    using IronPdf;
    using System;
    using System.IO;
    using System.Security;   // SecurityException

    public enum DocumentSafetyLevel
    {
        Safe,
        Suspicious,
        Dangerous
    }

    public class DocumentSecurityGateway
    {
        public DocumentSafetyLevel EvaluateDocument(string filePath)
        {
            PdfDocument pdf = PdfDocument.FromFile(filePath);
            CleanerScanResult scan = Cleaner.ScanPdf(pdf);

            if (!scan.IsDetected)
            {
                return DocumentSafetyLevel.Safe;
            }

            // Evaluate severity based on number of risks
            if (scan.Risks.Count > 5)
            {
                return DocumentSafetyLevel.Dangerous;
            }

            return DocumentSafetyLevel.Suspicious;
        }

        public PdfDocument ProcessIncomingDocument(string filePath, string outputDirectory)
        {
            DocumentSafetyLevel safety = EvaluateDocument(filePath);
            string fileName = Path.GetFileName(filePath);

            switch (safety)
            {
                case DocumentSafetyLevel.Safe:
                    return PdfDocument.FromFile(filePath);

                case DocumentSafetyLevel.Suspicious:
                    PdfDocument suspicious = PdfDocument.FromFile(filePath);
                    return Cleaner.SanitizeWithSvg(suspicious);

                case DocumentSafetyLevel.Dangerous:
                    throw new SecurityException($"Document {fileName} contains dangerous content");

                default:
                    throw new InvalidOperationException("Unknown safety level");
            }
        }
    }

VB.NET:

    Imports IronPdf
    Imports System
    Imports System.IO
    Imports System.Security

    Public Enum DocumentSafetyLevel
        Safe
        Suspicious
        Dangerous
    End Enum

    Public Class DocumentSecurityGateway
        Public Function EvaluateDocument(filePath As String) As DocumentSafetyLevel
            Dim pdf As PdfDocument = PdfDocument.FromFile(filePath)
            Dim scan As CleanerScanResult = Cleaner.ScanPdf(pdf)

            If Not scan.IsDetected Then
                Return DocumentSafetyLevel.Safe
            End If

            ' Evaluate severity based on number of risks
            If scan.Risks.Count > 5 Then
                Return DocumentSafetyLevel.Dangerous
            End If

            Return DocumentSafetyLevel.Suspicious
        End Function

        Public Function ProcessIncomingDocument(filePath As String, outputDirectory As String) As PdfDocument
            Dim safety As DocumentSafetyLevel = EvaluateDocument(filePath)
            Dim fileName As String = Path.GetFileName(filePath)

            Select Case safety
                Case DocumentSafetyLevel.Safe
                    Return PdfDocument.FromFile(filePath)

                Case DocumentSafetyLevel.Suspicious
                    Dim suspicious As PdfDocument = PdfDocument.FromFile(filePath)
                    Return Cleaner.SanitizeWithSvg(suspicious)

                Case DocumentSafetyLevel.Dangerous
                    Throw New SecurityException($"Document {fileName} contains dangerous content")

                Case Else
                    Throw New InvalidOperationException("Unknown safety level")
            End Select
        End Function
    End Class

How do I build a complete redaction and sanitization pipeline?

Production document processing usually needs to combine several protection techniques into one coherent workflow. A complete pipeline might scan incoming documents for threats, sanitize those that pass the initial screening, apply text and region redactions, strip metadata, and produce a record of every operation for auditing. This example demonstrates that integrated approach.

C#:

    using IronPdf;
    using IronSoftware.Drawing;
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Security;   // SecurityException
    using System.Text.RegularExpressions;

    public class DocumentProcessingResult
    {
        public string OriginalFile { get; set; }
        public string OutputFile { get; set; }
        public bool WasSanitized { get; set; }
        public int TextRedactionsApplied { get; set; }
        public int RegionRedactionsApplied { get; set; }
        public bool MetadataCleaned { get; set; }
        public List<string> SensitiveDataTypesFound { get; set; } = new List<string>();
        public DateTime ProcessedAt { get; set; }
        public bool Success { get; set; }
        public string ErrorMessage { get; set; }
    }

    public class ComprehensiveDocumentProcessor
    {
        // Sensitive data patterns
        private readonly Dictionary<string, string> _sensitivePatterns = new Dictionary<string, string>
        {
            { "SSN", @"\b\d{3}-\d{2}-\d{4}\b" },
            { "Credit Card", @"\b(?:\d{4}[-\s]?){3}\d{1,4}\b" },
            // [A-Za-z] rather than [A-Z|a-z]: inside a character class '|' is a literal pipe
            { "Email", @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b" },
            { "Phone", @"\b(?:\(\d{3}\)\s?|\d{3}[-.])\d{3}[-.]?\d{4}\b" }
        };

        // Standard regions to redact (signature areas, photo locations)
        private readonly List<RectangleF> _standardRedactionRegions = new List<RectangleF>
        {
            new RectangleF(72, 72, 200, 50),    // Bottom left signature
            new RectangleF(350, 72, 200, 50)    // Bottom right signature
        };

        private readonly string _organizationName;

        public ComprehensiveDocumentProcessor(string organizationName)
        {
            _organizationName = organizationName;
        }

        public DocumentProcessingResult ProcessDocument(
            string inputPath,
            string outputPath,
            bool sanitize = true,
            bool redactPatterns = true,
            bool redactRegions = true,
            bool cleanMetadata = true,
            List<string> additionalTermsToRedact = null)
        {
            var result = new DocumentProcessingResult
            {
                OriginalFile = inputPath,
                OutputFile = outputPath,
                ProcessedAt = DateTime.Now
            };

            try
            {
                // Load the document
                PdfDocument pdf = PdfDocument.FromFile(inputPath);

                // Step 1: Security scan
                CleanerScanResult scanResult = Cleaner.ScanPdf(pdf);
                if (scanResult.IsDetected && scanResult.Risks.Count > 10)
                {
                    throw new SecurityException("Document contains too many security risks to process");
                }

                // Step 2: Sanitization (if needed or requested)
                if (sanitize || scanResult.IsDetected)
                {
                    pdf = Cleaner.SanitizeWithSvg(pdf);
                    result.WasSanitized = true;
                }

                // Step 3: Pattern-based text redaction
                if (redactPatterns)
                {
                    string fullText = pdf.ExtractAllText();
                    HashSet<string> valuesToRedact = new HashSet<string>();

                    foreach (var pattern in _sensitivePatterns)
                    {
                        Regex regex = new Regex(pattern.Value, RegexOptions.IgnoreCase);
                        MatchCollection matches = regex.Matches(fullText);

                        if (matches.Count > 0)
                        {
                            result.SensitiveDataTypesFound.Add($"{pattern.Key} ({matches.Count})");

                            foreach (Match match in matches)
                            {
                                valuesToRedact.Add(match.Value);
                            }
                        }
                    }

                    // Apply redactions
                    foreach (string value in valuesToRedact)
                    {
                        pdf.RedactTextOnAllPages(value);
                        result.TextRedactionsApplied++;
                    }
                }

                // Step 4: Additional specific terms
                if (additionalTermsToRedact != null)
                {
                    foreach (string term in additionalTermsToRedact)
                    {
                        pdf.RedactTextOnAllPages(term);
                        result.TextRedactionsApplied++;
                    }
                }

                // Step 5: Region-based redaction
                if (redactRegions)
                {
                    foreach (RectangleF region in _standardRedactionRegions)
                    {
                        pdf.RedactRegionsOnAllPages(region);
                        result.RegionRedactionsApplied++;
                    }
                }

                // Step 6: Metadata cleaning
                if (cleanMetadata)
                {
                    pdf.MetaData.Author = _organizationName;
                    pdf.MetaData.Creator = $"{_organizationName} Document Processor";
                    pdf.MetaData.Producer = "";
                    pdf.MetaData.Subject = "";
                    pdf.MetaData.Keywords = "";
                    pdf.MetaData.CreationDate = DateTime.Now;
                    pdf.MetaData.ModifiedDate = DateTime.Now;
                    result.MetadataCleaned = true;
                }

                // Step 7: Save the processed document
                pdf.SaveAs(outputPath);
                result.Success = true;
            }
            catch (Exception ex)
            {
                // Failures (including the SecurityException above) are reported through the result
                result.Success = false;
                result.ErrorMessage = ex.Message;
            }

            return result;
        }
    }

    // Usage example
    class Program
    {
        static void Main()
        {
            var processor = new ComprehensiveDocumentProcessor("Acme Corporation");

            // Process a single document with all protections
            var result = processor.ProcessDocument(
                inputPath: "customer-application.pdf",
                outputPath: "customer-application-redacted.pdf",
                sanitize: true,
                redactPatterns: true,
                redactRegions: true,
                cleanMetadata: true,
                additionalTermsToRedact: new List<string> { "Project Alpha", "Internal Use Only" }
            );

            // Batch process multiple documents
            string[] inputFiles = Directory.GetFiles("incoming", "*.pdf");
            foreach (string file in inputFiles)
            {
                string outputFile = Path.Combine("processed", Path.GetFileName(file));
                processor.ProcessDocument(file, outputFile);
            }
        }
    }

VB.NET:

    Imports IronPdf
    Imports IronSoftware.Drawing
    Imports System
    Imports System.Collections.Generic
    Imports System.IO
    Imports System.Security
    Imports System.Text.RegularExpressions

    Public Class DocumentProcessingResult
        Public Property OriginalFile As String
        Public Property OutputFile As String
        Public Property WasSanitized As Boolean
        Public Property TextRedactionsApplied As Integer
        Public Property RegionRedactionsApplied As Integer
        Public Property MetadataCleaned As Boolean
        Public Property SensitiveDataTypesFound As List(Of String) = New List(Of String)()
        Public Property ProcessedAt As DateTime
        Public Property Success As Boolean
        Public Property ErrorMessage As String
    End Class

    Public Class ComprehensiveDocumentProcessor
        ' Sensitive data patterns
        Private ReadOnly _sensitivePatterns As Dictionary(Of String, String) = New Dictionary(Of String, String) From {
            {"SSN", "\b\d{3}-\d{2}-\d{4}\b"},
            {"Credit Card", "\b(?:\d{4}[-\s]?){3}\d{1,4}\b"},
            {"Email", "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"},
            {"Phone", "\b(?:\(\d{3}\)\s?|\d{3}[-.])\d{3}[-.]?\d{4}\b"}
        }

        ' Standard regions to redact (signature areas, photo locations)
        Private ReadOnly _standardRedactionRegions As List(Of RectangleF) = New List(Of RectangleF) From {
            New RectangleF(72, 72, 200, 50), ' Bottom left signature
            New RectangleF(350, 72, 200, 50) ' Bottom right signature
        }

        Private ReadOnly _organizationName As String

        Public Sub New(organizationName As String)
            _organizationName = organizationName
        End Sub

        Public Function ProcessDocument(
            inputPath As String,
            outputPath As String,
            Optional sanitize As Boolean = True,
            Optional redactPatterns As Boolean = True,
            Optional redactRegions As Boolean = True,
            Optional cleanMetadata As Boolean = True,
            Optional additionalTermsToRedact As List(Of String) = Nothing) As DocumentProcessingResult

            Dim result As New DocumentProcessingResult With {
                .OriginalFile = inputPath,
                .OutputFile = outputPath,
                .ProcessedAt = DateTime.Now
            }

            Try
                ' Load the document
                Dim pdf As PdfDocument = PdfDocument.FromFile(inputPath)

                ' Step 1: Security scan
                Dim scanResult As CleanerScanResult = Cleaner.ScanPdf(pdf)
                If scanResult.IsDetected AndAlso scanResult.Risks.Count > 10 Then
                    Throw New SecurityException("Document contains too many security risks to process")
                End
If ' Step 2: Sanitization (if needed or requested) If sanitize OrElse scanResult.IsDetected Then pdf = Cleaner.SanitizeWithSvg(pdf) result.WasSanitized = True End If ' Step 3: Pattern-based text redaction If redactPatterns Then Dim fullText As String = pdf.ExtractAllText() Dim valuesToRedact As New HashSet(Of String)() For Each pattern In _sensitivePatterns Dim regex As New Regex(pattern.Value, RegexOptions.IgnoreCase) Dim matches As MatchCollection = regex.Matches(fullText) If matches.Count > 0 Then result.SensitiveDataTypesFound.Add($"{pattern.Key} ({matches.Count})") For Each match As Match In matches valuesToRedact.Add(match.Value) Next End If Next ' Apply redactions For Each value As String In valuesToRedact pdf.RedactTextOnAllPages(value) result.TextRedactionsApplied += 1 Next End If ' Step 4: Additional specific terms If additionalTermsToRedact IsNot Nothing Then For Each term As String In additionalTermsToRedact pdf.RedactTextOnAllPages(term) result.TextRedactionsApplied += 1 Next End If ' Step 5: Region-based redaction If redactRegions Then For Each region As RectangleF In _standardRedactionRegions pdf.RedactRegionsOnAllPages(region) result.RegionRedactionsApplied += 1 Next End If ' Step 6: Metadata cleaning If cleanMetadata Then pdf.MetaData.Author = _organizationName pdf.MetaData.Creator = $"{_organizationName} Document Processor" pdf.MetaData.Producer = "" pdf.MetaData.Subject = "" pdf.MetaData.Keywords = "" pdf.MetaData.CreationDate = DateTime.Now pdf.MetaData.ModifiedDate = DateTime.Now result.MetadataCleaned = True End If ' Step 7: Save the processed document pdf.SaveAs(outputPath) result.Success = True Catch ex As Exception result.Success = False result.ErrorMessage = ex.Message End Try Return result End Function End Class ' Usage example Class Program Shared Sub Main() Dim processor As New ComprehensiveDocumentProcessor("Acme Corporation") ' Process a single document with all protections Dim result = processor.ProcessDocument( 
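            inputPath:="customer-application.pdf",
            outputPath:="customer-application-redacted.pdf",
            sanitize:=True,
            redactPatterns:=True,
            redactRegions:=True,
            cleanMetadata:=True,
            additionalTermsToRedact:=New List(Of String) From {"Project Alpha", "Internal Use Only"}
        )

        ' Batch process multiple documents
        Directory.CreateDirectory("processed") ' make sure the output folder exists
        Dim inputFiles As String() = Directory.GetFiles("incoming", "*.pdf")
        For Each file As String In inputFiles
            Dim outputFile As String = Path.Combine("processed", Path.GetFileName(file))
            processor.ProcessDocument(file, outputFile)
        Next
    End Sub
End Class

Input

The customer application form contains several types of sensitive data, including Social Security numbers, credit card numbers, email addresses, and signature fields, all of which need comprehensive protection.

Sample Output

This comprehensive processor combines every technique covered in this guide into a single configurable class. It scans for threats, sanitizes when necessary, finds and redacts sensitive patterns, applies region redactions, cleans metadata, and returns a DocumentProcessingResult that records what was done. You can adjust the sensitive-data patterns, redaction regions, and processing options to fit your requirements.

Next Steps

Protecting sensitive information in PDF documents takes more than surface-level measures. True redaction permanently removes content from the document structure. Pattern matching automatically finds and removes data such as Social Security numbers, credit card details, and email addresses. Region-based redaction handles signatures, photos, and other graphical elements that text matching cannot reach. Metadata cleaning eliminates hidden information that could reveal authors, timestamps, or internal file paths. Sanitization removes embedded scripts and active content that pose security risks.

IronPDF provides all of these capabilities through a consistent, well-designed API that integrates naturally with C# and .NET development practices. The methods demonstrated in this guide work on a single document and also scale to batch processing of thousands of files. Whether you are building compliance workflows for healthcare data, preparing legal documents for discovery, or simply making sure internal reports can be shared safely outside the organization, these techniques form the foundation of responsible document handling.

For comprehensive security coverage, combine redaction with password protection, permission controls, and digital signatures.

Ready to start building? Download IronPDF and try it for free. The library includes a free development license, so you can fully evaluate its redaction, text extraction, and sanitization features before purchasing a production license. If you have questions about implementation or compliance workflows, contact our engineering support team.

Frequently Asked Questions

What is PDF redaction?
PDF redaction is the process of permanently removing sensitive information from a PDF document. This can include text, images, and metadata that must be hidden for privacy or compliance reasons.

How do I redact information in a PDF using C#?
You can use IronPDF to redact information in PDF documents from C#. It lets you permanently remove or hide text, images, and metadata so that the document meets privacy and compliance standards.

Why is PDF redaction important for compliance?
PDF redaction is essential for complying with standards such as HIPAA, GDPR, and PCI DSS because it helps protect sensitive data and prevents unauthorized access to confidential information.

Can IronPDF redact entire regions of a PDF?
Yes, IronPDF can redact entire regions of a PDF document. You define the specific areas of the document that need to be hidden or removed for security purposes.

What types of data can IronPDF redact?
IronPDF can remove many types of data from PDF documents, including text, images, and metadata, providing comprehensive data privacy and security.

Does IronPDF support document sanitization?
Yes, IronPDF supports document sanitization, which cleans a PDF to remove hidden data or metadata that may not be visible but can still pose a privacy risk.

Can PDF redaction be automated with IronPDF?
Yes, IronPDF lets you automate the PDF redaction process in C#, which makes it easier to handle large volumes of documents that need sensitive data removed.

How does IronPDF make sure redaction is permanent?
IronPDF makes redaction permanent by removing the selected text and images from the document itself rather than merely obscuring them, which means they cannot be recovered or viewed.

Can IronPDF redact metadata in a PDF?
Yes, IronPDF can redact metadata in PDF documents, so that all forms of sensitive data, including hidden or background data, are thoroughly removed.

What are the benefits of using IronPDF for PDF redaction?
Using IronPDF for PDF redaction helps with compliance with data protection regulations, improves document security, and provides an efficient, automated process for managing sensitive information.

Curtis Chau, Technical Writer

Curtis Chau holds a bachelor's degree in computer science (Carleton University) and specializes in front-end development, with expertise in Node.js, TypeScript, JavaScript, and React. Passionate about crafting intuitive and aesthetically pleasing user interfaces, Curtis enjoys working with modern frameworks and creating well-structured, visually appealing manuals. Beyond development, Curtis has a keen interest in the Internet of Things (IoT) and likes exploring innovative ways to integrate hardware and software. In his free time, he enjoys gaming and building Discord bots, combining his love of technology with creativity.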