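PDF Redaction in C#: Remove Sensitive Data and Sanitize Documents with IronPDF

Curtis Chau
Updated: February 3, 2026

C# PDF redaction with IronPDF for .NET permanently removes sensitive content from a document's internal structure rather than merely covering it visually, so the original data cannot be recovered by copying, searching, or forensic analysis. This goes well beyond black rectangles over text: IronPDF provides text redaction with regex pattern matching, region-based redaction for signatures and images, metadata stripping, document sanitization to eliminate embedded scripts, and vulnerability scanning, giving .NET developers a complete toolkit for document protection workflows that support HIPAA, GDPR, and PCI DSS compliance.

as-heading:2(TL;DR: Quick Start Guide)

This tutorial covers permanently removing sensitive content from PDF documents in C# .NET, including text patterns, image regions, metadata, and embedded scripts.

Who it's for: .NET developers who handle sensitive documents in healthcare, legal, financial, or government settings.
What you'll build: text redaction with regex pattern matching (SSNs, credit cards, emails), coordinate-based redaction of signature and photo regions, metadata cleanup, PDF sanitization to strip embedded scripts, and YARA-based vulnerability scanning.
Where it runs: .NET 10, .NET 8 LTS, .NET Framework 4.6.2+, and .NET Standard 2.0. All operations run locally with no external dependencies.
When to use this approach: when you need to share documents for legal discovery, FOIA requests, or external distribution while ensuring the removed content is truly gone.
Why it matters technically: with a visual overlay, the original text remains recoverable from the PDF's content stream. IronPDF's redaction removes the character data from the document structure itself, so it cannot be recovered.

Redacting sensitive text in a PDF takes only a few lines of code. Install IronPDF with the NuGet Package Manager:

    PM > Install-Package IronPdf

Then copy and run this code:

    using IronPdf;

    PdfDocument pdf = PdfDocument.FromFile("confidential-report.pdf");
    pdf.RedactTextOnAllPages("CONFIDENTIAL");
    pdf.SaveAs("redacted-report.pdf");

After purchasing IronPDF or registering for the 30-day trial, add your license key at the start of your application.

C#:

    IronPdf.License.LicenseKey = "KEY";

VB.NET:

    Imports IronPdf

    IronPdf.License.LicenseKey = "KEY"

You can also download the DLL or the Windows installer.

as-heading:2(Table of Contents)

TL;DR: Quick Start Guide
Quick Overview
Redact Text in PDF Documents
  What Is the Difference Between True Redaction and Visual Overlay?
  How Do I Redact Text Across All Pages of a PDF Document?
  How Do I Redact Text Only on Specific Pages?
  How Do I Customize the Appearance of Redacted Content?
Pattern Matching and Automated Redaction
  How Do I Use Regular Expressions to Find and Redact Sensitive Patterns?
  How Do I Build a Reusable Sensitive Data Scanner?
Region-Based Redaction
  How Do I Redact Specific Regions in a PDF?
  How Do I Redact Multiple Regions Across Different Pages?
Remove Sensitive Data from PDF Metadata
  How Do I Remove Metadata That Could Expose Sensitive Information?
PDF Sanitization in .NET
  How Do I Sanitize a PDF to Remove Embedded Scripts and Hidden Threats?
  How Do I Scan a PDF for Security Vulnerabilities?
Complete Workflow
  How Do I Build a Complete Redaction and Sanitization Pipeline?
Next Steps

What Is the Difference Between True Redaction and Visual Overlay?

Understanding the difference between true redaction and a visual overlay is essential for anyone who handles sensitive documents. Many tools and manual methods produce the appearance of redaction without actually removing the underlying data. This false sense of security has led to many high-profile data breaches and compliance failures.

Visual overlay methods typically draw an opaque shape over the sensitive content. The text remains intact in the PDF structure. A person viewing the file sees a black rectangle, but the original characters still exist in the file's content stream. Selecting all text on the page, using accessibility tools, or inspecting the raw PDF data reveals everything that was supposed to be hidden. Court cases have been compromised when opposing counsel trivially undid the redactions in filed documents. Government agencies have accidentally released classified information that appeared redacted but was fully recoverable.

True redaction works differently. When you use IronPDF's redaction methods, the library locates the specified text in the PDF's internal structure and removes it entirely. The character data is deleted from the content stream. The visual representation is replaced with a redaction mark (typically a black rectangle), but the original content no longer exists anywhere in the file. No amount of selecting, copying, or forensic analysis can recover content that has been permanently removed.

IronPDF implements true redaction by modifying the PDF at the structural level. The RedactTextOnAllPages method and its variants search the page content, identify matching text, remove it from the document object model, and optionally draw a visual indicator where the content used to be. This approach aligns with guidance on secure document redaction from organizations such as NIST.

The practical implications are significant. If you need to share documents externally, submit files for legal discovery, release records under freedom-of-information requests, or distribute reports while protecting personally identifiable information, only true redaction provides adequate protection. In an internal draft, a visual overlay may be enough to mark certain sections for readers, but it should never be trusted for actual data protection. For additional document security measures, see our guides on encrypting PDFs and digital signatures.

How Do I Redact PDF Text Across an Entire Document in C#?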
The most common redaction scenario is removing every instance of a specific piece of text from a document. You might need to remove a person's name from a report, strip account numbers from a financial statement, or delete internal reference codes before external release. IronPDF makes this straightforward with the RedactTextOnAllPages method.

Input: an employee records document containing personal information, including names, Social Security numbers, and employee IDs.

C#:

    using IronPdf;

    // Load the source document
    PdfDocument pdf = PdfDocument.FromFile("employee-records.pdf");

    // Redact an employee name from the entire document
    pdf.RedactTextOnAllPages("John Smith");

    // Redact a Social Security Number
    pdf.RedactTextOnAllPages("123-45-6789");

    // Redact an internal employee ID
    pdf.RedactTextOnAllPages("EMP-2024-0042");

    // Save the cleaned document
    pdf.SaveAs("employee-records-redacted.pdf");

VB.NET:

    Imports IronPdf

    ' Load the source document
    Dim pdf As PdfDocument = PdfDocument.FromFile("employee-records.pdf")

    ' Redact an employee name from the entire document
    pdf.RedactTextOnAllPages("John Smith")

    ' Redact a Social Security Number
    pdf.RedactTextOnAllPages("123-45-6789")

    ' Redact an internal employee ID
    pdf.RedactTextOnAllPages("EMP-2024-0042")

    ' Save the cleaned document
    pdf.SaveAs("employee-records-redacted.pdf")

This code loads a PDF containing employee information and removes three pieces of confidential data by calling RedactTextOnAllPages once per value. Each call searches every page of the document and permanently removes all instances matching the employee name, Social Security number, and internal identifier.

The default behavior draws a black rectangle where the redacted text appeared and replaces the actual characters with asterisks in the document structure. This gives visual confirmation that a redaction was made while ensuring the original content is completely gone.

When working with longer documents or multiple redaction targets, you can apply these calls in a loop:

C#:

    using IronPdf;
    using System.Collections.Generic;

    // Load the document once
    PdfDocument pdf = PdfDocument.FromFile("quarterly-report.pdf");

    // Define all terms that need redaction
    List<string> sensitiveTerms = new List<string>
    {
        "Project Titan",
        "Sarah Johnson",
        "Budget: $4.2M",
        "Q3-INTERNAL-2024",
        "sarah.johnson@company.com"
    };

    // Redact each term
    foreach (string term in sensitiveTerms)
    {
        pdf.RedactTextOnAllPages(term);
    }

    // Save the result
    pdf.SaveAs("quarterly-report-public.pdf");

VB.NET:

    Imports IronPdf
    Imports System.Collections.Generic

    ' Load the document once
    Dim pdf As PdfDocument = PdfDocument.FromFile("quarterly-report.pdf")

    ' Define all terms that need redaction
    Dim sensitiveTerms As New List(Of String) From {
        "Project Titan",
        "Sarah Johnson",
        "Budget: $4.2M",
        "Q3-INTERNAL-2024",
        "sarah.johnson@company.com"
    }

    ' Redact each term
    For Each term As String In sensitiveTerms
        pdf.RedactTextOnAllPages(term)
    Next

    ' Save the result
    pdf.SaveAs("quarterly-report-public.pdf")

This pattern works well when you have a known list of sensitive values to remove. The document is loaded once, all redactions are applied in memory, and the final result is saved. Each term is processed independently, so partial matches or formatting differences between terms do not affect the other redactions.

How Do I Redact Text Only on Specific Pages?

Sometimes you need more precise control over where redactions occur. A document's cover page might contain information that should stay intact, or you might know that confidential data appears only in certain sections. IronPDF provides RedactTextOnPage for single-page redaction and RedactTextOnPages for targeting several specific pages.

Input: a multi-page contract bundle in which the client name appears on the signature page and financial terms appear on specific pages throughout the document.

C#:

    using IronPdf;

    // Load the document
    PdfDocument pdf = PdfDocument.FromFile("contract-bundle.pdf");

    // Redact text only on page 1 (index 0)
    pdf.RedactTextOnPage(0, "Client Name: Acme Corporation");

    // Redact text on pages 3, 5, and 7 (indices 2, 4, 6)
    int[] financialPages = { 2, 4, 6 };
    pdf.RedactTextOnPages(financialPages, "Payment Terms: Net 30");

    // Other pages remain untouched except for the specific redactions applied
    pdf.SaveAs("contract-bundle-redacted.pdf");

VB.NET:

    Imports IronPdf

    ' Load the document
    Dim pdf As PdfDocument = PdfDocument.FromFile("contract-bundle.pdf")

    ' Redact text only on page 1 (index 0)
    pdf.RedactTextOnPage(0, "Client Name: Acme Corporation")

    ' Redact text on pages 3, 5, and 7 (indices 2, 4, 6)
    Dim financialPages As Integer() = {2, 4, 6}
    pdf.RedactTextOnPages(financialPages, "Payment Terms: Net 30")

    ' Other pages remain untouched except for the specific redactions applied
    pdf.SaveAs("contract-bundle-redacted.pdf")

This code demonstrates targeted redaction on a single page with RedactTextOnPage and on several specific pages with RedactTextOnPages. The client name is removed only from page 1 (index 0), while the payment terms are redacted from pages 3, 5, and 7 (indices 2, 4, 6); all other pages are left unchanged.

Page indexes in IronPDF are zero-based: the first page is index 0, the second page is index 1, and so on. This matches standard programming conventions and the way most developers think about array access.

Targeting specific pages improves performance on large documents. Rather than scanning hundreds of pages for text that appears in only a few places, you direct the redaction engine to exactly where it should look. This matters in batch processing scenarios where you may need to handle thousands of documents. For maximum throughput, consider using async and multithreading techniques.

C#:

    using IronPdf;

    // Process a large document efficiently
    PdfDocument pdf = PdfDocument.FromFile("annual-report-500-pages.pdf");

    // We know from document structure that:
    // - Executive summary with names is on pages 1-3
    // - Financial data is on pages 45-60
    // - Appendix with employee info is on pages 480-495

    // Redact executive names from summary section
    for (int i = 0; i <= 2; i++)
    {
        pdf.RedactTextOnPage(i, "CEO: Robert Williams");
        pdf.RedactTextOnPage(i, "CFO: Maria Garcia");
    }

    // Redact specific financial figures from the financial section
    int[] financialSection = { 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59 };
    pdf.RedactTextOnPages(financialSection, "Net Revenue: $847M");

    // Redact the employee ID label from the appendix
    // Note: this matches only the literal text "Employee ID:"; the ID values
    // themselves need their own terms or a pattern-based pass
    for (int i = 479; i <= 494; i++)
    {
        pdf.RedactTextOnPage(i, "Employee ID:");
    }

    pdf.SaveAs("annual-report-public-release.pdf");

VB.NET:

    Imports IronPdf

    ' Process a large document efficiently
    Dim pdf As PdfDocument = PdfDocument.FromFile("annual-report-500-pages.pdf")

    ' We know from document structure that:
    ' - Executive summary with names is on pages 1-3
    ' - Financial data is on pages 45-60
    ' - Appendix with employee info is on pages 480-495

    ' Redact executive names from summary section
    For i As Integer = 0 To 2
        pdf.RedactTextOnPage(i, "CEO: Robert Williams")
        pdf.RedactTextOnPage(i, "CFO: Maria Garcia")
    Next

    ' Redact specific financial figures from the financial section
    Dim financialSection As Integer() = {44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59}
    pdf.RedactTextOnPages(financialSection, "Net Revenue: $847M")

    ' Redact the employee ID label from the appendix
    ' Note: this matches only the literal text "Employee ID:"; the ID values
    ' themselves need their own terms or a pattern-based pass
    For i As Integer = 479 To 494
        pdf.RedactTextOnPage(i, "Employee ID:")
    Next

    pdf.SaveAs("annual-report-public-release.pdf")

This targeted approach processes only the relevant sections of the 500-page document, which can substantially reduce execution time compared with scanning every page for each redaction term.

How Do I Customize the Appearance of Redacted Content?

IronPDF provides several parameters that control how redactions appear in the final document. You can adjust case sensitivity, whole-word matching, whether a visual rectangle is drawn, and what replacement text stands in for the redacted content.

Input: a legal brief containing a variety of sensitive terms, including classification labels, passwords, and internal reference codes that need different redaction treatment.

C#:

    using IronPdf;

    // Load the document
    PdfDocument pdf = PdfDocument.FromFile("legal-brief.pdf");

    // Case-sensitive redaction: only matches exact case
    // "CLASSIFIED" will be redacted but "classified" or "Classified" will not
    pdf.RedactTextOnAllPages(
        "CLASSIFIED",
        CaseSensitive: true,
        OnlyMatchWholeWords: true,
        DrawRectangles: true,
        ReplacementText: "[REDACTED]"
    );

    // Case-insensitive redaction: matches regardless of case
    // Will redact "Secret", "SECRET", "secret", etc.
    pdf.RedactTextOnAllPages(
        "secret",
        CaseSensitive: false,
        OnlyMatchWholeWords: true,
        DrawRectangles: true,
        ReplacementText: "*****"
    );

    // Whole word disabled: matches partial strings too
    // Will redact "password", "passwords", "mypassword123", etc.
    pdf.RedactTextOnAllPages(
        "password",
        CaseSensitive: false,
        OnlyMatchWholeWords: false,
        DrawRectangles: true,
        ReplacementText: "XXXXX"
    );

    // No visual rectangle: text is removed but no black box appears
    // Useful when you want seamless removal without obvious redaction marks
    pdf.RedactTextOnAllPages(
        "internal-reference-code",
        CaseSensitive: true,
        OnlyMatchWholeWords: true,
        DrawRectangles: false,
        ReplacementText: ""
    );

    pdf.SaveAs("legal-brief-redacted.pdf");

VB.NET:

    Imports IronPdf

    ' Load the document
    Dim pdf As PdfDocument = PdfDocument.FromFile("legal-brief.pdf")

    ' Case-sensitive redaction: only matches exact case
    ' "CLASSIFIED" will be redacted but "classified" or "Classified" will not
    pdf.RedactTextOnAllPages(
        "CLASSIFIED",
        CaseSensitive:=True,
        OnlyMatchWholeWords:=True,
        DrawRectangles:=True,
        ReplacementText:="[REDACTED]"
    )

    ' Case-insensitive redaction: matches regardless of case
    ' Will redact "Secret", "SECRET", "secret", etc.
    pdf.RedactTextOnAllPages(
        "secret",
        CaseSensitive:=False,
        OnlyMatchWholeWords:=True,
        DrawRectangles:=True,
        ReplacementText:="*****"
    )

    ' Whole word disabled: matches partial strings too
    ' Will redact "password", "passwords", "mypassword123", etc.
    pdf.RedactTextOnAllPages(
        "password",
        CaseSensitive:=False,
        OnlyMatchWholeWords:=False,
        DrawRectangles:=True,
        ReplacementText:="XXXXX"
    )

    ' No visual rectangle: text is removed but no black box appears
    ' Useful when you want seamless removal without obvious redaction marks
    pdf.RedactTextOnAllPages(
        "internal-reference-code",
        CaseSensitive:=True,
        OnlyMatchWholeWords:=True,
        DrawRectangles:=False,
        ReplacementText:=""
    )

    pdf.SaveAs("legal-brief-redacted.pdf")

This code demonstrates four redaction configurations using the optional parameters of RedactTextOnAllPages: a case-sensitive exact match replaced with "[REDACTED]", a case-insensitive match replaced with asterisks, partial-word matching that catches variants such as "passwords", and invisible removal with no visual rectangle for seamless content elimination.

The parameters serve different purposes depending on your requirements:

CaseSensitive determines whether matching considers letter case. Legal documents often use specific capitalization to convey meaning, so case-sensitive matching ensures you remove only exact matches. For general text where capitalization varies, case-insensitive matching may be needed to catch every instance.

OnlyMatchWholeWords controls whether the search matches complete words or partial strings. When redacting names, you usually want whole-word matching so that "Smith" does not accidentally redact part of "Blacksmith" or "Smithfield". When redacting patterns such as account number prefixes, partial matching may be necessary to catch variations.

DrawRectangles specifies whether a black box appears where content was removed. Most regulatory and legal settings require visible redaction marks as evidence that content was removed deliberately rather than omitted by accident. Internal workflows may prefer invisible removal for cleaner output.

ReplacementText defines the characters that stand in for the redacted content. Common choices include asterisks, a "REDACTED" label, or an empty string. If someone tries to select or copy from the redacted area, the replacement text is what appears in the document structure.

How Do I Use Regular Expressions to Find and Redact Sensitive Patterns?
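A pattern that is slightly too loose or too strict silently changes what gets redacted, so it can be worth checking candidate patterns against sample strings before running them over text extracted from a PDF. The following is a minimal sketch that uses only System.Text.RegularExpressions; the PatternCheck class, the FindMatches helper, and the sample strings are hypothetical names chosen for illustration and are not part of IronPDF.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for trying out a pattern on sample text (not an IronPDF API).
public static class PatternCheck
{
    // Returns every substring of the sample that the pattern matches, in order.
    public static List<string> FindMatches(string pattern, string sample)
    {
        var results = new List<string>();
        foreach (Match match in Regex.Matches(sample, pattern))
        {
            results.Add(match.Value);
        }
        return results;
    }
}
```

Calling PatternCheck.FindMatches(@"\b\d{3}-\d{2}-\d{4}\b", "SSN 123-45-6789 on file; order 1234-56-78901 is not an SSN.") returns a single match, "123-45-6789": the word-boundary anchors keep the pattern from matching inside the longer order number.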
Redacting known text strings works when you have specific values to remove, but many kinds of confidential data follow predictable patterns rather than fixed values. Social Security numbers, credit card numbers, email addresses, phone numbers, and dates all have recognizable formats that regular expressions can match. Building a pattern-based redaction system lets you remove private information from PDF content without knowing each specific value in advance.

IronPDF's text extraction combined with its redaction methods enables a powerful pattern-matching workflow. You extract the text, identify matches with .NET regular expressions, and then redact each value found.

C#:

    using IronPdf;
    using System.Text.RegularExpressions;
    using System.Collections.Generic;

    public class PatternRedactor
    {
        // Common patterns for sensitive data
        private static readonly Dictionary<string, string> SensitivePatterns = new Dictionary<string, string>
        {
            // US Social Security Number: 123-45-6789
            { "SSN", @"\b\d{3}-\d{2}-\d{4}\b" },

            // Credit Card Numbers: 13-16 digits in groups of four, optionally separated
            { "CreditCard", @"\b(?:\d{4}[-\s]?){3}\d{1,4}\b" },

            // Email Addresses
            { "Email", @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b" },

            // US Phone Numbers: (123) 456-7890 or 123-456-7890
            // (no \b before the parenthesis, because "(" is not a word character)
            { "Phone", @"(?:\(\d{3}\)\s?|\b\d{3}[-.])\d{3}[-.]?\d{4}\b" },

            // Dates: MM/DD/YYYY or MM-DD-YYYY
            { "Date", @"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b" },

            // IP Addresses
            { "IPAddress", @"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b" }
        };

        public void RedactPatterns(string inputPath, string outputPath, params string[] patternNames)
        {
            // Load the PDF
            PdfDocument pdf = PdfDocument.FromFile(inputPath);

            // Extract all text from the document
            string fullText = pdf.ExtractAllText();

            // Track unique matches to avoid duplicate redaction attempts
            HashSet<string> matchesToRedact = new HashSet<string>();

            // Find all matches for requested patterns
            foreach (string patternName in patternNames)
            {
                if (SensitivePatterns.TryGetValue(patternName, out string pattern))
                {
                    Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
                    MatchCollection matches = regex.Matches(fullText);

                    foreach (Match match in matches)
                    {
                        matchesToRedact.Add(match.Value);
                    }
                }
            }

            // Redact each unique match
            foreach (string sensitiveValue in matchesToRedact)
            {
                pdf.RedactTextOnAllPages(sensitiveValue);
            }

            // Save the redacted document
            pdf.SaveAs(outputPath);
        }
    }

    // Usage example
    class Program
    {
        static void Main()
        {
            PatternRedactor redactor = new PatternRedactor();

            // Redact SSNs, credit card numbers, and email addresses from a customer data document
            redactor.RedactPatterns(
                "customer-data.pdf",
                "customer-data-safe.pdf",
                "SSN", "CreditCard", "Email"
            );
        }
    }

VB.NET:

    Imports IronPdf
    Imports System.Text.RegularExpressions
    Imports System.Collections.Generic

    Public Class PatternRedactor
        ' Common patterns for sensitive data
        Private Shared ReadOnly SensitivePatterns As New Dictionary(Of String, String) From {
            ' US Social Security Number: 123-45-6789
            {"SSN", "\b\d{3}-\d{2}-\d{4}\b"},
            ' Credit Card Numbers: 13-16 digits in groups of four, optionally separated
            {"CreditCard", "\b(?:\d{4}[-\s]?){3}\d{1,4}\b"},
            ' Email Addresses
            {"Email", "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"},
            ' US Phone Numbers: (123) 456-7890 or 123-456-7890
            {"Phone", "(?:\(\d{3}\)\s?|\b\d{3}[-.])\d{3}[-.]?\d{4}\b"},
            ' Dates: MM/DD/YYYY or MM-DD-YYYY
            {"Date", "\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"},
            ' IP Addresses
            {"IPAddress", "\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"}
        }

        Public Sub RedactPatterns(inputPath As String, outputPath As String, ParamArray patternNames As String())
            ' Load the PDF
            Dim pdf As PdfDocument = PdfDocument.FromFile(inputPath)

            ' Extract all text from the document
            Dim fullText As String = pdf.ExtractAllText()

            ' Track unique matches to avoid duplicate redaction attempts
            Dim matchesToRedact As New HashSet(Of String)()

            ' Find all matches for requested patterns
            For Each patternName As String In patternNames
                Dim pattern As String = Nothing
                If SensitivePatterns.TryGetValue(patternName, pattern) Then
                    Dim regex As New Regex(pattern, RegexOptions.IgnoreCase)
                    Dim matches As MatchCollection = regex.Matches(fullText)

                    For Each match As Match In matches
                        matchesToRedact.Add(match.Value)
                    Next
                End If
            Next

            ' Redact each unique match
            For Each sensitiveValue As String In matchesToRedact
                pdf.RedactTextOnAllPages(sensitiveValue)
            Next

            ' Save the redacted document
            pdf.SaveAs(outputPath)
        End Sub
    End Class

    ' Usage example
    Class Program
        Shared Sub Main()
            Dim redactor As New PatternRedactor()

            ' Redact SSNs, credit card numbers, and email addresses from a customer data document
            redactor.RedactPatterns(
                "customer-data.pdf",
                "customer-data-safe.pdf",
                "SSN", "CreditCard", "Email"
            )
        End Sub
    End Class

This pattern-based approach scales well because you define the patterns once and apply them to any document. Adding a new data type only requires adding a new regex pattern to the dictionary.

How Do I Build a Reusable Sensitive Data Scanner?

In production environments, you often need to scan documents and report which confidential information is present before deciding whether to redact. This supports compliance audits and allows human review of redaction decisions. The following classes provide scanning alongside redaction.

C#:

    using IronPdf;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;
    using System.Linq;

    public class SensitiveDataMatch
    {
        public string PatternType { get; set; }
        public string Value { get; set; }
        public int PageNumber { get; set; }
    }

    public class ScanResult
    {
        public string FilePath { get; set; }
        public List<SensitiveDataMatch> Matches { get; set; } = new List<SensitiveDataMatch>();
        public bool ContainsSensitiveData => Matches.Count > 0;

        // Number of matches found per pattern type
        public Dictionary<string, int> GetSummary()
        {
            return Matches.GroupBy(m => m.PatternType)
                          .ToDictionary(g => g.Key, g => g.Count());
        }
    }

    public class DocumentScanner
    {
        private readonly Dictionary<string, string> _patterns;

        public DocumentScanner()
        {
            _patterns = new Dictionary<string, string>
            {
                { "Social Security Number", @"\b\d{3}-\d{2}-\d{4}\b" },
                { "Credit Card", @"\b(?:\d{4}[-\s]?){3}\d{1,4}\b" },
                { "Email Address", @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b" },
                { "Phone Number", @"(?:\(\d{3}\)\s?|\b\d{3}[-.])\d{3}[-.]?\d{4}\b" },
                { "Date of Birth Pattern", @"\b(?:DOB|Date of Birth|Birth Date)[:\s]+\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b" }
            };
        }

        public ScanResult ScanDocument(string filePath)
        {
            ScanResult result = new ScanResult { FilePath = filePath };
            PdfDocument pdf = PdfDocument.FromFile(filePath);

            // Scan each page individually to track location
            for (int pageIndex = 0; pageIndex < pdf.PageCount; pageIndex++)
            {
                string pageText = pdf.ExtractTextFromPage(pageIndex);

                foreach (var pattern in _patterns)
                {
                    Regex regex = new Regex(pattern.Value, RegexOptions.IgnoreCase);
                    MatchCollection matches = regex.Matches(pageText);

                    foreach (Match match in matches)
                    {
                        result.Matches.Add(new SensitiveDataMatch
                        {
                            PatternType = pattern.Key,
                            Value = MaskValue(match.Value, pattern.Key),
                            PageNumber = pageIndex + 1
                        });
                    }
                }
            }

            return result;
        }

        // Partially mask values for safe storage
        private string MaskValue(string value, string patternType)
        {
            if (patternType == "Social Security Number" && value.Length >= 4)
            {
                return "XXX-XX-" + value.Substring(value.Length - 4);
            }
            if (patternType == "Credit Card" && value.Length >= 4)
            {
                return "****-****-****-" + value.Substring(value.Length - 4);
            }
            if (patternType == "Email Address")
            {
                int atIndex = value.IndexOf('@');
                if (atIndex > 2)
                {
                    return value.Substring(0, 2) + "***" + value.Substring(atIndex);
                }
            }
            return value.Length > 4 ? value.Substring(0, 2) + "***" : "****";
        }

        public void ScanAndRedact(string inputPath, string outputPath)
        {
            // First scan to identify sensitive data
            ScanResult scanResult = ScanDocument(inputPath);

            if (!scanResult.ContainsSensitiveData)
            {
                return;
            }

            // Load document for redaction
            PdfDocument pdf = PdfDocument.FromFile(inputPath);

            // Extract unique actual values (not masked) for redaction
            string fullText = pdf.ExtractAllText();
            HashSet<string> valuesToRedact = new HashSet<string>();

            foreach (var pattern in _patterns)
            {
                Regex regex = new Regex(pattern.Value, RegexOptions.IgnoreCase);
                foreach (Match match in regex.Matches(fullText))
                {
                    valuesToRedact.Add(match.Value);
                }
            }

            // Apply redactions
            foreach (string value in valuesToRedact)
            {
                pdf.RedactTextOnAllPages(value);
            }

            pdf.SaveAs(outputPath);
        }
    }

    // Usage
    class Program
    {
        static void Main()
        {
            DocumentScanner scanner = new DocumentScanner();

            // Scan only (for audit purposes)
            ScanResult result = scanner.ScanDocument("application-form.pdf");
            var summary = result.GetSummary();

            // Scan and redact in one operation
            scanner.ScanAndRedact("application-form.pdf", "application-form-redacted.pdf");
        }
    }

VB.NET:

    Imports IronPdf
    Imports System.Collections.Generic
    Imports System.Text.RegularExpressions
    Imports System.Linq

    Public Class SensitiveDataMatch
        Public Property PatternType As String
        Public Property Value As String
        Public Property PageNumber As Integer
    End Class

    Public Class ScanResult
        Public Property FilePath As String
        Public Property Matches As List(Of SensitiveDataMatch) = New List(Of SensitiveDataMatch)()

        Public ReadOnly Property ContainsSensitiveData As Boolean
            Get
                Return Matches.Count > 0
            End Get
        End Property

        ' Number of matches found per pattern type
        Public Function GetSummary() As Dictionary(Of String, Integer)
            Return Matches.GroupBy(Function(m) m.PatternType) _
                          .ToDictionary(Function(g) g.Key, Function(g) g.Count())
        End Function
    End Class

    Public Class DocumentScanner
        Private ReadOnly _patterns As Dictionary(Of String, String)

        Public Sub New()
            _patterns = New Dictionary(Of String, String) From {
                {"Social Security Number", "\b\d{3}-\d{2}-\d{4}\b"},
                {"Credit Card", "\b(?:\d{4}[-\s]?){3}\d{1,4}\b"},
                {"Email Address", "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"},
                {"Phone Number", "(?:\(\d{3}\)\s?|\b\d{3}[-.])\d{3}[-.]?\d{4}\b"},
                {"Date of Birth Pattern", "\b(?:DOB|Date of Birth|Birth Date)[:\s]+\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"}
            }
        End Sub

        Public Function ScanDocument(filePath As String) As ScanResult
            Dim result As New ScanResult With {.FilePath = filePath}
            Dim pdf As PdfDocument = PdfDocument.FromFile(filePath)

            ' Scan each page individually to track location
            For pageIndex As Integer = 0 To pdf.PageCount - 1
                Dim pageText As String = pdf.ExtractTextFromPage(pageIndex)

                For Each pattern In _patterns
                    Dim regex As New Regex(pattern.Value, RegexOptions.IgnoreCase)
                    Dim matches As MatchCollection = regex.Matches(pageText)

                    For Each match As Match In matches
                        result.Matches.Add(New SensitiveDataMatch With {
                            .PatternType = pattern.Key,
                            .Value = MaskValue(match.Value, pattern.Key),
                            .PageNumber = pageIndex + 1
                        })
                    Next
                Next
            Next

            Return result
        End Function

        ' Partially mask values for safe storage
        Private Function MaskValue(value As String, patternType As String) As String
            If patternType = "Social Security Number" AndAlso value.Length >= 4 Then
                Return "XXX-XX-" & value.Substring(value.Length - 4)
            End If
            If patternType = "Credit Card" AndAlso value.Length >= 4 Then
                Return "****-****-****-" & value.Substring(value.Length - 4)
            End If
            If patternType = "Email Address" Then
                Dim atIndex As Integer = value.IndexOf("@"c)
                If atIndex > 2 Then
                    Return value.Substring(0, 2) & "***" & value.Substring(atIndex)
                End If
            End If
            Return If(value.Length > 4, value.Substring(0, 2) & "***", "****")
        End Function

        Public Sub ScanAndRedact(inputPath As String, outputPath As String)
            ' First scan to identify sensitive data
            Dim scanResult As ScanResult = ScanDocument(inputPath)

            If Not scanResult.ContainsSensitiveData Then
                Return
            End If

            ' Load document for redaction
            Dim pdf As PdfDocument = PdfDocument.FromFile(inputPath)

            ' Extract unique actual values (not masked) for redaction
            Dim fullText As String = pdf.ExtractAllText()
            Dim valuesToRedact As New HashSet(Of String)()

            For Each pattern In _patterns
                Dim regex As New Regex(pattern.Value, RegexOptions.IgnoreCase)
                For Each match As Match In regex.Matches(fullText)
                    valuesToRedact.Add(match.Value)
                Next
            Next

            ' Apply redactions
            For Each value As String In valuesToRedact
                pdf.RedactTextOnAllPages(value)
            Next

            pdf.SaveAs(outputPath)
        End Sub
    End Class

    ' Usage
    Module Program
        Sub Main()
            Dim scanner As New DocumentScanner()

            ' Scan only (for audit purposes)
            Dim result As ScanResult = scanner.ScanDocument("application-form.pdf")
            Dim summary = result.GetSummary()

            ' Scan and redact in one operation
            scanner.ScanAndRedact("application-form.pdf", "application-form-redacted.pdf")
        End Sub
    End Module

The scanner gives you visibility into which confidential information is present before any modification takes place. This supports compliance workflows in which you need to document what was found and removed. The masking function ensures that log files and reports do not themselves become a source of data exposure.

How Do I Redact Specific Regions in a PDF?
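Region coordinates are expressed in PDF points, where one point is 1/72 inch, so measurements taken in inches or millimeters have to be converted before they can be used to describe a rectangle. The following is a minimal sketch of that arithmetic; the PdfUnits class and its method names are hypothetical helpers for illustration, not IronPDF APIs.

```csharp
// Hypothetical unit-conversion helper (not an IronPDF API).
public static class PdfUnits
{
    private const float PointsPerInch = 72f;        // 1 point = 1/72 inch
    private const float MillimetersPerInch = 25.4f;

    // Converts a length in inches to PDF points.
    public static float InchesToPoints(float inches)
    {
        return inches * PointsPerInch;
    }

    // Converts a length in millimeters to PDF points.
    public static float MillimetersToPoints(float millimeters)
    {
        return millimeters / MillimetersPerInch * PointsPerInch;
    }
}
```

For example, PdfUnits.InchesToPoints(8.5f) gives 612 and PdfUnits.InchesToPoints(11f) gives 792, the US Letter dimensions in points, while PdfUnits.MillimetersToPoints(210f) gives roughly 595.28, the width of an A4 page.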
Text redaction handles character-based content effectively, but PDFs often contain sensitive information that text matching cannot address. Signatures, photos, handwritten annotations, stamps, and graphical elements require a different approach. Region-based redaction lets you specify rectangular areas by coordinates and permanently obscure everything within them.

IronPDF uses the RectangleF structure to define redaction regions. You specify the X and Y coordinates of the region's top-left corner, followed by its width and height. Coordinates are in points, measured from the bottom-left of the page, which matches the PDF specification's coordinate system.

Input: a signed agreement containing a handwritten signature and an ID photo that need to be redacted using coordinate-based region targeting.

C#:

    using IronPdf;
    using IronSoftware.Drawing;

    // Load a document with signature blocks and photos
    PdfDocument pdf = PdfDocument.FromFile("signed-agreement.pdf");

    // Define a region for a signature block
    // Located 100 points from left, 650 points from bottom
    // Width of 200 points, height of 50 points
    RectangleF signatureRegion = new RectangleF(100, 650, 200, 50);

    // Redact the signature region on all pages
    pdf.RedactRegionsOnAllPages(signatureRegion);

    // Define a region for a photo ID in the upper right
    RectangleF photoRegion = new RectangleF(450, 700, 100, 120);
    pdf.RedactRegionsOnAllPages(photoRegion);

    // Save the document with regions redacted
    pdf.SaveAs("signed-agreement-redacted.pdf");

VB.NET:

    Imports IronPdf
    Imports IronSoftware.Drawing

    ' Load a document with signature blocks and photos
    Dim pdf As PdfDocument = PdfDocument.FromFile("signed-agreement.pdf")

    ' Define a region for a signature block
    ' Located 100 points from left, 650 points from bottom
    ' Width of 200 points, height of 50 points
    Dim signatureRegion As New RectangleF(100, 650, 200, 50)

    ' Redact the signature region on all pages
    pdf.RedactRegionsOnAllPages(signatureRegion)

    ' Define a region for a photo ID in the upper right
    Dim photoRegion As New RectangleF(450, 700, 100, 120)
    pdf.RedactRegionsOnAllPages(photoRegion)

    ' Save the document with regions redacted
    pdf.SaveAs("signed-agreement-redacted.pdf")

This code uses the RectangleF structure to define rectangular regions for redaction. The signature region is at coordinates (100, 650) and measures 200 x 50 points; the photo region is at (450, 700) and measures 100 x 120 points. The RedactRegionsOnAllPages method applies black rectangles over these regions on every page.

Determining the right coordinates usually takes some experimentation or measurement. PDF pages use a coordinate system in which one point equals 1/72 inch. A standard US Letter page is 612 points wide and 792 points tall. An A4 page is approximately 595 x 842 points. A PDF viewer that displays coordinates as you move the cursor can help, and you can also retrieve page dimensions programmatically:

C#:

    using IronPdf;
    using IronSoftware.Drawing;

    PdfDocument pdf = PdfDocument.FromFile("form-document.pdf");

    // Get dimensions of the first page
    var pageInfo = pdf.Pages[0];

    // Calculate regions relative to page dimensions
    // Redact the bottom quarter of the page where signatures appear
    float signatureAreaHeight = (float)(pageInfo.Height / 4);
    RectangleF bottomQuarter = new RectangleF(
        0,                          // Start at left edge
        0,                          // Start at bottom
        (float)pageInfo.Width,      // Full page width
        signatureAreaHeight         // Quarter of page height
    );

    pdf.RedactRegionsOnAllPages(bottomQuarter);

    // Redact a header area at the top containing letterhead with address
    float headerHeight = 100;
    RectangleF headerArea = new RectangleF(
        0,
        (float)(pageInfo.Height - headerHeight),  // Position from bottom
        (float)pageInfo.Width,
        headerHeight
    );

    pdf.RedactRegionsOnAllPages(headerArea);

    pdf.SaveAs("form-document-redacted.pdf");

VB.NET:

    Imports IronPdf
    Imports IronSoftware.Drawing

    Dim pdf As PdfDocument = PdfDocument.FromFile("form-document.pdf")

    ' Get dimensions of the first page
    Dim pageInfo = pdf.Pages(0)

    ' Calculate regions relative to page dimensions
    ' Redact the bottom quarter of the page where signatures appear
    Dim signatureAreaHeight As Single = CSng(pageInfo.Height / 4)
    Dim bottomQuarter As New RectangleF(0, 0, CSng(pageInfo.Width), signatureAreaHeight)

    pdf.RedactRegionsOnAllPages(bottomQuarter)

    ' Redact a header area at the top containing letterhead with address
    Dim headerHeight As Single = 100
    Dim headerArea As New RectangleF(0, CSng(pageInfo.Height - headerHeight), CSng(pageInfo.Width), headerHeight)

    pdf.RedactRegionsOnAllPages(headerArea)

    pdf.SaveAs("form-document-redacted.pdf")

How Do I Redact Multiple Regions Across Different Pages?
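When a document has many page-specific regions, one option is to describe them as data and apply them in a loop instead of writing one call per region. The sketch below assumes a hypothetical PageRegion type and a caller-supplied redact callback, which in practice could wrap whichever page-level region redaction call you use; neither name is part of IronPDF.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value type: one rectangle on one page (zero-based page index).
public readonly struct PageRegion
{
    public PageRegion(int pageIndex, float x, float y, float width, float height)
    {
        PageIndex = pageIndex;
        X = x;
        Y = y;
        Width = width;
        Height = height;
    }

    public int PageIndex { get; }
    public float X { get; }
    public float Y { get; }
    public float Width { get; }
    public float Height { get; }
}

// Hypothetical helper (not an IronPDF API).
public static class RegionPlan
{
    // Validates every region first, then passes each one to the callback.
    // Returns the number of regions processed.
    public static int Apply(IEnumerable<PageRegion> regions, int pageCount, Action<PageRegion> redact)
    {
        var validated = new List<PageRegion>();
        foreach (PageRegion region in regions)
        {
            if (region.PageIndex < 0 || region.PageIndex >= pageCount)
            {
                throw new ArgumentOutOfRangeException(
                    nameof(regions),
                    $"Page index {region.PageIndex} is outside 0..{pageCount - 1}.");
            }
            validated.Add(region);
        }

        foreach (PageRegion region in validated)
        {
            redact(region);
        }

        return validated.Count;
    }
}
```

Validating the whole plan before invoking the callback is a deliberate choice in this sketch: a bad page index is reported before any region has been touched, rather than after the document has been partially redacted.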
Complex documents often need different regions redacted on different pages. A multi-page form may have signature lines in different positions, or individual pages may carry photos, stamps, or other graphic elements in unique locations. IronPDF includes page-specific methods for targeted region redaction.

C#:

    using IronPdf;
    using IronSoftware.Drawing;

    PdfDocument pdf = PdfDocument.FromFile("multi-page-application.pdf");

    // Define page-specific redaction regions (page indexes are zero-based)
    // Page 1: Cover page with applicant photo
    RectangleF page1Photo = new RectangleF(450, 600, 120, 150);
    pdf.RedactRegionOnPage(0, page1Photo);

    // Page 2: Personal information section
    RectangleF page2InfoBlock = new RectangleF(50, 400, 250, 200);
    pdf.RedactRegionOnPage(1, page2InfoBlock);

    // Pages 3-5: Signature lines at the same position
    RectangleF signatureLine = new RectangleF(100, 100, 200, 40);
    int[] signaturePages = { 2, 3, 4 };
    pdf.RedactRegionOnPages(signaturePages, signatureLine);

    // Page 6: Multiple regions - notary stamp and witness signature
    RectangleF notaryStamp = new RectangleF(400, 150, 150, 150);
    RectangleF witnessSignature = new RectangleF(100, 150, 200, 40);
    pdf.RedactRegionOnPage(5, notaryStamp);
    pdf.RedactRegionOnPage(5, witnessSignature);

    pdf.SaveAs("multi-page-application-redacted.pdf");

VB.NET:

    Imports IronPdf
    Imports IronSoftware.Drawing

    Dim pdf As PdfDocument = PdfDocument.FromFile("multi-page-application.pdf")

    ' Define page-specific redaction regions (page indexes are zero-based)
    ' Page 1: Cover page with applicant photo
    Dim page1Photo As New RectangleF(450, 600, 120, 150)
    pdf.RedactRegionOnPage(0, page1Photo)

    ' Page 2: Personal information section
    Dim page2InfoBlock As New RectangleF(50, 400, 250, 200)
    pdf.RedactRegionOnPage(1, page2InfoBlock)

    ' Pages 3-5: Signature lines at the same position
    Dim signatureLine As New RectangleF(100, 100, 200, 40)
    Dim signaturePages As Integer() = {2, 3, 4}
    pdf.RedactRegionOnPages(signaturePages, signatureLine)

    ' Page 6: Multiple regions - notary stamp and witness signature
    Dim notaryStamp As New RectangleF(400, 150, 150, 150)
    Dim witnessSignature As New RectangleF(100, 150, 200, 40)
    pdf.RedactRegionOnPage(5, notaryStamp)
    pdf.RedactRegionOnPage(5, witnessSignature)

    pdf.SaveAs("multi-page-application-redacted.pdf")

Documents with a consistent layout benefit from reusable region definitions:

C#:

    using IronPdf;
    using IronSoftware.Drawing;

    public class FormRegions
    {
        // Standard form regions based on common templates
        public static RectangleF HeaderLogo => new RectangleF(20, 720, 150, 60);
        public static RectangleF SignatureBlock => new RectangleF(72, 72, 200, 50);
        public static RectangleF DateField => new RectangleF(400, 72, 120, 20);
        public static RectangleF PhotoId => new RectangleF(480, 650, 100, 130);
        public static RectangleF AddressBlock => new RectangleF(72, 600, 250, 80);
    }

    class Program
    {
        static void Main()
        {
            PdfDocument pdf = PdfDocument.FromFile("standard-form.pdf");

            // Apply standard redactions using predefined regions
            pdf.RedactRegionsOnAllPages(FormRegions.SignatureBlock);
            pdf.RedactRegionsOnAllPages(FormRegions.DateField);
            pdf.RedactRegionOnPage(0, FormRegions.PhotoId);

            pdf.SaveAs("standard-form-redacted.pdf");
        }
    }

VB.NET:

    Imports IronPdf
    Imports IronSoftware.Drawing

    Public Class FormRegions
        ' Standard form regions based on common templates
        Public Shared ReadOnly Property HeaderLogo As RectangleF
            Get
                Return New RectangleF(20, 720, 150, 60)
            End Get
        End Property
        Public Shared ReadOnly Property SignatureBlock As RectangleF
            Get
                Return New RectangleF(72, 72, 200, 50)
            End Get
        End Property
        Public Shared ReadOnly Property DateField As RectangleF
            Get
                Return New RectangleF(400, 72, 120, 20)
            End Get
        End Property
        Public Shared ReadOnly Property PhotoId As RectangleF
            Get
                Return New RectangleF(480, 650, 100, 130)
            End Get
        End Property
        Public Shared ReadOnly Property AddressBlock As RectangleF
            Get
                Return New RectangleF(72, 600, 250, 80)
            End Get
        End Property
    End Class

    Module Program
        Sub Main()
            Dim pdf As PdfDocument = PdfDocument.FromFile("standard-form.pdf")

            ' Apply standard redactions using predefined regions
            pdf.RedactRegionsOnAllPages(FormRegions.SignatureBlock)
            pdf.RedactRegionsOnAllPages(FormRegions.DateField)
            pdf.RedactRegionOnPage(0, FormRegions.PhotoId)

            pdf.SaveAs("standard-form-redacted.pdf")
        End Sub
    End Module

How do I remove metadata that could expose sensitive information?

PDF metadata is an often overlooked source of information leaks. Every PDF carries properties that can reveal sensitive details: author names and usernames, the software used to create the document, creation and modification timestamps, original file names, revision history, and custom properties added by various applications. This metadata must be stripped or sanitized before a document is shared externally. For a full overview of metadata operations, see our metadata guide.

IronPDF exposes document metadata through the MetaData property, letting you read existing values, modify them, or remove them entirely.

C#:

    using IronPdf;
    using System;

    // Load a document containing sensitive metadata
    PdfDocument pdf = PdfDocument.FromFile("internal-report.pdf");

    // Access current metadata properties
    string author = pdf.MetaData.Author;
    string title = pdf.MetaData.Title;
    string subject = pdf.MetaData.Subject;
    string keywords = pdf.MetaData.Keywords;
    string creator = pdf.MetaData.Creator;
    string producer = pdf.MetaData.Producer;
    DateTime? creationDate = pdf.MetaData.CreationDate;
    DateTime? modifiedDate = pdf.MetaData.ModifiedDate;

    // Get all metadata keys including custom properties
    var allKeys = pdf.MetaData.Keys();

VB.NET:

    Imports IronPdf
    Imports System

    ' Load a document containing sensitive metadata
    Dim pdf As PdfDocument = PdfDocument.FromFile("internal-report.pdf")

    ' Access current metadata properties
    Dim author As String = pdf.MetaData.Author
    Dim title As String = pdf.MetaData.Title
    Dim subject As String = pdf.MetaData.Subject
    Dim keywords As String = pdf.MetaData.Keywords
    Dim creator As String = pdf.MetaData.Creator
    Dim producer As String = pdf.MetaData.Producer
    Dim creationDate As DateTime? = pdf.MetaData.CreationDate
    Dim modifiedDate As DateTime? = pdf.MetaData.ModifiedDate

    ' Get all metadata keys including custom properties
    Dim allKeys = pdf.MetaData.Keys()

Remove sensitive metadata before publishing:

Input: An internal memo with embedded metadata such as the author name, creation timestamp, and custom properties that could reveal sensitive organizational information.

C#:

    using IronPdf;
    using System;

    PdfDocument pdf = PdfDocument.FromFile("confidential-memo.pdf");

    // Replace identifying metadata with generic values
    pdf.MetaData.Author = "Organization Name";
    pdf.MetaData.Creator = "Document System";
    pdf.MetaData.Producer = "";
    pdf.MetaData.Title = "Public Document";
    pdf.MetaData.Subject = "";
    pdf.MetaData.Keywords = "";

    // Normalize dates to remove timing information
    pdf.MetaData.CreationDate = DateTime.Now;
    pdf.MetaData.ModifiedDate = DateTime.Now;

    // Remove specific custom metadata keys
    pdf.MetaData.RemoveMetaDataKey("OriginalFilename");
    pdf.MetaData.RemoveMetaDataKey("LastSavedBy");
    pdf.MetaData.RemoveMetaDataKey("Company");
    pdf.MetaData.RemoveMetaDataKey("Manager");

    // Remove custom properties added by applications
    try
    {
        pdf.MetaData.CustomProperties.Remove("SourcePath");
    }
    catch
    {
        // The property may not exist in every document; ignore that case
    }

    pdf.SaveAs("confidential-memo-cleaned.pdf");

VB.NET:

    Imports IronPdf
    Imports System

    Dim pdf As PdfDocument = PdfDocument.FromFile("confidential-memo.pdf")

    ' Replace identifying metadata with generic values
    pdf.MetaData.Author = "Organization Name"
    pdf.MetaData.Creator = "Document System"
    pdf.MetaData.Producer = ""
    pdf.MetaData.Title = "Public Document"
    pdf.MetaData.Subject = ""
    pdf.MetaData.Keywords = ""

    ' Normalize dates to remove timing information
    pdf.MetaData.CreationDate = DateTime.Now
    pdf.MetaData.ModifiedDate = DateTime.Now

    ' Remove specific custom metadata keys
    pdf.MetaData.RemoveMetaDataKey("OriginalFilename")
    pdf.MetaData.RemoveMetaDataKey("LastSavedBy")
    pdf.MetaData.RemoveMetaDataKey("Company")
    pdf.MetaData.RemoveMetaDataKey("Manager")

    ' Remove custom properties added by applications
    Try
        pdf.MetaData.CustomProperties.Remove("SourcePath")
    Catch
        ' The property may not exist in every document; ignore that case
    End Try

    pdf.SaveAs("confidential-memo-cleaned.pdf")

The code replaces identifying metadata fields with generic values, normalizes the timestamps to the current date, and removes custom metadata keys that applications may have added. RemoveMetaDataKey targets specific properties such as "OriginalFilename" and "LastSavedBy" that could expose internal information.

Cleaning metadata thoroughly in batch operations calls for a systematic approach:

C#:

    using IronPdf;
    using System;
    using System.Collections.Generic;

    public class MetadataCleaner
    {
        private readonly string _defaultAuthor;
        private readonly string _defaultCreator;

        public MetadataCleaner(string organizationName)
        {
            _defaultAuthor = organizationName;
            _defaultCreator = $"{organizationName} Document System";
        }

        public void CleanMetadata(PdfDocument pdf)
        {
            // Replace standard metadata fields
            pdf.MetaData.Author = _defaultAuthor;
            pdf.MetaData.Creator = _defaultCreator;
            pdf.MetaData.Producer = "";
            pdf.MetaData.Subject = "";
            pdf.MetaData.Keywords = "";

            // Normalize timestamps
            DateTime now = DateTime.Now;
            pdf.MetaData.CreationDate = now;
            pdf.MetaData.ModifiedDate = now;

            // Collect the non-essential keys first, then remove them,
            // so the key collection is not modified while it is enumerated
            List<string> keysToRemove = new List<string>();
            foreach (string key in pdf.MetaData.Keys())
            {
                if (!IsEssentialKey(key))
                {
                    keysToRemove.Add(key);
                }
            }
            foreach (string key in keysToRemove)
            {
                pdf.MetaData.RemoveMetaDataKey(key);
            }
        }

        private bool IsEssentialKey(string key)
        {
            // Keep only the basic display properties
            string[] essentialKeys = { "Title", "Author", "CreationDate", "ModifiedDate" };
            foreach (string essential in essentialKeys)
            {
                if (key.Equals(essential, StringComparison.OrdinalIgnoreCase))
                {
                    return true;
                }
            }
            return false;
        }
    }

    // Usage
    class Program
    {
        static void Main()
        {
            MetadataCleaner cleaner = new MetadataCleaner("Acme Corporation");
            PdfDocument pdf = PdfDocument.FromFile("report.pdf");
            cleaner.CleanMetadata(pdf);
            pdf.SaveAs("report-clean.pdf");
        }
    }

VB.NET:

    Imports IronPdf
    Imports System
    Imports System.Collections.Generic

    Public Class MetadataCleaner
        Private ReadOnly _defaultAuthor As String
        Private ReadOnly _defaultCreator As String

        Public Sub New(organizationName As String)
            _defaultAuthor = organizationName
            _defaultCreator = $"{organizationName} Document System"
        End Sub

        Public Sub CleanMetadata(pdf As PdfDocument)
            ' Replace standard metadata fields
            pdf.MetaData.Author = _defaultAuthor
            pdf.MetaData.Creator = _defaultCreator
            pdf.MetaData.Producer = ""
            pdf.MetaData.Subject = ""
            pdf.MetaData.Keywords = ""

            ' Normalize timestamps
            Dim now As DateTime = DateTime.Now
            pdf.MetaData.CreationDate = now
            pdf.MetaData.ModifiedDate = now

            ' Collect the non-essential keys first, then remove them,
            ' so the key collection is not modified while it is enumerated
            Dim keysToRemove As New List(Of String)()
            For Each key As String In pdf.MetaData.Keys()
                If Not IsEssentialKey(key) Then
                    keysToRemove.Add(key)
                End If
            Next
            For Each key As String In keysToRemove
                pdf.MetaData.RemoveMetaDataKey(key)
            Next
        End Sub

        Private Function IsEssentialKey(key As String) As Boolean
            ' Keep only the basic display properties
            Dim essentialKeys As String() = {"Title", "Author", "CreationDate", "ModifiedDate"}
            For Each essential As String In essentialKeys
                If key.Equals(essential, StringComparison.OrdinalIgnoreCase) Then
                    Return True
                End If
            Next
            Return False
        End Function
    End Class

    ' Usage
    Class Program
        Shared Sub Main()
            Dim cleaner As New MetadataCleaner("Acme Corporation")
            Dim pdf As PdfDocument = PdfDocument.FromFile("report.pdf")
            cleaner.CleanMetadata(pdf)
            pdf.SaveAs("report-clean.pdf")
        End Sub
    End Class

How do I sanitize a PDF to remove embedded scripts and hidden threats?
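PDF sanitization addresses security concerns that go beyond visible content and metadata. PDF files can contain JavaScript code, embedded executables, form actions that trigger external connections, and other potentially malicious elements. These features serve legitimate purposes such as interactive forms and multimedia content, but they also create attack vectors. Sanitizing a PDF removes these active elements while preserving the visual content. For more detail on the sanitization methods, see our guide to sanitizing PDFs.

IronPDF's Cleaner class handles sanitization by converting the PDF to an image format and then back again. The conversion strips JavaScript, embedded objects, form actions, and annotations while leaving the visual appearance intact. The library offers two sanitization methods with different characteristics.

Input: A PDF received from an external source that may contain JavaScript, embedded objects, or other potentially malicious active content.

C#:

    using IronPdf;

    // Load a PDF that may contain active content
    PdfDocument pdf = PdfDocument.FromFile("received-document.pdf");

    // Sanitize using SVG conversion
    // Faster processing, results in searchable text, slight layout variations possible
    PdfDocument sanitizedSvg = Cleaner.SanitizeWithSvg(pdf);
    sanitizedSvg.SaveAs("sanitized-svg.pdf");

    // Sanitize using Bitmap conversion
    // Slower processing, text becomes image (not searchable), exact visual reproduction
    PdfDocument sanitizedBitmap = Cleaner.SanitizeWithBitmap(pdf);
    sanitizedBitmap.SaveAs("sanitized-bitmap.pdf");

VB.NET:

    Imports IronPdf

    ' Load a PDF that may contain active content
    Dim pdf As PdfDocument = PdfDocument.FromFile("received-document.pdf")

    ' Sanitize using SVG conversion
    ' Faster processing, results in searchable text, slight layout variations possible
    Dim sanitizedSvg As PdfDocument = Cleaner.SanitizeWithSvg(pdf)
    sanitizedSvg.SaveAs("sanitized-svg.pdf")

    ' Sanitize using Bitmap conversion
    ' Slower processing, text becomes image (not searchable), exact visual reproduction
    Dim sanitizedBitmap As PdfDocument = Cleaner.SanitizeWithBitmap(pdf)
    sanitizedBitmap.SaveAs("sanitized-bitmap.pdf")

This code demonstrates the two sanitization methods provided by IronPDF's Cleaner class. SanitizeWithSvg converts the PDF through an intermediate SVG format, removing active content while keeping the text searchable. SanitizeWithBitmap first converts each page to an image, which produces an exact visual copy but renders the text as non-searchable graphics.

The SVG method runs faster and keeps text searchable, which suits documents that must remain indexable or accessible. The bitmap method produces an exact visual copy but turns text into images, so text selection and search no longer work. Choose according to what you need from the output document.

You can also apply rendering options during sanitization to adjust the output: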
:path=/static-assets/pdf/content-code-examples/tutorials/pdf-redaction-csharp/sanitize-with-options.cs using IronPdf; // Load the potentially unsafe document PdfDocument pdf = PdfDocument.FromFile("untrusted-source.pdf"); // Configure rendering options for sanitization var renderOptions = new ChromePdfRenderOptions { MarginTop = 10, MarginBottom = 10, MarginLeft = 10, MarginRight = 10 }; // Sanitize with custom options PdfDocument sanitized = Cleaner.SanitizeWithSvg(pdf, renderOptions); sanitized.SaveAs("untrusted-source-safe.pdf"); Imports IronPdf ' Load the potentially unsafe document Dim pdf As PdfDocument = PdfDocument.FromFile("untrusted-source.pdf") ' Configure rendering options for sanitization Dim renderOptions As New ChromePdfRenderOptions With { .MarginTop = 10, .MarginBottom = 10, .MarginLeft = 10, .MarginRight = 10 } ' Sanitize with custom options Dim sanitized As PdfDocument = Cleaner.SanitizeWithSvg(pdf, renderOptions) sanitized.SaveAs("untrusted-source-safe.pdf") $vbLabelText $csharpLabel 高安全性环境通常需要将消毒与其他保护措施结合起来: using IronPdf; using System; public class SecureDocumentProcessor { public PdfDocument ProcessUntrustedDocument(string inputPath) { // Load the document PdfDocument original = PdfDocument.FromFile(inputPath); // Step 1: Sanitize to remove active content PdfDocument sanitized = Cleaner.SanitizeWithSvg(original); // Step 2: Clean metadata sanitized.MetaData.Author = "Processed Document"; sanitized.MetaData.Creator = "Secure Processor"; sanitized.MetaData.Producer = ""; sanitized.MetaData.CreationDate = DateTime.Now; sanitized.MetaData.ModifiedDate = DateTime.Now; // Remove all custom metadata foreach (string key in sanitized.MetaData.Keys()) { if (key != "Title" && key != "Author" && key != "CreationDate" && key != "ModifiedDate") { sanitized.MetaData.RemoveMetaDataKey(key); } } return sanitized; } } // Usage class Program { static void Main() { SecureDocumentProcessor processor = new SecureDocumentProcessor(); PdfDocument safe = 
processor.ProcessUntrustedDocument("email-attachment.pdf"); safe.SaveAs("email-attachment-safe.pdf"); } } using IronPdf; using System; public class SecureDocumentProcessor { public PdfDocument ProcessUntrustedDocument(string inputPath) { // Load the document PdfDocument original = PdfDocument.FromFile(inputPath); // Step 1: Sanitize to remove active content PdfDocument sanitized = Cleaner.SanitizeWithSvg(original); // Step 2: Clean metadata sanitized.MetaData.Author = "Processed Document"; sanitized.MetaData.Creator = "Secure Processor"; sanitized.MetaData.Producer = ""; sanitized.MetaData.CreationDate = DateTime.Now; sanitized.MetaData.ModifiedDate = DateTime.Now; // Remove all custom metadata foreach (string key in sanitized.MetaData.Keys()) { if (key != "Title" && key != "Author" && key != "CreationDate" && key != "ModifiedDate") { sanitized.MetaData.RemoveMetaDataKey(key); } } return sanitized; } } // Usage class Program { static void Main() { SecureDocumentProcessor processor = new SecureDocumentProcessor(); PdfDocument safe = processor.ProcessUntrustedDocument("email-attachment.pdf"); safe.SaveAs("email-attachment-safe.pdf"); } } Imports IronPdf Imports System Public Class SecureDocumentProcessor Public Function ProcessUntrustedDocument(inputPath As String) As PdfDocument ' Load the document Dim original As PdfDocument = PdfDocument.FromFile(inputPath) ' Step 1: Sanitize to remove active content Dim sanitized As PdfDocument = Cleaner.SanitizeWithSvg(original) ' Step 2: Clean metadata sanitized.MetaData.Author = "Processed Document" sanitized.MetaData.Creator = "Secure Processor" sanitized.MetaData.Producer = "" sanitized.MetaData.CreationDate = DateTime.Now sanitized.MetaData.ModifiedDate = DateTime.Now ' Remove all custom metadata For Each key As String In sanitized.MetaData.Keys() If key <> "Title" AndAlso key <> "Author" AndAlso key <> "CreationDate" AndAlso key <> "ModifiedDate" Then sanitized.MetaData.RemoveMetaDataKey(key) End If Next Return sanitized 
End Function End Class ' Usage Module Program Sub Main() Dim processor As New SecureDocumentProcessor() Dim safe As PdfDocument = processor.ProcessUntrustedDocument("email-attachment.pdf") safe.SaveAs("email-attachment-safe.pdf") End Sub End Module $vbLabelText $csharpLabel 如何扫描 PDF 以查找安全漏洞? 在对文档进行处理或消毒之前,您可能需要评估它们包含哪些潜在威胁。 IronPDF 的 Cleaner.ScanPdf 方法使用 YARA 规则检查文档,YARA 规则是恶意软件分析和威胁检测中常用的模式定义。 扫描识别与恶意 PDF 文件相关的特征。 :path=/static-assets/pdf/content-code-examples/tutorials/pdf-redaction-csharp/scan-vulnerabilities.cs using IronPdf; // Load the document to scan PdfDocument pdf = PdfDocument.FromFile("suspicious-document.pdf"); // Scan using default YARA rules CleanerScanResult scanResult = Cleaner.ScanPdf(pdf); // Check the scan results bool threatsDetected = scanResult.IsDetected; int riskCount = scanResult.Risks.Count; // Process identified risks if (scanResult.IsDetected) { foreach (var risk in scanResult.Risks) { // Handle each identified risk } // Sanitize the document before use PdfDocument sanitized = Cleaner.SanitizeWithSvg(pdf); sanitized.SaveAs("suspicious-document-safe.pdf"); } Imports IronPdf ' Load the document to scan Dim pdf As PdfDocument = PdfDocument.FromFile("suspicious-document.pdf") ' Scan using default YARA rules Dim scanResult As CleanerScanResult = Cleaner.ScanPdf(pdf) ' Check the scan results Dim threatsDetected As Boolean = scanResult.IsDetected Dim riskCount As Integer = scanResult.Risks.Count ' Process identified risks If scanResult.IsDetected Then For Each risk In scanResult.Risks ' Handle each identified risk Next ' Sanitize the document before use Dim sanitized As PdfDocument = Cleaner.SanitizeWithSvg(pdf) sanitized.SaveAs("suspicious-document-safe.pdf") End If $vbLabelText $csharpLabel 您可以提供定制的 YARA 规则文件,以满足专门的检测要求。 具有特定威胁模型或合规需求的组织通常会针对特定漏洞模式维护自己的规则集。 :path=/static-assets/pdf/content-code-examples/tutorials/pdf-redaction-csharp/scan-custom-yara.cs using IronPdf; PdfDocument pdf = PdfDocument.FromFile("incoming-document.pdf"); // Scan 
with custom YARA rules string[] customYaraFiles = { "corporate-rules.yar", "industry-specific.yar" }; CleanerScanResult result = Cleaner.ScanPdf(pdf, customYaraFiles); if (result.IsDetected) { // Document triggered custom rules and requires review or sanitization PdfDocument sanitized = Cleaner.SanitizeWithSvg(pdf); sanitized.SaveAs("incoming-document-safe.pdf"); } Imports IronPdf Dim pdf As PdfDocument = PdfDocument.FromFile("incoming-document.pdf") ' Scan with custom YARA rules Dim customYaraFiles As String() = {"corporate-rules.yar", "industry-specific.yar"} Dim result As CleanerScanResult = Cleaner.ScanPdf(pdf, customYaraFiles) If result.IsDetected Then ' Document triggered custom rules and requires review or sanitization Dim sanitized As PdfDocument = Cleaner.SanitizeWithSvg(pdf) sanitized.SaveAs("incoming-document-safe.pdf") End If $vbLabelText $csharpLabel 将扫描集成到文档接收工作流程中有助于自动做出安全决策: using IronPdf; using System; using System.IO; public enum DocumentSafetyLevel { Safe, Suspicious, Dangerous } public class DocumentSecurityGateway { public DocumentSafetyLevel EvaluateDocument(string filePath) { PdfDocument pdf = PdfDocument.FromFile(filePath); CleanerScanResult scan = Cleaner.ScanPdf(pdf); if (!scan.IsDetected) { return DocumentSafetyLevel.Safe; } // Evaluate severity based on number of risks if (scan.Risks.Count > 5) { return DocumentSafetyLevel.Dangerous; } return DocumentSafetyLevel.Suspicious; } public PdfDocument ProcessIncomingDocument(string filePath, string outputDirectory) { DocumentSafetyLevel safety = EvaluateDocument(filePath); string fileName = Path.GetFileName(filePath); switch (safety) { case DocumentSafetyLevel.Safe: return PdfDocument.FromFile(filePath); case DocumentSafetyLevel.Suspicious: PdfDocument suspicious = PdfDocument.FromFile(filePath); return Cleaner.SanitizeWithSvg(suspicious); case DocumentSafetyLevel.Dangerous: throw new SecurityException($"Document {fileName} contains dangerous content"); default: throw new 
InvalidOperationException("Unknown safety level"); } } } using IronPdf; using System; using System.IO; public enum DocumentSafetyLevel { Safe, Suspicious, Dangerous } public class DocumentSecurityGateway { public DocumentSafetyLevel EvaluateDocument(string filePath) { PdfDocument pdf = PdfDocument.FromFile(filePath); CleanerScanResult scan = Cleaner.ScanPdf(pdf); if (!scan.IsDetected) { return DocumentSafetyLevel.Safe; } // Evaluate severity based on number of risks if (scan.Risks.Count > 5) { return DocumentSafetyLevel.Dangerous; } return DocumentSafetyLevel.Suspicious; } public PdfDocument ProcessIncomingDocument(string filePath, string outputDirectory) { DocumentSafetyLevel safety = EvaluateDocument(filePath); string fileName = Path.GetFileName(filePath); switch (safety) { case DocumentSafetyLevel.Safe: return PdfDocument.FromFile(filePath); case DocumentSafetyLevel.Suspicious: PdfDocument suspicious = PdfDocument.FromFile(filePath); return Cleaner.SanitizeWithSvg(suspicious); case DocumentSafetyLevel.Dangerous: throw new SecurityException($"Document {fileName} contains dangerous content"); default: throw new InvalidOperationException("Unknown safety level"); } } } Imports IronPdf Imports System Imports System.IO Public Enum DocumentSafetyLevel Safe Suspicious Dangerous End Enum Public Class DocumentSecurityGateway Public Function EvaluateDocument(filePath As String) As DocumentSafetyLevel Dim pdf As PdfDocument = PdfDocument.FromFile(filePath) Dim scan As CleanerScanResult = Cleaner.ScanPdf(pdf) If Not scan.IsDetected Then Return DocumentSafetyLevel.Safe End If ' Evaluate severity based on number of risks If scan.Risks.Count > 5 Then Return DocumentSafetyLevel.Dangerous End If Return DocumentSafetyLevel.Suspicious End Function Public Function ProcessIncomingDocument(filePath As String, outputDirectory As String) As PdfDocument Dim safety As DocumentSafetyLevel = EvaluateDocument(filePath) Dim fileName As String = Path.GetFileName(filePath) Select Case safety 
Case DocumentSafetyLevel.Safe Return PdfDocument.FromFile(filePath) Case DocumentSafetyLevel.Suspicious Dim suspicious As PdfDocument = PdfDocument.FromFile(filePath) Return Cleaner.SanitizeWithSvg(suspicious) Case DocumentSafetyLevel.Dangerous Throw New SecurityException($"Document {fileName} contains dangerous content") Case Else Throw New InvalidOperationException("Unknown safety level") End Select End Function End Class $vbLabelText $csharpLabel 如何构建完整的重反应和净化管道? 生产文档处理通常需要将多种保护技术结合到一个连贯的工作流程中。 一个完整的流水线可能会扫描传入文档以查找威胁,对通过初步筛选的文档进行消毒,应用文本和区域编辑,剥离元数据,并生成记录所有操作的审计日志。 本示例展示了这种综合方法。 using IronPdf; using IronSoftware.Drawing; using System; using System.Collections.Generic; using System.IO; using System.Text.RegularExpressions; public class DocumentProcessingResult { public string OriginalFile { get; set; } public string OutputFile { get; set; } public bool WasSanitized { get; set; } public int TextRedactionsApplied { get; set; } public int RegionRedactionsApplied { get; set; } public bool MetadataCleaned { get; set; } public List<string> SensitiveDataTypesFound { get; set; } = new List<string>(); public DateTime ProcessedAt { get; set; } public bool Success { get; set; } public string ErrorMessage { get; set; } } public class ComprehensiveDocumentProcessor { // Sensitive data patterns private readonly Dictionary<string, string> _sensitivePatterns = new Dictionary<string, string> { { "SSN", @"\b\d{3}-\d{2}-\d{4}\b" }, { "Credit Card", @"\b(?:\d{4}[-\s]?){3}\d{1,4}\b" }, { "Email", @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b" }, { "Phone", @"\b(?:\(\d{3}\)\s?|\d{3}[-.])\d{3}[-.]?\d{4}\b" } }; // Standard regions to redact (signature areas, photo locations) private readonly List<RectangleF> _standardRedactionRegions = new List<RectangleF> { new RectangleF(72, 72, 200, 50), // Bottom left signature new RectangleF(350, 72, 200, 50) // Bottom right signature }; private readonly string _organizationName; public ComprehensiveDocumentProcessor(string organizationName) 
{ _organizationName = organizationName; } public DocumentProcessingResult ProcessDocument( string inputPath, string outputPath, bool sanitize = true, bool redactPatterns = true, bool redactRegions = true, bool cleanMetadata = true, List<string> additionalTermsToRedact = null) { var result = new DocumentProcessingResult { OriginalFile = inputPath, OutputFile = outputPath, ProcessedAt = DateTime.Now }; try { // Load the document PdfDocument pdf = PdfDocument.FromFile(inputPath); // Step 1: Security scan CleanerScanResult scanResult = Cleaner.ScanPdf(pdf); if (scanResult.IsDetected && scanResult.Risks.Count > 10) { throw new SecurityException("Document contains too many security risks to process"); } // Step 2: Sanitization (if needed or requested) if (sanitize || scanResult.IsDetected) { pdf = Cleaner.SanitizeWithSvg(pdf); result.WasSanitized = true; } // Step 3: Pattern-based text redaction if (redactPatterns) { string fullText = pdf.ExtractAllText(); HashSet<string> valuesToRedact = new HashSet<string>(); foreach (var pattern in _sensitivePatterns) { Regex regex = new Regex(pattern.Value, RegexOptions.IgnoreCase); MatchCollection matches = regex.Matches(fullText); if (matches.Count > 0) { result.SensitiveDataTypesFound.Add($"{pattern.Key} ({matches.Count})"); foreach (Match match in matches) { valuesToRedact.Add(match.Value); } } } // Apply redactions foreach (string value in valuesToRedact) { pdf.RedactTextOnAllPages(value); result.TextRedactionsApplied++; } } // Step 4: Additional specific terms if (additionalTermsToRedact != null) { foreach (string term in additionalTermsToRedact) { pdf.RedactTextOnAllPages(term); result.TextRedactionsApplied++; } } // Step 5: Region-based redaction if (redactRegions) { foreach (RectangleF region in _standardRedactionRegions) { pdf.RedactRegionsOnAllPages(region); result.RegionRedactionsApplied++; } } // Step 6: Metadata cleaning if (cleanMetadata) { pdf.MetaData.Author = _organizationName; pdf.MetaData.Creator = 
$"{_organizationName} Document Processor";
                pdf.MetaData.Producer = "";
                pdf.MetaData.Subject = "";
                pdf.MetaData.Keywords = "";
                pdf.MetaData.CreationDate = DateTime.Now;
                pdf.MetaData.ModifiedDate = DateTime.Now;
                result.MetadataCleaned = true;
            }

            // Step 7: Save the processed document
            pdf.SaveAs(outputPath);
            result.Success = true;
        }
        catch (Exception ex)
        {
            result.Success = false;
            result.ErrorMessage = ex.Message;
        }

        return result;
    }
}

// Usage example
class Program
{
    static void Main()
    {
        var processor = new ComprehensiveDocumentProcessor("Acme Corporation");

        // Process a single document with all protections
        var result = processor.ProcessDocument(
            inputPath: "customer-application.pdf",
            outputPath: "customer-application-redacted.pdf",
            sanitize: true,
            redactPatterns: true,
            redactRegions: true,
            cleanMetadata: true,
            additionalTermsToRedact: new List<string> { "Project Alpha", "Internal Use Only" }
        );

        // Batch process multiple documents
        string[] inputFiles = Directory.GetFiles("incoming", "*.pdf");
        foreach (string file in inputFiles)
        {
            string outputFile = Path.Combine("processed", Path.GetFileName(file));
            processor.ProcessDocument(file, outputFile);
        }
    }
}

Imports IronPdf
Imports IronSoftware.Drawing
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions

Public Class DocumentProcessingResult
    Public Property OriginalFile As String
    Public Property
OutputFile As String
    Public Property WasSanitized As Boolean
    Public Property TextRedactionsApplied As Integer
    Public Property RegionRedactionsApplied As Integer
    Public Property MetadataCleaned As Boolean
    Public Property SensitiveDataTypesFound As List(Of String) = New List(Of String)()
    Public Property ProcessedAt As DateTime
    Public Property Success As Boolean
    Public Property ErrorMessage As String
End Class

Public Class ComprehensiveDocumentProcessor
    ' Sensitive data patterns
    Private ReadOnly _sensitivePatterns As Dictionary(Of String, String) = New Dictionary(Of String, String) From {
        {"SSN", "\b\d{3}-\d{2}-\d{4}\b"},
        {"Credit Card", "\b(?:\d{4}[-\s]?){3}\d{1,4}\b"},
        {"Email", "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"},
        {"Phone", "\b(?:\(\d{3}\)\s?|\d{3}[-.])\d{3}[-.]?\d{4}\b"}
    }

    ' Standard regions to redact (signature areas, photo locations)
    Private ReadOnly _standardRedactionRegions As List(Of RectangleF) = New List(Of RectangleF) From {
        New RectangleF(72, 72, 200, 50), ' Bottom left signature
        New RectangleF(350, 72, 200, 50) ' Bottom right signature
    }

    Private ReadOnly _organizationName As String

    Public Sub New(organizationName As String)
        _organizationName = organizationName
    End Sub

    Public Function ProcessDocument(
        inputPath As String,
        outputPath As String,
        Optional sanitize As Boolean = True,
        Optional redactPatterns As Boolean = True,
        Optional redactRegions As Boolean = True,
        Optional cleanMetadata As Boolean = True,
        Optional additionalTermsToRedact As List(Of String) = Nothing) As DocumentProcessingResult

        Dim result As New DocumentProcessingResult With {
            .OriginalFile = inputPath,
            .OutputFile = outputPath,
            .ProcessedAt = DateTime.Now
        }

        Try
            ' Load the document
            Dim pdf As PdfDocument = PdfDocument.FromFile(inputPath)

            ' Step 1: Security scan
            Dim scanResult As CleanerScanResult = Cleaner.ScanPdf(pdf)
            If scanResult.IsDetected AndAlso scanResult.Risks.Count > 10 Then
                ' Fully qualified because System.Security is not imported above
                Throw New System.Security.SecurityException("Document contains too many security risks to process")
            End
If

            ' Step 2: Sanitization (if needed or requested)
            If sanitize OrElse scanResult.IsDetected Then
                pdf = Cleaner.SanitizeWithSvg(pdf)
                result.WasSanitized = True
            End If

            ' Step 3: Pattern-based text redaction
            If redactPatterns Then
                Dim fullText As String = pdf.ExtractAllText()
                Dim valuesToRedact As New HashSet(Of String)()

                For Each pattern In _sensitivePatterns
                    Dim regex As New Regex(pattern.Value, RegexOptions.IgnoreCase)
                    Dim matches As MatchCollection = regex.Matches(fullText)

                    If matches.Count > 0 Then
                        result.SensitiveDataTypesFound.Add($"{pattern.Key} ({matches.Count})")
                        For Each match As Match In matches
                            valuesToRedact.Add(match.Value)
                        Next
                    End If
                Next

                ' Apply redactions
                For Each value As String In valuesToRedact
                    pdf.RedactTextOnAllPages(value)
                    result.TextRedactionsApplied += 1
                Next
            End If

            ' Step 4: Additional specific terms
            If additionalTermsToRedact IsNot Nothing Then
                For Each term As String In additionalTermsToRedact
                    pdf.RedactTextOnAllPages(term)
                    result.TextRedactionsApplied += 1
                Next
            End If

            ' Step 5: Region-based redaction
            If redactRegions Then
                For Each region As RectangleF In _standardRedactionRegions
                    pdf.RedactRegionsOnAllPages(region)
                    result.RegionRedactionsApplied += 1
                Next
            End If

            ' Step 6: Metadata cleaning
            If cleanMetadata Then
                pdf.MetaData.Author = _organizationName
                pdf.MetaData.Creator = $"{_organizationName} Document Processor"
                pdf.MetaData.Producer = ""
                pdf.MetaData.Subject = ""
                pdf.MetaData.Keywords = ""
                pdf.MetaData.CreationDate = DateTime.Now
                pdf.MetaData.ModifiedDate = DateTime.Now
                result.MetadataCleaned = True
            End If

            ' Step 7: Save the processed document
            pdf.SaveAs(outputPath)
            result.Success = True
        Catch ex As Exception
            result.Success = False
            result.ErrorMessage = ex.Message
        End Try

        Return result
    End Function
End Class

' Usage example
Class Program
    Shared Sub Main()
        Dim processor As New ComprehensiveDocumentProcessor("Acme Corporation")

        ' Process a single document with all protections
        Dim result = processor.ProcessDocument(
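            inputPath:="customer-application.pdf",
            outputPath:="customer-application-redacted.pdf",
            sanitize:=True,
            redactPatterns:=True,
            redactRegions:=True,
            cleanMetadata:=True,
            additionalTermsToRedact:=New List(Of String) From {"Project Alpha", "Internal Use Only"}
        )

        ' Batch process multiple documents
        Dim inputFiles As String() = Directory.GetFiles("incoming", "*.pdf")
        For Each file As String In inputFiles
            Dim outputFile As String = Path.Combine("processed", Path.GetFileName(file))
            processor.ProcessDocument(file, outputFile)
        Next
    End Sub
End Class

Input

The customer application form contains several types of sensitive data that need comprehensive protection: SSNs, credit card numbers, email addresses, and signature blocks.

Sample Output

This comprehensive processor combines every technique covered in this guide into a single configurable class. It scans for threats, sanitizes when necessary, finds and redacts sensitive patterns, applies region redactions, cleans metadata, and returns a DocumentProcessingResult that records what was done. You can adjust the sensitive-data patterns, redaction regions, and processing options to fit your requirements.

Next Steps

Protecting sensitive information in PDF documents takes more than surface-level measures. True redaction permanently removes content from the document structure. Pattern matching automates finding and removing data such as Social Security numbers, credit card details, and email addresses. Region-based redaction handles signatures, photos, and other graphical elements that text matching cannot. Metadata cleaning removes hidden information that could reveal authors, timestamps, or internal file paths. Sanitization removes embedded scripts and active content that pose security risks.

IronPDF provides all of these capabilities through a consistent, well-designed API that integrates naturally with C# and .NET development practices. The approaches demonstrated in this guide work on a single document or scale to batch processing thousands of files. Whether you are building compliance workflows for healthcare data, preparing legal documents for discovery, or simply making sure internal reports can be shared externally without risk, these techniques are the foundation of responsible document handling. For comprehensive security coverage, combine redaction with password protection and permissions and with digital signatures.

Ready to start building? Download IronPDF and try it for free. The library includes a free development license, so you can fully evaluate the redaction, text extraction, and sanitization features before moving to a production license. If you have questions about implementation or compliance workflows, contact our engineering support team.

Frequently Asked Questions

What is PDF redaction?
PDF redaction is the process of permanently removing sensitive information from a PDF document. This can include text, images, and metadata that must be hidden for privacy or compliance reasons.

How do I redact information in a PDF using C#?
You can redact information in a PDF from C# with IronPDF. It lets you permanently remove or hide text, images, and metadata in a PDF document so that the document meets privacy and compliance standards.

Why is PDF redaction important for compliance?
PDF redaction is essential for complying with standards such as HIPAA, GDPR, and PCI DSS because it helps keep sensitive data secure and prevents unauthorized access to confidential information.

Can IronPDF redact entire regions of a PDF?
Yes, IronPDF can redact entire regions of a PDF. You define the specific areas of the document that need to be hidden or removed for security purposes.

What types of data can be redacted with IronPDF?
IronPDF can redact several types of data in a PDF document, including text, images, and metadata, for comprehensive data privacy and security.

Does IronPDF support document sanitization?
Yes, IronPDF supports document sanitization, which cleans a PDF by removing hidden data or metadata that may not be visible but can still pose a privacy risk.

Can PDF redaction be automated with IronPDF?
Yes, IronPDF lets you automate the PDF redaction process in C#, which makes it easier to handle large numbers of files that need sensitive data removed.

How does IronPDF ensure that redaction is permanent?
IronPDF makes redaction permanent by removing the selected text and images from the document rather than just covering them, so they cannot be recovered or viewed.

Can IronPDF redact metadata in a PDF?
Yes, IronPDF can redact metadata in a PDF document, so that all forms of sensitive data, including hidden or background data, are thoroughly removed.

What are the benefits of using IronPDF for PDF redaction?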
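Using IronPDF for PDF redaction offers several benefits, such as helping ensure compliance with data protection regulations, strengthening document security, and providing an efficient, automated process for managing sensitive information.

Curtis Chau, Technical Writer

Curtis Chau holds a bachelor's degree in computer science from Carleton University and specializes in front-end development, with expertise in Node.js, TypeScript, JavaScript, and React. He is passionate about building intuitive, attractive user interfaces, and enjoys working with modern frameworks and creating well-structured, visually appealing manuals. Beyond development, Curtis has a strong interest in the Internet of Things (IoT) and explores new ways to integrate hardware and software. In his free time he enjoys gaming and building Discord bots, combining his love of technology with creativity.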