PDF Redaction in C#: Remove Sensitive Data and Sanitize Documents with IronPDF
using IronPDF 在 C# .NET 中進行 PDF 遮蔽處理,會從文件的內部結構中永久移除敏感內容,而非僅在視覺上進行遮蓋,因此無論進行多少次複製、搜尋或鑑識分析,皆無法復原原始資料。 這遠不止於在文字上覆蓋黑色矩形:IronPDF 提供基於正規表達式模式匹配的文字遮蔽、針對簽名與圖像的區域式遮蔽、元資料剝離、用於消除嵌入式腳本的文件淨化,以及漏洞掃描功能,為 .NET 開發人員提供一套完整的工具組,以滿足 HIPAA、GDPR 和 PCI DSS 合規的文件保護工作流程需求。
TL;DR:快速入門指南
本教學將介紹如何在 C# .NET 環境中永久移除 PDF 文件中的敏感內容,包括文字模式、圖片區域、元資料及嵌入式腳本。
- 適用對象:在醫療、法律、金融或政府領域處理敏感文件的 .NET 開發人員。
- 您將開發的內容:使用正規表達式模式匹配進行文字遮蔽(社會安全號碼、信用卡號、電子郵件),針對簽名和照片進行基於座標的區域遮蔽,清理元資料,執行 PDF 淨化以移除嵌入式腳本,以及基於 YARA 的漏洞掃描。
- 執行環境:.NET 10、.NET 8 LTS、.NET Framework 4.6.2 以上版本,以及 .NET Standard 2.0。所有操作均在本地執行,無需外部依賴項。
- 何時適用此方法:當您需要為法律取證、資訊自由法(FOIA)請求或對外發佈而分享文件時,同時需確保已移除的內容確實無法復原。
- 技術上的重要性:視覺疊加層會將原始文字保留在 PDF 的內容流中,使其可被恢復。 IronPDF 的遮蔽功能會從文件結構本身刪除字元資料,使其無法復原。
只需幾行程式碼,即可從 PDF 中遮蔽敏感文字:
-
using NuGet 套件管理員安裝 https://www.nuget.org/packages/IronPdf
PM > Install-Package IronPdf -
請複製並執行此程式碼片段。
using IronPdf; PdfDocument pdf = PdfDocument.FromFile("confidential-report.pdf"); pdf.RedactTextOnAllPages("CONFIDENTIAL"); pdf.SaveAs("redacted-report.pdf"); -
部署至您的生產環境進行測試
立即透過免費試用,在您的專案中開始使用 IronPDF
購買 IronPDF 或註冊 30 天試用版後,請在應用程式啟動時輸入您的授權金鑰。
IronPdf.License.LicenseKey = "KEY";
IronPdf.License.LicenseKey = "KEY";
Imports IronPdf
IronPdf.License.LicenseKey = "KEY"
立即透過免費試用,在您的專案中開始使用 IronPDF。
目錄
- 重點摘要:快速入門指南
- 從 PDF 文件中遮蔽文字
- 模式比對與自動遮蔽
- 基於區域的遮蔽處理
- 從 PDF 元資料中移除敏感資料
- .NET 中的 PDF 淨化處理
- 完整工作流程
- 後續步驟
"真實遮蔽"與"視覺疊加"有何區別?
對於處理敏感文件的人員而言,理解"真正遮蔽"與"視覺疊加"之間的區別至關重要。 許多工具和手動方法僅在表面上呈現刪除效果,實際上並未移除底層資料。 這種虛假的安全感已導致無數備受矚目的資料外洩事件與合規失敗案例。
視覺覆蓋法通常是在敏感內容上繪製不透明的圖形。 原始文字在 PDF 結構中保持完整。 查看文件的人會看到一個黑色矩形,但原始字元仍存在於檔案的內容流中。 選取頁面上的所有文字、使用輔助技術工具,或檢視原始 PDF 資料,即可揭露所有據稱被隱藏的內容。 曾有法庭案件因對方律師輕易地將經遮蔽處理的文件文件還原,而導致案件內容外洩。 政府機關曾不慎洩露機密資訊,這些資訊看似經過審查,但實際上完全可以還原。
真正的內容遮蔽運作方式有所不同。 當您使用 IronPDF 的遮蔽方法時,該函式庫會定位 PDF 內部結構中的指定文字,並將其完全移除。 字元資料已從內容串流中刪除。 視覺呈現內容已被遮蔽標記取代,通常為黑色矩形,但原始內容已從檔案中徹底消失。無論如何選取、複製或進行鑑識分析,皆無法復原已被永久刪除的內容。
IronPDF 透過在結構層面修改 PDF 檔案,實現真正的遮蔽處理。 RedactTextOnAllPages 方法及其變體會搜尋頁面內容,識別符合條件的文字,將其從文件物件模型中移除,並可選擇在該內容原先出現的位置繪製視覺標記。 此做法符合 NIST 等組織針對安全文件遮蔽所制定的準則。
其實際影響相當重大。 若您需要對外分享文件、提交法律取證文件、根據資訊自由請求發布紀錄,或在保護個人識別資訊的同時分發報告,唯有真正的遮蔽處理才能提供充分的保護。 視覺覆蓋層或許足以應付內部草稿——當您僅需將注意力從特定段落移開時——但絕不應依賴其進行實際的資料保護。 如需其他文件安全措施,請參閱我們關於 PDF 加密與數位簽名的指南。
How do I Redact PDF Text in C# Across an Entire Document?
最常見的刪除情境是移除文件中所有特定文字的出現位置。 您可能需要從報告中刪除個人姓名、從財務報表中移除帳戶號碼,或在對外發布前刪除內部參考代碼。 IronPDF 透過 RedactTextOnAllPages 方法讓這項操作變得簡單直觀。
輸入
一名員工記錄了包含個人資訊的文件,其中包含姓名、社會安全號碼及員工編號。
:path=/static-assets/pdf/content-code-examples/tutorials/pdf-redaction-csharp/redact-text-all-pages.cs
using IronPdf;
// Load the source document
PdfDocument pdf = PdfDocument.FromFile("employee-records.pdf");
// Redact an employee name from the entire document
pdf.RedactTextOnAllPages("John Smith");
// Redact a Social Security Number
pdf.RedactTextOnAllPages("123-45-6789");
// Redact an internal employee ID
pdf.RedactTextOnAllPages("EMP-2024-0042");
// Save the cleaned document
pdf.SaveAs("employee-records-redacted.pdf");
Imports IronPdf
' Load the source document
Dim pdf As PdfDocument = PdfDocument.FromFile("employee-records.pdf")
' Redact an employee name from the entire document
pdf.RedactTextOnAllPages("John Smith")
' Redact a Social Security Number
pdf.RedactTextOnAllPages("123-45-6789")
' Redact an internal employee ID
pdf.RedactTextOnAllPages("EMP-2024-0042")
' Save the cleaned document
pdf.SaveAs("employee-records-redacted.pdf")
此程式碼會載入一份包含員工資訊的 PDF 檔案,並透過針對每個值呼叫 RedactTextOnAllPages 來移除三項機密資料。 每次呼叫都會搜尋文件中的每一頁,並永久移除所有與員工姓名、社會安全號碼及內部識別碼相符的項目。
翻譯範例
預設行為是在原始文字被遮蔽的位置繪製黑色矩形,並在文件結構中將實際字元替換為星號。 此舉既可視覺化確認內容已遭遮蔽,亦能確保原始內容已完全移除。
當處理較長的文件或多個編輯目標時,您可以高效地串接這些呼叫:
:path=/static-assets/pdf/content-code-examples/tutorials/pdf-redaction-csharp/redact-text-list.cs
using IronPdf;
using System.Collections.Generic;
// Load the document once
PdfDocument pdf = PdfDocument.FromFile("quarterly-report.pdf");
// Define all terms that need redaction
List<string> sensitiveTerms = new List<string>
{
"Project Titan",
"Sarah Johnson",
"Budget: $4.2M",
"Q3-INTERNAL-2024",
"sarah.johnson@company.com"
};
// Redact each term
foreach (string term in sensitiveTerms)
{
pdf.RedactTextOnAllPages(term);
}
// Save the result
pdf.SaveAs("quarterly-report-public.pdf");
Imports IronPdf
Imports System.Collections.Generic
' Load the document once
Dim pdf As PdfDocument = PdfDocument.FromFile("quarterly-report.pdf")
' Define all terms that need redaction
Dim sensitiveTerms As New List(Of String) From {
"Project Titan",
"Sarah Johnson",
"Budget: $4.2M",
"Q3-INTERNAL-2024",
"sarah.johnson@company.com"
}
' Redact each term
For Each term As String In sensitiveTerms
pdf.RedactTextOnAllPages(term)
Next
' Save the result
pdf.SaveAs("quarterly-report-public.pdf")
當您已知有一份需移除的敏感值清單時,此模式效果甚佳。 文件載入一次後,所有編輯操作皆在記憶體中進行,最終結果將被儲存。 每個術語均獨立處理,因此術語間的局部匹配或格式差異不會影響其他翻譯結果。
如何僅對特定頁面的文字進行遮蔽處理?
有時您需要更精確地控制遮蔽內容的位置。 文件可能包含封面頁,其資訊應保持完整;或者您可能已知機密資料僅出現在特定段落中。 IronPDF 提供 RedactTextOnPage 用於單頁遮蔽,以及 RedactTextOnPages 用於針對多個特定頁面進行遮蔽。
輸入
一份多頁合約文件包,簽署頁上載有客戶名稱,財務條款則散見於文件各處的特定頁面。
:path=/static-assets/pdf/content-code-examples/tutorials/pdf-redaction-csharp/redact-specific-pages.cs
using IronPdf;
// Load the document
PdfDocument pdf = PdfDocument.FromFile("contract-bundle.pdf");
// Redact text only on page 1 (index 0)
pdf.RedactTextOnPage(0, "Client Name: Acme Corporation");
// Redact text on pages 3, 5, and 7 (indices 2, 4, 6)
int[] financialPages = { 2, 4, 6 };
pdf.RedactTextOnPages(financialPages, "Payment Terms: Net 30");
// Other pages remain untouched except for the specific redactions applied
pdf.SaveAs("contract-bundle-redacted.pdf");
Imports IronPdf
' Load the document
Dim pdf As PdfDocument = PdfDocument.FromFile("contract-bundle.pdf")
' Redact text only on page 1 (index 0)
pdf.RedactTextOnPage(0, "Client Name: Acme Corporation")
' Redact text on pages 3, 5, and 7 (indices 2, 4, 6)
Dim financialPages As Integer() = {2, 4, 6}
pdf.RedactTextOnPages(financialPages, "Payment Terms: Net 30")
' Other pages remain untouched except for the specific redactions applied
pdf.SaveAs("contract-bundle-redacted.pdf")
此程式碼示範了針對性遮蔽技術,其中 RedactTextOnPage 用於單一頁面,而 RedactTextOnPages 則用於多個特定頁面。 僅從第 1 頁(索引 0)移除客戶名稱,並從第 3、5 和 7 頁(索引 2、4、6)遮蔽付款條款,其餘頁面均保持不變。
翻譯範例
IronPDF 的頁面索引採用零起始制,即第一頁索引為 0,第二頁索引為 1,依此類推。 此寫法符合標準程式設計慣例,並與多數開發者對陣列存取的認知相符。
針對特定頁面進行處理,可提升處理大型文件時的效能。 與其掃描數百頁文件尋找僅出現在少數位置的文字,您可直接指示遮蔽引擎精確搜尋目標位置。 這對於可能需要處理數千份文件的批次處理情境至關重要。 為達到最大處理量,請考慮採用非同步與多執行緒技術。
:path=/static-assets/pdf/content-code-examples/tutorials/pdf-redaction-csharp/redact-large-document.cs
using IronPdf;
// Process a large document efficiently
PdfDocument pdf = PdfDocument.FromFile("annual-report-500-pages.pdf");
// We know from document structure that:
// - Executive summary with names is on pages 1-3
// - Financial data is on pages 45-60
// - Appendix with employee info is on pages 480-495
// Redact executive names from summary section
for (int i = 0; i <= 2; i++)
{
pdf.RedactTextOnPage(i, "CEO: Robert Williams");
pdf.RedactTextOnPage(i, "CFO: Maria Garcia");
}
// Redact specific financial figures from the financial section
int[] financialSection = { 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59 };
pdf.RedactTextOnPages(financialSection, "Net Revenue: $847M");
// Redact employee identifiers from appendix
for (int i = 479; i <= 494; i++)
{
pdf.RedactTextOnPage(i, "Employee ID:");
}
pdf.SaveAs("annual-report-public-release.pdf");
Imports IronPdf
' Process a large document efficiently
Dim pdf As PdfDocument = PdfDocument.FromFile("annual-report-500-pages.pdf")
' We know from document structure that:
' - Executive summary with names is on pages 1-3
' - Financial data is on pages 45-60
' - Appendix with employee info is on pages 480-495
' Redact executive names from summary section
For i As Integer = 0 To 2
pdf.RedactTextOnPage(i, "CEO: Robert Williams")
pdf.RedactTextOnPage(i, "CFO: Maria Garcia")
Next
' Redact specific financial figures from the financial section
Dim financialSection As Integer() = {44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59}
pdf.RedactTextOnPages(financialSection, "Net Revenue: $847M")
' Redact employee identifiers from appendix
For i As Integer = 479 To 494
pdf.RedactTextOnPage(i, "Employee ID:")
Next
pdf.SaveAs("annual-report-public-release.pdf")
這種針對性處理方式僅處理 500 頁文件中的相關段落,相較於逐頁掃描每個需遮蔽的詞彙,能顯著縮短執行時間。
如何自訂遮蔽內容的外觀?
IronPDF 提供多種參數,用於控制最終文件中遮蔽內容的呈現方式。 您可以調整大小寫敏感度、整詞匹配、是否繪製視覺化矩形框,以及在遮蔽內容處顯示的替代文字。
輸入
一份法律摘要,其中包含各類敏感術語,例如分類標籤、密碼及內部參考代碼,這些內容需要採取不同的遮蔽處理方式。
:path=/static-assets/pdf/content-code-examples/tutorials/pdf-redaction-csharp/customize-redaction-appearance.cs
using IronPdf;
// Load the document
PdfDocument pdf = PdfDocument.FromFile("legal-brief.pdf");
// Case-sensitive redaction: only matches exact case
// "CLASSIFIED" will be redacted but "classified" or "Classified" will not
pdf.RedactTextOnAllPages(
"CLASSIFIED",
CaseSensitive: true,
OnlyMatchWholeWords: true,
DrawRectangles: true,
ReplacementText: "[REDACTED]"
);
// Case-insensitive redaction: matches regardless of case
// Will redact "Secret", "SECRET", "secret", etc.
pdf.RedactTextOnAllPages(
"secret",
CaseSensitive: false,
OnlyMatchWholeWords: true,
DrawRectangles: true,
ReplacementText: "*****"
);
// Whole word disabled: matches partial strings too
// Will redact "password", "passwords", "mypassword123", etc.
pdf.RedactTextOnAllPages(
"password",
CaseSensitive: false,
OnlyMatchWholeWords: false,
DrawRectangles: true,
ReplacementText: "XXXXX"
);
// No visual rectangle: text is removed but no black box appears
// Useful when you want seamless removal without obvious redaction marks
pdf.RedactTextOnAllPages(
"internal-reference-code",
CaseSensitive: true,
OnlyMatchWholeWords: true,
DrawRectangles: false,
ReplacementText: ""
);
pdf.SaveAs("legal-brief-redacted.pdf");
Imports IronPdf
' Load the document
Dim pdf As PdfDocument = PdfDocument.FromFile("legal-brief.pdf")
' Case-sensitive redaction: only matches exact case
' "CLASSIFIED" will be redacted but "classified" or "Classified" will not
pdf.RedactTextOnAllPages(
"CLASSIFIED",
CaseSensitive:=True,
OnlyMatchWholeWords:=True,
DrawRectangles:=True,
ReplacementText:="[REDACTED]"
)
' Case-insensitive redaction: matches regardless of case
' Will redact "Secret", "SECRET", "secret", etc.
pdf.RedactTextOnAllPages(
"secret",
CaseSensitive:=False,
OnlyMatchWholeWords:=True,
DrawRectangles:=True,
ReplacementText:="*****"
)
' Whole word disabled: matches partial strings too
' Will redact "password", "passwords", "mypassword123", etc.
pdf.RedactTextOnAllPages(
"password",
CaseSensitive:=False,
OnlyMatchWholeWords:=False,
DrawRectangles:=True,
ReplacementText:="XXXXX"
)
' No visual rectangle: text is removed but no black box appears
' Useful when you want seamless removal without obvious redaction marks
pdf.RedactTextOnAllPages(
"internal-reference-code",
CaseSensitive:=True,
OnlyMatchWholeWords:=True,
DrawRectangles:=False,
ReplacementText:=""
)
pdf.SaveAs("legal-brief-redacted.pdf")
此程式碼示範了使用 RedactTextOnAllPages 的可選參數所設定的四種不同遮蔽配置。 其展示以下功能:區分大小寫的完全匹配(以"[REDACTED]"替代)、不區分大小寫的星號匹配、部分單字匹配(以捕捉"passwords"等變體),以及無視覺矩形的隱形移除,實現無縫的內容刪除。
翻譯範例
這些參數會根據您的需求發揮不同作用:
CaseSensitive 決定比對時是否區分大小寫。 法律文件常使用具有特定語義的大寫規則,因此透過區分大小寫的比對,可確保僅移除完全相符的內容。 處理大小寫不一的普通文字時,可能需要採用不區分大小寫的比對方式,以涵蓋所有出現的實例。
OnlyMatchWholeWords 控制搜尋結果是匹配完整單詞還是部分字串。 在遮蔽名稱時,通常應採用整詞匹配,以避免"Smith"意外遮蔽"Blacksmith"或"Smithfield"中的部分字詞。 在處理帳戶號碼前綴等資料時,可能需要採用部分匹配的方式以涵蓋各種變體。
DrawRectangles 指定在內容被移除的位置是否顯示黑框。 多數法規與法律情境要求須標示可見的刪除標記,以證明內容是經刻意刪除,而非意外遺漏。 內部工作流程可能傾向採用隱式移除,以確保輸出結果更為簡潔。
ReplacementText 定義了哪些字元會取代被遮蔽的內容。 常見的處理方式包括使用星號、標註"REDACTED"或保留為空字串。 若有人嘗試選取或複製被遮蔽的區域,文件結構中將顯示替代文字。
如何使用正規表達式來查找並遮蔽敏感模式?
當您有特定值需要移除時,遮蔽已知文字串的方法是可行的,但許多機密資料類型遵循可預測的模式,而非固定值。 社會安全號碼、信用卡號碼、電子郵件地址、電話號碼及日期均具有可識別的格式,可透過正規表達式進行比對。 建立基於模式的遮蔽系統,可讓您在無需預先知曉每個具體數值的情況下,從 PDF 內容中移除私人資訊。
IronPDF 的文字擷取功能結合遮蔽技術,可實現強大的模式比對工作流程。 您需先擷取文字,使用 .NET 正規表達式進行比對,然後對每個檢出的值進行遮蔽處理。
using IronPdf;
using System.Text.RegularExpressions;
using System.Co/llections.Generic;
public class PatternRedactor
{
// Common patterns for sensitive data
private static readonly Dictionary<string, string> SensitivePatterns = new Dictionary<string, string>
{
// US Social Security Number: 123-45-6789
{ "SSN", @"\b\d{3}-\d{2}-\d{4}\b" },
// Credit Card Numbers: various formats with 13-19 digits
{ "CreditCard", @"\b(?:\d{4}[-\s]?){3}\d{1,4}\b" },
// Email Addresses
{ "Email", @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b" },
// US Phone Numbers: (123) 456-7890 or 123-456-7890
{ "Phone", @"\b(?:\(\d{3}\)\s?|\d{3}[-.])\d{3}[-.]?\d{4}\b" },
// Dates: MM/DD/YYYY or MM-DD-YYYY
{ "Date", @"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b" },
// IP Addresses
{ "IPAddress", @"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b" }
};
public void RedactPatterns(string inputPath, string outputPath, params string[] patternNames)
{
// Load the PDF
PdfDocument pdf = PdfDocument.FromFile(inputPath);
// Extract all text from the document
string fullText = pdf.ExtractAllText();
// Track unique matches to avoid duplicate redaction attempts
HashSet<string> matchesToRedact = new HashSet<string>();
// Find all matches for requested patterns
foreach (string patternName in patternNames)
{
if (SensitivePatterns.TryGetValue(patternName, out string pattern))
{
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = regex.Matches(fullText);
foreach (Match match in matches)
{
matchesToRedact.Add(match.Value);
}
}
}
// Redact each unique match
foreach (string sensitiveValue in matchesToRedact)
{
pdf.RedactTextOnAllPages(sensitiveValue);
}
// Save the redacted document
pdf.SaveAs(outputPath);
}
}
// Usage example
class Program
{
static void Main()
{
PatternRedactor redactor = new PatternRedactor();
// Redact SSNs and credit cards from a financial document
redactor.RedactPatterns(
"customer-data.pdf",
"customer-data-safe.pdf",
"SSN", "CreditCard", "Email"
);
}
}
using IronPdf;
using System.Text.RegularExpressions;
using System.Co/llections.Generic;
public class PatternRedactor
{
// Common patterns for sensitive data
private static readonly Dictionary<string, string> SensitivePatterns = new Dictionary<string, string>
{
// US Social Security Number: 123-45-6789
{ "SSN", @"\b\d{3}-\d{2}-\d{4}\b" },
// Credit Card Numbers: various formats with 13-19 digits
{ "CreditCard", @"\b(?:\d{4}[-\s]?){3}\d{1,4}\b" },
// Email Addresses
{ "Email", @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b" },
// US Phone Numbers: (123) 456-7890 or 123-456-7890
{ "Phone", @"\b(?:\(\d{3}\)\s?|\d{3}[-.])\d{3}[-.]?\d{4}\b" },
// Dates: MM/DD/YYYY or MM-DD-YYYY
{ "Date", @"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b" },
// IP Addresses
{ "IPAddress", @"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b" }
};
public void RedactPatterns(string inputPath, string outputPath, params string[] patternNames)
{
// Load the PDF
PdfDocument pdf = PdfDocument.FromFile(inputPath);
// Extract all text from the document
string fullText = pdf.ExtractAllText();
// Track unique matches to avoid duplicate redaction attempts
HashSet<string> matchesToRedact = new HashSet<string>();
// Find all matches for requested patterns
foreach (string patternName in patternNames)
{
if (SensitivePatterns.TryGetValue(patternName, out string pattern))
{
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = regex.Matches(fullText);
foreach (Match match in matches)
{
matchesToRedact.Add(match.Value);
}
}
}
// Redact each unique match
foreach (string sensitiveValue in matchesToRedact)
{
pdf.RedactTextOnAllPages(sensitiveValue);
}
// Save the redacted document
pdf.SaveAs(outputPath);
}
}
// Usage example
class Program
{
static void Main()
{
PatternRedactor redactor = new PatternRedactor();
// Redact SSNs and credit cards from a financial document
redactor.RedactPatterns(
"customer-data.pdf",
"customer-data-safe.pdf",
"SSN", "CreditCard", "Email"
);
}
}
Imports IronPdf
Imports System.Text.RegularExpressions
Imports System.Collections.Generic
Public Class PatternRedactor
' Common patterns for sensitive data
Private Shared ReadOnly SensitivePatterns As New Dictionary(Of String, String) From {
' US Social Security Number: 123-45-6789
{"SSN", "\b\d{3}-\d{2}-\d{4}\b"},
' Credit Card Numbers: various formats with 13-19 digits
{"CreditCard", "\b(?:\d{4}[-\s]?){3}\d{1,4}\b"},
' Email Addresses
{"Email", "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b"},
' US Phone Numbers: (123) 456-7890 or 123-456-7890
{"Phone", "\b(?:\(\d{3}\)\s?|\d{3}[-.])\d{3}[-.]?\d{4}\b"},
' Dates: MM/DD/YYYY or MM-DD-YYYY
{"Date", "\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"},
' IP Addresses
{"IPAddress", "\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"}
}
Public Sub RedactPatterns(inputPath As String, outputPath As String, ParamArray patternNames As String())
' Load the PDF
Dim pdf As PdfDocument = PdfDocument.FromFile(inputPath)
' Extract all text from the document
Dim fullText As String = pdf.ExtractAllText()
' Track unique matches to avoid duplicate redaction attempts
Dim matchesToRedact As New HashSet(Of String)()
' Find all matches for requested patterns
For Each patternName As String In patternNames
Dim pattern As String = Nothing
If SensitivePatterns.TryGetValue(patternName, pattern) Then
Dim regex As New Regex(pattern, RegexOptions.IgnoreCase)
Dim matches As MatchCollection = regex.Matches(fullText)
For Each match As Match In matches
matchesToRedact.Add(match.Value)
Next
End If
Next
' Redact each unique match
For Each sensitiveValue As String In matchesToRedact
pdf.RedactTextOnAllPages(sensitiveValue)
Next
' Save the redacted document
pdf.SaveAs(outputPath)
End Sub
End Class
' Usage example
Class Program
Shared Sub Main()
Dim redactor As New PatternRedactor()
' Redact SSNs and credit cards from a financial document
redactor.RedactPatterns(
"customer-data.pdf",
"customer-data-safe.pdf",
"SSN", "CreditCard", "Email"
)
End Sub
End Class
這種基於模式的方法具有良好的擴展性,因為您只需定義一次模式,即可將其套用至任何文件。 新增資料類型僅需在字典中加入新的正規表達式模式即可。
如何建立可重複使用的敏感資料掃描器?
在生產環境中,您通常需要掃描文件並報告其中包含哪些機密資訊,才能決定是否進行遮蔽處理。 這有助於合規稽核,並允許人工審查遮蔽決策。 以下類別提供掃描功能及資料遮蔽功能。
using IronPdf;
using System.Co/llections.Generic;
using System.Text.RegularExpressions;
using System.Linq;
public class SensitiveDataMatch
{
public string PatternType { get; set; }
public string Value { get; set; }
public int PageNumber { get; set; }
}
public class ScanResult
{
public string FilePath { get; set; }
public List<SensitiveDataMatch> Matches { get; set; } = new List<SensitiveDataMatch>();
public bool ContainsSensitiveData => Matches.Co/unt > 0;
public Dictionary<string, int> GetSummary()
{
return Matches.GroupBy(m => m.PatternType)
.ToDictionary(g => g.Key, g => g.Co/unt());
}
}
public class DocumentScanner
{
private readonly Dictionary<string, string> _patterns;
public DocumentScanner()
{
_patterns = new Dictionary<string, string>
{
{ "Social Security Number", @"\b\d{3}-\d{2}-\d{4}\b" },
{ "Credit Card", @"\b(?:\d{4}[-\s]?){3}\d{1,4}\b" },
{ "Email Address", @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b" },
{ "Phone Number", @"\b(?:\(\d{3}\)\s?|\d{3}[-.])\d{3}[-.]?\d{4}\b" },
{ "Date of Birth Pattern", @"\b(?:DOB|Date of Birth|Birth Date)[:\s]+\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b" }
};
}
public ScanResult ScanDocument(string filePath)
{
ScanResult result = new ScanResult { FilePath = filePath };
PdfDocument pdf = PdfDocument.FromFile(filePath);
// Scan each page individually to track location
for (int pageIndex = 0; pageIndex < pdf.PageCount; pageIndex++)
{
string pageText = pdf.ExtractTextFromPage(pageIndex);
foreach (var pattern in _patterns)
{
Regex regex = new Regex(pattern.Value, RegexOptions.IgnoreCase);
MatchCollection matches = regex.Matches(pageText);
foreach (Match match in matches)
{
result.Matches.Add(new SensitiveDataMatch
{
PatternType = pattern.Key,
Value = MaskValue(match.Value, pattern.Key),
PageNumber = pageIndex + 1
});
}
}
}
return result;
}
// Partially mask values for safe storage
private string MaskValue(string value, string patternType)
{
if (patternType == "Social Security Number" && value.Length >= 4)
{
return "XXX-XX-" + value.Substring(value.Length - 4);
}
if (patternType == "Credit Card" && value.Length >= 4)
{
return "****-****-****-" + value.Substring(value.Length - 4);
}
if (patternType == "Email Address")
{
int atIndex = value.IndexOf('@');
if (atIndex > 2)
{
return value.Substring(0, 2) + "***" + value.Substring(atIndex);
}
}
return value.Length > 4 ? value.Substring(0, 2) + "***" : "****";
}
public void ScanAndRedact(string inputPath, string outputPath)
{
// First scan to identify sensitive data
ScanResult scanResult = ScanDocument(inputPath);
if (!scanResult.Co/ntainsSensitiveData)
{
return;
}
// Load document for redaction
PdfDocument pdf = PdfDocument.FromFile(inputPath);
// Extract unique actual values (not masked) for redaction
string fullText = pdf.ExtractAllText();
HashSet<string> valuesToRedact = new HashSet<string>();
foreach (var pattern in _patterns)
{
Regex regex = new Regex(pattern.Value, RegexOptions.IgnoreCase);
foreach (Match match in regex.Matches(fullText))
{
valuesToRedact.Add(match.Value);
}
}
// Apply redactions
foreach (string value in valuesToRedact)
{
pdf.RedactTextOnAllPages(value);
}
pdf.SaveAs(outputPath);
}
}
// Usage
class Program
{
static void Main()
{
DocumentScanner scanner = new DocumentScanner();
// Scan only (for audit purposes)
ScanResult result = scanner.ScanDocument("application-form.pdf");
var summary = result.GetSummary();
// Scan and redact in one operation
scanner.ScanAndRedact("application-form.pdf", "application-form-redacted.pdf");
}
}
using IronPdf;
using System.Co/llections.Generic;
using System.Text.RegularExpressions;
using System.Linq;
public class SensitiveDataMatch
{
public string PatternType { get; set; }
public string Value { get; set; }
public int PageNumber { get; set; }
}
public class ScanResult
{
public string FilePath { get; set; }
public List<SensitiveDataMatch> Matches { get; set; } = new List<SensitiveDataMatch>();
public bool ContainsSensitiveData => Matches.Co/unt > 0;
public Dictionary<string, int> GetSummary()
{
return Matches.GroupBy(m => m.PatternType)
.ToDictionary(g => g.Key, g => g.Co/unt());
}
}
public class DocumentScanner
{
private readonly Dictionary<string, string> _patterns;
public DocumentScanner()
{
_patterns = new Dictionary<string, string>
{
{ "Social Security Number", @"\b\d{3}-\d{2}-\d{4}\b" },
{ "Credit Card", @"\b(?:\d{4}[-\s]?){3}\d{1,4}\b" },
{ "Email Address", @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b" },
{ "Phone Number", @"\b(?:\(\d{3}\)\s?|\d{3}[-.])\d{3}[-.]?\d{4}\b" },
{ "Date of Birth Pattern", @"\b(?:DOB|Date of Birth|Birth Date)[:\s]+\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b" }
};
}
public ScanResult ScanDocument(string filePath)
{
ScanResult result = new ScanResult { FilePath = filePath };
PdfDocument pdf = PdfDocument.FromFile(filePath);
// Scan each page individually to track location
for (int pageIndex = 0; pageIndex < pdf.PageCount; pageIndex++)
{
string pageText = pdf.ExtractTextFromPage(pageIndex);
foreach (var pattern in _patterns)
{
Regex regex = new Regex(pattern.Value, RegexOptions.IgnoreCase);
MatchCollection matches = regex.Matches(pageText);
foreach (Match match in matches)
{
result.Matches.Add(new SensitiveDataMatch
{
PatternType = pattern.Key,
Value = MaskValue(match.Value, pattern.Key),
PageNumber = pageIndex + 1
});
}
}
}
return result;
}
// Partially mask values for safe storage
private string MaskValue(string value, string patternType)
{
if (patternType == "Social Security Number" && value.Length >= 4)
{
return "XXX-XX-" + value.Substring(value.Length - 4);
}
if (patternType == "Credit Card" && value.Length >= 4)
{
return "****-****-****-" + value.Substring(value.Length - 4);
}
if (patternType == "Email Address")
{
int atIndex = value.IndexOf('@');
if (atIndex > 2)
{
return value.Substring(0, 2) + "***" + value.Substring(atIndex);
}
}
return value.Length > 4 ? value.Substring(0, 2) + "***" : "****";
}
public void ScanAndRedact(string inputPath, string outputPath)
{
// First scan to identify sensitive data
ScanResult scanResult = ScanDocument(inputPath);
if (!scanResult.Co/ntainsSensitiveData)
{
return;
}
// Load document for redaction
PdfDocument pdf = PdfDocument.FromFile(inputPath);
// Extract unique actual values (not masked) for redaction
string fullText = pdf.ExtractAllText();
HashSet<string> valuesToRedact = new HashSet<string>();
foreach (var pattern in _patterns)
{
Regex regex = new Regex(pattern.Value, RegexOptions.IgnoreCase);
foreach (Match match in regex.Matches(fullText))
{
valuesToRedact.Add(match.Value);
}
}
// Apply redactions
foreach (string value in valuesToRedact)
{
pdf.RedactTextOnAllPages(value);
}
pdf.SaveAs(outputPath);
}
}
// Usage
class Program
{
static void Main()
{
DocumentScanner scanner = new DocumentScanner();
// Scan only (for audit purposes)
ScanResult result = scanner.ScanDocument("application-form.pdf");
var summary = result.GetSummary();
// Scan and redact in one operation
scanner.ScanAndRedact("application-form.pdf", "application-form-redacted.pdf");
}
}
Imports IronPdf
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports System.Linq
Public Class SensitiveDataMatch
Public Property PatternType As String
Public Property Value As String
Public Property PageNumber As Integer
End Class
Public Class ScanResult
Public Property FilePath As String
Public Property Matches As List(Of SensitiveDataMatch) = New List(Of SensitiveDataMatch)()
Public ReadOnly Property ContainsSensitiveData As Boolean
Get
Return Matches.Count > 0
End Get
End Property
Public Function GetSummary() As Dictionary(Of String, Integer)
Return Matches.GroupBy(Function(m) m.PatternType) _
.ToDictionary(Function(g) g.Key, Function(g) g.Count())
End Function
End Class
Public Class DocumentScanner
Private ReadOnly _patterns As Dictionary(Of String, String)
Public Sub New()
_patterns = New Dictionary(Of String, String) From {
{"Social Security Number", "\b\d{3}-\d{2}-\d{4}\b"},
{"Credit Card", "\b(?:\d{4}[-\s]?){3}\d{1,4}\b"},
{"Email Address", "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b"},
{"Phone Number", "\b(?:\(\d{3}\)\s?|\d{3}[-.])\d{3}[-.]?\d{4}\b"},
{"Date of Birth Pattern", "\b(?:DOB|Date of Birth|Birth Date)[:\s]+\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"}
}
End Sub
Public Function ScanDocument(filePath As String) As ScanResult
Dim result As New ScanResult With {.FilePath = filePath}
Dim pdf As PdfDocument = PdfDocument.FromFile(filePath)
' Scan each page individually to track location
For pageIndex As Integer = 0 To pdf.PageCount - 1
Dim pageText As String = pdf.ExtractTextFromPage(pageIndex)
For Each pattern In _patterns
Dim regex As New Regex(pattern.Value, RegexOptions.IgnoreCase)
Dim matches As MatchCollection = regex.Matches(pageText)
For Each match As Match In matches
result.Matches.Add(New SensitiveDataMatch With {
.PatternType = pattern.Key,
.Value = MaskValue(match.Value, pattern.Key),
.PageNumber = pageIndex + 1
})
Next
Next
Next
Return result
End Function
' Partially mask values for safe storage
Private Function MaskValue(value As String, patternType As String) As String
If patternType = "Social Security Number" AndAlso value.Length >= 4 Then
Return "XXX-XX-" & value.Substring(value.Length - 4)
End If
If patternType = "Credit Card" AndAlso value.Length >= 4 Then
Return "****-****-****-" & value.Substring(value.Length - 4)
End If
If patternType = "Email Address" Then
Dim atIndex As Integer = value.IndexOf("@"c)
If atIndex > 2 Then
Return value.Substring(0, 2) & "***" & value.Substring(atIndex)
End If
End If
Return If(value.Length > 4, value.Substring(0, 2) & "***", "****")
End Function
Public Sub ScanAndRedact(inputPath As String, outputPath As String)
' First scan to identify sensitive data
Dim scanResult As ScanResult = ScanDocument(inputPath)
If Not scanResult.ContainsSensitiveData Then
Return
End If
' Load document for redaction
Dim pdf As PdfDocument = PdfDocument.FromFile(inputPath)
' Extract unique actual values (not masked) for redaction
Dim fullText As String = pdf.ExtractAllText()
Dim valuesToRedact As New HashSet(Of String)()
For Each pattern In _patterns
Dim regex As New Regex(pattern.Value, RegexOptions.IgnoreCase)
For Each match As Match In regex.Matches(fullText)
valuesToRedact.Add(match.Value)
Next
Next
' Apply redactions
For Each value As String In valuesToRedact
pdf.RedactTextOnAllPages(value)
Next
pdf.SaveAs(outputPath)
End Sub
End Class
' Usage
Module Program
Sub Main()
Dim scanner As New DocumentScanner()
' Scan only (for audit purposes)
Dim result As ScanResult = scanner.ScanDocument("application-form.pdf")
Dim summary = result.GetSummary()
' Scan and redact in one operation
scanner.ScanAndRedact("application-form.pdf", "application-form-redacted.pdf")
End Sub
End Module
該掃描工具可在任何修改發生之前,讓使用者清楚掌握現存的機密資訊。 這有助於符合合規工作流程,讓您能保存所發現及移除的內容的文件。 遮蔽功能可確保日誌檔案與報告本身不會成為資料外洩的來源。
如何在 PDF 中遮蔽特定區域或範圍?
文字遮蔽功能能有效處理基於字元的內容,但 PDF 檔案通常包含敏感資訊,其形式往往無法透過文字比對來處理。 簽名、照片、手寫註解、印章及圖形元素需採取不同的處理方式。 基於區域的遮蔽功能可讓您透過座標指定矩形區域,並永久遮蔽該範圍內的所有內容。
IronPDF 使用 RectangleF 結構來定義遮蔽區域。 您需先指定左上角的 X 和 Y 座標,接著指定區域的寬度和高度。 座標以點為單位,從頁面的左下角開始計算,這與 PDF/A 規格的座標系統相符。
輸入
一份包含手寫簽名及身分證件照片的簽署協議文件,需透過基於座標的區域標定功能進行遮蔽處理。
:path=/static-assets/pdf/content-code-examples/tutorials/pdf-redaction-csharp/redact-region-basic.cs
using IronPdf;
using IronSoftware.Drawing;
// Load a document with signature blocks and photos
PdfDocument pdf = PdfDocument.FromFile("signed-agreement.pdf");
// Define a region for a signature block
// Located 100 points from left, 650 points from bottom
// Width of 200 points, height of 50 points
RectangleF signatureRegion = new RectangleF(100, 650, 200, 50);
// Redact the signature region on all pages
pdf.RedactRegionsOnAllPages(signatureRegion);
// Define a region for a photo ID in the upper right
RectangleF photoRegion = new RectangleF(450, 700, 100, 120);
pdf.RedactRegionsOnAllPages(photoRegion);
// Save the document with regions redacted
pdf.SaveAs("signed-agreement-redacted.pdf");
Imports IronPdf
Imports IronSoftware.Drawing
' Load a document with signature blocks and photos
Dim pdf As PdfDocument = PdfDocument.FromFile("signed-agreement.pdf")
' Define a region for a signature block
' Located 100 points from left, 650 points from bottom
' Width of 200 points, height of 50 points
Dim signatureRegion As New RectangleF(100, 650, 200, 50)
' Redact the signature region on all pages
pdf.RedactRegionsOnAllPages(signatureRegion)
' Define a region for a photo ID in the upper right
Dim photoRegion As New RectangleF(450, 700, 100, 120)
pdf.RedactRegionsOnAllPages(photoRegion)
' Save the document with regions redacted
pdf.SaveAs("signed-agreement-redacted.pdf")
此程式碼使用 RectangleF 結構來定義需遮蔽的矩形區域。 簽名區位於座標 (100, 650),面積為 200x50 像素;照片區位於 (450, 700),面積為 100x120 像素。 RedactRegionsOnAllPages 方法會在所有頁面的這些區域上覆蓋黑色矩形。
翻譯範例
要確定正確的座標,通常需要進行一些實驗或測量。 PDF 頁面通常採用一種座標系統,其中一點等於 1/72 英吋。 標準的美國信紙頁面寬度為 612 點,高度為 792 點。 A4 頁面的尺寸約為 595 × 842 點。 可使用在移動游標時顯示座標的 PDF 檢視工具,或透過程式碼方式擷取頁面尺寸:
:path=/static-assets/pdf/content-code-examples/tutorials/pdf-redaction-csharp/redact-region-dimensions.cs
using IronPdf;
using IronSoftware.Drawing;
PdfDocument pdf = PdfDocument.FromFile("form-document.pdf");
// Get dimensions of the first page
var pageInfo = pdf.Pages[0];
// Calculate regions relative to page dimensions
// Redact the bottom quarter of the page where signatures appear
float signatureAreaHeight = (float)(pageInfo.Height / 4);
RectangleF bottomQuarter = new RectangleF(
0, // Start at left edge
0, // Start at bottom
(float)pageInfo.Width, // Full page width
signatureAreaHeight // Quarter of page height
);
pdf.RedactRegionsOnAllPages(bottomQuarter);
// Redact a header area at the top containing letterhead with address
float headerHeight = 100;
RectangleF headerArea = new RectangleF(
0,
(float)(pageInfo.Height - headerHeight), // Position from bottom
(float)pageInfo.Width,
headerHeight
);
pdf.RedactRegionsOnAllPages(headerArea);
pdf.SaveAs("form-document-redacted.pdf");
Imports IronPdf
Imports IronSoftware.Drawing
Dim pdf As PdfDocument = PdfDocument.FromFile("form-document.pdf")
' Get dimensions of the first page
Dim pageInfo = pdf.Pages(0)
' Calculate regions relative to page dimensions
' Redact the bottom quarter of the page where signatures appear
Dim signatureAreaHeight As Single = CSng(pageInfo.Height / 4)
Dim bottomQuarter As New RectangleF(0, 0, CSng(pageInfo.Width), signatureAreaHeight)
pdf.RedactRegionsOnAllPages(bottomQuarter)
' Redact a header area at the top containing letterhead with address
Dim headerHeight As Single = 100
Dim headerArea As New RectangleF(0, CSng(pageInfo.Height - headerHeight), CSng(pageInfo.Width), headerHeight)
pdf.RedactRegionsOnAllPages(headerArea)
pdf.SaveAs("form-document-redacted.pdf")
如何對不同頁面上的多個區域進行遮蔽處理?
複雜的文件通常需要在不同頁面上對不同區域進行遮蔽處理。 多頁表單中的簽名欄位位置可能各異,或不同頁面可能在特定位置包含照片、印章或其他圖形元素。 IronPDF 包含針對特定頁面的方法,用於對指定區域進行遮蔽處理。
:path=/static-assets/pdf/content-code-examples/tutorials/pdf-redaction-csharp/redact-multiple-regions.cs
using IronPdf;
using IronSoftware.Drawing;
PdfDocument pdf = PdfDocument.FromFile("multi-page-application.pdf");
// Define page-specific redaction regions
// Page 1: Cover page with applicant photo
RectangleF page1Photo = new RectangleF(450, 600, 120, 150);
pdf.RedactRegionOnPage(0, page1Photo);
// Page 2: Personal information section
RectangleF page2InfoBlock = new RectangleF(50, 400, 250, 200);
pdf.RedactRegionOnPage(1, page2InfoBlock);
// Pages 3-5: Signature lines at the same position
RectangleF signatureLine = new RectangleF(100, 100, 200, 40);
int[] signaturePages = { 2, 3, 4 };
pdf.RedactRegionOnPages(signaturePages, signatureLine);
// Page 6: Multiple regions - notary stamp and witness signature
RectangleF notaryStamp = new RectangleF(400, 150, 150, 150);
RectangleF witnessSignature = new RectangleF(100, 150, 200, 40);
pdf.RedactRegionOnPage(5, notaryStamp);
pdf.RedactRegionOnPage(5, witnessSignature);
pdf.SaveAs("multi-page-application-redacted.pdf");
Imports IronPdf
Imports IronSoftware.Drawing
Dim pdf As PdfDocument = PdfDocument.FromFile("multi-page-application.pdf")
' Define page-specific redaction regions
' Page 1: Cover page with applicant photo
Dim page1Photo As New RectangleF(450, 600, 120, 150)
pdf.RedactRegionOnPage(0, page1Photo)
' Page 2: Personal information section
Dim page2InfoBlock As New RectangleF(50, 400, 250, 200)
pdf.RedactRegionOnPage(1, page2InfoBlock)
' Pages 3-5: Signature lines at the same position
Dim signatureLine As New RectangleF(100, 100, 200, 40)
Dim signaturePages As Integer() = {2, 3, 4}
pdf.RedactRegionOnPages(signaturePages, signatureLine)
' Page 6: Multiple regions - notary stamp and witness signature
Dim notaryStamp As New RectangleF(400, 150, 150, 150)
Dim witnessSignature As New RectangleF(100, 150, 200, 40)
pdf.RedactRegionOnPage(5, notaryStamp)
pdf.RedactRegionOnPage(5, witnessSignature)
pdf.SaveAs("multi-page-application-redacted.pdf")
對於版面配置一致的文件,可透過可重複使用的區域定義來提升效率:
using IronPdf;
using IronSoftware.Drawing;
public class FormRegions
{
// Standard form regions based on common templates
public static RectangleF HeaderLogo => new RectangleF(20, 720, 150, 60);
public static RectangleF SignatureBlock => new RectangleF(72, 72, 200, 50);
public static RectangleF DateField => new RectangleF(400, 72, 120, 20);
public static RectangleF PhotoId => new RectangleF(480, 650, 100, 130);
public static RectangleF AddressBlock => new RectangleF(72, 600, 250, 80);
}
class Program
{
static void Main()
{
PdfDocument pdf = PdfDocument.FromFile("standard-form.pdf");
// Apply standard redactions using predefined regions
pdf.RedactRegionsOnAllPages(FormRegions.SignatureBlock);
pdf.RedactRegionsOnAllPages(FormRegions.DateField);
pdf.RedactRegionOnPage(0, FormRegions.PhotoId);
pdf.SaveAs("standard-form-redacted.pdf");
}
}
using IronPdf;
using IronSoftware.Drawing;
public class FormRegions
{
// Standard form regions based on common templates
public static RectangleF HeaderLogo => new RectangleF(20, 720, 150, 60);
public static RectangleF SignatureBlock => new RectangleF(72, 72, 200, 50);
public static RectangleF DateField => new RectangleF(400, 72, 120, 20);
public static RectangleF PhotoId => new RectangleF(480, 650, 100, 130);
public static RectangleF AddressBlock => new RectangleF(72, 600, 250, 80);
}
class Program
{
static void Main()
{
PdfDocument pdf = PdfDocument.FromFile("standard-form.pdf");
// Apply standard redactions using predefined regions
pdf.RedactRegionsOnAllPages(FormRegions.SignatureBlock);
pdf.RedactRegionsOnAllPages(FormRegions.DateField);
pdf.RedactRegionOnPage(0, FormRegions.PhotoId);
pdf.SaveAs("standard-form-redacted.pdf");
}
}
Imports IronPdf
Imports IronSoftware.Drawing
Public Class FormRegions
' Standard form regions based on common templates
Public Shared ReadOnly Property HeaderLogo As RectangleF
Get
Return New RectangleF(20, 720, 150, 60)
End Get
End Property
Public Shared ReadOnly Property SignatureBlock As RectangleF
Get
Return New RectangleF(72, 72, 200, 50)
End Get
End Property
Public Shared ReadOnly Property DateField As RectangleF
Get
Return New RectangleF(400, 72, 120, 20)
End Get
End Property
Public Shared ReadOnly Property PhotoId As RectangleF
Get
Return New RectangleF(480, 650, 100, 130)
End Get
End Property
Public Shared ReadOnly Property AddressBlock As RectangleF
Get
Return New RectangleF(72, 600, 250, 80)
End Get
End Property
End Class
Module Program
Sub Main()
Dim pdf As PdfDocument = PdfDocument.FromFile("standard-form.pdf")
' Apply standard redactions using predefined regions
pdf.RedactRegionsOnAllPages(FormRegions.SignatureBlock)
pdf.RedactRegionsOnAllPages(FormRegions.DateField)
pdf.RedactRegionOnPage(0, FormRegions.PhotoId)
pdf.SaveAs("standard-form-redacted.pdf")
End Sub
End Module
如何移除可能洩露敏感資訊的元資料?
PDF 元資料是常被忽略的資訊外洩來源。 每個 PDF 檔案都包含可能洩露敏感資訊的屬性:作者姓名與使用者名稱、建立文件所使用的軟體、建立與修改時間戳記、原始檔案名稱、修訂歷史紀錄,以及各類應用程式所新增的自訂屬性。 在將文件分享至外部之前,移除或清理這些元資料至關重要。 如需元資料操作的完整概述,請參閱我們的元資料操作指南。
IronPDF 透過 MetaData 屬性公開文件元資料,讓您能夠讀取現有值、修改它們,或將其完全移除。
:path=/static-assets/pdf/content-code-examples/tutorials/pdf-redaction-csharp/view-metadata.cs
using IronPdf;
using System;
// Load a document containing sensitive metadata
PdfDocument pdf = PdfDocument.FromFile("internal-report.pdf");
// Access current metadata properties
string author = pdf.MetaData.Author;
string title = pdf.MetaData.Title;
string subject = pdf.MetaData.Subject;
string keywords = pdf.MetaData.Keywords;
string creator = pdf.MetaData.Creator;
string producer = pdf.MetaData.Producer;
DateTime? creationDate = pdf.MetaData.CreationDate;
DateTime? modifiedDate = pdf.MetaData.ModifiedDate;
// Get all metadata keys including custom properties
var allKeys = pdf.MetaData.Keys();
Imports IronPdf
Imports System
' Load a document containing sensitive metadata
Dim pdf As PdfDocument = PdfDocument.FromFile("internal-report.pdf")
' Access current metadata properties
Dim author As String = pdf.MetaData.Author
Dim title As String = pdf.MetaData.Title
Dim subject As String = pdf.MetaData.Subject
Dim keywords As String = pdf.MetaData.Keywords
Dim creator As String = pdf.MetaData.Creator
Dim producer As String = pdf.MetaData.Producer
Dim creationDate As DateTime? = pdf.MetaData.CreationDate
Dim modifiedDate As DateTime? = pdf.MetaData.ModifiedDate
' Get all metadata keys including custom properties
Dim allKeys = pdf.MetaData.Keys()
若要在發佈前移除敏感的元資料:
輸入
一份內含嵌入式元資料的內部備忘錄,例如作者姓名、建立時間戳記及可能揭露敏感組織資訊的自訂屬性。
:path=/static-assets/pdf/content-code-examples/tutorials/pdf-redaction-csharp/remove-metadata.cs
using IronPdf;
using System;
PdfDocument pdf = PdfDocument.FromFile("confidential-memo.pdf");
// Replace identifying metadata with generic values
pdf.MetaData.Author = "Organization Name";
pdf.MetaData.Creator = "Document System";
pdf.MetaData.Producer = "";
pdf.MetaData.Title = "Public Document";
pdf.MetaData.Subject = "";
pdf.MetaData.Keywords = "";
// Normalize dates to remove timing information
pdf.MetaData.CreationDate = DateTime.Now;
pdf.MetaData.ModifiedDate = DateTime.Now;
// Remove specific custom metadata keys
pdf.MetaData.RemoveMetaDataKey("OriginalFilename");
pdf.MetaData.RemoveMetaDataKey("LastSavedBy");
pdf.MetaData.RemoveMetaDataKey("Company");
pdf.MetaData.RemoveMetaDataKey("Manager");
// Remove custom properties added by applications
try
{
pdf.MetaData.CustomProperties.Remove("SourcePath");
}
catch { }
pdf.SaveAs("confidential-memo-cleaned.pdf");
Imports IronPdf
Imports System
Dim pdf As PdfDocument = PdfDocument.FromFile("confidential-memo.pdf")
' Replace identifying metadata with generic values
pdf.MetaData.Author = "Organization Name"
pdf.MetaData.Creator = "Document System"
pdf.MetaData.Producer = ""
pdf.MetaData.Title = "Public Document"
pdf.MetaData.Subject = ""
pdf.MetaData.Keywords = ""
' Normalize dates to remove timing information
pdf.MetaData.CreationDate = DateTime.Now
pdf.MetaData.ModifiedDate = DateTime.Now
' Remove specific custom metadata keys
pdf.MetaData.RemoveMetaDataKey("OriginalFilename")
pdf.MetaData.RemoveMetaDataKey("LastSavedBy")
pdf.MetaData.RemoveMetaDataKey("Company")
pdf.MetaData.RemoveMetaDataKey("Manager")
' Remove custom properties added by applications
Try
pdf.MetaData.CustomProperties.Remove("SourcePath")
Catch
End Try
pdf.SaveAs("confidential-memo-cleaned.pdf")
此程式碼會將識別性元數據欄位替換為通用值、將時間戳記標準化為當前日期,並移除應用程式可能新增的自訂元數據鍵。 RemoveMetaDataKey 方法針對"OriginalFilename"和"LastSavedBy"等可能洩露內部資訊的特定屬性。
翻譯範例
要徹底清理批次操作中的元資料,需要採取系統化的方法:
using IronPdf;
using System;
using System.Co/llections.Generic;
public class MetadataCleaner
{
private readonly string _defaultAuthor;
private readonly string _defaultCreator;
public MetadataCleaner(string organizationName)
{
_defaultAuthor = organizationName;
_defaultCreator = $"{organizationName} Document System";
}
public void CleanMetadata(PdfDocument pdf)
{
// Replace standard metadata fields
pdf.MetaData.Author = _defaultAuthor;
pdf.MetaData.Creator = _defaultCreator;
pdf.MetaData.Producer = "";
pdf.MetaData.Subject = "";
pdf.MetaData.Keywords = "";
// Normalize timestamps
DateTime now = DateTime.Now;
pdf.MetaData.CreationDate = now;
pdf.MetaData.ModifiedDate = now;
// Get all keys and remove potentially sensitive ones
List<string> keysToRemove = new List<string>();
foreach (string key in pdf.MetaData.Keys())
{
// Keep only essential keys
if (!IsEssentialKey(key))
{
keysToRemove.Add(key);
}
}
foreach (string key in keysToRemove)
{
pdf.MetaData.RemoveMetaDataKey(key);
}
}
private bool IsEssentialKey(string key)
{
// Keep only the basic display properties
string[] essentialKeys = { "Title", "Author", "CreationDate", "ModifiedDate" };
foreach (string essential in essentialKeys)
{
if (key.Equals(essential, StringComparison.OrdinalIgnoreCase))
{
return true;
}
}
return false;
}
}
// Usage
class Program
{
static void Main()
{
MetadataCleaner cleaner = new MetadataCleaner("Acme Corporation");
PdfDocument pdf = PdfDocument.FromFile("report.pdf");
cleaner.CleanMetadata(pdf);
pdf.SaveAs("report-clean.pdf");
}
}
using IronPdf;
using System;
using System.Co/llections.Generic;
public class MetadataCleaner
{
private readonly string _defaultAuthor;
private readonly string _defaultCreator;
public MetadataCleaner(string organizationName)
{
_defaultAuthor = organizationName;
_defaultCreator = $"{organizationName} Document System";
}
public void CleanMetadata(PdfDocument pdf)
{
// Replace standard metadata fields
pdf.MetaData.Author = _defaultAuthor;
pdf.MetaData.Creator = _defaultCreator;
pdf.MetaData.Producer = "";
pdf.MetaData.Subject = "";
pdf.MetaData.Keywords = "";
// Normalize timestamps
DateTime now = DateTime.Now;
pdf.MetaData.CreationDate = now;
pdf.MetaData.ModifiedDate = now;
// Get all keys and remove potentially sensitive ones
List<string> keysToRemove = new List<string>();
foreach (string key in pdf.MetaData.Keys())
{
// Keep only essential keys
if (!IsEssentialKey(key))
{
keysToRemove.Add(key);
}
}
foreach (string key in keysToRemove)
{
pdf.MetaData.RemoveMetaDataKey(key);
}
}
private bool IsEssentialKey(string key)
{
// Keep only the basic display properties
string[] essentialKeys = { "Title", "Author", "CreationDate", "ModifiedDate" };
foreach (string essential in essentialKeys)
{
if (key.Equals(essential, StringComparison.OrdinalIgnoreCase))
{
return true;
}
}
return false;
}
}
// Usage
class Program
{
static void Main()
{
MetadataCleaner cleaner = new MetadataCleaner("Acme Corporation");
PdfDocument pdf = PdfDocument.FromFile("report.pdf");
cleaner.CleanMetadata(pdf);
pdf.SaveAs("report-clean.pdf");
}
}
Imports IronPdf
Imports System
Imports System.Collections.Generic
Public Class MetadataCleaner
Private ReadOnly _defaultAuthor As String
Private ReadOnly _defaultCreator As String
Public Sub New(organizationName As String)
_defaultAuthor = organizationName
_defaultCreator = $"{organizationName} Document System"
End Sub
Public Sub CleanMetadata(pdf As PdfDocument)
' Replace standard metadata fields
pdf.MetaData.Author = _defaultAuthor
pdf.MetaData.Creator = _defaultCreator
pdf.MetaData.Producer = ""
pdf.MetaData.Subject = ""
pdf.MetaData.Keywords = ""
' Normalize timestamps
Dim now As DateTime = DateTime.Now
pdf.MetaData.CreationDate = now
pdf.MetaData.ModifiedDate = now
' Get all keys and remove potentially sensitive ones
Dim keysToRemove As New List(Of String)()
For Each key As String In pdf.MetaData.Keys()
' Keep only essential keys
If Not IsEssentialKey(key) Then
keysToRemove.Add(key)
End If
Next
For Each key As String In keysToRemove
pdf.MetaData.RemoveMetaDataKey(key)
Next
End Sub
Private Function IsEssentialKey(key As String) As Boolean
' Keep only the basic display properties
Dim essentialKeys As String() = {"Title", "Author", "CreationDate", "ModifiedDate"}
For Each essential As String In essentialKeys
If key.Equals(essential, StringComparison.OrdinalIgnoreCase) Then
Return True
End If
Next
Return False
End Function
End Class
' Usage
Class Program
Shared Sub Main()
Dim cleaner As New MetadataCleaner("Acme Corporation")
Dim pdf As PdfDocument = PdfDocument.FromFile("report.pdf")
cleaner.CleanMetadata(pdf)
pdf.SaveAs("report-clean.pdf")
End Sub
End Class
如何清理 PDF 以移除嵌入式腳本和隱藏威脅?
PDF 淨化處理旨在解決超越可見內容與元資料範圍的安全疑慮。 PDF 檔案可能包含 JavaScript 程式碼、嵌入式可執行檔、會觸發外部連線的表單動作,以及其他潛在的惡意元素。 這些功能雖用於互動式表單和多媒體內容等正當用途,但也同時形成了攻擊途徑。 PDF 淨化處理會移除這些動態元素,同時保留視覺內容。 有關資料淨化方法的更多詳細資訊,請參閱我們的《PDF 淨化操作指南》。
IronPDF 的 Cleaner 類別採用一種優雅的方法來處理資料淨化:先將 PDF 轉換為圖像格式,然後再轉換回 PDF。 此流程會移除 JavaScript、嵌入式物件、表單動作及註解,同時保持視覺外觀完整。 此函式庫提供兩種特性各異的資料淨化方法。
輸入
從外部來源接收的 PDF 文件,其中可能包含 JavaScript、嵌入式物件或其他潛在惡意的動態內容。
:path=/static-assets/pdf/content-code-examples/tutorials/pdf-redaction-csharp/sanitize-pdf.cs
using IronPdf;
// Load a PDF that may contain active content
PdfDocument pdf = PdfDocument.FromFile("received-document.pdf");
// Sanitize using SVG conversion
// Faster processing, results in searchable text, slight layout variations possible
PdfDocument sanitizedSvg = Cleaner.SanitizeWithSvg(pdf);
sanitizedSvg.SaveAs("sanitized-svg.pdf");
// Sanitize using Bitmap conversion
// Slower processing, text becomes image (not searchable), exact visual reproduction
PdfDocument sanitizedBitmap = Cleaner.SanitizeWithBitmap(pdf);
sanitizedBitmap.SaveAs("sanitized-bitmap.pdf");
Imports IronPdf
' Load a PDF that may contain active content
Dim pdf As PdfDocument = PdfDocument.FromFile("received-document.pdf")
' Sanitize using SVG conversion
' Faster processing, results in searchable text, slight layout variations possible
Dim sanitizedSvg As PdfDocument = Cleaner.SanitizeWithSvg(pdf)
sanitizedSvg.SaveAs("sanitized-svg.pdf")
' Sanitize using Bitmap conversion
' Slower processing, text becomes image (not searchable), exact visual reproduction
Dim sanitizedBitmap As PdfDocument = Cleaner.SanitizeWithBitmap(pdf)
sanitizedBitmap.SaveAs("sanitized-bitmap.pdf")
此程式碼展示了 IronPDF 的 Cleaner 類別所提供的兩種資料淨化方法。 SanitizeWithSvg 透過 SVG 中間格式轉換 PDF,在移除動態內容的同時保留可搜尋的文字。 SanitizeWithBitmap 會先將頁面轉換為圖片,產生與原始頁面完全一致的視覺複本,但其中的文字會被渲染為無法搜尋的圖形。
翻譯範例
SVG 方法運作速度更快,且能保留文字作為可搜尋內容,因此適用於需要維持索引或可存取性的文件。 位圖方法雖能產生完全相同的視覺複本,但會將文字轉換為圖像,導致無法選取文字或進行搜尋。 請根據您對輸出文件的要求進行選擇。
您亦可在清理過程中套用渲染選項來調整輸出:
:path=/static-assets/pdf/content-code-examples/tutorials/pdf-redaction-csharp/sanitize-with-options.cs
using IronPdf;
// Load the potentially unsafe document
PdfDocument pdf = PdfDocument.FromFile("untrusted-source.pdf");
// Configure rendering options for sanitization
var renderOptions = new ChromePdfRenderOptions
{
MarginTop = 10,
MarginBottom = 10,
MarginLeft = 10,
MarginRight = 10
};
// Sanitize with custom options
PdfDocument sanitized = Cleaner.SanitizeWithSvg(pdf, renderOptions);
sanitized.SaveAs("untrusted-source-safe.pdf");
Imports IronPdf
' Load the potentially unsafe document
Dim pdf As PdfDocument = PdfDocument.FromFile("untrusted-source.pdf")
' Configure rendering options for sanitization
Dim renderOptions As New ChromePdfRenderOptions With {
.MarginTop = 10,
.MarginBottom = 10,
.MarginLeft = 10,
.MarginRight = 10
}
' Sanitize with custom options
Dim sanitized As PdfDocument = Cleaner.SanitizeWithSvg(pdf, renderOptions)
sanitized.SaveAs("untrusted-source-safe.pdf")
高安全性環境通常需要將資料淨化與其他防護措施結合使用:
using IronPdf;
using System;
public class SecureDocumentProcessor
{
public PdfDocument ProcessUntrustedDocument(string inputPath)
{
// Load the document
PdfDocument original = PdfDocument.FromFile(inputPath);
// Step 1: Sanitize to remove active content
PdfDocument sanitized = Cleaner.SanitizeWithSvg(original);
// Step 2: Clean metadata
sanitized.MetaData.Author = "Processed Document";
sanitized.MetaData.Creator = "Secure Processor";
sanitized.MetaData.Producer = "";
sanitized.MetaData.CreationDate = DateTime.Now;
sanitized.MetaData.ModifiedDate = DateTime.Now;
// Remove all custom metadata
foreach (string key in sanitized.MetaData.Keys())
{
if (key != "Title" && key != "Author" && key != "CreationDate" && key != "ModifiedDate")
{
sanitized.MetaData.RemoveMetaDataKey(key);
}
}
return sanitized;
}
}
// Usage
class Program
{
static void Main()
{
SecureDocumentProcessor processor = new SecureDocumentProcessor();
PdfDocument safe = processor.ProcessUntrustedDocument("email-attachment.pdf");
safe.SaveAs("email-attachment-safe.pdf");
}
}
using IronPdf;
using System;
public class SecureDocumentProcessor
{
public PdfDocument ProcessUntrustedDocument(string inputPath)
{
// Load the document
PdfDocument original = PdfDocument.FromFile(inputPath);
// Step 1: Sanitize to remove active content
PdfDocument sanitized = Cleaner.SanitizeWithSvg(original);
// Step 2: Clean metadata
sanitized.MetaData.Author = "Processed Document";
sanitized.MetaData.Creator = "Secure Processor";
sanitized.MetaData.Producer = "";
sanitized.MetaData.CreationDate = DateTime.Now;
sanitized.MetaData.ModifiedDate = DateTime.Now;
// Remove all custom metadata
foreach (string key in sanitized.MetaData.Keys())
{
if (key != "Title" && key != "Author" && key != "CreationDate" && key != "ModifiedDate")
{
sanitized.MetaData.RemoveMetaDataKey(key);
}
}
return sanitized;
}
}
// Usage
class Program
{
static void Main()
{
SecureDocumentProcessor processor = new SecureDocumentProcessor();
PdfDocument safe = processor.ProcessUntrustedDocument("email-attachment.pdf");
safe.SaveAs("email-attachment-safe.pdf");
}
}
Imports IronPdf
Imports System
Public Class SecureDocumentProcessor
Public Function ProcessUntrustedDocument(inputPath As String) As PdfDocument
' Load the document
Dim original As PdfDocument = PdfDocument.FromFile(inputPath)
' Step 1: Sanitize to remove active content
Dim sanitized As PdfDocument = Cleaner.SanitizeWithSvg(original)
' Step 2: Clean metadata
sanitized.MetaData.Author = "Processed Document"
sanitized.MetaData.Creator = "Secure Processor"
sanitized.MetaData.Producer = ""
sanitized.MetaData.CreationDate = DateTime.Now
sanitized.MetaData.ModifiedDate = DateTime.Now
' Remove all custom metadata
For Each key As String In sanitized.MetaData.Keys()
If key <> "Title" AndAlso key <> "Author" AndAlso key <> "CreationDate" AndAlso key <> "ModifiedDate" Then
sanitized.MetaData.RemoveMetaDataKey(key)
End If
Next
Return sanitized
End Function
End Class
' Usage
Module Program
Sub Main()
Dim processor As New SecureDocumentProcessor()
Dim safe As PdfDocument = processor.ProcessUntrustedDocument("email-attachment.pdf")
safe.SaveAs("email-attachment-safe.pdf")
End Sub
End Module
如何掃描 PDF 檔案以檢查安全漏洞?
在處理或淨化文件之前,您可能需要評估其中可能包含的潛在威脅。 IronPDF 的 Cleaner.ScanPdf 方法會使用 YARA 規則來檢查文件,這些規則是惡意軟體分析與威脅偵測中常用的模式定義。 此掃描功能可識別與惡意 PDF 檔案相關的特徵。
:path=/static-assets/pdf/content-code-examples/tutorials/pdf-redaction-csharp/scan-vulnerabilities.cs
using IronPdf;
// Load the document to scan
PdfDocument pdf = PdfDocument.FromFile("suspicious-document.pdf");
// Scan using default YARA rules
CleanerScanResult scanResult = Cleaner.ScanPdf(pdf);
// Check the scan results
bool threatsDetected = scanResult.IsDetected;
int riskCount = scanResult.Risks.Count;
// Process identified risks
if (scanResult.IsDetected)
{
foreach (var risk in scanResult.Risks)
{
// Handle each identified risk
}
// Sanitize the document before use
PdfDocument sanitized = Cleaner.SanitizeWithSvg(pdf);
sanitized.SaveAs("suspicious-document-safe.pdf");
}
Imports IronPdf
' Load the document to scan
Dim pdf As PdfDocument = PdfDocument.FromFile("suspicious-document.pdf")
' Scan using default YARA rules
Dim scanResult As CleanerScanResult = Cleaner.ScanPdf(pdf)
' Check the scan results
Dim threatsDetected As Boolean = scanResult.IsDetected
Dim riskCount As Integer = scanResult.Risks.Count
' Process identified risks
If scanResult.IsDetected Then
For Each risk In scanResult.Risks
' Handle each identified risk
Next
' Sanitize the document before use
Dim sanitized As PdfDocument = Cleaner.SanitizeWithSvg(pdf)
sanitized.SaveAs("suspicious-document-safe.pdf")
End If
您可以提供自訂的 YARA 規則檔案,以滿足特殊偵測需求。 擁有特定威脅模型或合規需求的組織,通常會維護針對特定漏洞模式的專屬規則集。
:path=/static-assets/pdf/content-code-examples/tutorials/pdf-redaction-csharp/scan-custom-yara.cs
using IronPdf;
PdfDocument pdf = PdfDocument.FromFile("incoming-document.pdf");
// Scan with custom YARA rules
string[] customYaraFiles = { "corporate-rules.yar", "industry-specific.yar" };
CleanerScanResult result = Cleaner.ScanPdf(pdf, customYaraFiles);
if (result.IsDetected)
{
// Document triggered custom rules and requires review or sanitization
PdfDocument sanitized = Cleaner.SanitizeWithSvg(pdf);
sanitized.SaveAs("incoming-document-safe.pdf");
}
Imports IronPdf
Dim pdf As PdfDocument = PdfDocument.FromFile("incoming-document.pdf")
' Scan with custom YARA rules
Dim customYaraFiles As String() = {"corporate-rules.yar", "industry-specific.yar"}
Dim result As CleanerScanResult = Cleaner.ScanPdf(pdf, customYaraFiles)
If result.IsDetected Then
' Document triggered custom rules and requires review or sanitization
Dim sanitized As PdfDocument = Cleaner.SanitizeWithSvg(pdf)
sanitized.SaveAs("incoming-document-safe.pdf")
End If
將掃描功能整合至文件接收工作流程中,有助於自動化安全決策:
using IronPdf;
using System;
using System.IO;
public enum DocumentSafetyLevel
{
Safe,
Suspicious,
Dangerous
}
public class DocumentSecurityGateway
{
public DocumentSafetyLevel EvaluateDocument(string filePath)
{
PdfDocument pdf = PdfDocument.FromFile(filePath);
CleanerScanResult scan = Cleaner.ScanPdf(pdf);
if (!scan.IsDetected)
{
return DocumentSafetyLevel.Safe;
}
// Evaluate severity based on number of risks
if (scan.Risks.Co/unt > 5)
{
return DocumentSafetyLevel.Dangerous;
}
return DocumentSafetyLevel.Suspicious;
}
public PdfDocument ProcessIncomingDocument(string filePath, string outputDirectory)
{
DocumentSafetyLevel safety = EvaluateDocument(filePath);
string fileName = Path.GetFileName(filePath);
switch (safety)
{
case DocumentSafetyLevel.Safe:
return PdfDocument.FromFile(filePath);
case DocumentSafetyLevel.Suspicious:
PdfDocument suspicious = PdfDocument.FromFile(filePath);
return Cleaner.SanitizeWithSvg(suspicious);
case DocumentSafetyLevel.Dangerous:
throw new SecurityException($"Document {fileName} contains dangerous content");
default:
throw new InvalidOperationException("Unknown safety level");
}
}
}
using IronPdf;
using System;
using System.IO;
public enum DocumentSafetyLevel
{
Safe,
Suspicious,
Dangerous
}
public class DocumentSecurityGateway
{
public DocumentSafetyLevel EvaluateDocument(string filePath)
{
PdfDocument pdf = PdfDocument.FromFile(filePath);
CleanerScanResult scan = Cleaner.ScanPdf(pdf);
if (!scan.IsDetected)
{
return DocumentSafetyLevel.Safe;
}
// Evaluate severity based on number of risks
if (scan.Risks.Co/unt > 5)
{
return DocumentSafetyLevel.Dangerous;
}
return DocumentSafetyLevel.Suspicious;
}
public PdfDocument ProcessIncomingDocument(string filePath, string outputDirectory)
{
DocumentSafetyLevel safety = EvaluateDocument(filePath);
string fileName = Path.GetFileName(filePath);
switch (safety)
{
case DocumentSafetyLevel.Safe:
return PdfDocument.FromFile(filePath);
case DocumentSafetyLevel.Suspicious:
PdfDocument suspicious = PdfDocument.FromFile(filePath);
return Cleaner.SanitizeWithSvg(suspicious);
case DocumentSafetyLevel.Dangerous:
throw new SecurityException($"Document {fileName} contains dangerous content");
default:
throw new InvalidOperationException("Unknown safety level");
}
}
}
Imports IronPdf
Imports System
Imports System.IO
Public Enum DocumentSafetyLevel
Safe
Suspicious
Dangerous
End Enum
Public Class DocumentSecurityGateway
Public Function EvaluateDocument(filePath As String) As DocumentSafetyLevel
Dim pdf As PdfDocument = PdfDocument.FromFile(filePath)
Dim scan As CleanerScanResult = Cleaner.ScanPdf(pdf)
If Not scan.IsDetected Then
Return DocumentSafetyLevel.Safe
End If
' Evaluate severity based on number of risks
If scan.Risks.Count > 5 Then
Return DocumentSafetyLevel.Dangerous
End If
Return DocumentSafetyLevel.Suspicious
End Function
Public Function ProcessIncomingDocument(filePath As String, outputDirectory As String) As PdfDocument
Dim safety As DocumentSafetyLevel = EvaluateDocument(filePath)
Dim fileName As String = Path.GetFileName(filePath)
Select Case safety
Case DocumentSafetyLevel.Safe
Return PdfDocument.FromFile(filePath)
Case DocumentSafetyLevel.Suspicious
Dim suspicious As PdfDocument = PdfDocument.FromFile(filePath)
Return Cleaner.SanitizeWithSvg(suspicious)
Case DocumentSafetyLevel.Dangerous
Throw New SecurityException($"Document {fileName} contains dangerous content")
Case Else
Throw New InvalidOperationException("Unknown safety level")
End Select
End Function
End Class
如何建立完整的資料遮蔽與淨化處理流程?
生產文件處理通常需要將多種保護技術整合成一個連貫的工作流程。 完整的處理流程可能包含:掃描傳入文件以偵測威脅、對通過初步篩選的文件進行淨化處理、執行文字與區域遮蔽、移除元資料,並產生記錄所有操作的稽核日誌。 此範例即展示了此種整合式翻譯方法。
using IronPdf;
using IronSoftware.Drawing;
using System;
using System.Co/llections.Generic;
using System.IO;
using System.Text.RegularExpressions;
public class DocumentProcessingResult
{
public string OriginalFile { get; set; }
public string OutputFile { get; set; }
public bool WasSanitized { get; set; }
public int TextRedactionsApplied { get; set; }
public int RegionRedactionsApplied { get; set; }
public bool MetadataCleaned { get; set; }
public List<string> SensitiveDataTypesFound { get; set; } = new List<string>();
public DateTime ProcessedAt { get; set; }
public bool Success { get; set; }
public string ErrorMessage { get; set; }
}
public class ComprehensiveDocumentProcessor
{
// Sensitive data patterns
private readonly Dictionary<string, string> _sensitivePatterns = new Dictionary<string, string>
{
{ "SSN", @"\b\d{3}-\d{2}-\d{4}\b" },
{ "Credit Card", @"\b(?:\d{4}[-\s]?){3}\d{1,4}\b" },
{ "Email", @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b" },
{ "Phone", @"\b(?:\(\d{3}\)\s?|\d{3}[-.])\d{3}[-.]?\d{4}\b" }
};
// Standard regions to redact (signature areas, photo locations)
private readonly List<RectangleF> _standardRedactionRegions = new List<RectangleF>
{
new RectangleF(72, 72, 200, 50), // Bottom left signature
new RectangleF(350, 72, 200, 50) // Bottom right signature
};
private readonly string _organizationName;
public ComprehensiveDocumentProcessor(string organizationName)
{
_organizationName = organizationName;
}
public DocumentProcessingResult ProcessDocument(
string inputPath,
string outputPath,
bool sanitize = true,
bool redactPatterns = true,
bool redactRegions = true,
bool cleanMetadata = true,
List<string> additionalTermsToRedact = null)
{
var result = new DocumentProcessingResult
{
OriginalFile = inputPath,
OutputFile = outputPath,
ProcessedAt = DateTime.Now
};
try
{
// Load the document
PdfDocument pdf = PdfDocument.FromFile(inputPath);
// Step 1: Security scan
CleanerScanResult scanResult = Cleaner.ScanPdf(pdf);
if (scanResult.IsDetected && scanResult.Risks.Co/unt > 10)
{
throw new SecurityException("Document contains too many security risks to process");
}
// Step 2: Sanitization (if needed or requested)
if (sanitize || scanResult.IsDetected)
{
pdf = Cleaner.SanitizeWithSvg(pdf);
result.WasSanitized = true;
}
// Step 3: Pattern-based text redaction
if (redactPatterns)
{
string fullText = pdf.ExtractAllText();
HashSet<string> valuesToRedact = new HashSet<string>();
foreach (var pattern in _sensitivePatterns)
{
Regex regex = new Regex(pattern.Value, RegexOptions.IgnoreCase);
MatchCollection matches = regex.Matches(fullText);
if (matches.Co/unt > 0)
{
result.SensitiveDataTypesFound.Add($"{pattern.Key} ({matches.Co/unt})");
foreach (Match match in matches)
{
valuesToRedact.Add(match.Value);
}
}
}
// Apply redactions
foreach (string value in valuesToRedact)
{
pdf.RedactTextOnAllPages(value);
result.TextRedactionsApplied++;
}
}
// Step 4: Additional specific terms
if (additionalTermsToRedact != null)
{
foreach (string term in additionalTermsToRedact)
{
pdf.RedactTextOnAllPages(term);
result.TextRedactionsApplied++;
}
}
// Step 5: Region-based redaction
if (redactRegions)
{
foreach (RectangleF region in _standardRedactionRegions)
{
pdf.RedactRegionsOnAllPages(region);
result.RegionRedactionsApplied++;
}
}
// Step 6: Metadata cleaning
if (cleanMetadata)
{
pdf.MetaData.Author = _organizationName;
pdf.MetaData.Creator = $"{_organizationName} Document Processor";
pdf.MetaData.Producer = "";
pdf.MetaData.Subject = "";
pdf.MetaData.Keywords = "";
pdf.MetaData.CreationDate = DateTime.Now;
pdf.MetaData.ModifiedDate = DateTime.Now;
result.MetadataCleaned = true;
}
// Step 7: Save the processed document
pdf.SaveAs(outputPath);
result.Success = true;
}
catch (Exception ex)
{
result.Success = false;
result.ErrorMessage = ex.Message;
}
return result;
}
}
// Usage example
class Program
{
static void Main()
{
var processor = new ComprehensiveDocumentProcessor("Acme Corporation");
// Process a single document with all protections
var result = processor.ProcessDocument(
inputPath: "customer-application.pdf",
outputPath: "customer-application-redacted.pdf",
sanitize: true,
redactPatterns: true,
redactRegions: true,
cleanMetadata: true,
additionalTermsToRedact: new List<string> { "Project Alpha", "Internal Use Only" }
);
// Batch process multiple documents
string[] inputFiles = Directory.GetFiles("incoming", "*.pdf");
foreach (string file in inputFiles)
{
string outputFile = Path.Com/bine("processed", Path.GetFileName(file));
processor.ProcessDocument(file, outputFile);
}
}
}
using IronPdf;
using IronSoftware.Drawing;
using System;
using System.Co/llections.Generic;
using System.IO;
using System.Text.RegularExpressions;
public class DocumentProcessingResult
{
public string OriginalFile { get; set; }
public string OutputFile { get; set; }
public bool WasSanitized { get; set; }
public int TextRedactionsApplied { get; set; }
public int RegionRedactionsApplied { get; set; }
public bool MetadataCleaned { get; set; }
public List<string> SensitiveDataTypesFound { get; set; } = new List<string>();
public DateTime ProcessedAt { get; set; }
public bool Success { get; set; }
public string ErrorMessage { get; set; }
}
public class ComprehensiveDocumentProcessor
{
// Sensitive data patterns
private readonly Dictionary<string, string> _sensitivePatterns = new Dictionary<string, string>
{
{ "SSN", @"\b\d{3}-\d{2}-\d{4}\b" },
{ "Credit Card", @"\b(?:\d{4}[-\s]?){3}\d{1,4}\b" },
{ "Email", @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b" },
{ "Phone", @"\b(?:\(\d{3}\)\s?|\d{3}[-.])\d{3}[-.]?\d{4}\b" }
};
// Standard regions to redact (signature areas, photo locations)
private readonly List<RectangleF> _standardRedactionRegions = new List<RectangleF>
{
new RectangleF(72, 72, 200, 50), // Bottom left signature
new RectangleF(350, 72, 200, 50) // Bottom right signature
};
private readonly string _organizationName;
public ComprehensiveDocumentProcessor(string organizationName)
{
_organizationName = organizationName;
}
public DocumentProcessingResult ProcessDocument(
string inputPath,
string outputPath,
bool sanitize = true,
bool redactPatterns = true,
bool redactRegions = true,
bool cleanMetadata = true,
List<string> additionalTermsToRedact = null)
{
var result = new DocumentProcessingResult
{
OriginalFile = inputPath,
OutputFile = outputPath,
ProcessedAt = DateTime.Now
};
try
{
// Load the document
PdfDocument pdf = PdfDocument.FromFile(inputPath);
// Step 1: Security scan
CleanerScanResult scanResult = Cleaner.ScanPdf(pdf);
if (scanResult.IsDetected && scanResult.Risks.Co/unt > 10)
{
throw new SecurityException("Document contains too many security risks to process");
}
// Step 2: Sanitization (if needed or requested)
if (sanitize || scanResult.IsDetected)
{
pdf = Cleaner.SanitizeWithSvg(pdf);
result.WasSanitized = true;
}
// Step 3: Pattern-based text redaction
if (redactPatterns)
{
string fullText = pdf.ExtractAllText();
HashSet<string> valuesToRedact = new HashSet<string>();
foreach (var pattern in _sensitivePatterns)
{
Regex regex = new Regex(pattern.Value, RegexOptions.IgnoreCase);
MatchCollection matches = regex.Matches(fullText);
if (matches.Co/unt > 0)
{
result.SensitiveDataTypesFound.Add($"{pattern.Key} ({matches.Co/unt})");
foreach (Match match in matches)
{
valuesToRedact.Add(match.Value);
}
}
}
// Apply redactions
foreach (string value in valuesToRedact)
{
pdf.RedactTextOnAllPages(value);
result.TextRedactionsApplied++;
}
}
// Step 4: Additional specific terms
if (additionalTermsToRedact != null)
{
foreach (string term in additionalTermsToRedact)
{
pdf.RedactTextOnAllPages(term);
result.TextRedactionsApplied++;
}
}
// Step 5: Region-based redaction
if (redactRegions)
{
foreach (RectangleF region in _standardRedactionRegions)
{
pdf.RedactRegionsOnAllPages(region);
result.RegionRedactionsApplied++;
}
}
// Step 6: Metadata cleaning
if (cleanMetadata)
{
pdf.MetaData.Author = _organizationName;
pdf.MetaData.Creator = $"{_organizationName} Document Processor";
pdf.MetaData.Producer = "";
pdf.MetaData.Subject = "";
pdf.MetaData.Keywords = "";
pdf.MetaData.CreationDate = DateTime.Now;
pdf.MetaData.ModifiedDate = DateTime.Now;
result.MetadataCleaned = true;
}
// Step 7: Save the processed document
pdf.SaveAs(outputPath);
result.Success = true;
}
catch (Exception ex)
{
result.Success = false;
result.ErrorMessage = ex.Message;
}
return result;
}
}
// Usage example
class Program
{
static void Main()
{
var processor = new ComprehensiveDocumentProcessor("Acme Corporation");
// Process a single document with all protections
var result = processor.ProcessDocument(
inputPath: "customer-application.pdf",
outputPath: "customer-application-redacted.pdf",
sanitize: true,
redactPatterns: true,
redactRegions: true,
cleanMetadata: true,
additionalTermsToRedact: new List<string> { "Project Alpha", "Internal Use Only" }
);
// Batch process multiple documents
string[] inputFiles = Directory.GetFiles("incoming", "*.pdf");
foreach (string file in inputFiles)
{
string outputFile = Path.Com/bine("processed", Path.GetFileName(file));
processor.ProcessDocument(file, outputFile);
}
}
}
Imports IronPdf
Imports IronSoftware.Drawing
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions
Public Class DocumentProcessingResult
Public Property OriginalFile As String
Public Property OutputFile As String
Public Property WasSanitized As Boolean
Public Property TextRedactionsApplied As Integer
Public Property RegionRedactionsApplied As Integer
Public Property MetadataCleaned As Boolean
Public Property SensitiveDataTypesFound As List(Of String) = New List(Of String)()
Public Property ProcessedAt As DateTime
Public Property Success As Boolean
Public Property ErrorMessage As String
End Class
Public Class ComprehensiveDocumentProcessor
' Sensitive data patterns
Private ReadOnly _sensitivePatterns As Dictionary(Of String, String) = New Dictionary(Of String, String) From {
{"SSN", "\b\d{3}-\d{2}-\d{4}\b"},
{"Credit Card", "\b(?:\d{4}[-\s]?){3}\d{1,4}\b"},
{"Email", "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b"},
{"Phone", "\b(?:\(\d{3}\)\s?|\d{3}[-.])\d{3}[-.]?\d{4}\b"}
}
' Standard regions to redact (signature areas, photo locations)
Private ReadOnly _standardRedactionRegions As List(Of RectangleF) = New List(Of RectangleF) From {
New RectangleF(72, 72, 200, 50), ' Bottom left signature
New RectangleF(350, 72, 200, 50) ' Bottom right signature
}
Private ReadOnly _organizationName As String
Public Sub New(organizationName As String)
_organizationName = organizationName
End Sub
Public Function ProcessDocument(inputPath As String, outputPath As String, Optional sanitize As Boolean = True, Optional redactPatterns As Boolean = True, Optional redactRegions As Boolean = True, Optional cleanMetadata As Boolean = True, Optional additionalTermsToRedact As List(Of String) = Nothing) As DocumentProcessingResult
Dim result As New DocumentProcessingResult With {
.OriginalFile = inputPath,
.OutputFile = outputPath,
.ProcessedAt = DateTime.Now
}
Try
' Load the document
Dim pdf As PdfDocument = PdfDocument.FromFile(inputPath)
' Step 1: Security scan
Dim scanResult As CleanerScanResult = Cleaner.ScanPdf(pdf)
If scanResult.IsDetected AndAlso scanResult.Risks.Count > 10 Then
Throw New SecurityException("Document contains too many security risks to process")
End If
' Step 2: Sanitization (if needed or requested)
If sanitize OrElse scanResult.IsDetected Then
pdf = Cleaner.SanitizeWithSvg(pdf)
result.WasSanitized = True
End If
' Step 3: Pattern-based text redaction
If redactPatterns Then
Dim fullText As String = pdf.ExtractAllText()
Dim valuesToRedact As New HashSet(Of String)()
For Each pattern In _sensitivePatterns
Dim regex As New Regex(pattern.Value, RegexOptions.IgnoreCase)
Dim matches As MatchCollection = regex.Matches(fullText)
If matches.Count > 0 Then
result.SensitiveDataTypesFound.Add($"{pattern.Key} ({matches.Count})")
For Each match As Match In matches
valuesToRedact.Add(match.Value)
Next
End If
Next
' Apply redactions
For Each value In valuesToRedact
pdf.RedactTextOnAllPages(value)
result.TextRedactionsApplied += 1
Next
End If
' Step 4: Additional specific terms
If additionalTermsToRedact IsNot Nothing Then
For Each term In additionalTermsToRedact
pdf.RedactTextOnAllPages(term)
result.TextRedactionsApplied += 1
Next
End If
' Step 5: Region-based redaction
If redactRegions Then
For Each region In _standardRedactionRegions
pdf.RedactRegionsOnAllPages(region)
result.RegionRedactionsApplied += 1
Next
End If
' Step 6: Metadata cleaning
If cleanMetadata Then
pdf.MetaData.Author = _organizationName
pdf.MetaData.Creator = $"{_organizationName} Document Processor"
pdf.MetaData.Producer = ""
pdf.MetaData.Subject = ""
pdf.MetaData.Keywords = ""
pdf.MetaData.CreationDate = DateTime.Now
pdf.MetaData.ModifiedDate = DateTime.Now
result.MetadataCleaned = True
End If
' Step 7: Save the processed document
pdf.SaveAs(outputPath)
result.Success = True
Catch ex As Exception
result.Success = False
result.ErrorMessage = ex.Message
End Try
Return result
End Function
End Class
' Usage example
Module Program
Sub Main()
Dim processor As New ComprehensiveDocumentProcessor("Acme Corporation")
' Process a single document with all protections
Dim result As DocumentProcessingResult = processor.ProcessDocument(
inputPath:="customer-application.pdf",
outputPath:="customer-application-redacted.pdf",
sanitize:=True,
redactPatterns:=True,
redactRegions:=True,
cleanMetadata:=True,
additionalTermsToRedact:=New List(Of String) From {"Project Alpha", "Internal Use Only"}
)
' Batch process multiple documents
Dim inputFiles As String() = Directory.GetFiles("incoming", "*.pdf")
For Each file As String In inputFiles
Dim outputFile As String = Path.Combine("processed", Path.GetFileName(file))
processor.ProcessDocument(file, outputFile)
Next
End Sub
End Module
輸入
一份客戶申請表,其中包含多種類型的敏感資料,包括社會安全號碼、信用卡號碼、電子郵件地址以及需要全面保護的簽名區塊。
翻譯範例
此綜合處理器將本指南中涵蓋的所有技術整合為單一且可配置的類別。 它會掃描威脅、在必要時進行淨化處理、找出並遮蔽敏感模式、應用區域遮蔽、清理元資料,並產生詳細報告。 您可以調整敏感度模式、遮蔽區域及處理選項,以符合您的特定需求。
後續步驟
保護 PDF 文件中的敏感資訊,需要比表面措施更深入的防護。 真正的刪除會永久從文件結構中移除內容。 模式比對功能可自動偵測並移除社會安全號碼、信用卡資訊及電子郵件地址等資料。 基於區域的遮蔽處理可處理簽名、照片及其他圖形元素,這些是文字比對無法處理的項目。 元資料清理可移除可能洩露作者、時間戳記或內部檔案路徑的隱藏資訊。 安全清理會移除可能構成安全風險的嵌入式腳本和動態內容。
IronPDF 透過一套設計完善且一致的 API 提供所有這些功能,該 API 可自然地與 C# 及 .NET 開發實務整合。 本指南中示範的方法既可處理單一文件,亦可擴展至批次處理數千個檔案。 無論您是建立醫療保健資料的合規工作流程、準備用於證據開示的法律文件,還是單純確保內部報告能安全地對外分享,這些技術都是負責任文件處理的基礎。 為實現全面的安全防護,請將資料遮蔽與密碼保護、權限設定及數位簽章結合使用。
準備好開始建構了嗎? 下載 IronPDF 並透過免費試用版體驗其功能。 此函式庫包含免費開發授權,因此您可在決定購買正式授權前,充分評估其內容遮蔽、文字擷取及資料淨化功能。 若您對實作或合規工作流程有任何疑問,請聯繫我們的工程支援團隊。
常見問題
何謂 PDF 遮蔽?
PDF 遮蔽是指從 PDF 文件中永久移除敏感資訊的過程。這可能包括因隱私或合規原因而需隱藏的文字、圖片及元資料。
如何使用 C# 對 PDF 中的資訊進行遮蔽處理?
您可以使用 IronPDF 透過 C# 對 PDF 中的資訊進行遮蔽處理。它允許您永久移除或隱藏 PDF 文件中的文字、圖片及元資料,確保文件符合隱私權與合規標準。
為何 PDF 遮蔽處理對合規性至關重要?
PDF 資料遮蔽對於符合 HIPAA、GDPR 和 PCI DSS 等標準至關重要,因為它有助於保護敏感資料,並防止未經授權存取機密資訊。
IronPDF 能否對 PDF 的整個區域進行遮蔽處理?
是的,IronPDF 能夠對 PDF 文件中的整個區域進行遮蔽處理。這讓您能夠定義文件中因安全考量而需要隱藏或移除的特定區域。
IronPDF 可用於遮蔽哪些類型的資料?
IronPDF 能從 PDF 文件中遮蔽各類資料,包括文字、圖片及元資料,確保全面的資料隱私與安全性。
IronPDF 是否支援文件淨化功能?
是的,IronPDF 支援文件淨化功能,此功能可清理 PDF 檔案,移除那些雖不可見但仍可能構成隱私風險的隱藏資料或元資料。
是否可以透過 IronPDF 自動執行 PDF 遮蔽處理?
是的,IronPDF 支援透過 C# 自動化 PDF 遮蔽流程,使處理大量需移除敏感資料的文件變得更加容易。
IronPDF 如何確保遮蔽處理的永久性?
IronPDF 透過永久移除文件中選取的文字與圖片(而非僅加以遮蔽),確保遮蔽內容的永久性,這意味著這些內容無法被復原或檢視。
IronPDF 能否在 PDF 中遮蔽元資料?
是的,IronPDF 能夠對 PDF 文件中的元資料進行遮蔽處理,確保所有形式的敏感資料(包括隱藏或背景資料)均被徹底移除。
使用 IronPDF 進行 PDF 遮蔽處理有哪些好處?
使用 IronPDF 進行 PDF 遮蔽處理,可帶來諸多好處,例如確保符合資料保護法規、強化文件安全性,並提供高效且自動化的敏感資訊管理流程。

