PDF Redaction in C#: Remove Sensitive Data and Sanitize Documents with IronPDF

Updated: February 3, 2026

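PDF redaction in C# .NET with IronPDF permanently removes sensitive content from a document's internal structure instead of merely covering it visually, so no amount of copying, searching, or forensic analysis can recover the original data. It goes well beyond drawing black rectangles over text: IronPDF offers text redaction with regular-expression pattern matching, region-based redaction for signatures and images, metadata stripping, document sanitization to eliminate embedded scripts, and vulnerability scanning. Together these give .NET developers a complete toolkit for document-protection workflows that must comply with HIPAA, GDPR, and PCI DSS.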

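TL;DR: Quick Start Guide

This tutorial shows how to permanently remove sensitive content from PDF documents in C# .NET, including text patterns, image regions, metadata, and embedded scripts.

- Who this is for: .NET developers handling sensitive documents in healthcare, legal, finance, or government.
- What you will build: text redaction with regex pattern matching (Social Security numbers, credit cards, emails), coordinate-based region redaction (signatures and photos), metadata scrubbing, PDF sanitization (removing embedded scripts), and YARA-based vulnerability scanning.
- Where it runs: .NET 10, .NET 8 LTS, .NET Framework 4.6.2+, and .NET Standard 2.0. Everything runs locally with no external dependencies.
- When to use this approach: when you need to share documents for legal discovery, Freedom of Information Act requests, or external distribution and must be sure that removed content is truly gone.
- Why it matters technically: a visual overlay leaves the original text recoverable from the PDF content stream. IronPDF's redaction removes the character data from the document structure itself, so there is nothing left to recover.

Removing sensitive text from a PDF takes only a few lines of code:

Install IronPDF with the NuGet Package Manager: https://www.nuget.org/packages/IronPdf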
PM > Install-Package IronPdf

Copy and run this code snippet.

using IronPdf;

PdfDocument pdf = PdfDocument.FromFile("confidential-report.pdf");
pdf.RedactTextOnAllPages("CONFIDENTIAL");
pdf.SaveAs("redacted-report.pdf");

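Deploy to test in your live environment

Start using IronPDF in your project today with a free trial.

After purchasing IronPDF or signing up for a 30-day trial, add your license key at the start of your application.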

IronPdf.License.LicenseKey = "KEY";

Imports IronPdf

IronPdf.License.LicenseKey = "KEY"

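- TL;DR: Quick Start Guide
- Quick overview
- Redacting text from PDF documents
  - What is the difference between true redaction and a visual overlay?
  - How do I redact text across all pages of a PDF?
  - How do I redact text only on specific pages?
  - How do I customize the appearance of redacted content?
- Pattern matching and automated redaction
  - How do I use regular expressions to find and redact sensitive patterns?
  - How do I build a reusable sensitive-data scanner?
- Region-based redaction
  - How do I redact specific regions in a PDF?
  - How do I redact multiple regions across different pages?
- Removing sensitive data from PDF metadata
  - How do I strip metadata that might leak sensitive information?
- PDF sanitization in .NET
  - How do I sanitize a PDF to remove embedded scripts and hidden threats?
  - How do I scan a PDF for security vulnerabilities?
- Complete workflow
  - How do I build a complete redaction and sanitization pipeline?
- Next steps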

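What is the difference between true redaction and a visual overlay?

Anyone who handles sensitive documents needs to understand the difference between true redaction and a visual overlay. Many tools and manual methods create the appearance of redaction without removing the underlying data. That false sense of security has led to many high-profile data leaks and compliance failures.

The visual overlay approach usually draws an opaque shape over the sensitive content. The text itself stays intact in the PDF structure. A reader sees a black rectangle, but the original characters are still in the file's content stream. Selecting all text on the page, using an accessibility tool, or inspecting the raw PDF data reveals everything that was supposedly hidden. Court cases have been compromised when opposing counsel trivially recovered content from redacted filings. Government agencies have accidentally released classified information that looked redacted but was fully recoverable.

True redaction works differently. When you use IronPDF's redaction methods, the library locates the specified text in the PDF's internal structure and removes it completely. The character data is deleted from the content stream. The visual rendering is replaced by a redaction mark (usually a black rectangle), but the original content no longer exists in the file. No amount of selecting, copying, or forensic analysis can recover what has been permanently removed.

IronPDF achieves true redaction by modifying the PDF at the structural level. The RedactTextOnAllPages method and its variants search the page content, identify matching text, remove it from the document object model, and optionally draw a visual indicator where the content used to be. This approach aligns with secure document redaction guidance from organizations such as NIST.

The practical implications are significant. If you need to share documents externally, submit them for legal discovery, release records under a freedom-of-information request, or distribute reports while protecting personally identifiable information, only true redaction provides adequate protection. A visual overlay may be enough for an internal draft where you only want to draw attention away from certain sections, but never rely on one to protect real data. For additional document security measures, see our guides on encrypting PDFs and digital signatures.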
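Because an overlay leaves text selectable while true redaction does not, one simple sanity check is to extract the text of the saved file and confirm that none of the sensitive terms survive. The sketch below is a minimal helper for that comparison. `RedactionCheck` and `FindSurvivingTerms` are hypothetical names introduced here, not part of IronPDF, and the check only covers the extractable text layer, not metadata or images.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RedactionCheck
{
    // Returns every term that still appears in the extracted text.
    // An empty result means none of the terms survived in the text layer.
    public static List<string> FindSurvivingTerms(string extractedText, IEnumerable<string> terms)
    {
        return terms
            .Where(t => !string.IsNullOrEmpty(t))
            .Where(t => extractedText.IndexOf(t, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }
}

// Hypothetical usage with IronPDF (the file name is a placeholder):
// string text = PdfDocument.FromFile("redacted-report.pdf").ExtractAllText();
// var leaks = RedactionCheck.FindSurvivingTerms(text, new[] { "CONFIDENTIAL" });
// if (leaks.Count > 0) throw new InvalidOperationException("Redaction incomplete");
```

Running a check like this as the last step of a pipeline turns "the text should be gone" into something a build or batch job can assert.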

How do I Redact PDF Text in C# Across an Entire Document?

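The most common redaction scenario is removing every occurrence of specific text from a document. You might need to remove a person's name from a report, an account number from a financial statement, or internal reference codes before external distribution. IronPDF makes this straightforward with the RedactTextOnAllPages method.

Input

An employee records document containing personal information, including names, Social Security numbers, and employee IDs.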

using IronPdf;

// Load the source document
PdfDocument pdf = PdfDocument.FromFile("employee-records.pdf");

// Redact an employee name from the entire document
pdf.RedactTextOnAllPages("John Smith");

// Redact a Social Security Number
pdf.RedactTextOnAllPages("123-45-6789");

// Redact an internal employee ID
pdf.RedactTextOnAllPages("EMP-2024-0042");

// Save the cleaned document
pdf.SaveAs("employee-records-redacted.pdf");

Imports IronPdf

' Load the source document
Dim pdf As PdfDocument = PdfDocument.FromFile("employee-records.pdf")

' Redact an employee name from the entire document
pdf.RedactTextOnAllPages("John Smith")

' Redact a Social Security Number
pdf.RedactTextOnAllPages("123-45-6789")

' Redact an internal employee ID
pdf.RedactTextOnAllPages("EMP-2024-0042")

' Save the cleaned document
pdf.SaveAs("employee-records-redacted.pdf")

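This code loads a PDF containing employee information and removes three pieces of confidential data by calling RedactTextOnAllPages once per value. Each call searches every page of the document and permanently removes all matches of the employee name, Social Security number, and internal identifier.

Sample output

By default, the method draws a black rectangle where the redacted text appeared and replaces the actual characters with asterisks in the document structure. This gives visual confirmation that redaction took place while ensuring the original content is gone.

When you work with longer documents or many redaction targets, you can loop over a list of terms instead of writing each call by hand: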

using IronPdf;
using System.Collections.Generic;

// Load the document once
PdfDocument pdf = PdfDocument.FromFile("quarterly-report.pdf");

// Define all terms that need redaction
List<string> sensitiveTerms = new List<string>
{
    "Project Titan",
    "Sarah Johnson",
    "Budget: $4.2M",
    "Q3-INTERNAL-2024",
    "sarah.johnson@company.com"
};

// Redact each term
foreach (string term in sensitiveTerms)
{
    pdf.RedactTextOnAllPages(term);
}

// Save the result
pdf.SaveAs("quarterly-report-public.pdf");

Imports IronPdf
Imports System.Collections.Generic

' Load the document once
Dim pdf As PdfDocument = PdfDocument.FromFile("quarterly-report.pdf")

' Define all terms that need redaction
Dim sensitiveTerms As New List(Of String) From {
    "Project Titan",
    "Sarah Johnson",
    "Budget: $4.2M",
    "Q3-INTERNAL-2024",
    "sarah.johnson@company.com"
}

' Redact each term
For Each term As String In sensitiveTerms
    pdf.RedactTextOnAllPages(term)
Next

' Save the result
pdf.SaveAs("quarterly-report-public.pdf")

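This pattern works well when you have a known list of sensitive values to remove. The document is loaded once, all redactions happen in memory, and the final result is saved once. Each term is processed independently, so a partial match or formatting difference in one term does not affect the other redactions.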
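Term lists assembled from spreadsheets or user input often contain blanks, stray whitespace, and duplicates. The following sketch cleans such a list before the redaction loop runs. `TermList.Prepare` is a hypothetical helper, and ordering longest-first is a precaution assumed here so that a long phrase is handled before any shorter term it contains; whether order affects the result depends on how the library matches text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TermList
{
    // Trims entries, drops blanks and case-insensitive duplicates,
    // and orders the remainder longest-first.
    public static List<string> Prepare(IEnumerable<string> rawTerms)
    {
        return rawTerms
            .Select(t => (t ?? string.Empty).Trim())
            .Where(t => t.Length > 0)
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .OrderByDescending(t => t.Length)
            .ToList();
    }
}

// Hypothetical usage inside the loop shown earlier:
// foreach (string term in TermList.Prepare(sensitiveTerms))
// {
//     pdf.RedactTextOnAllPages(term);
// }
```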

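How do I redact text only on specific pages?

Sometimes you need finer control over where redaction happens. A document may have a cover page whose information should stay intact, or you may know that confidential data appears only in certain sections. IronPDF provides RedactTextOnPage for single-page redaction and RedactTextOnPages for targeting several specific pages.

Input

A multi-page contract bundle in which the client name appears on the signature page and the financial terms appear on specific pages.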

using IronPdf;

// Load the document
PdfDocument pdf = PdfDocument.FromFile("contract-bundle.pdf");

// Redact text only on page 1 (index 0)
pdf.RedactTextOnPage(0, "Client Name: Acme Corporation");

// Redact text on pages 3, 5, and 7 (indices 2, 4, 6)
int[] financialPages = { 2, 4, 6 };
pdf.RedactTextOnPages(financialPages, "Payment Terms: Net 30");

// Other pages remain untouched except for the specific redactions applied

pdf.SaveAs("contract-bundle-redacted.pdf");

Imports IronPdf

' Load the document
Dim pdf As PdfDocument = PdfDocument.FromFile("contract-bundle.pdf")

' Redact text only on page 1 (index 0)
pdf.RedactTextOnPage(0, "Client Name: Acme Corporation")

' Redact text on pages 3, 5, and 7 (indices 2, 4, 6)
Dim financialPages As Integer() = {2, 4, 6}
pdf.RedactTextOnPages(financialPages, "Payment Terms: Net 30")

' Other pages remain untouched except for the specific redactions applied

pdf.SaveAs("contract-bundle-redacted.pdf")

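This code demonstrates targeted redaction on a single page with RedactTextOnPage and on several specific pages with RedactTextOnPages. The client name is removed only from page 1 (index 0), the payment terms are removed from pages 3, 5, and 7 (indices 2, 4, 6), and all other pages are left unchanged.

Sample output

Page indexes in IronPDF are zero-based: the first page has index 0, the second has index 1, and so on. This follows standard programming convention and matches how most developers think about array access.

Targeting specific pages can improve performance on large documents. Instead of scanning hundreds of pages for text that appears in only a few places, you tell the redaction engine exactly where to look. This matters in batch-processing scenarios where you may handle thousands of documents. For maximum throughput, consider async and multithreading techniques.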

using IronPdf;

// Process a large document efficiently
PdfDocument pdf = PdfDocument.FromFile("annual-report-500-pages.pdf");

// We know from document structure that:
// - Executive summary with names is on pages 1-3
// - Financial data is on pages 45-60
// - Appendix with employee info is on pages 480-495

// Redact executive names from summary section
for (int i = 0; i <= 2; i++)
{
    pdf.RedactTextOnPage(i, "CEO: Robert Williams");
    pdf.RedactTextOnPage(i, "CFO: Maria Garcia");
}

// Redact specific financial figures from the financial section
int[] financialSection = { 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59 };
pdf.RedactTextOnPages(financialSection, "Net Revenue: $847M");

// Redact employee identifiers from appendix
for (int i = 479; i <= 494; i++)
{
    pdf.RedactTextOnPage(i, "Employee ID:");
}

pdf.SaveAs("annual-report-public-release.pdf");

Imports IronPdf

' Process a large document efficiently
Dim pdf As PdfDocument = PdfDocument.FromFile("annual-report-500-pages.pdf")

' We know from document structure that:
' - Executive summary with names is on pages 1-3
' - Financial data is on pages 45-60
' - Appendix with employee info is on pages 480-495

' Redact executive names from summary section
For i As Integer = 0 To 2
    pdf.RedactTextOnPage(i, "CEO: Robert Williams")
    pdf.RedactTextOnPage(i, "CFO: Maria Garcia")
Next

' Redact specific financial figures from the financial section
Dim financialSection As Integer() = {44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59}
pdf.RedactTextOnPages(financialSection, "Net Revenue: $847M")

' Redact employee identifiers from appendix
For i As Integer = 479 To 494
    pdf.RedactTextOnPage(i, "Employee ID:")
Next

pdf.SaveAs("annual-report-public-release.pdf")

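This targeted approach processes only the relevant sections of the 500-page document, so it does far less work than scanning every page for every redaction term.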
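Writing out sixteen zero-based indexes by hand, as the financial-section array above does, invites off-by-one mistakes. A small helper can convert the 1-based page numbers printed in the document into the zero-based indexes the redaction methods take. `PageRanges.ToIndexes` is a hypothetical name used for this sketch only.

```csharp
using System;
using System.Linq;

public static class PageRanges
{
    // Converts an inclusive range of 1-based page numbers (as printed in the
    // document) to the zero-based indexes that the redaction methods expect.
    public static int[] ToIndexes(int firstPage, int lastPage)
    {
        if (firstPage < 1 || lastPage < firstPage)
        {
            throw new ArgumentOutOfRangeException(nameof(firstPage), "Expected 1 <= firstPage <= lastPage.");
        }
        return Enumerable.Range(firstPage - 1, lastPage - firstPage + 1).ToArray();
    }
}

// Hypothetical usage: pages 45-60 become indexes 44 through 59.
// int[] financialSection = PageRanges.ToIndexes(45, 60);
// pdf.RedactTextOnPages(financialSection, "Net Revenue: $847M");
```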

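How do I customize the appearance of redacted content?

IronPDF provides several parameters that control how redactions appear in the final document. You can adjust case sensitivity, whole-word matching, whether a visual rectangle is drawn, and what replacement text appears where the content was redacted.

Input

A legal brief containing various sensitive terms, including classification labels, passwords, and internal reference codes, each requiring different redaction handling.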

using IronPdf;

// Load the document
PdfDocument pdf = PdfDocument.FromFile("legal-brief.pdf");

// Case-sensitive redaction: only matches exact case
// "CLASSIFIED" will be redacted but "classified" or "Classified" will not
pdf.RedactTextOnAllPages(
    "CLASSIFIED",
    CaseSensitive: true,
    OnlyMatchWholeWords: true,
    DrawRectangles: true,
    ReplacementText: "[REDACTED]"
);

// Case-insensitive redaction: matches regardless of case
// Will redact "Secret", "SECRET", "secret", etc.
pdf.RedactTextOnAllPages(
    "secret",
    CaseSensitive: false,
    OnlyMatchWholeWords: true,
    DrawRectangles: true,
    ReplacementText: "*****"
);

// Whole word disabled: matches partial strings too
// Will redact "password", "passwords", "mypassword123", etc.
pdf.RedactTextOnAllPages(
    "password",
    CaseSensitive: false,
    OnlyMatchWholeWords: false,
    DrawRectangles: true,
    ReplacementText: "XXXXX"
);

// No visual rectangle: text is removed but no black box appears
// Useful when you want seamless removal without obvious redaction marks
pdf.RedactTextOnAllPages(
    "internal-reference-code",
    CaseSensitive: true,
    OnlyMatchWholeWords: true,
    DrawRectangles: false,
    ReplacementText: ""
);

pdf.SaveAs("legal-brief-redacted.pdf");

Imports IronPdf

' Load the document
Dim pdf As PdfDocument = PdfDocument.FromFile("legal-brief.pdf")

' Case-sensitive redaction: only matches exact case
' "CLASSIFIED" will be redacted but "classified" or "Classified" will not
pdf.RedactTextOnAllPages(
    "CLASSIFIED",
    CaseSensitive:=True,
    OnlyMatchWholeWords:=True,
    DrawRectangles:=True,
    ReplacementText:="[REDACTED]"
)

' Case-insensitive redaction: matches regardless of case
' Will redact "Secret", "SECRET", "secret", etc.
pdf.RedactTextOnAllPages(
    "secret",
    CaseSensitive:=False,
    OnlyMatchWholeWords:=True,
    DrawRectangles:=True,
    ReplacementText:="*****"
)

' Whole word disabled: matches partial strings too
' Will redact "password", "passwords", "mypassword123", etc.
pdf.RedactTextOnAllPages(
    "password",
    CaseSensitive:=False,
    OnlyMatchWholeWords:=False,
    DrawRectangles:=True,
    ReplacementText:="XXXXX"
)

' No visual rectangle: text is removed but no black box appears
' Useful when you want seamless removal without obvious redaction marks
pdf.RedactTextOnAllPages(
    "internal-reference-code",
    CaseSensitive:=True,
    OnlyMatchWholeWords:=True,
    DrawRectangles:=False,
    ReplacementText:=""
)

pdf.SaveAs("legal-brief-redacted.pdf")

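This code demonstrates four redaction configurations using the optional parameters of RedactTextOnAllPages: a case-sensitive exact match replaced with "[REDACTED]", a case-insensitive match replaced with asterisks, partial-word matching that catches variants of "password", and invisible removal with no rectangle for seamless content elimination.

Sample output

Each parameter serves a different purpose depending on your requirements:

CaseSensitive determines whether matching considers letter case. Legal documents often use capitalized terms with specific meanings, so case-sensitive matching ensures you remove only exact matches. For general text with inconsistent capitalization, case-insensitive matching may be needed to catch every instance.

OnlyMatchWholeWords controls whether the search matches complete words or partial strings. When redacting names, you usually want whole-word matching so that "Smith" does not accidentally redact part of "Blacksmith" or "Smithfield". When redacting patterns such as account-number prefixes, partial matching may be needed to catch variations.

DrawRectangles specifies whether a black box appears where content was removed. Most regulatory and legal contexts require visible redaction marks as evidence that content was removed deliberately rather than omitted by accident. Internal workflows may prefer invisible removal for cleaner output.

ReplacementText defines the characters that take the place of the redacted content. Common choices include asterisks, a "[REDACTED]" label, or an empty string. The replacement text is what appears in the document structure if someone tries to select or copy from the redacted area.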
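To get a feel for how case sensitivity and whole-word matching interact before running a redaction, it can help to count matches on plain text first. The sketch below approximates the two switches with .NET regular expressions. It illustrates the concepts only and is not IronPDF's internal matching logic; `MatchDemo.Count` is a hypothetical helper.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchDemo
{
    // Counts occurrences of a literal term, optionally case-sensitive and
    // optionally restricted to whole words (via \b word boundaries).
    public static int Count(string text, string term, bool caseSensitive, bool wholeWords)
    {
        string pattern = Regex.Escape(term);
        if (wholeWords)
        {
            pattern = @"\b" + pattern + @"\b";
        }
        RegexOptions options = caseSensitive ? RegexOptions.None : RegexOptions.IgnoreCase;
        return Regex.Matches(text, pattern, options).Count;
    }
}

// Example input: "Smith met the blacksmith in Smithfield."
// Whole words, case-insensitive:   only the standalone "Smith" counts.
// Partial match, case-insensitive: "Smith", "blacksmith", and "Smithfield" all count.
// Partial match, case-sensitive:   "Smith" and "Smithfield" count, "blacksmith" does not.
```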

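How do I use regular expressions to find and redact sensitive patterns?

Redacting known text strings works when you know the exact values to remove, but many kinds of confidential data follow predictable patterns rather than fixed values. Social Security numbers, credit card numbers, email addresses, phone numbers, and dates all have recognizable formats that regular expressions can match. A pattern-based redaction system lets you remove private information from PDF content without knowing each specific value in advance.

Combining IronPDF's text extraction with its redaction methods enables a powerful pattern-matching workflow: extract the text, identify matches with .NET regular expressions, then redact each value found.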

using IronPdf;
using System.Text.RegularExpressions;
using System.Collections.Generic;

public class PatternRedactor
{
    // Common patterns for sensitive data
    private static readonly Dictionary<string, string> SensitivePatterns = new Dictionary<string, string>
    {
        // US Social Security Number: 123-45-6789
        { "SSN", @"\b\d{3}-\d{2}-\d{4}\b" },

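        // Credit Card Numbers: 13-16 digits in groups of four, with optional space or hyphen separators
        { "CreditCard", @"\b(?:\d{4}[-\s]?){3}\d{1,4}\b" },

        // Email Addresses (the TLD class is [A-Za-z]; a '|' inside brackets would match a literal pipe)
        { "Email", @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b" },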

        // US Phone Numbers: (123) 456-7890 or 123-456-7890
        { "Phone", @"\b(?:\(\d{3}\)\s?|\d{3}[-.])\d{3}[-.]?\d{4}\b" },

        // Dates: MM/DD/YYYY or MM-DD-YYYY
        { "Date", @"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b" },

        // IP Addresses
        { "IPAddress", @"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b" }
    };

    public void RedactPatterns(string inputPath, string outputPath, params string[] patternNames)
    {
        // Load the PDF
        PdfDocument pdf = PdfDocument.FromFile(inputPath);

        // Extract all text from the document
        string fullText = pdf.ExtractAllText();

        // Track unique matches to avoid duplicate redaction attempts
        HashSet<string> matchesToRedact = new HashSet<string>();

        // Find all matches for requested patterns
        foreach (string patternName in patternNames)
        {
            if (SensitivePatterns.TryGetValue(patternName, out string pattern))
            {
                Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
                MatchCollection matches = regex.Matches(fullText);

                foreach (Match match in matches)
                {
                    matchesToRedact.Add(match.Value);
                }
            }
        }

        // Redact each unique match
        foreach (string sensitiveValue in matchesToRedact)
        {
            pdf.RedactTextOnAllPages(sensitiveValue);
        }

        // Save the redacted document
        pdf.SaveAs(outputPath);
    }
}

// Usage example
class Program
{
    static void Main()
    {
        PatternRedactor redactor = new PatternRedactor();

        // Redact SSNs, credit cards, and email addresses from a financial document
        redactor.RedactPatterns(
            "customer-data.pdf",
            "customer-data-safe.pdf",
            "SSN", "CreditCard", "Email"
        );
    }
}

Imports IronPdf
Imports System.Text.RegularExpressions
Imports System.Collections.Generic

Public Class PatternRedactor
    ' Common patterns for sensitive data
    Private Shared ReadOnly SensitivePatterns As New Dictionary(Of String, String) From {
        ' US Social Security Number: 123-45-6789
        {"SSN", "\b\d{3}-\d{2}-\d{4}\b"},

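        ' Credit Card Numbers: 13-16 digits in groups of four, with optional space or hyphen separators
        {"CreditCard", "\b(?:\d{4}[-\s]?){3}\d{1,4}\b"},

        ' Email Addresses (the TLD class is [A-Za-z]; a '|' inside brackets would match a literal pipe)
        {"Email", "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"},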

        ' US Phone Numbers: (123) 456-7890 or 123-456-7890
        {"Phone", "\b(?:\(\d{3}\)\s?|\d{3}[-.])\d{3}[-.]?\d{4}\b"},

        ' Dates: MM/DD/YYYY or MM-DD-YYYY
        {"Date", "\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"},

        ' IP Addresses
        {"IPAddress", "\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"}
    }

    Public Sub RedactPatterns(inputPath As String, outputPath As String, ParamArray patternNames As String())
        ' Load the PDF
        Dim pdf As PdfDocument = PdfDocument.FromFile(inputPath)

        ' Extract all text from the document
        Dim fullText As String = pdf.ExtractAllText()

        ' Track unique matches to avoid duplicate redaction attempts
        Dim matchesToRedact As New HashSet(Of String)()

        ' Find all matches for requested patterns
        For Each patternName As String In patternNames
            Dim pattern As String = Nothing
            If SensitivePatterns.TryGetValue(patternName, pattern) Then
                Dim regex As New Regex(pattern, RegexOptions.IgnoreCase)
                Dim matches As MatchCollection = regex.Matches(fullText)

                For Each match As Match In matches
                    matchesToRedact.Add(match.Value)
                Next
            End If
        Next

        ' Redact each unique match
        For Each sensitiveValue As String In matchesToRedact
            pdf.RedactTextOnAllPages(sensitiveValue)
        Next

        ' Save the redacted document
        pdf.SaveAs(outputPath)
    End Sub
End Class

' Usage example
Class Program
    Shared Sub Main()
        Dim redactor As New PatternRedactor()

        ' Redact SSNs, credit cards, and email addresses from a financial document
        redactor.RedactPatterns(
            "customer-data.pdf",
            "customer-data-safe.pdf",
            "SSN", "CreditCard", "Email"
        )
    End Sub
End Class

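This pattern-based approach scales well because you define the patterns once and apply them to any document. Supporting a new data type only requires adding another regular expression to the dictionary.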
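Digit-group patterns such as the credit card expression also match order numbers and other long digit runs. One common way to reduce false positives is the Luhn checksum, which valid payment card numbers satisfy. The sketch below is an addition, not part of the tutorial's code, and `CardCheck.PassesLuhn` is a hypothetical helper name. Whether a failing candidate should be skipped or still redacted is a policy decision: in a redaction workflow, over-redacting is usually the safer error.

```csharp
using System;
using System.Linq;

public static class CardCheck
{
    // Returns true when the ASCII digits in the candidate pass the Luhn checksum.
    // Separators such as spaces and hyphens are ignored.
    public static bool PassesLuhn(string candidate)
    {
        int[] digits = candidate
            .Where(c => c >= '0' && c <= '9')
            .Select(c => c - '0')
            .ToArray();

        // Payment card numbers are 13 to 19 digits long.
        if (digits.Length < 13 || digits.Length > 19)
        {
            return false;
        }

        int sum = 0;
        bool doubleIt = false; // every second digit, counted from the right, is doubled
        for (int i = digits.Length - 1; i >= 0; i--)
        {
            int d = digits[i];
            if (doubleIt)
            {
                d *= 2;
                if (d > 9) d -= 9;
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }
}
```

A scanner could use this to label a match as "likely card number" versus "digit sequence" in its report while still redacting both.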

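How do I build a reusable sensitive-data scanner?

In production you often need to scan documents and report what confidential information they contain before deciding whether to redact. This supports compliance audits and allows human review of redaction decisions. The following classes provide scanning in addition to redaction.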

using IronPdf;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Linq;

public class SensitiveDataMatch
{
    public string PatternType { get; set; }
    public string Value { get; set; }
    public int PageNumber { get; set; }
}

public class ScanResult
{
    public string FilePath { get; set; }
    public List<SensitiveDataMatch> Matches { get; set; } = new List<SensitiveDataMatch>();
    public bool ContainsSensitiveData => Matches.Count > 0;

    public Dictionary<string, int> GetSummary()
    {
        return Matches.GroupBy(m => m.PatternType)
                      .ToDictionary(g => g.Key, g => g.Count());
    }
}

public class DocumentScanner
{
    private readonly Dictionary<string, string> _patterns;

    public DocumentScanner()
    {
        _patterns = new Dictionary<string, string>
        {
            { "Social Security Number", @"\b\d{3}-\d{2}-\d{4}\b" },
            { "Credit Card", @"\b(?:\d{4}[-\s]?){3}\d{1,4}\b" },
            { "Email Address", @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b" },
            { "Phone Number", @"\b(?:\(\d{3}\)\s?|\d{3}[-.])\d{3}[-.]?\d{4}\b" },
            { "Date of Birth Pattern", @"\b(?:DOB|Date of Birth|Birth Date)[:\s]+\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b" }
        };
    }

    public ScanResult ScanDocument(string filePath)
    {
        ScanResult result = new ScanResult { FilePath = filePath };
        PdfDocument pdf = PdfDocument.FromFile(filePath);

        // Scan each page individually to track location
        for (int pageIndex = 0; pageIndex < pdf.PageCount; pageIndex++)
        {
            string pageText = pdf.ExtractTextFromPage(pageIndex);

            foreach (var pattern in _patterns)
            {
                Regex regex = new Regex(pattern.Value, RegexOptions.IgnoreCase);
                MatchCollection matches = regex.Matches(pageText);

                foreach (Match match in matches)
                {
                    result.Matches.Add(new SensitiveDataMatch
                    {
                        PatternType = pattern.Key,
                        Value = MaskValue(match.Value, pattern.Key),
                        PageNumber = pageIndex + 1
                    });
                }
            }
        }

        return result;
    }

    // Partially mask values for safe storage
    private string MaskValue(string value, string patternType)
    {
        if (patternType == "Social Security Number" && value.Length >= 4)
        {
            return "XXX-XX-" + value.Substring(value.Length - 4);
        }
        if (patternType == "Credit Card" && value.Length >= 4)
        {
            return "****-****-****-" + value.Substring(value.Length - 4);
        }
        if (patternType == "Email Address")
        {
            int atIndex = value.IndexOf('@');
            if (atIndex > 2)
            {
                return value.Substring(0, 2) + "***" + value.Substring(atIndex);
            }
        }
        return value.Length > 4 ? value.Substring(0, 2) + "***" : "****";
    }

    public void ScanAndRedact(string inputPath, string outputPath)
    {
        // First scan to identify sensitive data
        ScanResult scanResult = ScanDocument(inputPath);

        if (!scanResult.ContainsSensitiveData)
        {
            return;
        }

        // Load document for redaction
        PdfDocument pdf = PdfDocument.FromFile(inputPath);

        // Extract unique actual values (not masked) for redaction
        string fullText = pdf.ExtractAllText();
        HashSet<string> valuesToRedact = new HashSet<string>();

        foreach (var pattern in _patterns)
        {
            Regex regex = new Regex(pattern.Value, RegexOptions.IgnoreCase);
            foreach (Match match in regex.Matches(fullText))
            {
                valuesToRedact.Add(match.Value);
            }
        }

        // Apply redactions
        foreach (string value in valuesToRedact)
        {
            pdf.RedactTextOnAllPages(value);
        }

        pdf.SaveAs(outputPath);
    }
}

// Usage
class Program
{
    static void Main()
    {
        DocumentScanner scanner = new DocumentScanner();

        // Scan only (for audit purposes)
        ScanResult result = scanner.ScanDocument("application-form.pdf");
        var summary = result.GetSummary();

        // Scan and redact in one operation
        scanner.ScanAndRedact("application-form.pdf", "application-form-redacted.pdf");
    }
}

Imports IronPdf
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports System.Linq

Public Class SensitiveDataMatch
    Public Property PatternType As String
    Public Property Value As String
    Public Property PageNumber As Integer
End Class

Public Class ScanResult
    Public Property FilePath As String
    Public Property Matches As List(Of SensitiveDataMatch) = New List(Of SensitiveDataMatch)()
    Public ReadOnly Property ContainsSensitiveData As Boolean
        Get
            Return Matches.Count > 0
        End Get
    End Property

    Public Function GetSummary() As Dictionary(Of String, Integer)
        Return Matches.GroupBy(Function(m) m.PatternType) _
                      .ToDictionary(Function(g) g.Key, Function(g) g.Count())
    End Function
End Class

Public Class DocumentScanner
    Private ReadOnly _patterns As Dictionary(Of String, String)

    Public Sub New()
        _patterns = New Dictionary(Of String, String) From {
            {"Social Security Number", "\b\d{3}-\d{2}-\d{4}\b"},
            {"Credit Card", "\b(?:\d{4}[-\s]?){3}\d{1,4}\b"},
            {"Email Address", "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"},
            {"Phone Number", "\b(?:\(\d{3}\)\s?|\d{3}[-.])\d{3}[-.]?\d{4}\b"},
            {"Date of Birth Pattern", "\b(?:DOB|Date of Birth|Birth Date)[:\s]+\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"}
        }
    End Sub

    Public Function ScanDocument(filePath As String) As ScanResult
        Dim result As New ScanResult With {.FilePath = filePath}
        Dim pdf As PdfDocument = PdfDocument.FromFile(filePath)

        ' Scan each page individually to track location
        For pageIndex As Integer = 0 To pdf.PageCount - 1
            Dim pageText As String = pdf.ExtractTextFromPage(pageIndex)

            For Each pattern In _patterns
                Dim regex As New Regex(pattern.Value, RegexOptions.IgnoreCase)
                Dim matches As MatchCollection = regex.Matches(pageText)

                For Each match As Match In matches
                    result.Matches.Add(New SensitiveDataMatch With {
                        .PatternType = pattern.Key,
                        .Value = MaskValue(match.Value, pattern.Key),
                        .PageNumber = pageIndex + 1
                    })
                Next
            Next
        Next

        Return result
    End Function

    ' Partially mask values for safe storage
    Private Function MaskValue(value As String, patternType As String) As String
        If patternType = "Social Security Number" AndAlso value.Length >= 4 Then
            Return "XXX-XX-" & value.Substring(value.Length - 4)
        End If
        If patternType = "Credit Card" AndAlso value.Length >= 4 Then
            Return "****-****-****-" & value.Substring(value.Length - 4)
        End If
        If patternType = "Email Address" Then
            Dim atIndex As Integer = value.IndexOf("@"c)
            If atIndex > 2 Then
                Return value.Substring(0, 2) & "***" & value.Substring(atIndex)
            End If
        End If
        Return If(value.Length > 4, value.Substring(0, 2) & "***", "****")
    End Function

    Public Sub ScanAndRedact(inputPath As String, outputPath As String)
        ' First scan to identify sensitive data
        Dim scanResult As ScanResult = ScanDocument(inputPath)

        If Not scanResult.ContainsSensitiveData Then
            Return
        End If

        ' Load document for redaction
        Dim pdf As PdfDocument = PdfDocument.FromFile(inputPath)

        ' Extract unique actual values (not masked) for redaction
        Dim fullText As String = pdf.ExtractAllText()
        Dim valuesToRedact As New HashSet(Of String)()

        For Each pattern In _patterns
            Dim regex As New Regex(pattern.Value, RegexOptions.IgnoreCase)
            For Each match As Match In regex.Matches(fullText)
                valuesToRedact.Add(match.Value)
            Next
        Next

        ' Apply redactions
        For Each value As String In valuesToRedact
            pdf.RedactTextOnAllPages(value)
        Next

        pdf.SaveAs(outputPath)
    End Sub
End Class

' Usage
Module Program
    Sub Main()
        Dim scanner As New DocumentScanner()

        ' Scan only (for audit purposes)
        Dim result As ScanResult = scanner.ScanDocument("application-form.pdf")
        Dim summary = result.GetSummary()

        ' Scan and redact in one operation
        scanner.ScanAndRedact("application-form.pdf", "application-form-redacted.pdf")
    End Sub
End Module


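The scanner shows what sensitive information exists before any modification takes place. This supports compliance workflows in which you need to document what was found and removed. The masking function ensures that log files and reports do not themselves become a source of data leaks.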
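
One way to turn masked matches into an audit record is to group them by pattern type and page. The sketch below is illustrative only: `Finding` and `AuditSummary` are hypothetical names that are not part of IronPDF or of the scanner class, and the code assumes every value it receives has already been masked.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record of one masked finding; mirrors the fields the scanner stores.
public sealed class Finding
{
    public Finding(string patternType, string maskedValue, int pageNumber)
    {
        PatternType = patternType;
        MaskedValue = maskedValue;
        PageNumber = pageNumber;
    }

    public string PatternType { get; }
    public string MaskedValue { get; }
    public int PageNumber { get; }
}

public static class AuditSummary
{
    // Builds one line per pattern type: the match count and the pages involved.
    // Raw values never appear in the output, only counts and page numbers.
    public static List<string> Build(IEnumerable<Finding> findings)
    {
        return findings
            .GroupBy(f => f.PatternType)
            .OrderBy(g => g.Key, StringComparer.Ordinal)
            .Select(g =>
            {
                var pages = g.Select(f => f.PageNumber).Distinct().OrderBy(p => p);
                string pageList = string.Join(", ", pages);
                return g.Key + ": " + g.Count() + " match(es) on page(s) " + pageList;
            })
            .ToList();
    }
}
```

Because the summary carries only counts and page numbers, it can be written to a log or attached to a compliance ticket without repeating the sensitive values.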

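How do I redact specific regions in a PDF?

Text redaction handles character-based content effectively, but PDFs often contain sensitive information that text matching cannot reach. Signatures, photos, handwritten annotations, stamps, and graphic elements need a different approach. Region-based redaction lets you specify rectangular areas by coordinate and permanently obscure everything inside them.

IronPDF uses the RectangleF structure to define redaction regions. You specify the X and Y coordinates of the top-left corner, followed by the width and height of the region. Coordinates are measured from the bottom-left corner of the page, which matches the coordinate system of the PDF specification.

Input

A signed agreement containing a handwritten signature and a photo ID, which need to be redacted using coordinate-based region targeting.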


using IronPdf;
using IronSoftware.Drawing;

// Load a document with signature blocks and photos
PdfDocument pdf = PdfDocument.FromFile("signed-agreement.pdf");

// Define a region for a signature block
// Located 100 points from left, 650 points from bottom
// Width of 200 points, height of 50 points
RectangleF signatureRegion = new RectangleF(100, 650, 200, 50);

// Redact the signature region on all pages
pdf.RedactRegionsOnAllPages(signatureRegion);

// Define a region for a photo ID in the upper right
RectangleF photoRegion = new RectangleF(450, 700, 100, 120);
pdf.RedactRegionsOnAllPages(photoRegion);

// Save the document with regions redacted
pdf.SaveAs("signed-agreement-redacted.pdf");

Imports IronPdf
Imports IronSoftware.Drawing

' Load a document with signature blocks and photos
Dim pdf As PdfDocument = PdfDocument.FromFile("signed-agreement.pdf")

' Define a region for a signature block
' Located 100 points from left, 650 points from bottom
' Width of 200 points, height of 50 points
Dim signatureRegion As New RectangleF(100, 650, 200, 50)

' Redact the signature region on all pages
pdf.RedactRegionsOnAllPages(signatureRegion)

' Define a region for a photo ID in the upper right
Dim photoRegion As New RectangleF(450, 700, 100, 120)
pdf.RedactRegionsOnAllPages(photoRegion)

' Save the document with regions redacted
pdf.SaveAs("signed-agreement-redacted.pdf")


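This code uses the RectangleF structure to define the rectangular regions to redact. The signature region is positioned at (100, 650) and measures 200x50 points; the photo region is positioned at (450, 700) and measures 100x120 points. The RedactRegionsOnAllPages method applies black rectangles to these regions on every page.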

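Determining the right coordinates usually takes some experimentation or measurement. PDF pages use a coordinate system in which one point equals 1/72 of an inch. A standard US Letter page is 612 points wide and 792 points tall. An A4 page measures roughly 595 x 842 points. A PDF viewer that displays coordinates as the cursor moves can help, or you can retrieve the page dimensions programmatically: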


using IronPdf;
using IronSoftware.Drawing;

PdfDocument pdf = PdfDocument.FromFile("form-document.pdf");

// Get dimensions of the first page
var pageInfo = pdf.Pages[0];

// Calculate regions relative to page dimensions
// Redact the bottom quarter of the page where signatures appear
float signatureAreaHeight = (float)(pageInfo.Height / 4);
RectangleF bottomQuarter = new RectangleF(
    0,                              // Start at left edge
    0,                              // Start at bottom
    (float)pageInfo.Width,          // Full page width
    signatureAreaHeight             // Quarter of page height
);

pdf.RedactRegionsOnAllPages(bottomQuarter);

// Redact a header area at the top containing letterhead with address
float headerHeight = 100;
RectangleF headerArea = new RectangleF(
    0,
    (float)(pageInfo.Height - headerHeight), // Position from bottom
    (float)pageInfo.Width,
    headerHeight
);

pdf.RedactRegionsOnAllPages(headerArea);

pdf.SaveAs("form-document-redacted.pdf");

Imports IronPdf
Imports IronSoftware.Drawing

Dim pdf As PdfDocument = PdfDocument.FromFile("form-document.pdf")

' Get dimensions of the first page
Dim pageInfo = pdf.Pages(0)

' Calculate regions relative to page dimensions
' Redact the bottom quarter of the page where signatures appear
Dim signatureAreaHeight As Single = CSng(pageInfo.Height / 4)
Dim bottomQuarter As New RectangleF(0, 0, CSng(pageInfo.Width), signatureAreaHeight)

pdf.RedactRegionsOnAllPages(bottomQuarter)

' Redact a header area at the top containing letterhead with address
Dim headerHeight As Single = 100
Dim headerArea As New RectangleF(0, CSng(pageInfo.Height - headerHeight), CSng(pageInfo.Width), headerHeight)

pdf.RedactRegionsOnAllPages(headerArea)

pdf.SaveAs("form-document-redacted.pdf")
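
When measurements come from a ruler, a print specification, or a viewer that reports Y from the top of the page, a few small conversions help. The sketch below is a plain .NET helper with no IronPDF dependency; `PdfUnits` and its method names are hypothetical. It relies only on the facts that one point is 1/72 inch and one inch is 25.4 mm.

```csharp
using System;

public static class PdfUnits
{
    public const double PointsPerInch = 72.0;
    public const double MillimetersPerInch = 25.4;

    // 8.5 in -> 612 pt, 11 in -> 792 pt (US Letter).
    public static double InchesToPoints(double inches)
    {
        return inches * PointsPerInch;
    }

    // 210 mm -> about 595.28 pt, 297 mm -> about 841.89 pt (A4).
    public static double MillimetersToPoints(double millimeters)
    {
        return millimeters / MillimetersPerInch * PointsPerInch;
    }

    // Converts a Y value measured down from the top edge of the page
    // into the same position measured up from the bottom edge.
    public static double FlipY(double pageHeightPoints, double yFromTop)
    {
        return pageHeightPoints - yFromTop;
    }
}
```

For example, a position that a viewer reports as 142 points below the top of a US Letter page corresponds to `PdfUnits.FlipY(792, 142)`, which is 650 points above the bottom edge. It is still worth redacting a test page and checking the result visually before trusting any computed coordinates.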


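How do I redact multiple regions across different pages?

Complex documents often require different regions to be redacted on different pages. A multi-page form may place signature blocks in different positions, or different pages may contain photos, stamps, or other graphic elements in different locations. IronPDF includes methods for redacting regions on specific pages.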


using IronPdf;
using IronSoftware.Drawing;

PdfDocument pdf = PdfDocument.FromFile("multi-page-application.pdf");

// Define page-specific redaction regions
// Page 1: Cover page with applicant photo
RectangleF page1Photo = new RectangleF(450, 600, 120, 150);
pdf.RedactRegionOnPage(0, page1Photo);

// Page 2: Personal information section
RectangleF page2InfoBlock = new RectangleF(50, 400, 250, 200);
pdf.RedactRegionOnPage(1, page2InfoBlock);

// Pages 3-5: Signature lines at the same position
RectangleF signatureLine = new RectangleF(100, 100, 200, 40);
int[] signaturePages = { 2, 3, 4 };
pdf.RedactRegionOnPages(signaturePages, signatureLine);

// Page 6: Multiple regions - notary stamp and witness signature
RectangleF notaryStamp = new RectangleF(400, 150, 150, 150);
RectangleF witnessSignature = new RectangleF(100, 150, 200, 40);
pdf.RedactRegionOnPage(5, notaryStamp);
pdf.RedactRegionOnPage(5, witnessSignature);

pdf.SaveAs("multi-page-application-redacted.pdf");

Imports IronPdf
Imports IronSoftware.Drawing

Dim pdf As PdfDocument = PdfDocument.FromFile("multi-page-application.pdf")

' Define page-specific redaction regions
' Page 1: Cover page with applicant photo
Dim page1Photo As New RectangleF(450, 600, 120, 150)
pdf.RedactRegionOnPage(0, page1Photo)

' Page 2: Personal information section
Dim page2InfoBlock As New RectangleF(50, 400, 250, 200)
pdf.RedactRegionOnPage(1, page2InfoBlock)

' Pages 3-5: Signature lines at the same position
Dim signatureLine As New RectangleF(100, 100, 200, 40)
Dim signaturePages As Integer() = {2, 3, 4}
pdf.RedactRegionOnPages(signaturePages, signatureLine)

' Page 6: Multiple regions - notary stamp and witness signature
Dim notaryStamp As New RectangleF(400, 150, 150, 150)
Dim witnessSignature As New RectangleF(100, 150, 200, 40)
pdf.RedactRegionOnPage(5, notaryStamp)
pdf.RedactRegionOnPage(5, witnessSignature)

pdf.SaveAs("multi-page-application-redacted.pdf")
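
The example passes zero-based page indexes (page 1 is index 0, pages 3-5 are indexes 2, 3, 4), which is an easy place for off-by-one mistakes. A small helper, sketched here with the hypothetical name `PageIndexes`, can make that conversion explicit. It uses only the .NET base library.

```csharp
using System;
using System.Linq;

public static class PageIndexes
{
    // Converts an inclusive range of 1-based page numbers (as a reader sees them)
    // into the zero-based indexes used by the page-specific redaction calls.
    public static int[] FromPageRange(int firstPage, int lastPage)
    {
        if (firstPage < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(firstPage), "Page numbers start at 1.");
        }
        if (lastPage < firstPage)
        {
            throw new ArgumentOutOfRangeException(nameof(lastPage), "The range must not be reversed.");
        }
        return Enumerable.Range(firstPage - 1, lastPage - firstPage + 1).ToArray();
    }
}
```

With this in place, `PageIndexes.FromPageRange(3, 5)` yields the same `{ 2, 3, 4 }` array that the example writes by hand.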


Documents with a consistent layout benefit from reusable region definitions:

using IronPdf;
using IronSoftware.Drawing;

public class FormRegions
{
    // Standard form regions based on common templates
    public static RectangleF HeaderLogo => new RectangleF(20, 720, 150, 60);
    public static RectangleF SignatureBlock => new RectangleF(72, 72, 200, 50);
    public static RectangleF DateField => new RectangleF(400, 72, 120, 20);
    public static RectangleF PhotoId => new RectangleF(480, 650, 100, 130);
    public static RectangleF AddressBlock => new RectangleF(72, 600, 250, 80);
}

class Program
{
    static void Main()
    {
        PdfDocument pdf = PdfDocument.FromFile("standard-form.pdf");

        // Apply standard redactions using predefined regions
        pdf.RedactRegionsOnAllPages(FormRegions.SignatureBlock);
        pdf.RedactRegionsOnAllPages(FormRegions.DateField);
        pdf.RedactRegionOnPage(0, FormRegions.PhotoId);

        pdf.SaveAs("standard-form-redacted.pdf");
    }
}

Imports IronPdf
Imports IronSoftware.Drawing

Public Class FormRegions
    ' Standard form regions based on common templates
    Public Shared ReadOnly Property HeaderLogo As RectangleF
        Get
            Return New RectangleF(20, 720, 150, 60)
        End Get
    End Property

    Public Shared ReadOnly Property SignatureBlock As RectangleF
        Get
            Return New RectangleF(72, 72, 200, 50)
        End Get
    End Property

    Public Shared ReadOnly Property DateField As RectangleF
        Get
            Return New RectangleF(400, 72, 120, 20)
        End Get
    End Property

    Public Shared ReadOnly Property PhotoId As RectangleF
        Get
            Return New RectangleF(480, 650, 100, 130)
        End Get
    End Property

    Public Shared ReadOnly Property AddressBlock As RectangleF
        Get
            Return New RectangleF(72, 600, 250, 80)
        End Get
    End Property
End Class

Module Program
    Sub Main()
        Dim pdf As PdfDocument = PdfDocument.FromFile("standard-form.pdf")

        ' Apply standard redactions using predefined regions
        pdf.RedactRegionsOnAllPages(FormRegions.SignatureBlock)
        pdf.RedactRegionsOnAllPages(FormRegions.DateField)
        pdf.RedactRegionOnPage(0, FormRegions.PhotoId)

        pdf.SaveAs("standard-form-redacted.pdf")
    End Sub
End Module


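How do I remove metadata that could leak sensitive information?

PDF metadata is a frequently overlooked source of information leakage. Every PDF file carries properties that can reveal sensitive information: author names and usernames, the software used to create the document, creation and modification timestamps, original file names, revision history, and custom properties added by various applications. Stripping or sanitizing this metadata before sharing a document externally is essential. For a comprehensive overview of metadata operations, see our metadata how-to guide.

IronPDF exposes document metadata through the MetaData property, which lets you read existing values, modify them, or remove them entirely.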


using IronPdf;
using System;

// Load a document containing sensitive metadata
PdfDocument pdf = PdfDocument.FromFile("internal-report.pdf");

// Access current metadata properties
string author = pdf.MetaData.Author;
string title = pdf.MetaData.Title;
string subject = pdf.MetaData.Subject;
string keywords = pdf.MetaData.Keywords;
string creator = pdf.MetaData.Creator;
string producer = pdf.MetaData.Producer;
DateTime? creationDate = pdf.MetaData.CreationDate;
DateTime? modifiedDate = pdf.MetaData.ModifiedDate;

// Get all metadata keys including custom properties
var allKeys = pdf.MetaData.Keys();

Imports IronPdf
Imports System

' Load a document containing sensitive metadata
Dim pdf As PdfDocument = PdfDocument.FromFile("internal-report.pdf")

' Access current metadata properties
Dim author As String = pdf.MetaData.Author
Dim title As String = pdf.MetaData.Title
Dim subject As String = pdf.MetaData.Subject
Dim keywords As String = pdf.MetaData.Keywords
Dim creator As String = pdf.MetaData.Creator
Dim producer As String = pdf.MetaData.Producer
Dim creationDate As DateTime? = pdf.MetaData.CreationDate
Dim modifiedDate As DateTime? = pdf.MetaData.ModifiedDate

' Get all metadata keys including custom properties
Dim allKeys = pdf.MetaData.Keys()


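Remove sensitive metadata before distribution:

Input

An internal memo containing embedded metadata such as the author name, creation timestamp, and custom properties that could reveal sensitive organizational information.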


using IronPdf;
using System;

PdfDocument pdf = PdfDocument.FromFile("confidential-memo.pdf");

// Replace identifying metadata with generic values
pdf.MetaData.Author = "Organization Name";
pdf.MetaData.Creator = "Document System";
pdf.MetaData.Producer = "";
pdf.MetaData.Title = "Public Document";
pdf.MetaData.Subject = "";
pdf.MetaData.Keywords = "";

// Normalize dates to remove timing information
pdf.MetaData.CreationDate = DateTime.Now;
pdf.MetaData.ModifiedDate = DateTime.Now;

// Remove specific custom metadata keys
pdf.MetaData.RemoveMetaDataKey("OriginalFilename");
pdf.MetaData.RemoveMetaDataKey("LastSavedBy");
pdf.MetaData.RemoveMetaDataKey("Company");
pdf.MetaData.RemoveMetaDataKey("Manager");

// Remove custom properties added by applications
try
{
    pdf.MetaData.CustomProperties.Remove("SourcePath");
}
catch { }

pdf.SaveAs("confidential-memo-cleaned.pdf");

Imports IronPdf
Imports System

Dim pdf As PdfDocument = PdfDocument.FromFile("confidential-memo.pdf")

' Replace identifying metadata with generic values
pdf.MetaData.Author = "Organization Name"
pdf.MetaData.Creator = "Document System"
pdf.MetaData.Producer = ""
pdf.MetaData.Title = "Public Document"
pdf.MetaData.Subject = ""
pdf.MetaData.Keywords = ""

' Normalize dates to remove timing information
pdf.MetaData.CreationDate = DateTime.Now
pdf.MetaData.ModifiedDate = DateTime.Now

' Remove specific custom metadata keys
pdf.MetaData.RemoveMetaDataKey("OriginalFilename")
pdf.MetaData.RemoveMetaDataKey("LastSavedBy")
pdf.MetaData.RemoveMetaDataKey("Company")
pdf.MetaData.RemoveMetaDataKey("Manager")

' Remove custom properties added by applications
Try
    pdf.MetaData.CustomProperties.Remove("SourcePath")
Catch
End Try

pdf.SaveAs("confidential-memo-cleaned.pdf")


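This code replaces identifying metadata fields with generic values, normalizes the timestamps to the current date, and removes custom metadata keys that applications may have added. The RemoveMetaDataKey method targets specific properties such as "OriginalFilename" and "LastSavedBy", which can leak internal information.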

Thorough metadata cleanup across batch operations calls for a systematic approach:

using IronPdf;
using System;
using System.Collections.Generic;

public class MetadataCleaner
{
    private readonly string _defaultAuthor;
    private readonly string _defaultCreator;

    public MetadataCleaner(string organizationName)
    {
        _defaultAuthor = organizationName;
        _defaultCreator = $"{organizationName} Document System";
    }

    public void CleanMetadata(PdfDocument pdf)
    {
        // Replace standard metadata fields
        pdf.MetaData.Author = _defaultAuthor;
        pdf.MetaData.Creator = _defaultCreator;
        pdf.MetaData.Producer = "";
        pdf.MetaData.Subject = "";
        pdf.MetaData.Keywords = "";

        // Normalize timestamps
        DateTime now = DateTime.Now;
        pdf.MetaData.CreationDate = now;
        pdf.MetaData.ModifiedDate = now;

        // Get all keys and remove potentially sensitive ones
        List<string> keysToRemove = new List<string>();
        foreach (string key in pdf.MetaData.Keys())
        {
            // Keep only essential keys
            if (!IsEssentialKey(key))
            {
                keysToRemove.Add(key);
            }
        }

        foreach (string key in keysToRemove)
        {
            pdf.MetaData.RemoveMetaDataKey(key);
        }
    }

    private bool IsEssentialKey(string key)
    {
        // Keep only the basic display properties
        string[] essentialKeys = { "Title", "Author", "CreationDate", "ModifiedDate" };
        foreach (string essential in essentialKeys)
        {
            if (key.Equals(essential, StringComparison.OrdinalIgnoreCase))
            {
                return true;
            }
        }
        return false;
    }
}

// Usage
class Program
{
    static void Main()
    {
        MetadataCleaner cleaner = new MetadataCleaner("Acme Corporation");

        PdfDocument pdf = PdfDocument.FromFile("report.pdf");
        cleaner.CleanMetadata(pdf);
        pdf.SaveAs("report-clean.pdf");
    }
}

Imports IronPdf
Imports System
Imports System.Collections.Generic

Public Class MetadataCleaner
    Private ReadOnly _defaultAuthor As String
    Private ReadOnly _defaultCreator As String

    Public Sub New(organizationName As String)
        _defaultAuthor = organizationName
        _defaultCreator = $"{organizationName} Document System"
    End Sub

    Public Sub CleanMetadata(pdf As PdfDocument)
        ' Replace standard metadata fields
        pdf.MetaData.Author = _defaultAuthor
        pdf.MetaData.Creator = _defaultCreator
        pdf.MetaData.Producer = ""
        pdf.MetaData.Subject = ""
        pdf.MetaData.Keywords = ""

        ' Normalize timestamps
        Dim now As DateTime = DateTime.Now
        pdf.MetaData.CreationDate = now
        pdf.MetaData.ModifiedDate = now

        ' Get all keys and remove potentially sensitive ones
        Dim keysToRemove As New List(Of String)()
        For Each key As String In pdf.MetaData.Keys()
            ' Keep only essential keys
            If Not IsEssentialKey(key) Then
                keysToRemove.Add(key)
            End If
        Next

        For Each key As String In keysToRemove
            pdf.MetaData.RemoveMetaDataKey(key)
        Next
    End Sub

    Private Function IsEssentialKey(key As String) As Boolean
        ' Keep only the basic display properties
        Dim essentialKeys As String() = {"Title", "Author", "CreationDate", "ModifiedDate"}
        For Each essential As String In essentialKeys
            If key.Equals(essential, StringComparison.OrdinalIgnoreCase) Then
                Return True
            End If
        Next
        Return False
    End Function
End Class

' Usage
Class Program
    Shared Sub Main()
        Dim cleaner As New MetadataCleaner("Acme Corporation")

        Dim pdf As PdfDocument = PdfDocument.FromFile("report.pdf")
        cleaner.CleanMetadata(pdf)
        pdf.SaveAs("report-clean.pdf")
    End Sub
End Class
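
The allowlist check in `IsEssentialKey` can also be expressed with a case-insensitive set, which keeps the list of retained keys in one place. This is a sketch using only the .NET base library; `MetadataAllowlist` is a hypothetical name, and the input is assumed to be the key names returned by the document's metadata.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MetadataAllowlist
{
    // Case-insensitive set of keys to keep: the same four names used by IsEssentialKey.
    private static readonly HashSet<string> Keep = new HashSet<string>(
        new[] { "Title", "Author", "CreationDate", "ModifiedDate" },
        StringComparer.OrdinalIgnoreCase);

    // Returns a new list of keys that are not on the allowlist.
    // Building the list first avoids removing entries while enumerating them.
    public static List<string> KeysToRemove(IEnumerable<string> allKeys)
    {
        return allKeys.Where(key => !Keep.Contains(key)).ToList();
    }
}
```

Each key in the returned list could then be passed to `RemoveMetaDataKey`, as the cleaner class does.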


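How do I sanitize a PDF to remove embedded scripts and hidden threats?

PDF sanitization addresses security concerns beyond visible content and metadata. PDF files can contain JavaScript code, embedded executables, form actions that trigger external connections, and other potentially malicious elements. These features have legitimate uses, such as interactive forms and multimedia content, but they also create attack vectors. Sanitizing a PDF removes these active elements while preserving the visual content. For more details on sanitization methods, see our Sanitize PDF how-to guide.

IronPDF's Cleaner class handles sanitization by converting the PDF to an image format and then back again. The process removes JavaScript, embedded objects, form actions, and annotations while keeping the visual appearance intact. The library offers two sanitization methods with different characteristics.

Input

A PDF received from an external source that may contain JavaScript, embedded objects, or other potentially malicious active content.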


using IronPdf;

// Load a PDF that may contain active content
PdfDocument pdf = PdfDocument.FromFile("received-document.pdf");

// Sanitize using SVG conversion
// Faster processing, results in searchable text, slight layout variations possible
PdfDocument sanitizedSvg = Cleaner.SanitizeWithSvg(pdf);
sanitizedSvg.SaveAs("sanitized-svg.pdf");

// Sanitize using Bitmap conversion
// Slower processing, text becomes image (not searchable), exact visual reproduction
PdfDocument sanitizedBitmap = Cleaner.SanitizeWithBitmap(pdf);
sanitizedBitmap.SaveAs("sanitized-bitmap.pdf");

Imports IronPdf

' Load a PDF that may contain active content
Dim pdf As PdfDocument = PdfDocument.FromFile("received-document.pdf")

' Sanitize using SVG conversion
' Faster processing, results in searchable text, slight layout variations possible
Dim sanitizedSvg As PdfDocument = Cleaner.SanitizeWithSvg(pdf)
sanitizedSvg.SaveAs("sanitized-svg.pdf")

' Sanitize using Bitmap conversion
' Slower processing, text becomes image (not searchable), exact visual reproduction
Dim sanitizedBitmap As PdfDocument = Cleaner.SanitizeWithBitmap(pdf)
sanitizedBitmap.SaveAs("sanitized-bitmap.pdf")


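This code demonstrates the two sanitization methods provided by IronPDF's Cleaner class. SanitizeWithSvg converts the PDF through an intermediate SVG format, which preserves searchable text while removing active content. SanitizeWithBitmap first converts the pages to images, producing an exact visual copy in which the text is rendered as non-searchable graphics.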

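The SVG method is faster and keeps text searchable, which suits documents that must remain indexable or accessible. The bitmap method produces an exact visual copy but converts text into images, which prevents text selection and search. Choose according to what you need from the output document.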
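
If the choice needs to be made in code, for example per document type, the trade-off can be captured in a small selector. The sketch below is one possible policy, not an IronPDF feature: `SanitizeMode` and `SanitizeModeSelector` are hypothetical names, and the rule that searchability wins when both requirements are set is an assumption you may want to reverse.

```csharp
using System;

public enum SanitizeMode
{
    Svg,
    Bitmap
}

public static class SanitizeModeSelector
{
    // Prefers the SVG route whenever the output must stay searchable or accessible,
    // and picks Bitmap only when an exact visual copy is required and text is not.
    public static SanitizeMode Choose(bool needsSearchableText, bool needsExactVisualCopy)
    {
        if (needsSearchableText)
        {
            return SanitizeMode.Svg;
        }
        return needsExactVisualCopy ? SanitizeMode.Bitmap : SanitizeMode.Svg;
    }
}
```

The caller would then map `SanitizeMode.Svg` to `Cleaner.SanitizeWithSvg` and `SanitizeMode.Bitmap` to `Cleaner.SanitizeWithBitmap`.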

You can also apply rendering options during sanitization to adjust the output:


using IronPdf;

// Load the potentially unsafe document
PdfDocument pdf = PdfDocument.FromFile("untrusted-source.pdf");

// Configure rendering options for sanitization
var renderOptions = new ChromePdfRenderOptions
{
    MarginTop = 10,
    MarginBottom = 10,
    MarginLeft = 10,
    MarginRight = 10
};

// Sanitize with custom options
PdfDocument sanitized = Cleaner.SanitizeWithSvg(pdf, renderOptions);
sanitized.SaveAs("untrusted-source-safe.pdf");

Imports IronPdf

' Load the potentially unsafe document
Dim pdf As PdfDocument = PdfDocument.FromFile("untrusted-source.pdf")

' Configure rendering options for sanitization
Dim renderOptions As New ChromePdfRenderOptions With {
    .MarginTop = 10,
    .MarginBottom = 10,
    .MarginLeft = 10,
    .MarginRight = 10
}

' Sanitize with custom options
Dim sanitized As PdfDocument = Cleaner.SanitizeWithSvg(pdf, renderOptions)
sanitized.SaveAs("untrusted-source-safe.pdf")


High-security environments often need to combine sanitization with other protective measures:

using IronPdf;
using System;

public class SecureDocumentProcessor
{
    public PdfDocument ProcessUntrustedDocument(string inputPath)
    {
        // Load the document
        PdfDocument original = PdfDocument.FromFile(inputPath);

        // Step 1: Sanitize to remove active content
        PdfDocument sanitized = Cleaner.SanitizeWithSvg(original);

        // Step 2: Clean metadata
        sanitized.MetaData.Author = "Processed Document";
        sanitized.MetaData.Creator = "Secure Processor";
        sanitized.MetaData.Producer = "";
        sanitized.MetaData.CreationDate = DateTime.Now;
        sanitized.MetaData.ModifiedDate = DateTime.Now;

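        // Remove all custom metadata (iterate over a copy so that removing
        // a key cannot invalidate the enumeration)
        foreach (string key in new System.Collections.Generic.List<string>(sanitized.MetaData.Keys()))
        {
            if (key != "Title" && key != "Author" && key != "CreationDate" && key != "ModifiedDate")
            {
                sanitized.MetaData.RemoveMetaDataKey(key);
            }
        }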

        return sanitized;
    }
}

// Usage
class Program
{
    static void Main()
    {
        SecureDocumentProcessor processor = new SecureDocumentProcessor();
        PdfDocument safe = processor.ProcessUntrustedDocument("email-attachment.pdf");
        safe.SaveAs("email-attachment-safe.pdf");
    }
}

Imports IronPdf
Imports System

Public Class SecureDocumentProcessor
    Public Function ProcessUntrustedDocument(inputPath As String) As PdfDocument
        ' Load the document
        Dim original As PdfDocument = PdfDocument.FromFile(inputPath)

        ' Step 1: Sanitize to remove active content
        Dim sanitized As PdfDocument = Cleaner.SanitizeWithSvg(original)

        ' Step 2: Clean metadata
        sanitized.MetaData.Author = "Processed Document"
        sanitized.MetaData.Creator = "Secure Processor"
        sanitized.MetaData.Producer = ""
        sanitized.MetaData.CreationDate = DateTime.Now
        sanitized.MetaData.ModifiedDate = DateTime.Now

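        ' Remove all custom metadata (iterate over a copy so that removing
        ' a key cannot invalidate the enumeration)
        For Each key As String In New System.Collections.Generic.List(Of String)(sanitized.MetaData.Keys())
            If key <> "Title" AndAlso key <> "Author" AndAlso key <> "CreationDate" AndAlso key <> "ModifiedDate" Then
                sanitized.MetaData.RemoveMetaDataKey(key)
            End If
        Next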

        Return sanitized
    End Function
End Class

' Usage
Module Program
    Sub Main()
        Dim processor As New SecureDocumentProcessor()
        Dim safe As PdfDocument = processor.ProcessUntrustedDocument("email-attachment.pdf")
        safe.SaveAs("email-attachment-safe.pdf")
    End Sub
End Module


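How do I scan a PDF for security vulnerabilities?

Before processing or sanitizing a document, you may want to assess the potential threats it contains. IronPDF's Cleaner.ScanPdf method inspects the document using YARA rules, the pattern definitions commonly used in malware analysis and threat detection. The scan identifies characteristics associated with malicious PDF files.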


using IronPdf;

// Load the document to scan
PdfDocument pdf = PdfDocument.FromFile("suspicious-document.pdf");

// Scan using default YARA rules
CleanerScanResult scanResult = Cleaner.ScanPdf(pdf);

// Check the scan results
bool threatsDetected = scanResult.IsDetected;
int riskCount = scanResult.Risks.Count;

// Process identified risks
if (scanResult.IsDetected)
{
    foreach (var risk in scanResult.Risks)
    {
        // Handle each identified risk
    }

    // Sanitize the document before use
    PdfDocument sanitized = Cleaner.SanitizeWithSvg(pdf);
    sanitized.SaveAs("suspicious-document-safe.pdf");
}

Imports IronPdf

' Load the document to scan
Dim pdf As PdfDocument = PdfDocument.FromFile("suspicious-document.pdf")

' Scan using default YARA rules
Dim scanResult As CleanerScanResult = Cleaner.ScanPdf(pdf)

' Check the scan results
Dim threatsDetected As Boolean = scanResult.IsDetected
Dim riskCount As Integer = scanResult.Risks.Count

' Process identified risks
If scanResult.IsDetected Then
    For Each risk In scanResult.Risks
        ' Handle each identified risk
    Next

    ' Sanitize the document before use
    Dim sanitized As PdfDocument = Cleaner.SanitizeWithSvg(pdf)
    sanitized.SaveAs("suspicious-document-safe.pdf")
End If


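You can supply custom YARA rule files for specialized detection needs. Organizations with specific threat models or compliance requirements often maintain their own rule sets to target particular vulnerability patterns.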


using IronPdf;

PdfDocument pdf = PdfDocument.FromFile("incoming-document.pdf");

// Scan with custom YARA rules
string[] customYaraFiles = { "corporate-rules.yar", "industry-specific.yar" };
CleanerScanResult result = Cleaner.ScanPdf(pdf, customYaraFiles);

if (result.IsDetected)
{
    // Document triggered custom rules and requires review or sanitization
    PdfDocument sanitized = Cleaner.SanitizeWithSvg(pdf);
    sanitized.SaveAs("incoming-document-safe.pdf");
}

Imports IronPdf

Dim pdf As PdfDocument = PdfDocument.FromFile("incoming-document.pdf")

' Scan with custom YARA rules
Dim customYaraFiles As String() = {"corporate-rules.yar", "industry-specific.yar"}
Dim result As CleanerScanResult = Cleaner.ScanPdf(pdf, customYaraFiles)

If result.IsDetected Then
    ' Document triggered custom rules and requires review or sanitization
    Dim sanitized As PdfDocument = Cleaner.SanitizeWithSvg(pdf)
    sanitized.SaveAs("incoming-document-safe.pdf")
End If
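
For readers who have not written a YARA rule before, the sketch below writes a minimal rule file to disk. The rule name, strings, and condition are illustrative assumptions rather than rules shipped with IronPDF, and `SampleYaraRule` is a hypothetical helper. The rule uses standard YARA syntax, but whether a particular scanner build supports every YARA feature should be verified separately.

```csharp
using System.IO;

public static class SampleYaraRule
{
    // Illustrative rule text: flags files that start with a PDF header and mention
    // JavaScript or an automatic open action. Real rule sets are usually stricter,
    // since PDF names can be obfuscated in ways simple string matches miss.
    public const string RuleText = @"rule Pdf_With_Active_Content
{
    meta:
        description = ""PDF that references JavaScript or an OpenAction""
    strings:
        $header = ""%PDF-""
        $js = ""/JavaScript""
        $open_action = ""/OpenAction""
    condition:
        $header at 0 and ($js or $open_action)
}
";

    // Writes the rule to disk so its path can be listed alongside other rule files.
    public static string WriteTo(string path)
    {
        File.WriteAllText(path, RuleText);
        return path;
    }
}
```

The returned path could be added to the array of rule files passed to the scan, in the same way as "corporate-rules.yar" in the example.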


Integrating scanning into a document intake workflow helps automate security decisions:

using IronPdf;
using System;
using System.IO;
using System.Security;

public enum DocumentSafetyLevel
{
    Safe,
    Suspicious,
    Dangerous
}

public class DocumentSecurityGateway
{
    public DocumentSafetyLevel EvaluateDocument(string filePath)
    {
        PdfDocument pdf = PdfDocument.FromFile(filePath);
        CleanerScanResult scan = Cleaner.ScanPdf(pdf);

        if (!scan.IsDetected)
        {
            return DocumentSafetyLevel.Safe;
        }

        // Evaluate severity based on number of risks
        if (scan.Risks.Count > 5)
        {
            return DocumentSafetyLevel.Dangerous;
        }

        return DocumentSafetyLevel.Suspicious;
    }

    public PdfDocument ProcessIncomingDocument(string filePath, string outputDirectory)
    {
        DocumentSafetyLevel safety = EvaluateDocument(filePath);
        string fileName = Path.GetFileName(filePath);

        switch (safety)
        {
            case DocumentSafetyLevel.Safe:
                return PdfDocument.FromFile(filePath);

            case DocumentSafetyLevel.Suspicious:
                PdfDocument suspicious = PdfDocument.FromFile(filePath);
                return Cleaner.SanitizeWithSvg(suspicious);

            case DocumentSafetyLevel.Dangerous:
                throw new SecurityException($"Document {fileName} contains dangerous content");

            default:
                throw new InvalidOperationException("Unknown safety level");
        }
    }
}

Imports IronPdf
Imports System
Imports System.IO
Imports System.Security ' SecurityException lives in this namespace

Public Enum DocumentSafetyLevel
    Safe
    Suspicious
    Dangerous
End Enum

Public Class DocumentSecurityGateway
    Public Function EvaluateDocument(filePath As String) As DocumentSafetyLevel
        Dim pdf As PdfDocument = PdfDocument.FromFile(filePath)
        Dim scan As CleanerScanResult = Cleaner.ScanPdf(pdf)

        If Not scan.IsDetected Then
            Return DocumentSafetyLevel.Safe
        End If

        ' Evaluate severity based on number of risks
        If scan.Risks.Count > 5 Then
            Return DocumentSafetyLevel.Dangerous
        End If

        Return DocumentSafetyLevel.Suspicious
    End Function

    Public Function ProcessIncomingDocument(filePath As String, outputDirectory As String) As PdfDocument
        Dim safety As DocumentSafetyLevel = EvaluateDocument(filePath)
        Dim fileName As String = Path.GetFileName(filePath)

        Select Case safety
            Case DocumentSafetyLevel.Safe
                Return PdfDocument.FromFile(filePath)

            Case DocumentSafetyLevel.Suspicious
                Dim suspicious As PdfDocument = PdfDocument.FromFile(filePath)
                Return Cleaner.SanitizeWithSvg(suspicious)

            Case DocumentSafetyLevel.Dangerous
                Throw New SecurityException($"Document {fileName} contains dangerous content")

            Case Else
                Throw New InvalidOperationException("Unknown safety level")
        End Select
    End Function
End Class

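How do I build a complete redaction and sanitization pipeline?

Production document processing usually requires combining several protection techniques into one coherent workflow. A complete pipeline might scan incoming documents for threats, sanitize the documents that pass initial screening, apply text and region redaction, strip metadata, and produce an audit log that records every operation. The example below demonstrates this combined approach.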

using IronPdf;
using IronSoftware.Drawing;
using System;
using System.Collections.Generic;
using System.IO;
using System.Security; // SecurityException lives in this namespace
using System.Text.RegularExpressions;

public class DocumentProcessingResult
{
    public string OriginalFile { get; set; }
    public string OutputFile { get; set; }
    public bool WasSanitized { get; set; }
    public int TextRedactionsApplied { get; set; }
    public int RegionRedactionsApplied { get; set; }
    public bool MetadataCleaned { get; set; }
    public List<string> SensitiveDataTypesFound { get; set; } = new List<string>();
    public DateTime ProcessedAt { get; set; }
    public bool Success { get; set; }
    public string ErrorMessage { get; set; }
}

public class ComprehensiveDocumentProcessor
{
    // Sensitive data patterns
    private readonly Dictionary<string, string> _sensitivePatterns = new Dictionary<string, string>
    {
        { "SSN", @"\b\d{3}-\d{2}-\d{4}\b" },
        { "Credit Card", @"\b(?:\d{4}[-\s]?){3}\d{1,4}\b" },
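        // [A-Za-z] instead of [A-Z|a-z]: a '|' inside a character class is a literal pipe
        { "Email", @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b" },
        // \b moved inside the second alternative so "(555) 123-4567" matches after a space
        { "Phone", @"(?:\(\d{3}\)\s?|\b\d{3}[-.])\d{3}[-.]?\d{4}\b" }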
    };

    // Standard regions to redact (signature areas, photo locations)
    private readonly List<RectangleF> _standardRedactionRegions = new List<RectangleF>
    {
        new RectangleF(72, 72, 200, 50),    // Bottom left signature
        new RectangleF(350, 72, 200, 50)    // Bottom right signature
    };

    private readonly string _organizationName;

    public ComprehensiveDocumentProcessor(string organizationName)
    {
        _organizationName = organizationName;
    }

    public DocumentProcessingResult ProcessDocument(
        string inputPath,
        string outputPath,
        bool sanitize = true,
        bool redactPatterns = true,
        bool redactRegions = true,
        bool cleanMetadata = true,
        List<string> additionalTermsToRedact = null)
    {
        var result = new DocumentProcessingResult
        {
            OriginalFile = inputPath,
            OutputFile = outputPath,
            ProcessedAt = DateTime.Now
        };

        try
        {
            // Load the document
            PdfDocument pdf = PdfDocument.FromFile(inputPath);

            // Step 1: Security scan
            CleanerScanResult scanResult = Cleaner.ScanPdf(pdf);

            if (scanResult.IsDetected && scanResult.Risks.Count > 10)
            {
                throw new SecurityException("Document contains too many security risks to process");
            }

            // Step 2: Sanitization (if needed or requested)
            if (sanitize || scanResult.IsDetected)
            {
                pdf = Cleaner.SanitizeWithSvg(pdf);
                result.WasSanitized = true;
            }

            // Step 3: Pattern-based text redaction
            if (redactPatterns)
            {
                string fullText = pdf.ExtractAllText();
                HashSet<string> valuesToRedact = new HashSet<string>();

                foreach (var pattern in _sensitivePatterns)
                {
                    Regex regex = new Regex(pattern.Value, RegexOptions.IgnoreCase);
                    MatchCollection matches = regex.Matches(fullText);

                    if (matches.Count > 0)
                    {
                        result.SensitiveDataTypesFound.Add($"{pattern.Key} ({matches.Count})");
                        foreach (Match match in matches)
                        {
                            valuesToRedact.Add(match.Value);
                        }
                    }
                }

                // Apply redactions
                foreach (string value in valuesToRedact)
                {
                    pdf.RedactTextOnAllPages(value);
                    result.TextRedactionsApplied++;
                }
            }

            // Step 4: Additional specific terms
            if (additionalTermsToRedact != null)
            {
                foreach (string term in additionalTermsToRedact)
                {
                    pdf.RedactTextOnAllPages(term);
                    result.TextRedactionsApplied++;
                }
            }

            // Step 5: Region-based redaction
            if (redactRegions)
            {
                foreach (RectangleF region in _standardRedactionRegions)
                {
                    pdf.RedactRegionsOnAllPages(region);
                    result.RegionRedactionsApplied++;
                }
            }

            // Step 6: Metadata cleaning
            if (cleanMetadata)
            {
                pdf.MetaData.Author = _organizationName;
                pdf.MetaData.Creator = $"{_organizationName} Document Processor";
                pdf.MetaData.Producer = "";
                pdf.MetaData.Subject = "";
                pdf.MetaData.Keywords = "";
                pdf.MetaData.CreationDate = DateTime.Now;
                pdf.MetaData.ModifiedDate = DateTime.Now;
                result.MetadataCleaned = true;
            }

            // Step 7: Save the processed document
            pdf.SaveAs(outputPath);
            result.Success = true;
        }
        catch (Exception ex)
        {
            result.Success = false;
            result.ErrorMessage = ex.Message;
        }

        return result;
    }
}

// Usage example
class Program
{
    static void Main()
    {
        var processor = new ComprehensiveDocumentProcessor("Acme Corporation");

        // Process a single document with all protections
        var result = processor.ProcessDocument(
            inputPath: "customer-application.pdf",
            outputPath: "customer-application-redacted.pdf",
            sanitize: true,
            redactPatterns: true,
            redactRegions: true,
            cleanMetadata: true,
            additionalTermsToRedact: new List<string> { "Project Alpha", "Internal Use Only" }
        );

        // Batch process multiple documents
        string[] inputFiles = Directory.GetFiles("incoming", "*.pdf");
        foreach (string file in inputFiles)
        {
            string outputFile = Path.Combine("processed", Path.GetFileName(file));
            processor.ProcessDocument(file, outputFile);
        }
    }
}

Imports IronPdf
Imports IronSoftware.Drawing
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Security ' SecurityException lives in this namespace
Imports System.Text.RegularExpressions

Public Class DocumentProcessingResult
    Public Property OriginalFile As String
    Public Property OutputFile As String
    Public Property WasSanitized As Boolean
    Public Property TextRedactionsApplied As Integer
    Public Property RegionRedactionsApplied As Integer
    Public Property MetadataCleaned As Boolean
    Public Property SensitiveDataTypesFound As List(Of String) = New List(Of String)()
    Public Property ProcessedAt As DateTime
    Public Property Success As Boolean
    Public Property ErrorMessage As String
End Class

Public Class ComprehensiveDocumentProcessor
    ' Sensitive data patterns
    Private ReadOnly _sensitivePatterns As Dictionary(Of String, String) = New Dictionary(Of String, String) From {
        {"SSN", "\b\d{3}-\d{2}-\d{4}\b"},
        {"Credit Card", "\b(?:\d{4}[-\s]?){3}\d{1,4}\b"},
        {"Email", "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"},
        {"Phone", "(?:\(\d{3}\)\s?|\b\d{3}[-.])\d{3}[-.]?\d{4}\b"}
    }

    ' Standard regions to redact (signature areas, photo locations)
    Private ReadOnly _standardRedactionRegions As List(Of RectangleF) = New List(Of RectangleF) From {
        New RectangleF(72, 72, 200, 50),    ' Bottom left signature
        New RectangleF(350, 72, 200, 50)    ' Bottom right signature
    }

    Private ReadOnly _organizationName As String

    Public Sub New(organizationName As String)
        _organizationName = organizationName
    End Sub

    Public Function ProcessDocument(
        inputPath As String,
        outputPath As String,
        Optional sanitize As Boolean = True,
        Optional redactPatterns As Boolean = True,
        Optional redactRegions As Boolean = True,
        Optional cleanMetadata As Boolean = True,
        Optional additionalTermsToRedact As List(Of String) = Nothing) As DocumentProcessingResult

        Dim result As New DocumentProcessingResult With {
            .OriginalFile = inputPath,
            .OutputFile = outputPath,
            .ProcessedAt = DateTime.Now
        }

        Try
            ' Load the document
            Dim pdf As PdfDocument = PdfDocument.FromFile(inputPath)

            ' Step 1: Security scan
            Dim scanResult As CleanerScanResult = Cleaner.ScanPdf(pdf)

            If scanResult.IsDetected AndAlso scanResult.Risks.Count > 10 Then
                Throw New SecurityException("Document contains too many security risks to process")
            End If

            ' Step 2: Sanitization (if needed or requested)
            If sanitize OrElse scanResult.IsDetected Then
                pdf = Cleaner.SanitizeWithSvg(pdf)
                result.WasSanitized = True
            End If

            ' Step 3: Pattern-based text redaction
            If redactPatterns Then
                Dim fullText As String = pdf.ExtractAllText()
                Dim valuesToRedact As New HashSet(Of String)()

                For Each pattern In _sensitivePatterns
                    Dim regex As New Regex(pattern.Value, RegexOptions.IgnoreCase)
                    Dim matches As MatchCollection = regex.Matches(fullText)

                    If matches.Count > 0 Then
                        result.SensitiveDataTypesFound.Add($"{pattern.Key} ({matches.Count})")
                        For Each match As Match In matches
                            valuesToRedact.Add(match.Value)
                        Next
                    End If
                Next

                ' Apply redactions
                For Each value As String In valuesToRedact
                    pdf.RedactTextOnAllPages(value)
                    result.TextRedactionsApplied += 1
                Next
            End If

            ' Step 4: Additional specific terms
            If additionalTermsToRedact IsNot Nothing Then
                For Each term As String In additionalTermsToRedact
                    pdf.RedactTextOnAllPages(term)
                    result.TextRedactionsApplied += 1
                Next
            End If

            ' Step 5: Region-based redaction
            If redactRegions Then
                For Each region As RectangleF In _standardRedactionRegions
                    pdf.RedactRegionsOnAllPages(region)
                    result.RegionRedactionsApplied += 1
                Next
            End If

            ' Step 6: Metadata cleaning
            If cleanMetadata Then
                pdf.MetaData.Author = _organizationName
                pdf.MetaData.Creator = $"{_organizationName} Document Processor"
                pdf.MetaData.Producer = ""
                pdf.MetaData.Subject = ""
                pdf.MetaData.Keywords = ""
                pdf.MetaData.CreationDate = DateTime.Now
                pdf.MetaData.ModifiedDate = DateTime.Now
                result.MetadataCleaned = True
            End If

            ' Step 7: Save the processed document
            pdf.SaveAs(outputPath)
            result.Success = True
        Catch ex As Exception
            result.Success = False
            result.ErrorMessage = ex.Message
        End Try

        Return result
    End Function
End Class

' Usage example
Class Program
    Shared Sub Main()
        Dim processor As New ComprehensiveDocumentProcessor("Acme Corporation")

        ' Process a single document with all protections
        Dim result = processor.ProcessDocument(
            inputPath:="customer-application.pdf",
            outputPath:="customer-application-redacted.pdf",
            sanitize:=True,
            redactPatterns:=True,
            redactRegions:=True,
            cleanMetadata:=True,
            additionalTermsToRedact:=New List(Of String) From {"Project Alpha", "Internal Use Only"}
        )

        ' Batch process multiple documents
        Dim inputFiles As String() = Directory.GetFiles("incoming", "*.pdf")
        For Each file As String In inputFiles
            Dim outputFile As String = Path.Combine("processed", Path.GetFileName(file))
            processor.ProcessDocument(file, outputFile)
        Next
    End Sub
End Class

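Input

The customer application form contains several types of sensitive data, including Social Security numbers, credit card numbers, email addresses, and signature fields, all of which need protection.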
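Before running the processor against real files, it can help to confirm what the patterns match on plain text. The following is a minimal sketch that uses only `System.Text.RegularExpressions` and makes no IronPDF calls; `PatternCheck`, `FindAll`, and the sample string are hypothetical names introduced for illustration. The patterns follow those in `ComprehensiveDocumentProcessor`, with two assumptions: the email pattern ends in `[A-Za-z]` (a `|` inside a character class matches a literal pipe), and the phone pattern places `\b` only before the bare-digit form, since `\b` directly before `(` would require a word character in front of the parenthesis.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PatternCheck
{
    // Illustrative copies of the processor's patterns (see the assumptions above).
    public static readonly Dictionary<string, string> Patterns = new Dictionary<string, string>
    {
        { "SSN", @"\b\d{3}-\d{2}-\d{4}\b" },
        { "Credit Card", @"\b(?:\d{4}[-\s]?){3}\d{1,4}\b" },
        { "Email", @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b" },
        { "Phone", @"(?:\(\d{3}\)\s?|\b\d{3}[-.])\d{3}[-.]?\d{4}\b" }
    };

    // Returns every matched value, grouped by pattern name.
    public static Dictionary<string, List<string>> FindAll(string text)
    {
        var found = new Dictionary<string, List<string>>();
        foreach (var pattern in Patterns)
        {
            var values = new List<string>();
            foreach (Match match in Regex.Matches(text, pattern.Value))
            {
                values.Add(match.Value);
            }
            found[pattern.Key] = values;
        }
        return found;
    }
}
```

Calling `PatternCheck.FindAll("SSN 123-45-6789, card 4111 1111 1111 1111, phone (555) 123-4567, mail jane.doe@example.com")` returns one value under each key. The patterns are intentionally loose and will also flag strings that only resemble a card or Social Security number, so matches are worth reviewing before they are treated as confirmed.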

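Sample output

This comprehensive processor combines all the techniques covered in this guide into a single configurable class. It scans for threats, sanitizes when necessary, finds and redacts sensitive patterns, applies region redaction, cleans metadata, and returns a `DocumentProcessingResult` that reports what was done. You can adjust the sensitive-data patterns, redaction regions, and processing options to match your requirements.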
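The processor returns its report as an object but does not persist it. One way to turn that report into an audit trail is sketched below, using only the .NET base class library. `AuditLogFormatter`, `FormatLine`, and `Append` are hypothetical helpers that are not part of IronPDF, and the tab-separated layout is an assumption chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public static class AuditLogFormatter
{
    // Hypothetical helper: builds one tab-separated audit line from values that
    // DocumentProcessingResult already carries. It records data types and counts,
    // never the redacted values themselves.
    public static string FormatLine(
        DateTime processedAt,
        string originalFile,
        bool success,
        int textRedactions,
        int regionRedactions,
        IEnumerable<string> sensitiveTypes,
        string errorMessage = null)
    {
        // Round-trip ("o") format in UTC keeps entries sortable and unambiguous.
        string timestamp = processedAt.ToUniversalTime().ToString("o", CultureInfo.InvariantCulture);
        string status = success ? "OK" : "FAILED: " + (errorMessage ?? "unknown error");
        string types = string.Join("; ", sensitiveTypes ?? Array.Empty<string>());

        return string.Join("\t",
            timestamp,
            Path.GetFileName(originalFile),
            status,
            "text=" + textRedactions,
            "regions=" + regionRedactions,
            "types=" + types);
    }

    // Appends the line to a log file; the caller chooses the path.
    public static void Append(string logPath, string line)
    {
        File.AppendAllText(logPath, line + Environment.NewLine);
    }
}
```

After `ProcessDocument` returns, passing `result.ProcessedAt`, `result.OriginalFile`, `result.Success`, `result.TextRedactionsApplied`, `result.RegionRedactionsApplied`, `result.SensitiveDataTypesFound`, and `result.ErrorMessage` to `FormatLine` yields one line per document. Logging only type names and counts keeps the sensitive values out of the log itself.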

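Next steps

Protecting sensitive information in PDF documents takes more than surface-level measures. True redaction permanently removes content from the document structure. Pattern matching automatically finds and removes data such as Social Security numbers, credit card details, and email addresses. Region-based redaction handles signatures, photos, and other graphical elements that text matching cannot reach. Metadata cleaning eliminates hidden information that could reveal authors, timestamps, or internal file paths. Sanitization removes embedded scripts and active content that pose security risks.

IronPDF provides all of these capabilities through a consistent API that integrates naturally with C# and .NET development practices. The approaches demonstrated in this guide work on a single document and also scale to batch processing of thousands of files. Whether you are building compliance workflows for healthcare data, preparing legal documents for discovery, or simply making sure internal reports can be shared safely with outside parties, these techniques form the foundation of responsible document handling. For more complete protection, combine redaction with password protection, permission controls, and digital signatures.

Ready to start building? Download IronPDF and try it for free. The library includes a free development license, so you can fully evaluate its redaction, text extraction, and sanitization features before purchasing a production license. If you have questions about implementation or compliance workflows, contact our engineering support team.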