PDF Redaction in C#: Remove Sensitive Data and Sanitize Documents with IronPDF
When organizations share PDF documents, they often overlook the hidden dangers lurking within those files. A contract sent to a client might contain metadata revealing the original author's name, creation timestamps, or even tracked changes from previous drafts. A medical report shared between departments could expose patient identifiers that should have been removed. Legal documents prepared for discovery might still contain privileged information buried in form fields or annotations. These oversights can lead to compliance violations, privacy breaches, and significant legal liability.
True PDF redaction goes far beyond drawing black rectangles over text. It requires permanently removing sensitive content from the document structure so that no amount of copying, searching, or metadata extraction can recover the original information. This distinction matters enormously in regulated industries where HIPAA, GDPR, PCI DSS, and other frameworks mandate specific data protection measures. A visual overlay might look secure on screen, but anyone with basic PDF tools can select the text beneath that black box and paste it elsewhere.
This guide provides a complete walkthrough of PDF data protection using C# and IronPDF. You will learn how to redact text across entire documents or specific pages, use pattern matching to find and remove data like Social Security numbers and credit card details, redact defined regions for signatures and images, strip metadata that could leak confidential information, sanitize documents to eliminate embedded scripts and hidden threats, and scan files for vulnerabilities before processing. The code examples are production ready and demonstrate patterns you can adapt for your own document workflows.
Quickstart: Redact Sensitive Text from a PDF
Remove confidential information from PDF documents with just a few lines of code. This example loads an existing PDF, redacts all instances of a specified phrase throughout every page, and saves the cleaned document. The redaction permanently removes the text from the PDF structure rather than simply covering it visually.
Input
A sample confidential report PDF containing the word "CONFIDENTIAL" that needs to be permanently removed.
using IronPdf;
// Load the PDF document
PdfDocument pdf = PdfDocument.FromFile("confidential-report.pdf");
// Redact all instances of sensitive text from every page
pdf.RedactTextOnAllPages("CONFIDENTIAL");
// Save the redacted document
pdf.SaveAs("redacted-report.pdf");
Imports IronPdf
' Load the PDF document
Dim pdf As PdfDocument = PdfDocument.FromFile("confidential-report.pdf")
' Redact all instances of sensitive text from every page
pdf.RedactTextOnAllPages("CONFIDENTIAL")
' Save the redacted document
pdf.SaveAs("redacted-report.pdf")
This code loads an existing PDF, uses RedactTextOnAllPages to find and permanently remove all occurrences of "CONFIDENTIAL" throughout the document, then saves the cleaned version. The redaction replaces the text with black rectangles and removes the underlying character data from the PDF structure.
Sample Output
Table of Contents
- Redact Text from PDF Documents
- Pattern Matching and Automated Redaction
- Region Based Redaction
- Remove Sensitive Data from PDF Metadata
- PDF Sanitization in .NET
- Complete Workflows
- Next Steps
What is the Difference Between True Redaction and Visual Overlay?
Understanding the distinction between true redaction and visual overlay is critical for anyone handling sensitive documents. Many tools and manual methods create the appearance of redaction without actually removing the underlying data. This false sense of security has caused numerous high profile data breaches and compliance failures.
Visual overlay approaches typically draw opaque shapes over sensitive content. The text remains fully intact within the PDF structure. Someone viewing the document sees a black rectangle, but the original characters still exist in the file's content streams. Selecting all text on the page, using accessibility tools, or examining the raw PDF data will reveal everything that was supposedly hidden. Court cases have been compromised when redacted filings were trivially unredacted by opposing counsel. Government agencies have accidentally released classified information that appeared censored but was completely recoverable.
True redaction works differently. When you use IronPDF's redaction methods, the library locates the specified text within the PDF's internal structure and removes it entirely. The character data is deleted from content streams. The visual representation is replaced with a redaction mark, typically a black rectangle, but the original content no longer exists anywhere in the file. No amount of selection, copying, or forensic analysis can recover what has been permanently erased.
IronPDF implements true redaction by modifying the PDF at a structural level. The RedactTextOnAllPages method and its variants search through page content, identify matching text, remove it from the document object model, and optionally draw a visual indicator where the content used to appear. This approach aligns with guidelines from organizations like NIST for secure document redaction.
The practical implications are significant. If you need to share documents externally, submit files for legal discovery, publish records under freedom of information requests, or distribute reports while protecting personally identifiable information, only true redaction provides adequate protection. Visual overlays might suffice for internal drafts where you simply want to draw attention away from certain sections, but they should never be trusted for actual data protection. For additional document security measures, see our guides on encrypting PDFs and digital signatures.
How do I Redact PDF Text in C# Across an Entire Document?
The most common redaction scenario involves removing all instances of specific text throughout a document. Perhaps you need to eliminate a person's name from a report, remove account numbers from financial statements, or strip internal reference codes before external distribution. IronPDF makes this straightforward with the RedactTextOnAllPages method.
Input
An employee records document containing personal information including names, Social Security numbers, and employee IDs.
using IronPdf;
// Load the source document
PdfDocument pdf = PdfDocument.FromFile("employee-records.pdf");
// Redact an employee name from the entire document
pdf.RedactTextOnAllPages("John Smith");
// Redact a Social Security Number
pdf.RedactTextOnAllPages("123-45-6789");
// Redact an internal employee ID
pdf.RedactTextOnAllPages("EMP-2024-0042");
// Save the cleaned document
pdf.SaveAs("employee-records-redacted.pdf");
Imports IronPdf
' Load the source document
Dim pdf As PdfDocument = PdfDocument.FromFile("employee-records.pdf")
' Redact an employee name from the entire document
pdf.RedactTextOnAllPages("John Smith")
' Redact a Social Security Number
pdf.RedactTextOnAllPages("123-45-6789")
' Redact an internal employee ID
pdf.RedactTextOnAllPages("EMP-2024-0042")
' Save the cleaned document
pdf.SaveAs("employee-records-redacted.pdf")
This code loads a PDF containing employee information and removes three pieces of confidential data by calling RedactTextOnAllPages for each value. Each call searches through every page in the document and permanently removes all matching instances of the employee's name, Social Security number, and internal identifier.
Sample Output
The default behavior draws black rectangles where the redacted text appeared and replaces the actual characters with asterisks in the document structure. This gives visual confirmation that redaction occurred and ensures the original content is completely gone.
When working with longer documents or multiple redaction targets, you can keep the terms in a list and loop over them:
using IronPdf;
using System.Collections.Generic;
// Load the document once
PdfDocument pdf = PdfDocument.FromFile("quarterly-report.pdf");
// Define all terms that need redaction
List<string> sensitiveTerms = new List<string>
{
"Project Titan",
"Sarah Johnson",
"Budget: $4.2M",
"Q3-INTERNAL-2024",
"sarah.johnson@company.com"
};
// Redact each term
foreach (string term in sensitiveTerms)
{
pdf.RedactTextOnAllPages(term);
}
// Save the result
pdf.SaveAs("quarterly-report-public.pdf");
Imports IronPdf
Imports System.Collections.Generic
' Load the document once
Dim pdf As PdfDocument = PdfDocument.FromFile("quarterly-report.pdf")
' Define all terms that need redaction
Dim sensitiveTerms As New List(Of String) From {
"Project Titan",
"Sarah Johnson",
"Budget: $4.2M",
"Q3-INTERNAL-2024",
"sarah.johnson@company.com"
}
' Redact each term
For Each term As String In sensitiveTerms
pdf.RedactTextOnAllPages(term)
Next
' Save the result
pdf.SaveAs("quarterly-report-public.pdf")
This pattern works well when you have a known list of sensitive values to remove. The document is loaded once, all redactions are applied in memory, and the final result is saved. Each term is processed independently, so partial matches or formatting differences between terms do not affect other redactions.
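With a known list of terms, it can be worth checking the saved output before it leaves your hands. The following is a minimal sketch, not part of IronPDF: `RedactionVerifier` and `FindRemainingTerms` are hypothetical names, and the helper works on plain text that you would obtain yourself, for example by reloading the redacted file and calling `ExtractAllText()`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not an IronPDF type): reports which terms still
// appear in text extracted from a redacted document.
public static class RedactionVerifier
{
    // Returns every term that is still present in the extracted text,
    // compared without regard to letter case. An empty list means none
    // of the listed terms were found.
    public static List<string> FindRemainingTerms(string extractedText, IEnumerable<string> terms)
    {
        return terms
            .Where(term => extractedText.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }
}
```

Passing the extracted text of "quarterly-report-public.pdf" together with the same `sensitiveTerms` list should yield an empty result. Treat a non-empty result as a reason to stop and investigate, and treat an empty result as a sanity check only, since text extraction can split a phrase across lines in ways a simple substring search will not catch.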
How can I Redact Text on Specific Pages Only?
Sometimes you need more precise control over where redactions occur. A document might have a cover page with information that should remain intact, or you might know that confidential data only appears in certain sections. IronPDF offers RedactTextOnPage for single page redaction and RedactTextOnPages for targeting multiple specific pages.
Input
A multi-page contract bundle with client names on the signature page and financial terms appearing on specific pages throughout the document.
using IronPdf;
// Load the document
PdfDocument pdf = PdfDocument.FromFile("contract-bundle.pdf");
// Redact text only on page 1 (index 0)
pdf.RedactTextOnPage(0, "Client Name: Acme Corporation");
// Redact text on pages 3, 5, and 7 (indices 2, 4, 6)
int[] financialPages = { 2, 4, 6 };
pdf.RedactTextOnPages(financialPages, "Payment Terms: Net 30");
// Other pages remain untouched except for the specific redactions applied
pdf.SaveAs("contract-bundle-redacted.pdf");
Imports IronPdf
' Load the document
Dim pdf As PdfDocument = PdfDocument.FromFile("contract-bundle.pdf")
' Redact text only on page 1 (index 0)
pdf.RedactTextOnPage(0, "Client Name: Acme Corporation")
' Redact text on pages 3, 5, and 7 (indices 2, 4, 6)
Dim financialPages As Integer() = {2, 4, 6}
pdf.RedactTextOnPages(financialPages, "Payment Terms: Net 30")
' Other pages remain untouched except for the specific redactions applied
pdf.SaveAs("contract-bundle-redacted.pdf")
This code demonstrates targeted redaction by using RedactTextOnPage for a single page and RedactTextOnPages for multiple specific pages. The client name is removed only from page 1 (index 0), while payment terms are redacted from pages 3, 5, and 7 (indices 2, 4, 6), leaving all other pages untouched.
Sample Output
Page indices in IronPDF are zero based, meaning the first page is index 0, the second is index 1, and so forth. This matches standard programming conventions and aligns with how most developers think about array access.
Targeting specific pages improves performance when processing large documents. Rather than scanning hundreds of pages for text that only appears in a few locations, you can direct the redaction engine exactly where to look. This matters for batch processing scenarios where you might handle thousands of documents. For maximum throughput, consider using async and multithreading techniques.
using IronPdf;
// Process a large document efficiently
PdfDocument pdf = PdfDocument.FromFile("annual-report-500-pages.pdf");
// We know from document structure that:
// - Executive summary with names is on pages 1-3
// - Financial data is on pages 45-60
// - Appendix with employee info is on pages 480-495
// Redact executive names from summary section
for (int i = 0; i <= 2; i++)
{
pdf.RedactTextOnPage(i, "CEO: Robert Williams");
pdf.RedactTextOnPage(i, "CFO: Maria Garcia");
}
// Redact specific financial figures from the financial section
int[] financialSection = { 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59 };
pdf.RedactTextOnPages(financialSection, "Net Revenue: $847M");
// Redact the "Employee ID:" label in the appendix (the ID values themselves need their own terms or a pattern match)
for (int i = 479; i <= 494; i++)
{
pdf.RedactTextOnPage(i, "Employee ID:");
}
pdf.SaveAs("annual-report-public-release.pdf");
Imports IronPdf
' Process a large document efficiently
Dim pdf As PdfDocument = PdfDocument.FromFile("annual-report-500-pages.pdf")
' We know from document structure that:
' - Executive summary with names is on pages 1-3
' - Financial data is on pages 45-60
' - Appendix with employee info is on pages 480-495
' Redact executive names from summary section
For i As Integer = 0 To 2
pdf.RedactTextOnPage(i, "CEO: Robert Williams")
pdf.RedactTextOnPage(i, "CFO: Maria Garcia")
Next
' Redact specific financial figures from the financial section
Dim financialSection As Integer() = {44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59}
pdf.RedactTextOnPages(financialSection, "Net Revenue: $847M")
' Redact the "Employee ID:" label in the appendix (the ID values themselves need their own terms or a pattern match)
For i As Integer = 479 To 494
pdf.RedactTextOnPage(i, "Employee ID:")
Next
pdf.SaveAs("annual-report-public-release.pdf")
This targeted approach processes only the relevant sections of a 500 page document, significantly reducing execution time compared to scanning every page for each redaction term.
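Because page indices are zero based while people cite pages starting from 1, hand-written index arrays are an easy place for off-by-one mistakes. The sketch below shows one way to generate the indices from a page range; `PageIndexHelper` and `FromPageRange` are hypothetical names introduced for illustration and are not part of IronPDF.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not an IronPDF type).
public static class PageIndexHelper
{
    // Converts an inclusive, one-based page range (as a reader would cite it)
    // into an array of zero-based page indices.
    public static int[] FromPageRange(int firstPage, int lastPage)
    {
        if (firstPage < 1 || lastPage < firstPage)
        {
            throw new ArgumentOutOfRangeException(
                nameof(firstPage), "Expected 1 <= firstPage <= lastPage.");
        }
        // Page N lives at index N - 1, and the range is inclusive at both ends.
        return Enumerable.Range(firstPage - 1, lastPage - firstPage + 1).ToArray();
    }
}
```

For the financial section on pages 45-60, `PageIndexHelper.FromPageRange(45, 60)` produces the indices 44 through 59, which is the same set as the literal `financialSection` array and can be passed to RedactTextOnPages in its place.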
How do I Customize the Appearance of Redacted Content?
IronPDF offers several parameters to control how redactions appear in the final document. You can adjust case sensitivity, whole word matching, whether to draw visual rectangles, and what replacement text appears in place of the redacted content.
Input
A legal brief containing various sensitive terms including classification labels, passwords, and internal reference codes that require different redaction treatments.
using IronPdf;
// Load the document
PdfDocument pdf = PdfDocument.FromFile("legal-brief.pdf");
// Case-sensitive redaction: only matches exact case
// "CLASSIFIED" will be redacted but "classified" or "Classified" will not
pdf.RedactTextOnAllPages(
"CLASSIFIED",
CaseSensitive: true,
OnlyMatchWholeWords: true,
DrawRectangles: true,
ReplacementText: "[REDACTED]"
);
// Case-insensitive redaction: matches regardless of case
// Will redact "Secret", "SECRET", "secret", etc.
pdf.RedactTextOnAllPages(
"secret",
CaseSensitive: false,
OnlyMatchWholeWords: true,
DrawRectangles: true,
ReplacementText: "*****"
);
// Whole word disabled: matches partial strings too
// Will redact "password", "passwords", "mypassword123", etc.
pdf.RedactTextOnAllPages(
"password",
CaseSensitive: false,
OnlyMatchWholeWords: false,
DrawRectangles: true,
ReplacementText: "XXXXX"
);
// No visual rectangle: text is removed but no black box appears
// Useful when you want seamless removal without obvious redaction marks
pdf.RedactTextOnAllPages(
"internal-reference-code",
CaseSensitive: true,
OnlyMatchWholeWords: true,
DrawRectangles: false,
ReplacementText: ""
);
pdf.SaveAs("legal-brief-redacted.pdf");
Imports IronPdf
' Load the document
Dim pdf As PdfDocument = PdfDocument.FromFile("legal-brief.pdf")
' Case-sensitive redaction: only matches exact case
' "CLASSIFIED" will be redacted but "classified" or "Classified" will not
pdf.RedactTextOnAllPages(
"CLASSIFIED",
CaseSensitive:=True,
OnlyMatchWholeWords:=True,
DrawRectangles:=True,
ReplacementText:="[REDACTED]"
)
' Case-insensitive redaction: matches regardless of case
' Will redact "Secret", "SECRET", "secret", etc.
pdf.RedactTextOnAllPages(
"secret",
CaseSensitive:=False,
OnlyMatchWholeWords:=True,
DrawRectangles:=True,
ReplacementText:="*****"
)
' Whole word disabled: matches partial strings too
' Will redact "password", "passwords", "mypassword123", etc.
pdf.RedactTextOnAllPages(
"password",
CaseSensitive:=False,
OnlyMatchWholeWords:=False,
DrawRectangles:=True,
ReplacementText:="XXXXX"
)
' No visual rectangle: text is removed but no black box appears
' Useful when you want seamless removal without obvious redaction marks
pdf.RedactTextOnAllPages(
"internal-reference-code",
CaseSensitive:=True,
OnlyMatchWholeWords:=True,
DrawRectangles:=False,
ReplacementText:=""
)
pdf.SaveAs("legal-brief-redacted.pdf")
This code demonstrates four different redaction configurations using the optional parameters of RedactTextOnAllPages. It shows case-sensitive exact matching with "[REDACTED]" replacement, case-insensitive matching with asterisks, partial word matching to catch variations like "passwords", and invisible removal without visual rectangles for seamless content elimination.
Sample Output
The parameters serve different purposes depending on your requirements:
CaseSensitive determines whether matching considers letter case. Legal documents often use specific capitalizations that carry meaning, so case sensitive matching ensures you only remove exact matches. Processing general text where case varies may require case insensitive matching to catch all instances.
OnlyMatchWholeWords controls whether the search matches complete words or partial strings. When redacting names, you typically want whole word matching so that "Smith" does not accidentally redact part of "Blacksmith" or "Smithfield". When redacting patterns like account number prefixes, partial matching may be necessary to catch variations.
DrawRectangles specifies whether black boxes appear where content was removed. Most regulatory and legal contexts require visible redaction marks as evidence that content was deliberately removed instead of accidentally omitted. Internal workflows may prefer invisible removal for cleaner output.
ReplacementText defines what characters appear in place of the redacted content. Common choices include asterisks, "REDACTED" labels, or empty strings. The replacement text appears in the document structure if someone attempts to select or copy from the redacted area.
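To get a feel for how case sensitivity and whole word matching interact before running them against a real document, you can model the two options with plain .NET regular expressions. This is only an illustration of the concepts under the assumption that "whole word" means regex word boundaries; it does not show how IronPDF implements matching internally, and `MatchSemanticsDemo` is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class: models the two matching options with Regex.
public static class MatchSemanticsDemo
{
    // Counts how often a literal term occurs in the text under the given options.
    public static int CountMatches(string text, string term, bool caseSensitive, bool wholeWordsOnly)
    {
        // Escape the term so characters like '.' or '$' are treated literally.
        string pattern = Regex.Escape(term);
        if (wholeWordsOnly)
        {
            // \b anchors the match to word boundaries on both sides.
            pattern = @"\b" + pattern + @"\b";
        }
        RegexOptions options = caseSensitive ? RegexOptions.None : RegexOptions.IgnoreCase;
        return Regex.Matches(text, pattern, options).Count;
    }
}
```

For the text "Smith met a blacksmith in Smithfield." and the term "Smith", whole word matching finds 1 occurrence, case-sensitive partial matching finds 2 ("Smith" and the start of "Smithfield"), and case-insensitive partial matching finds 3. Running candidate terms through a check like this on extracted text can show in advance whether a setting would redact more or less than you intend.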
How can I Use Regular Expressions to Find and Redact Sensitive Patterns?
Redacting known text strings works when you have specific values to remove, but many confidential data types follow predictable patterns rather than fixed values. Social Security numbers, credit card numbers, email addresses, phone numbers, and dates all have recognizable formats that can be matched with regular expressions. Building a pattern based redaction system allows you to remove private information from PDF content without knowing every specific value in advance.
IronPDF's text extraction capabilities combined with redaction methods enable powerful pattern matching workflows. You extract the text, identify matches using .NET regular expressions, and then redact each discovered value.
using IronPdf;
using System.Text.RegularExpressions;
using System.Collections.Generic;
public class PatternRedactor
{
// Common patterns for sensitive data
private static readonly Dictionary<string, string> SensitivePatterns = new Dictionary<string, string>
{
// US Social Security Number: 123-45-6789
{ "SSN", @"\b\d{3}-\d{2}-\d{4}\b" },
// Credit Card Numbers: 13-16 digits in groups of four, optionally separated by spaces or hyphens
{ "CreditCard", @"\b(?:\d{4}[-\s]?){3}\d{1,4}\b" },
// Email Addresses
{ "Email", @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b" },
// US Phone Numbers: (123) 456-7890 or 123-456-7890
// The word boundary sits inside the second alternative because \b cannot match before "(" after a space
{ "Phone", @"(?:\(\d{3}\)\s?|\b\d{3}[-.])\d{3}[-.]?\d{4}\b" },
// Dates: MM/DD/YYYY or MM-DD-YYYY
{ "Date", @"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b" },
// IP Addresses
{ "IPAddress", @"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b" }
};
public void RedactPatterns(string inputPath, string outputPath, params string[] patternNames)
{
// Load the PDF
PdfDocument pdf = PdfDocument.FromFile(inputPath);
// Extract all text from the document
string fullText = pdf.ExtractAllText();
// Track unique matches to avoid duplicate redaction attempts
HashSet<string> matchesToRedact = new HashSet<string>();
// Find all matches for requested patterns
foreach (string patternName in patternNames)
{
if (SensitivePatterns.TryGetValue(patternName, out string pattern))
{
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = regex.Matches(fullText);
foreach (Match match in matches)
{
matchesToRedact.Add(match.Value);
}
}
}
// Redact each unique match
foreach (string sensitiveValue in matchesToRedact)
{
pdf.RedactTextOnAllPages(sensitiveValue);
}
// Save the redacted document
pdf.SaveAs(outputPath);
}
}
// Usage example
class Program
{
static void Main()
{
PatternRedactor redactor = new PatternRedactor();
// Redact SSNs, credit card numbers, and email addresses from a financial document
redactor.RedactPatterns(
"customer-data.pdf",
"customer-data-safe.pdf",
"SSN", "CreditCard", "Email"
);
}
}
Imports IronPdf
Imports System.Text.RegularExpressions
Imports System.Collections.Generic
Public Class PatternRedactor
' Common patterns for sensitive data
Private Shared ReadOnly SensitivePatterns As New Dictionary(Of String, String) From {
' US Social Security Number: 123-45-6789
{"SSN", "\b\d{3}-\d{2}-\d{4}\b"},
' Credit Card Numbers: 13-16 digits in groups of four, optionally separated by spaces or hyphens
{"CreditCard", "\b(?:\d{4}[-\s]?){3}\d{1,4}\b"},
' Email Addresses
{"Email", "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"},
' US Phone Numbers: (123) 456-7890 or 123-456-7890
{"Phone", "(?:\(\d{3}\)\s?|\b\d{3}[-.])\d{3}[-.]?\d{4}\b"},
' Dates: MM/DD/YYYY or MM-DD-YYYY
{"Date", "\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"},
' IP Addresses
{"IPAddress", "\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"}
}
Public Sub RedactPatterns(inputPath As String, outputPath As String, ParamArray patternNames As String())
' Load the PDF
Dim pdf As PdfDocument = PdfDocument.FromFile(inputPath)
' Extract all text from the document
Dim fullText As String = pdf.ExtractAllText()
' Track unique matches to avoid duplicate redaction attempts
Dim matchesToRedact As New HashSet(Of String)()
' Find all matches for requested patterns
For Each patternName As String In patternNames
Dim pattern As String = Nothing
If SensitivePatterns.TryGetValue(patternName, pattern) Then
Dim regex As New Regex(pattern, RegexOptions.IgnoreCase)
Dim matches As MatchCollection = regex.Matches(fullText)
For Each match As Match In matches
matchesToRedact.Add(match.Value)
Next
End If
Next
' Redact each unique match
For Each sensitiveValue As String In matchesToRedact
pdf.RedactTextOnAllPages(sensitiveValue)
Next
' Save the redacted document
pdf.SaveAs(outputPath)
End Sub
End Class
' Usage example
Class Program
Shared Sub Main()
Dim redactor As New PatternRedactor()
' Redact SSNs, credit card numbers, and email addresses from a financial document
redactor.RedactPatterns(
"customer-data.pdf",
"customer-data-safe.pdf",
"SSN", "CreditCard", "Email"
)
End Sub
End Class
This pattern based approach scales well because you define the patterns once and apply them to any document. Adding new data types only requires adding new regex patterns to the dictionary.
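Digit-group patterns like the credit card regex also match order numbers, tracking codes, and other long numeric strings. One common way to narrow the matches, sketched here as an optional extra rather than anything the PatternRedactor class does, is to keep only candidates that pass the Luhn checksum used by payment card numbers. `CardNumberFilter` and `PassesLuhnCheck` are hypothetical names.

```csharp
using System.Linq;

// Hypothetical helper (not an IronPDF type): Luhn checksum validation.
public static class CardNumberFilter
{
    // Returns true when the digits in the candidate form a plausible card
    // number: 13 to 19 digits that satisfy the Luhn checksum.
    public static bool PassesLuhnCheck(string candidate)
    {
        // Keep ASCII digits only, dropping spaces and hyphens.
        int[] digits = candidate
            .Where(c => c >= '0' && c <= '9')
            .Select(c => c - '0')
            .ToArray();
        if (digits.Length < 13 || digits.Length > 19)
        {
            return false;
        }

        int sum = 0;
        bool doubleDigit = false;
        // Walk from the rightmost digit, doubling every second digit.
        for (int i = digits.Length - 1; i >= 0; i--)
        {
            int d = digits[i];
            if (doubleDigit)
            {
                d *= 2;
                if (d > 9) d -= 9; // same as adding the two digits of the product
            }
            sum += d;
            doubleDigit = !doubleDigit;
        }
        return sum % 10 == 0;
    }
}
```

The widely used test number "4111 1111 1111 1111" passes this check, while "1234 5678 9012 3456" does not. Whether to apply the filter is a judgment call: it reduces over-redaction of unrelated numbers, but where missing a real card number is the greater risk, redacting every regex match may be the safer default.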
How do I Build a Reusable Sensitive Data Scanner?
For production environments, you often need to scan documents and report what confidential information exists before deciding whether to redact. This helps with compliance auditing and allows human review of redaction decisions. The following class offers scanning capabilities alongside redaction.
using IronPdf;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Linq;
public class SensitiveDataMatch
{
public string PatternType { get; set; }
public string Value { get; set; }
public int PageNumber { get; set; }
}
public class ScanResult
{
public string FilePath { get; set; }
public List<SensitiveDataMatch> Matches { get; set; } = new List<SensitiveDataMatch>();
public bool ContainsSensitiveData => Matches.Count > 0;
public Dictionary<string, int> GetSummary()
{
return Matches.GroupBy(m => m.PatternType)
.ToDictionary(g => g.Key, g => g.Count());
}
}
public class DocumentScanner
{
private readonly Dictionary<string, string> _patterns;
public DocumentScanner()
{
_patterns = new Dictionary<string, string>
{
{ "Social Security Number", @"\b\d{3}-\d{2}-\d{4}\b" },
{ "Credit Card", @"\b(?:\d{4}[-\s]?){3}\d{1,4}\b" },
{ "Email Address", @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b" },
{ "Phone Number", @"(?:\(\d{3}\)\s?|\b\d{3}[-.])\d{3}[-.]?\d{4}\b" },
{ "Date of Birth Pattern", @"\b(?:DOB|Date of Birth|Birth Date)[:\s]+\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b" }
};
}
public ScanResult ScanDocument(string filePath)
{
ScanResult result = new ScanResult { FilePath = filePath };
PdfDocument pdf = PdfDocument.FromFile(filePath);
// Scan each page individually to track location
for (int pageIndex = 0; pageIndex < pdf.PageCount; pageIndex++)
{
string pageText = pdf.ExtractTextFromPage(pageIndex);
foreach (var pattern in _patterns)
{
Regex regex = new Regex(pattern.Value, RegexOptions.IgnoreCase);
MatchCollection matches = regex.Matches(pageText);
foreach (Match match in matches)
{
result.Matches.Add(new SensitiveDataMatch
{
PatternType = pattern.Key,
Value = MaskValue(match.Value, pattern.Key),
PageNumber = pageIndex + 1
});
}
}
}
return result;
}
// Partially mask values for safe storage
private string MaskValue(string value, string patternType)
{
if (patternType == "Social Security Number" && value.Length >= 4)
{
return "XXX-XX-" + value.Substring(value.Length - 4);
}
if (patternType == "Credit Card" && value.Length >= 4)
{
return "****-****-****-" + value.Substring(value.Length - 4);
}
if (patternType == "Email Address")
{
int atIndex = value.IndexOf('@');
if (atIndex > 2)
{
return value.Substring(0, 2) + "***" + value.Substring(atIndex);
}
}
return value.Length > 4 ? value.Substring(0, 2) + "***" : "****";
}
public void ScanAndRedact(string inputPath, string outputPath)
{
// First scan to identify sensitive data
ScanResult scanResult = ScanDocument(inputPath);
if (!scanResult.ContainsSensitiveData)
{
return;
}
// Load document for redaction
PdfDocument pdf = PdfDocument.FromFile(inputPath);
// Extract unique actual values (not masked) for redaction
string fullText = pdf.ExtractAllText();
HashSet<string> valuesToRedact = new HashSet<string>();
foreach (var pattern in _patterns)
{
Regex regex = new Regex(pattern.Value, RegexOptions.IgnoreCase);
foreach (Match match in regex.Matches(fullText))
{
valuesToRedact.Add(match.Value);
}
}
// Apply redactions
foreach (string value in valuesToRedact)
{
pdf.RedactTextOnAllPages(value);
}
pdf.SaveAs(outputPath);
}
}
// Usage
class Program
{
static void Main()
{
DocumentScanner scanner = new DocumentScanner();
// Scan only (for audit purposes)
ScanResult result = scanner.ScanDocument("application-form.pdf");
var summary = result.GetSummary();
// Scan and redact in one operation
scanner.ScanAndRedact("application-form.pdf", "application-form-redacted.pdf");
}
}using IronPdf;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Linq;
public class SensitiveDataMatch
{
public string PatternType { get; set; }
public string Value { get; set; }
public int PageNumber { get; set; }
}
public class ScanResult
{
public string FilePath { get; set; }
public List<SensitiveDataMatch> Matches { get; set; } = new List<SensitiveDataMatch>();
public bool ContainsSensitiveData => Matches.Count > 0;
public Dictionary<string, int> GetSummary()
{
return Matches.GroupBy(m => m.PatternType)
.ToDictionary(g => g.Key, g => g.Count());
}
}
public class DocumentScanner
{
private readonly Dictionary<string, string> _patterns;
public DocumentScanner()
{
_patterns = new Dictionary<string, string>
{
{ "Social Security Number", @"\b\d{3}-\d{2}-\d{4}\b" },
{ "Credit Card", @"\b(?:\d{4}[-\s]?){3}\d{1,4}\b" },
{ "Email Address", @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b" },
{ "Phone Number", @"\b(?:\(\d{3}\)\s?|\d{3}[-.])\d{3}[-.]?\d{4}\b" },
{ "Date of Birth Pattern", @"\b(?:DOB|Date of Birth|Birth Date)[:\s]+\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b" }
};
}
public ScanResult ScanDocument(string filePath)
{
ScanResult result = new ScanResult { FilePath = filePath };
PdfDocument pdf = PdfDocument.FromFile(filePath);
// Scan each page individually to track location
for (int pageIndex = 0; pageIndex < pdf.PageCount; pageIndex++)
{
string pageText = pdf.ExtractTextFromPage(pageIndex);
foreach (var pattern in _patterns)
{
Regex regex = new Regex(pattern.Value, RegexOptions.IgnoreCase);
MatchCollection matches = regex.Matches(pageText);
foreach (Match match in matches)
{
result.Matches.Add(new SensitiveDataMatch
{
PatternType = pattern.Key,
Value = MaskValue(match.Value, pattern.Key),
PageNumber = pageIndex + 1
});
}
}
}
return result;
}
// Partially mask values for safe storage
private string MaskValue(string value, string patternType)
{
if (patternType == "Social Security Number" && value.Length >= 4)
{
return "XXX-XX-" + value.Substring(value.Length - 4);
}
if (patternType == "Credit Card" && value.Length >= 4)
{
return "****-****-****-" + value.Substring(value.Length - 4);
}
if (patternType == "Email Address")
{
int atIndex = value.IndexOf('@');
if (atIndex > 2)
{
return value.Substring(0, 2) + "***" + value.Substring(atIndex);
}
}
return value.Length > 4 ? value.Substring(0, 2) + "***" : "****";
}
public void ScanAndRedact(string inputPath, string outputPath)
{
// First scan to identify sensitive data
ScanResult scanResult = ScanDocument(inputPath);
if (!scanResult.ContainsSensitiveData)
{
return;
}
// Load document for redaction
PdfDocument pdf = PdfDocument.FromFile(inputPath);
// Extract unique actual values (not masked) for redaction
string fullText = pdf.ExtractAllText();
HashSet<string> valuesToRedact = new HashSet<string>();
foreach (var pattern in _patterns)
{
Regex regex = new Regex(pattern.Value, RegexOptions.IgnoreCase);
foreach (Match match in regex.Matches(fullText))
{
valuesToRedact.Add(match.Value);
}
}
// Apply redactions
foreach (string value in valuesToRedact)
{
pdf.RedactTextOnAllPages(value);
}
pdf.SaveAs(outputPath);
}
}
// Usage
class Program
{
static void Main()
{
DocumentScanner scanner = new DocumentScanner();
// Scan only (for audit purposes)
ScanResult result = scanner.ScanDocument("application-form.pdf");
var summary = result.GetSummary();
// Scan and redact in one operation
scanner.ScanAndRedact("application-form.pdf", "application-form-redacted.pdf");
}
}
Imports IronPdf
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports System.Linq
Public Class SensitiveDataMatch
Public Property PatternType As String
Public Property Value As String
Public Property PageNumber As Integer
End Class
Public Class ScanResult
Public Property FilePath As String
Public Property Matches As List(Of SensitiveDataMatch) = New List(Of SensitiveDataMatch)()
Public ReadOnly Property ContainsSensitiveData As Boolean
Get
Return Matches.Count > 0
End Get
End Property
Public Function GetSummary() As Dictionary(Of String, Integer)
Return Matches.GroupBy(Function(m) m.PatternType) _
.ToDictionary(Function(g) g.Key, Function(g) g.Count())
End Function
End Class
Public Class DocumentScanner
Private ReadOnly _patterns As Dictionary(Of String, String)
Public Sub New()
_patterns = New Dictionary(Of String, String) From {
{"Social Security Number", "\b\d{3}-\d{2}-\d{4}\b"},
{"Credit Card", "\b(?:\d{4}[-\s]?){3}\d{1,4}\b"},
{"Email Address", "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"},
{"Phone Number", "(?:\(\d{3}\)\s?|\b\d{3}[-.])\d{3}[-.]?\d{4}\b"},
{"Date of Birth Pattern", "\b(?:DOB|Date of Birth|Birth Date)[:\s]+\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"}
}
End Sub
Public Function ScanDocument(filePath As String) As ScanResult
Dim result As New ScanResult With {.FilePath = filePath}
Dim pdf As PdfDocument = PdfDocument.FromFile(filePath)
' Scan each page individually to track location
For pageIndex As Integer = 0 To pdf.PageCount - 1
Dim pageText As String = pdf.ExtractTextFromPage(pageIndex)
For Each pattern In _patterns
Dim regex As New Regex(pattern.Value, RegexOptions.IgnoreCase)
Dim matches As MatchCollection = regex.Matches(pageText)
For Each match As Match In matches
result.Matches.Add(New SensitiveDataMatch With {
.PatternType = pattern.Key,
.Value = MaskValue(match.Value, pattern.Key),
.PageNumber = pageIndex + 1
})
Next
Next
Next
Return result
End Function
' Partially mask values for safe storage
Private Function MaskValue(value As String, patternType As String) As String
If patternType = "Social Security Number" AndAlso value.Length >= 4 Then
Return "XXX-XX-" & value.Substring(value.Length - 4)
End If
If patternType = "Credit Card" AndAlso value.Length >= 4 Then
Return "****-****-****-" & value.Substring(value.Length - 4)
End If
If patternType = "Email Address" Then
Dim atIndex As Integer = value.IndexOf("@"c)
If atIndex > 2 Then
Return value.Substring(0, 2) & "***" & value.Substring(atIndex)
End If
End If
Return If(value.Length > 4, value.Substring(0, 2) & "***", "****")
End Function
Public Sub ScanAndRedact(inputPath As String, outputPath As String)
' First scan to identify sensitive data
Dim scanResult As ScanResult = ScanDocument(inputPath)
If Not scanResult.ContainsSensitiveData Then
Return
End If
' Load document for redaction
Dim pdf As PdfDocument = PdfDocument.FromFile(inputPath)
' Extract unique actual values (not masked) for redaction
Dim fullText As String = pdf.ExtractAllText()
Dim valuesToRedact As New HashSet(Of String)()
For Each pattern In _patterns
Dim regex As New Regex(pattern.Value, RegexOptions.IgnoreCase)
For Each match As Match In regex.Matches(fullText)
valuesToRedact.Add(match.Value)
Next
Next
' Apply redactions
For Each value As String In valuesToRedact
pdf.RedactTextOnAllPages(value)
Next
pdf.SaveAs(outputPath)
End Sub
End Class
' Usage
Module Program
Sub Main()
Dim scanner As New DocumentScanner()
' Scan only (for audit purposes)
Dim result As ScanResult = scanner.ScanDocument("application-form.pdf")
Dim summary = result.GetSummary()
' Scan and redact in one operation
scanner.ScanAndRedact("application-form.pdf", "application-form-redacted.pdf")
End Sub
End Module
The scanner offers visibility into what confidential information exists before any modifications occur. This supports compliance workflows where you need documentation of what was found and removed. The masking function ensures that log files and reports do not themselves become sources of data exposure.
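For the audit documentation described above, one option is to log only pattern counts and never the raw matches. The sketch below is a minimal illustration that uses only the .NET base class library. `AuditReport` and `FormatSummary` are hypothetical names, and the input dictionary is assumed to have the same shape as the value returned by `GetSummary()`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: turns a pattern-count summary into log-safe text.
public static class AuditReport
{
    public static string FormatSummary(string filePath, IDictionary<string, int> summary)
    {
        var sb = new StringBuilder();
        sb.AppendLine($"File: {filePath}");

        // Sort by pattern name so repeated runs produce identical reports
        foreach (var entry in summary.OrderBy(e => e.Key, StringComparer.Ordinal))
        {
            sb.AppendLine($"{entry.Key}: {entry.Value}");
        }

        // Only counts are written; matched values never reach the report
        sb.Append($"Total matches: {summary.Values.Sum()}");
        return sb.ToString();
    }
}
```

With the scanner shown earlier, a call might look like `AuditReport.FormatSummary(result.FilePath, result.GetSummary())`.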
How do I Redact Specific Regions or Areas in a PDF?
Text redaction handles character based content effectively, but PDFs often contain sensitive information in forms that text matching cannot address. Signatures, photographs, handwritten annotations, stamps, and graphical elements require a different approach. Region based redaction lets you specify rectangular areas by their coordinates and permanently obscure everything within those bounds.
IronPDF uses the RectangleF structure to define redaction regions. You specify the X and Y coordinates of the top left corner, then the width and height of the area. Coordinates are measured in points from the bottom left of the page, which matches the PDF specification's coordinate system.
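Because regions are expressed in points, measurements taken in inches or from the top of the page need converting first. The following is a minimal sketch of that arithmetic. `PdfUnits`, `InchesToPoints`, and `FromTopLeft` are hypothetical helpers rather than IronPDF APIs, and the sketch assumes the Y value is the distance from the bottom of the page to the lower edge of the region. Confirm that assumption against a test document for your IronPDF version.

```csharp
using System;

// Hypothetical helpers for working in PDF points (1 point = 1/72 inch).
public static class PdfUnits
{
    public const float PointsPerInch = 72f;

    public static float InchesToPoints(float inches) => inches * PointsPerInch;

    // Converts a box measured from the top-left of the page into
    // bottom-left-origin values (x, y, width, height), all in points.
    public static (float X, float Y, float Width, float Height) FromTopLeft(
        float pageHeight, float left, float top, float width, float height)
    {
        if (top < 0 || height < 0 || top + height > pageHeight)
        {
            throw new ArgumentOutOfRangeException(nameof(top), "Region must fit within the page height.");
        }

        // Distance from the page bottom to the lower edge of the box
        return (left, pageHeight - top - height, width, height);
    }
}
```

On a page 792 points tall, a 200 by 50 point box whose top edge sits 92 points below the top of the page maps to a Y of 650: `PdfUnits.FromTopLeft(792f, 100f, 92f, 200f, 50f)` yields (100, 650, 200, 50), and those four values could then be passed to `new RectangleF(...)`.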
Input
A signed agreement document containing handwritten signatures and a photo ID that need to be redacted using coordinate-based region targeting.
using IronPdf;
using IronSoftware.Drawing;
// Load a document with signature blocks and photos
PdfDocument pdf = PdfDocument.FromFile("signed-agreement.pdf");
// Define a region for a signature block
// Located 100 points from left, 650 points from bottom
// Width of 200 points, height of 50 points
RectangleF signatureRegion = new RectangleF(100, 650, 200, 50);
// Redact the signature region on all pages
pdf.RedactRegionsOnAllPages(signatureRegion);
// Define a region for a photo ID in the upper right
RectangleF photoRegion = new RectangleF(450, 700, 100, 120);
pdf.RedactRegionsOnAllPages(photoRegion);
// Save the document with regions redacted
pdf.SaveAs("signed-agreement-redacted.pdf");
Imports IronPdf
Imports IronSoftware.Drawing
' Load a document with signature blocks and photos
Dim pdf As PdfDocument = PdfDocument.FromFile("signed-agreement.pdf")
' Define a region for a signature block
' Located 100 points from left, 650 points from bottom
' Width of 200 points, height of 50 points
Dim signatureRegion As New RectangleF(100, 650, 200, 50)
' Redact the signature region on all pages
pdf.RedactRegionsOnAllPages(signatureRegion)
' Define a region for a photo ID in the upper right
Dim photoRegion As New RectangleF(450, 700, 100, 120)
pdf.RedactRegionsOnAllPages(photoRegion)
' Save the document with regions redacted
pdf.SaveAs("signed-agreement-redacted.pdf")
This code uses RectangleF structures to define rectangular areas for redaction. The signature region is positioned at coordinates (100, 650) and covers a 200 by 50 point area, while the photo region is at (450, 700) and covers a 100 by 120 point area. The RedactRegionsOnAllPages method applies black rectangles over these regions across all pages.
Sample Output
Determining the correct coordinates often requires some experimentation or measurement. PDF pages typically use a coordinate system where one point equals 1/72 of an inch. A standard US Letter page is 612 points wide and 792 points tall. A4 pages are approximately 595 by 842 points. PDF viewing tools that display coordinates as you move the cursor can help, or you can extract page dimensions programmatically:
using IronPdf;
using IronSoftware.Drawing;
PdfDocument pdf = PdfDocument.FromFile("form-document.pdf");
// Get dimensions of the first page
var pageInfo = pdf.Pages[0];
// Calculate regions relative to page dimensions
// Redact the bottom quarter of the page where signatures appear
float signatureAreaHeight = (float)(pageInfo.Height / 4);
RectangleF bottomQuarter = new RectangleF(
0, // Start at left edge
0, // Start at bottom
(float)pageInfo.Width, // Full page width
signatureAreaHeight // Quarter of page height
);
pdf.RedactRegionsOnAllPages(bottomQuarter);
// Redact a header area at the top containing letterhead with address
float headerHeight = 100;
RectangleF headerArea = new RectangleF(
0,
(float)(pageInfo.Height - headerHeight), // Position from bottom
(float)pageInfo.Width,
headerHeight
);
pdf.RedactRegionsOnAllPages(headerArea);
pdf.SaveAs("form-document-redacted.pdf");
Imports IronPdf
Imports IronSoftware.Drawing
Dim pdf As PdfDocument = PdfDocument.FromFile("form-document.pdf")
' Get dimensions of the first page
Dim pageInfo = pdf.Pages(0)
' Calculate regions relative to page dimensions
' Redact the bottom quarter of the page where signatures appear
Dim signatureAreaHeight As Single = CSng(pageInfo.Height / 4)
Dim bottomQuarter As New RectangleF(0, 0, CSng(pageInfo.Width), signatureAreaHeight)
pdf.RedactRegionsOnAllPages(bottomQuarter)
' Redact a header area at the top containing letterhead with address
Dim headerHeight As Single = 100
Dim headerArea As New RectangleF(0, CSng(pageInfo.Height - headerHeight), CSng(pageInfo.Width), headerHeight)
pdf.RedactRegionsOnAllPages(headerArea)
pdf.SaveAs("form-document-redacted.pdf")
How can I Redact Multiple Regions Across Different Pages?
Complex documents often require different regions redacted on different pages. A multi page form could have signature lines in varying positions, or different pages may contain photos, stamps, or other graphical elements at unique locations. IronPDF includes page specific methods for targeted region redaction.
using IronPdf;
using IronSoftware.Drawing;
PdfDocument pdf = PdfDocument.FromFile("multi-page-application.pdf");
// Define page-specific redaction regions
// Page 1: Cover page with applicant photo
RectangleF page1Photo = new RectangleF(450, 600, 120, 150);
pdf.RedactRegionOnPage(0, page1Photo);
// Page 2: Personal information section
RectangleF page2InfoBlock = new RectangleF(50, 400, 250, 200);
pdf.RedactRegionOnPage(1, page2InfoBlock);
// Pages 3-5: Signature lines at the same position
RectangleF signatureLine = new RectangleF(100, 100, 200, 40);
int[] signaturePages = { 2, 3, 4 };
pdf.RedactRegionOnPages(signaturePages, signatureLine);
// Page 6: Multiple regions - notary stamp and witness signature
RectangleF notaryStamp = new RectangleF(400, 150, 150, 150);
RectangleF witnessSignature = new RectangleF(100, 150, 200, 40);
pdf.RedactRegionOnPage(5, notaryStamp);
pdf.RedactRegionOnPage(5, witnessSignature);
pdf.SaveAs("multi-page-application-redacted.pdf");
Imports IronPdf
Imports IronSoftware.Drawing
Dim pdf As PdfDocument = PdfDocument.FromFile("multi-page-application.pdf")
' Define page-specific redaction regions
' Page 1: Cover page with applicant photo
Dim page1Photo As New RectangleF(450, 600, 120, 150)
pdf.RedactRegionOnPage(0, page1Photo)
' Page 2: Personal information section
Dim page2InfoBlock As New RectangleF(50, 400, 250, 200)
pdf.RedactRegionOnPage(1, page2InfoBlock)
' Pages 3-5: Signature lines at the same position
Dim signatureLine As New RectangleF(100, 100, 200, 40)
Dim signaturePages As Integer() = {2, 3, 4}
pdf.RedactRegionOnPages(signaturePages, signatureLine)
' Page 6: Multiple regions - notary stamp and witness signature
Dim notaryStamp As New RectangleF(400, 150, 150, 150)
Dim witnessSignature As New RectangleF(100, 150, 200, 40)
pdf.RedactRegionOnPage(5, notaryStamp)
pdf.RedactRegionOnPage(5, witnessSignature)
pdf.SaveAs("multi-page-application-redacted.pdf")
Documents with consistent layouts benefit from reusable region definitions:
using IronPdf;
using IronSoftware.Drawing;
public class FormRegions
{
// Standard form regions based on common templates
public static RectangleF HeaderLogo => new RectangleF(20, 720, 150, 60);
public static RectangleF SignatureBlock => new RectangleF(72, 72, 200, 50);
public static RectangleF DateField => new RectangleF(400, 72, 120, 20);
public static RectangleF PhotoId => new RectangleF(480, 650, 100, 130);
public static RectangleF AddressBlock => new RectangleF(72, 600, 250, 80);
}
class Program
{
static void Main()
{
PdfDocument pdf = PdfDocument.FromFile("standard-form.pdf");
// Apply standard redactions using predefined regions
pdf.RedactRegionsOnAllPages(FormRegions.SignatureBlock);
pdf.RedactRegionsOnAllPages(FormRegions.DateField);
pdf.RedactRegionOnPage(0, FormRegions.PhotoId);
pdf.SaveAs("standard-form-redacted.pdf");
}
}
Imports IronPdf
Imports IronSoftware.Drawing
Public Class FormRegions
' Standard form regions based on common templates
Public Shared ReadOnly Property HeaderLogo As RectangleF
Get
Return New RectangleF(20, 720, 150, 60)
End Get
End Property
Public Shared ReadOnly Property SignatureBlock As RectangleF
Get
Return New RectangleF(72, 72, 200, 50)
End Get
End Property
Public Shared ReadOnly Property DateField As RectangleF
Get
Return New RectangleF(400, 72, 120, 20)
End Get
End Property
Public Shared ReadOnly Property PhotoId As RectangleF
Get
Return New RectangleF(480, 650, 100, 130)
End Get
End Property
Public Shared ReadOnly Property AddressBlock As RectangleF
Get
Return New RectangleF(72, 600, 250, 80)
End Get
End Property
End Class
Module Program
Sub Main()
Dim pdf As PdfDocument = PdfDocument.FromFile("standard-form.pdf")
' Apply standard redactions using predefined regions
pdf.RedactRegionsOnAllPages(FormRegions.SignatureBlock)
pdf.RedactRegionsOnAllPages(FormRegions.DateField)
pdf.RedactRegionOnPage(0, FormRegions.PhotoId)
pdf.SaveAs("standard-form-redacted.pdf")
End Sub
End Module
How do I Remove Metadata that Could Expose Sensitive Information?
PDF metadata represents a frequently overlooked source of information leakage. Every PDF carries properties that can reveal sensitive details: the author's name and username, the software used to create the document, creation and modification timestamps, the original filename, revision history, and custom properties added by various applications. Before sharing documents externally, stripping or sanitizing this metadata is essential. For a comprehensive overview of metadata operations, see our metadata how-to guide.
IronPDF exposes document metadata through the MetaData property, allowing you to read existing values, modify them, or remove them entirely.
using IronPdf;
using System;
// Load a document containing sensitive metadata
PdfDocument pdf = PdfDocument.FromFile("internal-report.pdf");
// Access current metadata properties
string author = pdf.MetaData.Author;
string title = pdf.MetaData.Title;
string subject = pdf.MetaData.Subject;
string keywords = pdf.MetaData.Keywords;
string creator = pdf.MetaData.Creator;
string producer = pdf.MetaData.Producer;
DateTime? creationDate = pdf.MetaData.CreationDate;
DateTime? modifiedDate = pdf.MetaData.ModifiedDate;
// Get all metadata keys including custom properties
var allKeys = pdf.MetaData.Keys();
Imports IronPdf
' Load a document containing sensitive metadata
Dim pdf As PdfDocument = PdfDocument.FromFile("internal-report.pdf")
' Access current metadata properties
Dim author As String = pdf.MetaData.Author
Dim title As String = pdf.MetaData.Title
Dim subject As String = pdf.MetaData.Subject
Dim keywords As String = pdf.MetaData.Keywords
Dim creator As String = pdf.MetaData.Creator
Dim producer As String = pdf.MetaData.Producer
Dim creationDate As DateTime? = pdf.MetaData.CreationDate
Dim modifiedDate As DateTime? = pdf.MetaData.ModifiedDate
' Get all metadata keys including custom properties
Dim allKeys = pdf.MetaData.Keys()
To remove sensitive metadata before distribution:
Input
An internal memo containing embedded metadata such as author names, creation timestamps, and custom properties that could reveal sensitive organizational information.
using IronPdf;
using System;
PdfDocument pdf = PdfDocument.FromFile("confidential-memo.pdf");
// Replace identifying metadata with generic values
pdf.MetaData.Author = "Organization Name";
pdf.MetaData.Creator = "Document System";
pdf.MetaData.Producer = "";
pdf.MetaData.Title = "Public Document";
pdf.MetaData.Subject = "";
pdf.MetaData.Keywords = "";
// Normalize dates to remove timing information
pdf.MetaData.CreationDate = DateTime.Now;
pdf.MetaData.ModifiedDate = DateTime.Now;
// Remove specific custom metadata keys
pdf.MetaData.RemoveMetaDataKey("OriginalFilename");
pdf.MetaData.RemoveMetaDataKey("LastSavedBy");
pdf.MetaData.RemoveMetaDataKey("Company");
pdf.MetaData.RemoveMetaDataKey("Manager");
// Remove custom properties added by applications
try
{
pdf.MetaData.CustomProperties.Remove("SourcePath");
}
catch { }
pdf.SaveAs("confidential-memo-cleaned.pdf");
Imports IronPdf
Imports System
Dim pdf As PdfDocument = PdfDocument.FromFile("confidential-memo.pdf")
' Replace identifying metadata with generic values
pdf.MetaData.Author = "Organization Name"
pdf.MetaData.Creator = "Document System"
pdf.MetaData.Producer = ""
pdf.MetaData.Title = "Public Document"
pdf.MetaData.Subject = ""
pdf.MetaData.Keywords = ""
' Normalize dates to remove timing information
pdf.MetaData.CreationDate = DateTime.Now
pdf.MetaData.ModifiedDate = DateTime.Now
' Remove specific custom metadata keys
pdf.MetaData.RemoveMetaDataKey("OriginalFilename")
pdf.MetaData.RemoveMetaDataKey("LastSavedBy")
pdf.MetaData.RemoveMetaDataKey("Company")
pdf.MetaData.RemoveMetaDataKey("Manager")
' Remove custom properties added by applications
Try
pdf.MetaData.CustomProperties.Remove("SourcePath")
Catch
End Try
pdf.SaveAs("confidential-memo-cleaned.pdf")
This code replaces identifying metadata fields with generic values, resets both timestamps to the time of processing, and removes custom metadata keys that applications may have added. The RemoveMetaDataKey method targets specific properties like "OriginalFilename" and "LastSavedBy" that could expose internal information.
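Setting both dates to `DateTime.Now` still records the exact moment of processing in the machine's local time zone. If that is more detail than you want to publish, one option is to keep only the calendar date in UTC. The sketch below assumes nothing about IronPDF beyond the `CreationDate` and `ModifiedDate` properties used above; `TimestampNormalizer` and `ToUtcDateOnly` are hypothetical names.

```csharp
using System;

// Hypothetical helper: reduces a timestamp to midnight UTC on the same day.
public static class TimestampNormalizer
{
    public static DateTime ToUtcDateOnly(DateTime value)
    {
        // Convert first so the calendar date is taken in UTC, not local time
        DateTime utc = value.ToUniversalTime();
        return new DateTime(utc.Year, utc.Month, utc.Day, 0, 0, 0, DateTimeKind.Utc);
    }
}
```

A possible use is `pdf.MetaData.CreationDate = TimestampNormalizer.ToUtcDateOnly(DateTime.UtcNow);`, with the same value assigned to `ModifiedDate`.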
Sample Output
Thorough metadata cleaning across batch operations requires a systematic approach:
using IronPdf;
using System;
using System.Collections.Generic;
public class MetadataCleaner
{
private readonly string _defaultAuthor;
private readonly string _defaultCreator;
public MetadataCleaner(string organizationName)
{
_defaultAuthor = organizationName;
_defaultCreator = $"{organizationName} Document System";
}
public void CleanMetadata(PdfDocument pdf)
{
// Replace standard metadata fields
pdf.MetaData.Author = _defaultAuthor;
pdf.MetaData.Creator = _defaultCreator;
pdf.MetaData.Producer = "";
pdf.MetaData.Subject = "";
pdf.MetaData.Keywords = "";
// Normalize timestamps
DateTime now = DateTime.Now;
pdf.MetaData.CreationDate = now;
pdf.MetaData.ModifiedDate = now;
// Get all keys and remove potentially sensitive ones
List<string> keysToRemove = new List<string>();
foreach (string key in pdf.MetaData.Keys())
{
// Keep only essential keys
if (!IsEssentialKey(key))
{
keysToRemove.Add(key);
}
}
foreach (string key in keysToRemove)
{
pdf.MetaData.RemoveMetaDataKey(key);
}
}
private bool IsEssentialKey(string key)
{
// Keep only the basic display properties
string[] essentialKeys = { "Title", "Author", "CreationDate", "ModifiedDate" };
foreach (string essential in essentialKeys)
{
if (key.Equals(essential, StringComparison.OrdinalIgnoreCase))
{
return true;
}
}
return false;
}
}
// Usage
class Program
{
static void Main()
{
MetadataCleaner cleaner = new MetadataCleaner("Acme Corporation");
PdfDocument pdf = PdfDocument.FromFile("report.pdf");
cleaner.CleanMetadata(pdf);
pdf.SaveAs("report-clean.pdf");
}
}
Imports IronPdf
Imports System
Imports System.Collections.Generic
Public Class MetadataCleaner
Private ReadOnly _defaultAuthor As String
Private ReadOnly _defaultCreator As String
Public Sub New(organizationName As String)
_defaultAuthor = organizationName
_defaultCreator = $"{organizationName} Document System"
End Sub
Public Sub CleanMetadata(pdf As PdfDocument)
' Replace standard metadata fields
pdf.MetaData.Author = _defaultAuthor
pdf.MetaData.Creator = _defaultCreator
pdf.MetaData.Producer = ""
pdf.MetaData.Subject = ""
pdf.MetaData.Keywords = ""
' Normalize timestamps
Dim now As DateTime = DateTime.Now
pdf.MetaData.CreationDate = now
pdf.MetaData.ModifiedDate = now
' Get all keys and remove potentially sensitive ones
Dim keysToRemove As New List(Of String)()
For Each key As String In pdf.MetaData.Keys()
' Keep only essential keys
If Not IsEssentialKey(key) Then
keysToRemove.Add(key)
End If
Next
For Each key As String In keysToRemove
pdf.MetaData.RemoveMetaDataKey(key)
Next
End Sub
Private Function IsEssentialKey(key As String) As Boolean
' Keep only the basic display properties
Dim essentialKeys As String() = {"Title", "Author", "CreationDate", "ModifiedDate"}
For Each essential As String In essentialKeys
If key.Equals(essential, StringComparison.OrdinalIgnoreCase) Then
Return True
End If
Next
Return False
End Function
End Class
' Usage
Class Program
Shared Sub Main()
Dim cleaner As New MetadataCleaner("Acme Corporation")
Dim pdf As PdfDocument = PdfDocument.FromFile("report.pdf")
cleaner.CleanMetadata(pdf)
pdf.SaveAs("report-clean.pdf")
End Sub
End Class
How can I Sanitize a PDF to Remove Embedded Scripts and Hidden Threats?
PDF sanitization addresses security concerns that go beyond visible content and metadata. PDF files can contain JavaScript code, embedded executables, form actions that trigger external connections, and other potentially malicious elements. These capabilities exist for legitimate purposes like interactive forms and multimedia content, but they also create attack vectors. Sanitizing a PDF removes these active elements while preserving the visual content. For additional details on sanitization methods, see our sanitize PDF how-to guide.
IronPDF's Cleaner class handles sanitization through an elegant approach: converting the PDF to an image format and then converting it back. This process strips away JavaScript, embedded objects, form actions, and annotations while keeping the visual appearance intact. The library offers two sanitization methods with different characteristics.
Input
A PDF document received from an external source that may contain JavaScript, embedded objects, or other potentially malicious active content.
using IronPdf;
// Load a PDF that may contain active content
PdfDocument pdf = PdfDocument.FromFile("received-document.pdf");
// Sanitize using SVG conversion
// Faster processing, results in searchable text, slight layout variations possible
PdfDocument sanitizedSvg = Cleaner.SanitizeWithSvg(pdf);
sanitizedSvg.SaveAs("sanitized-svg.pdf");
// Sanitize using Bitmap conversion
// Slower processing, text becomes image (not searchable), exact visual reproduction
PdfDocument sanitizedBitmap = Cleaner.SanitizeWithBitmap(pdf);
sanitizedBitmap.SaveAs("sanitized-bitmap.pdf");
Imports IronPdf
' Load a PDF that may contain active content
Dim pdf As PdfDocument = PdfDocument.FromFile("received-document.pdf")
' Sanitize using SVG conversion
' Faster processing, results in searchable text, slight layout variations possible
Dim sanitizedSvg As PdfDocument = Cleaner.SanitizeWithSvg(pdf)
sanitizedSvg.SaveAs("sanitized-svg.pdf")
' Sanitize using Bitmap conversion
' Slower processing, text becomes image (not searchable), exact visual reproduction
Dim sanitizedBitmap As PdfDocument = Cleaner.SanitizeWithBitmap(pdf)
sanitizedBitmap.SaveAs("sanitized-bitmap.pdf")
This code demonstrates two sanitization methods provided by IronPDF's Cleaner class. SanitizeWithSvg converts the PDF through an SVG intermediate format, preserving searchable text while removing active content. SanitizeWithBitmap converts pages to images first, producing exact visual copies but with text rendered as non-searchable graphics.
Sample Output
The SVG method works faster and preserves text as searchable content, making it suitable for documents that need to remain indexed or accessible. The bitmap method produces exact visual copies but converts text to images, which prevents text selection and searching. Choose based on your requirements for the output document.
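In a pipeline that handles many document types, that choice can be written down once instead of being decided case by case. The sketch below encodes the trade-off as a small policy function. `SanitizeMethod` and `SanitizePolicy` are hypothetical names, and giving searchability priority when both requirements are set is an assumption you may want to reverse.

```csharp
// Hypothetical policy types; not part of IronPDF.
public enum SanitizeMethod
{
    Svg,
    Bitmap
}

public static class SanitizePolicy
{
    public static SanitizeMethod Choose(bool needsSearchableText, bool needsExactLayout)
    {
        // Searchable output is only available from the SVG route
        if (needsSearchableText)
        {
            return SanitizeMethod.Svg;
        }

        // Without a searchability requirement, prefer exact reproduction when asked for it
        return needsExactLayout ? SanitizeMethod.Bitmap : SanitizeMethod.Svg;
    }
}
```

The result can then select between the two calls shown earlier, for example `SanitizePolicy.Choose(false, true) == SanitizeMethod.Bitmap ? Cleaner.SanitizeWithBitmap(pdf) : Cleaner.SanitizeWithSvg(pdf)`.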
You can also apply rendering options during sanitization to adjust the output:
using IronPdf;
// Load the potentially unsafe document
PdfDocument pdf = PdfDocument.FromFile("untrusted-source.pdf");
// Configure rendering options for sanitization
var renderOptions = new ChromePdfRenderOptions
{
MarginTop = 10,
MarginBottom = 10,
MarginLeft = 10,
MarginRight = 10
};
// Sanitize with custom options
PdfDocument sanitized = Cleaner.SanitizeWithSvg(pdf, renderOptions);
sanitized.SaveAs("untrusted-source-safe.pdf");
Imports IronPdf
' Load the potentially unsafe document
Dim pdf As PdfDocument = PdfDocument.FromFile("untrusted-source.pdf")
' Configure rendering options for sanitization
Dim renderOptions As New ChromePdfRenderOptions With {
.MarginTop = 10,
.MarginBottom = 10,
.MarginLeft = 10,
.MarginRight = 10
}
' Sanitize with custom options
Dim sanitized As PdfDocument = Cleaner.SanitizeWithSvg(pdf, renderOptions)
sanitized.SaveAs("untrusted-source-safe.pdf")
High security environments often require combining sanitization with other protective measures:
using IronPdf;
using System;
public class SecureDocumentProcessor
{
public PdfDocument ProcessUntrustedDocument(string inputPath)
{
// Load the document
PdfDocument original = PdfDocument.FromFile(inputPath);
// Step 1: Sanitize to remove active content
PdfDocument sanitized = Cleaner.SanitizeWithSvg(original);
// Step 2: Clean metadata
sanitized.MetaData.Author = "Processed Document";
sanitized.MetaData.Creator = "Secure Processor";
sanitized.MetaData.Producer = "";
sanitized.MetaData.CreationDate = DateTime.Now;
sanitized.MetaData.ModifiedDate = DateTime.Now;
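// Remove all custom metadata; iterate over a copy of the keys so that
// removing entries cannot invalidate the enumeration
var keys = new System.Collections.Generic.List<string>(sanitized.MetaData.Keys());
foreach (string key in keys)
{
if (key != "Title" && key != "Author" && key != "CreationDate" && key != "ModifiedDate")
{
sanitized.MetaData.RemoveMetaDataKey(key);
}
}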
return sanitized;
}
}
// Usage
class Program
{
static void Main()
{
SecureDocumentProcessor processor = new SecureDocumentProcessor();
PdfDocument safe = processor.ProcessUntrustedDocument("email-attachment.pdf");
safe.SaveAs("email-attachment-safe.pdf");
}
}
Imports IronPdf
Imports System
Public Class SecureDocumentProcessor
Public Function ProcessUntrustedDocument(inputPath As String) As PdfDocument
' Load the document
Dim original As PdfDocument = PdfDocument.FromFile(inputPath)
' Step 1: Sanitize to remove active content
Dim sanitized As PdfDocument = Cleaner.SanitizeWithSvg(original)
' Step 2: Clean metadata
sanitized.MetaData.Author = "Processed Document"
sanitized.MetaData.Creator = "Secure Processor"
sanitized.MetaData.Producer = ""
sanitized.MetaData.CreationDate = DateTime.Now
sanitized.MetaData.ModifiedDate = DateTime.Now
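' Remove all custom metadata; iterate over a copy of the keys so that
' removing entries cannot invalidate the enumeration
For Each key As String In New System.Collections.Generic.List(Of String)(sanitized.MetaData.Keys())
If key <> "Title" AndAlso key <> "Author" AndAlso key <> "CreationDate" AndAlso key <> "ModifiedDate" Then
sanitized.MetaData.RemoveMetaDataKey(key)
End If
Next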
Return sanitized
End Function
End Class
' Usage
Module Program
Sub Main()
Dim processor As New SecureDocumentProcessor()
Dim safe As PdfDocument = processor.ProcessUntrustedDocument("email-attachment.pdf")
safe.SaveAs("email-attachment-safe.pdf")
End Sub
End Module
How do I Scan a PDF for Security Vulnerabilities?
Before processing or sanitizing documents, you may want to assess what potential threats they contain. IronPDF's Cleaner.ScanPdf method examines documents using YARA rules, which are pattern definitions commonly used in malware analysis and threat detection. The scan identifies characteristics associated with malicious PDF files.
using IronPdf;
// Load the document to scan
PdfDocument pdf = PdfDocument.FromFile("suspicious-document.pdf");
// Scan using default YARA rules
CleanerScanResult scanResult = Cleaner.ScanPdf(pdf);
// Check the scan results
bool threatsDetected = scanResult.IsDetected;
int riskCount = scanResult.Risks.Count;
// Process identified risks
if (scanResult.IsDetected)
{
foreach (var risk in scanResult.Risks)
{
// Handle each identified risk
}
// Sanitize the document before use
PdfDocument sanitized = Cleaner.SanitizeWithSvg(pdf);
sanitized.SaveAs("suspicious-document-safe.pdf");
}
Imports IronPdf
' Load the document to scan
Dim pdf As PdfDocument = PdfDocument.FromFile("suspicious-document.pdf")
' Scan using default YARA rules
Dim scanResult As CleanerScanResult = Cleaner.ScanPdf(pdf)
' Check the scan results
Dim threatsDetected As Boolean = scanResult.IsDetected
Dim riskCount As Integer = scanResult.Risks.Count
' Process identified risks
If scanResult.IsDetected Then
For Each risk In scanResult.Risks
' Handle each identified risk
Next
' Sanitize the document before use
Dim sanitized As PdfDocument = Cleaner.SanitizeWithSvg(pdf)
sanitized.SaveAs("suspicious-document-safe.pdf")
End If
You can provide custom YARA rule files for specialized detection requirements. Organizations with specific threat models or compliance needs often maintain their own rule sets targeting particular vulnerability patterns.
using IronPdf;
PdfDocument pdf = PdfDocument.FromFile("incoming-document.pdf");
// Scan with custom YARA rules
string[] customYaraFiles = { "corporate-rules.yar", "industry-specific.yar" };
CleanerScanResult result = Cleaner.ScanPdf(pdf, customYaraFiles);
if (result.IsDetected)
{
// Document triggered custom rules and requires review or sanitization
PdfDocument sanitized = Cleaner.SanitizeWithSvg(pdf);
sanitized.SaveAs("incoming-document-safe.pdf");
}
Imports IronPdf
Dim pdf As PdfDocument = PdfDocument.FromFile("incoming-document.pdf")
' Scan with custom YARA rules
Dim customYaraFiles As String() = {"corporate-rules.yar", "industry-specific.yar"}
Dim result As CleanerScanResult = Cleaner.ScanPdf(pdf, customYaraFiles)
If result.IsDetected Then
' Document triggered custom rules and requires review or sanitization
Dim sanitized As PdfDocument = Cleaner.SanitizeWithSvg(pdf)
sanitized.SaveAs("incoming-document-safe.pdf")
End If
Integrating scanning into document intake workflows helps automate security decisions:
using IronPdf;
using System;
using System.IO;
using System.Security; // SecurityException lives in this namespace
public enum DocumentSafetyLevel
{
Safe,
Suspicious,
Dangerous
}
public class DocumentSecurityGateway
{
public DocumentSafetyLevel EvaluateDocument(string filePath)
{
PdfDocument pdf = PdfDocument.FromFile(filePath);
CleanerScanResult scan = Cleaner.ScanPdf(pdf);
if (!scan.IsDetected)
{
return DocumentSafetyLevel.Safe;
}
// Evaluate severity based on number of risks
if (scan.Risks.Count > 5)
{
return DocumentSafetyLevel.Dangerous;
}
return DocumentSafetyLevel.Suspicious;
}
public PdfDocument ProcessIncomingDocument(string filePath, string outputDirectory)
{
DocumentSafetyLevel safety = EvaluateDocument(filePath);
string fileName = Path.GetFileName(filePath);
switch (safety)
{
case DocumentSafetyLevel.Safe:
return PdfDocument.FromFile(filePath);
case DocumentSafetyLevel.Suspicious:
PdfDocument suspicious = PdfDocument.FromFile(filePath);
return Cleaner.SanitizeWithSvg(suspicious);
case DocumentSafetyLevel.Dangerous:
throw new SecurityException($"Document {fileName} contains dangerous content");
default:
throw new InvalidOperationException("Unknown safety level");
}
}
}
Imports IronPdf
Imports System
Imports System.IO
Imports System.Security
Public Enum DocumentSafetyLevel
Safe
Suspicious
Dangerous
End Enum
Public Class DocumentSecurityGateway
Public Function EvaluateDocument(filePath As String) As DocumentSafetyLevel
Dim pdf As PdfDocument = PdfDocument.FromFile(filePath)
Dim scan As CleanerScanResult = Cleaner.ScanPdf(pdf)
If Not scan.IsDetected Then
Return DocumentSafetyLevel.Safe
End If
' Evaluate severity based on number of risks
If scan.Risks.Count > 5 Then
Return DocumentSafetyLevel.Dangerous
End If
Return DocumentSafetyLevel.Suspicious
End Function
Public Function ProcessIncomingDocument(filePath As String, outputDirectory As String) As PdfDocument
Dim safety As DocumentSafetyLevel = EvaluateDocument(filePath)
Dim fileName As String = Path.GetFileName(filePath)
Select Case safety
Case DocumentSafetyLevel.Safe
Return PdfDocument.FromFile(filePath)
Case DocumentSafetyLevel.Suspicious
Dim suspicious As PdfDocument = PdfDocument.FromFile(filePath)
Return Cleaner.SanitizeWithSvg(suspicious)
Case DocumentSafetyLevel.Dangerous
Throw New SecurityException($"Document {fileName} contains dangerous content")
Case Else
Throw New InvalidOperationException("Unknown safety level")
End Select
End Function
End Class
How can I Build a Complete Redaction and Sanitization Pipeline?
Production document processing typically requires combining multiple protection techniques into a cohesive workflow. A complete pipeline might scan incoming documents for threats, sanitize those that pass initial screening, apply text and region redactions, strip metadata, and produce audit logs documenting all actions taken. This example demonstrates such an integrated approach.
using IronPdf;
using IronSoftware.Drawing;
using System;
using System.Collections.Generic;
using System.IO;
using System.Security; // SecurityException lives in this namespace
using System.Text.RegularExpressions;
public class DocumentProcessingResult
{
public string OriginalFile { get; set; }
public string OutputFile { get; set; }
public bool WasSanitized { get; set; }
public int TextRedactionsApplied { get; set; }
public int RegionRedactionsApplied { get; set; }
public bool MetadataCleaned { get; set; }
public List<string> SensitiveDataTypesFound { get; set; } = new List<string>();
public DateTime ProcessedAt { get; set; }
public bool Success { get; set; }
public string ErrorMessage { get; set; }
}
public class ComprehensiveDocumentProcessor
{
// Sensitive data patterns
private readonly Dictionary<string, string> _sensitivePatterns = new Dictionary<string, string>
{
{ "SSN", @"\b\d{3}-\d{2}-\d{4}\b" },
{ "Credit Card", @"\b(?:\d{4}[-\s]?){3}\d{1,4}\b" },
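{ "Email", @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b" },
// \b sits inside the second alternative so "(555) 123-4567" also matches after a space or at the start of a line
{ "Phone", @"(?:\(\d{3}\)\s?|\b\d{3}[-.])\d{3}[-.]?\d{4}\b" }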
};
// Standard regions to redact (signature areas, photo locations)
private readonly List<RectangleF> _standardRedactionRegions = new List<RectangleF>
{
new RectangleF(72, 72, 200, 50), // Bottom left signature
new RectangleF(350, 72, 200, 50) // Bottom right signature
};
private readonly string _organizationName;
public ComprehensiveDocumentProcessor(string organizationName)
{
_organizationName = organizationName;
}
public DocumentProcessingResult ProcessDocument(
string inputPath,
string outputPath,
bool sanitize = true,
bool redactPatterns = true,
bool redactRegions = true,
bool cleanMetadata = true,
List<string> additionalTermsToRedact = null)
{
var result = new DocumentProcessingResult
{
OriginalFile = inputPath,
OutputFile = outputPath,
ProcessedAt = DateTime.Now
};
try
{
// Load the document
PdfDocument pdf = PdfDocument.FromFile(inputPath);
// Step 1: Security scan
CleanerScanResult scanResult = Cleaner.ScanPdf(pdf);
if (scanResult.IsDetected && scanResult.Risks.Count > 10)
{
throw new SecurityException("Document contains too many security risks to process");
}
// Step 2: Sanitization (if needed or requested)
if (sanitize || scanResult.IsDetected)
{
pdf = Cleaner.SanitizeWithSvg(pdf);
result.WasSanitized = true;
}
// Step 3: Pattern-based text redaction
if (redactPatterns)
{
string fullText = pdf.ExtractAllText();
HashSet<string> valuesToRedact = new HashSet<string>();
foreach (var pattern in _sensitivePatterns)
{
Regex regex = new Regex(pattern.Value, RegexOptions.IgnoreCase);
MatchCollection matches = regex.Matches(fullText);
if (matches.Count > 0)
{
result.SensitiveDataTypesFound.Add($"{pattern.Key} ({matches.Count})");
foreach (Match match in matches)
{
valuesToRedact.Add(match.Value);
}
}
}
// Apply redactions
foreach (string value in valuesToRedact)
{
pdf.RedactTextOnAllPages(value);
result.TextRedactionsApplied++;
}
}
// Step 4: Additional specific terms
if (additionalTermsToRedact != null)
{
foreach (string term in additionalTermsToRedact)
{
pdf.RedactTextOnAllPages(term);
result.TextRedactionsApplied++;
}
}
// Step 5: Region-based redaction
if (redactRegions)
{
foreach (RectangleF region in _standardRedactionRegions)
{
pdf.RedactRegionsOnAllPages(region);
result.RegionRedactionsApplied++;
}
}
// Step 6: Metadata cleaning
if (cleanMetadata)
{
pdf.MetaData.Author = _organizationName;
pdf.MetaData.Creator = $"{_organizationName} Document Processor";
pdf.MetaData.Producer = "";
pdf.MetaData.Subject = "";
pdf.MetaData.Keywords = "";
pdf.MetaData.CreationDate = DateTime.Now;
pdf.MetaData.ModifiedDate = DateTime.Now;
result.MetadataCleaned = true;
}
// Step 7: Save the processed document
pdf.SaveAs(outputPath);
result.Success = true;
}
catch (Exception ex)
{
result.Success = false;
result.ErrorMessage = ex.Message;
}
return result;
}
}
// Usage example
class Program
{
static void Main()
{
var processor = new ComprehensiveDocumentProcessor("Acme Corporation");
// Process a single document with all protections
var result = processor.ProcessDocument(
inputPath: "customer-application.pdf",
outputPath: "customer-application-redacted.pdf",
sanitize: true,
redactPatterns: true,
redactRegions: true,
cleanMetadata: true,
additionalTermsToRedact: new List<string> { "Project Alpha", "Internal Use Only" }
);
// Batch process multiple documents
string[] inputFiles = Directory.GetFiles("incoming", "*.pdf");
foreach (string file in inputFiles)
{
string outputFile = Path.Combine("processed", Path.GetFileName(file));
processor.ProcessDocument(file, outputFile);
}
}
}
Imports IronPdf
Imports IronSoftware.Drawing
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Security
Imports System.Text.RegularExpressions
Public Class DocumentProcessingResult
Public Property OriginalFile As String
Public Property OutputFile As String
Public Property WasSanitized As Boolean
Public Property TextRedactionsApplied As Integer
Public Property RegionRedactionsApplied As Integer
Public Property MetadataCleaned As Boolean
Public Property SensitiveDataTypesFound As List(Of String) = New List(Of String)()
Public Property ProcessedAt As DateTime
Public Property Success As Boolean
Public Property ErrorMessage As String
End Class
Public Class ComprehensiveDocumentProcessor
' Sensitive data patterns
Private ReadOnly _sensitivePatterns As Dictionary(Of String, String) = New Dictionary(Of String, String) From {
{"SSN", "\b\d{3}-\d{2}-\d{4}\b"},
{"Credit Card", "\b(?:\d{4}[-\s]?){3}\d{1,4}\b"},
{"Email", "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"},
{"Phone", "(?:\(\d{3}\)\s?|\b\d{3}[-.])\d{3}[-.]?\d{4}\b"}
}
' Standard regions to redact (signature areas, photo locations)
Private ReadOnly _standardRedactionRegions As List(Of RectangleF) = New List(Of RectangleF) From {
New RectangleF(72, 72, 200, 50), ' Bottom left signature
New RectangleF(350, 72, 200, 50) ' Bottom right signature
}
Private ReadOnly _organizationName As String
Public Sub New(organizationName As String)
_organizationName = organizationName
End Sub
Public Function ProcessDocument(
inputPath As String,
outputPath As String,
Optional sanitize As Boolean = True,
Optional redactPatterns As Boolean = True,
Optional redactRegions As Boolean = True,
Optional cleanMetadata As Boolean = True,
Optional additionalTermsToRedact As List(Of String) = Nothing) As DocumentProcessingResult
Dim result As New DocumentProcessingResult With {
.OriginalFile = inputPath,
.OutputFile = outputPath,
.ProcessedAt = DateTime.Now
}
Try
' Load the document
Dim pdf As PdfDocument = PdfDocument.FromFile(inputPath)
' Step 1: Security scan
Dim scanResult As CleanerScanResult = Cleaner.ScanPdf(pdf)
If scanResult.IsDetected AndAlso scanResult.Risks.Count > 10 Then
Throw New SecurityException("Document contains too many security risks to process")
End If
' Step 2: Sanitization (if needed or requested)
If sanitize OrElse scanResult.IsDetected Then
pdf = Cleaner.SanitizeWithSvg(pdf)
result.WasSanitized = True
End If
' Step 3: Pattern-based text redaction
If redactPatterns Then
Dim fullText As String = pdf.ExtractAllText()
Dim valuesToRedact As New HashSet(Of String)()
For Each pattern In _sensitivePatterns
Dim regex As New Regex(pattern.Value, RegexOptions.IgnoreCase)
Dim matches As MatchCollection = regex.Matches(fullText)
If matches.Count > 0 Then
result.SensitiveDataTypesFound.Add($"{pattern.Key} ({matches.Count})")
For Each match As Match In matches
valuesToRedact.Add(match.Value)
Next
End If
Next
' Apply redactions
For Each value As String In valuesToRedact
pdf.RedactTextOnAllPages(value)
result.TextRedactionsApplied += 1
Next
End If
' Step 4: Additional specific terms
If additionalTermsToRedact IsNot Nothing Then
For Each term As String In additionalTermsToRedact
pdf.RedactTextOnAllPages(term)
result.TextRedactionsApplied += 1
Next
End If
' Step 5: Region-based redaction
If redactRegions Then
For Each region As RectangleF In _standardRedactionRegions
pdf.RedactRegionsOnAllPages(region)
result.RegionRedactionsApplied += 1
Next
End If
' Step 6: Metadata cleaning
If cleanMetadata Then
pdf.MetaData.Author = _organizationName
pdf.MetaData.Creator = $"{_organizationName} Document Processor"
pdf.MetaData.Producer = ""
pdf.MetaData.Subject = ""
pdf.MetaData.Keywords = ""
pdf.MetaData.CreationDate = DateTime.Now
pdf.MetaData.ModifiedDate = DateTime.Now
result.MetadataCleaned = True
End If
' Step 7: Save the processed document
pdf.SaveAs(outputPath)
result.Success = True
Catch ex As Exception
result.Success = False
result.ErrorMessage = ex.Message
End Try
Return result
End Function
End Class
' Usage example
Class Program
Shared Sub Main()
Dim processor As New ComprehensiveDocumentProcessor("Acme Corporation")
' Process a single document with all protections
Dim result = processor.ProcessDocument(
inputPath:="customer-application.pdf",
outputPath:="customer-application-redacted.pdf",
sanitize:=True,
redactPatterns:=True,
redactRegions:=True,
cleanMetadata:=True,
additionalTermsToRedact:=New List(Of String) From {"Project Alpha", "Internal Use Only"}
)
' Batch process multiple documents
Dim inputFiles As String() = Directory.GetFiles("incoming", "*.pdf")
For Each file As String In inputFiles
Dim outputFile As String = Path.Combine("processed", Path.GetFileName(file))
processor.ProcessDocument(file, outputFile)
Next
End Sub
End Class
Input
A customer application form containing multiple types of sensitive data including SSNs, credit card numbers, email addresses, and signature blocks requiring comprehensive protection.
Sample Output
This comprehensive processor combines all the techniques covered in this guide into a single, configurable class. It scans for threats, sanitizes when necessary, finds and redacts sensitive patterns, applies region redactions, cleans metadata, and returns a DocumentProcessingResult that records what was done to each file. You can adjust the sensitive-data patterns, redaction regions, and processing options to match your specific requirements.
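Before changing the patterns, it can help to check them against sample strings without involving a PDF at all. The following is a minimal sketch under stated assumptions: `PatternCheck` and `MatchingTypes` are hypothetical names, not part of IronPDF, and the expressions mirror the processor's patterns as a starting point that you should validate against your own data.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not an IronPDF type): reports which named patterns
// match a sample string, so pattern changes can be verified in isolation.
public static class PatternCheck
{
    // Assumed patterns. The e-mail TLD class is [A-Za-z], and the phone
    // pattern keeps \b inside the second alternative so that a leading
    // "(" can match at the start of a line or after a space.
    private static readonly (string Name, string Expression)[] Patterns =
    {
        ("SSN", @"\b\d{3}-\d{2}-\d{4}\b"),
        ("Credit Card", @"\b(?:\d{4}[-\s]?){3}\d{1,4}\b"),
        ("Email", @"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
        ("Phone", @"(?:\(\d{3}\)\s?|\b\d{3}[-.])\d{3}[-.]?\d{4}\b")
    };

    // Returns the names of all patterns that match the sample, in declaration order.
    public static List<string> MatchingTypes(string sample)
    {
        var found = new List<string>();
        foreach (var (name, expression) in Patterns)
        {
            if (Regex.IsMatch(sample, expression, RegexOptions.IgnoreCase))
            {
                found.Add(name);
            }
        }
        return found;
    }
}
```

For instance, `PatternCheck.MatchingTypes("SSN: 123-45-6789")` should report only the SSN pattern, while a string such as `"Invoice 2024-001"` should report none. Keeping a small set of positive and negative samples like these makes it easier to spot false positives before a pattern change reaches production documents.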
Next Steps
Protecting sensitive information in PDF documents requires more than superficial measures. True redaction eliminates content from the document structure permanently. Pattern matching automates the discovery and removal of data like Social Security numbers, credit card details, and email addresses. Region-based redaction handles signatures, photos, and other graphical elements that text matching cannot address. Metadata cleaning eliminates hidden information that could reveal authors, timestamps, or internal file paths. Sanitization strips embedded scripts and active content that pose security risks.
IronPDF delivers all these capabilities through a consistent, well-designed API that integrates naturally with C# and .NET development practices. The methods demonstrated in this guide handle single documents or scale to batch processing thousands of files. Whether you are building compliance workflows for healthcare data, preparing legal documents for discovery, or simply ensuring that internal reports can be safely shared externally, these techniques form the foundation of responsible document handling. For comprehensive security coverage, combine redaction with password protection, permissions, and digital signatures.
The patterns shown here serve as starting points. Real implementations will need adaptation for your specific document formats, compliance requirements, and security policies. Consider adding comprehensive logging for audit trails, implementing approval workflows for redaction decisions, and building dashboards to monitor processing metrics across your document pipeline.
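One way to start on the audit trail suggested above is to append one JSON object per processed document to a log file. This is only a sketch: `AuditEntry` and `AuditLog` are hypothetical types, not part of IronPDF, the field names are illustrative, and it assumes a runtime where `System.Text.Json` is available (.NET Core 3.0 or later).

```csharp
using System;
using System.IO;
using System.Text.Json;

// Hypothetical audit record. It stores counts and file names only,
// never the redacted values themselves.
public class AuditEntry
{
    public string InputFile { get; set; }
    public string OutputFile { get; set; }
    public int TextRedactions { get; set; }
    public int RegionRedactions { get; set; }
    public bool Success { get; set; }
    public DateTime ProcessedAtUtc { get; set; }
}

public static class AuditLog
{
    // Serializes one entry as a single-line JSON object.
    public static string ToJsonLine(AuditEntry entry)
    {
        return JsonSerializer.Serialize(entry);
    }

    // Appends the entry to the log file, one JSON object per line.
    public static void Append(string logPath, AuditEntry entry)
    {
        File.AppendAllText(logPath, ToJsonLine(entry) + Environment.NewLine);
    }
}
```

An entry could be populated from the fields of a processing result after each document is handled. Logging counts instead of matched text is a deliberate choice here, since writing the redacted values to a log would recreate the exposure the redaction was meant to remove.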
Ready to start building? Download a free trial of IronPDF and explore the complete documentation and additional code examples at ironpdf.com. The library includes a free development license, so you can fully evaluate its capabilities before committing to a production license. For questions about implementation or licensing, the Iron Software engineering team provides responsive support through live chat and email.
Related Tutorials and How-Tos
- Complete PDF Security Tutorial - Comprehensive guide to signing, encrypting, and securing PDFs
- Redact Text and Regions How-To - Quick reference for redaction methods
- Extract Text and Images - Guide to text extraction for pattern matching
- Set and Edit Metadata - Complete metadata management reference
- Sanitize PDF - Additional sanitization options and examples
- PDF Passwords and Permissions - Protect documents with encryption
- Digital Signatures - Add legally binding signatures to documents
- Async and Multithreading - Optimize batch processing performance
Frequently Asked Questions
What is PDF redaction?
PDF redaction is the process of permanently removing sensitive information from a PDF document. This can include text, images, and metadata that need to be hidden for privacy or compliance reasons.
How can I redact information in a PDF using C#?
You can use IronPDF to redact information in a PDF using C#. It allows you to permanently remove or hide text, images, and metadata in PDF documents, ensuring they meet privacy and compliance standards.
Why is PDF redaction important for compliance?
PDF redaction is crucial for compliance with standards like HIPAA, GDPR, and PCI DSS, as it helps in securing sensitive data and preventing unauthorized access to confidential information.
Can IronPDF redact entire regions of a PDF?
Yes, IronPDF can redact entire regions of a PDF. This allows you to define specific areas within a document that need to be hidden or removed for security purposes.
What types of data can be redacted using IronPDF?
IronPDF can redact various types of data including text, images, and metadata from PDF documents, ensuring comprehensive data privacy and security.
Does IronPDF support sanitizing documents?
Yes, IronPDF supports sanitizing documents, which strips embedded scripts, active content, and other hidden data that may not be visible but could still pose a security or privacy risk.
Is it possible to automate PDF redaction with IronPDF?
Yes, IronPDF allows for the automation of PDF redaction processes in C#, making it easier to handle large volumes of documents that require sensitive data removal.
How does IronPDF ensure the permanency of redaction?
IronPDF ensures the permanency of redaction by permanently removing the selected text and images from the document, rather than merely obscuring them, which means they cannot be recovered or viewed.
Can IronPDF redact metadata in a PDF?
Yes, IronPDF can redact metadata in a PDF document, ensuring that all forms of sensitive data, including hidden or background data, are thoroughly removed.
What are the benefits of using IronPDF for PDF redaction?
Using IronPDF for PDF redaction offers benefits such as ensuring compliance with data protection regulations, enhancing document security, and providing an efficient, automated process for managing sensitive information.






