Sanitize and Scan Untrusted PDFs in .NET
A user uploads a PDF to your portal, and you have no idea what is inside it. IronPDF sanitizes that file by converting it to an image and back, which strips embedded JavaScript, active objects, buttons, and metadata and leaves a clean, rebuilt document. The Cleaner class exposes SanitizeWithBitmap and SanitizeWithSvg for cleaning, and ScanPdf for detecting threats. One trade-off governs the choice between the two sanitizers: bitmap is more thorough but loses text searchability, while SVG stays searchable but its layout may drift. You are trading fidelity for safety, and deciding how much.
Defanging PDFs from untrusted sources
Any app accepting uploaded PDFs (a portal, an applicant tracker, a claims system) can clean incoming files so embedded scripts and active content never reach a viewer:
using IronPdf;
var pdf = PdfDocument.FromFile("upload.pdf");
Cleaner.SanitizeWithSvg(pdf).SaveAs("upload-clean.pdf");
Imports IronPdf
Dim pdf = PdfDocument.FromFile("upload.pdf")
Cleaner.SanitizeWithSvg(pdf).SaveAs("upload-clean.pdf")
Scanning before you process
ScanPdf checks a file against a default YARA ruleset (or a custom one) and reports whether anything dangerous was found, so a pipeline can quarantine or reject before storing or opening:
using IronPdf;
var pdf = PdfDocument.FromFile("upload.pdf");
CleanerScanResult result = Cleaner.ScanPdf(pdf);
if (result.IsDetected)
{
    // quarantine: result.Risks holds the detections
}
Imports IronPdf
Dim pdf = PdfDocument.FromFile("upload.pdf")
Dim result As CleanerScanResult = Cleaner.ScanPdf(pdf)
If result.IsDetected Then
    ' quarantine: result.Risks holds the detections
End If
ScanPdf detects embedded JavaScript exploits, suspicious form actions, hidden content, known vulnerability patterns, and unauthorized embedded files. Note that ScanPdf only detects; cleaning is done by the SanitizeWith methods.
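As a rough sketch of how a scan result might feed a log line and a pass/fail decision, the wrapper below is one possibility. UploadScanner, Summarize, and IsSafe are hypothetical names, not part of IronPDF. Two details are assumptions to verify against the IronPDF API reference for your version: that ScanPdf accepts an array of YARA rule file paths as a second argument, and that Risks exposes a Count.

```csharp
using System;
using IronPdf;

public static class UploadScanner
{
    // Hypothetical helper: formats one log line per scanned file.
    public static string Summarize(string fileName, bool detected, int riskCount) =>
        detected
            ? $"{fileName}: {riskCount} risk(s) detected, quarantining"
            : $"{fileName}: no risks detected";

    // Returns true when the scan reports nothing.
    // customRules: optional paths to custom YARA rule files (assumed overload).
    public static bool IsSafe(string path, string[] customRules = null)
    {
        var pdf = PdfDocument.FromFile(path);

        // Default ruleset when no custom rules are supplied.
        CleanerScanResult result = customRules == null
            ? Cleaner.ScanPdf(pdf)
            : Cleaner.ScanPdf(pdf, customRules); // assumption: overload taking rule file paths

        // Assumption: Risks is a collection with a Count property.
        Console.WriteLine(Summarize(path, result.IsDetected, result.Risks.Count));
        return !result.IsDetected;
    }
}
```

A caller would typically reject or quarantine the upload when IsSafe returns false, and still sanitize the file when it returns true, since a clean scan only means no rule matched.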
Hardening documents for cloud deployment
Stripping active content before a file enters a serverless or shared environment (Azure, AWS Lambda) reduces the attack surface there. Sanitize on the way in, not after the file already runs somewhere sensitive.
Stripping metadata before sharing
Because the document is rebuilt from an image, hidden author names, revision traces, and other metadata do not survive. This is useful before publishing a file or sending it beyond the organization.
Producing a clean baseline to sign or archive
Sanitizing before a digital signature or PDF/A archival gives you a script-free starting point. You rarely want to attest to, or preserve for decades, a document carrying active content.
Keeping text when you need it
When downstream steps need the words (search, indexing, accessibility, extraction), use SVG: it removes the active layer while keeping the PDF searchable, accepting possible layout drift.
What sanitization will not do
Sanitization is not redaction. It removes hidden and active threats, not visible information. Names, account numbers, and anything else a human can read survive in full. If you need to remove sensitive content, sanitization is the wrong tool; use redaction.
And bitmap output is an image. It is unsearchable and inaccessible: a screen reader gets nothing from it, which conflicts with accessibility goals like PDF/UA. Reach for bitmap only when maximum security outweighs searchability, and use SVG when text must survive.
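The bitmap-or-SVG decision can be made explicit in code. The following is a minimal sketch: SanitizeMode, UploadSanitizer, and ChooseMode are hypothetical names introduced for illustration, and only the two Cleaner.SanitizeWith calls come from IronPDF.

```csharp
using IronPdf;

public enum SanitizeMode { Bitmap, Svg }

public static class UploadSanitizer
{
    // Hypothetical policy: SVG when downstream steps need the text
    // (search, indexing, accessibility), bitmap otherwise.
    public static SanitizeMode ChooseMode(bool textMustSurvive) =>
        textMustSurvive ? SanitizeMode.Svg : SanitizeMode.Bitmap;

    // Applies the chosen sanitizer and returns the rebuilt document.
    public static PdfDocument Sanitize(PdfDocument pdf, SanitizeMode mode) =>
        mode == SanitizeMode.Svg
            ? Cleaner.SanitizeWithSvg(pdf)
            : Cleaner.SanitizeWithBitmap(pdf);
}
```

For example, UploadSanitizer.Sanitize(PdfDocument.FromFile("upload.pdf"), UploadSanitizer.ChooseMode(true)).SaveAs("upload-clean.pdf") would keep the text searchable. Defaulting to bitmap when the caller has no text requirement keeps the stricter option as the fallback.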
Treat it as a security step
Sanitization belongs to your security layer, not your formatting one. Run it on anything you did not generate yourself, scan first, keep the original, and pick bitmap or SVG by whether the text needs to survive.
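Put together, the advice above might look like the sketch below: scan, quarantine on a hit, otherwise sanitize to a new file so the original is never overwritten. UploadPipeline, CleanPathFor, Process, and the quarantine directory are hypothetical; the IronPDF calls are the ones shown earlier in this article, plus SanitizeWithBitmap.

```csharp
using System.IO;
using IronPdf;

public static class UploadPipeline
{
    // Hypothetical naming rule: "upload.pdf" -> "upload-clean.pdf" in the same folder.
    public static string CleanPathFor(string path) =>
        Path.Combine(
            Path.GetDirectoryName(path) ?? "",
            Path.GetFileNameWithoutExtension(path) + "-clean.pdf");

    // Returns true when a sanitized copy was written, false when the file was quarantined.
    public static bool Process(string uploadPath, string quarantineDir, bool textMustSurvive)
    {
        var pdf = PdfDocument.FromFile(uploadPath);

        // Scan first: a detection sends a copy to quarantine and stops here.
        CleanerScanResult result = Cleaner.ScanPdf(pdf);
        if (result.IsDetected)
        {
            Directory.CreateDirectory(quarantineDir);
            File.Copy(uploadPath,
                      Path.Combine(quarantineDir, Path.GetFileName(uploadPath)),
                      overwrite: true);
            return false;
        }

        // Sanitize even when the scan is clean; choose the method by whether text must survive.
        PdfDocument clean = textMustSurvive
            ? Cleaner.SanitizeWithSvg(pdf)
            : Cleaner.SanitizeWithBitmap(pdf);

        // Write beside the original rather than over it, so the original is kept.
        clean.SaveAs(CleanPathFor(uploadPath));
        return true;
    }
}
```

Whether a quarantined file should also be removed from the upload folder depends on your retention policy; this sketch only copies it.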
Frequently Asked Questions
How does IronPDF sanitize an uploaded PDF?
IronPDF's Cleaner class converts the PDF to an image and rebuilds it as a clean document, stripping embedded JavaScript, active objects, form buttons, and metadata. Use Cleaner.SanitizeWithBitmap for maximum thoroughness or Cleaner.SanitizeWithSvg to retain text searchability.
What is the difference between SanitizeWithBitmap and SanitizeWithSvg?
SanitizeWithBitmap renders the PDF to a pixel image and rebuilds from it. It is the most thorough option, but it produces an unsearchable, inaccessible document. SanitizeWithSvg removes the active content layer while keeping text searchable, but layout may drift slightly.
Can IronPDF scan a PDF for threats before I process it?
Yes. Cleaner.ScanPdf checks the file against a default YARA ruleset (or a custom one) and returns a CleanerScanResult indicating whether any threats were detected and what they are. Scanning only detects — call a SanitizeWith method to clean the file.
What threats does ScanPdf detect?
ScanPdf detects embedded JavaScript exploits, suspicious form actions, hidden content, known vulnerability patterns, and unauthorized embedded files.
Does sanitization remove sensitive visible information like names or account numbers?
No. Sanitization removes hidden and active threats, not visible content. Names, account numbers, and all human-readable text survive. To remove visible sensitive information, use redaction — sanitization is the wrong tool for that.

