
Sanitize and Scan Untrusted PDFs in .NET

A user uploads a PDF to your portal, and you have no idea what is inside it. IronPDF sanitizes that file by converting it to an image and back: it strips embedded JavaScript, active objects, buttons, and metadata, then rebuilds a clean document. The Cleaner class exposes SanitizeWithBitmap and SanitizeWithSvg to clean a file and ScanPdf to detect threats. One trade-off governs the choice between the two sanitizers: bitmap is more thorough but loses text searchability, while SVG keeps text searchable but the layout may drift. You are trading fidelity for safety and deciding how much.

Defanging PDFs from untrusted sources

Any app accepting uploaded PDFs (a portal, an applicant tracker, a claims system) can clean incoming files so embedded scripts and active content never reach a viewer:

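```csharp
using IronPdf;

// Load the untrusted upload, rebuild it without active content, and save the clean copy
var pdf = PdfDocument.FromFile("upload.pdf");
Cleaner.SanitizeWithSvg(pdf).SaveAs("upload-clean.pdf");
```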
```vb
Imports IronPdf

' Load the untrusted upload, rebuild it without active content, and save the clean copy
Dim pdf = PdfDocument.FromFile("upload.pdf")
Cleaner.SanitizeWithSvg(pdf).SaveAs("upload-clean.pdf")
```

Scanning before you process

ScanPdf checks a file against a default YARA ruleset (or a custom one) and reports whether anything dangerous was found, so a pipeline can quarantine or reject the file before storing or opening it:

```csharp
using IronPdf;

var pdf = PdfDocument.FromFile("upload.pdf");

// Check the document against the default YARA ruleset
CleanerScanResult result = Cleaner.ScanPdf(pdf);

if (result.IsDetected)
{
    // quarantine: result.Risks holds the detections
}
```
```vb
Imports IronPdf

Dim pdf = PdfDocument.FromFile("upload.pdf")

' Check the document against the default YARA ruleset
Dim result As CleanerScanResult = Cleaner.ScanPdf(pdf)

If result.IsDetected Then
    ' quarantine: result.Risks holds the detections
End If
```

ScanPdf flags embedded JavaScript exploits, suspicious form actions, hidden content, known vulnerability patterns, and unauthorized embedded files. Note that ScanPdf only detects threats and does not modify the file; cleaning is done by SanitizeWithBitmap or SanitizeWithSvg.
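To run your own rules and record what was found, something like the following sketch should work. It assumes the ScanPdf overload that accepts an array of YARA rule file paths (check the API reference for your IronPDF version), and the two .yar file names are hypothetical placeholders for your own rule files.

```csharp
using System;
using IronPdf;

var pdf = PdfDocument.FromFile("upload.pdf");

// Assumption: the ScanPdf overload that takes an array of YARA rule file paths.
// The .yar file names are hypothetical placeholders for your own rules.
var result = Cleaner.ScanPdf(pdf, new[] { "rules/pdf-javascript.yar", "rules/embedded-files.yar" });

if (result.IsDetected)
{
    // Risks holds the detections; log each one before quarantining the file
    foreach (var risk in result.Risks)
    {
        Console.WriteLine($"Detected: {risk}");
    }
}
else
{
    Console.WriteLine("No detections from the supplied rules.");
}
```

Logging each entry in Risks gives whoever reviews the quarantine folder a reason for every rejected file.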

Hardening documents for cloud deployment

Stripping active content before a file enters a serverless or shared environment (Azure, AWS Lambda) reduces the attack surface there. Sanitize on the way in, not after the file has already been opened somewhere sensitive.

Stripping metadata before sharing

Because the document is rebuilt from an image, hidden author names, revision traces, and other metadata do not survive. That makes sanitization useful before publishing a file or sending it outside the organization.
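If you would rather confirm this on your own files than take it on trust, a quick before-and-after comparison is enough. This sketch assumes IronPDF's PdfDocument.MetaData property with its Author, Title, and Keywords fields; it simply prints the values so you can see what the rebuilt document retains.

```csharp
using System;
using IronPdf;

var original = PdfDocument.FromFile("upload.pdf");
var clean = Cleaner.SanitizeWithBitmap(original);

// Print a few standard fields side by side; empty quotes mean the field is unset
Console.WriteLine($"Author:   '{original.MetaData.Author}' -> '{clean.MetaData.Author}'");
Console.WriteLine($"Title:    '{original.MetaData.Title}' -> '{clean.MetaData.Title}'");
Console.WriteLine($"Keywords: '{original.MetaData.Keywords}' -> '{clean.MetaData.Keywords}'");

clean.SaveAs("upload-clean.pdf");
```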

Producing a clean baseline to sign or archive

Sanitizing before you apply a digital signature or archive to PDF/A gives you a script-free starting point. You rarely want to attest to, or preserve for decades, a document carrying active content.
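A minimal sketch of the archival path: sanitize first, then write the clean document as PDF/A. It assumes the SaveAsPdfA method and the PdfAVersions.PdfA3b value from IronPDF's PDF/A support, and it uses SVG so the archived copy keeps its text; the output file name is illustrative.

```csharp
using IronPdf;

var pdf = PdfDocument.FromFile("upload.pdf");

// Sanitize first so the archived copy starts script-free; SVG keeps the text layer
var clean = Cleaner.SanitizeWithSvg(pdf);

// Assumption: SaveAsPdfA(path, PdfAVersions) as provided by IronPDF's PDF/A support
clean.SaveAsPdfA("upload-archive.pdf", PdfAVersions.PdfA3b);
```

Signing fits at the same point in the sequence: after sanitization, never before it.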

Keeping text when you need it

When downstream steps need the words (search, indexing, accessibility, extraction), use SVG: it removes the active layer while keeping the PDF searchable, at the cost of possible layout drift.
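The difference is easy to observe by extracting text from both outputs. This sketch assumes PdfDocument.ExtractAllText(); the exact character counts depend on the file, but the SVG result should carry the text layer while the bitmap result is expected to yield little or none.

```csharp
using System;
using IronPdf;

var original = PdfDocument.FromFile("upload.pdf");

// SVG route: active content removed, text layer kept for search and extraction
var svgClean = Cleaner.SanitizeWithSvg(original);
string svgText = svgClean.ExtractAllText();

// Bitmap route, for comparison: pages become pixels, so little or no text is expected
var bitmapClean = Cleaner.SanitizeWithBitmap(original);
string bitmapText = bitmapClean.ExtractAllText();

Console.WriteLine($"Characters extracted after SVG sanitization:    {svgText.Length}");
Console.WriteLine($"Characters extracted after bitmap sanitization: {bitmapText.Length}");

// Keep the searchable version for indexing and accessibility
svgClean.SaveAs("upload-clean-searchable.pdf");
```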

What sanitization will not do

Sanitization is not redaction. It removes hidden and active threats, not visible information: names, account numbers, and anything a human can read survive in full. If you need to remove sensitive content, sanitization is the wrong tool; use redaction.

And bitmap output is an image. It is unsearchable and inaccessible: a screen reader gets nothing from it, which conflicts with accessibility goals like PDF/UA. Reach for bitmap only when maximum security outweighs searchability, and use SVG when text must survive.

Treat it as a security step

Sanitization belongs to your security layer, not your formatting one. Run it on anything you did not generate yourself, scan first, keep the original, and choose bitmap or SVG based on whether the text needs to survive.
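Putting those rules together, an ingestion step might look like the sketch below. The folder layout, the needsText flag, and the string return values are hypothetical choices for illustration, not an IronPDF convention; the only sanitization calls are ScanPdf, SanitizeWithSvg, and SanitizeWithBitmap.

```csharp
using System;
using System.IO;
using IronPdf;

// Hypothetical ingestion step: keep the original, scan, then sanitize.
// Folder names, the needsText flag, and the return strings are illustrative assumptions.
string Ingest(string uploadPath, bool needsText)
{
    foreach (var dir in new[] { "originals", "quarantine", "clean" })
    {
        Directory.CreateDirectory(dir);
    }

    string name = Path.GetFileName(uploadPath);

    // Keep an untouched copy of exactly what the user sent
    File.Copy(uploadPath, Path.Combine("originals", name), overwrite: true);

    using var pdf = PdfDocument.FromFile(uploadPath);

    // Scan first: a detection stops processing before anything is rebuilt
    var scan = Cleaner.ScanPdf(pdf);
    if (scan.IsDetected)
    {
        // Copy the flagged file into quarantine instead of sanitizing it
        File.Copy(uploadPath, Path.Combine("quarantine", name), overwrite: true);
        return "quarantined";
    }

    // SVG keeps the text searchable; bitmap is the more thorough, image-only option
    using var clean = needsText ? Cleaner.SanitizeWithSvg(pdf) : Cleaner.SanitizeWithBitmap(pdf);
    clean.SaveAs(Path.Combine("clean", name));
    return "clean";
}

Console.WriteLine(Ingest("upload.pdf", needsText: true));
```

Keeping the original alongside the clean copy means a file can be reprocessed with the other sanitizer later if the rebuilt version turns out to be unusable.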

Frequently Asked Questions

How does IronPDF sanitize an uploaded PDF?

IronPDF's Cleaner class converts the PDF to an image and rebuilds it as a clean document, stripping embedded JavaScript, active objects, form buttons, and metadata. Use Cleaner.SanitizeWithBitmap for maximum thoroughness or Cleaner.SanitizeWithSvg to retain text searchability.

What is the difference between SanitizeWithBitmap and SanitizeWithSvg?

SanitizeWithBitmap renders the PDF to a pixel image and rebuilds the document from it. This is the most thorough option, but it produces an unsearchable, inaccessible document. SanitizeWithSvg removes the active content layer while keeping text searchable, though the layout may drift slightly.

Can IronPDF scan a PDF for threats before I process it?

Yes. Cleaner.ScanPdf checks the file against a default YARA ruleset (or a custom one) and returns a CleanerScanResult indicating whether any threats were detected and what they are. Scanning only detects — call a SanitizeWith method to clean the file.

What threats does ScanPdf detect?

ScanPdf detects embedded JavaScript exploits, suspicious form actions, hidden content, known vulnerability patterns, and unauthorized embedded files.

Does sanitization remove sensitive visible information like names or account numbers?

No. Sanitization removes hidden and active threats, not visible content. Names, account numbers, and all human-readable text survive. To remove visible sensitive information, use redaction — sanitization is the wrong tool for that.

Curtis Chau
Technical Writer

Curtis Chau holds a Bachelor’s degree in Computer Science (Carleton University) and specializes in front-end development with expertise in Node.js, TypeScript, JavaScript, and React. Passionate about crafting intuitive and aesthetically pleasing user interfaces, Curtis enjoys working with modern frameworks and creating well-structured, visually appealing manuals.

