How to convert PDF to HTML

by Hairil Hasyimi Bin Omar

Converting PDF to HTML offers various benefits, including enhanced web accessibility for users, responsiveness for different devices, improved search engine optimization (SEO), seamless web integration, easy content editing through web-based tools and CMS, cross-platform compatibility, and the ability to utilize dynamic elements and multimedia.

IronPdf simplifies the process of converting PDF to HTML in .NET C#.


Install with NuGet

Install-Package IronPdf

or download the DLL and manually install it into your project.

PDF to HTML example

The ToHtmlString method converts an existing PDF document to an HTML string, mainly so that its content can be inspected as HTML elements. This is useful for debugging or for comparing PDFs. To write the result straight to disk instead, the SaveAsHtml method saves a PDF document as an HTML file. Choose whichever approach suits your needs.

Please note
Interactive form fields in the original PDF will no longer be functional in the resulting HTML document.

Sample PDF file

:path=/static-assets/pdf/content-code-examples/how-to/pdf-to-html.cs
using IronPdf;
using System;

PdfDocument pdf = PdfDocument.FromFile("sample.pdf");

// Convert PDF to HTML string
string html = pdf.ToHtmlString();
Console.WriteLine(html);

// Convert PDF to HTML file
pdf.SaveAsHtml("myHtml.html");
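Because ToHtmlString returns a plain string, the PDF comparison use case mentioned above can be approximated with ordinary string handling. The sketch below is only an illustration: htmlA and htmlB are hypothetical stand-ins for the strings returned by calling ToHtmlString on two PDF documents, and it assumes the output contains line breaks, which real output may not.

```csharp
using System;

// Hypothetical stand-ins for pdfA.ToHtmlString() and pdfB.ToHtmlString();
// real output will be much longer.
string htmlA = "<svg>\n<text>Invoice 001</text>\n<text>Total: 10</text>\n</svg>";
string htmlB = "<svg>\n<text>Invoice 001</text>\n<text>Total: 12</text>\n</svg>";

string[] linesA = htmlA.Split('\n');
string[] linesB = htmlB.Split('\n');

// Walk both outputs in parallel and report every line that differs
int max = Math.Max(linesA.Length, linesB.Length);
int differences = 0;
for (int i = 0; i < max; i++)
{
    string a = i < linesA.Length ? linesA[i] : "(missing)";
    string b = i < linesB.Length ? linesB[i] : "(missing)";
    if (a != b)
    {
        differences++;
        Console.WriteLine($"Line {i + 1}: \"{a}\" vs \"{b}\"");
    }
}
Console.WriteLine($"{differences} differing line(s)");
```

With the sample strings above, only line 3 is reported as different. A naive line-by-line walk like this is enough for spotting small changes; for larger documents a dedicated diff tool would likely be a better fit.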

Output HTML

The entire output HTML generated by the SaveAsHtml method is embedded in the page below.


PDF to HTML Advanced example

Both the ToHtmlString and SaveAsHtml methods offer various configuration options. Below are the available properties:

  • BackgroundColor: Specifies the background color.
  • PdfPageMargin: Specifies the page margin.

Additionally, the properties below style the title supplied through the title parameter of the ToHtmlString and SaveAsHtml methods. They do not modify any title content in the input PDF document:

  • H1Color: Specifies the title color.
  • H1FontSize: Specifies the title font size.
  • H1TextAlignment: Specifies the title alignment, such as left, center, or right.
:path=/static-assets/pdf/content-code-examples/how-to/pdf-to-html-advanced-settings.cs
using IronPdf;
using IronSoftware.Drawing;
using System;

PdfDocument pdf = PdfDocument.FromFile("sample.pdf");

// PDF to HTML configuration options
HtmlFormatOptions htmlformat = new HtmlFormatOptions();
htmlformat.BackgroundColor = Color.White;
htmlformat.PdfPageMargin = 10;
htmlformat.H1Color = Color.Blue;
htmlformat.H1FontSize = 25;
htmlformat.H1TextAlignment = TextAlignment.Center;

// Convert PDF to HTML string (no options are passed here, so defaults apply)
string html = pdf.ToHtmlString();
Console.WriteLine(html);

// Convert PDF to HTML file with the title "Hello World" and the options above
pdf.SaveAsHtml("myHtmlConfigured.html", true, "Hello World", htmlFormatOptions: htmlformat);

Output HTML

The entire output HTML generated by the SaveAsHtml method is embedded in the page below.

These methods produce an HTML string with inline CSS. The output uses SVG tags instead of the usual HTML tags. It is still a valid HTML string and renders the same way in a web browser. Be aware, however, that for a PDF document originally created with the RenderHtmlAsPdf method, the returned HTML string may differ from the HTML that was originally input, because the content is re-expressed as SVG with inline CSS.
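To see the SVG-based structure in your own output, one option is to count tags in the returned string. The following is a minimal sketch that does not call IronPdf at all: html is a hypothetical stand-in for the string returned by ToHtmlString, and the tag names counted are chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the string returned by pdf.ToHtmlString();
// real output will be much longer.
string html = "<html><body><svg><text>Hello</text></svg><svg><text>World</text></svg></body></html>";

// Count opening <svg> tags
int svgCount = Regex.Matches(html, @"<svg\b", RegexOptions.IgnoreCase).Count;

// Count conventional paragraph tags for comparison
// (\b prevents a match on other tag names that start with "p", such as <path>)
int pCount = Regex.Matches(html, @"<p\b", RegexOptions.IgnoreCase).Count;

Console.WriteLine($"svg elements: {svgCount}, p elements: {pCount}");
```

For the sample string this prints "svg elements: 2, p elements: 0". A regular expression is adequate for a quick count like this; for anything structural, an HTML parser would be the more robust choice.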