How to Convert PDF to HTML

Converting PDF to HTML offers several benefits: better web accessibility, responsive layout across devices, improved search engine optimization (SEO), straightforward web integration, easier content editing through web-based tools and content management systems (CMS), cross-platform compatibility, and support for dynamic elements and multimedia.

IronPDF simplifies converting PDF to HTML in .NET with C#.

Go from PDF to HTML in one line of code:

IronPdf.PdfDocument
       .FromFile("example.pdf")
       .SaveAsHtml("output.html");
Install with NuGet

PM >  Install-Package IronPdf



PDF to HTML Example

The ToHtmlString method converts a PDF document to an HTML string. It is primarily intended for inspecting the HTML elements of an existing PDF, which makes it useful for debugging and for comparing PDFs. To write the result straight to disk instead, use the SaveAsHtml method, which saves the PDF document as an HTML file. Choose whichever method suits your workflow.

Please note: All interactive form fields in the original PDF will no longer be functional in the resulting HTML document.
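
The comparison use case mentioned above can be sketched as follows. This is a minimal illustration rather than an official IronPDF sample: it relies only on the FromFile and ToHtmlString calls shown in this article, the file names report-v1.pdf and report-v2.pdf are hypothetical, and FirstDifference is a helper written for this example. It also assumes that converting the same PDF twice yields the same HTML string, which this article does not confirm.

```csharp
using IronPdf;
using System;

// Hypothetical input files: two versions of the same document.
string htmlA = PdfDocument.FromFile("report-v1.pdf").ToHtmlString();
string htmlB = PdfDocument.FromFile("report-v2.pdf").ToHtmlString();

// Report where the two HTML strings start to diverge, if they do.
int index = FirstDifference(htmlA, htmlB);
Console.WriteLine(index < 0
    ? "The HTML output is identical."
    : $"The HTML output first differs at character {index}.");

// Illustrative helper (not part of IronPDF): returns the index of the first
// differing character, or -1 when both strings are equal.
static int FirstDifference(string a, string b)
{
    int shared = Math.Min(a.Length, b.Length);
    for (int i = 0; i < shared; i++)
    {
        if (a[i] != b[i]) return i;
    }
    // One string is a prefix of the other, or they are equal.
    return a.Length == b.Length ? -1 : shared;
}
```

A character-level comparison is deliberately crude. It only indicates whether, and roughly where, the two outputs diverge, which is usually enough to decide if a closer look is needed.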

using IronPdf;
using System;

// This code demonstrates how to convert a PDF document to an HTML string using the IronPdf library

// Load a PDF document from a file. Ensure the file path is correct and the file exists.
PdfDocument pdf = PdfDocument.FromFile("sample.pdf");

// Convert the loaded PDF to an HTML string representation. This method extracts the content of the PDF and represents it as HTML.
string html = pdf.ToHtmlString();

// Output the HTML string to the console. This will display the HTML representation of the PDF in the console.
Console.WriteLine(html);

// Save the HTML representation of the PDF to an HTML file. This will create a file named "myHtml.html" in the current working directory.
pdf.SaveAsHtml("myHtml.html");

Imports IronPdf
Imports System

' This code demonstrates how to convert a PDF document to an HTML string using the IronPdf library
Module Program
    Sub Main()
        ' Load a PDF document from a file. Ensure the file path is correct and the file exists.
        Dim pdf As PdfDocument = PdfDocument.FromFile("sample.pdf")

        ' Convert the loaded PDF to an HTML string representation.
        Dim html As String = pdf.ToHtmlString()

        ' Output the HTML string to the console.
        Console.WriteLine(html)

        ' Save the HTML representation of the PDF to "myHtml.html" in the current working directory.
        pdf.SaveAsHtml("myHtml.html")
    End Sub
End Module

Output HTML

The complete HTML output of the SaveAsHtml method is embedded in the page below.


PDF to HTML Advanced Example

Both ToHtmlString and SaveAsHtml accept configuration options through the HtmlFormatOptions class. The available properties are:

  • BackgroundColor: Specifies the background color.
  • PdfPageMargin: Specifies the page margin.

The properties below style the title supplied through the 'title' parameter of the ToHtmlString and SaveAsHtml methods. That title is added at the beginning of the content; these properties do not modify the title or h1 of the input PDF document.

  • H1Color: Specifies the title color.
  • H1FontSize: Specifies the title font size.
  • H1TextAlignment: Specifies the title alignment, such as left, center, or right.

using IronPdf;
using IronSoftware.Drawing; // Supplies the Color type; also importing System.Drawing would make Color ambiguous
using System;

// Load a PDF document from a file path.
// Replace "sample.pdf" with the path to your PDF file.
PdfDocument pdf = PdfDocument.FromFile("sample.pdf");

// Create a new PDF to HTML conversion configuration options.
HtmlFormatOptions htmlformat = new HtmlFormatOptions
{
    // Set the background color for the HTML output.
    BackgroundColor = Color.White,
    
    // Set a margin size around each converted PDF page.
    PdfPageMargin = 10,
    
    // Style the title added through the 'title' argument; the h1 elements of the input PDF are not modified.
    H1Color = Color.Blue,
    H1FontSize = 25,
    H1TextAlignment = TextAlignment.Center
};

// Convert the PDF document to an HTML string. No options are passed to this call, so it uses the defaults.
string html = pdf.ToHtmlString();

// Output the HTML string to the console to verify the conversion.
Console.WriteLine(html);

// Save the HTML representation of the PDF to a file with the specified configuration options.
// The title "Hello World" is added to the HTML file.
pdf.SaveAsHtml("myHtmlConfigured.html", true, "Hello World", htmlFormatOptions: htmlformat);

Imports IronPdf
Imports IronSoftware.Drawing ' Supplies the Color type; also importing System.Drawing would make Color ambiguous
Imports System

Module Program
    Sub Main()
        ' Load a PDF document from a file path.
        ' Replace "sample.pdf" with the path to your PDF file.
        Dim pdf As PdfDocument = PdfDocument.FromFile("sample.pdf")

        ' Create the PDF to HTML conversion options.
        ' The H1 properties style the title added through the 'title' argument.
        Dim htmlformat As New HtmlFormatOptions With {
            .BackgroundColor = Color.White,
            .PdfPageMargin = 10,
            .H1Color = Color.Blue,
            .H1FontSize = 25,
            .H1TextAlignment = TextAlignment.Center
        }

        ' Convert the PDF document to an HTML string. No options are passed to this call, so it uses the defaults.
        Dim html As String = pdf.ToHtmlString()

        ' Output the HTML string to the console to verify the conversion.
        Console.WriteLine(html)

        ' Save the HTML representation of the PDF to a file with the specified configuration options.
        ' The title "Hello World" is added to the HTML file.
        pdf.SaveAsHtml("myHtmlConfigured.html", True, "Hello World", htmlFormatOptions:=htmlformat)
    End Sub
End Module

Output HTML

The complete HTML output of the SaveAsHtml method, with the options above applied, is embedded in the page below.

Both methods produce HTML with inline CSS. The output uses SVG tags rather than the usual HTML tags; it is nevertheless valid HTML and renders the same way in a web browser. For the same reason, if the PDF was originally created with the RenderHtmlAsPdf method, the HTML returned by these methods may differ from the HTML that was originally rendered.
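
One quick way to observe the SVG-based output is to count SVG tags in the returned string. The sketch below is an illustration only: CountOccurrences is a helper written for this example, and searching for the literal text "<svg" assumes the converter writes SVG elements in that form, which this article does not specify.

```csharp
using IronPdf;
using System;

// "sample.pdf" is the same placeholder file name used in the examples above.
string html = PdfDocument.FromFile("sample.pdf").ToHtmlString();

// Assumption: SVG elements appear in the output as literal "<svg" tags.
Console.WriteLine($"<svg tags found: {CountOccurrences(html, "<svg")}");

// Illustrative helper (not part of IronPDF): counts non-overlapping,
// case-insensitive occurrences of token in text.
static int CountOccurrences(string text, string token)
{
    if (string.IsNullOrEmpty(token)) return 0;

    int count = 0;
    int pos = text.IndexOf(token, StringComparison.OrdinalIgnoreCase);
    while (pos >= 0)
    {
        count++;
        pos = text.IndexOf(token, pos + token.Length, StringComparison.OrdinalIgnoreCase);
    }
    return count;
}
```

A non-zero count is consistent with the behavior described above. A count of zero would suggest the markup is written in a different form than this sketch assumes.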

Frequently Asked Questions

What are the benefits of converting PDF to HTML?

Converting PDF to HTML offers better web accessibility, responsive layout across devices, improved SEO, straightforward web integration, easier content editing through web-based tools and CMS, cross-platform compatibility, and support for dynamic elements and multimedia.

How can I convert a PDF to HTML?

In .NET with C#, you can convert a PDF to HTML using IronPDF by loading the PDF file with the FromFile method and saving it as an HTML file with the SaveAsHtml method.

What is the ToHtmlString method used for?

The ToHtmlString method is used for converting a PDF document to an HTML string, which is useful for analyzing HTML elements, debugging, or PDF comparison purposes.

Can interactive form fields in a PDF be preserved in HTML?

No, interactive form fields in the original PDF will no longer be functional in the resulting HTML document.

What configuration options are available for PDF to HTML conversion?

Configuration options include BackgroundColor, PdfPageMargin, H1Color, H1FontSize, and H1TextAlignment, which allow customization of the output HTML's appearance.

Does the HTML output use standard HTML tags?

No. The output HTML uses SVG tags instead of the usual HTML tags, but it is a valid HTML string that can be rendered in web browsers.

Where can I download the necessary library for .NET?

You can download IronPDF for .NET from NuGet at https://www.nuget.org/packages/IronPdf/.

How do I start using a library for PDF to HTML conversion?

To start using IronPDF, download the library, import an existing PDF document using the FromFile method, configure the output HTML with HtmlFormatOptions, convert the PDF to an HTML string with ToHtmlString, and export the HTML file using SaveAsHtml.

What is the purpose of configuring the title parameter in HTML conversion?

The title parameter in the ToHtmlString and SaveAsHtml methods is used to add a new title at the beginning of the content, without modifying the title or h1 of the input PDF document.

Is the HTML output from PDF conversion different from HTML input?

Yes, the returned HTML string may differ from the HTML input of a PDF document rendered with the RenderHtmlAsPdf method, because the conversion produces SVG-based markup with inline CSS rather than the original HTML tags.

Regan Pun
Software Engineer
Regan graduated from the University of Reading with a BA in Electronic Engineering. Before joining Iron Software, his previous job roles had him laser-focused on single tasks. What he most enjoys at Iron Software is the spectrum of work he gets to undertake, whether it’s adding value to sales, technical support, product development or marketing. He enjoys understanding the way developers are using the Iron Software library, and using that knowledge to continually improve documentation and develop the products.