How to Convert PDF to HTML
Converting PDF to HTML offers various benefits, including enhanced web accessibility for users, responsiveness for different devices, improved search engine optimization (SEO), seamless web integration, easy content editing through web-based tools and CMS, cross-platform compatibility, and the ability to utilize dynamic elements and multimedia.
IronPDF simplifies the process of converting PDF to HTML in .NET C#.
Go from PDF to HTML in one line of code:
IronPdf.PdfDocument
.FromFile("example.pdf")
.SaveAsHtml("output.html");
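The one-liner above hard-codes both filenames. If you would rather derive the HTML filename from the PDF filename, the standard .NET `Path.ChangeExtension` method is enough. This is a minimal sketch: `HtmlPathFor` is a hypothetical helper name, not part of IronPDF, and the IronPDF call it would feed is shown only as a comment.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of IronPDF): swaps the ".pdf" extension for ".html",
// keeping the directory and base name of the input path unchanged.
static string HtmlPathFor(string pdfPath) => Path.ChangeExtension(pdfPath, ".html");

// Intended use with IronPDF (requires the IronPdf package and an existing file):
// IronPdf.PdfDocument.FromFile("example.pdf").SaveAsHtml(HtmlPathFor("example.pdf"));

Console.WriteLine(HtmlPathFor("example.pdf")); // prints example.html
```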
- Download the IronPDF library for .NET
- Import an existing PDF document using the FromFile method
- Configure the output HTML using the HtmlFormatOptions class
- Convert the PDF to an HTML string using the ToHtmlString method
- Export the HTML file using the SaveAsHtml method
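The FromFile step expects an existing PDF file. If you want to check the path before loading it, `File.Exists` and `Path.GetExtension` from `System.IO` cover the basics. This is a sketch under stated assumptions: `IsPdfPathReady` is a hypothetical helper, not an IronPDF method, and a matching extension does not prove the file is a valid PDF.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of IronPDF): true when the file exists and has a ".pdf"
// extension (case-insensitive). It does not inspect or validate the file's contents.
static bool IsPdfPathReady(string path) =>
    File.Exists(path) &&
    string.Equals(Path.GetExtension(path), ".pdf", StringComparison.OrdinalIgnoreCase);

// Intended use with IronPDF (requires the IronPdf package):
// if (IsPdfPathReady("sample.pdf"))
//     IronPdf.PdfDocument.FromFile("sample.pdf").SaveAsHtml("output.html");

// Demonstrate with a temporary placeholder file (not a real PDF).
string tempPdf = Path.Combine(Path.GetTempPath(), Guid.NewGuid() + ".PDF");
File.WriteAllText(tempPdf, "placeholder");
Console.WriteLine(IsPdfPathReady(tempPdf)); // prints True
File.Delete(tempPdf);
Console.WriteLine(IsPdfPathReady(tempPdf)); // prints False
```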
PDF to HTML Example
The ToHtmlString method is primarily designed to let users analyze the HTML elements of an existing PDF document, which makes it useful for debugging or PDF comparison. In addition to converting a PDF document to an HTML string, IronPDF provides the SaveAsHtml method to save a PDF document directly as an HTML file, so users can choose whichever approach suits their needs.
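Because ToHtmlString returns an ordinary string, a PDF comparison can start by comparing the HTML representations of two documents. The sketch below reports the first character position at which two strings differ. `FirstDifferenceIndex` is a hypothetical helper written for illustration, not an IronPDF method, and a character-by-character check is a simplification; a real comparison workflow may call for a proper diff tool.

```csharp
using System;

// Hypothetical helper (not part of IronPDF): returns the index of the first character
// at which two strings differ, or -1 if they are identical.
static int FirstDifferenceIndex(string a, string b)
{
    int shared = Math.Min(a.Length, b.Length);
    for (int i = 0; i < shared; i++)
    {
        if (a[i] != b[i]) return i;
    }
    // If one string is a prefix of the other, they differ where the shorter one ends.
    return a.Length == b.Length ? -1 : shared;
}

// Intended use with IronPDF (requires the IronPdf package and existing files):
// string before = IronPdf.PdfDocument.FromFile("before.pdf").ToHtmlString();
// string after = IronPdf.PdfDocument.FromFile("after.pdf").ToHtmlString();
// Console.WriteLine(FirstDifferenceIndex(before, after));

Console.WriteLine(FirstDifferenceIndex("<p>abc</p>", "<p>abd</p>")); // prints 5
```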
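Please note: All interactive form fields in the original PDF will no longer be functional in the resulting HTML document.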
using IronPdf;
using System;
// This code demonstrates how to convert a PDF document to an HTML string and save it as an HTML file using the IronPdf library
// Load a PDF document from a file. Ensure the file path is correct and the file exists.
PdfDocument pdf = PdfDocument.FromFile("sample.pdf");
// Convert the loaded PDF to an HTML string representation. This method extracts the content of the PDF and represents it as HTML.
string html = pdf.ToHtmlString();
// Output the HTML string to the console. This will display the HTML representation of the PDF in the console.
Console.WriteLine(html);
// Save the HTML representation of the PDF to an HTML file. This will create a file named "myHtml.html" in the current working directory.
pdf.SaveAsHtml("myHtml.html");
Imports IronPdf
Imports System
' This code demonstrates how to convert a PDF document to an HTML string and save it as an HTML file using the IronPdf library
Module Program
    Sub Main()
        ' Load a PDF document from a file. Ensure the file path is correct and the file exists.
        Dim pdf As PdfDocument = PdfDocument.FromFile("sample.pdf")
        ' Convert the loaded PDF to an HTML string representation.
        Dim html As String = pdf.ToHtmlString()
        ' Output the HTML string to the console.
        Console.WriteLine(html)
        ' Save the HTML representation of the PDF as "myHtml.html" in the current working directory.
        pdf.SaveAsHtml("myHtml.html")
    End Sub
End Module
PDF to HTML Advanced Example
Both the ToHtmlString and SaveAsHtml methods accept configuration options through the HtmlFormatOptions class. The available properties are:
- BackgroundColor: Specifies the background color.
- PdfPageMargin: Specifies the page margin.
The following properties apply only to the title supplied through the 'title' parameter of the ToHtmlString and SaveAsHtml methods. Supplying a title adds a new heading at the beginning of the content; these properties do not modify the title or h1 elements of the input PDF document.
- H1Color: Specifies the title color.
- H1FontSize: Specifies the title font size.
- H1TextAlignment: Specifies the title alignment, such as left, center, or right.
:path=/static-assets/pdf/content-code-examples/how-to/pdf-to-html-advanced-settings.cs
using IronPdf;
using IronSoftware.Drawing; // Provides the Color type used below
using System;
// Load a PDF document from a file path.
// Replace "sample.pdf" with the path to your PDF file.
PdfDocument pdf = PdfDocument.FromFile("sample.pdf");
// Create the configuration options for the PDF-to-HTML conversion.
HtmlFormatOptions htmlformat = new HtmlFormatOptions
{
// Set the background color for the HTML output.
BackgroundColor = Color.White,
// Set a margin size around each converted PDF page.
PdfPageMargin = 10,
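// Style the title added through the 'title' parameter of SaveAsHtml (color, font size, alignment).
// These settings do not change h1 elements that already exist in the PDF.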
H1Color = Color.Blue,
H1FontSize = 25,
H1TextAlignment = TextAlignment.Center
};
// Convert the PDF document to an HTML string. No options are passed here, so the htmlformat settings do not apply to this string.
string html = pdf.ToHtmlString();
// Output the HTML string to the console to verify the conversion.
Console.WriteLine(html);
// Save the HTML representation of the PDF to a file with the specified configuration options.
// The title "Hello World" is added to the HTML file.
pdf.SaveAsHtml("myHtmlConfigured.html", true, "Hello World", htmlFormatOptions: htmlformat);
Imports IronPdf
Imports IronSoftware.Drawing ' Provides the Color type used below
Imports System
Module Program
    Sub Main()
        ' Load a PDF document from a file path.
        ' Replace "sample.pdf" with the path to your PDF file.
        Dim pdf As PdfDocument = PdfDocument.FromFile("sample.pdf")
        ' Create the configuration options for the PDF-to-HTML conversion.
        Dim htmlformat As New HtmlFormatOptions With {
            .BackgroundColor = Color.White,
            .PdfPageMargin = 10,
            .H1Color = Color.Blue,
            .H1FontSize = 25,
            .H1TextAlignment = TextAlignment.Center
        }
        ' Convert the PDF document to an HTML string. No options are passed here, so the htmlformat settings do not apply to this string.
        Dim html As String = pdf.ToHtmlString()
        ' Output the HTML string to the console to verify the conversion.
        Console.WriteLine(html)
        ' Save the HTML representation of the PDF to a file with the specified configuration options.
        ' The title "Hello World" is added to the HTML file.
        pdf.SaveAsHtml("myHtmlConfigured.html", True, "Hello World", htmlFormatOptions:=htmlformat)
    End Sub
End Module
Output HTML
The ToHtmlString and SaveAsHtml methods produce HTML with inline CSS. The output represents page content using SVG elements rather than conventional HTML tags. Despite this difference, it is valid HTML and can be rendered in a web browser like any other HTML. However, for a PDF document that was rendered with the RenderHtmlAsPdf method, the HTML returned by these methods may differ from the original HTML input for this reason.
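Since the output represents content with SVG elements, one quick way to inspect a string returned by ToHtmlString is to count its opening `<svg>` tags. The sketch below uses `Regex.Matches` from `System.Text.RegularExpressions`. `CountSvgElements` is a hypothetical helper, not part of IronPDF, and a regular-expression count is a rough check rather than a full HTML parse.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of IronPDF): counts opening <svg> tags, case-insensitively.
// The "\b" word boundary keeps the pattern from matching tag names that merely start with "svg".
static int CountSvgElements(string html) =>
    Regex.Matches(html, @"<svg\b", RegexOptions.IgnoreCase).Count;

// Intended use with IronPDF (requires the IronPdf package and an existing file):
// Console.WriteLine(CountSvgElements(IronPdf.PdfDocument.FromFile("sample.pdf").ToHtmlString()));

Console.WriteLine(CountSvgElements("<html><body><svg></svg><SVG width=\"10\"/></body></html>")); // prints 2
```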
Frequently Asked Questions
What are the benefits of converting PDF to HTML?
Converting PDF to HTML offers enhanced web accessibility, responsiveness for different devices, improved SEO, seamless web integration, easy content editing through web-based tools and CMS, cross-platform compatibility, and the ability to utilize dynamic elements and multimedia.
How can I convert a PDF to HTML?
You can convert a PDF to HTML using IronPDF by loading the PDF file with the FromFile method and saving it as an HTML file with the SaveAsHtml method in .NET C#.
What is the ToHtmlString method used for?
The ToHtmlString method is used for converting a PDF document to an HTML string, which is useful for analyzing HTML elements, debugging, or PDF comparison purposes.
Can interactive form fields in a PDF be preserved in HTML?
No, interactive form fields in the original PDF will no longer be functional in the resulting HTML document.
What configuration options are available for PDF to HTML conversion?
Configuration options include BackgroundColor, PdfPageMargin, H1Color, H1FontSize, and H1TextAlignment, which allow customization of the output HTML's appearance.
Does the HTML output use standard HTML tags?
No. The output HTML represents content using SVG elements rather than conventional HTML tags, but it is still valid HTML that can be rendered in web browsers.
Where can I download the necessary library for .NET?
You can download IronPDF for .NET from the NuGet package manager at https://www.nuget.org/packages/IronPdf/.
How do I start using a library for PDF to HTML conversion?
To start using IronPDF, download the library, import an existing PDF document using the FromFile method, configure the output HTML with HtmlFormatOptions, convert the PDF to an HTML string with ToHtmlString, and export the HTML file using SaveAsHtml.
What is the purpose of configuring the title parameter in HTML conversion?
The title parameter in the ToHtmlString and SaveAsHtml methods is used to add a new title at the beginning of the content, without modifying the title or h1 of the input PDF document.
Is the HTML output from PDF conversion different from HTML input?
Yes, the returned HTML string may differ from the HTML input when using a PDF document rendered with the RenderHtmlAsPdf method, because the output represents content using SVG elements instead of reproducing the original HTML markup.