How to Convert PDF to HTML
Converting PDF to HTML offers several benefits: better web accessibility, responsive layout across devices, improved search engine optimization (SEO), seamless web integration, easy content editing through web-based tools and CMS platforms, cross-platform compatibility, and the ability to use dynamic elements and multimedia.
IronPdf simplifies the process of converting PDF to HTML in .NET C#.
How to Convert PDF to HTML
- Download the C# library to convert PDF to HTML
- Import an existing PDF document using the FromFile method
- Configure the output HTML with the HtmlFormatOptions class
- Convert the PDF to an HTML string with the ToHtmlString method
- Export the HTML file from the PDF using the SaveAsHtml method
Install with NuGet
Install-Package IronPdf
PDF to HTML example
The ToHtmlString method is primarily designed to let users analyze the HTML elements of an existing PDF document, which makes it a useful tool for debugging or PDF comparison. In addition to converting a PDF document to an HTML string, we offer a direct way to save a PDF document as an HTML file: the SaveAsHtml method. Choose whichever approach best suits your needs.
Sample PDF file
:path=/static-assets/pdf/content-code-examples/how-to/pdf-to-html.cs
using IronPdf;
using System;
PdfDocument pdf = PdfDocument.FromFile("sample.pdf");
// Convert PDF to HTML string
string html = pdf.ToHtmlString();
Console.WriteLine(html);
// Convert PDF to HTML file
pdf.SaveAsHtml("myHtml.html");
Output HTML
The entire output HTML generated from the SaveAsHtml method has been input into the website below.
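For the PDF comparison use case mentioned earlier, one possible approach is to convert two documents with ToHtmlString and compare the resulting strings. The sketch below is illustrative only: the file names "before.pdf" and "after.pdf" are hypothetical, and an exact string match is a deliberately strict test that may report a difference even when two documents look the same.

```csharp
using IronPdf;
using System;

// Hypothetical file names, used only for illustration
PdfDocument first = PdfDocument.FromFile("before.pdf");
PdfDocument second = PdfDocument.FromFile("after.pdf");

// Convert both PDFs to HTML strings
string firstHtml = first.ToHtmlString();
string secondHtml = second.ToHtmlString();

// Strict, character-by-character comparison of the two HTML strings
bool identical = string.Equals(firstHtml, secondHtml, StringComparison.Ordinal);
Console.WriteLine(identical
    ? "The HTML output is identical."
    : "The HTML output differs.");
```

When the strings differ, the two HTML strings can be written to disk with SaveAsHtml and inspected in a text diff tool.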
PDF to HTML Advanced example
Both the ToHtmlString and SaveAsHtml methods offer various configuration options. Below are the available properties:
- BackgroundColor: Specifies the background color.
- PdfPageMargin: Specifies the page margin.
Additionally, the properties below style the text passed to the 'title' parameter of the ToHtmlString and SaveAsHtml methods. That parameter adds a new title at the beginning of the content. These properties will not modify the title or h1 of the input PDF document.
- H1Color: Specifies the title color.
- H1FontSize: Specifies the title font size.
- H1TextAlignment: Specifies the title alignment, such as left, center, or right.
:path=/static-assets/pdf/content-code-examples/how-to/pdf-to-html-advanced-settings.cs
using IronPdf;
using IronSoftware.Drawing;
using System;
PdfDocument pdf = PdfDocument.FromFile("sample.pdf");
// PDF to HTML configuration options
HtmlFormatOptions htmlformat = new HtmlFormatOptions();
htmlformat.BackgroundColor = Color.White;
htmlformat.PdfPageMargin = 10;
htmlformat.H1Color = Color.Blue;
htmlformat.H1FontSize = 25;
htmlformat.H1TextAlignment = TextAlignment.Center;
// Convert PDF to HTML string (no arguments are passed here, so htmlformat is not used by this call)
string html = pdf.ToHtmlString();
Console.WriteLine(html);
// Convert PDF to HTML file
pdf.SaveAsHtml("myHtmlConfigured.html", true, "Hello World", htmlFormatOptions: htmlformat);
Output HTML
The entire output HTML generated from the SaveAsHtml method has been input into the website below.
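In the advanced example, only the SaveAsHtml call receives the title and the HtmlFormatOptions object. Since the article states that ToHtmlString supports the same 'title' parameter and configuration options, a string conversion with the same settings might look like the sketch below. It rests on an assumption: that ToHtmlString accepts the same arguments as SaveAsHtml minus the file path, in the same order and with the same htmlFormatOptions parameter name. Check the API reference for your IronPdf version before relying on it.

```csharp
using IronPdf;
using IronSoftware.Drawing;
using System;

PdfDocument pdf = PdfDocument.FromFile("sample.pdf");

// Same kind of configuration as in the advanced example
HtmlFormatOptions htmlformat = new HtmlFormatOptions();
htmlformat.BackgroundColor = Color.White;
htmlformat.H1Color = Color.Blue;
htmlformat.H1FontSize = 25;

// ASSUMPTION: ToHtmlString mirrors SaveAsHtml without the file path argument.
// The boolean is carried over unchanged from the SaveAsHtml call in the example.
string html = pdf.ToHtmlString(true, "Hello World", htmlFormatOptions: htmlformat);
Console.WriteLine(html);
```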
These methods produce an HTML string with inline CSS. The output HTML uses SVG tags instead of the usual HTML tags. Despite this difference, it is a valid HTML string and can be rendered the same way in a web browser. Be aware, however, that for a PDF document originally rendered with the RenderHtmlAsPdf method, the HTML string returned by these methods may differ from the original HTML input, for the reasons mentioned above.
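A quick way to observe this difference is a round trip: render a small HTML snippet to PDF, convert it back, and compare. The sketch below assumes the ChromePdfRenderer class as the owner of RenderHtmlAsPdf (the article names only the method), and the "<svg" check is just a rough heuristic for spotting SVG markup, not an official test.

```csharp
using IronPdf;
using System;

// Original HTML input (arbitrary sample content)
string inputHtml = "<h1>Hello World</h1>";

// ASSUMPTION: RenderHtmlAsPdf is called on a ChromePdfRenderer instance
ChromePdfRenderer renderer = new ChromePdfRenderer();
PdfDocument pdf = renderer.RenderHtmlAsPdf(inputHtml);

// Convert the rendered PDF back to an HTML string
string outputHtml = pdf.ToHtmlString();

// Report whether the round trip preserved the input and whether SVG markup appears
bool sameAsInput = string.Equals(inputHtml, outputHtml, StringComparison.Ordinal);
bool containsSvg = outputHtml.Contains("<svg", StringComparison.OrdinalIgnoreCase);
Console.WriteLine($"Same as input: {sameAsInput}");
Console.WriteLine($"Contains SVG markup: {containsSvg}");
```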