PDF to HTML
Just as IronPDF can generate pixel-perfect PDF files from HTML content, it can also convert PDF documents into HTML. The PdfDocument and HtmlFormatOptions classes provide the methods needed to convert a PDF to HTML, along with control over how the final HTML content is formatted.
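To try the conversion without a PDF on disk, you can do a round trip: render a small HTML snippet to a PDF with ChromePdfRenderer, then convert that PDF back with ToHtmlString. The following is a minimal sketch. The class and method names (RoundTripDemo, HtmlToPdfToHtml) are hypothetical. The regenerated markup is rebuilt from the PDF, so it should not be expected to match the input exactly.

```csharp
using IronPdf;

// Hypothetical helper for experimenting with PDF-to-HTML without an input file.
public static class RoundTripDemo
{
    // Renders an HTML snippet to an in-memory PDF, then converts that PDF back to HTML.
    public static string HtmlToPdfToHtml(string html)
    {
        var renderer = new ChromePdfRenderer();
        using PdfDocument pdf = renderer.RenderHtmlAsPdf(html);
        // The returned markup is regenerated from the PDF, not copied from the input.
        return pdf.ToHtmlString();
    }
}
```

For example, `RoundTripDemo.HtmlToPdfToHtml("<h1>Hello World</h1>")` returns the HTML that IronPDF produces for the rendered page.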
5 Steps to Converting PDF to HTML
Here's a step-by-step code example illustrating how to perform the conversion:
using IronPdf; // Import the IronPDF library
public class PdfToHtmlConverter
{
    public void ConvertPdfToHtml()
    {
        // Step 1: Load the PDF file into a PdfDocument object
        PdfDocument pdf = PdfDocument.FromFile("sample.pdf");
        // Step 2: Convert the PDF to a simple HTML string
        string html = pdf.ToHtmlString();
        // Optionally, display the HTML on the console (requires using System;) or process it further
        // Console.WriteLine(html);
        // Step 3: Save the PDF directly as an HTML file
        pdf.SaveAsHtml("myHtml.html");
        // Step 4: Create HtmlFormatOptions for custom formatting
        HtmlFormatOptions htmlFormat = new HtmlFormatOptions
        {
            BackgroundColor = IronSoftware.Drawing.Color.White, // Set background color to white
            H1Color = IronSoftware.Drawing.Color.Blue, // Set H1 text color to blue
            H1FontSizePx = 25, // Set H1 font size to 25 pixels
            H1TextAlign = HtmlAlignment.Center, // Center H1 text
            PageMarginsPx = 10 // Set page margins to 10 pixels
        };
        // Step 5: Save the PDF as an HTML file with a title and the custom options
        pdf.SaveAsHtml("myHtmlConfigured.html", fullContentWidth: true, title: "Hello World", htmlFormatOptions: htmlFormat);
    }
}
Imports IronPdf ' Import the IronPDF library
Public Class PdfToHtmlConverter
    Public Sub ConvertPdfToHtml()
        ' Step 1: Load the PDF file into a PdfDocument object
        Dim pdf As PdfDocument = PdfDocument.FromFile("sample.pdf")
        ' Step 2: Convert the PDF to a simple HTML string
        Dim html As String = pdf.ToHtmlString()
        ' Optionally, display the HTML on the console or process it further
        ' Console.WriteLine(html)
        ' Step 3: Save the PDF directly as an HTML file
        pdf.SaveAsHtml("myHtml.html")
        ' Step 4: Create HtmlFormatOptions for custom formatting
        Dim htmlFormat As New HtmlFormatOptions With {
            .BackgroundColor = IronSoftware.Drawing.Color.White,
            .H1Color = IronSoftware.Drawing.Color.Blue,
            .H1FontSizePx = 25,
            .H1TextAlign = HtmlAlignment.Center,
            .PageMarginsPx = 10
        }
        ' Step 5: Save the PDF as an HTML file with a title and the custom options
        pdf.SaveAsHtml("myHtmlConfigured.html", fullContentWidth:=True, title:="Hello World", htmlFormatOptions:=htmlFormat)
    End Sub
End Class
To begin converting a PDF file to HTML, first load the PDF with the FromFile method of the PdfDocument class. This method takes the file name or path passed to it and loads the document into a new PdfDocument object, pdf, which the remaining steps reference throughout the conversion.
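Because FromFile is given a plain path, a missing or mistyped file is an easy mistake at this step. The sketch below uses a hypothetical PdfLoader.LoadPdf helper that checks the path with File.Exists first, so the error names the file, and then calls PdfDocument.FromFile exactly as in the example.

```csharp
using System.IO;
using IronPdf;

// Hypothetical helper: validates the path before handing it to IronPDF.
public static class PdfLoader
{
    // Returns the loaded document; the caller is responsible for disposing it.
    public static PdfDocument LoadPdf(string path)
    {
        if (!File.Exists(path))
        {
            // Fail early with the offending path attached to the exception.
            throw new FileNotFoundException("PDF to convert was not found.", path);
        }
        return PdfDocument.FromFile(path);
    }
}
```

A caller would typically write `using PdfDocument pdf = PdfLoader.LoadPdf("sample.pdf");` so the document is disposed when it goes out of scope.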
Next, ToHtmlString converts the PDF into a simple HTML string, which can be printed to the console or manipulated further as the application requires. The following line, SaveAsHtml, takes the other route and writes the PDF directly to an HTML file, which is better suited to sharing or to more complex work outside the program. Each of these conversions takes only a single line of code.
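The two routes can also be combined: take the string from ToHtmlString and write it to disk yourself, which leaves room to inspect or edit the markup before saving. This is a minimal sketch; HtmlStringExport and ExportViaString are hypothetical names, and the guide does not state whether this string is identical to the file SaveAsHtml produces.

```csharp
using System.IO;
using IronPdf;

// Hypothetical helper: converts to an HTML string, then saves it manually.
public static class HtmlStringExport
{
    public static string ExportViaString(PdfDocument pdf, string outputPath)
    {
        string html = pdf.ToHtmlString();
        // Any custom post-processing of `html` would go here (hypothetical step).
        File.WriteAllText(outputPath, html); // UTF-8 without a byte order mark
        return html;
    }
}
```

For a plain conversion with no edits, the single SaveAsHtml call remains the simpler choice.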
Now let's look at a more advanced example, in which the HtmlFormatOptions class and its properties are used to customize the final HTML output. This class lets you adjust aspects such as the background color, heading (H1) color, H1 font size and text alignment, and page margins, among other settings. First, create a new instance of the class, named htmlFormat in the code.
In this example, the background color is set to white and the H1 text color to blue using the IronSoftware.Drawing.Color class. The H1 font size is then set to 25 pixels and H1 text is centered. Finally, the page margins in the HTML document are set to 10 pixels.
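When several documents should share the same look, these settings can be wrapped in a small factory method. The sketch below assumes a hypothetical FormatPresets.CreateHeadingFormat helper. It uses only the HtmlFormatOptions properties from the example, keeps the white background and centered H1 fixed, and exposes heading color, heading size, and margins as parameters.

```csharp
using IronPdf;

// Hypothetical factory that reuses one heading style across conversions.
public static class FormatPresets
{
    public static HtmlFormatOptions CreateHeadingFormat(
        IronSoftware.Drawing.Color headingColor, int headingSizePx, int marginsPx)
    {
        return new HtmlFormatOptions
        {
            BackgroundColor = IronSoftware.Drawing.Color.White, // fixed white background
            H1Color = headingColor,                               // H1 text color
            H1FontSizePx = headingSizePx,                         // H1 font size in pixels
            H1TextAlign = HtmlAlignment.Center,                   // fixed centered H1
            PageMarginsPx = marginsPx                             // page margins in pixels
        };
    }
}
```

Calling `FormatPresets.CreateHeadingFormat(IronSoftware.Drawing.Color.Blue, 25, 10)` builds the same settings as htmlFormat in the example.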
The final step uses the SaveAsHtml method again to convert the PDF to HTML, this time with additional parameters. The first parameter is the name and location of the HTML document to create. The boolean fullContentWidth is set to true, which makes the PDF content use the full width of the HTML page. The title parameter specifies a title for the HTML output, and htmlFormatOptions applies the customization settings created earlier.
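The same SaveAsHtml call extends naturally to a folder of documents. The following sketch uses a hypothetical BatchConverter.ConvertFolder helper that converts every .pdf file in one directory, uses each file name as the HTML title, and returns the paths it wrote. The naming scheme is an illustrative choice, not something IronPDF prescribes.

```csharp
using System.Collections.Generic;
using System.IO;
using IronPdf;

// Hypothetical batch helper built on the SaveAsHtml parameters described above.
public static class BatchConverter
{
    public static List<string> ConvertFolder(string inputDir, string outputDir, HtmlFormatOptions options)
    {
        Directory.CreateDirectory(outputDir); // no-op if it already exists
        var written = new List<string>();
        foreach (string pdfPath in Directory.GetFiles(inputDir, "*.pdf"))
        {
            string name = Path.GetFileNameWithoutExtension(pdfPath);
            string htmlPath = Path.Combine(outputDir, name + ".html");
            // Dispose each document before the next one is loaded.
            using PdfDocument pdf = PdfDocument.FromFile(pdfPath);
            pdf.SaveAsHtml(htmlPath, fullContentWidth: true, title: name, htmlFormatOptions: options);
            written.Add(htmlPath);
        }
        return written;
    }
}
```

Passing the htmlFormat object from the example as `options` applies the same styling to every converted file.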