PDF to HTML

Just as IronPDF can generate pixel-perfect PDF files from HTML content, it can also convert PDF documents into HTML. The PdfDocument class provides the conversion methods, and the HtmlFormatOptions class gives you control over how the resulting HTML is formatted.

5 Steps to Converting PDF to HTML

Here's a step-by-step code example illustrating how to perform the conversion:

using IronPdf; // Import the IronPDF library

public class PdfToHtmlConverter
{
    public void ConvertPdfToHtml()
    {
        // Step 1: Load the PDF file into a PdfDocument object
        PdfDocument pdf = PdfDocument.FromFile("sample.pdf");

        // Step 2: Convert the PDF to a simple HTML string
        string html = pdf.ToHtmlString();

        // Optionally, display HTML to the console (or handle as needed)
        // Console.WriteLine(html);

        // Step 3: Convert the PDF and save it directly as an HTML file
        pdf.SaveAsHtml("myHtml.html");

        // Step 4: Create HtmlFormatOptions for custom configurations
        HtmlFormatOptions htmlFormat = new HtmlFormatOptions
        {
            BackgroundColor = IronSoftware.Drawing.Color.White, // Set background color to white
            H1Color = IronSoftware.Drawing.Color.Blue, // Set H1 text color to blue
            H1FontSizePx = 25, // Set H1 font size to 25 pixels
            H1TextAlign = HtmlAlignment.Center, // Set H1 text alignment to center
            PageMarginsPx = 10 // Set page margins to 10 pixels
        };

        // Step 5: Save the PDF as an HTML file using the custom options
        pdf.SaveAsHtml("myHtmlConfigured.html", fullContentWidth: true, title: "Hello World", htmlFormatOptions: htmlFormat);
    }
}
Imports IronPdf ' Import the IronPDF library

Public Class PdfToHtmlConverter
	Public Sub ConvertPdfToHtml()
		' Step 1: Load the PDF file into a PdfDocument object
		Dim pdf As PdfDocument = PdfDocument.FromFile("sample.pdf")

		' Step 2: Convert the PDF to a simple HTML string
		Dim html As String = pdf.ToHtmlString()

		' Optionally, display HTML to the console (or handle as needed)
		' Console.WriteLine(html)

		' Step 3: Convert the PDF and save it directly as an HTML file
		pdf.SaveAsHtml("myHtml.html")

		' Step 4: Create HtmlFormatOptions for custom configurations
		Dim htmlFormat As New HtmlFormatOptions With {
			.BackgroundColor = IronSoftware.Drawing.Color.White,
			.H1Color = IronSoftware.Drawing.Color.Blue,
			.H1FontSizePx = 25,
			.H1TextAlign = HtmlAlignment.Center,
			.PageMarginsPx = 10
		}

		' Step 5: Save the PDF as an HTML file using the custom options
		pdf.SaveAsHtml("myHtmlConfigured.html", fullContentWidth:= True, title:= "Hello World", htmlFormatOptions:= htmlFormat)
	End Sub
End Class

To convert a PDF file to HTML, we first load the PDF with the FromFile method of the PdfDocument class. FromFile takes a file name or path and loads that file into a new PdfDocument object, named pdf here. Every later step of the conversion works with this object.
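A slightly more defensive version of step 1 is sketched below. It assumes the IronPdf NuGet package is installed; the PdfLoader class and LoadPdf helper are hypothetical names, and the file-existence check is an illustrative addition rather than part of the original example. It fails early with a clear FileNotFoundException instead of relying on whatever error the library raises for a missing path.

```csharp
using System.IO;
using IronPdf;

public static class PdfLoader
{
    // Hypothetical helper: validates the path before handing it to IronPDF.
    public static PdfDocument LoadPdf(string path)
    {
        if (!File.Exists(path))
        {
            // Fail early with a standard .NET exception that names the missing file.
            throw new FileNotFoundException("PDF file not found.", path);
        }

        // PdfDocument.FromFile loads the file into a new PdfDocument object.
        return PdfDocument.FromFile(path);
    }
}
```

Assuming PdfDocument implements IDisposable (as it does in recent IronPDF releases), a caller can release the document when finished, for example with `using PdfDocument pdf = PdfLoader.LoadPdf("sample.pdf");` in C# 8 or later.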

Next, the ToHtmlString method converts the PDF into an HTML string, which you can print to the console or process further in code. The line after it shows an alternative: SaveAsHtml converts the PDF and writes the result directly to an HTML file, which is more convenient when you want to store or share the output. Each of these conversions takes a single line of code.
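If you need to control how the markup from ToHtmlString is persisted (for example, with an explicit encoding), one option is to write the string yourself. This is a sketch under stated assumptions: the HtmlExport class and SaveHtmlString helper are hypothetical, the choice of UTF-8 without a byte-order mark is ours, and we do not claim the resulting file is byte-for-byte identical to what SaveAsHtml produces.

```csharp
using System.IO;
using System.Text;
using IronPdf;

public static class HtmlExport
{
    // Hypothetical helper: converts the PDF to an HTML string, writes it to
    // outputPath as UTF-8 without a BOM, and returns the string for reuse.
    public static string SaveHtmlString(PdfDocument pdf, string outputPath)
    {
        string html = pdf.ToHtmlString();
        File.WriteAllText(outputPath, html, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false));
        return html;
    }
}
```

Returning the string lets the caller inspect or post-process the same markup that was written to disk.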

Now let's look at a more advanced example, in which we use the properties of the HtmlFormatOptions class to customize the final HTML output. This class lets you adjust aspects such as the background color, the color, font size, and alignment of H1 headings, the page margins, and more. First, we create a new instance of the class, named htmlFormat in the code.

In this example, we set the background color to white and the H1 text color to blue using the IronSoftware.Drawing.Color class. We then set the H1 font size to 25 pixels and center the H1 text. Finally, we set the page margins in the HTML output to 10 pixels.

The final step calls SaveAsHtml again, this time with additional parameters. The first parameter is the name and location of the HTML file to create. Setting the boolean fullContentWidth to true makes the PDF content use the full width of the HTML page. The title parameter specifies a title for the HTML output, and htmlFormatOptions applies the customization settings we created earlier.
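Because an HtmlFormatOptions instance is an ordinary object, it can be built once and reused for many documents. The sketch below converts every PDF in a folder with the same options and the same SaveAsHtml parameters described above. The BatchPdfToHtml class, the ConvertFolder helper, and the choice of each file's name as its HTML title are illustrative assumptions, not part of the original example.

```csharp
using System.IO;
using IronPdf;

public static class BatchPdfToHtml
{
    // Hypothetical helper: converts each .pdf in inputDir to an .html file in
    // outputDir, reusing one HtmlFormatOptions instance. Returns the file count.
    public static int ConvertFolder(string inputDir, string outputDir)
    {
        var htmlFormat = new HtmlFormatOptions
        {
            BackgroundColor = IronSoftware.Drawing.Color.White,
            H1Color = IronSoftware.Drawing.Color.Blue,
            H1FontSizePx = 25,
            H1TextAlign = HtmlAlignment.Center,
            PageMarginsPx = 10
        };

        Directory.CreateDirectory(outputDir);
        int converted = 0;

        foreach (string pdfPath in Directory.GetFiles(inputDir, "*.pdf"))
        {
            // Disposed at the end of each iteration (C# 8 using declaration).
            using PdfDocument pdf = PdfDocument.FromFile(pdfPath);

            string name = Path.GetFileNameWithoutExtension(pdfPath);
            string htmlPath = Path.Combine(outputDir, name + ".html");

            // Same named parameters as step 5; the file name doubles as the title.
            pdf.SaveAsHtml(htmlPath, fullContentWidth: true, title: name, htmlFormatOptions: htmlFormat);
            converted++;
        }

        return converted;
    }
}
```

For example, `BatchPdfToHtml.ConvertFolder("pdfs", "html")` would write `html/report.html` for an input file `pdfs/report.pdf`.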
