PDF to HTML

Just as IronPDF can generate pixel-perfect PDF files from HTML content, it can also convert PDF documents into HTML. Its PdfDocument and HtmlFormatOptions classes provide the methods needed to convert a PDF to HTML and to control how the resulting HTML is formatted.

5 Steps to Converting PDF to HTML

The steps below walk through the conversion one at a time:

To begin converting a PDF file to HTML, we first load the PDF using the FromFile method of the PdfDocument class. This method takes the file name or path we pass to it and loads the file into a new PdfDocument object, pdf. We can then reference this object throughout the rest of the conversion process.
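A minimal sketch of the loading step, assuming the IronPdf NuGet package is installed and that a file named sample.pdf (a placeholder name, not one taken from this article) sits in the working directory:

```csharp
using System;
using IronPdf;

// Load an existing PDF from disk; "sample.pdf" is a placeholder path.
PdfDocument pdf = PdfDocument.FromFile("sample.pdf");

// The loaded object is reused by every conversion call that follows.
Console.WriteLine($"Loaded {pdf.PageCount} page(s).");
```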

Next, the PDF can be converted to a plain HTML string, which can be printed to the console or manipulated further, depending on the needs of the developer. Alternatively, the SaveAsHtml method writes the PDF out as an HTML file, which is better suited to more complex work or to sharing than a bare string. Each approach takes only a single line of code.
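Both conversions might look like the following sketch. The file names are placeholders. The string conversion is assumed here to be the ToHtmlString method, which is not named in the text above, so check it against the IronPDF version you have installed.

```csharp
using System;
using IronPdf;

// "sample.pdf" is a placeholder input path.
PdfDocument pdf = PdfDocument.FromFile("sample.pdf");

// Option 1: convert the PDF to an in-memory HTML string and print it.
// ToHtmlString is assumed from the IronPDF API; verify it for your version.
string html = pdf.ToHtmlString();
Console.WriteLine(html);

// Option 2: write the HTML straight to a file ("myHtml.html" is a placeholder).
pdf.SaveAsHtml("myHtml.html");
```

The string form is convenient when the markup is post-processed in memory, while the file form is the simpler choice when the output is handed to a browser or another tool.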

Now let's look at a more advanced example that uses the properties of the HtmlFormatOptions class to customize the final HTML output. With this class, you can adjust aspects such as the background color, heading (H1) color, H1 text alignment, and page margins. First, we create a new instance of this class, named htmlFormat in the code.

In this example, we change the background color to white and set the H1 text color to blue using the IronSoftware.Drawing.Color class. We then adjust the H1 font size to 25 pixels. Next, we customize the H1 text alignment to be centered. Finally, we set the PDF page margins in the HTML document to 10 pixels.
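The settings described above could be expressed roughly as follows. The property names (BackgroundColor, H1Color, H1FontSize, H1TextAlignment, PdfPageMargin) and the TextAlignment enum are assumptions based on the IronPDF API as commonly documented, not on anything spelled out in this article, so confirm them against your installed version.

```csharp
using IronPdf;
using IronSoftware.Drawing;

// Create the options object that controls how the generated HTML is styled.
HtmlFormatOptions htmlFormat = new HtmlFormatOptions();

// White page background and blue H1 headings (IronSoftware.Drawing.Color).
htmlFormat.BackgroundColor = Color.White;
htmlFormat.H1Color = Color.Blue;

// H1 font size of 25 pixels, centered.
htmlFormat.H1FontSize = 25;
htmlFormat.H1TextAlignment = TextAlignment.Center;

// Margin of 10 pixels around each PDF page in the HTML document.
htmlFormat.PdfPageMargin = 10;
```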

The final step uses the SaveAsHtml method again to convert the PDF to HTML, this time with additional parameters. The first parameter is the name and location of the HTML document to be generated. We then set a boolean, fullContentWidth, to true, which configures the HTML to use the full width for the PDF content. We also specify a title for the HTML output, and finally pass the customization settings we created earlier through the htmlFormatOptions parameter.
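Putting the pieces together, a complete run might look like this sketch. The input and output file names and the "Hello World" title are placeholders. The positional parameter order (path, fullContentWidth, title) follows the description above, and the htmlFormatOptions named argument and the option property names are assumed from the IronPDF API rather than confirmed here.

```csharp
using IronPdf;
using IronSoftware.Drawing;

// Load the source PDF ("sample.pdf" is a placeholder path).
PdfDocument pdf = PdfDocument.FromFile("sample.pdf");

// Formatting options; property names are assumptions, see the lead-in.
HtmlFormatOptions htmlFormat = new HtmlFormatOptions();
htmlFormat.BackgroundColor = Color.White;
htmlFormat.H1Color = Color.Blue;
htmlFormat.H1FontSize = 25;
htmlFormat.H1TextAlignment = TextAlignment.Center;
htmlFormat.PdfPageMargin = 10;

// Save as HTML: output path, fullContentWidth = true, page title,
// then the formatting options passed by name.
pdf.SaveAsHtml("myHtmlConfigured.html", true, "Hello World", htmlFormatOptions: htmlFormat);
```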
