PDF to HTML

Just as IronPDF can generate pixel-perfect PDF files from HTML content, it can also convert PDF documents into HTML. The PdfDocument class provides the conversion methods, and the HtmlFormatOptions class controls how the resulting HTML is formatted.

5 Steps to Converting PDF to HTML

  • PdfDocument pdf = PdfDocument.FromFile("sample.pdf");
  • string html = pdf.ToHtmlString();
  • pdf.SaveAsHtml("myHtml.html");
  • HtmlFormatOptions htmlformat = new HtmlFormatOptions();
  • pdf.SaveAsHtml("myHtmlConfigured.html", true, "Hello World", htmlFormatOptions: htmlformat);

To begin converting a PDF file to HTML, we first load the PDF using the FromFile method of the PdfDocument class. This method takes the file name or path we pass to it and loads the document into a new PdfDocument object, pdf. We can then reference this object throughout the conversion process.

Next, we demonstrate the first way of converting a PDF document to HTML. The ToHtmlString method converts the PDF to an HTML string, which can be printed to the console or manipulated further depending on the needs of the developer. The following line demonstrates the second way: the SaveAsHtml method converts the PDF to an HTML file, which is better suited to more complex work or sharing than a plain string. Each method performs the conversion in a single line of code.
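The two basic conversions described above can be put together as a minimal sketch. It assumes the IronPdf NuGet package is installed, that a file named sample.pdf exists in the working directory, and that the project targets a C# version that supports top-level statements; the Console.WriteLine call is added here only for illustration.

```csharp
using System;
using IronPdf;

// Load the PDF to convert ("sample.pdf" is the file name used in the steps above).
PdfDocument pdf = PdfDocument.FromFile("sample.pdf");

// First way: convert the PDF to an in-memory HTML string.
string html = pdf.ToHtmlString();
Console.WriteLine(html); // illustration only: print the generated markup

// Second way: convert the PDF and write the HTML directly to a file.
pdf.SaveAsHtml("myHtml.html");
```

The string form is convenient when the HTML will be processed further in code, while the file form is the simpler choice when the output is meant to be opened or shared.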

Now let's look at a more advanced example in which we use the HtmlFormatOptions class to customize the final HTML output. With this class, you can customize aspects of the output such as the background color, heading (H1) color, H1 text alignment, and page margins. First, we create a new instance of this class, which we have named htmlformat.

Next, we change the background color to white and set the H1 text color to blue, using values from the IronSoftware.Drawing.Color class. Then we set the H1 font size (specified in pixels) to 25 and set the H1 text alignment to centered. The final customization is setting the PDF page margin in the HTML document (again in pixels) to 10.

The final step is to call the same SaveAsHtml method as before, this time passing more parameters. The first is the name and location where we want to save the generated HTML document, just as before. The second is a boolean, fullContentWidth, which when set to true renders the PDF content at full width in the HTML. The third is the title for the HTML output, and the last, htmlFormatOptions, applies the customization settings we created earlier.
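The customizations and the configured save call described above might look like the following sketch. The property names on HtmlFormatOptions (BackgroundColor, H1Color, H1FontSize, H1TextAlignment, PdfPageMargin) and the TextAlignment.Center value are not spelled out in the text, so treat them as assumptions to check against the IronPDF API reference for your installed version. As before, it assumes the IronPdf package is installed and that sample.pdf exists.

```csharp
using IronPdf;
using IronSoftware.Drawing;

// Load the PDF to convert.
PdfDocument pdf = PdfDocument.FromFile("sample.pdf");

// Create the formatting options object described above.
HtmlFormatOptions htmlformat = new HtmlFormatOptions();

// Assumed property names; verify against the API reference.
htmlformat.BackgroundColor = Color.White;           // white page background
htmlformat.H1Color = Color.Blue;                    // blue H1 text
htmlformat.H1FontSize = 25;                         // H1 font size in pixels
htmlformat.H1TextAlignment = TextAlignment.Center;  // centered H1 text
htmlformat.PdfPageMargin = 10;                      // page margin in pixels

// Arguments: output path, fullContentWidth, HTML title, formatting options.
pdf.SaveAsHtml("myHtmlConfigured.html", true, "Hello World", htmlFormatOptions: htmlformat);
```

Passing the options through the named htmlFormatOptions argument, as in the steps listed at the top, keeps the call readable when only some of the optional parameters are supplied.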
