Read PDF Files in C#

Extracting text and images can facilitate data migration when transitioning from one document format to another. Extracted content can be preserved in a more accessible and editable format, reducing the risk of data loss.

Embedded text and images can be extracted separately and used outside the original PDF document. Extracted text is returned as a plain string, while extracted images are returned as raw image buffers (byte arrays) that can be exported to files or processed further.
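Because raw image buffers carry no file name, you have to choose a format when exporting them. The following is a minimal sketch that uses only the .NET base class library. It introduces a hypothetical helper, ImageBufferExport, which is not part of IronPDF. The helper guesses a file extension from the buffer's leading signature bytes (JPEG and PNG only) and writes the buffer to disk.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of IronPDF): picks a file extension for a raw
// image buffer by inspecting its leading "magic" bytes, then writes it to disk.
static class ImageBufferExport
{
    // PNG files always begin with this 8-byte signature.
    private static readonly byte[] PngSignature =
        { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    public static string GuessExtension(byte[] data)
    {
        // JPEG files begin with FF D8 FF.
        if (data.Length >= 3 && data[0] == 0xFF && data[1] == 0xD8 && data[2] == 0xFF)
            return ".jpg";

        if (data.Length >= PngSignature.Length &&
            data.AsSpan(0, PngSignature.Length).SequenceEqual(PngSignature))
            return ".png";

        // Unrecognized data: fall back to a neutral extension.
        return ".bin";
    }

    // Writes the buffer to directory/baseName + guessed extension and returns the path.
    public static string Save(byte[] data, string directory, string baseName)
    {
        Directory.CreateDirectory(directory);
        string path = Path.Combine(directory, baseName + GuessExtension(data));
        File.WriteAllBytes(path, data);
        return path;
    }
}
```

For example, ImageBufferExport.Save(buffer, "images", "image_0") writes image_0.jpg into the images folder when the buffer holds JPEG data. Checking the signature avoids hard-coding one extension for every image, since the encoding of embedded images can vary from one PDF to another.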

In IronPDF for C#, use the ExtractAllText method to extract text and the ExtractAllRawImages method to extract images from a PDF document.

A typical C# implementation works as follows:

  • We use the IronPDF library to load a PDF document.
  • The ExtractAllText() method retrieves all text from the PDF as a single string, which is then written to the console.
  • The ExtractAllRawImages() method extracts the embedded images as byte arrays. Each image is then saved to the file system under its own file name.
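The steps above can be sketched in C# as follows. This is a minimal sketch that assumes the IronPdf NuGet package and its PdfDocument.FromFile, ExtractAllText, and ExtractAllRawImages methods. Check these names against the version you have installed. The input file sample.pdf, the images folder, and the .png extension are placeholders chosen for illustration.

```csharp
using System;
using System.IO;
using IronPdf; // NuGet package: IronPdf

// Load the PDF ("sample.pdf" is a placeholder path).
var pdf = PdfDocument.FromFile("sample.pdf");

// Retrieve all text in the document as one string and print it.
string text = pdf.ExtractAllText();
Console.WriteLine(text);

// Retrieve the embedded images as raw byte arrays and save each one.
Directory.CreateDirectory("images");
int index = 0;
foreach (byte[] imageBytes in pdf.ExtractAllRawImages())
{
    // Assumed extension: the raw bytes may use another encoding,
    // so inspect the leading bytes if the exact format matters.
    string path = Path.Combine("images", $"image_{index}.png");
    File.WriteAllBytes(path, imageBytes);
    index++;
}

Console.WriteLine($"Saved {index} image(s) to the images folder.");
```

To work with a single page instead of the whole document, IronPDF also provides per-page variants such as ExtractTextFromPage. The whole-document calls are used here because they match the steps described above.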

For more detailed instructions on how to use these methods, visit the IronPDF Documentation.
