Read PDF Files in Node.js

import {PdfDocument} from "@ironsoftware/ironpdf";

(async () => {
    // Extract image and text content from a PDF document
    // Open an existing PDF document
    const pdf = await PdfDocument.fromFile("old_report.pdf");
    
    // Get all text to put in a search index
    const text = await pdf.extractText();
    
    // Get all Images
    const imagesBuffer = await pdf.extractRawImages();
    
    const pageCount = await pdf.getPageCount();
    // Or find the text and images for each individual page (page indexes start at 0)
    for (let index = 0; index < pageCount; index++) {
        // Use new per-page variables: `text` and `imagesBuffer` are const and cannot be reassigned
        const pageText = await pdf.extractText([index]);
        const pageImages = await pdf.extractRawImages([index]);
    }
})();

Extracting text and images can facilitate data migration when transitioning from one document format to another. Extracted content can be preserved in a more accessible and editable format, reducing the risk of data loss.

Embedded images and text can be extracted independently of the PDF document. The extracted text is returned as a plain string, while the extracted images are returned as raw image buffers that can be exported or processed further.
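Because the images arrive as raw buffers, exporting them usually means choosing a file extension first. The sketch below is one possible way to do that, assuming each extracted image is a Node.js Buffer holding a complete encoded image. The helpers detectImageExtension and nameExtractedImages are hypothetical names introduced for illustration and are not part of IronPDF; they guess the format from the leading "magic" bytes and fall back to a neutral extension when the format is not recognized.

```javascript
// Hypothetical helper (not part of IronPDF): guess a file extension
// from the first bytes of an image buffer.
function detectImageExtension(buffer) {
    const startsWith = (bytes) =>
        buffer.length >= bytes.length && bytes.every((b, i) => buffer[i] === b);

    if (startsWith([0x89, 0x50, 0x4e, 0x47])) return "png"; // PNG signature
    if (startsWith([0xff, 0xd8, 0xff])) return "jpg";       // JPEG start-of-image marker
    if (startsWith([0x47, 0x49, 0x46, 0x38])) return "gif"; // "GIF8"
    if (startsWith([0x42, 0x4d])) return "bmp";             // "BM"
    return "bin"; // unknown format: keep the bytes under a neutral extension
}

// Hypothetical helper: pair each buffer with a numbered output file name.
// Assumes `buffers` is an array of Buffer objects, such as the value the
// example above stores in `imagesBuffer`.
function nameExtractedImages(buffers, prefix = "image") {
    return buffers.map((data, i) => ({
        fileName: `${prefix}_${i + 1}.${detectImageExtension(data)}`,
        data,
    }));
}
```

Each resulting entry can then be written to disk with Node's fs.promises.writeFile(entry.fileName, entry.data). Sniffing the leading bytes only covers a few common formats, so treat the "bin" fallback as a signal to inspect the buffer manually.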

Use the extractText method to extract text, and the extractRawImages method to extract images from a PDF document.
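The code comment above mentions putting extracted text into a search index. As a rough illustration of what that could look like, the sketch below builds a small in-memory inverted index from an array of per-page strings, such as the values obtained by calling extractText once per page. The functions buildPageIndex and findPages are hypothetical and not part of IronPDF, and a real search index would likely need stemming, stop-word handling, and persistence.

```javascript
// Hypothetical helper (not part of IronPDF): map each lowercase word
// to the set of zero-based page indexes on which it appears.
function buildPageIndex(pageTexts) {
    const index = new Map();
    pageTexts.forEach((text, pageIdx) => {
        // Split on anything that is not a letter or digit (Unicode-aware).
        const words = text.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? [];
        for (const word of words) {
            if (!index.has(word)) index.set(word, new Set());
            index.get(word).add(pageIdx);
        }
    });
    return index;
}

// Hypothetical helper: return the sorted page indexes containing a word.
function findPages(index, word) {
    return [...(index.get(word.toLowerCase()) ?? [])].sort((a, b) => a - b);
}
```

Keeping one string per page, rather than a single string for the whole document, is what lets a lookup report which pages a term appears on.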

For more detailed instructions on how to use these methods, visit the IronPDF Documentation.