Read PDF Files in C#

Screen: Indicates that the CSS rules are intended for on-screen viewing. Styles are optimized for digital displays, with colors and layouts chosen to look good on computer monitors.
Print: Instructs the browser to apply styles tailored for printed documents, so the HTML content stays readable and properly formatted when it is printed on physical media such as paper.

import {PdfDocument} from "@ironsoftware/ironpdf";

(async () => {
    // Extract image and text content from a PDF document
    // Open an existing PDF document from disk
    const pdf = await PdfDocument.fromFile("old_report.pdf");

    // Get all text to put in a search index
    const text = await pdf.extractText();

    // Get all images as raw buffers
    const imagesBuffer = await pdf.extractRawImages();

    const pageCount = await pdf.getPageCount();
    // Or extract the text and images of each page individually
    for (let index = 0; index < pageCount; index++) {
        // Use new constants here: `text` and `imagesBuffer` above cannot be reassigned
        const pageText = await pdf.extractText([index]);
        const pageImages = await pdf.extractRawImages([index]);
    }
})();


Extracting text and images can facilitate data migration when transitioning from one document format to another. Extracted content can be preserved in a more accessible and editable format, reducing the risk of data loss.

Embedded images and text can be extracted independently of each other. The extracted text is returned as a plain string, while the extracted images are returned as raw image buffers that can be written to disk or processed further.
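As a rough illustration of what "exported" could look like, the sketch below picks a file name for each extracted image by inspecting its first bytes. It assumes the images arrive as an array of Node.js Buffer objects; the helpers guessImageExtension and planImageFiles are hypothetical names made up for this example and are not part of IronPDF.

```javascript
// Hypothetical helper (not part of IronPDF): guess a file extension
// from the signature bytes at the start of an image buffer.
function guessImageExtension(buffer) {
    const pngSignature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
    const jpegSignature = [0xff, 0xd8, 0xff];

    // True when the buffer begins with every byte of the given signature
    const startsWith = (signature) =>
        buffer.length >= signature.length &&
        signature.every((byte, i) => buffer[i] === byte);

    if (startsWith(pngSignature)) return "png";
    if (startsWith(jpegSignature)) return "jpg";
    return "bin"; // unknown format: keep the raw bytes under a neutral extension
}

// Hypothetical helper: pair each buffer with a numbered file name.
// Assumes imageBuffers is an array of Buffer objects.
function planImageFiles(imageBuffers, prefix = "image") {
    return imageBuffers.map((data, i) => ({
        fileName: `${prefix}_${i + 1}.${guessImageExtension(data)}`,
        data,
    }));
}
```

Each entry of the returned plan could then be passed to Node's fs.promises.writeFile(entry.fileName, entry.data). Only PNG and JPEG signatures are recognized here; anything else is saved with a .bin extension rather than guessed.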

Use the extractText method to extract text, and the extractRawImages method to extract images from a PDF document.
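Once the text of each page has been collected with extractText, it can feed a simple search index. The following is a minimal sketch, assuming the per-page text has already been gathered into an array of strings; buildPageIndex is a hypothetical helper written for this example, not an IronPDF function.

```javascript
// Hypothetical helper (not part of IronPDF): map each lowercase word
// to the 1-based page numbers on which it appears.
// Assumes pageTexts[i] holds the text extracted from page i + 1.
function buildPageIndex(pageTexts) {
    const index = new Map();

    pageTexts.forEach((text, i) => {
        const pageNumber = i + 1;
        // Split into runs of letters and digits; match() returns null when nothing matches
        const words = text.toLowerCase().match(/[\p{L}\p{N}]+/gu) || [];

        // A Set ensures each page is recorded at most once per word
        for (const word of new Set(words)) {
            if (!index.has(word)) index.set(word, []);
            index.get(word).push(pageNumber);
        }
    });

    return index;
}
```

A lookup such as buildPageIndex(pageTexts).get("invoice") then returns the pages that mention the word, or undefined when it never occurs. This sketch does no stemming or stop-word filtering, which a real search index would likely need.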