Reading PDF Text
IronPDF allows developers to extract the full text and images from almost any PDF file. The extracted text is particularly useful when building search indexes.
using System.Collections.Generic;
using System.Drawing;
using IronPdf;

// Extracting image and text content from PDF documents

// Open a 128-bit encrypted PDF
PdfDocument PDF = PdfDocument.FromFile("encrypted.pdf", "password");

// Get all text to put in a search index
string AllText = PDF.ExtractAllText();

// Get all images
IEnumerable<System.Drawing.Image> AllImages = PDF.ExtractAllImages();

// Or find the text and images for each individual page in the document
for (var index = 0; index < PDF.PageCount; index++)
{
    // Human-readable page number; the extraction calls take the zero-based index
    int PageNumber = index + 1;
    string Text = PDF.ExtractTextFromPage(index);
    IEnumerable<System.Drawing.Image> Images = PDF.ExtractImagesFromPage(index);
    // ...
}
Imports System.Collections.Generic
Imports System.Drawing
Imports IronPdf

' Extracting image and text content from PDF documents

' Open a 128-bit encrypted PDF
Dim PDF As PdfDocument = PdfDocument.FromFile("encrypted.pdf", "password")

' Get all text to put in a search index
Dim AllText As String = PDF.ExtractAllText()

' Get all images
Dim AllImages As IEnumerable(Of System.Drawing.Image) = PDF.ExtractAllImages()

' Or find the text and images for each individual page in the document
For index = 0 To PDF.PageCount - 1
    ' Human-readable page number; the extraction calls take the zero-based index
    Dim PageNumber As Integer = index + 1
    Dim Text As String = PDF.ExtractTextFromPage(index)
    Dim Images As IEnumerable(Of System.Drawing.Image) = PDF.ExtractImagesFromPage(index)
    ' ...
Next index
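As a rough illustration of the search-index use case, the per-page text returned by ExtractTextFromPage could be fed into a simple word-to-page lookup. The sketch below is not part of IronPDF: the PageSearchIndex class and its Build method are hypothetical names, and the input is a plain list of strings standing in for the extracted page text, so it runs without the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not an IronPDF type: maps each word to the pages it appears on.
public static class PageSearchIndex
{
    // pageTexts[i] is assumed to hold the text of page i + 1,
    // for example the result of PDF.ExtractTextFromPage(i).
    public static Dictionary<string, SortedSet<int>> Build(IReadOnlyList<string> pageTexts)
    {
        // Case-insensitive keys so "Invoice" and "invoice" share one entry.
        var index = new Dictionary<string, SortedSet<int>>(StringComparer.OrdinalIgnoreCase);

        for (int i = 0; i < pageTexts.Count; i++)
        {
            // Replace every non-alphanumeric character with a space, then split into words.
            var cleaned = new string(pageTexts[i]
                .Select(c => char.IsLetterOrDigit(c) ? c : ' ')
                .ToArray());
            var words = cleaned.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

            foreach (var word in words)
            {
                if (!index.TryGetValue(word, out var pages))
                {
                    pages = new SortedSet<int>();
                    index[word] = pages;
                }
                pages.Add(i + 1); // store one-based page numbers
            }
        }
        return index;
    }

    public static void Main()
    {
        // Stand-in data; in practice these strings would come from the PDF.
        var pageTexts = new List<string>
        {
            "Invoice for March",
            "Payment terms: invoice due in 30 days"
        };

        var index = Build(pageTexts);
        Console.WriteLine(string.Join(", ", index["invoice"])); // prints "1, 2"
    }
}
```

A real search index would normally add stemming, stop-word removal, and persistent storage; this only shows how page-level extraction maps onto page-level search results.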