Read PDF Files in C#
The PdfDocument.ExtractAllText method from the IronPDF C# PDF library is well suited to straightforward PDF text-reading tasks. It is designed to cope with whitespace and encoding inconsistencies in the source PDF document.
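A minimal sketch of reading a whole document's text, assuming the file is loaded with PdfDocument.FromFile. The helper class PdfTextReader is hypothetical and exists only to wrap the two library calls.

```csharp
using IronPdf;

// Hypothetical helper for illustration; only FromFile and ExtractAllText are IronPDF calls.
public static class PdfTextReader
{
    // Loads the PDF at the given path and returns the text of every page as one string.
    public static string ReadAllText(string path)
    {
        PdfDocument pdf = PdfDocument.FromFile(path);
        return pdf.ExtractAllText();
    }
}
```

Usage might look like `string text = PdfTextReader.ReadAllText("input.pdf");`, where "input.pdf" is a placeholder path.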
PdfDocument.ExtractTextFromPage reads the text from a single page of a PDF, identified by its zero-based page index. Calling it in a loop retrieves the text content of a specific range of pages.
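The loop described above could be sketched as follows, assuming zero-based page indices and an inclusive range. The helper ReadPageRange and its clamping behaviour are hypothetical choices for illustration, not part of IronPDF.

```csharp
using System;
using System.Collections.Generic;
using IronPdf;

// Hypothetical helper: reads pages firstIndex..lastIndex (inclusive, zero-based).
public static class PdfPageRangeReader
{
    // Returns one string per page in the requested range.
    public static List<string> ReadPageRange(string path, int firstIndex, int lastIndex)
    {
        PdfDocument pdf = PdfDocument.FromFile(path);

        // Clamp the requested range to pages that actually exist in the document.
        int start = Math.Max(0, firstIndex);
        int end = Math.Min(pdf.PageCount - 1, lastIndex);

        var pages = new List<string>();
        for (int index = start; index <= end; index++)
        {
            pages.Add(pdf.ExtractTextFromPage(index));
        }
        return pages;
    }
}
```

Clamping means a request such as `ReadPageRange("input.pdf", 0, 99)` on a short document returns only the pages that exist instead of failing on an out-of-range index.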
IronPDF can also extract embedded images from PDFs. To do so, use one of the following methods of the PdfDocument class:
- ExtractAllImages: returns all images embedded in a PDF as IronSoftware.Drawing.AnyBitmap objects.
- ExtractAllRawImages: retrieves all embedded images as a list of raw bytes (byte[]).
- ExtractImagesFromPage: extracts the images contained on an indexed page.
- ExtractImagesFromPages: same as ExtractImagesFromPage, but for a page range or a list of individual pages.
- ExtractRawImagesFromPage and ExtractRawImagesFromPages: work the same as the previous two methods, but return the extracted images as byte arrays instead of IronSoftware.Drawing.AnyBitmap objects.
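A minimal sketch contrasting the two whole-document methods, assuming a placeholder input path and output folder. The helper class PdfImageExporter and the file-naming scheme are hypothetical; the raw bytes are written with a generic .bin extension because their image format is not specified here.

```csharp
using System.IO;
using IronPdf;

// Hypothetical helper: saves every embedded image twice, once per extraction method.
public static class PdfImageExporter
{
    // Returns how many files each method produced.
    public static (int bitmaps, int raw) ExportImages(string path, string outputDir)
    {
        PdfDocument pdf = PdfDocument.FromFile(path);
        Directory.CreateDirectory(outputDir);

        // ExtractAllImages yields IronSoftware.Drawing.AnyBitmap objects.
        int bitmapCount = 0;
        foreach (var image in pdf.ExtractAllImages())
        {
            image.SaveAs(Path.Combine(outputDir, $"image_{bitmapCount}.png"));
            bitmapCount++;
        }

        // ExtractAllRawImages yields each image as a byte array.
        int rawCount = 0;
        foreach (byte[] bytes in pdf.ExtractAllRawImages())
        {
            File.WriteAllBytes(Path.Combine(outputDir, $"raw_{rawCount}.bin"), bytes);
            rawCount++;
        }

        return (bitmapCount, rawCount);
    }
}
```

The bitmap route is convenient when the images need further processing, while the raw route avoids decoding when the bytes only need to be stored or forwarded.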
How to Read PDF Files in C#
- Download IronPDF Library for C#
- Extract Images or Text from PDF
- Read and Find Words in Specific Documents
- View PDF Output from your original document
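The "read and find words" step might be sketched as a page-by-page search over the extracted text. FindPages is a hypothetical helper; it assumes a case-insensitive ordinal match is acceptable and reports 1-based page numbers for readability.

```csharp
using System;
using System.Collections.Generic;
using IronPdf;

// Hypothetical helper: lists the pages whose text contains a search term.
public static class PdfWordFinder
{
    // Returns 1-based page numbers where the term appears (case-insensitive).
    public static List<int> FindPages(string path, string term)
    {
        PdfDocument pdf = PdfDocument.FromFile(path);
        var hits = new List<int>();

        for (int index = 0; index < pdf.PageCount; index++)
        {
            string text = pdf.ExtractTextFromPage(index);
            if (text.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                hits.Add(index + 1); // convert zero-based index to page number
            }
        }
        return hits;
    }
}
```

For example, `PdfWordFinder.FindPages("input.pdf", "invoice")` would list every page mentioning "invoice", with "input.pdf" as a placeholder path.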