Extracting data from PDFs programmatically saves the time otherwise spent on manual data entry. This article explains how developers can use the IronPDF library to extract text and images from PDF documents.
IronPDF is a .NET library for creating, editing, and converting PDF files. It provides an easy-to-use API that developers can call from their own applications, and it is one of the most popular libraries for working with PDF files. With IronPDF, you control the text, layout, and graphics of each document directly from your .NET program.
The IronPDF library has a fantastic feature for extracting data from PDF files. This article will look at how to extract data using IronPDF. First, a C# Project needs to be created or opened. Let's move on to the next section.
This tutorial recommends using the latest version of Visual Studio.
Once Visual Studio is opened, follow the steps below to create a new C# Project. If there is an existing project that you would like to use, then skip these next steps and proceed to the next section directly.
Visual Studio opening UI
Create a new project
.NET Framework selection
Visual Studio will now generate a new C# .NET project.
The IronPDF library can be installed in multiple ways.
Install-Package IronPdf
Installation progress in the Package Manager Console tab
After installation, you will see the IronPDF dependency in the Dependencies section of the Solution Explorer, as shown below.
Reference IronPdf package in Solution Explorer
Another way to install the IronPDF library is by using Visual Studio's integrated NuGet Package Manager UI.
Navigate to NuGet Package Manager
Type IronPdf in search, and press Enter.
Install the IronPdf package from the NuGet Package Manager
Let's have a look at the following code on how to extract data using IronPDF:
// Extracting image and text content from PDF documents
using IronPdf;
using System.Collections.Generic;
using System.Drawing;

// Open a 128-bit encrypted PDF
using PdfDocument pdf = PdfDocument.FromFile("encrypted.pdf", "password");

// Get all text to put in a search index
string AllText = pdf.ExtractAllText();

// Get all images
IEnumerable<System.Drawing.Image> AllImages = pdf.ExtractAllImages();

// Or get the text and images for each individual page in the document
for (var index = 0; index < pdf.PageCount; index++) {
    int PageNumber = index + 1; // 1-based number for display; the API index is zero-based
    string Text = pdf.ExtractTextFromPage(index);
    IEnumerable<System.Drawing.Image> Images = pdf.ExtractImagesFromPage(index);
    // ...
}
' Extracting image and text content from PDF documents
Imports IronPdf
Imports System.Collections.Generic
Imports System.Drawing

Module Program
    Sub Main()
        ' Open a 128-bit encrypted PDF
        Using pdf As PdfDocument = PdfDocument.FromFile("encrypted.pdf", "password")
            ' Get all text to put in a search index
            Dim AllText As String = pdf.ExtractAllText()
            ' Get all images
            Dim AllImages As IEnumerable(Of System.Drawing.Image) = pdf.ExtractAllImages()
            ' Or get the text and images for each individual page in the document
            For index = 0 To pdf.PageCount - 1
                Dim PageNumber As Integer = index + 1 ' 1-based number for display
                Dim Text As String = pdf.ExtractTextFromPage(index)
                Dim Images As IEnumerable(Of System.Drawing.Image) = pdf.ExtractImagesFromPage(index)
                ' ...
            Next index
        End Using
    End Sub
End Module
First, the FromFile method loads the input PDF document into the program. The file in this example is encrypted, so a password is supplied to open it. Next, the ExtractAllText method pulls all of the document's text into a String variable. From here, the extracted text can be used in many ways: output it as plain text, dump it in a TXT file, store it in a database, etc.
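As an illustration of the "dump it in a TXT file" option, here is a minimal sketch using only the standard System.IO API. The class name ExtractedTextWriter and the method SaveAsTxt are hypothetical helpers introduced for this example; they are not part of IronPDF.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of IronPDF): saves already-extracted text to disk.
public static class ExtractedTextWriter
{
    // Writes the text to the given path and returns the absolute path written.
    public static string SaveAsTxt(string text, string path)
    {
        string fullPath = Path.GetFullPath(path);

        // Create the target folder if it does not exist yet.
        var folder = Path.GetDirectoryName(fullPath);
        if (!string.IsNullOrEmpty(folder))
        {
            Directory.CreateDirectory(folder);
        }

        // This overload writes UTF-8 without a byte order mark and overwrites any existing file.
        File.WriteAllText(fullPath, text ?? string.Empty);
        return fullPath;
    }
}
```

With the pdf object from the sample above, a call might look like ExtractedTextWriter.SaveAsTxt(pdf.ExtractAllText(), "output/encrypted.txt"), where the output path is only an example.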
IronPDF can extract text from PDF tables for inclusion in one or more CSV files.
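One possible way to turn extracted table text into CSV is sketched below. It is an assumption-heavy illustration, not IronPDF's own method: it assumes each table row arrives as one line of text and that columns are separated by a tab or by two or more spaces, which depends entirely on how the source PDF was laid out. The class TableTextToCsv and its ToCsv method are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of IronPDF): converts column-aligned text to CSV.
public static class TableTextToCsv
{
    // ASSUMPTION: columns are separated by tabs or by runs of two or more spaces.
    private static readonly Regex ColumnGap = new Regex(@"\t+| {2,}");

    public static string ToCsv(string extractedText)
    {
        var rows = extractedText
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => ColumnGap.Split(line.Trim()))
            // Lines with a single cell are treated as non-table text and skipped.
            .Where(cells => cells.Length > 1)
            .Select(cells => string.Join(",", cells.Select(Escape)));

        return string.Join("\n", rows);
    }

    // Quotes a cell when it contains a comma, a quote, or a line break.
    private static string Escape(string cell)
    {
        bool mustQuote = cell.IndexOfAny(new[] { ',', '"', '\n', '\r' }) >= 0;
        return mustQuote ? "\"" + cell.Replace("\"", "\"\"") + "\"" : cell;
    }
}
```

The text passed in would typically come from ExtractAllText or ExtractTextFromPage. Real documents often need a more careful column heuristic than this one, so inspect the extracted text before relying on the split rule.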
The sample then calls the ExtractAllImages method to extract all the embedded images from the PDF document.
IronPDF can also extract content from specific PDF pages. The remaining lines of code in the example above demonstrate how to use the ExtractTextFromPage and ExtractImagesFromPage methods to fetch the text and images from a subset of pages. Both methods accept an integer argument that represents the zero-based index of the desired page.
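Per-page text is useful for the search index mentioned in the sample's comments. The sketch below shows one simple approach: given the text of each page in order, it reports which pages mention a term. PageSearch and FindPages are hypothetical names introduced here, and the helper works on plain strings so it does not depend on IronPDF itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of IronPDF): finds pages whose text contains a term.
public static class PageSearch
{
    // pageTexts[i] holds the text of the page at zero-based index i.
    // Returns 1-based page numbers, matching how readers count pages.
    public static List<int> FindPages(IReadOnlyList<string> pageTexts, string term)
    {
        var hits = new List<int>();
        for (int index = 0; index < pageTexts.Count; index++)
        {
            // Case-insensitive ordinal comparison keeps the match rule simple.
            if (pageTexts[index].IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                hits.Add(index + 1);
            }
        }
        return hits;
    }
}
```

To feed it, collect the result of pdf.ExtractTextFromPage(index) for every index from 0 to pdf.PageCount - 1 into a list, then call PageSearch.FindPages(pageTexts, "invoice") with whatever term you need.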
IronPDF allows developers to extract text and images from PDF files in as little as one line of code, using ExtractAllText and ExtractAllImages to extract a PDF file's entire contents. Alternatively, calling ExtractTextFromPage or ExtractImagesFromPage will fetch text or images from just one PDF page in particular. The previous sample code showed how to call these per-page methods in a loop to read text and images from a range of pages.
IronPDF is also capable of rendering charts in PDFs, adding barcodes, enhancing security with passwords and watermarking, and even handling PDF forms programmatically.
IronPDF is completely free for development. While payment is needed for commercial use, you can access the free trial of IronPDF for production without any payment.
Purchase the full suite of Iron Software's document libraries for the price of two IronPDF Lite Licenses.
Download IronPDF now to start extracting data from PDFs today!