In this tutorial, we will learn how to read data from a PDF (Portable Document Format) document in C# with examples using two different tools.
There are many parser libraries and readers available that can extract text and images from PDFs. In this article, we will extract information from a PDF file using two popular libraries and then compare them to see which one is better suited to the task.
We will be comparing iText 7 and IronPDF. Before going forward, we will introduce both libraries.
iText 7 is the successor to iTextSharp and is available for both .NET and Java applications. It provides a core PDF engine, high- and low-level programming APIs, an event-listener model for processing page content, and PDF editing capabilities. iText 7 can create, edit, and enhance the pages of PDF documents. Other features include adding passwords, encrypting documents, and setting permissions on a PDF document. It can also add or change content and images on a page canvas, append low-level PDF objects (dictionaries, etc.), create watermarks and bookmarks, change font sizes, and digitally sign documents.
iText 7 allows us to build custom PDF processing applications for web, mobile, desktop, kernel, or cloud apps in .NET.
IronPDF is a library developed by Iron Software that helps C# and Java software engineers create, edit, and extract PDF content. It is commonly used to generate PDFs from HTML, webpages, or images, and to read PDFs and extract their text. Other features include adding headers and footers, signatures, attachments, passwords, and security settings. It also supports multithreading and asynchronous operations for better performance.
IronPDF supports .NET 7, .NET 6, .NET 5, .NET Core, .NET Standard, and .NET Framework, and it runs on Windows, macOS, Linux, Docker, Azure, and AWS.
Now, let's see a demonstration for both of them.
We will extract text from a sample PDF file, D:/TestDocument.pdf.
Write the following source code for extracting text using iText 7.
using System;
using System.Text;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

// assign the PDF location to a string and create a StringBuilder for the output
string pdfPath = @"D:/TestDocument.pdf";
var pageText = new StringBuilder();
// open the file with PdfReader and wrap it in a PdfDocument
using (PdfDocument document = new PdfDocument(new PdfReader(pdfPath)))
{
    int pageCount = document.GetNumberOfPages();
    // iText page numbers start at 1
    for (int page = 1; page <= pageCount; page++)
    {
        // LocationTextExtractionStrategy collects text with its position and returns it in reading order
        LocationTextExtractionStrategy strategy = new LocationTextExtractionStrategy();
        PdfCanvasProcessor parser = new PdfCanvasProcessor(strategy);
        // process the content of the current page
        parser.ProcessPageContent(document.GetPage(page));
        pageText.Append(strategy.GetResultantText());
        // separate pages with a line break
        pageText.AppendLine();
    }
    Console.WriteLine(pageText.ToString());
}
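LocationTextExtractionStrategy does more than read strings in file order: text in a PDF content stream can be drawn in any order, so the strategy records where each piece of text is placed and sorts the pieces into reading order. The C# below is a much-simplified sketch of that idea using only the standard library. The TextChunk record, the LocationOrdering class, and the 2-point line tolerance are hypothetical names and values chosen for illustration; they are not part of iText's API and this is not iText's actual algorithm.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for illustration: a run of text and the point where it was drawn.
// In PDF user space the origin is the bottom-left corner, so a larger Y is higher on the page.
public record TextChunk(string Text, double X, double Y);

public static class LocationOrdering
{
    // Rebuilds reading order from chunks that may arrive in any order:
    // chunks whose Y values are within lineTolerance of a line's top chunk share that line,
    // lines run top to bottom, and chunks within a line run left to right.
    public static string ToReadingOrder(IEnumerable<TextChunk> chunks, double lineTolerance = 2.0)
    {
        var lines = new List<List<TextChunk>>();
        foreach (var chunk in chunks.OrderByDescending(c => c.Y))
        {
            // Start a new line when this chunk sits noticeably lower than the current line.
            if (lines.Count == 0 || lines[^1][0].Y - chunk.Y > lineTolerance)
                lines.Add(new List<TextChunk>());
            lines[^1].Add(chunk);
        }

        // Simplification: chunks on a line are always joined with a single space.
        return string.Join("\n", lines.Select(line =>
            string.Join(" ", line.OrderBy(c => c.X).Select(c => c.Text))));
    }

    public static void Main()
    {
        // Chunks listed in "content stream" order, deliberately not in reading order.
        var chunks = new[]
        {
            new TextChunk("world", 60, 700),
            new TextChunk("Second", 10, 680),
            new TextChunk("Hello", 10, 700.5),
            new TextChunk("line", 70, 680),
        };
        // Prints "Hello world" and "Second line" on separate lines.
        Console.WriteLine(ToReadingOrder(chunks));
    }
}
```

Grouping by a Y tolerance rather than exact equality keeps chunks on the same baseline together even when their coordinates differ slightly. iText's real strategy also accounts for rotated text and decides whether to insert a space based on the gap between chunks, which this sketch skips.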
Now, let's extract text from a PDF using IronPDF.
The following source code demonstrates how to extract text from a PDF using IronPDF.
using System;
using IronPdf;

// load the PDF and extract the text of every page as a single string
var pdf = PdfDocument.FromFile(@"D:/TestDocument.pdf");
string text = pdf.ExtractAllText();
Console.WriteLine(text);
With IronPDF, it takes two lines to extract text from PDFs. With iText 7, on the other hand, we have to write about 10 lines of code for the same task.
IronPDF provides a ready-made text extraction method out of the box, whereas with iText 7 we have to write the page loop and wire up the extraction strategy ourselves.
IronPDF is efficient in terms of both performance and code readability.
Both libraries are equal in terms of accuracy: for the test document, both extracted the text correctly.
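One way to check an accuracy claim like this on your own files is to compare the two outputs after normalizing whitespace, because two extractors can produce the same words with different spacing and line breaks. The sketch below assumes you already have both results as strings; ExtractionComparer, Normalize, and SameText are hypothetical helper names, and the placeholder strings in Main stand in for real extraction output.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExtractionComparer
{
    // Collapses every run of whitespace (spaces, tabs, line breaks) into one space
    // and trims both ends, so layout-only differences are ignored.
    public static string Normalize(string text) =>
        Regex.Replace(text ?? string.Empty, @"\s+", " ").Trim();

    // True when two extraction results contain the same characters in the same order,
    // ignoring differences in whitespace.
    public static bool SameText(string a, string b) =>
        string.Equals(Normalize(a), Normalize(b), StringComparison.Ordinal);

    public static void Main()
    {
        // Placeholder strings standing in for the output of each library.
        string fromIText = "Hello world\nSecond line";
        string fromIronPdf = "Hello world\r\n  Second line\r\n";
        Console.WriteLine(SameText(fromIText, fromIronPdf)); // True
    }
}
```

Normalizing this way ignores layout differences only: a missing word, reordered text, or changed punctuation still makes SameText return false.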
iText 7 is dual-licensed under the AGPL and a commercial license, so using it in closed-source commercial software requires a paid license. IronPDF is free for development and also offers a free trial for commercial use.
For a more in-depth comparison of IronPDF and iText 7, please read this blog post on IronPDF vs. iText 7.