iText7 Read PDF in C# Alternatives (VS IronPDF)
PDF (Portable Document Format) is a file format created by Adobe and widely used for sharing information digitally over the internet. It preserves the formatting of a document and supports features such as security permissions and password protection. As a C# developer, you may have encountered scenarios where you need to integrate PDF functionality into your application. Building that functionality from scratch is time-consuming and tedious, so the choice between writing it yourself and using a prebuilt library has a significant effect on the performance, effectiveness, and efficiency of the application.
There are several PDF libraries available for C#. In this article, we will explore two of the most popular PDF libraries for reading PDF documents in C#.
iText software
iText 7 (its core library is also known as iText 7 Core) is a PDF library for working with PDF documents in .NET (C#) and Java. It is available under an open source license (AGPL) and can also be licensed for commercial applications.
iText 7 Core is a high-level API for generating and editing PDFs. With it, you can split, merge, annotate, fill forms, digitally sign, and do much more with PDF files. iText 7 also provides an HTML to PDF converter through its pdfHTML add-on.
IronPDF
IronPDF is a .NET and .NET Framework C# and Java API used for generating PDF documents from HTML, CSS, and JavaScript, whether the source is a URL, an HTML file, or an HTML string. IronPDF also allows you to manipulate existing PDF files: splitting, merging, annotating, digitally signing, and much more.
IronPDF is enriched with 50+ features to create, read, and edit PDF files. It prioritizes speed, ease of use, and accuracy when you need to deliver high-quality, pixel-perfect professional PDF files with Adobe Acrobat Reader. The API is well documented, and a lot of sample source code can be found on its code examples page.
Create a Console Application
We will use the Visual Studio 2022 IDE to create the application. Visual Studio is Microsoft's official IDE for C# development, so you need to have it installed to follow along. If it is not installed, you can download it from the Microsoft Visual Studio website.
The following steps will create a new project named "DemoApp".
Open Visual Studio and click on "Create a New Project".
Select "Console Application" and click "Next".
Set the name of the project.
Select the .NET version. Choose the stable version .NET 6.0.
Install IronPDF Library
Once the project is created, the IronPDF library needs to be installed in the project to use it. Follow these steps to install it.
Open NuGet Package Manager, either from Solution Explorer or from the Tools menu.
Browse for the IronPDF library and select it for the current project. Click Install.
Add the following namespace at the top of the Program.cs file:
using IronPdf;
Install iText 7 Library
Once the project is created, the iText 7 library needs to be installed in the project to use it. Follow these steps to install it.
Open NuGet Package Manager, either from Solution Explorer or from the Tools menu.
Browse for the iText 7 library (NuGet package "itext7") and select it for the current project. Click Install.
Add the following namespaces at the top of the Program.cs file:
using iText.Kernel.Pdf.Canvas.Parser.Listener;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf;
Open PDF files
We are going to extract text from a sample PDF file, sample.pdf. It is a two-page PDF document.
Using iText library
Opening a PDF file with the iText library is a two-step process. First, we create a PdfReader object and pass the file location as a parameter. Then we pass that reader to the PdfDocument class to create a document instance we can read from. The code goes as follows:
// Initialize a reader instance by specifying the path of the PDF file
PdfReader pdfReader = new PdfReader("sample.pdf");
// Initialize a document instance using the PdfReader
PdfDocument pdfDoc = new PdfDocument(pdfReader);
Using IronPDF
Opening PDF files using IronPDF is easy. Use the PdfDocument class's FromFile method to open PDFs from any file location. The following one-line code opens a PDF file for reading data:
// Open a PDF file using IronPDF and create a PdfDocument instance
var pdf = PdfDocument.FromFile("sample.pdf");
Read Data from PDF files
Using iText7 library
Reading PDF data takes more work in the iText 7 library. We have to loop through the pages of the PDF document ourselves and extract the text from each page. The following source code extracts text from the PDF document page by page:
// Iterate through each page and extract text (page numbers start at 1)
for (int page = 1; page <= pdfDoc.GetNumberOfPages(); page++)
{
    // Define the text extraction strategy
    ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
    // Extract text from the current page using the strategy
    string pageContent = PdfTextExtractor.GetTextFromPage(pdfDoc.GetPage(page), strategy);
    // Output the extracted text to the console
    Console.WriteLine(pageContent);
}
// Close document and reader to release resources
pdfDoc.Close();
pdfReader.Close();
The code above does several things. Inside the loop, we first create a text extraction strategy and then use the PdfTextExtractor class's GetTextFromPage method to read the text. This method accepts two parameters: the first is the PDF document page, and the second is the strategy. To get the page, call the GetPage method on the PdfDocument instance and pass the page number as a parameter. The text is returned as a string, which is then written to the console. Finally, the PdfDocument and PdfReader objects are closed.
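One way to avoid forgetting the Close() calls is to let C# using statements dispose of the objects. The following is a minimal sketch rather than the article's original code. It assumes the iText 7 NuGet package is installed, that sample.pdf sits in the working directory, and that PdfReader and PdfDocument implement IDisposable, which should be the case in iText 7 for .NET. ITextReadSketch and ReadAllText are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Text;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Canvas.Parser;
using iText.Kernel.Pdf.Canvas.Parser.Listener;

public static class ITextReadSketch
{
    // Hypothetical helper (not part of iText): returns the text of every page as one string.
    public static string ReadAllText(string path)
    {
        var text = new StringBuilder();

        // "using" disposes the document and the reader even if extraction throws
        // (assumption: both types implement IDisposable in iText 7 for .NET)
        using (var reader = new PdfReader(path))
        using (var document = new PdfDocument(reader))
        {
            // iText page numbers start at 1
            for (int page = 1; page <= document.GetNumberOfPages(); page++)
            {
                // A new strategy per page, as in the loop shown earlier
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                text.AppendLine(PdfTextExtractor.GetTextFromPage(document.GetPage(page), strategy));
            }
        }

        return text.ToString();
    }

    public static void Main()
    {
        // "sample.pdf" is the sample file name used throughout the article
        Console.WriteLine(ReadAllText("sample.pdf"));
    }
}
```

Wrapping the loop in a helper like this keeps the cleanup in one place, so callers only deal with a file path and a string.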
Using IronPDF
Just as opening the PDF file took one line of code, reading text from a PDF file is also a one-line process. The PdfDocument class provides the ExtractAllText method to read the entire content of the PDF. Console.WriteLine is used to print the text on the screen. The code is as follows:
// Extract all text from the PDF document
string text = pdf.ExtractAllText();
// Display the extracted text
Console.WriteLine(text);
Output
The output is accurate and without any errors. However, the ExtractAllText method requires a license key, because it only works in production mode. You can get a 30-day trial license key from the IronPDF trial license page.
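If you need the text page by page, as in the iText example, IronPDF can do that too. The sketch below is illustrative only. It assumes the IronPdf NuGet package is installed, that the license key is applied through the IronPdf.License.LicenseKey property, and that the PageCount property and the ExtractTextFromPage method (which takes a zero-based page index) are available in your IronPDF version. The key shown is a placeholder, and IronPdfReadSketch is a hypothetical name. The class is fully qualified as IronPdf.PdfDocument because iText also defines a class named PdfDocument, and importing both namespaces in the same file makes the short name ambiguous.

```csharp
using System;

public static class IronPdfReadSketch
{
    public static void Main()
    {
        // Placeholder value, not a real key; replace it with your trial or commercial key
        IronPdf.License.LicenseKey = "YOUR-LICENSE-KEY";

        // Fully qualified to avoid a clash with iText.Kernel.Pdf.PdfDocument
        IronPdf.PdfDocument pdf = IronPdf.PdfDocument.FromFile("sample.pdf");

        // Assumption: IronPDF page indexes are zero-based, unlike iText's 1-based page numbers
        for (int index = 0; index < pdf.PageCount; index++)
        {
            string pageText = pdf.ExtractTextFromPage(index);
            Console.WriteLine($"--- Page {index + 1} ---");
            Console.WriteLine(pageText);
        }
    }
}
```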
Comparison
In this test, both libraries extracted the text of the sample PDF document accurately, so they are on par when it comes to accuracy. However, IronPDF needs less code, and that code is easier to read.
IronPDF takes only two lines of code to achieve the same task as iText, and it provides text extraction out of the box without any extra logic. The iText code is more involved: you have to loop over the pages yourself and close the two instances created when opening the PDF document. IronPDF, in contrast, clears the memory automatically once the task is performed.
Summary
In this article, we looked at how to read PDF documents using the iText library in C# and then compared it with IronPDF. Both libraries give accurate results and provide numerous PDF manipulation methods to work with. You can create, edit, and read data from PDF files using both of these libraries.
iText is open source and free to use under the restrictions of the AGPL license, and it can be licensed for commercial use. IronPDF is also free to use for development and can be licensed for commercial use, with a 30-day free trial available.
Download IronPDF and give it a try.
Frequently Asked Questions
What is iText 7?
iText 7 (its core library is also known as iText 7 Core) is a PDF library for working with PDF documents in .NET (C#) and Java. It is available under an open source license (AGPL) and can be licensed for commercial applications.
What is IronPDF?
IronPDF is a .NET and .NET Framework C# and Java API used for generating and manipulating PDF documents from HTML, CSS, and JavaScript. It is enriched with 50+ features to create, read, and edit PDF files.
How do iText 7 and IronPDF compare for reading PDF text?
Both iText 7 and IronPDF extracted the text of the sample PDF document accurately. However, IronPDF requires fewer lines of code to achieve the same task, and that code is easier to read.
Can I use IronPDF and iText 7 for free?
iText 7 is open source and free to use under the restrictions of the AGPL license, and it can be licensed for commercial use. IronPDF is also free to use for development and offers a 30-day free trial, with commercial licensing options available.
What are some features of iText 7?
iText 7 provides features like splitting, merging, annotating, filling forms, digitally signing PDFs, and more. It also offers an HTML to PDF converter.
What are some features of IronPDF?
IronPDF allows you to manipulate existing PDF files, split, merge, annotate, digitally sign, and more. It prioritizes speed, ease of use, and accuracy for high-quality, professional PDF outputs.
How can I install IronPDF in a C# project?
To install IronPDF in a C# project, open the NuGet Package Manager from Solution Explorer or Tools, browse for the IronPDF Library, select it for the current project, and click Install. Add 'using IronPdf;' at the top of your Program.cs file.
How can I install the iText 7 library in a C# project?
To install iText 7, open the NuGet Package Manager from Solution Explorer or Tools, browse for iText 7 Library, select it for the current project, and click install. Add the necessary namespaces like 'using iText.Kernel.Pdf;' at the top of your Program.cs file.
What is required to read a PDF using the iText 7 library?
To read a PDF using iText 7, initialize a PdfReader with the PDF file path, then create a PdfDocument instance from it. Loop through the pages and extract the text of each page with PdfTextExtractor.GetTextFromPage and a text extraction strategy, then close the PdfDocument and PdfReader to release resources.
What is the process to read a PDF using IronPDF?
Reading a PDF using IronPDF is straightforward. Use the PdfDocument class's FromFile method to open a PDF, then call the ExtractAllText method to retrieve text. Display the text using Console.WriteLine.