C# Convert PDF to Text (Code Example)

A PDF file is an electronic document that mirrors a paper document. It is very popular for publications as it preserves the look of the original document. PDF files are usually created for documents that have formatting, layout, and other features that are critical to the information being conveyed.

PDF files can contain many kinds of content, such as images and vector graphics, and not all of it can be represented as text. Extracting the plain text into a TXT file makes the content easier to search, index, and process. This article explains how .NET developers can convert PDF files to TXT files with IronPDF.

Topics Covered in Tutorial

  • IronPDF
  • Steps to Convert PDF Document to Text File
  • Step 1: Create a C# Project
  • Step 2: Install the IronPDF Library
    • Method 1: NuGet Package Manager
    • Method 2: NuGet Package Manager Console
    • Method 3: Using the DLL File
  • Step 3: Add the IronPDF Namespace
  • Step 4: Convert PDF to Text
    • Output
  • Conclusion

IronPDF

IronPDF is a .NET library for creating, editing, and reading PDF documents. It is a native .NET library and does not rely on external DLLs or other tools. IronPDF is a cross-platform library written in C# that covers the common tasks of working with PDF documents, including document rendering, form editing, text extraction, and file encryption. All of these operations are available through an intuitive API that has been extensively tested with application types such as Windows Forms, WPF, and ASP.NET MVC.

The current release of IronPDF provides simple access to these features:

  • PDF generation
  • PDF encryption
  • PDF form fields with templates and validation logic
  • Text extraction from PDFs
  • Fillable forms
  • Read-only access to the document internals (view, navigation, annotations)
  • Access to the metadata and document properties (title, keywords, author, subject)

IronPDF also includes helper classes for converting HTML documents into fully formatted PDF files. For text extraction, it provides a simple API that converts a PDF document to plain text.

Let's see how to use the IronPDF library to convert a PDF document to a text file.

Steps to Convert PDF Document to Text File

We will create a C# project in Visual Studio. You can choose any C# application template according to your needs. For this tutorial, I will use the Console Application template. You can also use an existing C# project to convert PDF files to TXT files.

Step 1: Create a C# Project

Create a C# project in Visual Studio or open an existing project. I am using Visual Studio 2022 for this tutorial, and using the latest version is recommended. Follow these steps to create a C# project in Visual Studio 2022:

  1. Open Visual Studio 2022.
  2. Select the C# Console Application template or open an existing project.
  3. Give the project a meaningful name.
  4. Select .NET 6.0 as the target framework. It is a long-term support (LTS) release, but you can choose any other framework version according to your needs.

Step 2: Install the IronPDF Library

IronPDF has a simple installation process, which speeds up development and avoids confusion. You can install the IronPDF library in several ways:

  • Using NuGet Package Manager
  • Using NuGet Package Manager Console
  • By DLL File

Method 1: NuGet Package Manager

To use the NuGet Package Manager, follow these steps:

Open the C# project and click Tools > NuGet Package Manager > Manage NuGet Packages for Solution.

  • Now go to the Browse tab and search for IronPDF.
  • Select IronPDF from the search results and click Install. This installs the IronPDF library and adds a reference to it in your project.

Method 2: NuGet Package Manager Console

Installing the IronPDF library from the Package Manager Console is the quickest option. Follow these steps:

  • Open the Package Manager Console (Tools > NuGet Package Manager > Package Manager Console).
  • Type the following command in the console and press Enter to install the IronPDF library.
Install-Package IronPDF

Method 3: Using a DLL file

You can download the DLL file from this link. Once the download is complete, add it to your project as a reference.

For a more detailed installation guide, visit this link.

Installation for the IronPDF library is now complete. The following steps will guide you to convert a PDF file into a text file.

Step 3: Add the IronPDF Namespace

To use IronPDF in the project, add the IronPdf namespace to every code file that uses it. Write the following line at the top of each of those files to make IronPDF's features available in your program. Note that the C# namespace is spelled IronPdf, and namespaces are case-sensitive.

using IronPdf;

Step 4: Convert PDF Document to a Text file

Next, extract the text from the PDF file. Add the following sample code to your code file:

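using System;
using IronPdf;

// Extract the text content of a PDF document

// Open a 128-bit encrypted PDF by passing its password.
// For an unprotected file, call PdfDocument.FromFile("example.pdf") without a password.
using PdfDocument pdf = PdfDocument.FromFile("encrypted.pdf", "password");

// Get all of the document's text as one string (for example, to put in a search index)
string AllText = pdf.ExtractAllText();

// Print the extracted text to the console
Console.WriteLine(AllText);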

In the code above, we first open the PDF document with the PdfDocument.FromFile method, passing the file name and, for a protected file, its password. We then call the ExtractAllText method, which returns all of the text in the document as a single string that we store in a variable named AllText. Finally, we print the text to the console.
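The sample prints the text to the console, while the goal of this tutorial is a TXT file. The following is a minimal sketch, assuming IronPDF is installed as described in Step 2, that extracts the text page by page with the PdfDocument members PageCount and ExtractTextFromPage (zero-based page index) and writes the result with File.WriteAllText. The class PdfToTxt, its methods JoinPages and ExportToTxt, and the "--- Page N ---" marker format are hypothetical names chosen for illustration; they are not part of IronPDF's API.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using IronPdf;

// Hypothetical helper class; only PdfDocument and its members come from IronPDF.
public static class PdfToTxt
{
    // Joins per-page text into one string, preceding each page with a marker line.
    // "\n" is used explicitly so the output is identical on every operating system.
    public static string JoinPages(IReadOnlyList<string> pages)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < pages.Count; i++)
        {
            sb.Append("--- Page ").Append(i + 1).Append(" ---\n");
            sb.Append(pages[i]).Append('\n');
        }
        return sb.ToString();
    }

    // Opens a PDF (the password may be left empty for unprotected files),
    // extracts the text of every page, and writes it to a UTF-8 text file.
    public static void ExportToTxt(string pdfPath, string txtPath, string password = "")
    {
        using PdfDocument pdf = PdfDocument.FromFile(pdfPath, password);

        var pages = new List<string>(pdf.PageCount);
        for (int i = 0; i < pdf.PageCount; i++)
        {
            // Page indexes are zero-based
            pages.Add(pdf.ExtractTextFromPage(i));
        }

        File.WriteAllText(txtPath, JoinPages(pages), Encoding.UTF8);
    }
}
```

For example, calling PdfToTxt.ExportToTxt("input.pdf", "output.txt") from Program.cs would write the text of the hypothetical input.pdf to output.txt. If page markers are not needed, File.WriteAllText("output.txt", pdf.ExtractAllText()) is enough; extracting page by page is useful when you want to keep track of where each piece of text came from.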

Output

The console shows the text extracted by IronPDF. It matches the text visible in the PDF file, which shows that the extraction is accurate.

Conclusion

In this article, we have seen how to extract text from PDF files using the IronPDF .NET PDF library. It takes only a few lines of code, and the extracted text closely matches the content of the PDF. IronPDF also offers many other useful features, such as HTML to PDF conversion, PDF formatting tools, and other core features needed for PDF editing. In addition, IronPDF does not depend on Adobe Acrobat.

IronPDF is free for development purposes and also offers a 30-day trial for production testing. It offers a variety of pricing plans, from individual developers to large companies, so you can choose one according to your needs. IronPDF costs significantly less than many of its competitors, which, together with its performance, makes it a compelling purchase.

In addition, Iron Software offers its suite of five software packages for the price of just two. Get more details from this link.

You can download the software product from this link.