USING IRONPDF

How to Read PDF Files in C#

Programmatic PDF processing is crucial in industries like finance, healthcare, legal, and education, where critical information needs to be processed, analyzed, and extracted from PDF documents for purposes such as data analysis, document management, and automation. Despite its importance, this task can be challenging.

IronPDF: A C# PDF Library

IronPDF simplifies PDF tasks that are tedious to implement by hand. It lets you edit the text in a PDF document much as you would edit a plain text file, while letting you export files on any operating system. The library covers the complete process of viewing, modifying, and extracting content from a PDF.

Take the Right Step with IronPDF

With IronPDF, text can be read from and written to PDF files quickly on any computer. Installation is simple, and IronPDF is free to download for development. The library provides extensive functionality for working with PDFs, so it is worth taking time to explore its classes. Several C# examples that create a PDF from HTML are also available and show how to produce well-formed output.

Read PDF Files using IronPDF

Step 1: Install the IronPDF Package

To begin, you will need to install the IronPDF NuGet package into your .NET project. You can do this by opening the Package Manager Console in Visual Studio and entering the following command:

Install-Package IronPdf

Step 2: Import the IronPDF Library

Next, you need to import the IronPDF library into your code by adding the following statement at the top of your file:

using IronPdf;

Step 3: Load the PDF Document

Once you have imported the IronPDF library, you can load a PDF document into your code by using the following code:

// Load the PDF document from file path
PdfDocument pdf = PdfDocument.FromFile(@"C:\dotnet.pdf");

// Define the output path for the saved PDF
var outputPath = "Example.pdf";

// Save the PDF document to the specified output path
pdf.SaveAs(outputPath);
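A slightly more defensive variant of the loading step is sketched below. It checks that the file exists before handing the path to IronPDF and then reports how many pages were loaded. The path is the same illustrative one used above, and the sketch assumes the PageCount property is available in your installed IronPDF version.

```csharp
using System;
using System.IO;
using IronPdf;

// Illustrative input path, reused from the example above
string inputPath = @"C:\dotnet.pdf";

// Fail early with a readable message instead of an exception from the loader
if (!File.Exists(inputPath))
{
    Console.Error.WriteLine($"PDF not found: {inputPath}");
    return;
}

// Load the document and report its size in pages
PdfDocument pdf = PdfDocument.FromFile(inputPath);
Console.WriteLine($"Loaded {pdf.PageCount} page(s) from {inputPath}");
```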

Step 4: Extract Text from the PDF

IronPDF provides a range of methods to extract text from an existing PDF file. For example, you can begin extracting text from a PDF and print it on the console by using the following code snippet:

// Extract text from the loaded PDF document
string text = pdf.ExtractText();

// Print the extracted text to the console
Console.WriteLine(text);

Using the above code, you can extract text from a PDF file.
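When only some pages matter, extracting text page by page can be more convenient than pulling the whole document at once. The sketch below assumes that the PageCount property and the ExtractTextFromPage method (which takes a zero-based page index) are available in your installed IronPDF version. It reuses the illustrative C:\dotnet.pdf path from Step 3.

```csharp
using System;
using IronPdf;

// Load the same sample document used in Step 3 (path is illustrative)
PdfDocument pdf = PdfDocument.FromFile(@"C:\dotnet.pdf");

// Walk the pages by zero-based index and print each page's text separately
for (int pageIndex = 0; pageIndex < pdf.PageCount; pageIndex++)
{
    string pageText = pdf.ExtractTextFromPage(pageIndex);

    // Show a human-friendly, one-based page number in the header
    Console.WriteLine($"--- Page {pageIndex + 1} ---");
    Console.WriteLine(pageText);
}
```

Keeping the text separated by page makes it easier to skip cover pages or to record where a given piece of data was found.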

Figure 1: Extracting Text from a PDF Using IronPDF

Step 5: Rasterize a PDF to Images

Let's rasterize the PDF file to images using IronPDF. First, import the required libraries:

using System.Linq;
using IronPdf;
using IronSoftware.Drawing;

The code then uses the RasterizeToImageFiles method to extract all the pages of the PDF document to a folder as image files. The extracted images can be saved as either PNG or JPG files, and the dimensions and page ranges of the images can also be specified.

// Extract all pages to a folder as image files with PNG format
pdf.RasterizeToImageFiles(@"C:\image\folder\*.png");

// Extract all pages to JPG images with specified dimensions
pdf.RasterizeToImageFiles(@"C:\image\folder\example_pdf_image_*.jpg", 100, 80);

Finally, the code uses the ToBitmap method to extract all pages of the PDF document as AnyBitmap objects, which can be processed and manipulated further within the code.

// Extract all pages as AnyBitmap objects for further processing
AnyBitmap[] pdfBitmaps = pdf.ToBitmap();

The above code demonstrates how to extract the contents of a PDF file using IronPDF and save the extracted data as image files or AnyBitmap objects for further processing.
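As a rough illustration of the "further processing" mentioned above, the sketch below loops over the bitmaps returned by ToBitmap, prints each page's pixel dimensions, and saves each page under its own file name. It assumes AnyBitmap exposes Width, Height, and SaveAs(string) in your installed IronSoftware.Drawing version; the page_N.png output names are made up for this example.

```csharp
using System;
using IronPdf;
using IronSoftware.Drawing;

// Load the sample document (path is illustrative) and render every page
PdfDocument pdf = PdfDocument.FromFile(@"C:\dotnet.pdf");
AnyBitmap[] pageImages = pdf.ToBitmap();

for (int i = 0; i < pageImages.Length; i++)
{
    AnyBitmap pageImage = pageImages[i];

    // Report the rendered size of each page
    Console.WriteLine($"Page {i + 1}: {pageImage.Width} x {pageImage.Height} pixels");

    // Save the page under a hypothetical file name in the working directory
    pageImage.SaveAs($"page_{i + 1}.png");
}
```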

Step 6: Manipulate PDF Pages

Let's learn how to manipulate the pages of a PDF document by working with IronPDF.

The code first removes pages two and three from the PDF document using the RemovePages method:

// Remove pages two and three from the PDF document
pdf.RemovePages(1, 2);

The RemovePages method takes two arguments: the zero-based index of the first page to remove (here 1, which is page two because indexing starts at 0) and the zero-based index of the last page to remove (here 2, which is page three).
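Off-by-one mistakes are easy to make when translating the page numbers a reader sees into the zero-based indexes used in the call above. A minimal sketch of a conversion helper follows. PageRange and ToZeroBasedIndexes are hypothetical names, not part of IronPDF, and the commented-out call assumes that the second argument of RemovePages is the index of the last page to remove, as described in IronPDF's API reference.

```csharp
using System;

// Convert "pages 2 to 3", as a reader would say it, into zero-based indexes
var (startIndex, endIndex) = PageRange.ToZeroBasedIndexes(firstPage: 2, lastPage: 3);
Console.WriteLine($"{startIndex}, {endIndex}"); // prints "1, 2"

// With a PdfDocument loaded, the result could then be passed on, for example:
// pdf.RemovePages(startIndex, endIndex);

// Hypothetical helper, not part of IronPDF
static class PageRange
{
    // Maps one-based page numbers to zero-based indexes, validating the range
    public static (int StartIndex, int EndIndex) ToZeroBasedIndexes(int firstPage, int lastPage)
    {
        if (firstPage < 1)
            throw new ArgumentOutOfRangeException(nameof(firstPage), "Page numbers start at 1.");
        if (lastPage < firstPage)
            throw new ArgumentOutOfRangeException(nameof(lastPage), "The last page cannot come before the first page.");

        return (firstPage - 1, lastPage - 1);
    }
}
```

Keeping the conversion in one place means the rest of the code can use the same page numbers that appear in a PDF viewer.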

Step 7: Save the PDF

Finally, you can save the PDF file to your local system using the SaveAs method. The code for saving the PDF file is as follows:

// Save the PDF document to a specified output path
pdf.SaveAs(outputPath);

IronPDF Compatibility

IronPDF is compatible with the latest .NET versions, including .NET 7. It also supports .NET Blazor and .NET MAUI, Microsoft's frameworks for web UIs and cross-platform applications respectively. This compatibility lets developers integrate IronPDF into their applications and use its features without extra work.

One of the main features of IronPDF is its ability to read PDF files in .NET Blazor and .NET MAUI. This lets developers read and extract data from PDF files and use it in their .NET applications, which is especially helpful when working with a large volume of data. Developers don't need any other library to use IronPDF in a .NET project.

Get more information about IronPDF working with .NET Blazor in this tutorial and learn about integrating IronPDF with .NET MAUI on IronPDF's website.

Conclusion

In conclusion, reading PDF files programmatically is crucial in various industries. IronPDF handles this task with extensive functionality to read, modify, and extract content from a PDF file, and it can be installed and used in a few simple steps.

The library offers methods to extract text from PDF documents, rasterize a PDF to images, manipulate pages, and save PDF files. It is suitable both for developers who are new to programmatic PDF processing and for experienced developers.

If you are looking for a reliable and efficient solution for reading PDF files in C#, IronPDF is worth exploring. License options and pricing information are published, and a free trial is available. The image below shows the plans IronPDF offers, so you can select the package that matches your needs.

Figure 2: IronPDF Licensing Prices

Frequently Asked Questions

How can I read PDF files in C#?

You can use IronPDF by first installing it through the NuGet package manager in your .NET project. Then import the library and use it to load and read PDF documents, extracting text and displaying it in the console.

Which industries benefit from programmatic PDF processing?

Industries such as finance, healthcare, legal, and education benefit significantly from programmatic PDF processing, since it enables efficient data analysis, document management, and task automation with tools like IronPDF.

How do I extract data from a PDF document using C#?

Using IronPDF, you can extract data from a PDF document by loading the PDF and using methods such as ExtractText to read and process the content programmatically.

Can I convert PDF files to images in C#?

Yes, with IronPDF you can convert PDF files to images using the RasterizeToImageFiles method, which lets you save pages as image files in formats such as PNG or JPG.

Is IronPDF compatible with the latest .NET frameworks?

IronPDF is compatible with all the latest .NET frameworks, including .NET 7. It also supports .NET Blazor and .NET MAUI, allowing integration into various types of applications.

How can I modify and save a PDF file using C#?

After modifying a PDF file with IronPDF, you can save the changes using the SaveAs method, specifying the output path for the modified document.

What steps are involved in using a PDF library in a .NET project?

To use IronPDF in a .NET project, install the library through NuGet, import it into your project, and then use its functionality to load, read, and manipulate PDF documents programmatically.

Does IronPDF require other libraries for PDF processing in .NET?

No, IronPDF is a standalone library that requires no additional libraries, which makes it easy to integrate into your .NET project for comprehensive PDF processing.

What are the key features of IronPDF for PDF processing?

IronPDF offers features such as text extraction, rasterization of PDFs to images, page manipulation, and compatibility with the latest .NET frameworks, making it a powerful tool for handling PDF files in C#.

Is IronPDF fully compatible with .NET 10?

Yes, IronPDF supports .NET 10 (and earlier versions such as .NET 9, 8, 7, and 6) out of the box. You can build applications with IronPDF on .NET 10 without special configuration or workarounds.

Curtis Chau
Technical Writer

Curtis Chau holds a bachelor's degree in Computer Science (Carleton University) and specializes in front-end development, with experience in Node.js, TypeScript, JavaScript, and React. Passionate about building intuitive and aesthetically pleasing user interfaces, he enjoys working with modern frameworks and creating well ...