How to Read Data from PDF Files in ASP.NET Core

IronPDF simplifies PDF data extraction in ASP.NET Core by providing methods to read text, form data, and tables from PDF files using straightforward C# code without complex dependencies or manual parsing.

Working with PDF files in .NET Core applications can be more challenging than it appears. You might need to extract text from uploaded invoices, retrieve form data from surveys, or parse tables for your database. Many projects slow down because developers reach for overly complex libraries. IronPDF addresses this by letting you read and process PDF documents without messy dependencies or extensive custom parsing code.

Whether you're handling simple text, digital signatures, or structured data, IronPDF makes it easy. This guide shows you how to read data from PDF files in ASP.NET, handle IFormFile, work with byte arrays, and even return files to the user or render them as HTML strings. You can also integrate it into your containerized deployments, display outputs in the browser, or store them in a cloud-based database.

How Do You Set Up IronPDF in ASP.NET Core?

Getting started with IronPDF in your ASP.NET Core project is quick. Install the IronPDF NuGet package via the NuGet Package Manager Console with the following command:

Install-Package IronPdf

Or through the .NET CLI:

dotnet add package IronPdf

Once installed, add the IronPDF namespace to your Program class, controller, or services:

using IronPdf;

For detailed installation options, including Docker deployment, Azure setup, and Linux compatibility, check the complete documentation. The library runs in containerized environments with minimal configuration, which makes it well suited to microservices architectures. You can also configure it for AWS Lambda environments, Windows servers, or macOS systems. The installation overview provides platform-specific guidance, while advanced NuGet options cover enterprise deployment scenarios.

How Can You Extract Text from PDF Files?

IronPDF's ExtractAllText method provides access to all text content within a PDF document. This method handles various text encodings and maintains the reading order of the original document, ensuring accurate data extraction from PDF files in ASP.NET Core applications. The extraction process is thread-safe and optimized for high-performance scenarios. It supports UTF-8 encoding for international languages.

// Load a PDF document
var pdf = PdfDocument.FromFile("document.pdf");
// Extract all text
string allText = pdf.ExtractAllText();
// Extract text from specific page (0-indexed)
string pageText = pdf.ExtractTextFromPage(0); // index 0 is the first page

The ExtractAllText method returns a string containing all readable text from the PDF, preserving line breaks and spacing. For page-specific extraction, ExtractTextFromPage targets individual pages using zero-based indexing. This approach also works with encrypted PDFs when you provide the correct password. You can also parse specific regions or work with PDF DOM objects for more granular control.

What's the Best Way to Implement Text Extraction in a Controller?

Here's a practical ASP.NET Core controller that reads data from an uploaded PDF file in memory, without writing temporary files:

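using System.IO;
using IronPdf;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/[controller]")]
public class PdfController : ControllerBase
{
    [HttpPost("extract-text")]
    public IActionResult ExtractText(IFormFile pdfFile)
    {
        // Reject missing or empty uploads before loading anything
        if (pdfFile == null || pdfFile.Length == 0)
            return BadRequest("No PDF file uploaded.");
        // Buffer the upload in memory; no temporary file is written
        using var stream = new MemoryStream();
        pdfFile.CopyTo(stream);
        // Load the PDF from the buffered bytes and extract its text
        var pdf = new PdfDocument(stream.ToArray());
        var extractedText = pdf.ExtractAllText();
        return Ok(new { text = extractedText });
    }
}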

This sample handles uploaded PDF files in memory. The IFormFile parameter works with Razor Pages or MVC controllers, and the MemoryStream avoids creating temporary files on disk. You can download, save, or process the extracted text for database storage, report generation, or display in the browser. Consider implementing async patterns for better scalability and custom logging for monitoring extraction operations. For Blazor Server applications, the same approach applies with minor adjustments to the component model.

How Do You Read PDF Form Data?

PDF forms contain interactive fields that users fill out. IronPDF simplifies extracting this form data through its complete forms API, supporting all standard AcroForm field types. You can easily extract all form field data, including text boxes, checkboxes, and content type details. The library handles digital signatures and form validation automatically. It also supports PDF/A compliance for archival requirements and Section 508 accessibility standards.

The response can then be saved to a database, returned to the user, or integrated into your ASP.NET application workflow. For Azure deployments, consider using blob storage for processed form data. The following code demonstrates how to do just this:

[HttpPost("extract-form")]
public IActionResult ExtractForm([FromForm] IFormFile pdfFile)
{
    if (pdfFile == null || pdfFile.Length == 0)
    {
        return BadRequest("No PDF file uploaded.");
    }
    using var stream = new MemoryStream();
    pdfFile.CopyTo(stream);
    var pdf = new PdfDocument(stream.ToArray());
    var formData = new Dictionary<string, string>();
    if (pdf.Form != null)
    {
        foreach (var field in pdf.Form)
        {
            formData[field.Name] = field.Value;
        }
    }
    return Ok(new { formFields = formData });
}

The ExtractForm endpoint uses the Form property of PdfDocument to read interactive fields from an uploaded PDF. Each field has a Name and Value, which are collected into a dictionary and returned as JSON. This makes it easy to capture data from text boxes, checkboxes, and other inputs, allowing PDF form submissions to be processed and integrated directly into your applications or databases. For custom logging of form processing events, integrate with your preferred logging framework. You can also flatten forms to prevent further editing or add new form fields programmatically.

Why Does Form Extraction Return JSON Format?

API response showing JSON data extracted from a PDF form with Name, Email, and Address fields displayed in Postman testing interface with 200 OK status

JSON keeps the response compatible with modern web APIs and microservices architectures. It works with RESTful services, message queues, and cloud storage solutions, and its lightweight structure minimizes network overhead in distributed systems. It is also easy to consume from AJAX requests and Angular applications, and it can be passed on to services such as OpenAI integrations for intelligent document processing.

How Can You Extract Table Data from PDF Documents?

Tables in PDFs are essentially formatted text, so extracting structured data requires parsing logic. IronPDF extracts the text while preserving spacing, which you can then parse to read data from PDF files in ASP.NET. For complex tables, consider using DOM object access to navigate the document structure programmatically. The library handles multi-column layouts and preserves font formatting during extraction:

[HttpPost("extract-table")]
public IActionResult ExtractTable([FromForm] IFormFile pdfFile)
{
    if (pdfFile == null || pdfFile.Length == 0)
        return BadRequest("No PDF file uploaded.");
    using var memoryStream = new MemoryStream();
    pdfFile.CopyTo(memoryStream);
    // Load PDF from byte array
    var pdf = new PdfDocument(memoryStream.ToArray());
    // Extract all text
    string text = pdf.ExtractAllText();
    // Split text into lines (rows)
    string[] lines = text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
    var tableData = new List<string[]>();
    foreach (string line in lines)
    {
        // Split line into columns using tab character
        string[] columns = line
            .Split('\t')
            .Where(c => !string.IsNullOrWhiteSpace(c))
            .ToArray();
        if (columns.Length > 0)
            tableData.Add(columns);
    }
    var table = tableData.Select(r => string.Join(" | ", r)).ToList();
    return Ok(new { Table = table });
}

This code extracts the text, treats each line as a potential table row, and splits each row into columns on tab characters. For more complex tables, you might need to identify table boundaries using keywords or implement more sophisticated parsing logic based on your specific PDF structure. Consider using parallel processing for large documents with multiple tables. You can also convert tables to HTML for easier manipulation or export to Excel for further analysis.
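As one possible direction for more sophisticated parsing, the sketch below treats either a tab or a run of two or more whitespace characters as a column gap, so cells that contain single spaces (such as "Blue Widget") stay intact. It is plain C# with no IronPDF calls; the names TableTextParser and ParseRows are hypothetical, and the two-space threshold is an assumption you would tune against your own documents.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TableTextParser
{
    // Assumption: a column gap is two or more whitespace characters, or a single tab.
    private static readonly Regex ColumnGap =
        new Regex(@"\s{2,}|\t", RegexOptions.Compiled);

    // Splits extracted text into rows, and each row into trimmed, non-empty cells.
    public static List<string[]> ParseRows(string text)
    {
        var rows = new List<string[]>();
        if (string.IsNullOrEmpty(text))
            return rows;

        var lines = text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var line in lines)
        {
            var cells = ColumnGap.Split(line.Trim())
                .Select(c => c.Trim())
                .Where(c => c.Length > 0)
                .ToArray();

            // Skip lines that contain only whitespace
            if (cells.Length > 0)
                rows.Add(cells);
        }
        return rows;
    }
}
```

You could pass the string returned by ExtractAllText straight into ParseRows. This remains a heuristic: a table whose columns are separated by a single space would still need layout-aware handling.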

This output can be downloaded, displayed in the browser, or processed for additional information. You can integrate CSS formatting or HTML string rendering to display tables dynamically in your solution. For high-performance scenarios, cache parsed table data to avoid repeated processing. Consider compression techniques to reduce file sizes when storing extracted data.
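To illustrate the caching suggestion, here is a minimal in-memory sketch keyed by the SHA-256 hash of the uploaded bytes, so an identical file is parsed only once. The class name ParsedPdfCache and its members are hypothetical, it assumes .NET 5 or later (for SHA256.HashData and Convert.ToHexString), and it has no eviction, so a production service would more likely use a bounded or distributed cache.

```csharp
using System;
using System.Collections.Concurrent;
using System.Security.Cryptography;

public sealed class ParsedPdfCache<TResult>
{
    private readonly ConcurrentDictionary<string, TResult> _entries =
        new ConcurrentDictionary<string, TResult>();

    // Number of distinct documents currently cached
    public int Count => _entries.Count;

    // Returns the cached result for these bytes, or runs parse and stores its result.
    // Note: under contention the parse delegate may run more than once for one key.
    public TResult GetOrAdd(byte[] pdfBytes, Func<byte[], TResult> parse)
    {
        if (pdfBytes == null) throw new ArgumentNullException(nameof(pdfBytes));
        if (parse == null) throw new ArgumentNullException(nameof(parse));

        // Content hash as the key: same bytes, same cache entry
        string key = Convert.ToHexString(SHA256.HashData(pdfBytes));
        return _entries.GetOrAdd(key, _ => parse(pdfBytes));
    }
}
```

In a controller, the parse delegate would wrap the load-and-extract steps shown earlier, and the cache instance would be registered as a singleton.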

When Should You Parse Tables Manually vs Using Built-in Methods?

API response displaying structured invoice data extracted from PDF including customer details, invoice metadata, and itemized products with pricing in JSON format

Manual parsing provides flexibility for non-standard table formats, while built-in methods offer better performance for standard layouts. Choose manual parsing when dealing with merged cells, nested tables, or custom formatting. Use built-in extraction for standard tabular data with consistent column spacing. For complex layouts, consider preprocessing with JavaScript or using custom rendering options. The Chrome rendering engine ensures accurate text positioning for most table formats.

How Do You Handle Uploaded PDF Files in ASP.NET Core?

Processing uploaded PDFs requires converting the IFormFile to a format IronPDF can read. This approach works smoothly with Razor Pages and MVC controllers. For containerized applications, ensure proper memory allocation settings. The process supports large files and batch operations:

[HttpPost("process-upload")]
public async Task<IActionResult> ProcessPdf([FromForm] IFormFile file)
{
    if (file == null || file.Length == 0)
        return BadRequest("No PDF file uploaded.");
    using var ms = new MemoryStream();
    await file.CopyToAsync(ms);
    // Load PDF from byte array
    var pdf = new PdfDocument(ms.ToArray());
    // Extract text and page count
    var text = pdf.ExtractAllText();
    var pageCount = pdf.PageCount;
    return Ok(new
    {
        text = text,
        pages = pageCount
    });
}

This asynchronous action copies the upload without blocking a request thread, which improves scalability in cloud deployments. To let users download a processed PDF, return its bytes with the controller's File method and a download file name, which sets the Content-Disposition header. For additional security, consider implementing file validation before processing. You can also apply custom watermarks or digital signatures during processing. For MAUI applications, similar patterns apply with platform-specific adjustments.
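A cheap first step for file validation is to check the size and the "%PDF-" header that PDF files start with, before handing the bytes to any parser. The sketch below is plain C#; PdfUploadValidator, LooksLikePdf, and the maxBytes parameter are hypothetical names, and a header check only filters obvious mismatches. It does not prove the file is a well-formed or safe PDF.

```csharp
using System;
using System.IO;
using System.Text;

public static class PdfUploadValidator
{
    // PDF files begin with this ASCII marker, followed by a version number
    private static readonly byte[] PdfHeader = Encoding.ASCII.GetBytes("%PDF-");

    // Returns true when the stream is within the size limit and starts with "%PDF-".
    // The stream position is restored before returning.
    public static bool LooksLikePdf(Stream stream, long maxBytes)
    {
        if (stream == null || !stream.CanSeek)
            return false;
        if (stream.Length < PdfHeader.Length || stream.Length > maxBytes)
            return false;

        long originalPosition = stream.Position;
        try
        {
            stream.Position = 0;
            var buffer = new byte[PdfHeader.Length];
            int read = 0;
            // Read may return fewer bytes than requested, so loop until filled
            while (read < buffer.Length)
            {
                int n = stream.Read(buffer, read, buffer.Length - read);
                if (n == 0)
                    return false;
                read += n;
            }

            for (int i = 0; i < PdfHeader.Length; i++)
            {
                if (buffer[i] != PdfHeader[i])
                    return false;
            }
            return true;
        }
        finally
        {
            stream.Position = originalPosition;
        }
    }
}
```

In the upload action, this could run against the MemoryStream right after the copy, returning BadRequest when it fails.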

How Can You Improve File Upload Performance?

Implement streaming uploads for large files to reduce memory usage. Configure appropriate request size limits in your IIS settings or Kestrel configuration. For AWS Lambda deployments, consider using pre-signed S3 URLs for direct uploads, bypassing your API entirely. Use render delays for JavaScript-heavy content and custom timeouts for large documents. Enable linearization for faster web viewing and implement progressive rendering for better user experience. Consider IronPDF.Slim for reduced deployment sizes in serverless environments.
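For the request size limits mentioned above, a Program.cs along these lines is one way to raise the Kestrel and multipart form limits together. It assumes the minimal hosting model (.NET 6 or later) with implicit usings enabled; the 50 MB ceiling is an arbitrary illustrative value, not a recommendation.

```csharp
using Microsoft.AspNetCore.Http.Features;

var builder = WebApplication.CreateBuilder(args);

// Assumed ceiling for uploads; pick a value that fits your own workload
const long maxUploadBytes = 50L * 1024 * 1024;

// Kestrel rejects request bodies larger than this
builder.WebHost.ConfigureKestrel(options =>
{
    options.Limits.MaxRequestBodySize = maxUploadBytes;
});

// Multipart form parsing (used for IFormFile) has its own, separate limit
builder.Services.Configure<FormOptions>(options =>
{
    options.MultipartBodyLengthLimit = maxUploadBytes;
});

builder.Services.AddControllers();

var app = builder.Build();
app.MapControllers();
app.Run();
```

When hosting behind IIS, the IIS request filtering limit applies as well and is configured separately.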

What Are the Next Steps for PDF Data Extraction?

IronPDF makes it easy to read, extract, process, and save PDF documents in ASP.NET Core and other .NET Core applications. Whether you're working with forms, tables, plain text, or digital signatures, this .NET library reduces tasks that normally take hours to just a few lines of code. You can create, convert, access, and display outputs in HTML, browser, or even image formats. The library supports PDF/A compliance for long-term archiving and Section 508 standards for accessibility.

For production deployments, consider implementing health check endpoints to monitor PDF processing services. Use custom logging to track extraction performance and errors. Implement retry policies for handling transient failures in distributed systems. Configure rendering options for optimal performance and implement caching strategies for frequently accessed documents. The library integrates well with CI/CD pipelines and supports headless rendering for server environments.
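As a rough illustration of a retry policy, the helper below retries an asynchronous operation with exponential backoff when it fails with an exception treated as transient. Everything here is a hypothetical sketch: the Retry class, its defaults, and the choice of IOException and TimeoutException as "transient" are assumptions, and a dedicated resilience library may serve production code better.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class Retry
{
    // Runs operation, retrying transient failures up to maxAttempts times in total.
    public static async Task<T> WithBackoffAsync<T>(
        Func<Task<T>> operation,
        int maxAttempts = 3,
        int baseDelayMs = 200,
        CancellationToken cancellationToken = default)
    {
        if (operation == null) throw new ArgumentNullException(nameof(operation));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception ex) when (attempt < maxAttempts && IsTransient(ex))
            {
                // Exponential backoff: baseDelayMs, then 2x, then 4x, and so on
                int delayMs = baseDelayMs * (1 << (attempt - 1));
                await Task.Delay(delayMs, cancellationToken);
            }
        }
    }

    // Assumption: only I/O and timeout failures are worth retrying
    private static bool IsTransient(Exception ex) =>
        ex is IOException || ex is TimeoutException;
}
```

A call such as fetching PDF bytes from remote storage could be wrapped in WithBackoffAsync, while non-transient errors (for example, a corrupt file) surface immediately.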

Start with a free trial to explore IronPDF's full capabilities in your ASP.NET Core projects. You can build and test your PDF extraction workflows before committing to a license. For production, IronPDF offers flexible licensing options suitable for solo developers or large teams. The library supports containerized deployments and provides complete documentation for teams. Using IronPDF is one of the fastest ways to handle PDF files in ASP.NET Core without the usual deployment headaches. Check out the quickstart guide for immediate implementation or explore advanced features like OCR capabilities and barcode generation for complete document processing solutions.

Frequently Asked Questions

What challenges can arise when working with PDF files in .NET Core applications?

Working with PDF files in .NET Core can be tricky due to the need to extract text, grab form data, or parse tables without overly complex libraries.

How can IronPDF help simplify reading data from PDF files in ASP.NET?

IronPDF simplifies reading and processing PDF documents by eliminating the need for messy dependencies or extensive custom parsing code.

Why is it important to avoid overly complex libraries when handling PDFs?

Using overly complex libraries can slow down projects and increase development time, whereas simpler solutions like IronPDF streamline the process.

What types of data can IronPDF extract from PDF files?

IronPDF can extract text, form data, and tables from PDF files, making it versatile for various data handling needs.

Can IronPDF be used to process uploaded invoices in ASP.NET applications?

Yes, IronPDF can efficiently read and process text from uploaded invoices in ASP.NET applications.

Is it necessary to write custom parsing code when using IronPDF?

No, IronPDF allows you to process PDF documents without the need for extensive custom parsing code.

What are the benefits of using IronPDF in .NET Core applications?

IronPDF provides a straightforward way to read and process PDF files, enhancing data handling capabilities without complex dependencies.

Is IronPDF fully compatible with .NET 10?

Yes. IronPDF is designed to be fully compatible with .NET 10 (as well as .NET 9, 8, 7, 6, 5, Core, Standard, and Framework 4.6.2+), ensuring that you can run all its PDF reading and writing features without special workarounds on the latest .NET platform.

Does IronPDF support the latest APIs in .NET 10 for reading streamed PDF content?

Yes. In .NET 10, IronPDF can process PDF data from byte arrays or memory streams, using APIs like Stream and MemoryStream, so you can read PDFs without saving temporary files. This makes it suitable for high-performance server scenarios and for uploading or processing PDF data in web APIs.

Curtis Chau
Technical Writer

Curtis Chau holds a Bachelor’s degree in Computer Science (Carleton University) and specializes in front-end development with expertise in Node.js, TypeScript, JavaScript, and React. Passionate about crafting intuitive and aesthetically pleasing user interfaces, Curtis enjoys working with modern frameworks and creating well-structured, visually appealing manuals.
