How to Read Data from a PDF in ASP.NET Core

Updated: September 21, 2025

Working with PDF files in .NET Core applications can be trickier than it seems. You might need to extract text from uploaded invoices, grab form data from surveys, or parse tables for your database. I’ve seen plenty of projects slowed down because developers used overly complex libraries. That’s where IronPDF comes in. It lets you read and process PDF documents without wrestling with messy dependencies or writing tons of custom parsing code.

Whether you’re handling simple text, digital signatures, or structured data, IronPDF makes it easy. This guide shows you how to read data from a PDF file in ASP.NET Core: extracting text, reading form fields, parsing tables, and handling IFormFile uploads as byte arrays. The results can be returned to the user, displayed in the browser, or stored in a database.

How Do You Set Up IronPDF in ASP.NET Core?

Getting started with IronPDF in your ASP.NET Core project takes just minutes. Install the IronPDF NuGet package via the NuGet Package Manager Console with the following command:

Install-Package IronPdf

Or through the .NET CLI:

dotnet add package IronPdf

Once installed, add the IronPdf namespace to Program.cs, your controllers, or your services:

using IronPdf;

For detailed installation options including Docker deployment, Azure setup, and additional information, check the comprehensive documentation.

How Can You Extract Text from PDF Files?

IronPDF's ExtractAllText method returns all of the text content in a PDF document in a single call. It handles various text encodings and maintains the reading order of the original document, which helps keep extracted data accurate in ASP.NET Core applications.

// Load a PDF document
var pdf = PdfDocument.FromFile("document.pdf");
// Extract all text
string allText = pdf.ExtractAllText();
// Extract text from a specific page (zero-based index)
string pageText = pdf.ExtractTextFromPage(0); // first page

The ExtractAllText method returns a string containing all readable text from the PDF, preserving line breaks and spacing. For page-specific extraction, ExtractTextFromPage allows targeting individual pages using zero-based indexing. This approach works seamlessly with encrypted PDFs when you provide the correct password.

Here's a practical ASP.NET Core controller implementation that demonstrates how to read data from an uploaded PDF file:

Example Code

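[ApiController]
[Route("api/[controller]")]
public class PdfController : ControllerBase
{
    [HttpPost("extract-text")]
    public IActionResult ExtractText(IFormFile pdfFile)
    {
        // Reject requests that arrive without a file
        if (pdfFile == null || pdfFile.Length == 0)
            return BadRequest("No PDF file uploaded.");

        // Copy the upload into memory and load it from the byte array
        using var stream = new MemoryStream();
        pdfFile.CopyTo(stream);
        var pdf = new PdfDocument(stream.ToArray());

        // Return the extracted text as JSON
        var extractedText = pdf.ExtractAllText();
        return Ok(new { text = extractedText });
    }
}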

This sample handles uploaded PDF files entirely in memory. The IFormFile parameter works with Razor Pages or MVC controllers, and the MemoryStream avoids creating temporary files on disk. The extracted text can then be stored in a database, used to generate reports, or displayed in the browser.
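Once the text is extracted, pulling specific values out of it is ordinary string processing. The sketch below assumes an invoice whose text contains labels such as "Invoice Number: INV-1234" and "Total: $1,234.56". Both the labels and the InvoiceTextParser helper are hypothetical and not part of IronPDF, so adjust the patterns to your own documents.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of IronPDF. It only works on the string
// returned by ExtractAllText(), so it can be unit-tested without a PDF.
public static class InvoiceTextParser
{
    // Assumed label formats; change these to match your invoices.
    private static readonly Regex InvoiceNumberPattern =
        new Regex(@"Invoice\s+Number:\s*(?<value>[A-Z0-9-]+)", RegexOptions.IgnoreCase);

    // \b keeps "Subtotal:" from being mistaken for "Total:".
    private static readonly Regex TotalPattern =
        new Regex(@"\bTotal:\s*\$?(?<value>[\d,]+\.\d{2})", RegexOptions.IgnoreCase);

    // Returns the invoice number, or null when the label is not found.
    public static string? FindInvoiceNumber(string text)
    {
        Match match = InvoiceNumberPattern.Match(text);
        return match.Success ? match.Groups["value"].Value : null;
    }

    // Returns the total as a decimal, or null when it is missing or malformed.
    public static decimal? FindTotal(string text)
    {
        Match match = TotalPattern.Match(text);
        if (!match.Success)
            return null;

        if (decimal.TryParse(
                match.Groups["value"].Value,
                NumberStyles.AllowThousands | NumberStyles.AllowDecimalPoint,
                CultureInfo.InvariantCulture,
                out decimal total))
        {
            return total;
        }

        return null;
    }
}
```

In the controller, you could call InvoiceTextParser.FindTotal(extractedText) and return the parsed values alongside the raw text. Keeping the parsing separate from the PDF loading makes it easy to test against saved text samples.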

How Do You Read PDF Form Data?

PDF forms contain interactive fields that users fill out. IronPDF exposes this form data through its forms API, which supports the standard AcroForm field types, so you can read values from text boxes, checkboxes, and other inputs.

The response can then be saved to a database, returned to the user, or integrated into your ASP.NET application workflow. The following code demonstrates how to do just this:

[HttpPost("extract-form")]
public IActionResult ExtractForm([FromForm] IFormFile pdfFile)
{
    if (pdfFile == null || pdfFile.Length == 0)
    {
        return BadRequest("No PDF file uploaded.");
    }
    using var stream = new MemoryStream();
    pdfFile.CopyTo(stream);
    var pdf = new PdfDocument(stream.ToArray());
    var formData = new Dictionary<string, string>();
    if (pdf.Form != null)
    {
        foreach (var field in pdf.Form)
        {
            formData[field.Name] = field.Value;
        }
    }
    return Ok(new { formFields = formData });
}

The ExtractForm endpoint uses the Form property of PdfDocument to read interactive fields from an uploaded PDF. Each field has a Name and Value, which are collected into a dictionary and returned as JSON. This makes it easy to capture data from text boxes, checkboxes, and other inputs, allowing PDF form submissions to be processed and integrated directly into your applications or databases.

Output

How to Read Data from a PDF in ASP.NET Core: Figure 3 - HTTP Postman
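The dictionary built by the endpoint is untyped, so it is worth checking that the fields you depend on are present before saving anything. Here is a minimal sketch of such a check. FormDataValidator and the field names in the usage note are hypothetical and would need to match the field names defined in your PDF form.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of IronPDF. It operates on the
// name/value dictionary collected from pdf.Form.
public static class FormDataValidator
{
    // Returns the names of required fields that are absent or blank,
    // in the order they were requested.
    public static IReadOnlyList<string> FindMissingFields(
        IReadOnlyDictionary<string, string> formData,
        IEnumerable<string> requiredFields)
    {
        return requiredFields
            .Where(name => !formData.TryGetValue(name, out var value)
                           || string.IsNullOrWhiteSpace(value))
            .ToList();
    }
}
```

For example, calling FindMissingFields(formData, new[] { "FullName", "Email" }) inside ExtractForm and returning BadRequest when the list is not empty would stop incomplete submissions from reaching your database.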

How Can You Extract Table Data from PDF Documents?

Tables in PDFs are essentially formatted text, so extracting structured data from them requires parsing logic. IronPDF extracts the text while preserving spacing, which you can then parse into rows and columns:

[HttpPost("extract-table")]
public IActionResult ExtractTable([FromForm] IFormFile pdfFile)
{
    if (pdfFile == null || pdfFile.Length == 0)
        return BadRequest("No PDF file uploaded.");
    using var memoryStream = new MemoryStream();
    pdfFile.CopyTo(memoryStream);
    // Load PDF from byte array
    var pdf = new PdfDocument(memoryStream.ToArray());
    // Extract all text
    string text = pdf.ExtractAllText();
    // Split text into lines (rows)
    string[] lines = text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
    var tableData = new List<string[]>();
    foreach (string line in lines)
    {
        // Split line into columns using tab character
        string[] columns = line
            .Split('\t')
            .Where(c => !string.IsNullOrWhiteSpace(c))
            .ToArray();
        if (columns.Length > 0)
            tableData.Add(columns);
    }
    var table = tableData.Select(r => string.Join(" | ", r)).ToList();
    return Ok(new { Table = table });
}

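This code splits the extracted text into lines, then splits each line into columns on tab characters. Whether columns arrive separated by tabs or by runs of spaces depends on how the PDF was produced, so inspect the extracted text for your documents first. For more complex tables, you might need to identify table boundaries using keywords or implement more sophisticated parsing logic based on your specific PDF structure.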

The parsed rows can be returned to the client, displayed in the browser, or processed further, for example rendered into an HTML table styled with CSS.

Output

How to Read Data from a PDF in ASP.NET Core: Figure 4
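The endpoint above splits columns on tab characters only. If the text extracted from your PDFs separates columns with runs of spaces instead, a regular-expression split is one possible fallback. The sketch below assumes a column break is either a tab or at least two consecutive spaces, so single spaces inside a cell are kept. TableTextParser is a hypothetical helper, and the rule that skips single-cell lines is an illustrative choice rather than anything IronPDF prescribes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of IronPDF. It parses the string
// returned by ExtractAllText() into rows of cells.
public static class TableTextParser
{
    // Assumed layout: a column break is a tab or two or more spaces.
    private static readonly Regex ColumnSeparator = new Regex(@"\t| {2,}");

    public static List<string[]> ParseRows(string text)
    {
        var rows = new List<string[]>();
        string[] lines = text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            string[] columns = ColumnSeparator
                .Split(line.Trim())
                .Select(cell => cell.Trim())
                .Where(cell => cell.Length > 0)
                .ToArray();

            // Lines with a single cell (titles, captions) are treated as
            // non-table text and skipped.
            if (columns.Length > 1)
                rows.Add(columns);
        }

        return rows;
    }
}
```

With this rule, a line such as "Item   Qty  Unit Price" yields three cells, and "Unit Price" stays intact because it contains only a single space. Tables whose cells are separated by single spaces would still need layout-specific logic.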

How Do You Handle Uploaded PDF Files in ASP.NET Core?

Processing uploaded PDFs requires converting the IFormFile to a format IronPDF can read. This approach works seamlessly with Razor Pages and MVC controllers:

[HttpPost("process-upload")]
public async Task<IActionResult> ProcessPdf([FromForm] IFormFile file)
{
    if (file == null || file.Length == 0)
        return BadRequest("No PDF file uploaded.");
    using var ms = new MemoryStream();
    await file.CopyToAsync(ms);
    // Load PDF from byte array
    var pdf = new PdfDocument(ms.ToArray());
    // Extract text and page count
    var text = pdf.ExtractAllText();
    var pageCount = pdf.PageCount;
    return Ok(new
    {
        text = text,
        pages = pageCount
    });
}

Copying the upload with CopyToAsync keeps the request thread free while the file is read, so the endpoint stays responsive under load. If you need to return a processed PDF, the File helper on ControllerBase sets the Content-Disposition header when you pass a download file name. For additional security, validate uploaded files before processing them.
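One inexpensive validation step is to confirm that the upload actually starts with the PDF header bytes ("%PDF-") instead of trusting the file extension or the content type sent by the client. The following is a minimal sketch. PdfUploadValidator is a hypothetical helper, the 10 MB limit is an arbitrary assumption, and the check assumes the header sits at offset 0.

```csharp
using System.Text;

// Hypothetical helper, not part of IronPDF or ASP.NET Core.
public static class PdfUploadValidator
{
    // Every PDF file begins with this ASCII signature, followed by a version.
    private static readonly byte[] PdfSignature = Encoding.ASCII.GetBytes("%PDF-");

    // Arbitrary upper bound for this sketch; choose one that suits your app.
    public const int MaxBytes = 10 * 1024 * 1024;

    // Returns true when the content is within the size limit and
    // starts with the PDF signature.
    public static bool LooksLikePdf(byte[] content)
    {
        if (content == null
            || content.Length < PdfSignature.Length
            || content.Length > MaxBytes)
        {
            return false;
        }

        for (int i = 0; i < PdfSignature.Length; i++)
        {
            if (content[i] != PdfSignature[i])
                return false;
        }

        return true;
    }
}
```

In ProcessPdf, calling PdfUploadValidator.LooksLikePdf(ms.ToArray()) before constructing the PdfDocument lets you return BadRequest for files that are obviously not PDFs. A signature check does not prove the file is well-formed or safe, so treat it as a first filter only.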

Conclusion

IronPDF makes it easy to read, extract, process, and save PDF documents in ASP.NET Core and other .NET Core applications. Whether you’re working with forms, tables, plain text, or digital signatures, this .NET library reduces tasks that normally take hours to a few lines of code. The extracted data can be returned as JSON, displayed in the browser, or stored for later processing.

Start with a free trial to explore IronPDF's full capabilities in your ASP.NET Core project. You can build and test your PDF extraction workflows before committing to a license. For production, IronPDF offers flexible options suitable for solo developers or large solutions. Honestly, using IronPDF is one of the fastest ways I’ve found to handle PDF files in ASP.NET Core without the usual headaches.

Frequently Asked Questions

What challenges can arise when working with PDF files in .NET Core applications?

Working with PDF files in .NET Core can be tricky due to the need to extract text, grab form data, or parse tables without overly complex libraries.

How can IronPDF help simplify reading data from PDF files in ASP.NET?

IronPDF simplifies reading and processing PDF documents by eliminating the need for messy dependencies or extensive custom parsing code.

Why is it important to avoid overly complex libraries when handling PDFs?

Using overly complex libraries can slow down projects and increase development time, whereas simpler solutions like IronPDF streamline the process.

What types of data can IronPDF extract from PDF files?

IronPDF can extract text, form data, and tables from PDF files, making it versatile for various data handling needs.

Can IronPDF be used to process uploaded invoices in ASP.NET applications?

Yes, IronPDF can efficiently read and process text from uploaded invoices in ASP.NET applications.

Is it necessary to write custom parsing code when using IronPDF?

Not for text or form fields: IronPDF reads those directly. Tables are returned as text, so extracting rows and columns still requires some parsing logic based on your PDF's layout.

What are the benefits of using IronPDF in .NET Core applications?

IronPDF provides a straightforward way to read and process PDF files, enhancing data handling capabilities without complex dependencies.

Curtis Chau

Technical Writer

Curtis Chau holds a Bachelor’s degree in Computer Science (Carleton University) and specializes in front-end development with expertise in Node.js, TypeScript, JavaScript, and React. Passionate about crafting intuitive and aesthetically pleasing user interfaces, Curtis enjoys working with modern frameworks and creating well-structured, visually appealing manuals.

...