How to Read Data from PDF Files in ASP.NET Core
IronPDF simplifies PDF data extraction in ASP.NET Core by providing methods to read text, form data, and tables from PDF files using straightforward C# code without complex dependencies or manual parsing.
Working with PDF files in .NET applications can be more challenging than it first appears. You might need to extract text from uploaded invoices, retrieve form data from surveys, or parse tables for your database. Many projects slow down because developers reach for overly complex libraries that require extensive custom parsing code. IronPDF offers a straightforward alternative, letting you read and process PDF documents with minimal setup.
Whether you're handling simple text, interactive form fields, or structured tabular data, IronPDF's API gives you direct access to PDF content without low-level parsing. This guide walks through how to read data from PDF files in ASP.NET Core, covering text extraction, form data retrieval, table parsing, and asynchronous file upload handling -- all with C# code you can drop into your project.
How Do You Set Up IronPDF in an ASP.NET Core Project?
Getting started is straightforward. Install the IronPDF NuGet package from the NuGet Package Manager Console or the .NET CLI using either of these commands:
Install-Package IronPdf
dotnet add package IronPdf
Once the package is installed, add the IronPDF namespace at the top of any file that works with PDF documents:
using IronPdf;
Imports IronPdf
That's all the setup required for most projects. IronPDF does not depend on external rendering processes or additional native dependencies on Windows. For Linux or Docker environments, consult the IronPDF documentation for platform-specific guidance.
A free trial license lets you test the full feature set before you commit to production use. You can get a trial license directly from the IronPDF site and apply it in a single line of code before your first PDF operation.
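As a rough sketch of the "single line" mentioned above: IronPDF exposes a static IronPdf.License.LicenseKey property. The environment variable name IRONPDF_LICENSE_KEY is an assumption made for this example, not something the library requires; any configuration source would work.

```csharp
using System;
using IronPdf;

// Assumption: the key lives in an environment variable named
// IRONPDF_LICENSE_KEY (a hypothetical name chosen for this sketch).
string? licenseKey = Environment.GetEnvironmentVariable("IRONPDF_LICENSE_KEY");

if (!string.IsNullOrWhiteSpace(licenseKey))
{
    // Set once at startup (for example near the top of Program.cs),
    // before the first PdfDocument is created.
    IronPdf.License.LicenseKey = licenseKey;
}
```

Reading the key from configuration rather than hard-coding it keeps it out of source control.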
How Do You Extract Text from a PDF File?
Text extraction is the most common PDF reading task. IronPDF provides ExtractAllText to pull all readable text from a document and ExtractTextFromPage for page-level access. Both methods preserve reading order and handle standard text encodings.
// Load a PDF document from disk
var pdf = PdfDocument.FromFile("document.pdf");
// Extract all text from every page
string allText = pdf.ExtractAllText();
// Extract text from a specific page (zero-based index)
string pageOneText = pdf.ExtractTextFromPage(0);
Console.WriteLine(allText);
Imports System
' Load a PDF document from disk
Dim pdf = PdfDocument.FromFile("document.pdf")
' Extract all text from every page
Dim allText As String = pdf.ExtractAllText()
' Extract text from a specific page (zero-based index)
Dim pageOneText As String = pdf.ExtractTextFromPage(0)
Console.WriteLine(allText)
ExtractAllText returns the complete text content as a single string, preserving line breaks. ExtractTextFromPage targets a single page using a zero-based index, which is useful when you only need content from a specific section of a multi-page document.
For an in-depth look at text and image extraction options, the extract text from PDF guide covers advanced scenarios including region-based extraction.
How Do You Wire Text Extraction Into an ASP.NET Core Controller?
The following controller action accepts an uploaded PDF via IFormFile, reads it into a MemoryStream, and returns the extracted text as JSON:
using IronPdf;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using System.IO;
[ApiController]
[Route("api/[controller]")]
public class PdfController : ControllerBase
{
    [HttpPost("extract-text")]
    public IActionResult ExtractText(IFormFile pdfFile)
    {
        if (pdfFile == null || pdfFile.Length == 0)
            return BadRequest("No PDF file uploaded.");
        using var stream = new MemoryStream();
        pdfFile.CopyTo(stream);
        var pdf = new PdfDocument(stream.ToArray());
        string extractedText = pdf.ExtractAllText();
        return Ok(new { text = extractedText });
    }
}
Imports IronPdf
Imports Microsoft.AspNetCore.Http
Imports Microsoft.AspNetCore.Mvc
Imports System.IO
<ApiController>
<Route("api/[controller]")>
Public Class PdfController
    Inherits ControllerBase
    <HttpPost("extract-text")>
    Public Function ExtractText(pdfFile As IFormFile) As IActionResult
        If pdfFile Is Nothing OrElse pdfFile.Length = 0 Then
            Return BadRequest("No PDF file uploaded.")
        End If
        Using stream As New MemoryStream()
            pdfFile.CopyTo(stream)
            Dim pdf As New PdfDocument(stream.ToArray())
            Dim extractedText As String = pdf.ExtractAllText()
            Return Ok(New With {.text = extractedText})
        End Using
    End Function
End Class
This endpoint converts the uploaded file to a byte array and passes it directly to PdfDocument. No temporary files are written to disk, which keeps the code clean and avoids unnecessary storage overhead. The IFormFile interface works naturally with both multipart form submissions and API clients like Postman.
How Do You Read PDF Form Data in ASP.NET Core?
PDF forms -- also called AcroForms -- contain interactive fields that users fill in. IronPDF exposes form fields through the Form property of PdfDocument, giving you the name and value of every field in the document.
The following endpoint reads an uploaded form PDF and returns all field values as a JSON dictionary:
[HttpPost("extract-form")]
public IActionResult ExtractForm([FromForm] IFormFile pdfFile)
{
    if (pdfFile == null || pdfFile.Length == 0)
        return BadRequest("No PDF file uploaded.");
    using var stream = new MemoryStream();
    pdfFile.CopyTo(stream);
    var pdf = new PdfDocument(stream.ToArray());
    var formData = new Dictionary<string, string>();
    if (pdf.Form != null)
    {
        foreach (var field in pdf.Form)
        {
            formData[field.Name] = field.Value;
        }
    }
    return Ok(new { formFields = formData });
}
Imports Microsoft.AspNetCore.Mvc
Imports System.IO
<HttpPost("extract-form")>
Public Function ExtractForm(<FromForm> pdfFile As IFormFile) As IActionResult
    If pdfFile Is Nothing OrElse pdfFile.Length = 0 Then
        Return BadRequest("No PDF file uploaded.")
    End If
    Using stream As New MemoryStream()
        pdfFile.CopyTo(stream)
        Dim pdf = New PdfDocument(stream.ToArray())
        Dim formData As New Dictionary(Of String, String)()
        If pdf.Form IsNot Nothing Then
            For Each field In pdf.Form
                formData(field.Name) = field.Value
            Next
        End If
        Return Ok(New With {.formFields = formData})
    End Using
End Function
Each field in pdf.Form has a Name property (the field identifier set in the PDF authoring tool) and a Value property (the text or selection the user entered). Text boxes, checkboxes, radio buttons, and dropdowns all appear in this collection.
The JSON response makes it easy to forward form submissions to a database, a third-party API, or a message queue without any additional parsing. For workflows that involve creating or editing PDF forms programmatically, the PDF forms guide shows how to add fields and pre-fill values.
What Does a Typical Form Extraction Response Look Like?

A successful request returns a 200 OK result containing the field names and values from the uploaded PDF, for example a sample contact form. The structure is a flat key-value map, which maps cleanly to most database schemas or REST payloads.
How Do You Extract Table Data from a PDF?
Tables in PDF files are stored as positioned text -- there is no native table data structure in the PDF format. Extracting tabular data therefore means extracting the raw text and then applying parsing logic to reconstruct rows and columns.
IronPDF's ExtractAllText preserves whitespace and tab characters, which makes it possible to split lines into columns programmatically. The following controller action demonstrates this approach:
[HttpPost("extract-table")]
public IActionResult ExtractTable([FromForm] IFormFile pdfFile)
{
    if (pdfFile == null || pdfFile.Length == 0)
        return BadRequest("No PDF file uploaded.");
    using var memoryStream = new MemoryStream();
    pdfFile.CopyTo(memoryStream);
    var pdf = new PdfDocument(memoryStream.ToArray());
    string text = pdf.ExtractAllText();
    // Split into lines, then split each line into columns
    string[] lines = text.Split(
        new[] { '\r', '\n' },
        StringSplitOptions.RemoveEmptyEntries
    );
    var tableData = new List<string[]>();
    foreach (string line in lines)
    {
        string[] columns = line
            .Split('\t')
            .Where(c => !string.IsNullOrWhiteSpace(c))
            .ToArray();
        if (columns.Length > 0)
            tableData.Add(columns);
    }
    var table = tableData.Select(r => string.Join(" | ", r)).ToList();
    return Ok(new { Table = table });
}
Imports Microsoft.AspNetCore.Mvc
Imports System.IO
Imports System.Linq
<HttpPost("extract-table")>
Public Function ExtractTable(<FromForm> pdfFile As IFormFile) As IActionResult
    If pdfFile Is Nothing OrElse pdfFile.Length = 0 Then
        Return BadRequest("No PDF file uploaded.")
    End If
    Using memoryStream As New MemoryStream()
        pdfFile.CopyTo(memoryStream)
        Dim pdf As New PdfDocument(memoryStream.ToArray())
        Dim text As String = pdf.ExtractAllText()
        ' Split into lines, then split each line into columns
        Dim lines As String() = text.Split(New Char() {ControlChars.Cr, ControlChars.Lf}, StringSplitOptions.RemoveEmptyEntries)
        Dim tableData As New List(Of String())()
        For Each line As String In lines
            Dim columns As String() = line.Split(ControlChars.Tab).Where(Function(c) Not String.IsNullOrWhiteSpace(c)).ToArray()
            If columns.Length > 0 Then
                tableData.Add(columns)
            End If
        Next
        Dim table = tableData.Select(Function(r) String.Join(" | ", r)).ToList()
        Return Ok(New With {.Table = table})
    End Using
End Function
This approach works well for PDFs whose tables use consistent tab-separated columns. For documents where columns are separated by variable whitespace, you may need to apply a minimum-gap heuristic or inspect character positions. The merge or split PDFs guide is useful when you need to isolate specific pages that contain tables before extraction.
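The minimum-gap heuristic mentioned above could look something like the following. This is a minimal sketch, not an IronPDF feature: SplitByGap is a hypothetical helper name, and the default threshold of two whitespace characters is an assumption you would tune against your own documents.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Illustrative helper (not part of IronPDF): treat a tab, or a run of at
// least `minGap` whitespace characters, as a column boundary.
static string[] SplitByGap(string line, int minGap = 2)
{
    if (minGap < 1)
        throw new ArgumentOutOfRangeException(nameof(minGap));

    // \t+ matches explicit tabs; \s{n,} matches wide runs of whitespace.
    string pattern = @"\t+|\s{" + minGap + @",}";

    return Regex.Split(line.Trim(), pattern)
        .Where(cell => cell.Length > 0)
        .ToArray();
}

// Single spaces inside a cell ("Widget A") survive; wider gaps split columns.
string[] cells = SplitByGap("Widget A   12    $4.50");
Console.WriteLine(string.Join(" | ", cells)); // prints: Widget A | 12 | $4.50
```

In the controller above, a helper like this would replace the line.Split('\t') call. A larger minGap reduces false splits inside cells that contain double spaces, at the cost of merging columns that sit close together.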
When Should You Parse Tables Manually?

Manual parsing is the right choice when the PDF was not generated from HTML or a structured data source, for example documents created in desktop publishing tools. (Scanned documents contain page images rather than text, so they need an OCR pass before any text-based parsing.) The tab-split approach handles many standard PDFs reliably. When column boundaries are irregular, you can refine the logic by inspecting raw character coordinates through IronPDF's DOM access API.
If you control how the PDF is produced, generating it from a data-driven HTML template (covered in the HTML string to PDF guide) makes the text positions predictable and extraction straightforward.
How Do You Handle Asynchronous PDF File Uploads?
Production ASP.NET Core applications should handle file uploads asynchronously to avoid blocking the thread pool. The IFormFile.CopyToAsync method combined with async/await keeps the controller non-blocking:
[HttpPost("process-upload")]
public async Task<IActionResult> ProcessPdf([FromForm] IFormFile file)
{
    if (file == null || file.Length == 0)
        return BadRequest("No PDF file uploaded.");
    using var ms = new MemoryStream();
    await file.CopyToAsync(ms);
    var pdf = new PdfDocument(ms.ToArray());
    string text = pdf.ExtractAllText();
    int pageCount = pdf.PageCount;
    return Ok(new
    {
        text,
        pages = pageCount
    });
}
Imports System.IO
Imports Microsoft.AspNetCore.Mvc
<HttpPost("process-upload")>
Public Async Function ProcessPdf(<FromForm> file As IFormFile) As Task(Of IActionResult)
    If file Is Nothing OrElse file.Length = 0 Then
        Return BadRequest("No PDF file uploaded.")
    End If
    Using ms As New MemoryStream()
        Await file.CopyToAsync(ms)
        Dim pdf As New PdfDocument(ms.ToArray())
        Dim text As String = pdf.ExtractAllText()
        Dim pageCount As Integer = pdf.PageCount
        Return Ok(New With {
            .text = text,
            .pages = pageCount
        })
    End Using
End Function
The PdfDocument constructor is synchronous, but the upload step -- often the slowest part of the pipeline -- runs asynchronously. This pattern scales well under concurrent load and is compatible with minimal API endpoints, Razor Pages handlers, and gRPC services.
How Do You Limit Upload File Size?
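Kestrel, ASP.NET Core's default web server, caps request bodies at about 30 MB (30,000,000 bytes) by default, and multipart form bodies have a separate cap, MultipartBodyLengthLimit, which defaults to 128 MB. To set the multipart limit explicitly, configure FormOptions in Program.cs: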
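// FormOptions lives in the Microsoft.AspNetCore.Http.Features namespace
using Microsoft.AspNetCore.Http.Features;
builder.Services.Configure<FormOptions>(options =>
{
    options.MultipartBodyLengthLimit = 100 * 1024 * 1024; // 100 MB
});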
Imports Microsoft.Extensions.DependencyInjection
Imports Microsoft.AspNetCore.Http.Features
builder.Services.Configure(Of FormOptions)(Sub(options)
    options.MultipartBodyLengthLimit = 100 * 1024 * 1024 ' 100 MB
End Sub)
Kestrel has its own limit that you may also need to raise:
builder.WebHost.ConfigureKestrel(options =>
{
    options.Limits.MaxRequestBodySize = 100 * 1024 * 1024;
});
builder.WebHost.ConfigureKestrel(Sub(options)
    options.Limits.MaxRequestBodySize = 100 * 1024 * 1024
End Sub)
Set these values based on the realistic maximum size of the PDFs your application will process. Always validate the uploaded file's MIME type and extension before passing it to IronPDF to guard against unexpected input.
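One way to do that validation is sketched below. LooksLikePdf is a hypothetical helper, not part of IronPDF or ASP.NET Core. Beyond the extension and MIME type checks named above, it also checks for the "%PDF-" signature that PDF files start with; requiring that signature at offset 0 is an assumption of this sketch and is stricter than some PDF readers.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Illustrative helper (not part of IronPDF or ASP.NET Core).
static bool LooksLikePdf(string fileName, string contentType, byte[] content)
{
    // Extension and MIME type are supplied by the client, so they are
    // only a first filter.
    bool extensionOk = string.Equals(
        Path.GetExtension(fileName), ".pdf", StringComparison.OrdinalIgnoreCase);
    bool mimeOk = string.Equals(
        contentType, "application/pdf", StringComparison.OrdinalIgnoreCase);

    // Assumption: require the "%PDF-" header at the very start of the file.
    byte[] signature = Encoding.ASCII.GetBytes("%PDF-");
    bool signatureOk = content.Length >= signature.Length
        && content.Take(signature.Length).SequenceEqual(signature);

    return extensionOk && mimeOk && signatureOk;
}

byte[] sample = Encoding.ASCII.GetBytes("%PDF-1.7\n");
byte[] html = Encoding.ASCII.GetBytes("<html>");
Console.WriteLine(LooksLikePdf("invoice.PDF", "application/pdf", sample)); // True
Console.WriteLine(LooksLikePdf("invoice.pdf", "application/pdf", html));   // False
```

In the controllers above, the three arguments would come from pdfFile.FileName, pdfFile.ContentType, and the byte array already read from the MemoryStream, with a BadRequest result when the check fails.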
How Do You Convert Extracted PDF Content to Other Formats?
Once you have text or form data, you can pipe it into any downstream process your application requires -- database writes, search indexing, report generation, or API calls. IronPDF also supports converting in the other direction: rendering HTML to PDF.
For cases where you want to display extracted content visually, you can render the original PDF as images using the PDF to image conversion guide. This is useful for document preview features where you want to show page thumbnails without loading the full PDF in the browser.
If you need to protect the output documents before delivering them to users, IronPDF supports digital signatures and watermarks as post-processing steps. Adding headers and footers -- covered in the headers and footers guide -- is similarly straightforward.
| Scenario | IronPDF Method / Property | Notes |
|---|---|---|
| Extract all page text | pdf.ExtractAllText() | Returns full document text in reading order |
| Extract text from one page | pdf.ExtractTextFromPage(n) | Zero-based page index |
| Read AcroForm fields | pdf.Form | Enumerate field.Name and field.Value |
| Parse table rows | ExtractAllText() + split logic | Split on tab or whitespace gaps |
| Count pages | pdf.PageCount | Useful for pagination and validation |
| Load from byte array | new PdfDocument(bytes) | No temporary files required |
| Load from file path | PdfDocument.FromFile(path) | For server-side file access |
What Are the Next Steps After Setting Up PDF Data Extraction?
You now have working patterns for text extraction, form data reading, table parsing, and asynchronous uploads. Here are a few directions to explore next based on your application's requirements.
If you need to generate PDF reports alongside your extraction workflow, the IronPDF features overview covers HTML-to-PDF rendering, stamp overlays, and page manipulation. For applications that merge reports from multiple sources, the merge or split PDFs guide walks through combining and splitting documents.
For secure document delivery, digital signatures let you certify PDFs before sending them to clients. Custom watermarks add visual branding or draft labels to generated documents.
If your project extracts data from scanned PDFs (images rather than searchable text), you will need an OCR step before calling ExtractAllText. IronOCR from Iron Software integrates with IronPDF to handle scanned document workflows.
IronPDF is available under flexible licensing options for individual developers and teams. Start with a free trial to test all features without restrictions. The complete documentation includes API reference, getting started guides, and deployment notes for Windows, Linux, Docker, and cloud environments.
Reading data from PDF files in ASP.NET Core no longer requires low-level parsing code or heavyweight dependencies. With IronPDF, the path from uploaded file to extracted content is a handful of lines that fit naturally into any controller or service layer.
Frequently Asked Questions
What challenges can arise when working with PDF files in .NET Core applications?
Working with PDF files in .NET Core can be tricky because tasks such as extracting text, retrieving form data, or parsing tables often involve overly complex libraries and extensive custom parsing code.
How can IronPDF help simplify reading data from PDF files in ASP.NET?
IronPDF simplifies reading and processing PDF documents by eliminating the need for messy dependencies or extensive custom parsing code.
Why is it important to avoid overly complex libraries when handling PDFs?
Using overly complex libraries can slow down projects and increase development time, whereas simpler solutions like IronPDF streamline the process.
What types of data can IronPDF extract from PDF files?
IronPDF can extract text, form data, and tables from PDF files, making it versatile for various data handling needs.
Can IronPDF be used to process uploaded invoices in ASP.NET applications?
Yes, IronPDF can efficiently read and process text from uploaded invoices in ASP.NET applications.
Is it necessary to write custom parsing code when using IronPDF?
No, IronPDF allows you to process PDF documents without the need for extensive custom parsing code.
What are the benefits of using IronPDF in .NET Core applications?
IronPDF provides a straightforward way to read and process PDF files, enhancing data handling capabilities without complex dependencies.
.NET 10 — Is IronPDF fully compatible with it?
Yes. IronPDF is designed to be fully compatible with .NET 10 (as well as .NET 9, 8, 7, 6, 5, Core, Standard, and Framework 4.6.2+), ensuring that you can run all its PDF reading and writing features without special workarounds on the latest .NET platform.
Does IronPDF support the latest APIs in .NET 10 for reading streamed PDF content?
Yes. In .NET 10, IronPDF can process PDF data from byte arrays or memory streams—using APIs like Stream and MemoryStream—allowing you to read PDFs without saving temporary files. This makes it suitable for high-performance server scenarios and for uploading or processing PDF data in web APIs.




