How to Read Data from a PDF in ASP.NET Core
Working with PDF files in .NET Core applications can be trickier than it seems. You might need to extract text from uploaded invoices, grab form data from surveys, or parse tables for your database. I’ve seen plenty of projects slowed down because developers used overly complex libraries. That’s where IronPDF comes in. It lets you read and process PDF documents without wrestling with messy dependencies or writing tons of custom parsing code.
Whether you’re handling simple text, digital signatures, or structured data, IronPDF makes it easy. This guide shows you how to read data from a PDF file in ASP.NET Core, handle IFormFile uploads, work with byte arrays, and return the results to the user. You can also integrate the output into your solution, display it in the browser, or store it in a database.
How Do You Set Up IronPDF in ASP.NET Core?
Getting started with IronPDF in your ASP.NET Core project takes just minutes. Install the IronPDF NuGet package via the NuGet Package Manager Console with the following command:
Install-Package IronPdf
Or through the .NET CLI:
dotnet add package IronPdf
Once installed, add the IronPDF namespace to your Program class, controllers, or services:
using IronPdf;
For detailed installation options including Docker deployment, Azure setup, and additional information, check the comprehensive documentation.
How Can You Extract Text from PDF Files?
IronPDF's ExtractAllText method provides instant access to all text content within a PDF document. This method handles various text encodings and maintains the reading order of the original document, ensuring accurate data extraction from PDF files in ASP.NET Core applications.
// Load a PDF document
var pdf = PdfDocument.FromFile("document.pdf");
// Extract all text
string allText = pdf.ExtractAllText();
// Extract text from a specific page (0-indexed)
string pageText = pdf.ExtractTextFromPage(0); // first page
The ExtractAllText method returns a string containing all readable text from the PDF, preserving line breaks and spacing. For page-specific extraction, ExtractTextFromPage allows targeting individual pages using zero-based indexing. This approach works seamlessly with encrypted PDFs when you provide the correct password.
Here's a practical ASP.NET Core controller implementation that demonstrates how to read data from an uploaded PDF file:
Example Code
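using System.IO;
using IronPdf;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/[controller]")]
public class PdfController : ControllerBase
{
    [HttpPost("extract-text")]
    public IActionResult ExtractText(IFormFile pdfFile)
    {
        // Reject missing or empty uploads before touching the stream
        if (pdfFile == null || pdfFile.Length == 0)
            return BadRequest("No PDF file uploaded.");
        // Copy the upload into memory so no temporary file is needed
        using var stream = new MemoryStream();
        pdfFile.CopyTo(stream);
        // Load the PDF from the byte array and extract its text
        var pdf = new PdfDocument(stream.ToArray());
        var extractedText = pdf.ExtractAllText();
        return Ok(new { text = extractedText });
    }
}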
This sample reads the uploaded PDF entirely in memory. The IFormFile parameter works with Razor Pages or MVC controllers, and copying it into a MemoryStream avoids creating temporary files on disk. You can then store the extracted text in a database, use it to generate reports, or display it in the browser.
How Do You Read PDF Form Data?
PDF forms contain interactive fields that users fill out. IronPDF simplifies extracting this form data through its forms API, which supports the standard AcroForm field types. You can read every form field, including text boxes and checkboxes, by name and value.
The response can then be saved to a database, returned to the user, or integrated into your ASP.NET Core application workflow. The following code demonstrates this:
[HttpPost("extract-form")]
public IActionResult ExtractForm([FromForm] IFormFile pdfFile)
{
    if (pdfFile == null || pdfFile.Length == 0)
    {
        return BadRequest("No PDF file uploaded.");
    }
    using var stream = new MemoryStream();
    pdfFile.CopyTo(stream);
    var pdf = new PdfDocument(stream.ToArray());
    var formData = new Dictionary<string, string>();
    if (pdf.Form != null)
    {
        foreach (var field in pdf.Form)
        {
            formData[field.Name] = field.Value;
        }
    }
    return Ok(new { formFields = formData });
}
The ExtractForm endpoint uses the Form property of PdfDocument to read interactive fields from an uploaded PDF. Each field has a Name and Value, which are collected into a dictionary and returned as JSON. This makes it easy to capture data from text boxes, checkboxes, and other inputs, allowing PDF form submissions to be processed and integrated directly into your applications or databases.
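Before the dictionary reaches a database, it often helps to map it onto a typed object. The following is a minimal sketch of that step in plain C#, with no IronPDF calls. The field names ("fullName", "email", "subscribe"), the SurveyResponse record, and the checkbox values treated as checked are all hypothetical; the actual names and export values depend on how your PDF form was authored.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical target type for a survey form; adjust to your own fields.
public sealed record SurveyResponse(string FullName, string Email, bool Subscribed);

public static class FormFieldMapper
{
    // Maps the name/value dictionary built in ExtractForm onto a typed record.
    // Missing or null fields become empty strings instead of throwing.
    public static SurveyResponse ToSurveyResponse(IReadOnlyDictionary<string, string> fields)
    {
        string Get(string key) =>
            fields.TryGetValue(key, out var value) && value != null
                ? value.Trim()
                : string.Empty;

        // Assumption: the checkbox exports "Yes", "On", or "true" when ticked.
        string subscribe = Get("subscribe");
        bool isChecked =
            subscribe.Equals("Yes", StringComparison.OrdinalIgnoreCase) ||
            subscribe.Equals("On", StringComparison.OrdinalIgnoreCase) ||
            subscribe.Equals("true", StringComparison.OrdinalIgnoreCase);

        return new SurveyResponse(Get("fullName"), Get("email"), isChecked);
    }
}
```

Inside the endpoint you could call FormFieldMapper.ToSurveyResponse(formData) and persist the result instead of returning the raw dictionary.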
How Can You Extract Table Data from PDF Documents?
Tables in PDFs are essentially positioned text rather than structured data, so extracting them requires parsing logic. IronPDF extracts the text while preserving spacing, which you can then parse into rows and columns:
[HttpPost("extract-table")]
public IActionResult ExtractTable([FromForm] IFormFile pdfFile)
{
    if (pdfFile == null || pdfFile.Length == 0)
        return BadRequest("No PDF file uploaded.");
    using var memoryStream = new MemoryStream();
    pdfFile.CopyTo(memoryStream);
    // Load PDF from byte array
    var pdf = new PdfDocument(memoryStream.ToArray());
    // Extract all text
    string text = pdf.ExtractAllText();
    // Split text into lines (rows)
    string[] lines = text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
    var tableData = new List<string[]>();
    foreach (string line in lines)
    {
        // Split line into columns using tab character
        string[] columns = line
            .Split('\t')
            .Where(c => !string.IsNullOrWhiteSpace(c))
            .ToArray();
        if (columns.Length > 0)
            tableData.Add(columns);
    }
    var table = tableData.Select(r => string.Join(" | ", r)).ToList();
    return Ok(new { Table = table });
}
This code extracts the text, splits it into rows on line breaks, and splits each row into columns on tab characters. For more complex tables, you might need to identify table boundaries using keywords or implement more sophisticated parsing logic based on your specific PDF structure.
This output can be downloaded, displayed in the browser, or processed for additional information. You can integrate CSS formatting or HTML string rendering to display tables dynamically in your solution.
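Extracted text does not always separate columns with tabs; depending on the PDF, columns may only be separated by runs of spaces. As one possible refinement of the parsing step, here is a small sketch that treats either a tab or two or more whitespace characters as a column gap and drops lines that do not look like table rows. The TableTextParser class and the minColumns threshold are illustrative assumptions, not part of IronPDF, and the heuristic will misread cells that themselves contain double spaces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TableTextParser
{
    // A single tab, or a run of two or more whitespace characters, is a column gap.
    private static readonly Regex ColumnGap = new Regex(@"\t|\s{2,}", RegexOptions.Compiled);

    // Splits extracted text into rows and columns.
    // Lines with fewer than minColumns cells are treated as non-table text and skipped.
    public static List<string[]> ParseRows(string text, int minColumns = 2)
    {
        var rows = new List<string[]>();
        if (string.IsNullOrEmpty(text))
            return rows;

        var lines = text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var line in lines)
        {
            var cells = ColumnGap.Split(line.Trim())
                .Select(c => c.Trim())
                .Where(c => c.Length > 0)
                .ToArray();

            if (cells.Length >= minColumns)
                rows.Add(cells);
        }
        return rows;
    }
}
```

In the ExtractTable endpoint, TableTextParser.ParseRows(text) could stand in for the tab-only loop; single-spaced prose lines fall below minColumns and are ignored.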
How Do You Handle Uploaded PDF Files in ASP.NET Core?
Processing uploaded PDFs requires converting the IFormFile to a format IronPDF can read. This approach works seamlessly with Razor Pages and MVC controllers:
[HttpPost("process-upload")]
public async Task<IActionResult> ProcessPdf([FromForm] IFormFile file)
{
    if (file == null || file.Length == 0)
        return BadRequest("No PDF file uploaded.");
    using var ms = new MemoryStream();
    await file.CopyToAsync(ms);
    // Load PDF from byte array
    var pdf = new PdfDocument(ms.ToArray());
    // Extract text and page count
    var text = pdf.ExtractAllText();
    var pageCount = pdf.PageCount;
    return Ok(new
    {
        text = text,
        pages = pageCount
    });
}
Because the upload is copied with CopyToAsync, the request thread is not blocked while the file is read. To let users download a processed PDF, return it from the controller with the File method and a download file name, which sets the Content-Disposition header for you. For additional security, validate the uploaded file before processing it.
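As a rough illustration of the file validation mentioned above, the sketch below checks two things before the bytes are handed to a PDF library: that the upload is under a size limit and that it starts with the "%PDF-" header that PDF files begin with. The PdfUploadValidator class and the 10 MB limit are assumptions made for this example. This is only a first-line check; it does not prove the file is a well-formed or safe PDF.

```csharp
using System;
using System.Text;

public static class PdfUploadValidator
{
    // Hypothetical upper bound for uploads; tune to your application.
    public const int DefaultMaxBytes = 10 * 1024 * 1024;

    // PDF files start with the ASCII header "%PDF-" followed by a version number.
    private static readonly byte[] PdfSignature = Encoding.ASCII.GetBytes("%PDF-");

    // Returns true when the content is within the size limit and begins with the PDF header.
    public static bool LooksLikePdf(byte[] content, int maxBytes = DefaultMaxBytes)
    {
        if (content == null || content.Length < PdfSignature.Length || content.Length > maxBytes)
            return false;

        for (int i = 0; i < PdfSignature.Length; i++)
        {
            if (content[i] != PdfSignature[i])
                return false;
        }
        return true;
    }
}
```

In ProcessPdf, you could call PdfUploadValidator.LooksLikePdf(ms.ToArray()) after copying the upload and return BadRequest when it fails, so that non-PDF content never reaches the parsing step.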
Conclusion
IronPDF makes it easy to read, extract, process, and save PDF documents in ASP.NET Core and other .NET Core applications. Whether you’re working with forms, tables, plain text, or digital signatures, this .NET library reduces tasks that normally take hours to just a few lines of code. You can return the results as JSON, display them in the browser, or store them for later processing.
Start with a free trial to explore IronPDF's full capabilities in your ASP.NET Core project. You can build and test your PDF extraction workflows before committing to a license. For production, IronPDF offers flexible options suitable for solo developers or large solutions. Honestly, using IronPDF is one of the fastest ways I’ve found to handle PDF files in ASP.NET Core without the usual headaches.