Class PdfExtractor

Provides static methods for extracting tables and text from PDF documents, with configurable extraction options.

Supports both synchronous and asynchronous extraction operations.

------------------------------------------------

Usage:

// Extract the entire document with default options
var result = PdfExtractor.Extract("document.pdf");

// Extract with custom options (tables only, text extraction disabled)
var options = new PdfExtractionOptions
{
    TableStrategy = TableDetectionStrategy.Hybrid,
    EnableTextExtraction = false
};
var tablesOnly = PdfExtractor.Extract("document.pdf", options);

// Extract a specific page (page numbers are 1-based)
var pageResult = PdfExtractor.ExtractPage("document.pdf", 5);

// Extract a specific table (first table on page 5; table indexes are 0-based)
var table = PdfExtractor.ExtractTable("document.pdf", 5, 0);

------------------------------------------------

Inheritance

System.Object

PdfExtractor

Namespace: IronPdf.Extractions

Assembly: IronPdf.dll

Syntax

public static class PdfExtractor

Remarks

Important Considerations:

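Performance: For large documents, consider the async methods (ExtractAsync, ExtractPageAsync) so that extraction does not block the calling thread, for example the UI thread of a desktop application.

Note: The ExtractTable method throws an exception if the specified table index does not exist on the specified page.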

Related Documentation:

How-To Guide: Getting Started with PDF Extraction

API Reference: Full API Documentation

Methods

Extract(String, PdfExtractionOptions)

Extract tables and text from a PDF file with custom or default options

Processes all pages in the document and returns both tables and text content.

Declaration

public static PdfExtractionResult Extract(string pdfPath, PdfExtractionOptions options = null)

Parameters

Type	Name	Description
System.String	pdfPath	Path to the PDF file
PdfExtractionOptions	options	Extraction options (null for default options)

Returns

Type	Description
PdfExtractionResult	PdfExtractionResult containing all extracted tables and text

Exceptions

Type	Condition
System.InvalidOperationException	Thrown when `pdfPath` is invalid or when no file exists at the specified path.

ExtractAsync(String, PdfExtractionOptions, IProgress<ExtractionProgress>)

Extract tables and text asynchronously (for large documents)

Processes all pages in the document asynchronously and returns both tables and text content.

Declaration

public static Task<PdfExtractionResult> ExtractAsync(string pdfPath, PdfExtractionOptions options = null, IProgress<ExtractionProgress> progress = null)

Parameters

Type	Name	Description
System.String	pdfPath	Path to the PDF file
PdfExtractionOptions	options	Extraction options (null for default options)
System.IProgress<ExtractionProgress>	progress	Optional progress reporter

Returns

Type	Description
System.Threading.Tasks.Task<PdfExtractionResult>	A task whose result is a PdfExtractionResult containing all extracted tables and text

Exceptions

Type	Condition
System.InvalidOperationException	Thrown when `pdfPath` is invalid or when no file exists at the specified path.
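
Example

The following is a minimal sketch of calling ExtractAsync with a progress reporter, based only on the declaration above. The members of ExtractionProgress are not described in this reference, so the handler does not read any of them. The file name is a placeholder, and the sketch assumes that PdfExtractionResult and ExtractionProgress live in the IronPdf.Extractions namespace.

```csharp
using System;
using System.Threading.Tasks;
using IronPdf.Extractions;

public static class ExtractAsyncExample
{
    public static async Task Main()
    {
        // Progress<T> is the standard .NET implementation of IProgress<T>.
        // The handler ignores the ExtractionProgress value because its
        // members are not documented in this reference.
        var progress = new Progress<ExtractionProgress>(
            _ => Console.WriteLine("Progress reported."));

        // "large-report.pdf" is a placeholder path.
        PdfExtractionResult result = await PdfExtractor.ExtractAsync(
            "large-report.pdf",
            options: null,       // null selects the default options
            progress: progress);

        Console.WriteLine("Extraction finished.");
    }
}
```

Progress<T> invokes its handler on the synchronization context captured when it was constructed, or on the thread pool when there is none, so in a console application some progress messages may be printed after the awaited call has returned.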

ExtractPage(String, Int32, PdfExtractionOptions)

Extract tables and text from a specific page

Processes only the specified page and returns tables and text content.

Declaration

public static PdfExtractionResult ExtractPage(string pdfPath, int pageNumber, PdfExtractionOptions options = null)

Parameters

Type	Name	Description
System.String	pdfPath	Path to the PDF file
System.Int32	pageNumber	Page number to extract (1-based)
PdfExtractionOptions	options	Extraction options (null for default options)

Returns

Type	Description
PdfExtractionResult	PdfExtractionResult containing tables and text from the specified page

Exceptions

Type	Condition
System.InvalidOperationException	Thrown when `pdfPath` is invalid or when no file exists at the specified path.
System.ArgumentOutOfRangeException	Thrown when `pageNumber` is less than 1 or greater than the number of pages in the document.
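
Example

A short sketch of handling the two exception types listed above when extracting a single page. The helper name TryExtractPage is hypothetical, and the sketch assumes that PdfExtractionResult lives in the IronPdf.Extractions namespace.

```csharp
using System;
using IronPdf.Extractions;

public static class ExtractPageExample
{
    // Hypothetical helper: returns false instead of throwing when the call
    // fails with one of the documented exception types.
    public static bool TryExtractPage(string pdfPath, int pageNumber, out PdfExtractionResult result)
    {
        try
        {
            result = PdfExtractor.ExtractPage(pdfPath, pageNumber);
            return true;
        }
        catch (ArgumentOutOfRangeException)
        {
            // pageNumber is less than 1 or greater than the page count.
            Console.Error.WriteLine($"Page {pageNumber} is outside the document's page range.");
        }
        catch (InvalidOperationException)
        {
            // pdfPath is invalid or no file exists at that path.
            Console.Error.WriteLine($"'{pdfPath}' is not a valid path or the file does not exist.");
        }

        result = default;
        return false;
    }
}
```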

ExtractPageAsync(String, Int32, PdfExtractionOptions, IProgress<ExtractionProgress>)

Extract a specific page asynchronously

Processes only the specified page asynchronously and returns tables and text content.

Declaration

public static Task<PdfExtractionResult> ExtractPageAsync(string pdfPath, int pageNumber, PdfExtractionOptions options = null, IProgress<ExtractionProgress> progress = null)

Parameters

Type	Name	Description
System.String	pdfPath	Path to the PDF file
System.Int32	pageNumber	Page number to extract (1-based)
PdfExtractionOptions	options	Extraction options (null for default options)
System.IProgress<ExtractionProgress>	progress	Optional progress reporter

Returns

Type	Description
System.Threading.Tasks.Task<PdfExtractionResult>	A task whose result is a PdfExtractionResult containing tables and text from the specified page

Exceptions

Type	Condition
System.InvalidOperationException	Thrown when `pdfPath` is invalid or when no file exists at the specified path.
System.ArgumentOutOfRangeException	Thrown when `pageNumber` is less than 1 or greater than the number of pages in the document.

ExtractPages(String, Int32, Int32, PdfExtractionOptions)

Extract tables and text from a range of pages

Processes only the pages in the specified range and returns tables and text content.

Declaration

public static PdfExtractionResult ExtractPages(string pdfPath, int startPage, int endPage, PdfExtractionOptions options = null)

Parameters

Type	Name	Description
System.String	pdfPath	Path to the PDF file
System.Int32	startPage	Starting page number (1-based, inclusive)
System.Int32	endPage	Ending page number (1-based, inclusive)
PdfExtractionOptions	options	Extraction options (null for default options)

Returns

Type	Description
PdfExtractionResult	PdfExtractionResult containing tables and text from the specified page range

Exceptions

Type	Condition
System.InvalidOperationException	Thrown when `pdfPath` is invalid or when no file exists at the specified path.
System.ArgumentOutOfRangeException	Thrown when `startPage` or `endPage` is outside the valid page range.
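
Example

Because both bounds are 1-based and inclusive, a call with startPage 2 and endPage 4 covers three pages. This reference does not say what happens when startPage is greater than endPage, so the sketch below checks that case before calling the library. The wrapper name ExtractRange is hypothetical, and the sketch assumes that PdfExtractionResult lives in the IronPdf.Extractions namespace.

```csharp
using System;
using IronPdf.Extractions;

public static class ExtractPagesExample
{
    // Hypothetical wrapper that validates the range before delegating
    // to PdfExtractor.ExtractPages.
    public static PdfExtractionResult ExtractRange(string pdfPath, int startPage, int endPage)
    {
        if (startPage < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(startPage), "Page numbers are 1-based.");
        }

        if (endPage < startPage)
        {
            throw new ArgumentOutOfRangeException(nameof(endPage), "endPage must not precede startPage.");
        }

        // The upper bound is still checked by the library, which knows the page count.
        return PdfExtractor.ExtractPages(pdfPath, startPage, endPage);
    }
}
```

For example, ExtractRange("document.pdf", 2, 4) requests pages 2, 3, and 4 of a placeholder file.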

ExtractTable(String, Int32, Int32, PdfExtractionOptions)

Extract a specific table from a specific page

Returns only the specified table from the specified page.

Declaration

public static TableObject ExtractTable(string pdfPath, int pageNumber, int tableIndex, PdfExtractionOptions options = null)

Parameters

Type	Name	Description
System.String	pdfPath	Path to the PDF file
System.Int32	pageNumber	Page number containing the table (1-based)
System.Int32	tableIndex	Index of the table on the page (0-based)
PdfExtractionOptions	options	Extraction options (null for default options)

Returns

Type	Description
TableObject	TableObject representing the specified table

Exceptions

Type	Condition
System.InvalidOperationException	Thrown when `pdfPath` is invalid or when no file exists at the specified path.
System.ArgumentOutOfRangeException	Thrown when `pageNumber` is less than 1 or greater than the number of pages, or when `tableIndex` is negative.
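
Example

A sketch of requesting a single table while handling the exception types listed above. The Remarks state that an exception is thrown when the table index does not exist on the page, but not which type, so this example covers only the documented conditions. The file name is a placeholder, and the sketch assumes that TableObject lives in the IronPdf.Extractions namespace.

```csharp
using System;
using IronPdf.Extractions;

public static class ExtractTableExample
{
    public static void Main()
    {
        try
        {
            // Page numbers are 1-based; table indexes are 0-based.
            // This requests the first table on page 5.
            TableObject table = PdfExtractor.ExtractTable("document.pdf", pageNumber: 5, tableIndex: 0);
            Console.WriteLine("Table extracted.");
        }
        catch (ArgumentOutOfRangeException ex)
        {
            // Documented for an out-of-range pageNumber or a negative tableIndex.
            Console.Error.WriteLine($"Invalid page number or table index: {ex.Message}");
        }
        catch (InvalidOperationException ex)
        {
            // Documented for an invalid path or a missing file.
            Console.Error.WriteLine($"Could not open the PDF: {ex.Message}");
        }
    }
}
```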