Class PdfExtractor
Provides methods to extract tables and text from PDF documents with various options.
Supports both synchronous and asynchronous extraction operations.
------------------------------------------------
Usage:
// Extract entire document
var result = PdfExtractor.Extract("document.pdf");
// Extract with custom options
var options = new PdfExtractionOptions
{
    TableStrategy = TableDetectionStrategy.Hybrid,
    EnableTextExtraction = false
};
var tablesOnly = PdfExtractor.Extract("document.pdf", options);
// Extract specific page
var pageResult = PdfExtractor.ExtractPage("document.pdf", 5);
// Extract specific table
var table = PdfExtractor.ExtractTable("document.pdf", 5, 0);
------------------------------------------------
Inheritance
Namespace: IronPdf.Extractions
Assembly: IronPdf.dll
Syntax
public static class PdfExtractor : Object
Remarks
Important Considerations:
Performance: For large documents, consider using the asynchronous methods (ExtractAsync, ExtractPageAsync) so that extraction does not block the calling thread, such as the UI thread.
Note: The ExtractTable method throws an exception if the specified table index does not exist on the page.
Related Documentation:
How-To Guide: Getting Started with PDF Extraction
API Reference: Full API Documentation
Methods
Extract(String, PdfExtractionOptions)
Extracts tables and text from a PDF file using custom or default options.
Processes all pages in the document and returns both tables and text content.
Declaration
public static PdfExtractionResult Extract(string pdfPath, PdfExtractionOptions options = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | pdfPath | Path to the PDF file |
| PdfExtractionOptions | options | Extraction options (null for default options) |
Returns
| Type | Description |
|---|---|
| PdfExtractionResult | PdfExtractionResult containing all extracted tables and text |
Exceptions
| Type | Condition |
|---|---|
| System.InvalidOperationException | Thrown when |
ExtractAsync(String, PdfExtractionOptions, IProgress<ExtractionProgress>)
Extracts tables and text asynchronously; intended for large documents.
Processes all pages in the document asynchronously and returns both tables and text content.
Declaration
public static Task<PdfExtractionResult> ExtractAsync(string pdfPath, PdfExtractionOptions options = null, IProgress<ExtractionProgress> progress = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | pdfPath | Path to the PDF file |
| PdfExtractionOptions | options | Extraction options (null for default options) |
| System.IProgress<ExtractionProgress> | progress | Optional progress reporter |
Returns
| Type | Description |
|---|---|
| System.Threading.Tasks.Task<PdfExtractionResult> | Task that resolves to PdfExtractionResult containing all extracted tables and text |
Exceptions
| Type | Condition |
|---|---|
| System.InvalidOperationException | Thrown when |
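The following is a minimal sketch of calling ExtractAsync with a progress reporter. It assumes only the signature declared above and that IronPdf.dll is referenced. The members of ExtractionProgress are not documented in this section, so the handler simply counts how many reports arrive; "report.pdf" is a placeholder path.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using IronPdf.Extractions;

public static class AsyncExtractionExample
{
    public static async Task Main()
    {
        int reports = 0;

        // Progress<T> posts each report to the synchronization context captured
        // at construction, or to the thread pool when there is none.
        // The ExtractionProgress payload is ignored here because its members
        // are not described in this section.
        var progress = new Progress<ExtractionProgress>(
            _ => Interlocked.Increment(ref reports));

        // Passing null for options requests the defaults, per the parameter table.
        PdfExtractionResult result =
            await PdfExtractor.ExtractAsync("report.pdf", null, progress);

        // Callbacks are posted asynchronously, so the count read here may lag
        // slightly behind the number of reports actually raised.
        Console.WriteLine(
            $"Extraction finished; {Volatile.Read(ref reports)} progress report(s) seen so far.");
    }
}
```

Awaiting the task rather than blocking on it with .Result or .Wait() is what keeps the calling thread free while extraction runs.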
ExtractPage(String, Int32, PdfExtractionOptions)
Extracts tables and text from a specific page.
Processes only the specified page and returns tables and text content.
Declaration
public static PdfExtractionResult ExtractPage(string pdfPath, int pageNumber, PdfExtractionOptions options = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | pdfPath | Path to the PDF file |
| System.Int32 | pageNumber | Page number to extract (1-based) |
| PdfExtractionOptions | options | Extraction options (null for default options) |
Returns
| Type | Description |
|---|---|
| PdfExtractionResult | PdfExtractionResult containing tables and text from the specified page |
Exceptions
| Type | Condition |
|---|---|
| System.InvalidOperationException | Thrown when |
| System.ArgumentOutOfRangeException | Thrown when |
ExtractPageAsync(String, Int32, PdfExtractionOptions, IProgress<ExtractionProgress>)
Extracts tables and text from a specific page asynchronously.
Processes only the specified page asynchronously and returns tables and text content.
Declaration
public static Task<PdfExtractionResult> ExtractPageAsync(string pdfPath, int pageNumber, PdfExtractionOptions options = null, IProgress<ExtractionProgress> progress = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | pdfPath | Path to the PDF file |
| System.Int32 | pageNumber | Page number to extract (1-based) |
| PdfExtractionOptions | options | Extraction options (null for default options) |
| System.IProgress<ExtractionProgress> | progress | Optional progress reporter |
Returns
| Type | Description |
|---|---|
| System.Threading.Tasks.Task<PdfExtractionResult> | Task that resolves to PdfExtractionResult containing tables and text from the specified page |
Exceptions
| Type | Condition |
|---|---|
| System.InvalidOperationException | Thrown when |
| System.ArgumentOutOfRangeException | Thrown when |
ExtractPages(String, Int32, Int32, PdfExtractionOptions)
Extracts tables and text from a range of pages.
Processes only the pages in the specified range and returns tables and text content.
Declaration
public static PdfExtractionResult ExtractPages(string pdfPath, int startPage, int endPage, PdfExtractionOptions options = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | pdfPath | Path to the PDF file |
| System.Int32 | startPage | Starting page number (1-based, inclusive) |
| System.Int32 | endPage | Ending page number (1-based, inclusive) |
| PdfExtractionOptions | options | Extraction options (null for default options) |
Returns
| Type | Description |
|---|---|
| PdfExtractionResult | PdfExtractionResult containing tables and text from the specified page range |
Exceptions
| Type | Condition |
|---|---|
| System.InvalidOperationException | Thrown when |
| System.ArgumentOutOfRangeException | Thrown when |
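Because both bounds are 1-based and inclusive, off-by-one mistakes are easy to make when computing a range. The sketch below wraps ExtractPages in a hypothetical helper, ExtractRange, that rejects obviously invalid ranges before the file is opened. The helper name and its validation rules are illustrative assumptions, not part of the library; the library may perform its own checks as well.

```csharp
using System;
using IronPdf.Extractions;

public static class PageRangeExample
{
    // Hypothetical helper: validates the range up front, then delegates
    // to PdfExtractor.ExtractPages with default options.
    public static PdfExtractionResult ExtractRange(string pdfPath, int startPage, int endPage)
    {
        if (startPage < 1)
            throw new ArgumentOutOfRangeException(
                nameof(startPage), startPage, "Page numbers are 1-based.");

        if (endPage < startPage)
            throw new ArgumentOutOfRangeException(
                nameof(endPage), endPage, "endPage must not be less than startPage.");

        // Both bounds are inclusive, so (3, 5) covers pages 3, 4, and 5.
        return PdfExtractor.ExtractPages(pdfPath, startPage, endPage);
    }
}
```

A call such as `PageRangeExample.ExtractRange("document.pdf", 3, 5)` would then process three pages. The upper bound is not checked here because the page count of the document is not available without opening it.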
ExtractTable(String, Int32, Int32, PdfExtractionOptions)
Extracts a specific table from a specific page.
Returns only the specified table from the specified page.
Declaration
public static TableObject ExtractTable(string pdfPath, int pageNumber, int tableIndex, PdfExtractionOptions options = null)
Parameters
| Type | Name | Description |
|---|---|---|
| System.String | pdfPath | Path to the PDF file |
| System.Int32 | pageNumber | Page number containing the table (1-based) |
| System.Int32 | tableIndex | Index of the table on the page (0-based) |
| PdfExtractionOptions | options | Extraction options (null for default options) |
Returns
| Type | Description |
|---|---|
| TableObject | TableObject representing the specified table |
Exceptions
| Type | Condition |
|---|---|
| System.InvalidOperationException | Thrown when |
| System.ArgumentOutOfRangeException | Thrown when |
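Since ExtractTable throws when the requested table does not exist, callers that probe for tables may prefer a non-throwing wrapper. The sketch below is one possible approach, under several assumptions: TryExtractTable is a hypothetical helper, TableObject is assumed to be a reference type in the IronPdf.Extractions namespace, and because this section does not state which of the two listed exception types signals a missing table index, the helper catches both.

```csharp
using System;
using IronPdf.Extractions;

public static class TableLookupExample
{
    // Hypothetical helper: returns null instead of throwing when the table
    // cannot be retrieved. Assumes TableObject is a class (nullable reference).
    public static TableObject TryExtractTable(string pdfPath, int pageNumber, int tableIndex)
    {
        try
        {
            // pageNumber is 1-based; tableIndex is 0-based.
            return PdfExtractor.ExtractTable(pdfPath, pageNumber, tableIndex);
        }
        catch (Exception ex) when (ex is ArgumentOutOfRangeException
                                || ex is InvalidOperationException)
        {
            // Both exception types are listed for ExtractTable; which one
            // indicates a missing index is not specified in this section.
            Console.Error.WriteLine($"Table lookup failed: {ex.Message}");
            return null;
        }
    }
}
```

Catching InvalidOperationException this broadly can also hide unrelated failures, so production code may want to narrow the filter once the documented conditions are known.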