    Class PdfExtractor

    Provides methods to extract tables and text from PDF documents with various options.

    Supports both synchronous and asynchronous extraction operations.

    ------------------------------------------------

    Usage:

    // Extract entire document
    var result = PdfExtractor.Extract("document.pdf");
    

    // Extract with custom options
    var options = new PdfExtractionOptions
    {
        TableStrategy = TableDetectionStrategy.Hybrid,
        EnableTextExtraction = false
    };
    var tablesOnly = PdfExtractor.Extract("document.pdf", options);

    // Extract specific page
    var pageResult = PdfExtractor.ExtractPage("document.pdf", 5);

    // Extract specific table
    var table = PdfExtractor.ExtractTable("document.pdf", 5, 0);

    ------------------------------------------------

    Inheritance
    System.Object
    PdfExtractor
    Namespace: IronPdf.Extractions
    Assembly: IronPdf.dll
    Syntax
    public static class PdfExtractor : Object
    Remarks

    Important Considerations:

    Performance: For large documents, consider using the async methods (ExtractAsync, ExtractPageAsync) to avoid blocking the calling thread, such as the UI thread.

    Note: The ExtractTable method will throw an exception if the specified table index doesn't exist on the page.

    Related Documentation:

    How-To Guide: Getting Started with PDF Extraction

    API Reference: Full API Documentation

    Methods

    Extract(String, PdfExtractionOptions)

    Extract tables and text from a PDF file with custom or default options

    Processes all pages in the document and returns both tables and text content.

    Declaration
    public static PdfExtractionResult Extract(string pdfPath, PdfExtractionOptions options = null)
    Parameters
    Type Name Description
    System.String pdfPath

    Path to the PDF file

    PdfExtractionOptions options

    Extraction options (null for default options)

    Returns
    Type Description
    PdfExtractionResult

    PdfExtractionResult containing all extracted tables and text

    Exceptions
    Type Condition
    System.InvalidOperationException

    Thrown when pdfPath is invalid or when no file exists at the specified path.
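
    The exception documented above can be handled with an ordinary try/catch. The following is a minimal sketch, not the library's recommended pattern: the wrapper class ExtractExample and its TryExtract method are hypothetical names introduced for illustration, and only the Extract signature shown in the declaration is assumed.

```csharp
using System;
using IronPdf.Extractions;

// Hypothetical helper class, not part of the documented API.
public static class ExtractExample
{
    // Returns the extraction result, or null when Extract throws the
    // documented InvalidOperationException (invalid or missing path).
    public static PdfExtractionResult TryExtract(string pdfPath, PdfExtractionOptions options = null)
    {
        try
        {
            // Passing null for options requests the default options.
            return PdfExtractor.Extract(pdfPath, options);
        }
        catch (InvalidOperationException ex)
        {
            Console.Error.WriteLine($"Could not extract '{pdfPath}': {ex.Message}");
            return null;
        }
    }
}
```

    Returning null keeps the sketch short; rethrowing or wrapping the exception may suit callers that need the failure reason.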

    ExtractAsync(String, PdfExtractionOptions, IProgress<ExtractionProgress>)

    Extract tables and text asynchronously (for large documents)

    Processes all pages in the document asynchronously and returns both tables and text content.

    Declaration
    public static Task<PdfExtractionResult> ExtractAsync(string pdfPath, PdfExtractionOptions options = null, IProgress<ExtractionProgress> progress = null)
    Parameters
    Type Name Description
    System.String pdfPath

    Path to the PDF file

    PdfExtractionOptions options

    Extraction options (null for default options)

    System.IProgress<ExtractionProgress> progress

    Optional progress reporter

    Returns
    Type Description
    System.Threading.Tasks.Task<PdfExtractionResult>

    Task that resolves to PdfExtractionResult containing all extracted tables and text

    Exceptions
    Type Condition
    System.InvalidOperationException

    Thrown when pdfPath is invalid or when no file exists at the specified path.
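
    The progress parameter accepts any IProgress<ExtractionProgress> implementation, so the standard System.Progress<T> class can be used. The sketch below is illustrative only: ExtractAsyncExample and ExtractWithProgressAsync are hypothetical names, and because the members of ExtractionProgress are not listed on this page, the callback just prints the object's ToString() value.

```csharp
using System;
using System.Threading.Tasks;
using IronPdf.Extractions;

// Hypothetical helper class, not part of the documented API.
public static class ExtractAsyncExample
{
    // Awaits ExtractAsync and logs each progress report.
    // Returns null if the documented InvalidOperationException is thrown.
    public static async Task<PdfExtractionResult> ExtractWithProgressAsync(string pdfPath)
    {
        // Progress<T> invokes the callback on the captured synchronization
        // context (for example the UI thread), or on the thread pool if none.
        var progress = new Progress<ExtractionProgress>(p =>
            Console.WriteLine($"Extraction progress: {p}"));

        try
        {
            // null selects the default extraction options.
            return await PdfExtractor.ExtractAsync(pdfPath, null, progress);
        }
        catch (InvalidOperationException ex)
        {
            Console.Error.WriteLine($"Could not extract '{pdfPath}': {ex.Message}");
            return null;
        }
    }
}
```

    Placing the call inside the try block covers both cases: an exception thrown before the task is returned and one surfaced when the task is awaited.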

    ExtractPage(String, Int32, PdfExtractionOptions)

    Extract tables and text from a specific page

    Processes only the specified page and returns tables and text content.

    Declaration
    public static PdfExtractionResult ExtractPage(string pdfPath, int pageNumber, PdfExtractionOptions options = null)
    Parameters
    Type Name Description
    System.String pdfPath

    Path to the PDF file

    System.Int32 pageNumber

    Page number to extract (1-based)

    PdfExtractionOptions options

    Extraction options (null for default options)

    Returns
    Type Description
    PdfExtractionResult

    PdfExtractionResult containing tables and text from the specified page

    Exceptions
    Type Condition
    System.InvalidOperationException

    Thrown when pdfPath is invalid or when no file exists at the specified path.

    System.ArgumentOutOfRangeException

    Thrown when pageNumber is less than 1 or greater than the number of pages in the document.

    ExtractPageAsync(String, Int32, PdfExtractionOptions, IProgress<ExtractionProgress>)

    Extract a specific page asynchronously

    Processes only the specified page asynchronously and returns tables and text content.

    Declaration
    public static Task<PdfExtractionResult> ExtractPageAsync(string pdfPath, int pageNumber, PdfExtractionOptions options = null, IProgress<ExtractionProgress> progress = null)
    Parameters
    Type Name Description
    System.String pdfPath

    Path to the PDF file

    System.Int32 pageNumber

    Page number to extract (1-based)

    PdfExtractionOptions options

    Extraction options (null for default options)

    System.IProgress<ExtractionProgress> progress

    Optional progress reporter

    Returns
    Type Description
    System.Threading.Tasks.Task<PdfExtractionResult>

    Task that resolves to PdfExtractionResult containing tables and text from the specified page

    Exceptions
    Type Condition
    System.InvalidOperationException

    Thrown when pdfPath is invalid or when no file exists at the specified path.

    System.ArgumentOutOfRangeException

    Thrown when pageNumber is less than 1 or greater than the number of pages in the document.

    ExtractPages(String, Int32, Int32, PdfExtractionOptions)

    Extract tables and text from a range of pages

    Processes only the pages in the specified range and returns tables and text content.

    Declaration
    public static PdfExtractionResult ExtractPages(string pdfPath, int startPage, int endPage, PdfExtractionOptions options = null)
    Parameters
    Type Name Description
    System.String pdfPath

    Path to the PDF file

    System.Int32 startPage

    Starting page number (1-based, inclusive)

    System.Int32 endPage

    Ending page number (1-based, inclusive)

    PdfExtractionOptions options

    Extraction options (null for default options)

    Returns
    Type Description
    PdfExtractionResult

    PdfExtractionResult containing tables and text from the specified page range

    Exceptions
    Type Condition
    System.InvalidOperationException

    Thrown when pdfPath is invalid or when no file exists at the specified path.

    System.ArgumentOutOfRangeException

    Thrown when startPage or endPage is outside the valid page range.
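
    Because both bounds are 1-based and inclusive, a range of (2, 4) covers pages 2, 3, and 4. The sketch below shows one way to call ExtractPages defensively. ExtractPagesExample and TryExtractRange are hypothetical names, and the local guard that rejects endPage < startPage is an assumption of this sketch: the page does not state how the library treats a reversed range.

```csharp
using System;
using IronPdf.Extractions;

// Hypothetical helper class, not part of the documented API.
public static class ExtractPagesExample
{
    // Extracts pages startPage..endPage (1-based, inclusive).
    // Returns null when the range or the path is rejected.
    public static PdfExtractionResult TryExtractRange(string pdfPath, int startPage, int endPage)
    {
        // Local guard (assumption of this sketch): fail fast on ranges that
        // cannot be valid, before the file is opened.
        if (startPage < 1 || endPage < startPage)
        {
            Console.Error.WriteLine($"Invalid page range {startPage}-{endPage}.");
            return null;
        }

        try
        {
            return PdfExtractor.ExtractPages(pdfPath, startPage, endPage);
        }
        catch (ArgumentOutOfRangeException ex)
        {
            // Documented: a bound lies outside the document's page range.
            Console.Error.WriteLine($"Page range rejected: {ex.Message}");
            return null;
        }
        catch (InvalidOperationException ex)
        {
            // Documented: invalid path or missing file.
            Console.Error.WriteLine($"Could not open '{pdfPath}': {ex.Message}");
            return null;
        }
    }
}
```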

    ExtractTable(String, Int32, Int32, PdfExtractionOptions)

    Extract a specific table from a specific page

    Returns only the specified table from the specified page.

    Declaration
    public static TableObject ExtractTable(string pdfPath, int pageNumber, int tableIndex, PdfExtractionOptions options = null)
    Parameters
    Type Name Description
    System.String pdfPath

    Path to the PDF file

    System.Int32 pageNumber

    Page number containing the table (1-based)

    System.Int32 tableIndex

    Index of the table on the page (0-based)

    PdfExtractionOptions options

    Extraction options (null for default options)

    Returns
    Type Description
    TableObject

    TableObject representing the specified table

    Exceptions
    Type Condition
    System.InvalidOperationException

    Thrown when pdfPath is invalid or when no file exists at the specified path.

    System.ArgumentOutOfRangeException

    Thrown when pageNumber is less than 1 or greater than the number of pages, or when tableIndex is negative.
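
    Note the mixed indexing: pageNumber is 1-based while tableIndex is 0-based, so the first table on page 5 is ExtractTable(path, 5, 0). The sketch below catches only the two exception types listed above. ExtractTableExample and TryExtractTable are hypothetical names. The Remarks section says a nonexistent table index also throws, but does not name the exception type, so that case may propagate out of this helper.

```csharp
using System;
using IronPdf.Extractions;

// Hypothetical helper class, not part of the documented API.
public static class ExtractTableExample
{
    // pageNumber is 1-based; tableIndex is 0-based.
    // Returns null when one of the documented exceptions is thrown.
    public static TableObject TryExtractTable(string pdfPath, int pageNumber, int tableIndex)
    {
        try
        {
            return PdfExtractor.ExtractTable(pdfPath, pageNumber, tableIndex);
        }
        catch (ArgumentOutOfRangeException ex)
        {
            // Documented: page number out of range or negative table index.
            Console.Error.WriteLine($"Invalid page or table index: {ex.Message}");
            return null;
        }
        catch (InvalidOperationException ex)
        {
            // Documented: invalid path or missing file.
            Console.Error.WriteLine($"Could not open '{pdfPath}': {ex.Message}");
            return null;
        }
    }
}
```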
