    Class PdfExtractionOptions

    Configuration options for PDF extraction behavior

    Provides control over how tables and text are extracted from PDF documents.

    Use this class to customize extraction parameters such as text mode, table detection strategy, and the tolerance values that affect extraction accuracy.

    ------------------------------------------------

    Usage:

    var options = new PdfExtractionOptions
    {
        TextMode = TextExtractionMode.Stream,
        TableStrategy = TableDetectionStrategy.Hybrid,
        EnableTableExtraction = true,
        EnableTextExtraction = true,
        CellMergeThreshold = 2.0,
        ColumnDetectionSensitivity = 15.0
    };
    var result = PdfExtractor.Extract("document.pdf", options);

    ------------------------------------------------

    Inheritance
    System.Object
    PdfExtractionOptions
    Namespace: IronPdf.Extractions
    Assembly: IronPdf.dll
    Syntax
    public class PdfExtractionOptions : Object
    Remarks

    Important Considerations:

    Text Mode Selection: Stream mode is preferred for multi-column layouts, while PositionBased works better for single-column documents.

    Note: Table detection strategies have different strengths: Nurminen for bordered tables, Spreadsheet for borderless tables, and Hybrid for the best overall results.

    Related Documentation:

    How-To Guide: Configuring PDF Extraction

    API Reference: Full API Documentation

    Constructors

    PdfExtractionOptions()

    Declaration
    public PdfExtractionOptions()

    Properties

    CellMergeThreshold

    Maximum distance (in points) between cell boundaries for them to be merged.

    • Lower values = stricter alignment

    • Higher values = more forgiving

    Default value: 2.0

    Declaration
    public double CellMergeThreshold { get; set; }
    Property Value
    Type Description
    System.Double
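The effect of a merge threshold can be illustrated with a minimal, self-contained sketch. It does not show IronPDF's internal algorithm, which this page does not document; `MergeBoundaries` is a hypothetical helper that collapses boundary coordinates lying within the threshold of the previously kept boundary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of IronPDF: drops any boundary that lies
// within `threshold` points of the previously kept boundary.
List<double> MergeBoundaries(IEnumerable<double> positions, double threshold)
{
    var merged = new List<double>();
    foreach (var position in positions.OrderBy(value => value))
    {
        bool closeToPrevious = merged.Count > 0
            && position - merged[merged.Count - 1] <= threshold;
        if (!closeToPrevious)
        {
            merged.Add(position);
        }
    }
    return merged;
}

double[] verticalEdges = { 100.0, 101.5, 150.0, 150.4, 200.0 };

// A threshold of 2.0 merges both near-duplicate pairs.
Console.WriteLine(MergeBoundaries(verticalEdges, 2.0).Count); // 3
// A stricter threshold of 1.0 keeps 100.0 and 101.5 apart.
Console.WriteLine(MergeBoundaries(verticalEdges, 1.0).Count); // 4
```

In this sketch a lower threshold leaves more distinct boundaries, which matches the "stricter alignment" behavior described above.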

    CellWordInclusionTolerance

    Positional tolerance (in points) for deciding if a word lies within a table cell.

    Declaration
    public double CellWordInclusionTolerance { get; set; }
    Property Value
    Type Description
    System.Double
    Remarks

    Expands cell boundaries slightly to avoid losing near-edge words.

    • Lower values = stricter geometric adherence

    • Higher values = better for noisy PDFs

    Default value: 3.0
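One way to picture the inclusion tolerance is as an expansion of the cell rectangle on all four sides. The sketch below is an illustration under that assumption, not IronPDF's implementation; `IsWordInCell` is a hypothetical helper, and the coordinates assume a top-left origin (Top is smaller than Bottom).

```csharp
using System;

// Hypothetical helper, not part of IronPDF: a word counts as inside a cell
// when its center falls within the cell rectangle grown by `tolerance` points.
bool IsWordInCell(
    (double Left, double Top, double Right, double Bottom) cell,
    (double X, double Y) wordCenter,
    double tolerance)
{
    return wordCenter.X >= cell.Left - tolerance
        && wordCenter.X <= cell.Right + tolerance
        && wordCenter.Y >= cell.Top - tolerance
        && wordCenter.Y <= cell.Bottom + tolerance;
}

var sampleCell = (Left: 100.0, Top: 200.0, Right: 180.0, Bottom: 220.0);
var nearEdgeWord = (X: 182.0, Y: 210.0); // 2 points past the right edge

Console.WriteLine(IsWordInCell(sampleCell, nearEdgeWord, 3.0)); // True
Console.WriteLine(IsWordInCell(sampleCell, nearEdgeWord, 1.0)); // False
```

With the documented default of 3.0 the near-edge word is kept; a stricter tolerance of 1.0 would drop it from the cell.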

    ColumnDetectionSensitivity

    Gap size (in points) required to split text into separate columns.

    • Lower values = more aggressive column detection

    • Higher values = less aggressive

    Default value: 15.0

    Declaration
    public double ColumnDetectionSensitivity { get; set; }
    Property Value
    Type Description
    System.Double
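The gap-based splitting described for this property can be sketched as follows. This is a simplified illustration, not IronPDF's code: `SplitIntoColumns` is a hypothetical helper that works on a single line of words and starts a new column whenever the horizontal gap to the preceding text is at least the configured size.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of IronPDF: starts a new column whenever the
// gap between the previous text and the next word is at least `minGap` points.
List<List<string>> SplitIntoColumns(
    IEnumerable<(string Text, double Left, double Right)> words, double minGap)
{
    var columns = new List<List<string>>();
    double previousRight = 0.0;
    foreach (var word in words.OrderBy(w => w.Left))
    {
        if (columns.Count == 0 || word.Left - previousRight >= minGap)
        {
            columns.Add(new List<string>());
        }
        columns[columns.Count - 1].Add(word.Text);
        previousRight = Math.Max(previousRight, word.Right);
    }
    return columns;
}

var lineWords = new[]
{
    (Text: "Item", Left: 50.0, Right: 75.0),
    (Text: "A", Left: 78.0, Right: 85.0),      // 3-point gap: same column
    (Text: "Price", Left: 110.0, Right: 140.0) // 25-point gap
};

Console.WriteLine(SplitIntoColumns(lineWords, 15.0).Count); // 2
Console.WriteLine(SplitIntoColumns(lineWords, 30.0).Count); // 1
```

A smaller gap requirement splits more readily, which is why lower values are described as more aggressive.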

    ColumnGapMultiplier

    Multiplier applied to the average inter-word gap to decide whether a gap represents a column break within a table cell.

    Declaration
    public double ColumnGapMultiplier { get; set; }
    Property Value
    Type Description
    System.Double
    Remarks

    Helps distinguish normal spacing from real column separators.

    • Lower values = more sensitive, risk of false splits

    • Higher values = more conservative

    Default value: 3.0
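A relative threshold of this kind can be sketched in a few lines. The example assumes the simplest reading of the description (a gap is a column break when it exceeds the average gap times the multiplier); IronPDF's actual rule may differ, and `FindColumnBreaks` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of IronPDF: returns the indexes of gaps that
// exceed the average gap multiplied by `multiplier`.
List<int> FindColumnBreaks(IReadOnlyList<double> gapWidths, double multiplier)
{
    var breaks = new List<int>();
    if (gapWidths.Count == 0)
    {
        return breaks;
    }

    double averageGap = gapWidths.Average();
    for (int i = 0; i < gapWidths.Count; i++)
    {
        if (gapWidths[i] > averageGap * multiplier)
        {
            breaks.Add(i);
        }
    }
    return breaks;
}

// Six ordinary word gaps and one wide gap at index 4.
double[] wordGaps = { 3.0, 3.0, 3.0, 3.0, 40.0, 3.0, 3.0 };

Console.WriteLine(FindColumnBreaks(wordGaps, 3.0).Count); // 1
Console.WriteLine(FindColumnBreaks(wordGaps, 5.0).Count); // 0
```

Note that in this sketch the wide gap itself raises the average, so a high multiplier can hide a genuine column break; that is the conservative end of the trade-off listed above.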

    EnableTableExtraction

    Enables or disables table extraction.

    When set to false, only text will be extracted from the PDF.

    Default value: true

    Declaration
    public bool EnableTableExtraction { get; set; }
    Property Value
    Type Description
    System.Boolean

    EnableTextExtraction

    Enables or disables text extraction.

    When set to false, only tables will be extracted from the PDF.

    Default value: true

    Declaration
    public bool EnableTextExtraction { get; set; }
    Property Value
    Type Description
    System.Boolean
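The two switches can be combined to restrict extraction to one kind of content. The fragment below is a configuration sketch that uses only properties documented on this page; it assumes the IronPDF assembly is referenced and that the types live in the `IronPdf.Extractions` namespace listed above.

```csharp
using IronPdf.Extractions;

// Text-only extraction: table detection is skipped.
var textOnly = new PdfExtractionOptions
{
    EnableTableExtraction = false,
    EnableTextExtraction = true
};

// Table-only extraction, treating the first row of each table as a header
// when the table is converted or exported.
var tablesOnly = new PdfExtractionOptions
{
    EnableTableExtraction = true,
    EnableTextExtraction = false,
    UseFirstRowAsHeader = true
};
```

Either object can then be passed to the extraction call in the same way as in the usage sample at the top of this page.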

    LineHeightGroupingTolerance

    Vertical tolerance (in points) for grouping words into text lines.

    Declaration
    public double LineHeightGroupingTolerance { get; set; }
    Property Value
    Type Description
    System.Double
    Remarks

    This tolerance defines the maximum allowed difference between the vertical midpoints of two words for them to be considered part of the same text line.

    • Lower values = suited to clean digital PDFs

    • Higher values = recommended for noisy documents

    Default value: 3.0
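The midpoint comparison described in the remarks can be sketched as below. This is an illustration only, not IronPDF's implementation: `GroupIntoLines` is a hypothetical helper that compares each word's vertical midpoint with the first word of the current line.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of IronPDF: a word joins the current line when
// its vertical midpoint is within `tolerance` points of the line's first word.
List<List<string>> GroupIntoLines(
    IEnumerable<(string Text, double MidY)> words, double tolerance)
{
    var lines = new List<List<string>>();
    double anchorMidY = 0.0;
    foreach (var word in words.OrderBy(w => w.MidY))
    {
        if (lines.Count == 0 || Math.Abs(word.MidY - anchorMidY) > tolerance)
        {
            lines.Add(new List<string>());
            anchorMidY = word.MidY;
        }
        lines[lines.Count - 1].Add(word.Text);
    }
    return lines;
}

var sampleWords = new[]
{
    (Text: "Total", MidY: 100.0),
    (Text: "due", MidY: 101.5),   // slight vertical drift
    (Text: "Invoice", MidY: 120.0),
    (Text: "2024", MidY: 121.0)
};

Console.WriteLine(GroupIntoLines(sampleWords, 3.0).Count); // 2
Console.WriteLine(GroupIntoLines(sampleWords, 0.5).Count); // 4
```

With the default of 3.0 the slightly misaligned words stay on the same line, while a very tight tolerance splits each word onto its own line.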

    OwnerPassword

    Optional owner password if the PDF document is protected by owner restrictions.

    This password may be required to bypass restrictions such as printing or modification.

    Declaration
    public string OwnerPassword { get; set; }
    Property Value
    Type Description
    System.String

    Password

    Optional user password if the PDF document is encrypted.

    This password is required to open the document.

    Declaration
    public string Password { get; set; }
    Property Value
    Type Description
    System.String
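For protected documents, both password properties can be set on the same options object. The fragment below is a configuration sketch that assumes the IronPDF assembly is referenced; the environment variable names are hypothetical, and reading secrets from the environment is a design choice rather than a library requirement.

```csharp
using System;
using IronPdf.Extractions;

var protectedOptions = new PdfExtractionOptions
{
    // PDF_USER_PASSWORD and PDF_OWNER_PASSWORD are hypothetical variable names.
    Password = Environment.GetEnvironmentVariable("PDF_USER_PASSWORD"),
    OwnerPassword = Environment.GetEnvironmentVariable("PDF_OWNER_PASSWORD")
};
```

Keeping passwords out of source code avoids committing them to version control.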

    TableBoundaryExpansion

    Distance (in points) by which table boundaries are expanded to capture edge words.

    Helps capture content at table edges, especially bottom rows.

    Default value: 5.0

    Declaration
    public double TableBoundaryExpansion { get; set; }
    Property Value
    Type Description
    System.Double

    TableDeduplicationTolerance

    Positional tolerance (in points) used when determining whether two detected tables should be considered duplicates during post-processing.

    Declaration
    public double TableDeduplicationTolerance { get; set; }
    Property Value
    Type Description
    System.Double
    Remarks

    If all edges fall within this tolerance, the tables are treated as duplicates and only the first instance is retained.

    • Lower values = stricter matching

    • Higher values = more forgiving

    Default value: 5.0
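The rule in the remarks (all edges within tolerance, first instance retained) can be sketched as follows. This is a minimal illustration rather than IronPDF's code: `AreDuplicates` and `Deduplicate` are hypothetical helpers, and tables are represented as plain (Left, Top, Right, Bottom) tuples in points.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers, not part of IronPDF.
// Two tables are duplicates when every edge differs by at most `tolerance`.
bool AreDuplicates(
    (double Left, double Top, double Right, double Bottom) a,
    (double Left, double Top, double Right, double Bottom) b,
    double tolerance)
{
    return Math.Abs(a.Left - b.Left) <= tolerance
        && Math.Abs(a.Top - b.Top) <= tolerance
        && Math.Abs(a.Right - b.Right) <= tolerance
        && Math.Abs(a.Bottom - b.Bottom) <= tolerance;
}

List<(double Left, double Top, double Right, double Bottom)> Deduplicate(
    IEnumerable<(double Left, double Top, double Right, double Bottom)> tables,
    double tolerance)
{
    var kept = new List<(double Left, double Top, double Right, double Bottom)>();
    foreach (var table in tables)
    {
        // Keep a table only if no earlier kept table matches on all four edges.
        if (!kept.Any(existing => AreDuplicates(existing, table, tolerance)))
        {
            kept.Add(table);
        }
    }
    return kept;
}

var detected = new[]
{
    (Left: 50.0, Top: 100.0, Right: 400.0, Bottom: 300.0),
    (Left: 52.0, Top: 101.0, Right: 398.0, Bottom: 303.0), // near copy of the first
    (Left: 50.0, Top: 400.0, Right: 400.0, Bottom: 600.0)
};

Console.WriteLine(Deduplicate(detected, 5.0).Count); // 2
Console.WriteLine(Deduplicate(detected, 1.0).Count); // 3
```

A scenario like this could arise when two detection passes report the same table with slightly different bounds, for example under a combined strategy.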

    TableStrategy

    Table detection strategy (Nurminen, Spreadsheet, or Hybrid).

    Determines which algorithm or algorithms are used to detect tables in the PDF.

    Default value: Hybrid

    Declaration
    public TableDetectionStrategy TableStrategy { get; set; }
    Property Value
    Type Description
    TableDetectionStrategy

    TextMode

    Text extraction mode. Stream mode is preferred for multi-column layouts.

    PositionBased mode extracts text in top-to-bottom, left-to-right order based on its position on the page.

    Default value: PositionBased

    Declaration
    public TextExtractionMode TextMode { get; set; }
    Property Value
    Type Description
    TextExtractionMode

    UseFirstRowAsHeader

    If true, the first row will be treated as a header row when converting/exporting the table.

    Default value: false

    Declaration
    public bool UseFirstRowAsHeader { get; set; }
    Property Value
    Type Description
    System.Boolean

    WordBoundaryJitterTolerance

    Tolerance (in points) for assigning words to detected column regions.

    Declaration
    public double WordBoundaryJitterTolerance { get; set; }
    Property Value
    Type Description
    System.Double
    Remarks

    Compensates for extraction jitter or slight coordinate drift.

    • Lower values = stricter boundaries

    • Higher values = more forgiving

    Default value: 0.5

    WordOverlapTolerance

    Allowed negative horizontal overlap (in points) before suppressing a space.

    Declaration
    public double WordOverlapTolerance { get; set; }
    Property Value
    Type Description
    System.Double
    Remarks

    Handles overlap in bold or italic word boxes.

    • Less negative = stricter, risk of unwanted concatenation

    • More negative = more forgiving spacing

    Default value: -25.0
