Class PdfExtractionOptions
Configuration options for PDF extraction behavior.
Provides control over how tables and text are extracted from PDF documents.
Use this class to customize extraction parameters such as text mode, table detection strategy,
and various tolerance values that affect extraction accuracy.
------------------------------------------------
Usage:
var options = new PdfExtractionOptions
{
    TextMode = TextExtractionMode.Stream,
    TableStrategy = TableDetectionStrategy.Hybrid,
    EnableTableExtraction = true,
    EnableTextExtraction = true,
    CellMergeThreshold = 2.0,
    ColumnDetectionSensitivity = 15.0
};
var result = PdfExtractor.Extract("document.pdf", options);
------------------------------------------------
Inheritance
Namespace: IronPdf.Extractions
Assembly: IronPdf.dll
Syntax
public class PdfExtractionOptions : Object
Remarks
Important Considerations:
Text Mode Selection: Stream mode is preferred for multi-column layouts, while PositionBased works better for single-column documents.
Note: Table detection strategies have different strengths: Nurminen for bordered tables, Spreadsheet for borderless tables, and Hybrid for the best overall results.
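The guidance above can be turned into two option presets. This is a minimal sketch that assumes the enum members are named exactly as they appear in the text (PositionBased, Nurminen, Spreadsheet); the variable names are illustrative only.

```csharp
using IronPdf.Extractions;

// Multi-column layout with bordered (ruled) tables.
var multiColumnBordered = new PdfExtractionOptions
{
    TextMode = TextExtractionMode.Stream,
    TableStrategy = TableDetectionStrategy.Nurminen   // member name assumed from the note above
};

// Single-column document with borderless tables.
var singleColumnBorderless = new PdfExtractionOptions
{
    TextMode = TextExtractionMode.PositionBased,      // member name assumed from the text
    TableStrategy = TableDetectionStrategy.Spreadsheet
};
```

When the document type is not known in advance, leaving TableStrategy at its Hybrid default is the option the note recommends.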
Related Documentation:
How-To Guide: Configuring PDF Extraction
API Reference: Full API Documentation
Constructors
PdfExtractionOptions()
Declaration
public PdfExtractionOptions()
Properties
CellMergeThreshold
Maximum distance (in points) between cell boundaries for them to be merged.
• Lower values = stricter alignment
• Higher values = more forgiving
Default value: 2.0
Declaration
public double CellMergeThreshold { get; set; }
Property Value
| Type | Description |
|---|---|
| System.Double |
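To illustrate what a merge threshold of this kind does, the sketch below clusters boundary coordinates that lie within the threshold of their neighbor and replaces each cluster with its average. BoundaryMergeSketch and MergeBoundaries are hypothetical names; this is not the library's implementation, only one plausible reading of the description.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of IronPdf.
public static class BoundaryMergeSketch
{
    // Sorts the boundaries, then starts a new cluster whenever the distance
    // to the previous boundary exceeds `threshold` points. Each cluster is
    // reduced to the average of its members.
    public static List<double> MergeBoundaries(IEnumerable<double> boundaries, double threshold)
    {
        var merged = new List<double>();
        var cluster = new List<double>();

        foreach (var b in boundaries.OrderBy(x => x))
        {
            if (cluster.Count > 0 && b - cluster[cluster.Count - 1] > threshold)
            {
                merged.Add(cluster.Average());
                cluster.Clear();
            }
            cluster.Add(b);
        }

        if (cluster.Count > 0) merged.Add(cluster.Average());
        return merged;
    }
}
```

With boundaries at 100.0, 101.5 and 150.0, a threshold of 2.0 merges the first two into one boundary, while a threshold of 1.0 keeps all three.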
CellWordInclusionTolerance
Positional tolerance (in points) for deciding if a word lies within a table cell.
Declaration
public double CellWordInclusionTolerance { get; set; }
Property Value
| Type | Description |
|---|---|
| System.Double |
Remarks
Expands cell boundaries slightly to avoid losing near-edge words.
• Lower values = stricter geometric adherence
• Higher values = better for noisy PDFs
Default value: 3.0
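The effect of expanding a cell by a tolerance can be sketched as a simple containment test. The Box type and WordInCell method are hypothetical, and the sketch assumes a coordinate system where Top is smaller than Bottom; the library may test containment differently.

```csharp
// Hypothetical helper, not part of IronPdf.
public static class CellInclusionSketch
{
    // Axis-aligned box in points; assumes Top < Bottom (y grows downward).
    public struct Box
    {
        public double Left, Top, Right, Bottom;

        public Box(double left, double top, double right, double bottom)
        {
            Left = left; Top = top; Right = right; Bottom = bottom;
        }
    }

    // True when the word's center lies inside the cell after the cell has
    // been grown by `tolerance` points on every side.
    public static bool WordInCell(Box word, Box cell, double tolerance)
    {
        double cx = (word.Left + word.Right) / 2.0;
        double cy = (word.Top + word.Bottom) / 2.0;

        return cx >= cell.Left - tolerance && cx <= cell.Right + tolerance
            && cy >= cell.Top - tolerance && cy <= cell.Bottom + tolerance;
    }
}
```

A word whose center sits 2 points past the right edge of a cell is included with a tolerance of 3.0 and excluded with a tolerance of 1.0.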
ColumnDetectionSensitivity
Gap size (in points) required to split text into separate columns.
• Lower values = more aggressive column detection
• Higher values = less aggressive
Default value: 15.0
Declaration
public double ColumnDetectionSensitivity { get; set; }
Property Value
| Type | Description |
|---|---|
| System.Double |
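A gap-based column split of the kind described here might look like the following. ColumnSplitSketch and SplitIntoColumns are hypothetical names, and words are reduced to their horizontal spans; this is an illustration of the idea, not the library's algorithm.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of IronPdf.
public static class ColumnSplitSketch
{
    // Each word is a (Left, Right) horizontal span in points. Words are
    // processed left to right, and a new column starts whenever the gap to
    // the previous word is at least `sensitivity` points.
    public static List<List<(double Left, double Right)>> SplitIntoColumns(
        IEnumerable<(double Left, double Right)> words, double sensitivity)
    {
        var columns = new List<List<(double Left, double Right)>>();

        foreach (var word in words.OrderBy(w => w.Left))
        {
            var current = columns.Count > 0 ? columns[columns.Count - 1] : null;

            if (current == null || word.Left - current[current.Count - 1].Right >= sensitivity)
            {
                current = new List<(double Left, double Right)>();
                columns.Add(current);
            }

            current.Add(word);
        }

        return columns;
    }
}
```

For spans (0, 30), (34, 60) and (80, 110) the gaps are 4 and 20 points, so a sensitivity of 15.0 yields two columns and a sensitivity of 25.0 yields one.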
ColumnGapMultiplier
Multiplier applied to the average inter-word gap to decide
whether a gap represents a column break within a table cell.
Declaration
public double ColumnGapMultiplier { get; set; }
Property Value
| Type | Description |
|---|---|
| System.Double |
Remarks
Helps distinguish normal spacing from real column separators.
• Lower values = more sensitive, risk of false splits
• Higher values = more conservative
Default value: 3.0
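The relative test described above can be sketched as follows: a gap counts as a column break when it exceeds the average gap multiplied by the configured value. GapMultiplierSketch and FindColumnBreaks are hypothetical names, and the sketch assumes the average is taken over all gaps passed in, which the documentation does not specify.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of IronPdf.
public static class GapMultiplierSketch
{
    // `gaps` holds the horizontal distances (in points) between consecutive
    // words. Returns the indexes of the gaps larger than average * multiplier.
    public static List<int> FindColumnBreaks(IReadOnlyList<double> gaps, double multiplier)
    {
        var breaks = new List<int>();
        if (gaps.Count == 0) return breaks;

        double threshold = gaps.Average() * multiplier;

        for (int i = 0; i < gaps.Count; i++)
        {
            if (gaps[i] > threshold) breaks.Add(i);
        }

        return breaks;
    }
}
```

For gaps of 3, 3, 3, 3 and 40 points the average is 10.4, so a multiplier of 3.0 flags the last gap (40 > 31.2) while a multiplier of 4.0 flags nothing (40 < 41.6).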
EnableTableExtraction
Enables or disables table extraction.
When set to false, only text will be extracted from the PDF.
Default value: true
Declaration
public bool EnableTableExtraction { get; set; }
Property Value
| Type | Description |
|---|---|
| System.Boolean |
EnableTextExtraction
Enables or disables text extraction.
When set to false, only tables will be extracted from the PDF.
Default value: true
Declaration
public bool EnableTextExtraction { get; set; }
Property Value
| Type | Description |
|---|---|
| System.Boolean |
LineHeightGroupingTolerance
Vertical tolerance (in points) for grouping words into text lines.
Declaration
public double LineHeightGroupingTolerance { get; set; }
Property Value
| Type | Description |
|---|---|
| System.Double |
Remarks
This tolerance defines the maximum allowed difference between the
vertical midpoints of two words for them to be considered part of
the same textual line.
• Lower values = suited to clean digital PDFs
• Higher values = recommended for noisy documents
Default value: 3.0
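The midpoint comparison described in these remarks can be sketched as below. LineGroupingSketch and GroupIntoLines are hypothetical names. As a simplification, each word is compared with the first word of the current line rather than with every word in it, and words are not reordered horizontally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of IronPdf.
public static class LineGroupingSketch
{
    // Each word is (Text, Top, Bottom) in points. Words are sorted by their
    // vertical midpoint; a new line starts when a word's midpoint differs
    // from the midpoint of the line's first word by more than `tolerance`.
    public static List<List<string>> GroupIntoLines(
        IEnumerable<(string Text, double Top, double Bottom)> words, double tolerance)
    {
        var lines = new List<List<string>>();
        double anchorMid = 0;

        foreach (var w in words.OrderBy(x => (x.Top + x.Bottom) / 2.0))
        {
            double mid = (w.Top + w.Bottom) / 2.0;

            if (lines.Count == 0 || Math.Abs(mid - anchorMid) > tolerance)
            {
                lines.Add(new List<string>());
                anchorMid = mid;
            }

            lines[lines.Count - 1].Add(w.Text);
        }

        return lines;
    }
}
```

Words with midpoints at 105, 107 and 125 form two lines with a tolerance of 3.0 and three lines with a tolerance of 1.0.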
OwnerPassword
Optional owner password if the PDF document is protected by owner restrictions.
This password may be required to bypass restrictions such as printing or modification.
Declaration
public string OwnerPassword { get; set; }
Property Value
| Type | Description |
|---|---|
| System.String |
Password
Optional user password if the PDF document is encrypted.
This password is required to open the document.
Declaration
public string Password { get; set; }
Property Value
| Type | Description |
|---|---|
| System.String |
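A short configuration sketch for protected documents, using the two password properties together with the PdfExtractor.Extract call from the usage example at the top of this page. The file name and the environment variable names are placeholders, chosen here only to avoid hard-coding secrets.

```csharp
using System;
using IronPdf.Extractions;

var options = new PdfExtractionOptions
{
    // User password: required to open an encrypted document.
    Password = Environment.GetEnvironmentVariable("PDF_USER_PASSWORD"),

    // Owner password: may be required when owner restrictions apply.
    OwnerPassword = Environment.GetEnvironmentVariable("PDF_OWNER_PASSWORD")
};

// "protected.pdf" is a placeholder path.
var result = PdfExtractor.Extract("protected.pdf", options);
```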
TableBoundaryExpansion
Distance (in points) by which table boundaries are expanded to capture edge words.
Helps capture content at table edges, especially bottom rows.
Default value: 5.0
Declaration
public double TableBoundaryExpansion { get; set; }
Property Value
| Type | Description |
|---|---|
| System.Double |
TableDeduplicationTolerance
Positional tolerance used when determining whether two detected
tables should be considered duplicates during post-processing.
Declaration
public double TableDeduplicationTolerance { get; set; }
Property Value
| Type | Description |
|---|---|
| System.Double |
Remarks
If all edges fall within this tolerance, the tables are treated as
duplicates and only the first instance is retained.
• Lower values = stricter matching
• Higher values = more forgiving
Default value: 5.0
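The rule in these remarks (all four edges within the tolerance, first instance retained) can be sketched as follows. TableDedupSketch, Box and Deduplicate are hypothetical names, and table regions are reduced to plain bounding boxes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of IronPdf.
public static class TableDedupSketch
{
    // Bounding box of a detected table, in points.
    public struct Box
    {
        public double Left, Top, Right, Bottom;

        public Box(double left, double top, double right, double bottom)
        {
            Left = left; Top = top; Right = right; Bottom = bottom;
        }
    }

    // True when every edge of `a` is within `tol` points of the same edge of `b`.
    private static bool AllEdgesWithin(Box a, Box b, double tol)
    {
        return Math.Abs(a.Left - b.Left) <= tol
            && Math.Abs(a.Top - b.Top) <= tol
            && Math.Abs(a.Right - b.Right) <= tol
            && Math.Abs(a.Bottom - b.Bottom) <= tol;
    }

    // Keeps a table only if no previously kept table matches it on all edges,
    // so the first instance of each duplicate group is retained.
    public static List<Box> Deduplicate(IEnumerable<Box> tables, double tolerance)
    {
        var kept = new List<Box>();

        foreach (var t in tables)
        {
            if (!kept.Any(k => AllEdgesWithin(k, t, tolerance))) kept.Add(t);
        }

        return kept;
    }
}
```

Two detections whose edges differ by at most 3 points collapse into one table with a tolerance of 5.0 and stay separate with a tolerance of 2.0.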
TableStrategy
Table detection strategy (Nurminen, Spreadsheet, or Hybrid).
Determines which algorithm(s) to use for detecting tables in the PDF.
Default value: Hybrid
Declaration
public TableDetectionStrategy TableStrategy { get; set; }
Property Value
| Type | Description |
|---|---|
| TableDetectionStrategy |
TextMode
Text extraction mode. Stream mode is preferred for multi-column layouts.
PositionBased mode extracts text positioned top-to-bottom, left-to-right.
Default value: PositionBased
Declaration
public TextExtractionMode TextMode { get; set; }
Property Value
| Type | Description |
|---|---|
| TextExtractionMode |
UseFirstRowAsHeader
If true, the first row will be treated as a header row when converting/exporting the table.
Default value: false
Declaration
public bool UseFirstRowAsHeader { get; set; }
Property Value
| Type | Description |
|---|---|
| System.Boolean |
WordBoundaryJitterTolerance
Tolerance (in points) for assigning words to detected column regions.
Declaration
public double WordBoundaryJitterTolerance { get; set; }
Property Value
| Type | Description |
|---|---|
| System.Double |
Remarks
Compensates for extraction jitter or slight coordinate drift.
• Lower values = stricter boundaries
• Higher values = more forgiving
Default value: 0.5
WordOverlapTolerance
Allowed negative horizontal overlap (in points) before suppressing a space.
Declaration
public double WordOverlapTolerance { get; set; }
Property Value
| Type | Description |
|---|---|
| System.Double |
Remarks
Handles overlap in bold or italic word boxes.
• Less negative = stricter, risk of unwanted concatenation
• More negative = more forgiving spacing
Default value: -25.0
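One way to read this setting is sketched below: the gap between two adjacent words is negative when their boxes overlap, and a space is still inserted as long as that gap is not smaller than the tolerance. WordJoinSketch and Join are hypothetical names, and the comparison itself is an assumption inferred from the bullet points above, not confirmed library behavior.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of IronPdf.
public static class WordJoinSketch
{
    // Joins words in the order given. `overlapTolerance` is a negative
    // number of points: a space is inserted unless the next word overlaps
    // the previous one by more than its absolute value.
    public static string Join(
        IReadOnlyList<(string Text, double Left, double Right)> words,
        double overlapTolerance)
    {
        var sb = new StringBuilder();

        for (int i = 0; i < words.Count; i++)
        {
            if (i > 0)
            {
                // Negative when the two word boxes overlap horizontally.
                double gap = words[i].Left - words[i - 1].Right;
                if (gap >= overlapTolerance) sb.Append(' ');
            }

            sb.Append(words[i].Text);
        }

        return sb.ToString();
    }
}
```

For two words whose boxes overlap by 2 points, a tolerance of -25.0 keeps the space ("Total Due"), while a tolerance of -1.0 concatenates them ("TotalDue").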