Namespace IronPdf.Extractions

Classes

CsvExportOptions

Configuration options for CSV exports

Provides control over delimiters, quoting, and other CSV-specific cell formatting, such as replacing newlines inside cell values.

------------------------------------------------

Usage:

var options = new CsvExportOptions
{
    CsvDelimiter = ";",
    CsvQuoteStrings = true,
    CsvNewlineReplacement = " ",
    IncludeHeaders = true
};

------------------------------------------------
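
The sketch below shows one plausible way the CSV options interact when a row is written. It is not IronPdf's exporter: CsvRowSketch, FormatField, and FormatRow are hypothetical names, every cell is treated as a string, and quoting follows the RFC 4180 convention of doubling embedded double quotes. Fields are also quoted when they contain the delimiter, even with quoting off, because an unquoted delimiter would split the cell.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of how the CSV options could shape one exported row.
// FormatField/FormatRow are illustrative names, not IronPdf APIs.
public static class CsvRowSketch
{
    public static string FormatField(string value, string delimiter,
                                     bool quoteStrings, string newlineReplacement)
    {
        // Assumption: embedded line breaks are replaced so each record stays on one line.
        string cleaned = value.Replace("\r\n", newlineReplacement)
                              .Replace("\n", newlineReplacement)
                              .Replace("\r", newlineReplacement);

        // Quote when requested, or when the field would otherwise break the record.
        bool mustQuote = quoteStrings
                         || cleaned.Contains(delimiter)
                         || cleaned.Contains("\"");

        // RFC 4180 convention: a literal double quote is escaped by doubling it.
        return mustQuote ? "\"" + cleaned.Replace("\"", "\"\"") + "\"" : cleaned;
    }

    public static string FormatRow(IEnumerable<string> cells, string delimiter,
                                   bool quoteStrings, string newlineReplacement)
    {
        return string.Join(delimiter,
            cells.Select(c => FormatField(c, delimiter, quoteStrings, newlineReplacement)));
    }
}
```

For example, FormatRow(new[] { "a;b", "line1\nline2" }, ";", true, " ") produces "a;b";"line1 line2".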

DocumentMetadata

Document-level metadata

Contains information about the entire document, such as the total page count and the number of tables.

Also includes per-page metadata.

ExportConfiguration

Configuration for batch export operations

Controls how tables and text are exported from a complete extraction result.

Provides options for organizing the output files and for choosing which export options are applied to tables.

------------------------------------------------

Usage:

// Assumes result is a PdfExtractionResult, e.g. returned by PdfExtractor.Extract
var config = new ExportConfiguration
{
    ExportTables = true,
    ExportText = true,
    TableOptions = new CsvExportOptions(),
    SeparateFilePerTable = true,
    FileNamePattern = "table_{page}_{index}"
};
ExportManager.ExportResult(result, "output", config);

------------------------------------------------
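
FileNamePattern appears to use {page} and {index} placeholders. A minimal sketch of how such a pattern could be expanded into one path per table when SeparateFilePerTable is true; FileNamePatternSketch is hypothetical, and the numbering base (0 or 1) and file extension used by the real exporter are assumptions.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical expansion of a FileNamePattern such as "table_{page}_{index}".
// The real exporter's numbering and extension handling are not documented here.
public static class FileNamePatternSketch
{
    public static string Expand(string pattern, int page, int index, string extension)
    {
        string name = pattern
            .Replace("{page}", page.ToString(CultureInfo.InvariantCulture))
            .Replace("{index}", index.ToString(CultureInfo.InvariantCulture));
        return name + extension;
    }

    // With one file per table, each (page, index) pair yields its own path.
    public static List<string> PlanFiles(string outputDirectory, string pattern,
                                         IEnumerable<(int Page, int Index)> tables,
                                         string extension)
    {
        var paths = new List<string>();
        foreach (var (page, index) in tables)
        {
            paths.Add(Path.Combine(outputDirectory, Expand(pattern, page, index, extension)));
        }
        return paths;
    }
}
```

For example, Expand("table_{page}_{index}", 3, 0, ".csv") returns "table_3_0.csv".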

ExportFormat

Supported export formats

Defines the file formats available for exporting extracted data.

ExportManager

Provides methods to export tables and text to various formats.

Acts as a factory for format-specific exporters.

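When an options object is passed instead of an ExportFormat value, the export format is inferred from the options type (for example, CsvExportOptions selects CSV and HtmlExportOptions selects HTML).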

------------------------------------------------

Usage:

// Assumes table, tables, and result come from an earlier extraction
// (see PdfExtractor and PdfExtractionResult)

// Export a single table with default options (format given explicitly)
ExportManager.ExportTable(table, "output.csv", ExportFormat.Csv);

// Export multiple tables with custom options (format inferred from the options type)
var csvOptions = new CsvExportOptions { CsvDelimiter = ";" };
ExportManager.ExportTables(tables, "output.csv", csvOptions);

// Export a single table with custom HTML options
var htmlOptions = new HtmlExportOptions { HtmlResponsive = true };
ExportManager.ExportTable(table, "output.html", htmlOptions);

// Export an entire extraction result
var config = new ExportConfiguration
{
    ExportTables = true,
    ExportText = true,
    TableOptions = new JsonExportOptions(),
    SeparateFilePerTable = true
};
ExportManager.ExportResult(result, "output", config);

------------------------------------------------
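
Inferring the format from the options type amounts to dispatching on the runtime type of the options object. The sketch below illustrates that idea with C# type patterns. The classes in the ExportDispatchSketch namespace are local stand-ins that mirror the documented names (they are not the IronPdf types), and throwing for a plain ExportOptionsBase is an assumption; this reference does not say how ExportManager treats the base type.

```csharp
using System;

namespace ExportDispatchSketch
{
    // Minimal stand-ins mirroring the documented class names; these are NOT
    // the IronPdf types, only enough structure to show type-based dispatch.
    public enum ExportFormat { Csv, Html, Json, Xml, Txt }

    public class ExportOptionsBase { }
    public class CsvExportOptions : ExportOptionsBase { }
    public class HtmlExportOptions : ExportOptionsBase { }
    public class JsonExportOptions : ExportOptionsBase { }
    public class XmlExportOptions : ExportOptionsBase { }
    public class TxtExportOptions : ExportOptionsBase { }

    public static class FormatResolver
    {
        // Picks the format from the runtime type of the options object.
        public static ExportFormat Resolve(ExportOptionsBase options) => options switch
        {
            CsvExportOptions => ExportFormat.Csv,
            HtmlExportOptions => ExportFormat.Html,
            JsonExportOptions => ExportFormat.Json,
            XmlExportOptions => ExportFormat.Xml,
            TxtExportOptions => ExportFormat.Txt,
            null => throw new ArgumentNullException(nameof(options)),
            // Assumption: the base type alone does not identify a format.
            _ => throw new ArgumentException(
                $"Cannot infer a format from {options.GetType().Name}; pass an ExportFormat instead.")
        };
    }
}
```

If an arm for ExportOptionsBase itself were added, it would have to come after the derived types: placed first, it would match every subclass and the compiler would reject the later arms as already handled.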

ExportOptionsBase

Base configuration options for exporting extracted data

Contains common options applicable to all export formats.

------------------------------------------------

Usage:

// Use base options for generic export
var options = new ExportOptionsBase
{
    IncludeHeaders = true,
    SpanMode = SpanHandlingMode.Repeat
};

// Or use format-specific options
var csvOptions = new CsvExportOptions
{
    CsvDelimiter = ";",
    IncludeHeaders = true
};

------------------------------------------------

ExtractionProgress

Information about the progress of an asynchronous extraction operation.

Used to report progress to the caller during long-running extraction operations.
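
In .NET, progress from long-running work is conventionally reported through IProgress<T>. The sketch below assumes that pattern; ProgressSketch, RecordingProgress, ProgressDemo, and SimulateExtractionAsync are hypothetical, and neither the real ExtractionProgress members nor whether PdfExtractor's asynchronous methods accept an IProgress<ExtractionProgress> are stated in this reference. A synchronous recorder is used instead of Progress<T>, which posts callbacks to the captured SynchronizationContext, so the reports can be inspected immediately.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical progress payload; the property names are assumptions.
public sealed record ProgressSketch(int PagesProcessed, int TotalPages)
{
    public double Percent => TotalPages == 0 ? 100.0 : 100.0 * PagesProcessed / TotalPages;
}

// Synchronous IProgress<T> that records every report in order.
public sealed class RecordingProgress<T> : IProgress<T>
{
    public List<T> Reports { get; } = new();
    public void Report(T value) => Reports.Add(value);
}

public static class ProgressDemo
{
    // Stand-in for a long-running extraction: "processes" pages one by one,
    // checking for cancellation before each page and reporting after it.
    public static async Task<int> SimulateExtractionAsync(
        int totalPages, IProgress<ProgressSketch> progress, CancellationToken token = default)
    {
        for (int page = 1; page <= totalPages; page++)
        {
            token.ThrowIfCancellationRequested();
            await Task.Yield(); // placeholder for real per-page work
            progress?.Report(new ProgressSketch(page, totalPages));
        }
        return totalPages;
    }
}
```

Running SimulateExtractionAsync(4, recorder) produces four reports, the last with Percent equal to 100.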

HtmlExportOptions

Configuration options for HTML exports

Controls styling, responsiveness, and CSS class application for HTML table output.

------------------------------------------------

Usage:

var options = new HtmlExportOptions
{
    HtmlIncludeStyles = true,
    HtmlResponsive = true,
    HtmlTableClass = "custom-table"
};

------------------------------------------------
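
A rough sketch of what the HTML switches might change in the generated markup. HtmlTableSketch is hypothetical, and the inline styles and the overflow-x wrapper used for responsiveness are common web techniques assumed here for illustration, not IronPdf's documented output. Cell text and the class name are HTML-encoded with System.Net.WebUtility.HtmlEncode.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical renderer; the <style> block and the scrolling wrapper div
// are assumptions, not IronPdf's actual markup.
public static class HtmlTableSketch
{
    public static string Render(IReadOnlyList<string[]> rows, string tableClass,
                                bool responsive, bool includeStyles)
    {
        var sb = new StringBuilder();
        if (includeStyles)
            sb.Append("<style>table, td { border: 1px solid #ccc; border-collapse: collapse; }</style>");
        if (responsive)
            sb.Append("<div style=\"overflow-x:auto\">"); // lets wide tables scroll on narrow screens

        sb.Append("<table class=\"").Append(WebUtility.HtmlEncode(tableClass)).Append("\">");
        foreach (var row in rows)
        {
            sb.Append("<tr>");
            foreach (var cell in row)
                sb.Append("<td>").Append(WebUtility.HtmlEncode(cell)).Append("</td>");
            sb.Append("</tr>");
        }
        sb.Append("</table>");

        if (responsive)
            sb.Append("</div>");
        return sb.ToString();
    }
}
```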

JsonExportOptions

Configuration options for JSON exports

Currently defines no options of its own; all settings are inherited from ExportOptionsBase.

PageMetadata

Per-page metadata

Contains information about a specific page, such as page number, table count, and word count.
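
Presumably the document-level counts are consistent with the per-page entries. A minimal sketch, assuming document totals are derived by summing per-page data; PageMetaSketch and DocumentMetaSketch are hypothetical shapes, since the actual members of DocumentMetadata and PageMetadata are not listed in this reference.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes only; not the IronPdf PageMetadata/DocumentMetadata types.
public sealed record PageMetaSketch(int PageNumber, int TableCount, int WordCount);

public sealed record DocumentMetaSketch(int TotalPages, int TotalTables,
                                        IReadOnlyList<PageMetaSketch> Pages)
{
    // Builds document-level totals from per-page entries, ordered by page number.
    public static DocumentMetaSketch FromPages(IEnumerable<PageMetaSketch> pages)
    {
        var list = pages.OrderBy(p => p.PageNumber).ToList();
        return new DocumentMetaSketch(list.Count, list.Sum(p => p.TableCount), list);
    }
}
```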

PageText

Text content for a single page

Contains text extracted from a single page of a PDF document.

Includes both the raw text and positioned lines for layout reconstruction.

PdfExtractionOptions

Configuration options for PDF extraction behavior

Provides control over how tables and text are extracted from PDF documents.

Use this class to customize extraction parameters such as the text extraction mode, the table detection strategy, and various tolerance values that affect extraction accuracy.

------------------------------------------------

Usage:

var options = new PdfExtractionOptions
{
    TextMode = TextExtractionMode.Stream,
    TableStrategy = TableDetectionStrategy.Hybrid,
    EnableTableExtraction = true,
    EnableTextExtraction = true,
    CellMergeThreshold = 2.0,
    ColumnDetectionSensitivity = 15.0
};
var result = PdfExtractor.Extract("document.pdf", options);

------------------------------------------------
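
To illustrate how a tolerance setting can change an extraction result, here is generic one-dimensional gap clustering of x-coordinates into columns. This is not IronPdf's table-detection algorithm, and reading ColumnDetectionSensitivity as a gap threshold in points is an assumption; ColumnClusterSketch is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Generic 1-D gap clustering, shown only to illustrate the effect of a tolerance.
public static class ColumnClusterSketch
{
    // Groups x-coordinates (e.g. word left edges) into columns: a new column
    // starts whenever the gap to the previous coordinate exceeds `tolerance`.
    public static List<List<double>> Cluster(IEnumerable<double> xs, double tolerance)
    {
        var columns = new List<List<double>>();
        foreach (double x in xs.OrderBy(v => v))
        {
            if (columns.Count == 0 || x - columns[^1][^1] > tolerance)
                columns.Add(new List<double>());
            columns[^1].Add(x);
        }
        return columns;
    }
}
```

With x-coordinates 72, 74, 180, 183, and 300, a tolerance of 15 yields three columns, while a tolerance of 150 collapses them into one.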

PdfExtractionResult

Represents the output of a PDF extraction operation, containing all extracted tables and text content, along with document metadata.

Provides convenient methods to access specific portions of the extracted content.

------------------------------------------------

Usage:

var result = PdfExtractor.Extract("document.pdf");

// Access all tables
foreach (var table in result.Tables)
{
    Console.WriteLine($"Table on page {table.PageNumber} with {table.RowCount} rows");
}

// Get tables from a specific page
var pageTables = result.GetTablesByPage(5);

// Get text from a specific page
var pageText = result.GetRawTextByPage(5);

// Get full text including tables
var fullText = result.FullText;

------------------------------------------------

PdfExtractor

Provides methods to extract tables and text from PDF documents with various options.

Supports both synchronous and asynchronous extraction operations.

------------------------------------------------

Usage:

// Extract entire document
var result = PdfExtractor.Extract("document.pdf");

// Extract with custom options
var options = new PdfExtractionOptions
{
    TableStrategy = TableDetectionStrategy.Hybrid,
    EnableTextExtraction = false
};
var tablesOnly = PdfExtractor.Extract("document.pdf", options);

// Extract specific page
var pageResult = PdfExtractor.ExtractPage("document.pdf", 5);

// Extract specific table
var table = PdfExtractor.ExtractTable("document.pdf", 5, 0);

------------------------------------------------

SpanHandlingMode

Specifies how cells that span multiple rows or columns (rowspan/colspan) are handled in exports

Controls how merged cells are represented in different export formats.
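
The mode names Merge, Repeat, Empty, and Annotate are mentioned in the TxtExportOptions notes, but their exact output is not specified here. The sketch below shows one plausible reading for column spans when a row is flattened into output slots; SpanModeSketch, CellSketch, and SpanSketch are hypothetical stand-ins, and the "[colspan=N]" annotation text is invented for illustration.

```csharp
using System;
using System.Collections.Generic;

// Illustrative stand-ins, not the IronPdf TableCell or SpanHandlingMode types.
// The meaning given to each mode below is an assumption based on its name.
public enum SpanModeSketch { Merge, Repeat, Empty, Annotate }

public sealed record CellSketch(string Text, int ColSpan = 1);

public static class SpanSketch
{
    // Flattens one row into output slots. Only column spans are handled;
    // row spans would apply the same rule down consecutive rows.
    public static List<string> FlattenRow(IEnumerable<CellSketch> row, SpanModeSketch mode)
    {
        var slots = new List<string>();
        foreach (var cell in row)
        {
            int span = Math.Max(1, cell.ColSpan);
            if (mode == SpanModeSketch.Merge)
            {
                // Merge: one slot for the whole spanned cell.
                slots.Add(cell.Text);
                continue;
            }
            for (int i = 0; i < span; i++)
            {
                bool first = i == 0;
                slots.Add(mode switch
                {
                    // Repeat: copy the text into every covered column.
                    SpanModeSketch.Repeat => cell.Text,
                    // Empty: text in the first column, blanks elsewhere.
                    SpanModeSketch.Empty => first ? cell.Text : "",
                    // Annotate: hypothetical "[colspan=N]" marker on spanned cells.
                    SpanModeSketch.Annotate => first && span > 1 ? $"{cell.Text} [colspan={span}]"
                                             : first ? cell.Text : "",
                    _ => throw new ArgumentOutOfRangeException(nameof(mode))
                });
            }
        }
        return slots;
    }
}
```

For a row containing "Region" (spanning two columns) followed by "Q1", Repeat yields Region, Region, Q1; Empty yields Region, an empty slot, Q1; and Merge yields Region, Q1.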

TableCell

Represents a table cell with span support

Contains the content and metadata for a cell in a table.

Supports merged cells (spans) across rows and columns.

TableDetectionStrategy

Table detection strategies

Determines which algorithm(s) to use for detecting tables in PDF documents.

TableObject

Represents an extracted table with structural information

Contains the data and metadata for a table extracted from a PDF document.

Provides convenient methods to access table data and structure.

TableRow

Represents a table row

Contains a collection of cells that make up a row in a table.

TextContent

Extracted text content outside of tables

Provides methods to access text content for the entire document or specific pages.

TextExtractionMode

Text extraction modes

Determines how text is extracted from PDF documents.

TxtExportOptions

Configuration options for plain text exports

Currently defines no options of its own; all settings are inherited from ExportOptionsBase.

The TXT exporter does not support different span-handling strategies. Regardless of the value provided in SpanMode, the TXT export always behaves as though SpanHandlingMode.Merge is used.

Other values (Repeat, Empty, Annotate) have no effect on TXT output.

XmlExportOptions

Configuration options for XML exports

Controls whether an XML schema is included and how the output is formatted (for example, pretty-printing).

------------------------------------------------

Usage:

var options = new XmlExportOptions
{
    XmlIncludeSchema = true,
    XmlPrettyPrint = true
};

------------------------------------------------
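
Pretty-printing can be illustrated with System.Xml.Linq, where XNode.ToString(SaveOptions) switches indentation on or off. The element names table, row, and cell and the XmlTableSketch class are hypothetical, and schema inclusion (XmlIncludeSchema) is not shown.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical XML layout for a table; the element names are assumptions.
// Only the pretty-print toggle is illustrated.
public static class XmlTableSketch
{
    public static string Render(IEnumerable<string[]> rows, bool prettyPrint)
    {
        var table = new XElement("table",
            rows.Select(r => new XElement("row",
                r.Select(c => new XElement("cell", c)))));

        // SaveOptions.None indents the output; DisableFormatting keeps it on one line.
        return table.ToString(prettyPrint ? SaveOptions.None : SaveOptions.DisableFormatting);
    }
}
```

With prettyPrint set to false, a single row containing "a" and "b & c" renders as <table><row><cell>a</cell><cell>b &amp; c</cell></row></table>.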