Namespace IronPdf.Extractions
Classes
CsvExportOptions
Configuration options for CSV exports
Provides control over delimiters, quoting, and cell formatting specific to the CSV format.
------------------------------------------------
Usage:
var options = new CsvExportOptions
{
    CsvDelimiter = ";",
    CsvQuoteStrings = true,
    CsvNewlineReplacement = " ",
    IncludeHeaders = true
};
------------------------------------------------
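The three CSV-specific properties interact whenever a single cell is written. The following is a minimal, self-contained sketch of one plausible way a cell could be formatted from a delimiter, a quoting flag, and a newline replacement. It is not IronPdf's implementation. The helpers FormatCsvCell and FormatCsvRow are hypothetical. The quoting rule (wrap the value in double quotes and double any embedded quotes, as in RFC 4180, and always quote when the value would otherwise break the row) is an assumption about what CsvQuoteStrings controls.

```csharp
using System;
using System.Linq;

// Hypothetical helper: formats one cell value the way a CSV exporter configured with
// CsvDelimiter, CsvQuoteStrings and CsvNewlineReplacement might (assumed behaviour).
static string FormatCsvCell(string value, string delimiter, bool quoteStrings, string newlineReplacement)
{
    // Flatten embedded line breaks so each table row stays on one CSV line.
    string flat = value.Replace("\r\n", newlineReplacement)
                       .Replace("\n", newlineReplacement)
                       .Replace("\r", newlineReplacement);

    // Assumption: quote when asked to, or when the value would otherwise break the row.
    bool mustQuote = quoteStrings || flat.Contains(delimiter) || flat.Contains('"');
    return mustQuote ? "\"" + flat.Replace("\"", "\"\"") + "\"" : flat;
}

// Hypothetical helper: joins already-extracted cell values into one CSV line.
static string FormatCsvRow(string[] cells, string delimiter, bool quoteStrings, string newlineReplacement) =>
    string.Join(delimiter, cells.Select(c => FormatCsvCell(c, delimiter, quoteStrings, newlineReplacement)));

string line = FormatCsvRow(new[] { "Region", "Q1\nTotal", "5;6" }, ";", false, " ");
Console.WriteLine(line); // Region;Q1 Total;"5;6"
```

Because the delimiter here is `;`, the value `5;6` must be quoted even with quoting switched off. Otherwise a reader would see four columns instead of three.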
DocumentMetadata
Document-level metadata
Contains information about the entire document, such as total pages and table counts.
Also includes per-page metadata.
ExportConfiguration
Configuration for batch export operations
Controls how tables and text are exported from a complete extraction result.
Provides options for organizing output files (SeparateFilePerTable, FileNamePattern) and for selecting the export options applied to tables (TableOptions).
------------------------------------------------
Usage:
var result = PdfExtractor.Extract("document.pdf");
var config = new ExportConfiguration
{
    ExportTables = true,
    ExportText = true,
    TableOptions = new CsvExportOptions(),
    SeparateFilePerTable = true,
    FileNamePattern = "table_{page}_{index}"
};
ExportManager.ExportResult(result, "output", config);
------------------------------------------------
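FileNamePattern uses placeholders. The following sketch shows how a pattern like `table_{page}_{index}` might expand into one file name per table when SeparateFilePerTable is enabled. The helper ExpandFileName is hypothetical. Three points are assumptions, not documented behaviour: that `{page}` is the table's page number, that `{index}` is its position on that page, and that the format's extension is appended.

```csharp
using System;
using System.IO;

// Hypothetical helper: expands a FileNamePattern such as "table_{page}_{index}".
// Assumption: {page} = page number, {index} = table position on that page,
// and the extension is taken from the chosen export format.
static string ExpandFileName(string pattern, int page, int index, string extension) =>
    pattern.Replace("{page}", page.ToString())
           .Replace("{index}", index.ToString()) + "." + extension;

string path = Path.Combine("output", ExpandFileName("table_{page}_{index}", 5, 0, "csv"));
Console.WriteLine(path); // output/table_5_0.csv on Linux and macOS
```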
ExportFormat
Supported export formats
Defines the file formats available for exporting extracted data.
ExportManager
Provides methods to export tables and text to various formats.
Acts as a factory for format-specific exporters.
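The export format is either passed explicitly as an ExportFormat value or, when an options object is supplied instead, inferred from that object's type (for example, CsvExportOptions selects CSV).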
------------------------------------------------
Usage:
// Obtain the data to export
var result = PdfExtractor.Extract("document.pdf");
var tables = result.Tables;
var table = PdfExtractor.ExtractTable("document.pdf", 5, 0);
// Export a single table with default options (format passed explicitly)
ExportManager.ExportTable(table, "output.csv", ExportFormat.Csv);
// Export multiple tables with custom options (format inferred from options type)
var csvOptions = new CsvExportOptions { CsvDelimiter = ";" };
ExportManager.ExportTables(tables, "output.csv", csvOptions);
// Export with custom HTML options
var htmlOptions = new HtmlExportOptions { HtmlResponsive = true };
ExportManager.ExportTable(table, "output.html", htmlOptions);
// Export the entire extraction result
var config = new ExportConfiguration
{
    ExportTables = true,
    ExportText = true,
    TableOptions = new JsonExportOptions(),
    SeparateFilePerTable = true
};
ExportManager.ExportResult(result, "output", config);
------------------------------------------------
ExportOptionsBase
Base configuration options for exporting extracted data
Contains common options applicable to all export formats.
------------------------------------------------
Usage:
// Use base options for generic export
var options = new ExportOptionsBase
{
    IncludeHeaders = true,
    SpanMode = SpanHandlingMode.Repeat
};
// Or use format-specific options
var csvOptions = new CsvExportOptions
{
    CsvDelimiter = ";",
    IncludeHeaders = true
};
------------------------------------------------
ExtractionProgress
Information about the progress of an asynchronous extraction operation.
Used to report progress to the caller during long-running extraction operations.
HtmlExportOptions
Configuration options for HTML exports
Controls styling, responsiveness, and CSS class application for HTML table output.
------------------------------------------------
Usage:
var options = new HtmlExportOptions
{
    HtmlIncludeStyles = true,
    HtmlResponsive = true,
    HtmlTableClass = "custom-table"
};
------------------------------------------------
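The sketch below is a minimal, hypothetical renderer that suggests what HtmlTableClass and HtmlResponsive might control in the generated markup. RenderHtmlTable is not part of IronPdf. Wrapping the table in a horizontally scrollable `div` for "responsive" output is an assumption, and HtmlIncludeStyles is not modeled. Cell text is HTML-encoded with the standard `System.Net.WebUtility.HtmlEncode` so that characters such as `<` and `&` cannot break the markup.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical renderer: the class attribute mirrors HtmlTableClass, and the
// scrollable wrapper div is an assumed interpretation of HtmlResponsive.
static string RenderHtmlTable(string[][] rows, string tableClass, bool responsive)
{
    var sb = new StringBuilder();
    if (responsive) sb.Append("<div style=\"overflow-x:auto\">");
    sb.Append($"<table class=\"{WebUtility.HtmlEncode(tableClass)}\">");
    foreach (var row in rows)
    {
        sb.Append("<tr>");
        // Encode cell text so extracted content cannot inject markup.
        foreach (var cell in row)
            sb.Append("<td>").Append(WebUtility.HtmlEncode(cell)).Append("</td>");
        sb.Append("</tr>");
    }
    sb.Append("</table>");
    if (responsive) sb.Append("</div>");
    return sb.ToString();
}

string html = RenderHtmlTable(new[] { new[] { "A & B", "<1%" } }, "custom-table", true);
Console.WriteLine(html);
```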
JsonExportOptions
Configuration options for JSON exports
Currently inherits all options from ExportOptionsBase.
PageMetadata
Per-page metadata
Contains information about a specific page, such as page number, table count, and word count.
PageText
Text content for a single page
Contains text extracted from a single page of a PDF document.
Includes both the raw text and positioned lines for layout reconstruction.
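Positioned lines make it possible to rebuild reading order instead of relying on the order in which text appears in the PDF content stream. The following is a minimal sketch of that idea. Each line is modeled as a hypothetical `(X, Y, Text)` tuple in PDF user space, where Y is assumed to grow upward from the bottom of the page. Lines whose Y values fall within a tolerance are treated as one visual row. ReconstructLines and the tuple shape are illustrative assumptions, not PageText's actual members.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical positioned lines: (X, Y, Text), with Y growing upward (assumption).
var positionedLines = new[]
{
    (X: 300.0, Y: 700.2, Text: "Page 1"),
    (X: 72.0,  Y: 700.0, Text: "Annual Report"),
    (X: 72.0,  Y: 680.0, Text: "Revenue grew in Q3."),
};

// Rebuilds reading order: top-to-bottom, then left-to-right within a visual row.
// Items whose Y differs from the row's first item by less than yTolerance share that row.
static List<string> ReconstructLines((double X, double Y, string Text)[] items, double yTolerance)
{
    var rows = new List<List<(double X, double Y, string Text)>>();
    foreach (var item in items.OrderByDescending(i => i.Y))
    {
        var current = rows.LastOrDefault();
        if (current != null && Math.Abs(current[0].Y - item.Y) < yTolerance)
            current.Add(item);
        else
            rows.Add(new List<(double X, double Y, string Text)> { item });
    }
    return rows.Select(r => string.Join(" ", r.OrderBy(i => i.X).Select(i => i.Text))).ToList();
}

foreach (var text in ReconstructLines(positionedLines, 2.0))
    Console.WriteLine(text);
// Annual Report Page 1
// Revenue grew in Q3.
```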
PdfExtractionOptions
Configuration options for PDF extraction behavior
Provides control over how tables and text are extracted from PDF documents.
Use this class to customize extraction parameters such as the text mode, the table detection strategy,
and tolerance values (for example CellMergeThreshold and ColumnDetectionSensitivity) that affect extraction accuracy.
------------------------------------------------
Usage:
var options = new PdfExtractionOptions
{
    TextMode = TextExtractionMode.Stream,
    TableStrategy = TableDetectionStrategy.Hybrid,
    EnableTableExtraction = true,
    EnableTextExtraction = true,
    CellMergeThreshold = 2.0,
    ColumnDetectionSensitivity = 15.0
};
var result = PdfExtractor.Extract("document.pdf", options);
------------------------------------------------
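The reference does not document the units or exact semantics of CellMergeThreshold or ColumnDetectionSensitivity. To show why a tolerance value affects extraction accuracy, here is a generic one-dimensional clustering of text left edges into column starts. This is not IronPdf's algorithm. The helper DetectColumnStarts is hypothetical, and the assumption that tolerances are gaps measured in PDF points is illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of a column-detection tolerance (not IronPdf's algorithm).
// Left edges closer than `tolerance` to the current column start are folded into it.
static List<double> DetectColumnStarts(IEnumerable<double> leftEdges, double tolerance)
{
    var starts = new List<double>();
    foreach (double x in leftEdges.OrderBy(e => e))
    {
        // Open a new column only when this edge is farther than `tolerance`
        // from the start of the most recent column.
        if (starts.Count == 0 || x - starts[^1] > tolerance)
            starts.Add(x);
    }
    return starts;
}

double[] edges = { 72.0, 73.5, 180.0, 182.0, 190.0, 300.0 };
Console.WriteLine(string.Join(", ", DetectColumnStarts(edges, 15.0))); // 72, 180, 300
Console.WriteLine(string.Join(", ", DetectColumnStarts(edges, 5.0)));  // 72, 180, 190, 300
```

A looser tolerance absorbs small misalignments but can merge genuinely separate columns, while a tighter one does the opposite. Whether a larger ColumnDetectionSensitivity value behaves like a looser or a tighter tolerance is not stated in this reference.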
PdfExtractionResult
Represents the output of a PDF extraction operation, containing all extracted
tables and text content, along with document metadata.
Provides convenient methods to access specific portions of the extracted content.
------------------------------------------------
Usage:
var result = PdfExtractor.Extract("document.pdf");
// Access all tables
foreach (var table in result.Tables)
{
    Console.WriteLine($"Table on page {table.PageNumber} with {table.RowCount} rows");
}
// Get tables from a specific page
var pageTables = result.GetTablesByPage(5);
// Get text from a specific page
var pageText = result.GetRawTextByPage(5);
// Get full text including tables
var fullText = result.FullText;
------------------------------------------------
PdfExtractor
Provides methods to extract tables and text from PDF documents with various options.
Supports both synchronous and asynchronous extraction operations.
------------------------------------------------
Usage:
// Extract the entire document
var result = PdfExtractor.Extract("document.pdf");
// Extract with custom options (text extraction disabled)
var options = new PdfExtractionOptions
{
    TableStrategy = TableDetectionStrategy.Hybrid,
    EnableTextExtraction = false
};
var tablesOnly = PdfExtractor.Extract("document.pdf", options);
// Extract a specific page
var pageResult = PdfExtractor.ExtractPage("document.pdf", 5);
// Extract a specific table
var table = PdfExtractor.ExtractTable("document.pdf", 5, 0);
------------------------------------------------
SpanHandlingMode
Specifies how cells that span multiple rows or columns (rowspan/colspan) are handled in exports
Controls how merged cells are represented in different export formats. Values include Merge, Repeat, Empty, and Annotate.
TableCell
Represents a table cell with span support
Contains the content and metadata for a cell in a table.
Supports merged cells (spans) across rows and columns.
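To make the span-handling modes concrete, here is a minimal sketch that writes one spanning cell into a flat grid, as a CSV-like exporter would have to. The mode names mirror SpanHandlingMode values. The behaviour of each mode, however, is an assumption: Repeat copies the value into every covered position, Empty leaves covered positions blank, and Annotate marks them with a hypothetical `^` placeholder. Merge is not modeled, since it presumably keeps a single spanning cell, which a flat grid cannot represent. PlaceSpannedCell is a hypothetical helper.

```csharp
using System;

// Hypothetical helper: fills the rowSpan x colSpan block anchored at (row, col).
// Mode semantics and the "^" marker are illustrative assumptions.
static void PlaceSpannedCell(string[,] grid, int row, int col, int rowSpan, int colSpan,
                             string text, string mode)
{
    for (int r = row; r < row + rowSpan; r++)
        for (int c = col; c < col + colSpan; c++)
        {
            bool isAnchor = r == row && c == col;
            grid[r, c] = isAnchor ? text : (mode switch
            {
                "Repeat" => text,      // copy the value into every covered position
                "Empty" => "",         // leave covered positions blank
                "Annotate" => "^",     // mark covered positions as continuations
                _ => throw new ArgumentException($"Unsupported mode: {mode}")
            });
        }
}

foreach (var mode in new[] { "Repeat", "Empty", "Annotate" })
{
    var grid = new string[1, 3];
    PlaceSpannedCell(grid, 0, 0, 1, 2, "Total", mode); // a cell spanning two columns
    grid[0, 2] = "42";
    Console.WriteLine($"{mode}: {grid[0, 0]} | {grid[0, 1]} | {grid[0, 2]}");
}
```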
TableDetectionStrategy
Table detection strategies
Determines which algorithm(s) to use for detecting tables in PDF documents.
TableObject
Represents an extracted table with structural information
Contains the data and metadata for a table extracted from a PDF document.
Provides convenient methods to access table data and structure.
TableRow
Represents a table row
Contains a collection of cells that make up a row in a table.
TextContent
Extracted text content outside of tables
Provides methods to access text content for the entire document or specific pages.
TextExtractionMode
Text extraction modes
Determines how text is extracted from PDF documents.
TxtExportOptions
Configuration options for plain text exports
Currently inherits all options from ExportOptionsBase.
The TXT exporter does not support span-handling strategies: regardless of the SpanMode value, TXT output always behaves as though SpanHandlingMode.Merge were used, so Repeat, Empty, and Annotate have no effect.
XmlExportOptions
Configuration options for XML exports
Controls XML schema inclusion and formatting options.
------------------------------------------------
Usage:
var options = new XmlExportOptions
{
    XmlIncludeSchema = true,
    XmlPrettyPrint = true
};
------------------------------------------------
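The sketch below shows the kind of difference a pretty-print flag typically makes in XML output. It uses the standard `System.Xml.Linq` API, not IronPdf. The `table`/`row`/`cell` element names are hypothetical, because this reference does not document the exporter's XML layout. XmlIncludeSchema is not modeled.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical table markup; the real exporter's element names may differ.
var doc = new XDocument(
    new XElement("table",
        new XElement("row",
            new XElement("cell", "Region"),
            new XElement("cell", "Sales"))));

// SaveOptions.None indents nested elements (comparable to XmlPrettyPrint = true);
// SaveOptions.DisableFormatting writes the whole document on one line.
string pretty = doc.ToString(SaveOptions.None);
string compact = doc.ToString(SaveOptions.DisableFormatting);

Console.WriteLine(pretty);
Console.WriteLine(compact); // <table><row><cell>Region</cell><cell>Sales</cell></row></table>
```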