Extract Text From PDF
Alongside its extensive collection of PDF creation and editing functions, IronPDF also facilitates granular processing of a PDF document's content through its content extraction methods.
The extractAllText method is available on all PdfDocument objects. The string it returns holds all the text contained on every page of the PDF.
This method is a convenient way to perform document-level extraction of text from PDFs containing many pages. To extract text at the page level (i.e., from a specific set of pages only), use the extractTextFromPage method instead.
The brief code snippet below pulls the text from the first page of a PDF document.
// Load the PDF document from disk.
PdfDocument document = PdfDocument.fromFile(Paths.get("sample.pdf"));
// Extract the text of the first page only.
String firstPageText = document.extractTextFromPage(PageSelection.firstPage());
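Once the text is held in a String, ordinary Java string handling applies to it. The following is a minimal sketch of one possible follow-up step, not part of IronPDF: it searches the extracted text for lines containing a keyword. The class ExtractedTextSearch, the method linesContaining, and the sample text are hypothetical names and data introduced for illustration, and the sketch assumes the extracted text separates lines with standard line breaks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ExtractedTextSearch {

    // Hypothetical helper (not an IronPDF method): returns the trimmed lines
    // of `text` that contain `keyword`, ignoring case.
    public static List<String> linesContaining(String text, String keyword) {
        List<String> matches = new ArrayList<>();
        String needle = keyword.toLowerCase(Locale.ROOT);
        // \R matches any line break sequence (\n, \r\n, \r, and so on).
        for (String line : text.split("\\R")) {
            if (line.toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(line.trim());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Assumption: this literal stands in for the String returned by
        // extractTextFromPage; real extracted text will differ.
        String firstPageText = "Invoice #1001\nCustomer: Jane Doe\nTotal due: 42.00 USD";
        for (String line : linesContaining(firstPageText, "total")) {
            System.out.println(line);
        }
    }
}
```

In a real program, the firstPageText value from the snippet above would be passed to linesContaining in place of the sample literal.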