How to Extract Embedded Text and Images from PDFs

Chaknith Bin
Updated: July 28, 2025

Extracting embedded text and images means retrieving the textual content and graphical elements stored within a PDF document. This lets you access and repurpose the content: text can be edited, searched, or converted to other formats, and images can be saved for reuse or analysis.

To extract text and images from a PDF, use IronPdf. An extracted image can be saved to disk, or converted to another image format and embedded in a newly rendered document.

1. Download the IronPdf C# Library
2. Prepare the PDF document for text and image extraction
3. Use the ExtractAllText method to extract text
4. Use the ExtractAllImages method to extract images
5. Specify the particular pages from which to extract text and images

Extract Text Example

Text extraction can be performed on both newly rendered and existing PDF documents. Use the ExtractAllText method to extract the embedded text from the document. The method returns a single string containing all the text in the PDF, with pages separated by four consecutive newline characters.

The example below uses a sample PDF rendered from the Wikipedia website.
C#:

    using IronPdf;
    using System.IO;

    PdfDocument pdf = PdfDocument.FromFile("sample.pdf");

    // Extract text
    string text = pdf.ExtractAllText();

    // Export the extracted text to a text file
    File.WriteAllText("extractedText.txt", text);

VB.NET:

    Imports IronPdf
    Imports System.IO

    Dim pdf As PdfDocument = PdfDocument.FromFile("sample.pdf")

    ' Extract text
    Dim text As String = pdf.ExtractAllText()

    ' Export the extracted text to a text file
    File.WriteAllText("extractedText.txt", text)

Extract Text by Line and Character

Within each PDF page, you can retrieve the coordinates of individual text lines and characters. First select a page from the PDF, then access its Lines and Characters properties. Each line and character exposes a bounding box with Top, Right, Bottom, and Left values that give its position on the page.

C#:

    using IronPdf;
    using System.IO;
    using System.Linq;

    // Open PDF from file
    PdfDocument pdf = PdfDocument.FromFile("sample.pdf");

    // Extract text by lines
    var lines = pdf.Pages[0].Lines;

    // Extract text by characters
    var characters = pdf.Pages[0].Characters;

    // Write each line with the Y coordinate of its bottom edge
    File.WriteAllLines("lines.txt", lines.Select(l => $"at Y={l.BoundingBox.Bottom:F2}: {l.Contents}"));

VB.NET:

    Imports IronPdf
    Imports System.IO
    Imports System.Linq

    ' Open PDF from file
    Dim pdf As PdfDocument = PdfDocument.FromFile("sample.pdf")

    ' Extract text by lines
    Dim lines = pdf.Pages(0).Lines

    ' Extract text by characters
    Dim characters = pdf.Pages(0).Characters

    ' Write each line with the Y coordinate of its bottom edge
    File.WriteAllLines("lines.txt", lines.Select(Function(l) $"at Y={l.BoundingBox.Bottom:F2}: {l.Contents}"))

Extract Images Example

Use the ExtractAllImages method to extract all images embedded in the document. The method returns the images as a list of AnyBitmap objects.
Using the same document as in the previous example, the code below extracts the images and exports them to the 'images' folder.

C#:

    using IronPdf;
    using System.IO;

    PdfDocument pdf = PdfDocument.FromFile("sample.pdf");

    // Extract images
    var images = pdf.ExtractAllImages();

    // Make sure the output folder exists before saving
    Directory.CreateDirectory("images");

    for (int i = 0; i < images.Count; i++)
    {
        // Export the extracted images
        images[i].SaveAs($"images/image{i}.png");
    }

VB.NET:

    Imports IronPdf
    Imports System.IO

    Dim pdf As PdfDocument = PdfDocument.FromFile("sample.pdf")

    ' Extract images
    Dim images = pdf.ExtractAllImages()

    ' Make sure the output folder exists before saving
    Directory.CreateDirectory("images")

    For i As Integer = 0 To images.Count - 1
        ' Export the extracted images
        images(i).SaveAs($"images/image{i}.png")
    Next i

In addition to the ExtractAllImages method shown above, you can use the ExtractAllBitmaps and ExtractAllRawImages methods to extract image information from the document. ExtractAllBitmaps returns a list of AnyBitmap objects, like the code example, whereas ExtractAllRawImages returns every image in the PDF as raw data in the form of byte arrays (byte[]).

Extract Text and Images on Specific Pages

Both text and image extraction can be limited to one or more specified pages. Use the ExtractTextFromPage and ExtractTextFromPages methods to extract text from a single page or from multiple pages, respectively. For extracting images, use the ExtractImagesFromPage and ExtractImagesFromPages methods.
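The paragraph above names the page-level image methods without showing them in use. The following is a minimal sketch rather than a confirmed reference: it assumes that ExtractImagesFromPage takes a zero-based page index and that ExtractImagesFromPages takes a collection of zero-based indexes, mirroring the text methods, and that both return a collection of AnyBitmap objects supporting SaveAs. The output file names are illustrative only.

```csharp
using IronPdf;
using System.IO;

PdfDocument pdf = PdfDocument.FromFile("sample.pdf");

// Make sure the output folder exists before saving.
Directory.CreateDirectory("images");

// Assumption: the argument is a zero-based page index, as with ExtractTextFromPage.
var imagesFromPage1 = pdf.ExtractImagesFromPage(0);

int n = 0;
foreach (var image in imagesFromPage1)
{
    // Hypothetical naming scheme for the exported files.
    image.SaveAs($"images/page1_image{n++}.png");
}

// Assumption: accepts a collection of zero-based indexes, as with ExtractTextFromPages.
int[] pages = new[] { 0, 2 };
var imagesFromPages1And3 = pdf.ExtractImagesFromPages(pages);

n = 0;
foreach (var image in imagesFromPages1And3)
{
    image.SaveAs($"images/pages1_3_image{n++}.png");
}
```

Iterating with foreach avoids depending on the exact collection type returned, which may differ between IronPdf versions.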
C#:

    using IronPdf;

    PdfDocument pdf = PdfDocument.FromFile("sample.pdf");

    // Extract text from page 1 (page indexes are zero-based)
    string textFromPage1 = pdf.ExtractTextFromPage(0);

    int[] pages = new[] { 0, 2 };

    // Extract text from pages 1 & 3
    string textFromPage1_3 = pdf.ExtractTextFromPages(pages);

VB.NET:

    Imports IronPdf

    Dim pdf As PdfDocument = PdfDocument.FromFile("sample.pdf")

    ' Extract text from page 1 (page indexes are zero-based)
    Dim textFromPage1 As String = pdf.ExtractTextFromPage(0)

    Dim pages() As Integer = { 0, 2 }

    ' Extract text from pages 1 & 3
    Dim textFromPage1_3 As String = pdf.ExtractTextFromPages(pages)

Frequently Asked Questions

How can I extract embedded text from a PDF in .NET C#?

You can use the ExtractAllText method from the IronPdf library to extract embedded text from a PDF. This method returns a single string in which the pages are separated by four consecutive newline characters.

What steps are involved in extracting images from a PDF using C#?

To extract images from a PDF in C#, first download the IronPdf library via NuGet. Then use the ExtractAllImages method, which returns a list of AnyBitmap objects representing the images.

Can I extract text from specific pages of a PDF document?

Yes, you can use the ExtractTextFromPage and ExtractTextFromPages methods in IronPdf to extract text from a single page or from multiple pages of a PDF document.

What is the purpose of extracting text by line and character coordinates?

Extracting text by line and character coordinates lets you retrieve the exact position of text within a PDF page. This can be done using the **Lines** and **Characters** properties in IronPdf, which provide Top, Right, Bottom, and Left values.

How do I extract images in raw format from a PDF?

To extract images in raw format, use the ExtractAllRawImages method in IronPdf.
This method returns the images as byte arrays, allowing you to access the original image data.

What are the benefits of using IronPdf for extracting text and images?

Using IronPdf for extracting text and images from PDFs is cost-effective as it offers a one-time payment solution. It helps in repurposing content for editing, searching, conversion to other formats, and reusing images for analysis.

How can I begin using IronPdf for PDF content extraction?

To start using IronPdf, download the IronPdf C# Library from NuGet, then follow the guide to prepare your PDF document and use methods like ExtractAllText and ExtractAllImages for content extraction.

Is it possible to extract both text and images from a single PDF page?

Yes, IronPdf allows you to extract both text and images from a single PDF page using the ExtractTextFromPage and ExtractImagesFromPage methods.

What methods are available for extracting images from multiple pages?

You can use the ExtractImagesFromPages method in IronPdf to extract images from multiple pages of a PDF document.

Chaknith Bin, Software Engineer

Chaknith works on IronXL and IronBarcode. He has deep expertise in C# and .NET, helping improve the software and support customers. His insights from user interactions contribute to better products, documentation, and overall experience.