This tutorial shows how to programmatically extract text and images from PDF files in VB.NET using IronPDF.
IronPDF is a PDF library that lets developers quickly create, read, write, load, and manipulate PDF documents, including reading their text content.
IronPDF converts HTML into PDF documents using the Chrome engine. It works with HTML, ASPX, Razor HTML, .NET Core, ASP.NET, Windows Forms, and WPF, and it also supports Xamarin, Blazor, Unity, and HoloLens applications. Both Microsoft .NET and .NET Core applications are supported (ASP.NET web applications as well as conventional Windows applications). IronPDF can be used to produce visually polished PDFs.
IronPDF can create a PDF from HTML5, JavaScript, CSS, and images. Its HTML-to-PDF converter is built on the Chromium rendering engine and does not depend on any external resources.
For details on the free limited key and the professional version, visit the IronPDF licensing information page.
IronPDF - Font formatting
IronPDF can also read and extract text from existing PDF files. The examples below show the available approaches.
The first code example retrieves all of the text in a PDF as a single string in just a few lines.
Imports IronPdf
Module Program
    Sub Main(args As String())
        ' Create a PDF Document object from an existing PDF file
        Dim pdfdoc = PdfDocument.FromFile("result.pdf")
        ' Extract all the text from the PDF
        Dim AllText As String = pdfdoc.ExtractAllText()
        ' Output the extracted text to the console
        Console.WriteLine(AllText)
    End Sub
End Module
The sample code above uses the `FromFile` method to read an existing PDF file into a PDF document object. The object provides a method called `ExtractAllText` that extracts the plain text from the PDF and returns it as a string.
The sample code below shows how to extract text from a single page of a PDF file by its page number.
Imports IronPdf
Module Program
    Sub Main(args As String())
        ' Create a PDF Document object from an existing PDF file
        Dim pdfdoc = PdfDocument.FromFile("result.pdf")
        ' Extract text from the first page (page numbers are zero-based)
        Dim AllText As String = pdfdoc.ExtractTextFromPage(0)
        ' Output the extracted text to the console
        Console.WriteLine(AllText)
    End Sub
End Module
The code above uses the `FromFile` function to read an existing PDF file into a PDF document object, which gives access to the text and images in the PDF. The object offers a method called `ExtractTextFromPage` that takes a zero-based page number as a parameter and returns a string containing all the text on that page.
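Because the page number is zero-based, a page number read from a PDF viewer (where the first page is 1) has to be shifted by one before it is passed in. The following is a minimal sketch of that conversion, not part of IronPDF: `ToPageIndex` is a hypothetical helper, and the page count of 10 is an assumed value that would normally come from the loaded document.

```vbnet
Imports System

Module PageIndexExample
    ' Hypothetical helper (not an IronPDF API): converts a 1-based page number,
    ' as displayed in a PDF viewer, to a zero-based page index.
    Function ToPageIndex(pageNumber As Integer, pageCount As Integer) As Integer
        If pageNumber < 1 OrElse pageNumber > pageCount Then
            Throw New ArgumentOutOfRangeException(NameOf(pageNumber),
                $"Expected a page number between 1 and {pageCount}.")
        End If
        Return pageNumber - 1
    End Function

    Sub Main()
        ' Assumption: the document has 10 pages.
        Dim pageCount As Integer = 10
        ' Viewer page 1 maps to index 0.
        Dim index As Integer = ToPageIndex(1, pageCount)
        Console.WriteLine(index) ' prints 0
        ' The index could then be passed to ExtractTextFromPage(index).
    End Sub
End Module
```

Validating the range up front gives a clear error message for an out-of-range page number instead of relying on whatever the extraction call does with an invalid index.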
The code below shows how to extract text from several pages at once.
Imports IronPdf
Imports System.Collections.Generic
Module Program
    Sub Main(args As String())
        ' Define a list of page numbers from which to extract text
        ' (page numbers are zero-based, so 3 refers to the fourth page)
        Dim Pages As List(Of Integer) = New List(Of Integer) From {3, 5, 7}
        ' Create a PDF Document object from an existing PDF file
        Dim pdfdoc = PdfDocument.FromFile("result.pdf")
        ' Extract text from the specified pages
        Dim AllText As String = pdfdoc.ExtractTextFromPages(Pages)
        ' Output the extracted text to the console
        Console.WriteLine(AllText)
    End Sub
End Module
The code above uses the `FromFile` method to read an existing PDF file into a PDF document object, which allows examining the text and images in the PDF. The object has a method called `ExtractTextFromPages` that takes a list of page numbers as a parameter and returns a string containing all the text on those pages of the document. In the image below, the left side is the source PDF and the right side is the extracted data.
Extract text between pages output
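When the pages of interest form a contiguous run rather than a scattered set, the list of page numbers can be generated instead of typed out. The sketch below is plain VB.NET and does not call IronPDF: `PageRange` is a hypothetical helper, and the bounds 3 and 7 are illustrative values.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module PageRangeExample
    ' Hypothetical helper (not an IronPDF API): builds the zero-based page
    ' indexes from firstIndex to lastIndex, both inclusive.
    Function PageRange(firstIndex As Integer, lastIndex As Integer) As List(Of Integer)
        If firstIndex < 0 OrElse lastIndex < firstIndex Then
            Throw New ArgumentException("Expected 0 <= firstIndex <= lastIndex.")
        End If
        ' Enumerable.Range takes a start value and a count, not an end value.
        Return Enumerable.Range(firstIndex, lastIndex - firstIndex + 1).ToList()
    End Function

    Sub Main()
        Dim pages As List(Of Integer) = PageRange(3, 7)
        Console.WriteLine(String.Join(", ", pages)) ' prints 3, 4, 5, 6, 7
        ' The resulting list could then be passed to ExtractTextFromPages(pages).
    End Sub
End Module
```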
IronPDF provides a list of methods to extract images such as:
ExtractBitmapsFromPage
ExtractBitmapsFromPages
ExtractImagesFromPage
ExtractImagesFromPages
ExtractRawImagesFromPage
ExtractRawImagesFromPages
Each method allows extracting images from a page or multiple pages of the document.
Imports IronPdf
Imports System.Drawing
Imports System.Drawing.Imaging
Imports System.IO
Module Program
    Sub Main(args As String())
        ' Create a PDF Document object from an existing PDF file
        Dim pdfdoc = PdfDocument.FromFile("result.pdf")
        ' Extract raw images from the first page (page numbers are zero-based)
        Dim images = pdfdoc.ExtractRawImagesFromPage(0)
        ' Make sure the output directory exists before saving
        Directory.CreateDirectory("output")
        Dim index As Integer = 0
        ' Iterate over extracted images
        For Each imgData As Byte() In images
            ' Create a memory stream from byte data
            Using ms As New MemoryStream(imgData)
                ' Create a Bitmap object from the memory stream
                Using img As New Bitmap(ms)
                    ' Save each image under its own name so earlier files are not overwritten
                    img.Save($"output/image_{index}.jpg", ImageFormat.Jpeg)
                End Using
            End Using
            index += 1
        Next
    End Sub
End Module
The code above uses the `FromFile` function to read an existing file into a PDF document object. Passing a page number to the object's `ExtractRawImagesFromPage` method returns a list of byte arrays, one for each image found on that page of the document. A `For Each` loop wraps each byte array in a memory stream and loads it into a `Bitmap`, which is then used to save the image to disk. The image below shows the output from the above code.
Extract Images from PDF output
To learn more about the IronPDF API, refer to the IronPDF documentation. You can also visit other tutorials to learn how to parse PDF text using C#.
The development license for IronPDF is free. For use in a production environment, different licenses can be purchased depending on the developer's needs. The Lite plan starts at $749 and has no ongoing costs. SaaS and OEM redistribution options are also available. Every license is a one-time purchase: it is permanent, includes updates and a year of product support, and is valid for production, staging, and development. Free, time-limited licenses are also available. Visit the IronPDF licensing information page for the complete pricing and licensing details. IronPDF also provides free licenses for copy protection.
IronPDF is a library that provides first-class support for parsing PDF files in VB.NET. It allows developers to extract embedded text and images using versatile APIs.
You can extract text from all pages of a PDF by using the IronPDF library and its `ExtractAllText` method on a PDF document object created from a file.
To extract text from specific pages only, IronPDF provides the `ExtractTextFromPages` method, which takes a list of page numbers.
IronPDF provides multiple methods like `ExtractRawImagesFromPage` to extract images from specific pages of a PDF. It returns image data as byte arrays which can be converted to image files.
IronPDF supports converting HTML, including HTML5, ASPX, and Razor/MVC Views, into PDF using its HTML-to-PDF converter based on the Chromium rendering engine.
IronPDF is compatible with various platforms and applications including Microsoft .NET, .NET Core, ASP.NET Web, Windows Forms, WPF, Xamarin, Blazor, Unity, and HoloLens.
IronPDF offers a development license for free. For production environments, there are various licenses available, including Lite, SaaS, and OEM redistribution. All licenses include updates and a year of support.
To start using IronPDF in your VB.NET project, install the IronPDF library from NuGet and reference it in your project to access its PDF parsing functions.
IronPDF can handle both text and image extraction from PDF files using the API methods designed for these purposes.
IronPDF's conversion capabilities are not tied to external resources, because it uses the Chromium rendering engine internally for PDF conversion.