USING IRONPDF

How to Parse a PDF File in VB.NET

This tutorial shows how to programmatically extract text and images from PDF files in VB.NET using IronPDF.

IronPDF

Features

IronPDF is a PDF library built around efficient PDF conversion. With it, developers can quickly create, read, write, load, and manipulate PDF documents.

IronPDF converts HTML into PDF documents using the Chrome engine. It works with HTML, ASPX, Razor HTML, ASP.NET, .NET Core, Windows Forms, and WPF, and it also supports Xamarin, Blazor, Unity, and HoloLens applications. Both Microsoft .NET and .NET Core applications are supported, including ASP.NET web applications and conventional Windows applications. IronPDF can be used to produce well-styled PDFs.

IronPDF can create a PDF from HTML5, JavaScript, CSS, and images. Its HTML-to-PDF converter is built on the Chromium rendering engine and does not depend on any external services.

  • PDFs can be created from a variety of sources, including HTML, HTML5, ASPX, and Razor/MVC views. Both HTML and image assets can be converted to PDF.
  • Interactive PDF forms can be filled out and submitted.
  • PDFs can be merged and split, text and images can be extracted, text can be searched, pages can be rasterized to images, font sizes can be changed, and PDF files can be converted.
  • HTML login forms can be authenticated against using user-agents, proxies, cookies, HTTP headers, and form variables.
  • Secured documents can be accessed by supplying user names and passwords.
  • IronPDF can read the text in a PDF and fill in the gaps.
  • Text, images, bookmarks, watermarks, and more can be added to a PDF.
  • A PDF file can be created from a CSS file.

For details on the free limited key and the professional version, see the IronPDF licensing information page.

Figure 1: IronPDF font formatting

Extract Text from a PDF File

IronPDF can read and extract text from PDF files. The code samples below show how to read text from existing PDF files.

Extract Text From All Pages

The code example below shows the first approach: retrieving all the text in a PDF as a single string in just a few lines.

Imports IronPdf

Module Program
    Sub Main(args As String())
        ' Create a PDF Document object from an existing PDF file
        Dim pdfdoc = PdfDocument.FromFile("result.pdf")

        ' Extract all the text from the PDF
        Dim AllText As String = pdfdoc.ExtractAllText()

        ' Output the extracted text to the console
        Console.WriteLine(AllText)
    End Sub
End Module

The sample code above demonstrates how to use the FromFile method to read a PDF from an existing file and convert it into a PDF document object. The object provides a method called ExtractAllText that will extract plain text from the PDF and turn it into a string.
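As a small extension of the sample above, the following sketch checks that the input file exists and writes the extracted text to a text file instead of the console. The file names result.pdf and result.txt are placeholders, and the existence check is an illustrative addition, not something IronPDF requires.

```vbnet
Imports System.IO
Imports IronPdf

Module Program
    Sub Main(args As String())
        ' Placeholder paths, adjust to your own files
        Dim inputPath As String = "result.pdf"
        Dim outputPath As String = "result.txt"

        ' Stop early with a readable message if the PDF is missing
        If Not File.Exists(inputPath) Then
            Console.WriteLine("Input file not found: " & inputPath)
            Return
        End If

        ' Load the PDF and extract all of its text as one string
        Dim pdfdoc = PdfDocument.FromFile(inputPath)
        Dim allText As String = pdfdoc.ExtractAllText()

        ' Write the extracted text to disk
        File.WriteAllText(outputPath, allText)
        Console.WriteLine("Wrote " & allText.Length & " characters to " & outputPath)
    End Sub
End Module
```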

Extract Text by Page Number

The sample code below shows how to extract the text of a single page by its page number.

Imports IronPdf

Module Program
    Sub Main(args As String())
        ' Create a PDF Document object from an existing PDF file
        Dim pdfdoc = PdfDocument.FromFile("result.pdf")

        ' Extract text from the first page (page numbers are zero-based)
        Dim AllText As String = pdfdoc.ExtractTextFromPage(0)

        ' Output the extracted text to the console
        Console.WriteLine(AllText)
    End Sub
End Module

The code above reads a PDF from an existing file and turns it into a PDF document object using the FromFile method. Text and images in the PDF can be accessed through this object. Its ExtractTextFromPage method takes a zero-based page number as a parameter and returns a string containing all the text on that page.
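When the text of every page is needed separately, the same method can be called in a loop. The sketch below assumes the PdfDocument object exposes a PageCount property (not shown in the samples above); the page separator line is only for readability.

```vbnet
Imports IronPdf

Module Program
    Sub Main(args As String())
        ' Create a PDF Document object from an existing PDF file
        Dim pdfdoc = PdfDocument.FromFile("result.pdf")

        ' Assumption: PageCount returns the number of pages in the document
        For pageIndex As Integer = 0 To pdfdoc.PageCount - 1
            ' Page numbers are zero-based, so the first page is index 0
            Dim pageText As String = pdfdoc.ExtractTextFromPage(pageIndex)

            ' Print a human-friendly (one-based) page label, then the text
            Console.WriteLine("--- Page " & (pageIndex + 1) & " ---")
            Console.WriteLine(pageText)
        Next
    End Sub
End Module
```

Processing page by page keeps each page's text separate, which is convenient when later steps need to know where a piece of text came from.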

Extract Text Between Pages

The code below shows how to extract text from several pages at once.

Imports IronPdf
Imports System.Collections.Generic

Module Program
    Sub Main(args As String())
        ' Define a list of zero-based page numbers from which to extract text
        Dim Pages As List(Of Integer) = New List(Of Integer) From {3, 5, 7}

        ' Create a PDF Document object from an existing PDF file
        Dim pdfdoc = PdfDocument.FromFile("result.pdf")

        ' Extract text from the specified pages
        Dim AllText As String = pdfdoc.ExtractTextFromPages(Pages)

        ' Output the extracted text to the console
        Console.WriteLine(AllText)
    End Sub
End Module

The code above uses the FromFile method to read a PDF from an existing file and convert it into a PDF document object, which gives access to the text and images in the PDF. Its ExtractTextFromPages method takes a list of page numbers as a parameter and returns a string containing all the text on those pages. In the figure below, the source PDF is on the left and the extracted text is on the right.

Figure 2: Extract text between pages output
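The sample above passes a hand-written list of page numbers. To extract a contiguous range of pages, the list can be generated instead. The following sketch uses Enumerable.Range for that; the firstPage and lastPage values are hypothetical, and the document is assumed to have at least six pages.

```vbnet
Imports System.Collections.Generic
Imports System.Linq
Imports IronPdf

Module Program
    Sub Main(args As String())
        ' Create a PDF Document object from an existing PDF file
        Dim pdfdoc = PdfDocument.FromFile("result.pdf")

        ' Hypothetical zero-based bounds: the third through sixth pages
        Dim firstPage As Integer = 2
        Dim lastPage As Integer = 5

        ' Build the list {2, 3, 4, 5}; Range takes a start value and a count
        Dim pages As List(Of Integer) =
            Enumerable.Range(firstPage, lastPage - firstPage + 1).ToList()

        ' Extract the text of every page in the range as one string
        Dim rangeText As String = pdfdoc.ExtractTextFromPages(pages)

        Console.WriteLine(rangeText)
    End Sub
End Module
```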

Extract Images from a PDF File

IronPDF provides several methods to extract images, such as ExtractRawImagesFromPage, which is used in the example below.

These methods extract images from a single page or from multiple pages of the document.

Imports IronPdf
Imports System.Drawing
Imports System.IO

Module Program
    Sub Main(args As String())
        ' Create a PDF Document object from an existing PDF file
        Dim pdfdoc = PdfDocument.FromFile("result.pdf")

        ' Extract raw images from the first page (page numbers are zero-based)
        Dim images = pdfdoc.ExtractRawImagesFromPage(0)

        ' Make sure the output directory exists before saving
        Directory.CreateDirectory("output")

        ' Iterate over extracted images, numbering each output file
        Dim index As Integer = 0
        For Each imgData As Byte() In images
            ' Create a memory stream from byte data
            Using ms As New MemoryStream(imgData)
                ' Create a Bitmap object from the memory stream
                Using image As New Bitmap(ms)
                    ' Save each image under its own name so earlier files are not overwritten
                    image.Save($"output/test_{index}.jpg", System.Drawing.Imaging.ImageFormat.Jpeg)
                End Using
            End Using
            index += 1
        Next
    End Sub
End Module

The code above reads a document from an existing file and turns it into a PDF document object using the FromFile method. Passing a page number to the object's ExtractRawImagesFromPage method returns a list of byte arrays, one for each image on that page. A For Each loop wraps each byte array in a memory stream and loads it into a Bitmap, which is then saved to disk. The image below shows the output of the code above.

Figure 3: Extract Images from PDF output
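To collect the images from every page rather than a single one, the same method can be called in a loop. The sketch below rests on two assumptions: that the PdfDocument object exposes a PageCount property, and that each returned byte array is a complete encoded image file, so it can be written to disk directly without System.Drawing. GuessExtension is a hypothetical helper, not part of IronPDF; it only recognizes the JPEG and PNG file signatures.

```vbnet
Imports System.IO
Imports IronPdf

Module Program
    ' Hypothetical helper: guess a file extension from the leading bytes
    Function GuessExtension(data As Byte()) As String
        ' JPEG files start with FF D8 FF
        If data.Length >= 3 AndAlso data(0) = &HFF AndAlso data(1) = &HD8 AndAlso data(2) = &HFF Then
            Return ".jpg"
        End If
        ' PNG files start with 89 50 4E 47
        If data.Length >= 4 AndAlso data(0) = &H89 AndAlso data(1) = &H50 AndAlso data(2) = &H4E AndAlso data(3) = &H47 Then
            Return ".png"
        End If
        ' Unknown format: keep the bytes but do not claim an image type
        Return ".bin"
    End Function

    Sub Main(args As String())
        Dim pdfdoc = PdfDocument.FromFile("result.pdf")

        ' Make sure the output directory exists
        Directory.CreateDirectory("output")

        ' Assumption: PageCount returns the number of pages in the document
        For pageIndex As Integer = 0 To pdfdoc.PageCount - 1
            Dim images = pdfdoc.ExtractRawImagesFromPage(pageIndex)

            Dim imageIndex As Integer = 0
            For Each imgData As Byte() In images
                ' Name each file after its page and position on that page
                Dim fileName As String = $"page{pageIndex}_image{imageIndex}{GuessExtension(imgData)}"
                File.WriteAllBytes(Path.Combine("output", fileName), imgData)
                imageIndex += 1
            Next
        Next
    End Sub
End Module
```

Writing the bytes directly avoids re-encoding the images, at the cost of having to infer the file type.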

To learn more about the IronPDF API, refer to the IronPDF documentation. Other tutorials show how to parse PDF text using C#.

Conclusion

The development license for the IronPDF library is free. For use in a production environment, different licenses can be bought depending on the developer's needs. The Lite plan starts at $799 and has no ongoing costs. SaaS and OEM redistribution options are also provided. Every license is a one-time purchase: it is permanent, includes updates and a year of product support, and is valid for production, staging, and development. Free, time-limited licenses are also available. Visit the IronPDF licensing information page for the complete pricing and licensing details. IronPDF also provides free licenses for copy protection.

Frequently Asked Questions

How can I extract text from a PDF in VB.NET?

Using the IronPDF library, you can extract text from a PDF with the ExtractAllText method. This retrieves the text from all pages of a PDF document in your VB.NET project.

Is it possible to extract images from specific pages of a PDF using VB.NET?

Yes, IronPDF lets you extract images from specific pages using its ExtractRawImagesFromPage method. This method returns the image data as byte arrays, which can be converted into image files.

How can I convert HTML content into a PDF document in VB.NET?

IronPDF offers HTML-to-PDF conversion using the Chromium rendering engine. You can use methods such as RenderHtmlAsPdf to convert HTML strings or files into PDF documents.
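A minimal sketch of that conversion follows. It assumes RenderHtmlAsPdf is called on IronPDF's ChromePdfRenderer class and that the resulting document is written with SaveAs; neither appears in the samples earlier in this article, and the HTML string and the hello.pdf file name are placeholders.

```vbnet
Imports IronPdf

Module Program
    Sub Main(args As String())
        ' Assumption: ChromePdfRenderer is the renderer class exposing RenderHtmlAsPdf
        Dim renderer As New ChromePdfRenderer()

        ' Render a placeholder HTML string into a PDF document object
        Dim pdf = renderer.RenderHtmlAsPdf("<h1>Hello from VB.NET</h1><p>Rendered by IronPDF.</p>")

        ' Save the rendered document to disk (placeholder file name)
        pdf.SaveAs("hello.pdf")

        ' The same object supports the extraction methods shown earlier
        Console.WriteLine(pdf.ExtractAllText())
    End Sub
End Module
```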

What are the benefits of using IronPDF to parse PDFs in VB.NET applications?

IronPDF provides versatile APIs for extracting text and images, supports HTML-to-PDF conversion, and is compatible with several .NET platforms, including ASP.NET, Windows Forms, and Blazor. It also offers different licensing options to suit development and production needs.

How do I integrate IronPDF into my VB.NET project?

To integrate IronPDF, download the library from NuGet and add it to your VB.NET project. This gives you access to its methods for parsing and manipulating PDF files programmatically.

Can IronPDF handle both parsing and conversion tasks?

Yes, IronPDF is designed to handle both parsing tasks (text and image extraction) and conversion tasks (such as HTML to PDF), making it a comprehensive solution for PDF manipulation in VB.NET.

What licensing options are available for IronPDF?

IronPDF offers a free development license and several production licenses, including Lite, SaaS, and OEM redistribution. These licenses include updates and one year of support, and they cover different project needs.

Does IronPDF depend on external resources to function?

No, IronPDF is self-contained and uses the Chromium rendering engine internally, so PDF conversion and parsing do not depend on external resources.

Is IronPDF compatible with .NET 10, and how does that benefit VB.NET developers?

Yes, IronPDF is fully compatible with .NET 10, as well as earlier versions such as .NET 9, 8, 7, 6, Core, Standard, and Framework. This means VB.NET projects that target .NET 10 can use IronPDF without additional configuration. Developers benefit from the runtime performance improvements in .NET 10, such as reduced heap allocations, faster execution, and JIT optimizations, which speed up PDF generation, text and image extraction, and HTML-to-PDF rendering.

Curtis Chau
Technical Writer

Curtis Chau holds a bachelor's degree in Computer Science (Carleton University) and specializes in front-end development, with expertise in Node.js, TypeScript, JavaScript, and React. Passionate about crafting intuitive and aesthetically pleasing user interfaces, he enjoys working with modern frameworks and creating manuals that are well ...
