In today’s development world, working with PDFs is a common requirement for applications that need to handle documents, forms, or reports. Whether you're building an e-commerce platform, document management system, or just need to process invoices, extracting and searching text from PDFs can be crucial. This article will guide you through how to use C# string.Contains() with IronPDF to search and extract text from PDF files in your .NET projects.
When searching text, you often need to test whether a string contains a given substring. C# offers string.Contains() for this, which is one of the simplest forms of string comparison.
If you need to control case sensitivity, use the StringComparison enumeration. It lets you choose the type of string comparison you want, such as ordinal comparison (StringComparison.Ordinal) or case-insensitive ordinal comparison (StringComparison.OrdinalIgnoreCase).
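As a minimal sketch of the difference between these two comparison types, the example below searches a hypothetical sample string (standing in for extracted PDF text) for the same term twice. The helper name ContainsTerm is an assumption made for illustration. Note that the Contains(string, StringComparison) overload is available on .NET Core 2.1 and later but not on .NET Framework, so the sketch uses IndexOf(...) >= 0, which gives the same answer on both.

```csharp
using System;

public static class ComparisonDemo
{
    // Returns true when `term` occurs in `text` under the given comparison rules.
    // IndexOf(...) >= 0 is used instead of Contains(term, comparison) so the
    // sketch also compiles on .NET Framework.
    public static bool ContainsTerm(string text, string term, StringComparison comparison)
    {
        return text.IndexOf(term, comparison) >= 0;
    }

    public static void Main()
    {
        // Hypothetical sample standing in for text extracted from a PDF.
        string text = "Invoice Number: INV-12345";

        // Ordinal comparison is case-sensitive, so the lowercase term is not found.
        Console.WriteLine(ContainsTerm(text, "inv-12345", StringComparison.Ordinal));           // False

        // OrdinalIgnoreCase ignores letter case, so the same term is found.
        Console.WriteLine(ContainsTerm(text, "inv-12345", StringComparison.OrdinalIgnoreCase)); // True
    }
}
```

For identifiers such as invoice numbers, the ordinal variants are usually a reasonable default because they do not depend on the current culture.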
If you want to work with specific positions in the string, such as the first character position or last character position, you can always use Substring to isolate certain portions of the string for further processing.
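One way this might look in practice is to locate a term with IndexOf and then use Substring to pull out a few characters of context around it. The sketch below is illustrative only: the helper GetContext, its radius parameter, and the sample text are all assumptions rather than anything from IronPDF.

```csharp
using System;

public static class SnippetDemo
{
    // Returns the first case-insensitive match of `term` together with up to
    // `radius` characters of context on each side, or an empty string when
    // the term does not occur in the text.
    public static string GetContext(string text, string term, int radius)
    {
        int index = text.IndexOf(term, StringComparison.OrdinalIgnoreCase);
        if (index < 0)
        {
            return string.Empty;
        }

        // Clamp the window so Substring never runs past either end of the text.
        int start = Math.Max(0, index - radius);
        int end = Math.Min(text.Length, index + term.Length + radius);
        return text.Substring(start, end - start);
    }

    public static void Main()
    {
        // Hypothetical sample standing in for text extracted from a PDF.
        string text = "Date: 2024-01-15. Invoice Number: INV-12345. Total due: $250.00";
        Console.WriteLine(GetContext(text, "INV-12345", 10));
    }
}
```

Clamping the start and end positions matters because Substring throws an ArgumentOutOfRangeException when the requested range falls outside the string.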
Handle edge cases such as empty strings explicitly in your logic. For example, Contains() returns true for an empty search term, which is rarely what a search feature should report.
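A small guard can make the empty-string case explicit. The following sketch assumes a hypothetical helper named ContainsTerm that treats missing text or a blank search term as "not found"; whether that is the right policy depends on your application.

```csharp
using System;

public static class SafeSearch
{
    // Case-insensitive search that reports "not found" for missing text
    // or a blank search term instead of returning a misleading true.
    public static bool ContainsTerm(string text, string term)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrWhiteSpace(term))
        {
            return false;
        }

        return text.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    public static void Main()
    {
        // Every string contains the empty string, so the built-in check passes.
        Console.WriteLine("INV-12345".Contains(""));                         // True

        // The guarded helper rejects the blank term.
        Console.WriteLine(ContainsTerm("INV-12345", ""));                    // False
        Console.WriteLine(ContainsTerm("", "INV-12345"));                    // False
        Console.WriteLine(ContainsTerm("Invoice INV-12345", "inv-12345"));   // True
    }
}
```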
If you're dealing with large documents, it helps to limit text extraction to the relevant portions rather than the entire document. This reduces memory use and processing time.
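The idea of searching only as much as needed can be sketched without any PDF library. The example below assumes the page texts arrive as a sequence of strings, one per page; how you obtain per-page text is left out because this article only demonstrates ExtractAllText(). The helper FindFirstPage is hypothetical. It stops at the first page that contains the term, so later pages in a lazily produced sequence are never read.

```csharp
using System;
using System.Collections.Generic;

public static class PageSearch
{
    // Returns the 1-based number of the first page whose text contains `term`
    // (case-insensitive), or -1 when no page matches. Enumeration stops at the
    // first hit, so a lazy sequence is only consumed as far as necessary.
    public static int FindFirstPage(IEnumerable<string> pages, string term)
    {
        int pageNumber = 0;
        foreach (string pageText in pages)
        {
            pageNumber++;
            if (pageText.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return pageNumber;
            }
        }

        return -1;
    }

    public static void Main()
    {
        // Hypothetical page texts standing in for per-page extraction results.
        var pages = new List<string>
        {
            "Cover page",
            "Invoice Number: INV-12345",
            "Terms and conditions"
        };

        Console.WriteLine(FindFirstPage(pages, "inv-12345")); // 2
        Console.WriteLine(FindFirstPage(pages, "refund"));    // -1
    }
}
```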
If you're unsure which comparison rules to use, consider how each method behaves and how you want your search to respond in different scenarios (e.g., matching multiple terms or handling spaces and line breaks).
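To illustrate the multiple-terms and whitespace scenarios, here is a rough sketch. Extracted text can contain line breaks or repeated spaces in the middle of a phrase, so the example collapses whitespace before searching. The helper names NormalizeWhitespace and FindTerms, and the sample text, are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MultiTermSearch
{
    // Collapses every run of whitespace (spaces, tabs, line breaks) into one space.
    public static string NormalizeWhitespace(string text)
    {
        return Regex.Replace(text, @"\s+", " ").Trim();
    }

    // Returns the terms that occur in the text, ignoring case and line wrapping.
    // Blank terms are skipped because every string contains the empty string.
    public static List<string> FindTerms(string text, IEnumerable<string> terms)
    {
        string normalized = NormalizeWhitespace(text);
        return terms
            .Where(term => !string.IsNullOrWhiteSpace(term))
            .Where(term => normalized.IndexOf(
                NormalizeWhitespace(term), StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }

    public static void Main()
    {
        // Hypothetical extracted text with a line break and extra spaces inside phrases.
        string text = "Payment\nTerms: Net 30\nTotal   Due: $250.00";
        string[] terms = { "payment terms", "total due", "refund" };

        Console.WriteLine(string.Join(", ", FindTerms(text, terms))); // payment terms, total due
    }
}
```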
If your needs go beyond simple substring checks and require more advanced pattern matching, consider using regular expressions, which offer significant flexibility when working with PDFs.
IronPDF is a powerful library designed to help developers working with PDFs in the .NET ecosystem. It enables you to create, read, edit, and manipulate PDF files easily without having to rely on external tools or complex configurations.
IronPDF provides a wide range of features for working with PDFs in C# applications.
IronPDF is designed to be simple to use, but also flexible enough to handle complex scenarios involving PDFs. It works seamlessly with .NET Core and .NET Framework, making it a perfect fit for any .NET-based project.
To use IronPDF, install it from NuGet by running the following command in the Package Manager Console in Visual Studio:
Install-Package IronPdf
Before diving into searching PDFs, let's first understand how to extract text from a PDF using IronPDF.
IronPDF provides a simple API to extract text from PDF documents. This allows you to easily search for specific content within PDFs.
The following example demonstrates how to extract text from a PDF using IronPDF:
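    using IronPdf;
    using System;

    public class Program
    {
        public static void Main(string[] args)
        {
            // Load the PDF from disk.
            PdfDocument pdf = PdfDocument.FromFile("invoice.pdf");

            // Extract the text of the whole document as a single string.
            string text = pdf.ExtractAllText();
            Console.WriteLine(text);
        }
    }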
In this example, the ExtractAllText() method extracts all the text from the PDF document. This text can then be processed to search for specific keywords or phrases.
Once you've extracted the text from the PDF, you can use C#'s built-in string.Contains() method to search for specific words or phrases.
The string.Contains() method returns a Boolean value indicating whether a specified string exists within a string. This is particularly useful for basic text searching.
Here’s how you can use string.Contains() to search for a keyword within the extracted text:
bool isFound = text.Contains("search term", StringComparison.OrdinalIgnoreCase);
Let’s break this down further with a practical example. Suppose you want to find whether a specific invoice number exists in a PDF invoice document.
Here’s a full example of how you could implement this:
    using IronPdf;
    using System;

    public class Program
    {
        public static void Main(string[] args)
        {
            string searchTerm = "INV-12345";

            // Load the invoice and extract its text.
            PdfDocument pdf = PdfDocument.FromFile("exampleInvoice.pdf");
            string text = pdf.ExtractAllText();

            // Case-insensitive search for the invoice number.
            bool isFound = text.Contains(searchTerm, StringComparison.OrdinalIgnoreCase);

            if (isFound)
            {
                Console.WriteLine($"Invoice number {searchTerm} found in the document");
            }
            else
            {
                Console.WriteLine($"Invoice number {searchTerm} not found in the document");
            }
        }
    }
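In this example, the program loads exampleInvoice.pdf, extracts its text, performs a case-insensitive check for the invoice number INV-12345, and prints whether it was found.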
While string.Contains() works for simple substring searches, you might want to perform more complex searches, such as finding a pattern or a series of keywords. For this, you can use regular expressions.
Here’s an example using a regular expression to search for any valid invoice number format in the PDF text:
    using IronPdf;
    using System;
    using System.Text.RegularExpressions;

    public class Program
    {
        public static void Main(string[] args)
        {
            // Define a regex pattern for a typical invoice number format (e.g., INV-12345)
            string pattern = @"INV-\d{5}";

            PdfDocument pdf = PdfDocument.FromFile("exampleInvoice.pdf");
            string text = pdf.ExtractAllText();

            // Perform the regex search; Match returns the first occurrence, if any
            Match match = Regex.Match(text, pattern);
            if (match.Success)
            {
                Console.WriteLine($"Found invoice number: {match.Value}");
            }
        }
    }
This code searches for the first invoice number that follows the pattern INV-XXXXX, where XXXXX is exactly five digits.
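If a document may contain several invoice numbers, one possible extension is to collect every match with Regex.Matches. The sketch below is an illustration under a few assumptions: the class InvoiceNumberFinder and its FindAll method are hypothetical, the sample text stands in for extracted PDF text, and the pattern adds \b word boundaries so that a longer token such as INV-123456 is not reported as a five-digit invoice number.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class InvoiceNumberFinder
{
    // \b anchors keep the pattern from matching inside longer tokens such as
    // INV-123456. IgnoreCase also accepts lowercase prefixes like inv-67890.
    private static readonly Regex InvoicePattern =
        new Regex(@"\bINV-\d{5}\b", RegexOptions.IgnoreCase);

    // Returns each distinct invoice number in order of first appearance.
    public static List<string> FindAll(string text)
    {
        return InvoicePattern.Matches(text)
            .Cast<Match>()
            .Select(match => match.Value)
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToList();
    }

    public static void Main()
    {
        // Hypothetical sample standing in for text extracted from a PDF.
        string text = "Ref INV-12345, inv-67890, INV-123456, INV-12345";

        foreach (string invoiceNumber in FindAll(text))
        {
            Console.WriteLine(invoiceNumber);
        }
        // Prints:
        // INV-12345
        // inv-67890
    }
}
```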
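When working with PDFs, especially large or complex documents, keep a few best practices in mind: extract only the portions you need, handle edge cases such as empty search terms, and choose comparison rules that match how you want the search to behave.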
IronPDF integrates easily with .NET projects. After downloading and installing the IronPDF library via NuGet, simply import it into your C# codebase, as shown in the examples above.
IronPDF’s flexibility allows you to build sophisticated document processing workflows.
IronPDF makes working with PDFs easy and efficient, especially when you need to extract and search text in PDFs. By combining C#'s string.Contains() method with IronPDF’s text extraction capabilities, you can quickly search and process PDFs in your .NET applications.
If you haven’t already, try IronPDF’s free trial today to explore its capabilities and see how it can streamline your PDF handling tasks. Whether you’re building a document management system, processing invoices, or just need to extract data from PDFs, IronPDF is the perfect tool for the job.
To get started, download the free trial from IronPDF’s website and try its PDF manipulation features firsthand.