How to Read PDF Files in Java

This article explores how to build a PDF reader that opens a PDF file programmatically from your own software application. IronPDF for Java is a library that opens and reads PDF files by filename from Java programs.

IronPDF

The IronPDF Java library builds on the success of IronPDF for .NET. It is a versatile tool for working with PDF documents and an alternative to other class libraries such as Apache PDFBox. It can extract and parse content, including text and images. It also provides options to customize PDF pages, such as page layout, margins, headers and footers, page orientation, and much more.

In addition to this, IronPDF also supports conversion from other file formats, protecting PDFs with a password, digital signing, merging, and splitting PDF documents.

How to Read PDF Files in Java

Prerequisites

To use IronPDF to make a Java PDF reader, it is necessary to ensure that the following components are installed on the computer:

  1. JDK - Java Development Kit is required for building and running Java programs. If it is not installed, download it from the Oracle Website.
  2. IDE - Integrated Development Environment is software that helps write, edit, and debug a program. Download any IDE for Java, e.g., Eclipse, NetBeans, IntelliJ.
  3. Maven - Maven is a build automation tool that downloads libraries from the Central Repository. Download it from the Apache Maven Website.
  4. IronPDF - Finally, IronPDF is required to read the PDF file in Java. This needs to be added as a dependency in your Java Maven Project. Include the IronPDF artifact along with the slf4j dependency in the pom.xml file as shown in the example below:

<dependencies>

    <dependency>
        <groupId>com.ironsoftware</groupId>
        <artifactId>ironpdf</artifactId>
        <version>your-version-here</version>
    </dependency>

    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>1.7.32</version>
    </dependency>
</dependencies>

Adding Necessary Imports

First, add the following imports at the top of the Java source file to reference the classes used in this article:

// Necessary imports from the IronPDF library
import com.ironsoftware.ironpdf.*;
import com.ironsoftware.ironpdf.edit.PageSelection;
// Needed for Paths.get when loading a PDF from disk
import java.nio.file.Paths;

Next, configure IronPDF with a valid license key to use its methods. Invoke the setLicenseKey method in the main method.

License.setLicenseKey("Your license key");
// Set your IronPDF license key - required for full version

Note: You can get a free trial license key to create, read, and print PDFs.
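Hard-coding the key in source code is convenient for a demo but easy to leak into version control. The following is a minimal sketch of one alternative, assuming the key is stored in an environment variable. The variable name IRONPDF_LICENSE_KEY and the LicenseKeyLoader class are hypothetical names chosen for this illustration and are not part of IronPDF; the resolved string would then be passed to License.setLicenseKey.

```java
import java.util.Map;
import java.util.Optional;

final class LicenseKeyLoader {
    // Hypothetical variable name, chosen for this sketch only.
    static final String ENV_VAR = "IRONPDF_LICENSE_KEY";

    // Returns the trimmed key if the variable is set and not blank.
    static Optional<String> fromEnvironment(Map<String, String> env) {
        String raw = env.get(ENV_VAR);
        if (raw == null || raw.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(raw.trim());
    }

    public static void main(String[] args) {
        Optional<String> key = fromEnvironment(System.getenv());
        if (key.isPresent()) {
            // In a real program: License.setLicenseKey(key.get());
            System.out.println("License key found (" + key.get().length() + " characters).");
        } else {
            System.out.println(ENV_VAR + " is not set.");
        }
    }
}
```

Passing the environment in as a Map keeps the lookup easy to test without touching the real process environment.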

Read Existing PDF File in Java

To read a PDF file, you need an existing document, or you can create one. This article uses an existing PDF file. Extracting the text from the document takes two steps: load the file, then extract its text.

// Load the PDF document from file
PdfDocument pdf = PdfDocument.fromFile(Paths.get("assets/sample.pdf"));
// Extract all text from the PDF
String text = pdf.extractAllText();
// Print the extracted text
System.out.println(text);

In the above code, fromFile opens the PDF document. The Paths.get method converts the path string into a Path object, which fromFile uses to locate the file. Then, [extractAllText](/java/object-reference/api/com/ironsoftware/ironpdf/PdfDocument.html#extractAllText()) reads all the text in the document.

The output is below:

Figure 1: Reading PDF Text Output
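Because Paths.get only builds a Path and does not touch the disk, a wrong filename only surfaces once the file is actually opened. The sketch below checks the path up front using only the standard library. The PdfPathCheck class and its looksLikeReadablePdf method are hypothetical helpers for this illustration, and the extension check is a naming convention, not a validation of the file's contents.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

final class PdfPathCheck {
    // Returns true if the path names a readable regular file ending in ".pdf".
    static boolean looksLikeReadablePdf(Path path) {
        Path fileName = path.getFileName(); // null for a root path such as "/"
        String name = fileName == null ? "" : fileName.toString();
        return name.toLowerCase(Locale.ROOT).endsWith(".pdf")
                && Files.isRegularFile(path)
                && Files.isReadable(path);
    }

    public static void main(String[] args) {
        Path path = Paths.get("assets/sample.pdf");
        if (looksLikeReadablePdf(path)) {
            // In a real program: PdfDocument pdf = PdfDocument.fromFile(path);
            System.out.println("Ready to load " + path.toAbsolutePath());
        } else {
            System.out.println("No readable PDF at " + path.toAbsolutePath());
        }
    }
}
```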

Read Text from a Specific Page

IronPDF can also read content from a specific page in a PDF. The extractTextFromPage method uses a PageSelection object to accept a range of page(s) from which text will be read.

In the following example, the text is extracted from the second page of the PDF document. PageSelection.singlePage takes the index of the page which needs to be extracted (index starting from 0).

// Load the PDF document from file
PdfDocument pdf = PdfDocument.fromFile(Paths.get("assets/sample.pdf"));
// Extract text from the second page (page index based, starts at 0, so 1 means second page)
String text = pdf.extractTextFromPage(PageSelection.singlePage(1));
// Print the extracted text from the specified page
System.out.println(text);

Figure 2: Reading PDF Text Output

Other methods available in the PageSelection class which can be used to extract text from various pages include: [firstPage](/java/object-reference/api/com/ironsoftware/ironpdf/edit/PageSelection.html#firstPage()), [lastPage](/java/object-reference/api/com/ironsoftware/ironpdf/edit/PageSelection.html#lastPage()), pageRange, and [allPages](/java/object-reference/api/com/ironsoftware/ironpdf/edit/PageSelection.html#allPages()).
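Since PageSelection.singlePage counts pages from 0 while PDF viewers number them from 1, an off-by-one mistake is easy to make. The sketch below converts a viewer-style page number into the zero-based index. The PageIndex class and its toZeroBased method are hypothetical helpers, not part of IronPDF, and the page count is assumed to be supplied by the caller.

```java
final class PageIndex {
    // Converts a 1-based page number, as shown in a PDF viewer, to a 0-based index.
    static int toZeroBased(int pageNumber, int pageCount) {
        if (pageNumber < 1 || pageNumber > pageCount) {
            throw new IllegalArgumentException(
                    "Page " + pageNumber + " is outside 1.." + pageCount);
        }
        return pageNumber - 1;
    }

    public static void main(String[] args) {
        int index = toZeroBased(2, 5);
        // In a real program: pdf.extractTextFromPage(PageSelection.singlePage(index));
        System.out.println("Page 2 maps to index " + index); // prints "Page 2 maps to index 1"
    }
}
```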

Read Text from a Newly-Generated PDF File

Text can also be extracted from a newly generated PDF file, created from either an HTML file or a URL. The following sample code generates a PDF from a URL and extracts all the text from the rendered web page.

// Generate PDF from a URL
PdfDocument pdf = PdfDocument.renderUrlAsPdf("https://unsplash.com/");
// Extract all text from the generated PDF
String text = pdf.extractAllText();
// Print the extracted text from the URL
System.out.println("Text extracted from the website: " + text);

Figure 3: Read from a New File
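Once the text is available as a plain Java String, it can be searched with ordinary string handling. The sketch below finds the lines that contain a keyword, ignoring case. The TextSearch class and its linesContaining method are hypothetical helpers for this illustration, and the sample string stands in for the value returned by extractAllText.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class TextSearch {
    // Returns the lines of the text that contain the keyword, ignoring case.
    static List<String> linesContaining(String text, String keyword) {
        List<String> matches = new ArrayList<>();
        String needle = keyword.toLowerCase(Locale.ROOT);
        // \R matches any line break, including \n and \r\n.
        for (String line : text.split("\\R")) {
            if (line.toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(line.trim());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by pdf.extractAllText().
        String text = "Free photos\nBrowse images by topic\nLicense information";
        System.out.println(linesContaining(text, "license")); // prints "[License information]"
    }
}
```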

IronPDF can also be used to extract images from PDF files.

The complete code is as follows:

import com.ironsoftware.ironpdf.License;
import com.ironsoftware.ironpdf.PdfDocument;
import com.ironsoftware.ironpdf.edit.PageSelection;

import java.io.IOException;
import java.nio.file.Paths;

public class Main {
    public static void main(String[] args) throws IOException {
        // Set the IronPDF license key for commercial use
        License.setLicenseKey("YOUR LICENSE KEY HERE");

        // Read text from a specific page in an existing PDF
        PdfDocument pdf = PdfDocument.fromFile(Paths.get("assets/sample.pdf"));
        String text = pdf.extractTextFromPage(PageSelection.singlePage(1));
        System.out.println(text);

        // Read all text from a PDF generated from a URL
        pdf = PdfDocument.renderUrlAsPdf("https://unsplash.com/");
        text = pdf.extractAllText();
        System.out.println("Text extracted from the website: " + text);
    }
}

Summary

This article explained how to open and read PDFs in Java using IronPDF.

IronPDF helps easily create PDFs from HTML or URL and convert from different file formats. It also helps in getting PDF tasks done quickly and easily.

Try IronPDF for 30 days with a free trial and find out how well it works for you in production. Commercial licensing options for IronPDF start from $799.

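Frequently Asked Questions

How do I create a PDF reader in Java?

With IronPDF, you can create a PDF reader in Java by loading a PDF document with the `fromFile` method and then parsing and manipulating its content with methods such as `extractAllText`.

What are the steps to install the prerequisites for using IronPDF in Java?

To use IronPDF in Java, install the Java Development Kit (JDK), set up an integrated development environment (IDE) such as Eclipse or IntelliJ, configure Maven for dependency management, and include the IronPDF library in your project.

How do I extract text from a PDF file in Java?

To extract text from a PDF file in Java with IronPDF, use the `extractAllText` method to retrieve the text of the whole document, or use `extractTextFromPage` to extract text from specific pages.

Can I generate a PDF from a URL in Java?

Yes. With IronPDF, you can generate a PDF from a URL using the `renderUrlAsPdf` method, which converts web content into PDF format.

Does IronPDF support adding password protection to PDFs in Java?

Yes. IronPDF supports adding password protection to PDFs, along with other features such as digital signing and merging or splitting documents.

Which file formats can IronPDF convert to PDF in Java?

IronPDF can convert a variety of file formats to PDF, including HTML and other document formats, which provides flexible options for PDF creation and manipulation.

Is there a trial version of IronPDF for Java?

Yes. IronPDF offers a 30-day free trial, so you can test its features and evaluate its performance in your Java application before purchasing a license.

How do I extract text from a specific page of a PDF document using a Java library?

With IronPDF, you can extract text from specific pages of a PDF using the `extractTextFromPage` method, which requires you to specify the page number or range.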

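Curtis Chau
Technical Writer

Curtis Chau holds a bachelor's degree in Computer Science from Carleton University and is a front-end developer specializing in Node.js, TypeScript, JavaScript, and React. Passionate about crafting intuitive and aesthetically pleasing user interfaces, he enjoys working with modern frameworks and creating well-structured, visually appealing manuals.

Beyond development, Curtis has a strong interest in the Internet of Things (IoT) and explores innovative ways to integrate hardware and software. In his free time, he enjoys gaming and building Discord bots, combining his love of technology with creativity.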