using IronPdf; // Disable local disk access or cross-origin requests Installation.EnableWebSecurity = true; // Instantiate Renderer var renderer = new ChromePdfRenderer(); // Create a PDF from a HTML string using C# var pdf = renderer.RenderHtmlAsPdf("<h1>Hello World</h1>"); // Export to a file or Stream pdf.SaveAs("output.pdf"); // Advanced Example with HTML Assets // Load external html assets: Images, CSS and JavaScript. // An optional BasePath 'C:\site\assets\' is set as the file location to load assets from var myAdvancedPdf = renderer.RenderHtmlAsPdf("<img src='icons/iron.png'>", @"C:\site\assets\"); myAdvancedPdf.SaveAs("html-with-assets.pdf");

PDF ツール

C++でPDFファイルを読む方法

カーティス・チャウ

更新日:2025年7月28日

PDF（ポータブル・ドキュメント・フォーマット）ファイルは文書交換に広く使用されており、プログラムによってその内容を読み取れることはさまざまなアプリケーションで価値があります。C++でPDFを読み取るために利用できるライブラリには、Poppler、MuPDF、Haru無償PDFライブラリ、Xpdf、Qpdfなどがあります。

この記事では、XpdfコマンドラインツールをC++で使用してPDFファイルを読み取る方法を解説します。Xpdfは、PDFファイルを扱うための幅広いユーティリティを提供しており、テキストコンテンツの抽出も可能です。C++プログラムにXpdfを統合することで、PDFファイルからテキストを抽出してプログラムで処理できます。

Xpdf - コマンドラインツール

Xpdfは、PDF（ポータブル・ドキュメント・フォーマット）ファイル用のツールとライブラリのコレクションを提供するオープンソースソフトウェアスイートです。 Xpdfスイートには、様々なPDF関連機能を実現するコマンドラインユーティリティとC++ライブラリが含まれており、解析、レンダリング、テキスト抽出などが可能です。 Xpdfの主要なコンポーネントには、pdfimages、pdftops、pdfinfo、およびpdftotextがあります。ここでは、pdftotext を使用して PDF 文書を読み込みます。

pdftotext は、PDF ファイルからテキストコンテンツを抽出し、プレーンテキストとして出力するコマンドラインツールです。このツールは、PDFからテキスト情報を抽出してさらに処理や分析する必要がある場合に特に役立ちます。オプションを使用すると、テキストを抽出するページやページの指定もできます。

前提条件

テキストを抽出するPDFリーダープロジェクトを作成するには、次の前提条件を整える必要があります。

システムにGCCやClangなどのC++コンパイラがインストールされていること。 C++プログラミングをサポートする任意のIDEを使用できます。
システムにインストールされたXpdfコマンドラインツール。 Xpdfは、Xpdfサイトから入手可能なPDFユーティリティのコレクションです。Xpdf Websiteからダウンロードしてください。 Xpdfのbinディレクトリを環境変数のパスに設定して、どこからでもコマンドラインツールを使用してアクセスできるようにします。

C++でPDFファイルフォーマットを読むための手順

ステップ1：必要なヘッダーのインクルード

まず、main.cpp ファイルの先頭に必要なヘッダーファイルを追加しましょう。

#include <cstdlib>  // For system call
#include <iostream> // For basic input and output
#include <fstream>  // For file stream operations

#include <cstdlib>  // For system call
#include <iostream> // For basic input and output
#include <fstream>  // For file stream operations

C++

ステップ2：C++コードを書く

PDFドキュメントからテキストコンテンツを抽出するために、Xpdfコマンドラインツールを呼び出すC++コードを書きましょう。以下の input.pdf ファイルを使用します。

C++でPDFファイルを読み込む方法：図1

コード例は次のとおりです：

#include <cstdlib>
#include <iostream>
#include <fstream>

using namespace std;

int main() {
    // Specify the input and output file paths
    string pdfPath = "input.pdf";
    string outputFilePath = "output.txt";

    // Construct the command to run pdftotext
    string command = "pdftotext " + pdfPath + " " + outputFilePath;
    int status = system(command.c_str());

    // Check if the command executed successfully
    if (status == 0) {
        cout << "Text extraction successful." << endl;
    } else {
        cout << "Text extraction failed." << endl;
        return 1; // Exit the program with error code
    }

    // Open the output file to read the extracted text
    ifstream outputFile(outputFilePath);
    if (outputFile.is_open()) {
        string textContent;
        string line;
        while (getline(outputFile, line)) {
            textContent += line + "\n"; // Append each line to the textContent
        }
        outputFile.close();

        // Display the extracted text
        cout << "Text content extracted from PDF document:" << endl;
        cout << textContent << endl;
    } else {
        cout << "Failed to open output file." << endl;
        return 1; // Exit the program with error code
    }

    return 0; // Exit the program successfully
}

#include <cstdlib>
#include <iostream>
#include <fstream>

using namespace std;

int main() {
    // Specify the input and output file paths
    string pdfPath = "input.pdf";
    string outputFilePath = "output.txt";

    // Construct the command to run pdftotext
    string command = "pdftotext " + pdfPath + " " + outputFilePath;
    int status = system(command.c_str());

    // Check if the command executed successfully
    if (status == 0) {
        cout << "Text extraction successful." << endl;
    } else {
        cout << "Text extraction failed." << endl;
        return 1; // Exit the program with error code
    }

    // Open the output file to read the extracted text
    ifstream outputFile(outputFilePath);
    if (outputFile.is_open()) {
        string textContent;
        string line;
        while (getline(outputFile, line)) {
            textContent += line + "\n"; // Append each line to the textContent
        }
        outputFile.close();

        // Display the extracted text
        cout << "Text content extracted from PDF document:" << endl;
        cout << textContent << endl;
    } else {
        cout << "Failed to open output file." << endl;
        return 1; // Exit the program with error code
    }

    return 0; // Exit the program successfully
}

C++

コードの説明

上記のコードでは、入力PDFファイルへのパスを格納する変数としてpdfPathを定義しています。実際の入力PDFドキュメントへの適切なパスに置き換えてください。

また、Xpdfによって生成される出力テキストファイルへのパスを保持する変数outputFilePathを定義します。

このコードは、system 関数を使用して pdftotext コマンドを実行し、入力 PDF ファイルパスと出力テキストファイルパスをコマンドライン引数として渡します。 status 変数は、コマンドの終了ステータスをキャプチャします。

pdftotext が正常に実行された場合 (ステータスが 0 で示されます)、ifstream を使用して出力テキストファイルを開きます。次に、テキストの内容を1行ずつ読み込み、textContent という文字列に格納します。

最終的に、生成された出力ファイルからコンソールに抽出されたテキストコンテンツを出力します。編集可能な出力テキストファイルが不要な場合やディスクスペースを解放したい場合、プログラムの最後にmain関数を終了する前に、次のコマンドを使用してそれを削除します：

remove(outputFilePath.c_str());

remove(outputFilePath.c_str());

C++

ステップ3：プログラムのコンパイルと実行

C++コードをコンパイルし、実行可能ファイルを実行します。 pdftotext が環境変数システムパスに追加されると、そのコマンドは正常に実行されます。プログラムは出力テキストファイルを生成し、PDFドキュメントからテキストコンテンツを抽出します。抽出されたテキストはその後コンソールに表示されます。

出力は次の通りです

C++でPDFファイルを読み込む方法：図2

C#でPDFファイルを読む

IronPDFライブラリ

IronPDFは、PDFドキュメントの操作に強力な機能を提供する人気のC# PDFライブラリです。開発者がプログラムでPDFファイルを作成、編集、変更、および読み取ることを可能にします。

IronPDFライブラリを使用してPDF文書を読むことは、簡単なプロセスです。ライブラリは様々なメソッドとプロパティを提供しており、開発者がPDFページからテキスト、画像、メタデータ、その他のデータを抽出することを可能にします。抽出された情報はアプリケーション内でのさらなる処理、分析、または表示に使用できます。

次に示すコード例では、IronPDFを使ってPDFファイルを読みます：

// Import necessary namespaces
using IronPdf; // For PDF functionalities
using IronSoftware.Drawing; // For handling images
using System.Collections.Generic; // For using the List

// Example of extracting text and images from PDF using IronPDF

// Open a 128-bit encrypted PDF
var pdf = PdfDocument.FromFile("encrypted.pdf", "password");

// Get all text from the PDF
string text = pdf.ExtractAllText();

// Extract all images from the PDF
var allImages = pdf.ExtractAllImages();

// Iterate over each page to extract text and images
for (var index = 0; index < pdf.PageCount; index++) {
    int pageNumber = index + 1;
    text = pdf.ExtractTextFromPage(index);
    List<AnyBitmap> images = pdf.ExtractBitmapsFromPage(index);
    // Perform actions with text and images...
}

// Import necessary namespaces
using IronPdf; // For PDF functionalities
using IronSoftware.Drawing; // For handling images
using System.Collections.Generic; // For using the List

// Example of extracting text and images from PDF using IronPDF

// Open a 128-bit encrypted PDF
var pdf = PdfDocument.FromFile("encrypted.pdf", "password");

// Get all text from the PDF
string text = pdf.ExtractAllText();

// Extract all images from the PDF
var allImages = pdf.ExtractAllImages();

// Iterate over each page to extract text and images
for (var index = 0; index < pdf.PageCount; index++) {
    int pageNumber = index + 1;
    text = pdf.ExtractTextFromPage(index);
    List<AnyBitmap> images = pdf.ExtractBitmapsFromPage(index);
    // Perform actions with text and images...
}

' Import necessary namespaces
Imports IronPdf ' For PDF functionalities
Imports IronSoftware.Drawing ' For handling images
Imports System.Collections.Generic ' For using the List

' Example of extracting text and images from PDF using IronPDF

' Open a 128-bit encrypted PDF
Private pdf = PdfDocument.FromFile("encrypted.pdf", "password")

' Get all text from the PDF
Private text As String = pdf.ExtractAllText()

' Extract all images from the PDF
Private allImages = pdf.ExtractAllImages()

' Iterate over each page to extract text and images
For index = 0 To pdf.PageCount - 1
	Dim pageNumber As Integer = index + 1
	text = pdf.ExtractTextFromPage(index)
	Dim images As List(Of AnyBitmap) = pdf.ExtractBitmapsFromPage(index)
	' Perform actions with text and images...
Next index

$vbLabelText $csharpLabel

PDFドキュメントの読み取り方法の詳細な情報については、IronPDF C# PDFリーディングガイドをご覧ください。

結論

この記事では、Xpdfコマンドラインツールを使用してC++でPDFドキュメントの内容を読む方法を学びました。 XpdfをC++プログラムに統合することで、PDFファイルからテキストコンテンツをプログラムで数秒で抽出できます。このアプローチにより、C++アプリケーション内で抽出されたテキストを処理および分析することが可能になります。

IronPDFは、PDFファイルを読み取りおよび操作することを容易にする強力なC#ライブラリです。その広範な機能、使いやすさ、および信頼性の高いレンダリングエンジンにより、C#プロジェクトでPDFドキュメントを扱う開発者に人気のある選択肢となっています。

IronPDFは開発目的には無料で使用でき、商用利用のための無料トライアル版も提供しています。商用目的で使用する場合は、ライセンスを取得する必要があります。

カーティス・チャウ

今すぐエンジニアリングチームとチャット

テクニカルライター

Curtis Chauは、カールトン大学でコンピュータサイエンスの学士号を取得し、Node.js、TypeScript、JavaScript、およびReactに精通したフロントエンド開発を専門としています。直感的で美しいユーザーインターフェースを作成することに情熱を持ち、Curtisは現代のフレームワークを用いた開発や、構造の良い視覚的に魅力的なマニュアルの作成を楽しんでいます。

開発以外にも、CurtisはIoT（Internet of Things）への強い関心を持ち、ハードウェアとソフトウェアの統合方法を模索しています。余暇には、ゲームをしたりDiscordボットを作成したりして、技術に対する愛情と創造性を組み合わせています。

顧客ハイライト:

開発者スポットライト:

ウェビナー:

30日間無料トライアルを開始

C++でPDFファイルを読む方法

Xpdf - コマンドラインツール

前提条件

C++でPDFファイルフォーマットを読むための手順

ステップ1：必要なヘッダーのインクルード

ステップ2：C++コードを書く

コードの説明

ステップ3：プログラムのコンパイルと実行

出力は次の通りです

C#でPDFファイルを読む

IronPDFライブラリ

結論

アイアンサポートチーム

30日間無料トライアルを開始

C++でPDFファイルを読む方法

Xpdf - コマンドラインツール

前提条件

C++でPDFファイルフォーマットを読むための手順

ステップ1：必要なヘッダーのインクルード

ステップ2：C++コードを書く

コードの説明

ステップ3：プログラムのコンパイルと実行

出力は次の通りです

C#でPDFファイルを読む

IronPDFライブラリ

結論

関連する記事

2025年の最高のPDF訂正ソフトウェアを発見する

iPhone向けのベストPDFリーダー（無料＆有料ツールの比較）

Windows用のベスト無料PDFエディタ（無料＆有料ツールの比較）

次のステップ：30日間の無料トライアルを開始

Thank You

次のステップ：30日間の無料トライアルを開始

Want to deploy IronSuite to a live project for FREE?

What’s included?

世界中の数百万人のエンジニアから信頼されています。

アイアンサポートチーム