Test in production without watermarks.
Works wherever you need it to.
Get 30 days of fully functional product.
Have it up and running in minutes.
Full access to our support engineering team during your product trial
This article will explore how to view PDF files in Python using the IronPDF library.
IronPDF is a powerful Python library that enables developers to work with PDF files programmatically. With IronPDF, you can easily generate, manipulate, and extract data from PDF documents, making it a versatile tool for various PDF-related tasks. Whether you need to create PDFs from scratch, modify existing PDFs, or extract content from PDFs, IronPDF provides a comprehensive set of features to simplify your workflow.
Some features of the IronPDF for Python library include:
Note: IronPDF produces a watermarked PDF data file. To remove the watermark, you need to license IronPDF. If you wish to use a licensed version of IronPDF, visit the IronPDF website to obtain a license key.
Before working with IronPDF in Python, there are a few prerequisites:
IronPDF Library: Install the IronPDF library to access its functionality. You can install it using the Python package manager (pip) by executing the following command in your command-line interface:
pip install ironpdf
pip install ironpdf
Tkinter Library: Tkinter is the standard GUI toolkit for Python. It is used for creating the graphical user interface for the PDF viewer in the provided code snippet. Tkinter usually comes pre-installed with Python, but if you encounter any issues, you can install it using the package manager:
pip install tkinter
pip install tkinter
Pillow Library: The Pillow library is a fork of the Python Imaging Library (PIL) and provides additional image processing capabilities. It is used in the code snippet to load and display the images extracted from the PDF. Install Pillow using the package manager:
pip install pillow
pip install pillow
After installing PyCharm IDE, create a PyCharm Python project by following the steps below:
Create a New Project: Click on "Create New Project" or open an existing Python project.
PyCharm IDE
Configure Project Settings: Provide a name for your project and choose the location to create the project directory. Select the Python interpreter for your project. Then click "Create".
Create a new Python project
To begin, import the necessary libraries. In this case, the os
, shutil
, ironpdf
, tkinter
, and PIL
libraries will be needed. The os
and shutil
libraries are used for file and folder operations, ironpdf
is the library for working with PDF files, tkinter
is used for creating the graphical user interface (GUI), and PIL is used for image manipulation.
import os
import shutil
import ironpdf
from tkinter import *
from PIL import Image, ImageTk
import os
import shutil
import ironpdf
from tkinter import *
from PIL import Image, ImageTk
Next, define a function called convert_pdf_to_images
. This function takes the path of the PDF file as input. Inside the function, the IronPDF library is used to load the PDF document from the file. Then, specify a folder path to store the extracted image files. IronPDF's pdf.RasterizeToImageFiles
method is used to convert each PDF page of the PDF to an image file and save it in the specified folder. A list is used to store the image paths. The complete code example is as follows:
def convert_pdf_to_images(pdf_file):
"""Convert each page of a PDF file to an image."""
pdf = ironpdf.PdfDocument.FromFile(pdf_file)
# Extract all pages to a folder as image files
folder_path = "images"
pdf.RasterizeToImageFiles(os.path.join(folder_path, "*.png"))
# List to store the image paths
image_paths = []
# Get the list of image files in the folder
for filename in os.listdir(folder_path):
if filename.lower().endswith((".png", ".jpg", ".jpeg", ".gif")):
image_paths.append(os.path.join(folder_path, filename))
return image_paths
def convert_pdf_to_images(pdf_file):
"""Convert each page of a PDF file to an image."""
pdf = ironpdf.PdfDocument.FromFile(pdf_file)
# Extract all pages to a folder as image files
folder_path = "images"
pdf.RasterizeToImageFiles(os.path.join(folder_path, "*.png"))
# List to store the image paths
image_paths = []
# Get the list of image files in the folder
for filename in os.listdir(folder_path):
if filename.lower().endswith((".png", ".jpg", ".jpeg", ".gif")):
image_paths.append(os.path.join(folder_path, filename))
return image_paths
To extract text from PDF documents, visit this code examples page.
To clean up the extracted image files when the application window is closed, define an on_closing
function. Inside this function, use the shutil.rmtree()
method to delete the entire images
folder. Next, set this function as the protocol to be executed when the window is closed. The following code helps achieve the task:
def on_closing():
"""Handle the window closing event by cleaning up the images."""
# Delete the images in the 'images' folder
shutil.rmtree("images")
window.destroy()
window.protocol("WM_DELETE_WINDOW", on_closing)
def on_closing():
"""Handle the window closing event by cleaning up the images."""
# Delete the images in the 'images' folder
shutil.rmtree("images")
window.destroy()
window.protocol("WM_DELETE_WINDOW", on_closing)
Now, let's create the main GUI window using the Tk()
constructor, set the window title to "Image Viewer," and set the on_closing()
function as the protocol to handle window closure.
window = Tk()
window.title("Image Viewer")
window.protocol("WM_DELETE_WINDOW", on_closing)
window = Tk()
window.title("Image Viewer")
window.protocol("WM_DELETE_WINDOW", on_closing)
To display the images and enable scrolling, create a Canvas
widget. The Canvas
widget is configured to fill the available space and expand in both directions using pack(side=LEFT, fill=BOTH, expand=True)
. Additionally, create a Scrollbar
widget and configure it to control the vertical scrolling of all the pages and canvas.
canvas = Canvas(window)
canvas.pack(side=LEFT, fill=BOTH, expand=True)
scrollbar = Scrollbar(window, command=canvas.yview)
scrollbar.pack(side=RIGHT, fill=Y)
canvas.configure(yscrollcommand=scrollbar.set)
# Update the scrollregion to encompass the entire canvas
canvas.bind("<Configure>", lambda e: canvas.configure(
scrollregion=canvas.bbox("all")))
# Configure the vertical scrolling using mouse wheel
canvas.bind_all("<MouseWheel>", lambda e: canvas.yview_scroll(
int(-1*(e.delta/120)), "units"))
canvas = Canvas(window)
canvas.pack(side=LEFT, fill=BOTH, expand=True)
scrollbar = Scrollbar(window, command=canvas.yview)
scrollbar.pack(side=RIGHT, fill=Y)
canvas.configure(yscrollcommand=scrollbar.set)
# Update the scrollregion to encompass the entire canvas
canvas.bind("<Configure>", lambda e: canvas.configure(
scrollregion=canvas.bbox("all")))
# Configure the vertical scrolling using mouse wheel
canvas.bind_all("<MouseWheel>", lambda e: canvas.yview_scroll(
int(-1*(e.delta/120)), "units"))
Next, create a Frame
widget inside the canvas to hold the images by using create_window()
to place the frame within the canvas. The (0, 0)
coordinates and anchor='nw'
parameter ensure that the frame starts in the top left corner of the canvas.
frame = Frame(canvas)
canvas.create_window((0, 0), window=frame, anchor="nw")
frame = Frame(canvas)
canvas.create_window((0, 0), window=frame, anchor="nw")
The next step is to call the convert_pdf_to_images()
function with the file path name of the input PDF file. This function extracts the PDF pages as images and returns a list of image paths. By iterating through the image paths and loading each image using the Image.open()
method from the PIL library, a PhotoImage
object is created using ImageTk.PhotoImage()
. Then create a Label
widget to display the image.
images = convert_pdf_to_images("input.pdf")
# Load and display the images in the Frame
for image_path in images:
image = Image.open(image_path)
photo = ImageTk.PhotoImage(image)
label = Label(frame, image=photo)
label.image = photo # Store a reference to prevent garbage collection
label.pack(pady=10)
images = convert_pdf_to_images("input.pdf")
# Load and display the images in the Frame
for image_path in images:
image = Image.open(image_path)
photo = ImageTk.PhotoImage(image)
label = Label(frame, image=photo)
label.image = photo # Store a reference to prevent garbage collection
label.pack(pady=10)
The input file
Finally, let's run the main event loop using window.mainloop()
. This ensures that the GUI window remains open and responsive until it is closed by the user.
window.mainloop()
window.mainloop()
The UI output
This tutorial explored how to view PDF documents in Python using the IronPDF library. It covered the steps required to open a PDF file and convert it to a series of image files, then display them in a scrollable canvas, and handle the cleanup of extracted images when the application is closed.
For more details on the IronPDF for Python library, please refer to the documentation.
Download and install IronPDF for Python library and also get a free trial to test out its complete functionality in commercial development.
IronPDF is a powerful Python library that allows developers to programmatically work with PDF files. It offers features such as generating, manipulating, and extracting data from PDF documents.
You can install IronPDF using the Python package manager pip by executing the command: `pip install ironpdf`.
The prerequisites include having Python installed (version 3.x), installing the IronPDF library, along with the Tkinter and Pillow libraries. An Integrated Development Environment (IDE) like PyCharm is also recommended.
IronPDF allows you to secure PDF files with passwords and restrictions, enhancing document security.
Tkinter is used to create the graphical user interface (GUI) for the PDF viewer application, enabling the display of PDF pages as images in a scrollable window.
Yes, IronPDF can convert PDF files to other formats, providing versatility in handling PDF documents.
To remove the watermark from PDFs, you need to obtain a license for IronPDF. You can visit the IronPDF website to acquire a license key.
The Pillow library, a fork of the Python Imaging Library (PIL), is used for additional image processing capabilities, including loading and displaying images extracted from PDF files.
Launch PyCharm, create a new project by clicking 'Create New Project', configure the project settings by providing a name and selecting a Python interpreter, then start developing your IronPDF project.
The 'convert_pdf_to_images' function is used to convert each page of a PDF file into an image, which can then be displayed in a GUI application.