PDF 工具

如何使用 Puppeteer 在 Node.js 中将 HTML 转换为 PDF

已更新:七月 28, 2025

In today's digital world, it is crucial to have the ability to convert web pages or HTML documents into PDF files. This can be useful for generating reports, creating invoices, or simply sharing information in a more presentable format. In this blog post, we will explore how to convert HTML pages to PDF using Node.js and Puppeteer, an open-source library developed by Google.

Introduction to Puppeteer

Puppeteer is a powerful Node.js library that allows developers to control headless browsers, mainly Google Chrome or Chromium, and perform various actions like web scraping, taking screenshots, and generating PDFs. Puppeteer provides an extensive API to interact with the browser, making it an excellent choice for converting HTML to PDF.

Why Puppeteer?

Ease of use: Puppeteer offers a simple and easy-to-use API that abstracts away the complexities of working with headless browsers.
Powerful: Puppeteer provides extensive capabilities for manipulating web pages and interacting with browser elements.
Scalable: With Puppeteer, you can easily scale your PDF generation process by running multiple browser instances in parallel.

Setting Up Your NodeJS Project

Before we begin, you'll need to set up a new NodeJS project. Follow these steps to get started:

Install NodeJS if you haven't already (you can download it from the NodeJS website).
Create a new folder for your project and open it in Visual Studio Code or any specific code editor.
Run npm init to create a new package.json file for your project. Follow the prompts and fill in the required information.
Install Puppeteer by running npm install puppeteer.

Now that we have our project set up, let's dive into the code.

Loading HTML Template and Converting to PDF File

To convert HTML templates to a PDF file using Puppeteer, follow these steps:

Create a file named "HTML To PDF.js" in the folder.

Importing Puppeteer and fs

const puppeteer = require('puppeteer');
const fs = require('fs');

The code starts by importing two essential libraries: puppeteer, a versatile tool for controlling headless browsers like Chrome and Chromium, and fs, a built-in NodeJS module for handling file system operations. Puppeteer enables you to automate a wide range of web-based tasks, including rendering HTML, capturing screenshots, and generating PDF files.

Defining the exportWebsiteAsPdf Function

async function exportWebsiteAsPdf(html, outputPath) {
  // Create a browser instance
  const browser = await puppeteer.launch({
    headless: true // Launches the browser in headless mode
  });

  // Create a new page
  const page = await browser.newPage();

  // Set the HTML content for the page, waiting for DOM content to load
  await page.setContent(html, { waitUntil: 'domcontentloaded' });

  // To reflect CSS used for screens instead of print
  await page.emulateMediaType('screen');

  // Download the PDF
  const PDF = await page.pdf({
    path: outputPath,
    margin: { top: '100px', right: '50px', bottom: '100px', left: '50px' },
    printBackground: true,
    format: 'A4',
  });

  // Close the browser instance
  await browser.close();

  return PDF;
}

The exportWebsiteAsPdf function serves as the core of our code snippet. This asynchronous function accepts an html string and an outputPath as input parameters and returns a PDF file. The function performs the following steps:

Launches a new headless browser instance using Puppeteer.
Creates a new browser page.
Sets the provided html string as the page content, waiting for the DOM content to load.
Emulates the 'screen' media type to apply the CSS used for screens instead of print-specific styles.
Generates a PDF file from the loaded HTML content, specifying margins, background printing, and format (A4).
Closes the browser instance.
Returns the created PDF file.

Using the exportWebsiteAsPdf Function

// Usage example
// Get HTML content from HTML file
const html = fs.readFileSync('test.html', 'utf-8');

// Convert the HTML content into a PDF and save it to the specified path
exportWebsiteAsPdf(html, 'result.pdf').then(() => {
  console.log('PDF created successfully.');
}).catch((error) => {
  console.error('Error creating PDF:', error);
});

The last section of the code illustrates how to use the exportWebsiteAsPdf function. We perform the following steps:

Read the HTML content from an HTML file using the fs module's readFileSync method.
Call the exportWebsiteAsPdf function with the loaded html string and the desired outputPath.
Utilize a .then block to handle the successful PDF creation, logging a success message to the console.
Employ a .catch block to manage any errors that occur during the HTML to PDF conversion process, logging an error message to the console.

This code snippet provides a comprehensive example of how to convert an HTML template to a PDF file using NodeJS and Puppeteer. By implementing this solution, you can efficiently generate high-quality PDFs, meeting the needs of various applications and users.

How to Convert HTML to PDF in Node.js: Figure 3

Converting URLs to PDF Files

In addition to converting HTML templates, Puppeteer also allows you to convert URLs directly into PDF files.

Importing Puppeteer

const puppeteer = require('puppeteer');

The code starts by importing the Puppeteer library, which is a powerful tool for controlling headless browsers like Chrome and Chromium. Puppeteer allows you to automate a variety of web-based tasks, including rendering your HTML code, capturing screenshots, and in our case, generating PDF files.

Defining the exportWebsiteAsPdf Function

async function exportWebsiteAsPdf(websiteUrl, outputPath) {
  // Create a browser instance
  const browser = await puppeteer.launch({
    headless: true // Launches the browser in headless mode
  });

  // Create a new page
  const page = await browser.newPage();

  // Open the URL in the current page
  await page.goto(websiteUrl, { waitUntil: 'networkidle0' });

  // To reflect CSS used for screens instead of print
  await page.emulateMediaType('screen');

  // Download the PDF
  const PDF = await page.pdf({
    path: outputPath,
    margin: { top: '100px', right: '50px', bottom: '100px', left: '50px' },
    printBackground: true,
    format: 'A4',
  });

  // Close the browser instance
  await browser.close();

  return PDF;
}

The exportWebsiteAsPdf function is the core of our code snippet. This asynchronous function accepts a websiteUrl and an outputPath as its input parameters and returns a PDF file. The function performs the following steps:

Launches a new headless browser instance using Puppeteer.
Creates a new browser page.
Navigates to the provided websiteUrl and waits for the network to become idle using the waitUntil option set to networkidle0.
Emulates the 'screen' media type to ensure the CSS used for screens is applied instead of print-specific styles.
Converts the loaded web page to a PDF file with the specified margins, background printing, and format (A4).
Closes the browser instance.
Returns the generated PDF file.

Using the exportWebsiteAsPdf Function

// Usage example
// Convert the URL content into a PDF and save it to the specified path
exportWebsiteAsPdf('https://ironpdf.com/', 'result.pdf').then(() => {
  console.log('PDF created successfully.');
}).catch((error) => {
  console.error('Error creating PDF:', error);
});

The final section of the code demonstrates how to use the exportWebsiteAsPdf function. We execute the following steps:

Call the exportWebsiteAsPdf function with the desired websiteUrl and outputPath.
Use a then block to handle the successful PDF creation. In this block, we log a success message to the console.
Use a catch block to handle any errors that occur during the website to PDF conversion process. If an error occurs, we log an error message to the console.

By integrating this code snippet into your projects, you can effortlessly convert URLs into high-quality PDF files using NodeJS and Puppeteer.

How to Convert HTML to PDF in Node.js: Figure 4

Best HTML To PDF Library for C# Developers

Explore IronPDF is a popular .NET library used for generating, editing, and extracting content from PDF files. It provides a simple and efficient solution for creating PDFs from HTML, text, images, and existing PDF documents. IronPDF supports .NET Core, .NET Framework, and .NET 5.0+ projects, making it a versatile choice for various applications.

IronPDF 的主要功能

HTML to PDF Conversion with IronPDF: IronPDF allows you to convert HTML content, including CSS, to PDF files. This feature enables you to create pixel-perfect PDF documents from web pages or HTML templates.

URL Rendering: IronPDF can fetch web pages directly from a server using a URL and convert them to PDF files, making it easy to archive web content or generate reports from dynamic web pages.

Text, Image, and PDF Merging: IronPDF allows you to merge text, images, and existing PDF files into a single PDF document. This feature is particularly useful for creating complex documents with multiple sources of content.

PDF Manipulation: IronPDF provides tools for editing existing PDF files, such as adding or removing pages, modifying metadata, or even extracting text and images from PDF documents.

结论

In conclusion, generating and manipulating PDF files is a common requirement in many applications, and having the right tools at your disposal is crucial. The solutions provided in this article, such as using Puppeteer with NodeJS or IronPDF with .NET, offer powerful and efficient methods for converting HTML content and URLs into professional, high-quality PDF documents.

IronPDF, in particular, stands out with its extensive feature set, making it a top choice for .NET developers. IronPDF offers a free trial allowing you to explore its capabilities.

Users can also benefit from the Iron Suite package, a suite of five professional .NET libraries including IronXL, IronPDF, IronOCR and more.