Implementing PDF-to-Image Conversion in Python 2.7

xiaoxiao, 2025-07-09

Contents

Environment Setup
  Mac
  Linux
  Windows
Implementation
Caveats

Environment Setup

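Converting PDF to images requires two Python packages, PyPDF2 (1.26.0) and Wand (0.4.4). Wand in turn needs the ImageMagick software installed first; because ImageMagick 7.x changed its interface, it must be a 6.x release. Ghostscript is also required, since ImageMagick relies on it to render PDF files. PyPDF2 and Wand install directly with pip; the two native programs and their environment configuration are what differ between operating systems. (The page-merging step in the code below additionally uses PIL.)

pip install PyPDF2==1.26.0 Wand==0.4.4

I develop on a Mac, while production runs on both Linux and Windows, so I had to set up all three systems, which took a fair amount of trial and error. The installation steps are summarized below.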

Mac

Install ImageMagick 6:

brew install imagemagick@6

After installing, run convert --version to check it; the reported version should be 6.x.
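To make the 6.x requirement checkable from Python as well, here is a minimal sketch that parses the text printed by convert --version. It assumes the version line looks like "Version: ImageMagick 6.9.9-49 Q16 ..."; imagemagick_major_version and check_imagemagick_6 are hypothetical helpers, not part of Wand.

```python
import re


def imagemagick_major_version(version_output):
    """Return the ImageMagick major version found in `convert --version` text.

    Assumes a line such as 'Version: ImageMagick 6.9.9-49 Q16 ...';
    returns None when no version string is present.
    """
    match = re.search(r'ImageMagick (\d+)\.(\d+)\.(\d+)', version_output)
    if match is None:
        return None
    return int(match.group(1))


def check_imagemagick_6(version_output):
    """Raise RuntimeError unless the reported ImageMagick is a 6.x release."""
    major = imagemagick_major_version(version_output)
    if major != 6:
        raise RuntimeError('expected ImageMagick 6.x, got major version %r' % (major,))
    return major
```

For example, the text could come from subprocess.check_output(['convert', '--version']).decode('utf-8'), checked once at startup before Wand is imported.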

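Create a symlink so that Wand can find the MagickWand library (replace 6.9.9-49 with the version Homebrew actually installed):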

$ ln -s /usr/local/Cellar/imagemagick@6/6.9.9-49/lib/libMagickWand-6.Q16.dylib /usr/local/lib/libMagickWand.dylib
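Wand loads the MagickWand shared library through ctypes; the symlink gives that library a generic name, libMagickWand.dylib, in /usr/local/lib, which is among the directories macOS searches by default. The sketch below uses ctypes.util.find_library to see which MagickWand library would be found. The candidate name list and the find_magickwand helper are assumptions for illustration, not Wand's exact lookup order.

```python
import ctypes.util

# Candidate library names, most specific first (an illustrative assumption,
# not the exact list Wand tries).
CANDIDATE_NAMES = ('MagickWand-6.Q16', 'MagickWand-6.Q16HDRI', 'MagickWand')


def find_magickwand(names=CANDIDATE_NAMES):
    """Return the first MagickWand library path ctypes can locate, else None."""
    for name in names:
        path = ctypes.util.find_library(name)
        if path:
            return path
    return None
```

After creating the symlink, find_magickwand() should return a path rather than None; a None result suggests the library is still not on the search path.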

Add ImageMagick 6 to PATH:

echo 'export PATH="/usr/local/opt/imagemagick@6/bin:$PATH"' >> ~/.bash_profile
. ~/.bash_profile

Install Ghostscript:

brew install gs

Linux

Install ImageMagick and Ghostscript:

yum install ImageMagick

Make sure a 6.x version is installed; Ghostscript is pulled in automatically as a dependency.

Windows

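Download, install, and configure ImageMagick 6.x (Wand's documentation recommends setting the MAGICK_HOME environment variable to the ImageMagick installation directory), then download and install Ghostscript.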

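On Windows with Wand 0.4.4, importing Wand turns the PATH environment variable that Python reads into a unicode object. Launching a webdriver afterwards then fails with "TypeError: environment can only contain strings". The workaround is to convert PATH back to str right after importing Wand: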

import os
import wand

# Importing Wand turns PATH into unicode on Windows; convert it back to str
os.environ['path'] = str(os.environ['path'])
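Note that str() in Python 2 encodes with the ASCII codec, so it raises UnicodeEncodeError if PATH contains non-ASCII characters (for example a directory with a Chinese name). Below is a more defensive sketch, assuming the filesystem encoding (mbcs on Windows under Python 2) is the right one for environment values. to_native_str and native_str_env are hypothetical helper names; the code runs on both Python 2 and 3.

```python
import os
import sys


def to_native_str(value, encoding=None):
    """Convert value to the interpreter's native str type.

    Python 2: unicode -> str (bytes). Python 3: bytes -> str (text).
    """
    if isinstance(value, str):
        return value  # already the native type
    encoding = encoding or sys.getfilesystemencoding() or 'utf-8'
    if sys.version_info[0] == 2:
        return value.encode(encoding)  # unicode -> native byte string
    return value.decode(encoding)  # bytes -> native text string


def native_str_env(env=None):
    """Return a copy of env (default: os.environ) with native-str keys and values."""
    env = os.environ if env is None else env
    return dict((to_native_str(k), to_native_str(v)) for k, v in env.items())
```

After importing Wand, os.environ['path'] = to_native_str(os.environ['path']) replaces the str() call; alternatively, pass env=native_str_env() to subprocess.Popen when starting the child process yourself.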

Implementation

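Since a PDF may have several pages, each page is rendered to its own image; if needed, PIL can then stitch those images together. In the code below, pdf_to_img renders every page and, for multi-page PDFs, _merge_img stacks the page images vertically into a single file.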

import io
import os

from wand.image import Image
from wand.color import Color
from PyPDF2 import PdfFileReader, PdfFileWriter
from PIL import Image as PIL_Image


def pdf_to_img(pdf_path, resolution=200, img_suffix='jpeg'):
    """
    Convert a PDF to an image
    :param pdf_path: path of the PDF file
    :param resolution: rendering resolution (DPI)
    :param img_suffix: image file extension / format
    :return: path of the resulting image ('' if no image was generated)
    """
    pdf_file = PdfFileReader(pdf_path, strict=False)
    pages = pdf_file.getNumPages()
    base, _ = os.path.splitext(pdf_path)  # handles '.pdf' and '.PDF' alike
    img_list = []
    for page in range(pages):
        # Copy a single page into an in-memory PDF so Wand renders one page at a time
        page_obj = pdf_file.getPage(page)
        dst_pdf = PdfFileWriter()
        dst_pdf.addPage(page_obj)
        pdf_bytes = io.BytesIO()
        dst_pdf.write(pdf_bytes)
        pdf_bytes.seek(0)

        img = Image(file=pdf_bytes, resolution=resolution)
        img.format = img_suffix
        img.compression_quality = 90  # image compression quality
        img.background_color = Color('white')
        # First page -> <name>.<suffix>, later pages -> <name>_<n>.<suffix>
        if page > 0:
            img_path = '{}_{}.{}'.format(base, page, img_suffix)
        else:
            img_path = '{}.{}'.format(base, img_suffix)
        img.save(filename=img_path)
        img.destroy()
        img_list.append(img_path)

    if len(img_list) > 1:  # several pages: stack them vertically
        return _merge_img(img_list)
    elif len(img_list) == 0:  # abnormal case: no image generated
        return ''
    else:
        return img_list[0]


def _merge_img(img_list):
    """Stack page images vertically, overwriting the first page's image file."""
    if not img_list:
        return ''
    img_name = img_list[0]
    color_mode = 'RGBA' if img_name.endswith('.png') else 'RGB'  # JPEG does not support RGBA
    images = [PIL_Image.open(path) for path in img_list]
    # Pages may differ in size: use the widest page and the sum of all heights
    total_width = max(im.size[0] for im in images)
    total_height = sum(im.size[1] for im in images)
    target = PIL_Image.new(color_mode, (total_width, total_height), 'white')  # canvas for the merged image
    top = 0
    for im in images:
        target.paste(im, (0, top))  # 2-tuple box: upper-left corner only
        top += im.size[1]
    target.save(img_name, quality=100)
    return img_name
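The resolution argument is a DPI value that sets Wand's rendering density. Because PDF page sizes are measured in points (1 pt = 1/72 inch), the pixel size of each rendered page can be estimated up front. The hypothetical helper below does that arithmetic, which helps in judging how large the vertically merged image will get.

```python
def rendered_size(width_pt, height_pt, dpi):
    """Estimate the pixel size of a PDF page rendered at `dpi`.

    pixels = points / 72 * dpi, rounded to the nearest integer.
    """
    scale = dpi / 72.0  # float division on Python 2 as well
    return int(round(width_pt * scale)), int(round(height_pt * scale))
```

For instance, a US Letter page (612 × 792 pt) at the default resolution=200 comes out at about 1700 × 2200 px, so a 10-page document merges into roughly 1700 × 22000 px. JPEG limits each dimension to 65535 px, so 30 or more such pages would be too tall to save as a single JPEG. Actual output may differ slightly depending on which page box is rendered.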

Caveats

Reading the decryption code in PyPDF2's source reveals the following:

encrypt = self.trailer['/Encrypt'].getObject()
if encrypt['/Filter'] != '/Standard':
    raise NotImplementedError("only Standard PDF encryption handler is available")
if not (encrypt['/V'] in (1, 2)):
    raise NotImplementedError("only algorithm code 1 and 2 are supported")

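PyPDF2 can open a password-protected PDF if you supply the password, but only when the file uses the Standard security handler with encryption algorithm code 1 or 2. If you need to process encrypted PDFs, first test whether PyPDF2 supports the algorithm your files use.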
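The check quoted above can be mirrored as a small predicate for screening files. In this sketch, pypdf2_can_decrypt is a hypothetical helper, and encrypt_dict is a plain dict standing in for the resolved trailer['/Encrypt'] object; it simply restates the two conditions from the PyPDF2 source.

```python
def pypdf2_can_decrypt(encrypt_dict):
    """Return True if PyPDF2 (per the quoted check) would attempt decryption.

    `encrypt_dict` stands in for the PDF's /Encrypt dictionary.
    """
    if encrypt_dict.get('/Filter') != '/Standard':
        return False  # only the Standard security handler is implemented
    return encrypt_dict.get('/V') in (1, 2)  # only algorithm codes 1 and 2
```

Newer PDFs frequently use /V 4 or 5 (the AES-capable algorithms), which this check rejects, so the test recommended above matters in practice.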

When reposting, please credit the original: https://www.6miu.com/read-5032811.html
