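Python: batch-extract page n from every PDF in a folder and save it
Without further ado, here is the code. The example extracts the first page; change the page index to pick a different one.
First, a walk-through of the main parts.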
reader = PyPDF2.PdfReader(file)
Wraps the opened file in a PdfReader object so that its built-in methods and attributes can be used.
first_page = reader.pages[0]
Takes the first page; page indices start at 0.
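The title talks about page n, while `reader.pages[0]` hard-codes the first page. A minimal sketch of how the lookup could be generalised, assuming n is a 1-based page number. `pick_page` is a hypothetical helper for illustration, not part of PyPDF2; it only relies on `len()` and indexing, which `reader.pages` supports, so a plain list stands in for it here.

```python
def pick_page(pages, n):
    """Return the n-th page (1-based) from a sequence of pages.

    Hypothetical helper for illustration; `pages` can be `reader.pages`
    from PyPDF2 or any other indexable sequence.
    """
    total = len(pages)
    if not 1 <= n <= total:
        raise ValueError(f"page {n} is out of range, document has {total} page(s)")
    # Sequences are zero-based, so page n lives at index n - 1
    return pages[n - 1]


# Demonstration with a plain list standing in for reader.pages
demo_pages = ["cover", "abstract", "introduction"]
print(pick_page(demo_pages, 2))  # prints "abstract"
```

With a real PDF, `pick_page(reader.pages, 1)` would take the place of `reader.pages[0]`, and a too-short document raises a clear error instead of an IndexError.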
writer = PyPDF2.PdfWriter()
writer.add_page(first_page)
writer.write(output_file)
Creates a PdfWriter, adds the extracted page to it, and writes the result to the output file.
def process_folder(folder_path):
    # Iterate over every file in the folder
    for filename in os.listdir(folder_path):
        if filename.endswith('.pdf'):
            pdf_path = os.path.join(folder_path, filename)
            print(pdf_path)
            # Raw string so the backslashes in the Windows path are taken literally
            output_path = os.path.join(r'D:\data\pdf_output', filename[0:-4] + '(first page)' + '.pdf')
            # Extract the first page and save it under the original name plus a suffix
            extract_first_page(pdf_path, output_path)
            print(f"Processed {filename}")
Goes through all the PDF files in a folder, calls the function above to extract the first page of each one, and writes it to the output folder.
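The folder scan and the output naming can also be written with `pathlib`, which avoids slicing the extension off by hand. This is only a sketch under stated assumptions: `iter_pdfs` and `build_output_path` are hypothetical helper names, the `(first page)` tag is an illustrative default, and matching the extension case-insensitively goes beyond the original `endswith('.pdf')` check.

```python
from pathlib import Path


def iter_pdfs(folder):
    """Yield the PDF files directly inside `folder`, sorted by name."""
    for path in sorted(Path(folder).iterdir()):
        # Compare the lower-cased suffix so 'A.PDF' is matched too
        # (an assumption; the original check is case-sensitive)
        if path.is_file() and path.suffix.lower() == ".pdf":
            yield path


def build_output_path(pdf_path, output_dir, tag="(first page)"):
    """Return output_dir/<stem><tag>.pdf for the given input file."""
    pdf_path = Path(pdf_path)
    # .stem is the file name without its extension
    return Path(output_dir) / f"{pdf_path.stem}{tag}.pdf"


print(build_output_path("paper.pdf", "out").name)  # prints "paper(first page).pdf"
```

A loop such as `for pdf in iter_pdfs(folder_path): extract_first_page(pdf, build_output_path(pdf, output_dir))` would then play the role of `process_folder`, with `output_dir` chosen by the caller.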
Complete code:
import os
import PyPDF2


def extract_first_page(pdf_path, output_path):
    # Open the PDF file
    with open(pdf_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        # Get the first page (indices start at 0)
        first_page = reader.pages[0]
        # Write the new PDF while the source file is still open
        with open(output_path, 'wb') as output_file:
            writer = PyPDF2.PdfWriter()
            writer.add_page(first_page)
            writer.write(output_file)


def process_folder(folder_path):
    # Iterate over every file in the folder
    for filename in os.listdir(folder_path):
        if filename.endswith('.pdf'):
            pdf_path = os.path.join(folder_path, filename)
            print(pdf_path)
            # Raw string so the backslashes are taken literally; the output folder must already exist
            output_path = os.path.join(r'D:\data\pdf_output', filename[0:-4] + '(first page)' + '.pdf')
            # Extract the first page and save it under the original name plus a suffix
            extract_first_page(pdf_path, output_path)
            print(f"Processed {filename}")


# Set this to your own folder path
folder_path = r'D:\data\pdf'
process_folder(folder_path)