This is a memo of a script that uses PyPDF2 to divide a pdf file such as a document scanned in a spread into two left and right.
from PyPDF2 import PdfFileWriter, PdfFileReader
import copy
def split_pdf(input_pdf, output_filename):
pdf_file = PdfFileReader(open(input_pdf, 'rb'))
marged = PdfFileWriter()
page_num = pdf_file.getNumPages()
for i in range(0, page_num):
page_L = pdf_file.getPage(i)
page_R = copy.copy(page_L)
(w, h) = page_L.mediaBox.upperRight
page_L.mediaBox.upperRight = (w/2, h)
marged.addPage(page_L)
page_R.mediaBox.upperLeft = (w/2, h)
marged.addPage(page_R)
outputStream = open(output_filename, "wb")
marged.write(outputStream)
outputStream.close()
def sample():
input_pdf = "input.pdf"
output_filename = "output.pdf"
split_pdf(input_pdf, output_filename)
if __name__ == "__main__":
sample()
Divide the pdf file into two by giving the pdf file to be split and the file name for output to split_pdf ().
Prepare two pages, one for the left page and one for the right page. (Page_L, page_R)
Since the page starts from the lower left, the width and height are obtained from the coordinates on the upper right. (w, h)
By specifying the coordinates on the right side for the left page and the coordinates on the left side for the right page, the coordinates are cut based on that.
This time, the upper right coordinate is specified as (w / 2, h) on the left page, and the upper left coordinate is specified as (w / 2, h) on the right page, and it is divided into two parts on the left and right.
It may be easier to understand if you set page_R.mediaBox.lowerLeft = (w / 2, 0)
for the right page.
I use it as follows. Write only the additions and changes.
import sys, os
def main():
argv = [sys.argv[i] for i in range(1,len(sys.argv))]
for input_pdf in argv:
basename_without_ext = os.path.splitext(os.path.basename(input_pdf))[0]
output_filename = "split_{}.pdf".format(basename_without_ext)
print("{}Is divided into two.".format(os.path.basename(input_pdf)))
split_pdf(input_pdf, output_filename)
if __name__ == '__main__':
main()
Get the argument, turn it with a for statement and pass it to split_pdf. The output file name will be the input file name with split_ at the beginning.
Split the PDF of the scanned center-bound material into pages (Qiita: @hrb23m)
[How to divide a spread scan PDF with misalignment into two pages (Python, pypdf) (Technical memo collection)](https://www.robotech-note.com/entry/2018/04/28/ Misalignment of center Split the spread scan PDF into two pages)
Recommended Posts