This continues my earlier post on splitting a PDF file page by page (https://qiita.com/nabe3ch/items/e1638544546ab1f4b18f).
A colleague's boss had told them to split a PDF of daily statistics (compiled every month) into one file per day, two pages per day. They were doing it by opening the PDF in Chrome and saving it two pages (one date) at a time. I felt sorry for them, so I wrote a Python program that splits a PDF into chunks of any number of pages, and went to lunch in the free time the automation created.
At the time, I was asked, "Can the file names be dates too?" Honestly, I wasn't keen. I asked, "Isn't it fine to just stop splitting the files in the first place?", but the answer was, "We've always done it this way, so we can't change it now...". I doubt there is much demand for a program that frees a company from an ever more niche chore like this, but I'll leave it here as a memo.
Suppose there is a file called "test.pdf" that contains data from 2020/01/01 to 2020/02/29. Since each day is a set of 2 pages, that makes 120 pages in total. We split it into two-page files and save them under names in my colleague's format, such as "January 01, 2020.pdf" and "January 02, 2020.pdf".
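The page count above is simple arithmetic: 31 days in January plus 29 in February (2020 is a leap year) is 60 days, and 60 sets of 2 pages is 120 pages. A quick check with the standard datetime module, assuming the start and end dates are both included:

```python
from datetime import date

# Inclusive date range covered by the example test.pdf
first_day = date(2020, 1, 1)
last_day = date(2020, 2, 29)
pages_per_day = 2

# Subtracting two dates gives a timedelta; +1 makes the range inclusive
num_days = (last_day - first_day).days + 1
total_pages = num_days * pages_per_day
print(num_days, total_pages)  # 60 120
```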
pdf_separate.py
```python
# Written against the PyPDF2 1.x API (PdfFileReader, PdfFileMerger); later releases deprecated these names
import PyPDF2
from datetime import datetime
from datetime import timedelta

f = 'test.pdf' #PDF file you want to split
page_sep = 2 #Number of pages per output file
start_day = '2020/01/01' #Date of the first data set

#Get the number of pages in the PDF
reader = PyPDF2.PdfFileReader(f)
page_num = reader.getNumPages()
#Convert the first date to a datetime
start_datetime = datetime.strptime(start_day, '%Y/%m/%d')
#Start the count of processed files at zero
count = 0
#Loop over start pages from 0 (the 1st page) up to, but not including, page_num, in steps of page_sep (2)
for page in range(0, page_num, page_sep):
    merger = PyPDF2.PdfFileMerger()
    #First page to cut out
    start = page
    #End of the range to cut out (exclusive)
    end = start + page_sep
    #Cut out pages start .. end-1
    merger.append(f, pages=(start, end))
    #File date = start date + number of files processed so far
    file_date = start_datetime + timedelta(days=count)
    #Format the date as "January 01, 2020" (%B = full month name in the default C/English locale)
    file_date_format = file_date.strftime('%B %d, %Y')
    #Build the file name and write the file
    file_name = file_date_format + '.pdf'
    merger.write(file_name)
    merger.close()
    #Increase the count of processed files by 1
    count += 1
print("Let's go drink tonight")
```
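The heart of the loop is range(0, page_num, page_sep), which yields the first page of each chunk. The small generator below isolates that part so it can be checked without a PDF or PyPDF2. The name page_ranges is hypothetical, and the min() clamp is my own addition: the script above assumes the page count is an exact multiple of page_sep, and with an odd count the last pages=(start, end) would point one page past the end.

```python
def page_ranges(page_num, page_sep):
    """Yield (start, end) pairs with an exclusive end, like pages=(start, end)."""
    for start in range(0, page_num, page_sep):
        # min() keeps the last chunk inside the document (not in the original script)
        yield start, min(start + page_sep, page_num)

print(list(page_ranges(6, 2)))  # [(0, 2), (2, 4), (4, 6)]
print(list(page_ranges(5, 2)))  # [(0, 2), (2, 4), (4, 5)]
```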
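About the "count += 1" line: I remember "+=" as a comet heading toward the Sun, with the "+" as its head and the "=" as the tail trailing behind, but when I started I sometimes forgot whether it was "+=" or "=+". It is "+="; Python also accepts "count =+ 1" without complaint, but that simply sets count to 1 every time. The created folder (the top part is omitted) looks like this.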
The month change from 1/31 to 2/1 was handled correctly, and since 2020 is a leap year (and an Olympic year), the file "February 29, 2020" was created properly as well. Now I can go drinking with my colleagues tonight.
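The month change and the leap day both come for free, because adding a timedelta to a date works on the real calendar. A quick way to check it yourself with the standard library:

```python
import calendar
from datetime import date, timedelta

one_day = timedelta(days=1)
print(date(2020, 1, 31) + one_day)  # 2020-02-01 (month change)
print(date(2020, 2, 28) + one_day)  # 2020-02-29 (2020 is a leap year)
print(date(2021, 2, 28) + one_day)  # 2021-03-01 (2021 is not)
print(calendar.isleap(2020), calendar.isleap(2021))  # True False
```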
The "datetime" module handles the date processing. It is part of Python's standard library, so there is nothing to install. It is an all-too-familiar library, but remembering my own first days with Python, I'll explain it so that newcomers can follow.
First, let's look at what value it returns. For example, to get the current date and time:
```python
from datetime import datetime
now_date = datetime.now()
print(now_date)
#2020-02-01 11:09:38.124982
```
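That's what it looks like. To get the date format of the file names my colleague saved (e.g. February 01, 2020), pass a format string to ".strftime". There are various ways to do this and people have their own preferences; in the pattern below, %B is the full month name (English in the default C locale), %d the zero-padded day, and %Y the four-digit year.

```python
from datetime import datetime
now_date = datetime.now()
format_now_date = now_date.strftime('%B %d, %Y')
print(format_now_date)
#February 01, 2020
```

If you want "2020/02/01" or "2020-02-01" instead, a trick I like is to put numbered placeholders into the ".strftime" pattern and fill them afterwards with ".format". Unpacking a string with "*" passes each of its characters as a separate argument, so {0} and {1} become the two separators:

```python
format_now_date = now_date.strftime('%Y{0}%m{1}%d').format(*'//')
format_now_date = now_date.strftime('%Y{0}%m{1}%d').format(*'--')
```

Note that this pattern has only two placeholders. If you adapt a pattern such as '%Y{0}%m{1}%d{2}' (handy when a character follows the day), don't forget to delete the "{2}": '//' and '--' each supply only two characters, so a leftover {2} raises an IndexError.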
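Since datetime.now() returns a different value on every run, the results above depend on the day you try them. With a fixed timestamp the patterns are easy to compare; the expected outputs below assume the default C/English locale for %B. For plain separators, strftime alone gives the same result, so the .format step is mostly a matter of taste.

```python
from datetime import datetime

# A fixed timestamp, so the output does not depend on the day you run it
sample = datetime(2020, 2, 1, 11, 9, 38)

english = sample.strftime('%B %d, %Y')                   # month name, day, year
slashes = sample.strftime('%Y{0}%m{1}%d').format(*'//')  # placeholders filled by .format
dashes = sample.strftime('%Y{0}%m{1}%d').format(*'--')
plain = sample.strftime('%Y/%m/%d')                      # same result without .format

print(english)           # February 01, 2020
print(slashes)           # 2020/02/01
print(dashes)            # 2020-02-01
print(plain == slashes)  # True
```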
The niche program at the top generates each file name by adding the number of files processed so far (count) to the specified start date (start_day). In other words, the first file cut out gets "2020/01/01 + 0" days, i.e. January 01, 2020, the next gets "2020/01/01 + 1" day, i.e. January 02, 2020, and so on.
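Pulled out of the loop, the naming step is just "start date plus count days". The function name file_name_for is hypothetical, and the format string assumes the English-style names used in this post:

```python
from datetime import datetime, timedelta

start_datetime = datetime.strptime('2020/01/01', '%Y/%m/%d')

def file_name_for(count):
    """Hypothetical helper: start date + count days, formatted like the script's file names."""
    file_date = start_datetime + timedelta(days=count)
    return file_date.strftime('%B %d, %Y') + '.pdf'

names = [file_name_for(count) for count in range(3)]
print(names)
# ['January 01, 2020.pdf', 'January 02, 2020.pdf', 'January 03, 2020.pdf']
print(file_name_for(59))  # February 29, 2020.pdf (the 60th file)
```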
Programming in general, not just Python, can go a long way with basic keywords and mechanisms, but to add to or subtract from a reference date you use a "timedelta" object. It takes month changes and leap years into account, which I appreciate. For example, one day after a base date "start_date" is:

```python
file_date = start_date + timedelta(days=1)
```

For one day earlier, replace "1" with "-1"; for one week later, use "weeks=1". So you would expect "years=1" for one year later and "months=1" for one month later, which brings back bittersweet memories of my early days with Python. It is not that simple: timedelta accepts only the following 7 keyword arguments.
Example | meaning |
---|---|
weeks=10 | 10 weeks later |
days=10 | 10 days later |
hours=10 | 10 hours later |
minutes=10 | 10 minutes later |
seconds=10 | 10 seconds later |
milliseconds=10 | 10 milliseconds later |
microseconds=10 | 10 microseconds later |
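A quick check of a few of these keywords, plus what happens when you reach for months. The exact TypeError message differs between Python versions, so the sketch only records whether one was raised:

```python
from datetime import datetime, timedelta

base = datetime(2020, 1, 1)
print(base + timedelta(days=-1))               # 2019-12-31 00:00:00
print(base + timedelta(weeks=1))               # 2020-01-08 00:00:00
print(base + timedelta(hours=10, minutes=30))  # 2020-01-01 10:30:00

try:
    timedelta(months=1)
    months_supported = True
except TypeError:
    months_supported = False  # 'months' is not one of the 7 accepted keywords
print(months_supported)  # False
```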
My work is never delicate enough to need milliseconds or microseconds, so I'd have preferred years and months instead.
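timedelta will not gain years or months, but stepping by calendar months is possible with the standard library alone. The sketch below is my own helper (add_months is a hypothetical name, not part of datetime); it clamps the day to the end of the target month, which is one reasonable convention among several. If a third-party package is acceptable, dateutil's relativedelta(months=1) covers this case.

```python
import calendar
from datetime import date

def add_months(d, months):
    """Shift a date by whole calendar months, clamping the day to the month's length."""
    # Use a zero-based month index so negative shifts wrap into earlier years correctly
    month_index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(month_index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    # Replace year, month and day together so no invalid intermediate date is built
    return d.replace(year=year, month=month, day=min(d.day, last_day))

print(add_months(date(2020, 1, 31), 1))   # 2020-02-29 (clamped, leap year)
print(add_months(date(2020, 2, 29), 12))  # 2021-02-28 (one "year" later)
print(add_months(date(2020, 1, 15), -1))  # 2019-12-15
```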
Using count felt a bit clumsy; I thought I should instead generate a multidimensional array that pairs each file's start page with its date. That was a hassle, so I'd like to improve the program later, but for someone new to Python, I think the current version is easier to read. (Not that such an odd person will ever show up.)
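The idea of an array that pairs each file's start page with its date could look roughly like this. It is only a sketch of that idea, not code from the script above: zip() pairs the page starts with consecutive dates, so the separate count variable disappears.

```python
from datetime import datetime, timedelta

page_num = 120   # total pages, as in the test.pdf example
page_sep = 2     # pages per day
start_datetime = datetime(2020, 1, 1)

starts = range(0, page_num, page_sep)
dates = (start_datetime + timedelta(days=i) for i in range(len(starts)))
# One (start page, date) pair per output file
jobs = list(zip(starts, dates))

for start, file_date in jobs[:2]:
    print(start, start + page_sep, file_date.strftime('%B %d, %Y') + '.pdf')
# 0 2 January 01, 2020.pdf
# 2 4 January 02, 2020.pdf
print(len(jobs))  # 60
```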