Name each split PDF file after its date

Introduction

This is a continuation of my earlier post on splitting a PDF file by page (https://qiita.com/nabe3ch/items/e1638544546ab1f4b18f).

A colleague had been told by his boss to split the monthly PDF of daily statistics into one file per day (each day is a two-page set). He was opening the PDF in Chrome and saving it two pages (one day) at a time, which looked painful, so I wrote a Python program that splits a PDF into chunks of any number of pages, and we went to lunch with the time the automation freed up.

That's when he asked, "Can you name the files by date?" Honestly, I didn't feel like it. I asked, "Couldn't you just stop splitting them in the first place?", but he said, "We've always done it this way, so we can't change it now...". I doubt there is much demand for a program that frees a company from an ever more niche chore, but I'll leave this here as a memo.

Here's what I changed

Suppose there is a file called "test.pdf" containing data from 2020/01/01 to 2020/02/29. Each day is a two-page set, so the file has 120 pages in total (60 days × 2 pages). Split it every two pages and save each chunk under my colleague's naming format, such as "January 01, 2020.pdf", "January 02, 2020.pdf".
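As a quick sanity check of that page count, here is a minimal sketch (the variable names are mine, for illustration only): count the days from 2020/01/01 to 2020/02/29 inclusive and multiply by the pages per day.

```python
from datetime import date

first_day = date(2020, 1, 1)
last_day = date(2020, 2, 29)
pages_per_day = 2

# Subtracting two dates gives a timedelta; +1 makes the range inclusive
num_days = (last_day - first_day).days + 1
total_pages = num_days * pages_per_day

print(num_days, total_pages)  # 60 120
```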

pdf_separate.py


import PyPDF2
from datetime import datetime
from datetime import timedelta

# Written for PyPDF2 1.x. PyPDF2 3.0 removed PdfFileReader, PdfFileMerger and getNumPages
# (the newer names are PdfReader, PdfMerger and len(reader.pages)).

f = 'test.pdf'  # PDF file to split
page_sep = 2  # Number of pages per output file
start_day = '2020/01/01'  # Date of the first day's data

# Get the number of pages in the PDF
reader = PyPDF2.PdfFileReader(f)
page_num = reader.getNumPages()

# Convert the first date to a datetime
start_datetime = datetime.strptime(start_day, '%Y/%m/%d')

# Number of files processed so far (also the day offset from the first date)
count = 0

# Loop from 0 (the first page) up to, but not including, page_num in steps of page_sep (2)
for page in range(0, page_num, page_sep):
    merger = PyPDF2.PdfFileMerger()
    # First page of the chunk to cut out
    start = page
    # End of the chunk (exclusive); min() keeps it from running past the last page
    end = min(start + page_sep, page_num)
    # Cut out the pages
    merger.append(f, pages=(start, end))

    # File date = start date + number of files processed so far (in days)
    file_date = start_datetime + timedelta(days=count)
    # Format the date like "January 01, 2020"
    file_date_format = file_date.strftime('%B %d, %Y')
    # Build the file name
    file_name = file_date_format + '.pdf'
    merger.write(file_name)
    merger.close()
    # Increment the counter
    count += 1

print("Let's go drink tonight")

As an aside, early on I sometimes forgot whether it was "+=" or "=+". I remember it by picturing "+=" as a comet heading toward the Sun (east, which is on the left when viewed from the north). The resulting folder (top part omitted) looks like this:

[Screenshot of the output folder: pdf_sep.png]

The month change from 1/31 to 2/1 came out correctly. 2020 is a leap year (and an Olympic year), and the file "February 29, 2020" was created properly too. Now I can go drinking with my colleagues tonight.

Using datetime

Date handling is done with "datetime". It's part of Python's standard library, so there is nothing to install. It's about as familiar as libraries get, but I remember what it was like when I first started with Python, so I'll explain it in a way that's easy to follow for newcomers.

First, let's see what kind of value it returns. For example, to get the current date and time:

from datetime import datetime

now_date = datetime.now()
print(now_date)
#2020-02-01 11:09:38.124982
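datetime.now() returns a datetime object rather than a string; print just shows its string form. A small sketch using a fixed timestamp in place of now() (my choice, so the result is reproducible):

```python
from datetime import datetime

# A fixed timestamp stands in for datetime.now() so the output is predictable
sample = datetime(2020, 2, 1, 11, 9, 38, 124982)

print(type(sample).__name__)                  # datetime
print(sample.year, sample.month, sample.day)  # 2020 2 1
print(str(sample))                            # 2020-02-01 11:09:38.124982
```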

That's what it looks like. To get the date format my colleague uses for file names (e.g. "February 01, 2020"), there are various methods and people have their preferences; here I pass a format string to ".strftime".


from datetime import datetime

now_date = datetime.now()
#%B = full month name (English under the default C locale), %d = zero-padded day, %Y = four-digit year
format_now_date = now_date.strftime('%B %d, %Y')

print(format_now_date)
#February 02, 2020

If you want "2020/02/01" or "2020-02-01" instead, just change the format string:

format_now_date = now_date.strftime('%Y/%m/%d')
format_now_date = now_date.strftime('%Y-%m-%d')
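The same % codes also work in the other direction: datetime.strptime parses a string into a datetime, which is how the program above reads start_day. A minimal round-trip sketch (variable names are illustrative):

```python
from datetime import datetime

start_day = '2020/01/01'

# Parse using the same codes the string was written with
parsed = datetime.strptime(start_day, '%Y/%m/%d')

# Format it back, or into a different layout
print(parsed.strftime('%Y/%m/%d'))  # 2020/01/01
print(parsed.strftime('%Y-%m-%d'))  # 2020-01-01
```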

Adding and subtracting dates

The program above generates each file name by adding the number of files processed so far (count) to the specified start date (start_day). In other words, the first file gets "2020/01/01 + 0", i.e. January 01, 2020, the next gets "2020/01/01 + 1", i.e. January 02, 2020, and so on.
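Here is that "start date + count" arithmetic pulled out of the loop into a small function, so it can be checked without any PDF. file_name_for is a hypothetical name of mine; %B gives English month names under Python's default C locale.

```python
from datetime import datetime, timedelta

start_datetime = datetime.strptime('2020/01/01', '%Y/%m/%d')

def file_name_for(count):
    """Hypothetical helper: file name for the count-th chunk (0-based)."""
    file_date = start_datetime + timedelta(days=count)
    return file_date.strftime('%B %d, %Y') + '.pdf'

print(file_name_for(0))   # January 01, 2020.pdf
print(file_name_for(1))   # January 02, 2020.pdf
print(file_name_for(31))  # February 01, 2020.pdf (month change)
print(file_name_for(59))  # February 29, 2020.pdf (leap day)
```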

In Python, as in any language, most things come down to basic building blocks, and for adding to or subtracting from a reference date the tool is the "timedelta" object. Nicely, it takes month changes and leap years into account. For example, one day after the base date "start_date":

file_date = start_date + timedelta(days=1)

For one day earlier, replace "1" with "-1". For one week later, use "weeks=1". So surely "years=1" gives one year later and "months=1" one month later... which brings back bittersweet memories from when I first started with Python. It's not that simple: there are only seven kinds of arguments.
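A short sketch of the cases just mentioned, starting from a date I picked near the leap day; it also shows that months is rejected outright.

```python
from datetime import date, timedelta

start_date = date(2020, 2, 28)

print(start_date + timedelta(days=1))   # 2020-02-29 (leap day)
print(start_date + timedelta(days=-1))  # 2020-02-27
print(start_date + timedelta(weeks=1))  # 2020-03-06

# months (and years) are not accepted: this raises TypeError
try:
    timedelta(months=1)
except TypeError as e:
    print('TypeError:', e)
```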

Argument          Meaning
weeks=10          10 weeks later
days=10           10 days later
hours=10          10 hours later
minutes=10        10 minutes later
seconds=10        10 seconds later
milliseconds=10   10 milliseconds later
microseconds=10   10 microseconds later
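These seven can be combined, and whatever you pass, timedelta normalizes it into just days, seconds and microseconds. A quick sketch:

```python
from datetime import timedelta

delta = timedelta(weeks=1, days=1, hours=1, minutes=1,
                  seconds=1, milliseconds=1, microseconds=1)

# weeks/days -> days, hours/minutes/seconds -> seconds, ms/us -> microseconds
print(delta.days, delta.seconds, delta.microseconds)  # 8 3661 1001
print(delta.total_seconds())
```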

I never do anything precise enough to need milliseconds or microseconds, so I'd much rather have had years and months.
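timedelta can't do months, but they can be handled by hand. A minimal sketch with the standard calendar module: add_months is a hypothetical helper, and clamping the day to the end of the target month is just one possible convention. If a third-party package is acceptable, python-dateutil's relativedelta(months=...) covers this too.

```python
import calendar
from datetime import date

def add_months(d, months):
    """Hypothetical helper: shift d by a number of months, clamping the day
    to the last day of the target month (e.g. Jan 31 + 1 month -> Feb 29)."""
    # Work with a 0-based month index so the year carry is a simple divmod
    month_index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(month_index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return d.replace(year=year, month=month, day=min(d.day, last_day))

print(add_months(date(2020, 1, 31), 1))   # 2020-02-29
print(add_months(date(2020, 2, 29), 12))  # 2021-02-28 (one "year" later)
print(add_months(date(2020, 3, 15), -3))  # 2019-12-15
```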

Using count this way felt a bit clumsy, and I thought about generating a list that pairs each chunk's start page with its date instead. That's a hassle, and I'd like to improve the program later, but I think the current version is easier to read for someone new to Python who might take it over. Not that such an oddball will ever show up.
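For reference, that list of (start page, date) pairs might look something like this. It's only a sketch: make_plan is a hypothetical name, enumerate stands in for the hand-maintained count, and the PDF loop would still call merger.append and merger.write for each entry. %B assumes Python's default C locale.

```python
from datetime import datetime, timedelta

def make_plan(page_num, page_sep, start_day):
    """Hypothetical helper: list of (start_page, end_page, file_name) tuples."""
    start_datetime = datetime.strptime(start_day, '%Y/%m/%d')
    plan = []
    # enumerate supplies the day offset that count tracked by hand
    for offset, start in enumerate(range(0, page_num, page_sep)):
        end = min(start + page_sep, page_num)  # don't run past the last page
        file_date = start_datetime + timedelta(days=offset)
        plan.append((start, end, file_date.strftime('%B %d, %Y') + '.pdf'))
    return plan

plan = make_plan(120, 2, '2020/01/01')
print(len(plan))  # 60
print(plan[0])    # (0, 2, 'January 01, 2020.pdf')
print(plan[-1])   # (118, 120, 'February 29, 2020.pdf')
```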
