I often zip up image data scanned at the office, and when I take it home and unzip it, the file names come out garbled for some reason ... Have you ever had that experience? **I have!** (half in tears) Fortunately, the serial number usually survives without being garbled, so the order of the files can still be worked out. (Example: æ–‡å—000.jpg, æ–‡ås001.jpg) If only the garbled part could be removed from the file names, leaving clean serial-numbered files, they would look better and be easier to work with later. There were about 100 scans, so fixing them by hand would have been a hassle, and I decided to write a quick Python script. While I was at it, I also made it possible to convert the image data into a single PDF in one step.
For this I used **Pillow** for image processing and **PyPDF2** for the PDF conversion, so I would like to explain those two.
The script is available on GitHub. Please refer to the README for how to use it.
It's a messy script because I wrote it in a hurry, but it consists of the following functions.
I will briefly introduce them below. Feel free to skip this part, since it is not essential.
name2number(folderpass, digits, extension) Takes the path of the folder holding the image data as an argument and renames the images in it (only those with the given extension) to serial numbers, using the part of each file name that was not garbled (the 000 in æ–‡å—000.jpg). The os module is used to rename the files, and the re module is used to handle regular expressions. The return value is an array of the file names whose processing was skipped.
changeNameHand(existfiles) name2number returns an array of the file names that were skipped because their serial number collided with another file. This function takes that array as its argument, asks for a new name for each file interactively via standard input/output, and renames the file. The os module is used to rename the files, and the re module is used to handle regular expressions. In addition, the Pillow (PIL) module is used to display each image so that it can be checked.
addstr(before, after, digits, extension) Puts the string given by the argument before in front of the serial number and the string given by the argument after behind it. I use it when a bare serial number looks too plain. The os module is used to rename the files, and the re module is used to handle regular expressions.
image2pdf(filename, digits, extension, removeflug) Uses Pillow and PyPDF2 to combine the image data in the current directory into a single PDF. I will explain it in detail later.
makeZip(filename, flug) Zips the data in the current directory, using the zipfile module.
main() Reads the command line arguments, using the argparse module.
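To give a feel for what the renaming step involves, here is a minimal sketch using only os and re. It is not the code from the script: strip_to_serial is a hypothetical helper name, and the rule that a file is skipped when its serial number is already taken is my reading of the description of name2number above.

```python
import os
import re
import tempfile


def strip_to_serial(folder, digits, extension):
    """Rename 'garbled000.jpg' style files to '000.jpg'; return the skipped names."""
    # A run of `digits` digits immediately before the extension at the end of the name
    pattern = re.compile(r"(\d{%d})%s$" % (digits, re.escape(extension)))
    skipped = []
    for name in sorted(os.listdir(folder)):
        match = pattern.search(name)
        if match is None:
            continue  # not a target file
        new_name = match.group(1) + extension
        if new_name == name:
            continue  # already a clean serial number
        if os.path.exists(os.path.join(folder, new_name)):
            skipped.append(name)  # serial number already taken, leave for manual fixing
            continue
        os.rename(os.path.join(folder, name), os.path.join(folder, new_name))
    return skipped


# Small demonstration in a throwaway directory (the file names are made up)
with tempfile.TemporaryDirectory() as demo:
    for name in ("scan_a000.jpg", "scan_b001.jpg", "other000.jpg", "notes.txt"):
        open(os.path.join(demo, name), "w").close()
    demo_skipped = strip_to_serial(demo, 3, ".jpg")
    demo_result = sorted(os.listdir(demo))
```

In the demonstration two files share the serial number 000, so one of them is renamed and the other comes back in the skipped list, which is the kind of list changeNameHand is described as receiving.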
fileName2SerialNumber.py
import io
import os
import re
import sys

import PIL.Image
from PyPDF2 import PdfFileReader, PdfFileWriter


def image2pdf(filename, digits, extension, removeflug):
    u"""
    Function to convert image files to PDF
    """
    if extension not in (".jpg", ".png", ".gif"):  # Reject anything that is not an image
        print("Unsupported image file! Only jpg, png and gif are supported")
        sys.exit(1)

    fileWriter = PdfFileWriter()
    files = os.listdir()
    # Escape the dot so that the extension only matches literally, at the end of the name
    ext = re.compile(re.escape(extension) + "$")
    files.sort()
    count = 1
    removefiles = []

    for file in files:
        num = re.search('\\d{' + str(digits) + '}', file)
        # Only handle files that have both the serial number and the extension
        if num and ext.search(file):
            image = PIL.Image.open(file)
            pdfFile = str(file).replace(extension, ".pdf")
            image.save(pdfFile, "PDF", resolution = 100.0)
            with open(pdfFile, "rb") as f:
                # Read into memory so the pages stay readable after the file is closed
                fileReader = PdfFileReader(io.BytesIO(f.read()))
                pageNum = fileReader.getNumPages()
                for i in range(pageNum):
                    fileWriter.addPage(fileReader.getPage(i))
                    print("Writing %s to page %d" % (str(file), count))
                    count += 1
            removefiles.append(pdfFile)  # The intermediate PDF is always deleted
            if removeflug:
                removefiles.append(file)  # Delete the source image too if requested

    with open(filename, "wb") as outputs:
        fileWriter.write(outputs)
    print("-------------------------------------------------------")
    print("Finished writing! File name: %s \n" % filename)

    for file in removefiles:
        os.remove(file)
        print("Deleted file %s" % str(file))

    print("--------------------------------------------------------")
    print("The end! Deliverable: %s" % filename)
    return None
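The file selection at the top of image2pdf (check the extension, then keep only the names that contain a serial number) can be tried on its own without touching any files. The following is a simplified sketch, not the script's code: select_images and SUPPORTED are hypothetical names. It uses a membership test for the extension, because an expression of the form `x != a and b and c` only compares x with a, and it escapes the extension so that the dot is not treated as a regular expression wildcard.

```python
import re

SUPPORTED = (".jpg", ".png", ".gif")  # the three formats named in the article


def select_images(names, digits, extension):
    """Return, in sorted order, names ending in `extension` that contain a run of `digits` digits."""
    if extension not in SUPPORTED:
        raise ValueError("unsupported extension: %r" % extension)
    serial = re.compile(r"\d{%d}" % digits)
    ext = re.compile(re.escape(extension) + "$")
    return [n for n in sorted(names) if serial.search(n) and ext.search(n)]


# Made-up names for illustration
names = ["002.jpg", "000.jpg", "readme.txt", "001.png", "cover.jpg", "0013jpg"]
selected = select_images(names, 3, ".jpg")
```

Raising an exception instead of calling sys.exit is just a choice that makes the sketch easier to test; the script itself exits.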
Pillow is a module for image processing in Python. You can install it with pip.
To open an image file, call PIL.Image.open:
python
image = PIL.Image.open(file)
This time I want to save the image as a PDF, so I save it in PDF format:
python
image.save(pdfFile, "PDF", resolution = 100.0)
This saves the image in PDF format. Changing the "PDF" part to another image format saves in that format instead, and the resolution can be set to something other than 100. One point to note when converting to PDF: the file name passed to save should have its extension changed to .pdf beforehand. So this script changes the extension as follows:
python
pdfFile = str(file).replace(extension, ".pdf")
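One thing to be aware of is that str.replace replaces every occurrence of the extension string, not only the one at the end of the name. For serial-numbered files like 000.jpg this makes no difference, but as a hedged alternative, os.path.splitext from the standard library only touches the final extension. The helper name to_pdf_name below is hypothetical and not part of the script.

```python
import os


def to_pdf_name(path):
    """Swap only the final extension of `path` for .pdf."""
    root, _ext = os.path.splitext(path)
    return root + ".pdf"


# A made-up name that contains the extension twice
replaced = "scan.jpg.backup.jpg".replace(".jpg", ".pdf")
swapped = to_pdf_name("scan.jpg.backup.jpg")
simple = to_pdf_name("000.jpg")
```

Here replaced becomes scan.pdf.backup.pdf, while swapped keeps the inner .jpg and only changes the ending.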
Also, although it is not used in this function, an image opened with PIL.Image.open() can be displayed with
python
image.show()
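PyPDF2 is a convenient module that can combine multiple PDF files into one and extract text data from PDFs. It can also be installed with pip. The class and method names used below (PdfFileWriter, PdfFileReader, addPage and so on) are the ones from PyPDF2 releases before 3.0; from 3.0 on they were replaced by PdfWriter, PdfReader, add_page and similar names.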
First, create an instance to write the file.
python
fileWriter = PdfFileWriter()
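An instance for reading a PDF file is created with
python
fileReader = PdfFileReader(open(pdfFile, "rb"))
Since a file has to be opened, I think it is less trouble to use a with statement. PyPDF2 reads page contents lazily, though, so pages taken from a file that has already been closed cannot be written out later. Reading the data into memory first (io.BytesIO from the standard library) avoids that:
python
with open(pdfFile, "rb") as f:
    fileReader = PdfFileReader(io.BytesIO(f.read()))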
To check the total number of pages:
python
pageNum = fileReader.getNumPages()
The number of pages is stored in pageNum.
To add a page to the PDF:
python
fileWriter.addPage(fileReader.getPage(pageNumber))
getPage(pageNumber) returns the page object for the given page number (counting from 0), and addPage appends the page object passed as its argument to the end of the output.
When the editing is finished, the result is finally exported as a PDF file.
To export:
python
fileWriter.write(open(filename, "wb"))
Since this uses the open function, I think it is less trouble to use a with statement:
python
with open(filename, "wb") as outputs:
    fileWriter.write(outputs)
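As an aside, Pillow can also write several images into one PDF by itself, through the save_all and append_images arguments of save. This is a different technique from the PyPDF2 merge used in the script, shown only as a sketch for comparison; images_to_pdf is a hypothetical helper name, the demonstration images are generated on the spot, and the list of paths is assumed to be non-empty.

```python
import os
import tempfile

from PIL import Image


def images_to_pdf(image_paths, pdf_path, resolution=100.0):
    """Write the given images, in order, as the pages of a single PDF."""
    pages = []
    for p in image_paths:
        with Image.open(p) as im:
            # Convert to RGB so that modes the PDF writer may reject are avoided
            pages.append(im.convert("RGB"))
    first, rest = pages[0], pages[1:]
    first.save(pdf_path, "PDF", resolution=resolution,
               save_all=True, append_images=rest)


# Demonstration with three small generated images in a throwaway directory
with tempfile.TemporaryDirectory() as demo:
    paths = []
    for i, color in enumerate(["red", "green", "blue"]):
        path = os.path.join(demo, "%03d.png" % i)
        Image.new("RGB", (200, 100), color).save(path)
        paths.append(path)
    out = os.path.join(demo, "out.pdf")
    images_to_pdf(paths, out)
    with open(out, "rb") as f:
        header = f.read(5)
```

With this approach no intermediate per-image PDF files are created, so there is nothing to delete afterwards. PyPDF2 is still the tool to reach for when existing PDF files need to be combined.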
Use Pillow and PyPDF2 to improve work efficiency!