A review of a PowerPoint document pointed out that some of its fonts were wrong. The deck has more than 100 slides, and I would have had to check and correct the fonts one by one. I didn't want to do that by eye, and since I may have to do the same job again, I looked for a more efficient way.
python-pptx
There is little you can't do with Python these days! After some research, I found that python-pptx lets you work with PowerPoint files from Python. Official page: python-pptx
I worked through a bit of the official Getting Started guide, referring to articles by the great pioneers on Qiita. I now have a rough mental picture of the structure, so I'll dump it below. (If I've made a mistake, please point it out ...)
Presentation > slides[] > shapes[] > text_frame.paragraphs[] > runs[]
A slide holds the objects on that slide as an array in shapes[].
Among the shapes, those that can hold text have text_frame.paragraphs[]; whether a shape has one can be checked with has_text_frame. I often got errors when I tried to access shapes[n].paragraphs[m] directly ...
You can change the font for each character, and I had always wondered how that is stored. It turns out that text is held in units called runs, and the font can be set per run.
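The hierarchy above can be walked with a small generator. This is a minimal sketch, not part of the original script: iter_runs is a hypothetical helper name, and it relies only on the slides, shapes, has_text_frame, text_frame.paragraphs and runs attributes described above. Checking has_text_frame first is what avoids the error on shapes that carry no text. Shapes nested inside a group are not visited by this sketch.

```python
def iter_runs(prs):
    """Yield (slide_no, shape_no, paragraph_no, run_no, run) for every run.

    `prs` is expected to be a python-pptx Presentation (or any object
    exposing the same attributes). All numbers are 0-based.
    """
    for slide_no, slide in enumerate(prs.slides):
        for shape_no, shape in enumerate(slide.shapes):
            # Pictures, connectors and similar shapes have no text frame; skip them
            if not shape.has_text_frame:
                continue
            for paragraph_no, paragraph in enumerate(shape.text_frame.paragraphs):
                for run_no, run in enumerate(paragraph.runs):
                    yield slide_no, shape_no, paragraph_no, run_no, run
```

With python-pptx installed, passing Presentation(path) to iter_runs and printing run.font.name and run.text for each result should list every run together with its position in the deck.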
Dump all paragraphs and runs of all slides to CSV
Set TARGET_FILE_PATH to the path of the target PowerPoint file and OUTPUT_FILE_PATH to the path of the output CSV file.
python-pptx.py
from pptx import Presentation
import csv

TARGET_FILE_PATH = './targetFile/targetFile.pptx'
OUTPUT_FILE_PATH = './output.csv'
# font.size is expressed in EMU (English Metric Units); 12700 EMU equal 1 point
FONT_SIZE_DIVISOR = 12700

# Columns: ["pptxFile name","slide number","object number","paragraph number","run number","Object type","font.name","font.size","text"]
def export_slide_fonts_and_text():
    outputArray = []
    outputArray.append(["pptxFile name", "slide number", "object number", "paragraph number", "run number", "Object type", "font.name", "font.size", "text"])

    # Read the pptx file
    prs = Presentation(TARGET_FILE_PATH)

    slide_number = 0
    # Loop over the slides (all numbers in the output are 0-based)
    for slide in prs.slides:
        shape_number = 0
        # Loop over the shapes on the slide
        for shape in slide.shapes:
            # Shapes without a text frame (pictures etc.) have no paragraphs
            if not shape.has_text_frame:
                shape_number = shape_number + 1
                continue

            paragraph_number = 0
            # Output one row per paragraph first
            for paragraph in shape.text_frame.paragraphs:
                if paragraph.font.size is not None:
                    fontSize = paragraph.font.size / FONT_SIZE_DIVISOR
                else:
                    fontSize = None
                outputArray.append([TARGET_FILE_PATH, slide_number, shape_number, paragraph_number, "-", "paragraph", str(paragraph.font.name), str(fontSize), paragraph.text])

                run_number = 0
                # Then output one row per run
                for run in paragraph.runs:
                    if run.font.size is not None:
                        fontSize = run.font.size / FONT_SIZE_DIVISOR
                    else:
                        fontSize = None
                    outputArray.append([TARGET_FILE_PATH, slide_number, shape_number, paragraph_number, run_number, "run", str(run.font.name), str(fontSize), run.text])
                    run_number = run_number + 1
                paragraph_number = paragraph_number + 1
            shape_number = shape_number + 1
        slide_number = slide_number + 1

    # Write the CSV. Characters that cannot be encoded as shift-jis raise UnicodeEncodeError.
    with open(OUTPUT_FILE_PATH, 'w', newline='', encoding="shift-jis") as f:
        writer = csv.writer(f, lineterminator='\n', quoting=csv.QUOTE_ALL)
        writer.writerows(outputArray)


if __name__ == "__main__":
    export_slide_fonts_and_text()
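The constant 12700 in the script comes from the unit python-pptx uses for lengths: font.size is reported in English Metric Units (EMU), and one point is 12,700 EMU. The following sketch shows the arithmetic on plain integers; emu_to_pt is a hypothetical helper name, not a python-pptx function. The Length values that python-pptx returns also offer a .pt property that performs the same conversion.

```python
EMU_PER_PT = 12700  # 914400 EMU per inch / 72 points per inch


def emu_to_pt(emu):
    """Convert a length in EMU to points; None (inherited size) stays None."""
    if emu is None:
        return None
    return emu / EMU_PER_PT


print(emu_to_pt(228600))  # 18.0
print(emu_to_pt(None))    # None
```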
Below is the CSV output pasted into Excel and color-coded. By and large, it looks like I can do what I wanted!
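Instead of coloring rows by hand, the CSV can also be filtered for runs whose font differs from the expected one. This is a minimal sketch, assuming the column names written by the script above; find_unexpected_fonts, load_rows and EXPECTED_FONT are illustrative names, and the font "Meiryo" is only a placeholder. Rows whose font is "None" (inherited) are skipped rather than judged.

```python
import csv

EXPECTED_FONT = "Meiryo"  # hypothetical; replace with the font your deck should use


def find_unexpected_fonts(rows, expected_font=EXPECTED_FONT):
    """Return the run rows whose explicitly set font differs from expected_font.

    `rows` is an iterable of dicts keyed by the CSV header. Header names are
    stripped of surrounding spaces before lookup.
    """
    unexpected = []
    for row in rows:
        row = {key.strip(): value for key, value in row.items()}
        if row["Object type"] != "run":
            continue  # paragraph rows only carry paragraph-level settings
        font_name = row["font.name"]
        if font_name == "None":
            continue  # inherited font; cannot be judged from the CSV alone
        if font_name != expected_font:
            unexpected.append(row)
    return unexpected


def load_rows(csv_path, encoding="shift-jis"):
    """Read the CSV produced by the dump script into a list of dicts."""
    with open(csv_path, newline="", encoding=encoding) as f:
        return list(csv.DictReader(f))
```

Calling find_unexpected_fonts(load_rows('./output.csv')) would then return only the rows worth looking at, each still carrying its slide, object, paragraph and run numbers.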
ToBe
As the Excel sheet above shows, the font name and font size are "None" in many places. When a value is not set explicitly, an inherited default is used instead. I think I can get it by accessing the placeholder ... I would like to investigate a little more.
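One way to reduce the number of "None" cells before digging into placeholders is to fall back from the run to its paragraph. This is only a sketch under that assumption: resolve_font_name is a hypothetical helper, it reads nothing but the font.name attributes already used in the script, and it does not consult the layout, master, or theme, so a result of None still means that the value is inherited from somewhere further up.

```python
def resolve_font_name(run, paragraph):
    """Return the first explicitly set font name, looking at run then paragraph.

    Returns None when neither level sets a font, i.e. the value is inherited
    from a level this sketch does not inspect (layout, master, theme).
    """
    for font in (run.font, paragraph.font):
        if font.name is not None:
            return font.name
    return None
```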
Thank you ... overwhelming thanks ...!
python-pptx Summary
Automate reporting with python-pptx
[Python] Generate report PowerPoint at explosive speed!
Automatic report creation using Python [PowerPoint] [python-pptx]
https://qiita.com/code_440/items/22e8539da465686496d3