background

A PowerPoint document Review pointed out that some fonts were incorrect. Check and correct one by one which fonts are different on slides of 100 pages or more. I didn't want to do it by visual inspection, and I wondered if I could do it efficiently because I might do the same work in the future.

python-pptx There's nothing you can't do with python in this day and age! After some research, I found that python-pptx can work with PowerPoint files from python. Official page: python-pptx

Image of object

I touched on the official Getting Start a little, referring to the articles of the great pioneers of Qiita. Somehow I got an image, so I will Dump it below. (If you make a mistake, please point it out ...)

Overall bird's eye view

Presentation > slides[] > shapes[] > text_frame.paragraphs[] > runs[] スライド4.PNG

Correspondence between shapes [] and slides

slide has the objects in that slide in the form of an array in shapes []. スライド5.PNG

Correspondence between text_frame.paragraphs [] and slides

Among shapes, those that can have characters (?) Can have text_frame.paragraphs []. I often get an error trying to access with shapes [n] .paragraphs [m] ... スライド6.PNG

Correspondence between runs [] and slides

You can change the font for each character, but I've always wondered how I have it. It seems that it has a unit called run, and you can set the font for each. スライド7.PNG

What I made

Dump all paragraphs and runs of all slides to csv. Set TARGET_FILE_PATH to the path to the target powerpoint file and OUTPUT_FILE_PATH to the path to the output csv file.

`python-pptx.py`


from pptx import Presentation
from pptx.util import Pt
import csv

TARGET_FILE_PATH = './targetFile/targetFile.pptx'
OUTPUT_FILE_PATH = './output.csv'
FONT_SIZE_DIVESER = 12700
# ["pptxFile name","slide number","object number","run number","Object type"," font.name"," font.size"," text"]
def export_slide_fonts_and_text():
  outputArray = []
  outputArray.append(["pptxFile name","slide number","object number","paragraph number","run number","Object type"," font.name"," font.size"," text"])
  
  #Read ppt file
  prs = Presentation(TARGET_FILE_PATH)

  slide_number = 0
  #Loading slides
  for slide in prs.slides:
    shape_number = 0
    #Processed by shape
    for shape in slide.shapes:
      if not shape.has_text_frame:
        shape_number = shape_number + 1
        continue

      paragraph_number = 0
      #Output once at the stage of paragraph
      for paragraph in shape.text_frame.paragraphs:
        if(paragraph.font.size != None):
          fontSize = paragraph.font.size/FONT_SIZE_DIVESER
        else:
          fontSize = None
        outputArray.append([TARGET_FILE_PATH, slide_number, shape_number,paragraph_number, "-","paragraph", str(paragraph.font.name), str(fontSize), paragraph.text])

        run_number = 0
        #Output each run
        for run in paragraph.runs:
          if(run.font.size != None):
            fontSize = run.font.size/FONT_SIZE_DIVESER
          else:
            fontSize = None
          outputArray.append([TARGET_FILE_PATH, slide_number, shape_number,paragraph_number, run_number, "run", str(run.font.name), str(fontSize), run.text])

          run_number = run_number + 1
        paragraph_number = paragraph_number + 1
      shape_number = shape_number + 1
    slide_number = slide_number +1

  #writing
  with open(OUTPUT_FILE_PATH, 'w', encoding="shift-jis") as f:
    wirter = csv.writer(f, lineterminator='\n', quoting=csv.QUOTE_ALL)
    wirter.writerows(outputArray)

if __name__ == "__main__":
  export_slide_fonts_and_text()

Process the output a little

The following is what was output by csv, pasted on Excel and colored. Generally, I feel like I can do what I want to do!

ToBe

Get the default value

As you can see in Excel above, there are many places where the font name and font size are "None". If the value is not set, it will refer to the default value. I think I can get it by accessing the placeholder ... I would like to investigate a little more.

The pioneer who was allowed to refer

Thank you…. Overwhelming thanks ...! python-pptx Summary Automate reporting with python-pptx [Python] Generate report PowerPoint at explosive speed! Automatic report creation using Python [PowerPoint] [python-pptx] https://qiita.com/code_440/items/22e8539da465686496d3

[Python-pptx] Output PowerPoint font information to csv with python