When sending the minutes of a meeting by e-mail, I sometimes wanted only the letters on the slide, so I wrote a little code. It was a little troublesome to extract the characters on the table.
File name: sampleFile.pptx
-- Page 1 --
It's the title
Subtitle
-- Page 2 --
This is the second page.
It's a text box ♪
fruit,Greengrocer A,Super B,
banana,100 yen,200 yen,
Apple,150 yen,140 yen,
Table sample
import pptx
from glob import glob
for fname in glob ('*.pptx'):
print ('File name: ', fname, '\n')
prs = pptx.Presentation(fname)
for i, sld in enumerate(prs.slides, start=1):
print(f'-- Page {i} --')
for shp in sld.shapes:
if shp.has_text_frame:
print (shp.text)
if shp.has_table:
tbl = shp.table
row_count = len(tbl.rows)
col_count = len(tbl.columns)
for r in range(0, row_count):
text=''
for c in range(0, col_count):
cell = tbl.cell(r,c)
paragraphs = cell.text_frame.paragraphs
for paragraph in paragraphs:
for run in paragraph.runs:
text+=run.text
text+=', '
print (text)
print ()
Extracts the text of all files with the pptx extension in the same folder.
Scraping Powerpoint (pptx) table https://qiita.com/barobaro/items/a3a4a00aeda9d19e41b6
Method to extract text part from PDF / Word / PowerPoint / Excel file at once https://qiita.com/barobaro/items/a3a4a00aeda9d19e41b6
Recommended Posts