Extract text from PowerPoint with Python! (Compatible with tables)

About this article

When emailing meeting minutes, I sometimes want only the text from the slides, so I wrote a short script. Extracting the text inside tables took a little extra work.

Sample PowerPoint file (saved as sampleFile.pptx in the same folder as the script)

[Screenshot of sampleFile.pptx]

Result

File name:  sampleFile.pptx 

-- Page 1 --
It's the title

Subtitle

-- Page 2 --
This is the second page.
It's a text box ♪

fruit, Greengrocer A, Super B, 
banana, 100 yen, 200 yen, 
Apple, 150 yen, 140 yen, 

Table sample

Code


import pptx
from glob import glob

# Process every .pptx file in the current folder
for fname in glob('*.pptx'):
    print('File name: ', fname, '\n')
    prs = pptx.Presentation(fname)

    for i, sld in enumerate(prs.slides, start=1):
        print(f'-- Page {i} --')

        for shp in sld.shapes:
            # Titles, text boxes and other shapes that hold text directly
            if shp.has_text_frame:
                print(shp.text)

            # Tables: print each row as one comma-separated line
            if shp.has_table:
                tbl = shp.table
                row_count = len(tbl.rows)
                col_count = len(tbl.columns)
                for r in range(row_count):
                    text = ''
                    for c in range(col_count):
                        cell = tbl.cell(r, c)
                        for paragraph in cell.text_frame.paragraphs:
                            for run in paragraph.runs:
                                text += run.text
                            text += ', '
                    print(text)
            # Blank line after each shape
            print()

The script extracts the text from every file with the .pptx extension in the folder it is run from.
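If you would rather collect the text than print it (for example, to paste it into an e-mail body), the same idea can be wrapped in functions. The following is only a sketch, not the script above: the names table_rows, shape_lines and extract_text are hypothetical helpers introduced here, and it assumes python-pptx's row.cells and cell.text accessors instead of the run-by-run loop.

```python
from glob import glob


def table_rows(tbl):
    """Return a table as a list of rows, each row a list of cell strings."""
    # Assumption: python-pptx rows expose .cells and cells expose .text
    return [[cell.text for cell in row.cells] for row in tbl.rows]


def shape_lines(shp):
    """Return the text lines found in one shape (text frame and/or table)."""
    lines = []
    if getattr(shp, "has_text_frame", False):
        lines.append(shp.text)
    if getattr(shp, "has_table", False):
        # One comma-separated line per table row
        lines.extend(", ".join(row) for row in table_rows(shp.table))
    return lines


def extract_text(path):
    """Return one list of text lines per slide for a single .pptx file."""
    import pptx  # python-pptx; imported lazily so the helpers above work without it

    prs = pptx.Presentation(path)
    return [
        [line for shp in sld.shapes for line in shape_lines(shp)]
        for sld in prs.slides
    ]


if __name__ == "__main__":
    for fname in glob("*.pptx"):
        print("File name: ", fname)
        for i, lines in enumerate(extract_text(fname), start=1):
            print(f"-- Page {i} --")
            print("\n".join(lines))
```

Unlike the script above, this variant joins the cells with ", " so a row has no trailing separator, and a cell that contains several paragraphs keeps them as one value.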

References

Scraping Powerpoint (pptx) table https://qiita.com/barobaro/items/a3a4a00aeda9d19e41b6

Method to extract text part from PDF / Word / PowerPoint / Excel file at once https://qiita.com/barobaro/items/a3a4a00aeda9d19e41b6
