I previously posted "Convert PDF attached to email to text format", and it seems to have been fairly popular.
This post covers a modified version of that script.
Section 18.4 of the Python documentation (mailbox: Manipulate mailboxes in various formats) says:
"Be very cautious when modifying mailboxes that might be simultaneously changed by some other process. The safest mailbox format to use for such tasks is Maildir; try to avoid using single-file formats such as mbox for concurrent writing."
It also says:
"Several variations of the mbox format exist to address perceived shortcomings in the original."
Because of those warnings, the earlier post fetched the mail with Cygwin's procmail / fetchmail and read it in Maildir format. However, Python's mailbox library can also read the mbox files that Thunderbird keeps (even though mbox comes in several dialects), so this time I read Thunderbird's mailbox directly, using the official Python.org binary.
Thunderbird names the profile directory that holds the mail with 8 random characters, so rewrite that part of the path to match your environment.
# coding: utf-8
import os
import os.path
import sys
import email
import mailbox
import mimetypes

# Thunderbird's Inbox file (mbox format); change the 8-character profile name
maildir = 'C:\\Users\\t.uehara\\AppData\\Roaming\\Thunderbird\\Profiles\\abcdef12.default\\Mail\\pop.example.com\\Inbox'
tempdir = 'C:\\Users\\t.uehara\\Downloads\\'


def extractMime(message):
    # Walk every MIME part and save any .zip attachment into tempdir
    for part in message.walk():
        if part.get_content_maintype() == 'multipart':
            continue
        fname = part.get_filename()
        if fname is not None:
            if fname.find(".zip") != -1:
                zipname = tempdir + fname
                # If the Zip file exists, skip the process
                if not os.path.isfile(zipname):
                    fp = open(zipname, 'wb')
                    fp.write(part.get_payload(decode=True))
                    fp.close()
                    print(zipname)


if __name__ == '__main__':
    for message in mailbox.mbox(maildir):
        extractMime(message)
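Instead of editing the random 8-character profile name by hand, the Inbox path could probably be looked up with a wildcard. The following is only a minimal sketch: `find_inboxes` is a hypothetical helper name, and it assumes the `<profile>\Mail\<account>\Inbox` layout seen in the `maildir` path of the script. Other Thunderbird setups (IMAP accounts, for example) may store mail elsewhere.

```python
import glob
import os


def find_inboxes(profiles_root, account='pop.example.com'):
    """Return every Inbox file found under any profile in profiles_root.

    Assumes the layout <profiles_root>/<profile>/Mail/<account>/Inbox,
    which is taken from the hard-coded path in the script, not from
    any Thunderbird specification.
    """
    pattern = os.path.join(profiles_root, '*', 'Mail', account, 'Inbox')
    # Keep only regular files; a directory named Inbox is not an mbox file
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
```

On Windows it could be called as `find_inboxes(os.path.join(os.environ['APPDATA'], 'Thunderbird', 'Profiles'))`, and the first result passed to `mailbox.mbox()` in place of `maildir`.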
Besides saving the Zip file, the script can also read the compressed log file inside it and process it as needed, but that makes the source code rather cluttered, so I will add it if someone asks for it in the comments.
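As a rough idea of what that step might look like, here is a minimal sketch that reads the lines of a Zip attachment in memory with the standard `zipfile` module. It is not the omitted code itself: `iter_zip_lines` is a hypothetical name, the UTF-8 encoding is an assumption about the log files, and it is written for Python 3.

```python
import io
import zipfile


def iter_zip_lines(zip_bytes, encoding='utf-8'):
    """Yield (member name, text line) for every file inside a Zip archive.

    zip_bytes is the raw archive, e.g. part.get_payload(decode=True).
    The encoding is an assumption; adjust it to match the actual logs.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip directory entries
            with zf.open(info) as member:
                for raw in member:
                    line = raw.decode(encoding, errors='replace')
                    yield info.filename, line.rstrip('\r\n')
```

Inside `extractMime`, something like `for name, line in iter_zip_lines(part.get_payload(decode=True)):` would then let each log line be processed without writing the archive to disk first.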