I wanted to pull emails with a specific label from Gmail, so I wrote up these notes on doing it in Python.
Connecting and selecting a label is simple.
import imaplib, re, email, email.header, six, dateutil.parser
email_default_encoding = 'iso-2022-jp'
def main():
    gmail = imaplib.IMAP4_SSL("imap.gmail.com")
    gmail.login("user", "password")
    gmail.select('INBOX')  # Specify your inbox
    gmail.select('register')  # Or specify the label (this replaces the selection above)
Get the mail with .search().
Specify ALL to get every message, or UNSEEN to get only the unread ones.
For other search criteria, be sure to look at the IMAP4 specification, although it is in English.
INTERNET MESSAGE ACCESS PROTOCOL - VERSION 4rev1
    typ, [data] = gmail.search(None, "(UNSEEN)")
    #typ, [data] = gmail.search(None, "(ALL)")
    #Verification
    if typ == "OK":
        if data:  # data is bytes; it is empty when nothing matched
            print("New Mail")
        else:
            print("Non")
    #Processing of acquired mail list
    for num in data.split():
        ###Processing to each email###
        pass
    #Clean up
    gmail.close()
    gmail.logout()
The middle part just checks whether anything was found. After that comes the processing for each mail, and finally the cleanup. Next, we will get the sender, title, and body in the per-mail processing part.
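As a small sketch of the check described above: .search() hands back a status plus a list holding one space-separated bytes string of message numbers. The helper name message_numbers is my own, and the canned responses below stand in for a real server reply.

```python
def message_numbers(typ, data):
    """Return message numbers from an imaplib-style search() result.

    `data` is expected to look like [b'1 2 3'], or [b''] when nothing matched.
    """
    if typ != "OK" or not data or not data[0]:
        return []
    return data[0].split()


# Canned responses standing in for gmail.search(None, "(UNSEEN)")
print(message_numbers("OK", [b"1 2 3"]))  # [b'1', b'2', b'3']
print(message_numbers("OK", [b""]))       # []
```

Returning an empty list for both a failed status and an empty result lets the per-mail loop run unchanged in every case.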
Since .search() only returns the ids of the matching emails, use .fetch() with each id to access the entire email.
First you need to get the character code of the email.
After parsing the mail with email.message_from_string, look at the title (Subject) header; if a character code is specified there, use it, otherwise use the default value iso-2022-jp.
After that, I decode again with that character code, but I suspect there is a better way to write this ...
    for num in data.split():
        ###Processing to each email###
        result, d = gmail.fetch(num, "(RFC822)")
        raw_email = d[0][1]
        #For character code acquisition
        msg = email.message_from_string(raw_email.decode('utf-8'))
        msg_encoding = email.header.decode_header(msg.get('Subject'))[0][1] or 'iso-2022-jp'
        #Parse and prepare for analysis
        msg = email.message_from_string(raw_email.decode(msg_encoding))
        print(msg.keys())
You can check the available header items here with msg.keys().
['Delivered-To', 'Received', 'X-Received', 'Return-Path', 'Received', 'Received-SPF', 'Authentication-Results', 'DKIM-Signature', 'Subject', 'From', 'To', 'Errors-To', 'MIME-Version', 'Date', 'X-Mailer', 'X-Priority', 'Content-Type', 'Message-ID', 'X-Antivirus', 'X-Antivirus-Status']
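One possibly simpler way to write the parsing step is to hand the fetched bytes straight to email.message_from_bytes, which avoids decoding the whole message twice. This is only a sketch: the sample message is built locally in place of the d[0][1] returned by gmail.fetch(), and its subject and address are made up.

```python
import email
import email.header
from email.mime.text import MIMEText

# Build a sample message in place of d[0][1] from gmail.fetch() (hypothetical data).
sample = MIMEText("本文", "plain", "iso-2022-jp")
sample["Subject"] = email.header.Header("テスト件名", "iso-2022-jp")
sample["From"] = "sender@example.com"
raw_email = sample.as_bytes()

# Parse the bytes directly; no need to guess a charset for the whole message.
msg = email.message_from_bytes(raw_email)

# Charset reported for the first Subject fragment, with the article's default as fallback.
subject_parts = email.header.decode_header(msg.get("Subject"))
msg_encoding = subject_parts[0][1] or "iso-2022-jp"
print(msg_encoding)
print(msg.keys())
```

The headers are still MIME-encoded after parsing, so the decode_header step for Subject and From is needed either way.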
Next, get the sender. I had a hard time here.
fromObj = email.header.decode_header(msg.get('From'))
addr = ""
for f in fromObj:
    if isinstance(f[0], bytes):
        addr += f[0].decode(msg_encoding)
    else:
        addr += f[0]
print(addr)
If you parse something like "Sender <[email protected]>" (with the display name in Japanese), it seems you get the result in this format:
fromObj[0][0]: b'xxxxxxxxx' ... the encoded "From" name
fromObj[0][1]: 'iso-2022-jp' ... character code
fromObj[1][0]: b'[email protected]' ... address part
fromObj[1][1]: None ... no character code, because it is alphanumeric only
Also, if there is no Japanese in the header, as in "Sashidashi <[email protected]>", it is not split up:
fromObj[0][0] : 'Sashidashi<[email protected]>'
fromObj[0][1] : None
In this case the value is a str. So I build the whole string with a loop plus a type check.
I did the same for the title and it worked.
subject = email.header.decode_header(msg.get('Subject'))
title = ""
for sub in subject:
    if isinstance(sub[0], bytes):
        title += sub[0].decode(msg_encoding)
    else:
        title += sub[0]
print(title)
In addition, in some emails the sender part is not decoded correctly when it contains Japanese. I will cover that in another article.
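The loop plus type check used for From and Subject can be folded into one helper. As a guess at one cause of the decoding trouble (not something I have confirmed), this sketch decodes each fragment with the charset that decode_header reports for that fragment, instead of reusing the Subject's charset everywhere. The name decode_mime_header and the address sender@example.com are placeholders of mine.

```python
import email.header
import email.utils


def decode_mime_header(value, default="iso-2022-jp"):
    """Decode a raw header value, using each fragment's own charset."""
    text = ""
    for part, charset in email.header.decode_header(value):
        if isinstance(part, bytes):
            # Fall back to the default only when no charset was declared.
            text += part.decode(charset or default, errors="replace")
        else:
            text += part
    return text


# Hypothetical header value: a Japanese display name plus a placeholder address.
encoded_name = email.header.Header("差出人", "iso-2022-jp").encode()
raw_from = encoded_name + " <sender@example.com>"

decoded = decode_mime_header(raw_from)
name, address = email.utils.parseaddr(decoded)
print(name, address)
```

email.utils.parseaddr then splits the decoded string into the display name and the address, which is handy if only the address is needed.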
Get the date and change the format.
If you want a format such as yyyy/MM/dd HH:mm:ss, it's easy with dateutil.
date = dateutil.parser.parse(msg.get('Date')).strftime("%Y/%m/%d %H:%M:%S")
print(date)
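If you would rather not depend on dateutil, the standard library's email.utils.parsedate_to_datetime can parse the same header. Note that this is a different technique from the dateutil call above, shown only as an alternative sketch; the sample Date value is made up and stands in for msg.get('Date').

```python
from email.utils import parsedate_to_datetime

# Sample Date header value (hypothetical; real code would pass msg.get('Date')).
date_header = "Mon, 20 Nov 1995 19:12:08 -0500"

parsed = parsedate_to_datetime(date_header)
date = parsed.strftime("%Y/%m/%d %H:%M:%S")
compact = parsed.strftime("%Y%m%d")
print(date)     # 1995/11/20 19:12:08
print(compact)  # 19951120
```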
Getting the body also requires a branch.
You can get it with .get_payload(), but for mail sent in HTML format both a text part and an HTML part are returned, so take out the text/plain one.
body = ""
if msg.is_multipart():
    for payload in msg.get_payload():
        if payload.get_content_type() == "text/plain":
            body = payload.get_payload()
else:
    if msg.get_content_type() == "text/plain":
        body = msg.get_payload()
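A variation on the branch above, as a sketch: msg.walk() visits nested parts as well, and get_payload(decode=True) undoes base64 or quoted-printable so the bytes can be decoded with the part's own charset. The helper name extract_plain_text and the sample message are assumptions of mine, not part of the original flow.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def extract_plain_text(msg, default="iso-2022-jp"):
    """Return the first text/plain part of a message as str, or '' if none."""
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # undo base64 / quoted-printable
        if payload is None:
            continue
        charset = part.get_content_charset() or default
        return payload.decode(charset, errors="replace")
    return ""


# Hypothetical HTML mail with a plain-text alternative.
sample = MIMEMultipart("alternative")
sample.attach(MIMEText("こんにちは", "plain", "iso-2022-jp"))
sample.attach(MIMEText("<p>こんにちは</p>", "html", "iso-2022-jp"))
msg = email.message_from_bytes(sample.as_bytes())

body = extract_plain_text(msg)
print(body)
```

Because walk() also yields the message itself, the same helper covers the non-multipart case without a separate branch.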
I had a hard time figuring out how to set and delete labels.
#Mark as unread
gmail.store(num, '-FLAGS','\\SEEN')
#Add label
gmail.store(num, '+X-GM-LABELS','added')
#Remove label
gmail.store(num, '-X-GM-LABELS','added')
It can be done by specifying "X-GM-LABELS": "+" to add, "-" to delete. Source: Gmail IMAP Extensions
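One thing that may trip you up: as far as I can tell, imaplib passes the label to the server as-is, so a label containing a space probably needs to be sent as a quoted string. The helper below is a hypothetical sketch for user-defined ASCII labels only; it does not handle system labels that start with a backslash or non-ASCII label names.

```python
def gmail_label_store_args(label, add=True):
    """Build the (command, value) pair for gmail.store(num, command, value)."""
    command = "+X-GM-LABELS" if add else "-X-GM-LABELS"
    # Quote the label as an IMAP quoted string so spaces survive.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return command, '"' + escaped + '"'


print(gmail_label_store_args("added"))
print(gmail_label_store_args("my label", add=False))
```

Usage would look like gmail.store(num, *gmail_label_store_args("my label")), assuming gmail is the logged-in connection from earlier.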
References, including those mentioned above:
INTERNET MESSAGE ACCESS PROTOCOL - VERSION 4rev1
Gmail IMAP Extensions
IMAP4 (Internet Mail Access Protocol version 4) - Part 1
IMAP4 (Internet Mail Access Protocol version 4) - Part 2
email - Package for Email and MIME Processing
After all, IMAP4 is difficult unless you understand it properly.