I wanted to fetch emails with a specific label from Gmail, so here are my notes on doing it in Python.

Environment

- Python 3.2.5
- Uses imaplib to connect to Gmail
- Verified on Windows

Login and label

Connecting and selecting a label are simple.

import imaplib, email, email.header, dateutil.parser
email_default_encoding = 'iso-2022-jp'

def main():
    gmail = imaplib.IMAP4_SSL("imap.gmail.com")
    gmail.login("user","password")
    gmail.select('INBOX') #Select the inbox
    gmail.select('register') #Or select a label; the latest select() replaces the previous one
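
As far as I can tell, select() reports a missing label through its return value rather than by raising, so it can be worth checking before searching. A minimal sketch, assuming a helper named select_label (hypothetical, not part of imaplib) that receives an already logged-in connection:

```python
import imaplib


def select_label(conn, label):
    """Select a mailbox or Gmail label and return how many messages it holds.

    `conn` is assumed to be a logged-in imaplib.IMAP4_SSL object.
    `select_label` itself is a hypothetical helper for this memo.
    """
    typ, data = conn.select(label)
    if typ != "OK":
        # A label that does not exist typically comes back as "NO".
        raise RuntimeError("could not select %r: %r" % (label, data))
    # On success the first item is the message count as bytes, e.g. b"12".
    return int(data[0])


# Usage sketch (needs real credentials, so it is left commented out):
# gmail = imaplib.IMAP4_SSL("imap.gmail.com")
# gmail.login("user", "password")
# count = select_label(gmail, "register")
```

Failing early here gives a clearer error than the one raised later by .search() when no mailbox is selected.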

Get email

Search for mail with .search(). Specify ALL to get every message, or UNSEEN to get only the unread ones. For other search keys, be sure to look at the IMAP4 specification (it is in English): INTERNET MESSAGE ACCESS PROTOCOL - VERSION 4rev1

    typ, [data] = gmail.search(None, "(UNSEEN)")
    #typ, [data] = gmail.search(None, "(ALL)")
    
    #Verification
    if typ == "OK":
        if data: #data is bytes; it is empty when nothing matched
            print("New Mail")
        else:
            print("None")

    #Processing of acquired mail list
    for num in data.split():
        ###Processing to each email###
        pass #Placeholder so that the loop body is valid
    
    #Clean up
    gmail.close()
    gmail.logout()

The middle part only checks whether anything was found. After that comes the per-mail processing, and finally the connection is closed. In the following sections, the per-mail processing part is filled in to get the sender, the title, and the body.
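
Search keys such as UNSEEN, SINCE, and FROM can be combined in a single .search() call. The sketch below builds the criteria string; build_criteria and imap_date are hypothetical helper names, and the sender value is assumed not to contain double quotes.

```python
import datetime

# English month abbreviations, written out so the result does not depend on locale.
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_date(d):
    """Format a date as d-Mon-yyyy, the form used by keys such as SINCE."""
    return "%d-%s-%d" % (d.day, _MONTHS[d.month - 1], d.year)


def build_criteria(unseen=False, since=None, sender=None):
    """Build a criteria string to pass as gmail.search(None, criteria)."""
    parts = []
    if unseen:
        parts.append("UNSEEN")
    if since is not None:
        parts.append("SINCE " + imap_date(since))
    if sender:
        parts.append('FROM "%s"' % sender)
    if not parts:
        parts.append("ALL")
    return "(" + " ".join(parts) + ")"


print(build_criteria(unseen=True, since=datetime.date(2020, 1, 5)))
```

The result would be passed as, for example, gmail.search(None, build_criteria(unseen=True)).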

Get email content

Getting the character encoding and parsing

Since .search() returns only the ids of the matching emails, use .fetch() with an id to access the whole email. First you need the character encoding of the email. After parsing the mail with email.message_from_string, look at the Subject header: if it specifies a character encoding, use that, otherwise fall back to the default, iso-2022-jp. The raw mail is then decoded again with that encoding and parsed a second time. I suspect there is a better way to write this part ...

    for num in data.split():
        ###Processing to each email###
        result, d = gmail.fetch(num, "(RFC822)")
        raw_email = d[0][1]

        #First pass: parse only to find the character encoding
        msg = email.message_from_string(raw_email.decode('utf-8'))
        msg_encoding = email.header.decode_header(msg.get('Subject', ''))[0][1] or email_default_encoding
        #Second pass: decode with that encoding and parse again for analysis
        msg = email.message_from_string(raw_email.decode(msg_encoding))

        print(msg.keys())

You can check which headers are available with msg.keys().

['Delivered-To', 'Received', 'X-Received', 'Return-Path', 'Received', 'Received-SPF', 'Authentication-Results', 'DKIM-Signature', 'Subject', 'From', 'To', 'Errors-To', 'MIME-Version', 'Date', 'X-Mailer', 'X-Priority', 'Content-Type', 'Message-ID', 'X-Antivirus', 'X-Antivirus-Status']
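
One possible way to avoid decoding the raw mail twice is email.message_from_bytes, which parses the bytes from .fetch() directly. This is only a sketch of an alternative, not what the code above does; parse_raw_email is a hypothetical helper name, and the test message is built locally instead of being fetched from Gmail.

```python
import email
import email.header


def parse_raw_email(raw_email, default_encoding="iso-2022-jp"):
    """Parse the bytes returned by fetch() and guess the header encoding."""
    msg = email.message_from_bytes(raw_email)
    subject_parts = email.header.decode_header(msg.get("Subject", ""))
    # Use the charset declared in the Subject header, else the default.
    msg_encoding = subject_parts[0][1] or default_encoding
    return msg, msg_encoding


# Build a small message locally so the example runs without a mailbox.
subject = "\u30c6\u30b9\u30c8"  # the word "test" written in katakana
encoded_subject = email.header.Header(subject, "iso-2022-jp").encode()
raw = ("Subject: %s\r\nFrom: someone@example.com\r\n\r\nbody\r\n"
       % encoded_subject).encode("ascii")

msg, msg_encoding = parse_raw_email(raw)
print(msg_encoding)
print(msg.keys())
```

The header values still need decode_header afterwards, so the rest of the processing stays the same.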

Get sender / title

Next, get the sender. I had a hard time here.

        fromObj = email.header.decode_header(msg.get('From'))
        addr = ""
        for f in fromObj:
            if isinstance(f[0],bytes):
                addr += f[0].decode(msg_encoding)
            else:
                addr += f[0]
        print(addr)

If you parse a From header whose display name is in Japanese, such as "Sender <[email protected]>", the result seems to come back split into parts like this:

fromObj[0][0] : b'xxxxxxxxx' (the encoded display name)
fromObj[0][1] : 'iso-2022-jp' (its character encoding)
fromObj[1][0] : b'[email protected]' (the address part)
fromObj[1][1] : None (no character encoding, because it is alphanumeric only)

On the other hand, if there is no Japanese in the header, as in "Sashidashi <[email protected]>", it is not split:

fromObj[0][0] : 'Sashidashi<[email protected]>'
fromObj[0][1] : None

In this case the value is a str rather than bytes. So I build the whole string with a loop plus a type check.

I did the same for the title and it worked.

        subject = email.header.decode_header(msg.get('Subject'))
        title = ""
        for sub in subject:
            if isinstance(sub[0],bytes):
                title += sub[0].decode(msg_encoding)
            else:
                title += sub[0]
        print(title)
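
Because decode_header returns a character encoding for each chunk, the sender and title loops can be folded into one helper that prefers the chunk's own encoding. This is a sketch under that assumption; decode_mime_header is a hypothetical name, and an unknown encoding name in a header would still raise LookupError.

```python
import email.header


def decode_mime_header(value, default_encoding="iso-2022-jp"):
    """Decode a raw header value such as From or Subject into one str."""
    text = ""
    for chunk, charset in email.header.decode_header(value):
        if isinstance(chunk, bytes):
            # Prefer the charset declared for this chunk, then the default.
            text += chunk.decode(charset or default_encoding, errors="replace")
        else:
            # Headers without any encoded part come back as str already.
            text += chunk
    return text


name = "\u5dee\u51fa\u4eba"  # "sender" written in Japanese
encoded_name = email.header.Header(name, "iso-2022-jp").encode()
print(decode_mime_header(encoded_name + " <someone@example.com>"))
print(decode_mime_header("Sashidashi <someone@example.com>"))
```

In the loop above this would be used as addr = decode_mime_header(msg.get('From')) and title = decode_mime_header(msg.get('Subject')).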

Note that for some emails the sender is not decoded correctly when it contains Japanese. I will cover that in another article.

Addendum: I wrote it up. ⇒ What to do when the sender name is not decoded occasionally in the Python email library

Get date / body

Get the date and change its format. If you want a format such as yyyy/MM/dd, it is easy with dateutil.

        date = dateutil.parser.parse(msg.get('Date')).strftime("%Y/%m/%d %H:%M:%S")
        print(date)
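
If installing dateutil is not an option, the standard library can parse the Date header as well. A small sketch, assuming Python 3.3 or later (email.utils.parsedate_to_datetime is not available in 3.2); format_mail_date is a hypothetical helper name.

```python
import email.utils


def format_mail_date(date_header):
    """Turn an RFC 2822 Date header into 'YYYY/MM/DD HH:MM:SS'."""
    parsed = email.utils.parsedate_to_datetime(date_header)
    # No timezone conversion is done: the sender's local time is printed.
    return parsed.strftime("%Y/%m/%d %H:%M:%S")


print(format_mail_date("Mon, 20 Nov 1995 19:12:08 -0500"))
```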

Getting the body also needs a branch. You can get it with .get_payload(), but a mail sent in HTML format contains both a plain-text part and an HTML part, so only the text/plain part is taken out.

        body = ""
        if msg.is_multipart():
            for payload in msg.get_payload():
                if payload.get_content_type() == "text/plain":
                    body = payload.get_payload()
        else:
            if msg.get_content_type() == "text/plain":
                body = msg.get_payload()
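
When a part is base64 or quoted-printable encoded, .get_payload() without arguments returns the still-encoded text. The sketch below decodes the text/plain part using the charset declared on that part. extract_plain_text is a hypothetical helper name, the demo message is built locally, and text attachments are not treated specially.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def extract_plain_text(msg, default_charset="iso-2022-jp"):
    """Return the first text/plain part of a message as a decoded str."""
    # walk() visits the message itself and every nested part.
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        raw = part.get_payload(decode=True)  # undo base64 / quoted-printable
        if raw is None:
            continue
        charset = part.get_content_charset() or default_charset
        return raw.decode(charset, errors="replace")
    return ""


# Local demo: an HTML mail that carries both a plain and an HTML part.
text = "\u3053\u3093\u306b\u3061\u306f"  # "hello" in Japanese
demo = MIMEMultipart("alternative")
demo.attach(MIMEText(text, "plain", "utf-8"))
demo.attach(MIMEText("<p>" + text + "</p>", "html", "utf-8"))
parsed = email.message_from_string(demo.as_string())
body = extract_plain_text(parsed)
print(body)
```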

Setting and deleting labels

I struggled here because I did not know how to set and remove labels.

        #Mark as unread (remove the \Seen flag)
        gmail.store(num, '-FLAGS','\\SEEN')

        #Add label
        gmail.store(num, '+X-GM-LABELS','added')
        #Remove label
        gmail.store(num, '-X-GM-LABELS','added')

It seems this can be done by specifying "X-GM-LABELS": prefix it with "+" to add a label and with "-" to remove one. Source: Gmail IMAP Extensions
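
The store calls can be wrapped so that the result is checked and label names containing spaces are quoted. This is a rough sketch: quote_label and change_label are hypothetical helpers, it assumes Gmail accepts a double-quoted label name, and it only covers user-created labels with ASCII names.

```python
def quote_label(label):
    """Wrap a label in double quotes so that spaces survive the STORE command."""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


def change_label(conn, num, label, add=True):
    """Add or remove a Gmail label on message `num`.

    `conn` is assumed to be a logged-in imaplib connection with a label selected.
    """
    command = "+X-GM-LABELS" if add else "-X-GM-LABELS"
    typ, data = conn.store(num, command, quote_label(label))
    if typ != "OK":
        raise RuntimeError("%s failed for %r: %r" % (command, label, data))
    return data


# Usage sketch inside the per-mail loop:
# change_label(gmail, num, "added")             # add the label
# change_label(gmail, num, "added", add=False)  # remove it again
```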

References

Including those mentioned above:
- INTERNET MESSAGE ACCESS PROTOCOL - VERSION 4rev1
- Gmail IMAP Extensions
- IMAP4 (Internet Mail Access Protocol version 4) - Part 1
- IMAP4 (Internet Mail Access Protocol version 4) - Part 2
- email - Package for Email and MIME Processing

Honestly, this is difficult unless you understand IMAP4 properly.

Get mail from Gmail and label it with Python3