Sample for handling eml files in Python

eml files (saved email messages) can be parsed with Python's email package, which is part of the standard library, so no extra installation is needed and the analysis is straightforward.
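As a minimal sketch of that idea, the snippet below parses a small hand-written message and decodes its subject. The addresses, subject, and body are made up for illustration; only `email.message_from_bytes`, `decode_header`, and `make_header` from the standard library are used.

```python
import email
from email.header import decode_header, make_header

# Hypothetical raw message, written by hand only for this illustration.
raw = (
    b"From: alice@example.com\r\n"
    b"To: bob@example.com\r\n"
    b"Subject: =?UTF-8?B?44GT44KT44Gr44Gh44Gv?=\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Hello\r\n"
)

# Build an email.message.Message instance from the raw bytes.
message = email.message_from_bytes(raw)

# Headers may be RFC 2047 encoded, so decode them before display.
subject = str(make_header(decode_header(message["Subject"])))

print(message["From"])
print(subject)
# decode=True undoes the transfer encoding and returns bytes.
print(message.get_payload(decode=True).decode("utf-8"))
```

For a file on disk, the same call works on the bytes read with `open(path, 'rb')`.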

I wrote a class that extracts the attachments, subject, body, and other fields from an eml file.


# coding:utf-8
"""
Sample that extracts data from an eml file so that it is easy to handle.

This is a minimal implementation, so some cases may not be covered.
"""
import sys
import email
from email.header import decode_header

class MailParser(object):
    """
    A class that takes the path of a mail file and parses it
    """

    def __init__(self, mail_file_path):
        self.mail_file_path = mail_file_path
        # Get an email.message.Message instance from the eml file
        with open(mail_file_path, 'rb') as email_file:
            self.email_message = email.message_from_bytes(email_file.read())
        self.subject = None
        self.to_address = None
        self.cc_address = None
        self.from_address = None
        self.body = ""
        # Attachment information
        # {name: file_name, data: data}
        self.attach_file_list = []
        # Parse the eml
        self._parse()

    def get_attr_data(self):
        """
        Return the email data as a formatted string
        """
        result = """\
FROM: {}
TO: {}
CC: {}
-----------------------
BODY:
{}
-----------------------
ATTACH_FILE_NAME:
{}
""".format(
            self.from_address,
            self.to_address,
            self.cc_address,
            self.body,
            ",".join([ x["name"] for x in self.attach_file_list])
        )
        return result


    def _parse(self):
        """
        Parse the mail file.
        Called from __init__.
        """
        self.subject = self._get_decoded_header("Subject")
        self.to_address = self._get_decoded_header("To")
        self.cc_address = self._get_decoded_header("Cc")
        self.from_address = self._get_decoded_header("From")

        # Process the message body parts
        for part in self.email_message.walk():
            # A multipart container holds its real content in nested parts,
            # so skip the container itself
            if part.get_content_maintype() == 'multipart':
                continue
            # Get the file name
            attach_fname = part.get_filename()
            payload = part.get_payload(decode=True)
            # A part without a file name is treated as the body
            if not attach_fname:
                if payload is None:
                    continue
                # Fall back to UTF-8 when the part declares no charset
                charset = part.get_content_charset() or "utf-8"
                self.body += payload.decode(charset, errors="replace")
            else:
                # A part with a file name is an attachment; keep its data
                self.attach_file_list.append({
                    "name": attach_fname,
                    "data": payload
                })

    def _get_decoded_header(self, key_name):
        """
        Get the decoded value of a header
        """
        ret = ""

        # Return an empty string when the header is not present
        raw_obj = self.email_message.get(key_name)
        if raw_obj is None:
            return ""
        # Decode each fragment into a str
        for fragment, encoding in decode_header(raw_obj):
            if not hasattr(fragment, "decode"):
                ret += fragment
                continue
            # If no encoding is given, decode as UTF-8 for the time being
            if encoding:
                ret += fragment.decode(encoding)
            else:
                ret += fragment.decode("UTF-8")
        return ret

if __name__ == "__main__":
    result = MailParser(sys.argv[1]).get_attr_data()
    print(result)

For now, this produces the expected results. I hope it helps when you need to handle emails.
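As a point of comparison, newer Python 3 versions also let the standard library do more of this work when the message is parsed with `email.policy.default`. The following is only a rough sketch of that alternative, not the approach used in the class above; the function name `summarize_eml` and the shape of the returned dict are assumptions made for this example.

```python
from email import policy
from email.parser import BytesParser


def summarize_eml(raw_bytes):
    """Return subject, body, and attachments from raw eml bytes.

    Hypothetical helper for illustration only.
    """
    # policy.default yields EmailMessage objects whose headers
    # are already decoded when accessed.
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)

    # Prefer the plain-text body and fall back to HTML.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""

    # iter_attachments() skips the body and yields the remaining parts.
    attachments = [
        {"name": part.get_filename(), "data": part.get_payload(decode=True)}
        for part in msg.iter_attachments()
    ]

    return {
        "subject": str(msg.get("Subject", "")),
        "body": body,
        "attachments": attachments,
    }
```

Calling it with the bytes of an eml file, for example `summarize_eml(open(path, 'rb').read())`, returns a dict whose `attachments` entries have the same `name` and `data` keys as `attach_file_list` above.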
