How to Parse MSG Files in Python


aspose-email-foss for Python provides a pure-Python API for reading Outlook MSG files without Microsoft Office dependencies. Load a file into a MapiMessage object to access all message data.

Step-by-Step Guide

Step 1: Install the Package

pip install aspose-email-foss

Requires Python 3.10 or later.


Step 2: Import the MapiMessage Class

from aspose.email_foss.msg import MapiMessage

Step 3: Load an MSG File

msg = MapiMessage.from_file("message.msg")

For lenient parsing of malformed files, pass strict=False:

msg = MapiMessage.from_file("message.msg", strict=False)
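
One way to combine the two modes is to try strict parsing first and fall back to lenient parsing only when that fails. The sketch below is illustrative: `load_with_fallback` is a hypothetical helper, not part of aspose-email-foss. It assumes that strict loading raises an exception on a malformed file and that `strict=True` may be passed explicitly, neither of which this guide states. The loader is passed in as a parameter so the pattern can be tried without the package installed.

```python
from typing import Any, Callable, Tuple


def load_with_fallback(path: str, loader: Callable[..., Any]) -> Tuple[Any, bool]:
    """Try strict parsing first; retry leniently if it fails.

    Returns (message, used_lenient). `loader` is expected to accept
    (path, strict=...) in the way MapiMessage.from_file does.
    Assumption: strict parsing raises an exception on malformed input.
    """
    try:
        return loader(path, strict=True), False
    except Exception:
        # The exception type is not documented in this guide, so the
        # sketch catches broadly; narrow it once the type is known.
        return loader(path, strict=False), True
```

With the package installed, a call might look like `msg, used_lenient = load_with_fallback("message.msg", MapiMessage.from_file)`.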

Step 4: Access Message Properties

print(f"Subject: {msg.subject}")
print(f"Body: {msg.body}")
print(f"HTML Body: {msg.body_html[:200] if msg.body_html else 'None'}")
print(f"Message Class: {msg.message_class}")

Step 5: List Attachments

for att in msg.iter_attachments_info():
    name = att.storage_name
    is_embedded = att.is_embedded_message
    print(f"Attachment: {name}, embedded={is_embedded}")
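
When a message carries many attachments, it can help to separate embedded messages from ordinary files. The following sketch relies only on the two attributes used in this step, `storage_name` and `is_embedded_message`; `split_attachments` is a hypothetical helper name, not part of the library.

```python
from typing import Any, Dict, Iterable, List


def split_attachments(infos: Iterable[Any]) -> Dict[str, List[str]]:
    """Group attachment storage names by whether they are embedded messages."""
    groups: Dict[str, List[str]] = {"embedded": [], "regular": []}
    for info in infos:
        key = "embedded" if info.is_embedded_message else "regular"
        groups[key].append(info.storage_name)
    return groups
```

A typical call would be `split_attachments(msg.iter_attachments_info())`.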

Step 6: Inspect Low-Level CFB Structure

from aspose.email_foss.cfb import CFBReader

reader = CFBReader.from_file("message.msg")
try:
    print(f"Directory entries: {reader.directory_entry_count}")
    for entry in reader.iter_streams():
        print(f"  Stream: {entry}")
finally:
    # Close the reader even if iteration raises.
    reader.close()

Common Issues and Fixes

CFBError when loading

A CFBError on load means the file is not a valid CFB (Compound File Binary) container. Verify that it is an actual Outlook MSG file rather than an EML file.
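
A quick pre-check can tell the two formats apart before calling the library. Every CFB container begins with the same 8-byte signature, whereas an EML file is plain text. The sketch below uses only the standard library; `looks_like_cfb` is a hypothetical helper name, not part of aspose-email-foss. A matching signature shows only that the file is a CFB container, not that it is a well-formed MSG.

```python
from pathlib import Path

# Signature that opens every Compound File Binary (CFB) container.
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_cfb(path: str) -> bool:
    """Return True if the file starts with the 8-byte CFB signature."""
    with Path(path).open("rb") as fh:
        return fh.read(len(CFB_MAGIC)) == CFB_MAGIC
```

If this returns False for a file with a .msg extension, the file is likely in another format, such as EML, and has been misnamed.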

Body is empty but HTML body has content

Some messages store their content only as HTML. Fall back to msg.body_html when msg.body is None.
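
The fallback can be wrapped in a small function so that callers always get whichever body exists. This is a minimal sketch that uses only the `body` and `body_html` attributes shown in Step 4; `pick_body` is a hypothetical helper name.

```python
from typing import Any, Optional, Tuple


def pick_body(msg: Any) -> Tuple[Optional[Any], str]:
    """Return (content, kind), preferring plain text over HTML."""
    if msg.body:
        return msg.body, "text"
    if msg.body_html:
        return msg.body_html, "html"
    # Neither body is present (None or empty).
    return None, "none"
```

The second element of the result tells the caller whether the content still needs HTML handling.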

Validation warnings

Access msg.validation_issues to see a tuple of compliance warnings for the loaded file.
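
To surface these warnings in logs, the tuple can be turned into numbered lines. The sketch below assumes only that `validation_issues` is iterable and that each item has a usable string form; the element type is not specified in this guide, and `format_validation_issues` is a hypothetical helper name.

```python
from typing import Any, List


def format_validation_issues(msg: Any) -> List[str]:
    """Return one printable line per validation issue (empty list if none)."""
    # Each issue is rendered via its string form because its exact
    # type is not specified here.
    return [
        f"warning {number}: {issue}"
        for number, issue in enumerate(msg.validation_issues, start=1)
    ]
```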


Frequently Asked Questions (FAQ)

Can I read EML files?

Not directly. The library handles the MSG (CFB) format. Parse the EML content into an EmailMessage object first, then pass that object to MapiMessage.from_email_message().
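
The two-step conversion might look like the sketch below. It assumes that the EmailMessage in question is the standard library's `email.message.EmailMessage` and that `from_email_message()` accepts it directly; this guide does not confirm either point. `parse_eml` and `eml_to_mapi` are hypothetical helper names.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def parse_eml(data: bytes) -> EmailMessage:
    """Parse raw EML bytes into a standard-library EmailMessage."""
    # policy.default makes the parser build EmailMessage objects
    # instead of the legacy Message class.
    return message_from_bytes(data, policy=policy.default)


def eml_to_mapi(path: str):
    """Convert an EML file to a MapiMessage (requires aspose-email-foss)."""
    # Imported lazily so parse_eml() works without the package installed.
    from aspose.email_foss.msg import MapiMessage

    with open(path, "rb") as fh:
        # Assumption: from_email_message() takes a stdlib EmailMessage.
        return MapiMessage.from_email_message(parse_eml(fh.read()))
```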

Does loading read all attachment data into memory?

Yes. All attachment data, including binary content, is in memory by the time MapiMessage.from_file() returns. iter_attachments_info() is a convenience iterator over the already-loaded attachments list.

Is it thread-safe?

Each MapiMessage instance is independent. Concurrent reads from separate instances are safe.
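
A pattern consistent with this is to give every worker thread its own instance by loading inside the worker. The sketch below is illustrative: `subjects_in_parallel` is a hypothetical helper, the loader is passed in so the pattern can be tried without the package installed, and it assumes that concurrent calls to the loader itself are safe, which the answer above does not explicitly cover.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable


def subjects_in_parallel(
    paths: Iterable[str],
    loader: Callable[[str], Any],
    max_workers: int = 4,
) -> Dict[str, Any]:
    """Load each path in a worker thread and collect the subject per path."""

    def read_subject(path: str) -> Any:
        # Each call builds its own message instance, so no message
        # object is shared between threads.
        return loader(path).subject

    path_list = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each path
        # with its own result.
        return dict(zip(path_list, pool.map(read_subject, path_list)))
```

With the package installed, `subjects_in_parallel(["a.msg", "b.msg"], MapiMessage.from_file)` would return a mapping from each path to its subject.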
