Email Classification Tool Using Python

Hey Everyone,

I am not sure if I am posting in the correct category.

I’m not looking for help with a piece of code but rather some advice regarding an automation project. My department at work receives daily emails from clients in which they provide us with market values for funds which we manage. These emails come with different types of attachments (Excel, PDF and text files), whereas some of them include the market value information in the body of the email instead. They arrive in a group mailbox which also receives a vast number of other emails that are not relevant to this task.

My question is: is there a reliable way to build a machine learning algorithm or any other sorting tool that picks out these specific market value emails and then extracts the market values to an Excel spreadsheet, as opposed to manually sorting through these emails every day? I am merely looking for someone to point me in the appropriate direction, as I am new to using Python. Any feedback would be greatly appreciated!

Please advise if I am being too vague or if more information is required.


Hi @duke7807 ,

Interesting… How are you planning on reading the mails? Directly through IMAP or via an HTTP API?

While there are definitely libraries to handle PDF, Excel and other file formats (just type “excel” into PyPI, for example), it will be a challenge to correctly extract the values from the attachments.
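
To give a rough idea of what that first step could look like, here is a minimal sketch that turns an attachment into plain text based on its file extension. It assumes the third-party packages `openpyxl` (for .xlsx) and `pypdf` (for .pdf) are installed; `attachment_to_text` is just an illustrative name I made up, not something from a library. Getting text out is the easy part, finding the right numbers in it is where the real work is.

```python
import io
from pathlib import Path


def attachment_to_text(filename: str, payload: bytes) -> str:
    """Return the textual content of an attachment, chosen by file extension."""
    suffix = Path(filename).suffix.lower()

    if suffix in {".txt", ".csv"}:
        # Plain text: decode, replacing undecodable bytes instead of failing.
        return payload.decode("utf-8", errors="replace")

    if suffix == ".xlsx":
        # Assumption: openpyxl is installed (pip install openpyxl).
        from openpyxl import load_workbook

        workbook = load_workbook(io.BytesIO(payload), read_only=True, data_only=True)
        lines = []
        for sheet in workbook.worksheets:
            for row in sheet.iter_rows(values_only=True):
                # Join the non-empty cells of each row with tabs.
                cells = [str(cell) for cell in row if cell is not None]
                if cells:
                    lines.append("\t".join(cells))
        return "\n".join(lines)

    if suffix == ".pdf":
        # Assumption: pypdf is installed (pip install pypdf).
        from pypdf import PdfReader

        reader = PdfReader(io.BytesIO(payload))
        return "\n".join(page.extract_text() or "" for page in reader.pages)

    raise ValueError(f"Unsupported attachment type: {suffix or filename}")
```

Scanned PDFs (images of pages) will come back empty this way and would need OCR, which is a separate problem.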

If you’ve found a way to extract the market values from attachments, then it should be quite easy to make a distinction between market value mails and other mails.
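
To illustrate the idea: once you have the text of a mail or attachment, a simple pattern can pull out the values, and "did we find any values?" can double as the filter. This is only a sketch. The "fund name: amount" layout, the pattern and the function names are all my assumptions, and your clients' mails will almost certainly need different (and probably several) patterns.

```python
import re
from decimal import Decimal
from typing import List, Optional, Tuple

# Assumed layout, one value per line: "<fund name>: <amount> [<currency code>]",
# e.g. "Global Equity Fund: 1,234,567.89 USD". Adjust this to the real mails.
VALUE_PATTERN = re.compile(
    r"^\s*(?P<fund>[A-Za-z][\w &.-]*?)\s*:\s*"
    r"(?P<amount>\d{1,3}(?:,\d{3})*(?:\.\d+)?|\d+(?:\.\d+)?)"
    r"(?:\s*(?P<currency>[A-Z]{3}))?\s*$",
    re.MULTILINE,
)


def extract_market_values(text: str) -> List[Tuple[str, Decimal, Optional[str]]]:
    """Return (fund, amount, currency) for every line matching the assumed layout."""
    results = []
    for match in VALUE_PATTERN.finditer(text):
        # Drop thousands separators before converting to an exact decimal.
        amount = Decimal(match.group("amount").replace(",", ""))
        results.append((match.group("fund"), amount, match.group("currency")))
    return results


def is_market_value_mail(text: str) -> bool:
    """Treat a mail as relevant if at least one value could be extracted."""
    return bool(extract_market_values(text))
```

A pattern this loose will also match unrelated lines such as "Meeting at 10: 30", so in practice you would probably combine it with other hints like known sender addresses or fund names.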



Thanks for the reply! I was actually planning to sort the mails into a folder using Outlook rules and then use Python to read the mails and extract the necessary information. Do you have any recommendations on which Python libraries I could use to read mails and automatically download attachments?
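
For what it's worth, the standard library already covers a lot of this: `imaplib` can fetch raw messages from a mailbox, and the `email` package can parse them and hand you the body and the attachments. Below is a minimal sketch of the parsing half only. It assumes you already have the raw bytes of one message (fetched over IMAP or read from an exported .eml file), and `save_attachments` and `plain_text_body` are illustrative names, not library functions.

```python
from email import message_from_bytes, policy
from pathlib import Path
from typing import List


def save_attachments(raw_message: bytes, target_dir: Path) -> List[Path]:
    """Parse one raw message and write its attachments into target_dir."""
    message = message_from_bytes(raw_message, policy=policy.default)
    target_dir.mkdir(parents=True, exist_ok=True)

    saved = []
    for part in message.iter_attachments():
        # Use only the final path component of the declared filename.
        name = Path(part.get_filename() or "attachment.bin").name
        payload = part.get_payload(decode=True)
        if payload is None:  # multipart attachment, e.g. a forwarded message
            continue
        destination = target_dir / name
        destination.write_bytes(payload)
        saved.append(destination)
    return saved


def plain_text_body(raw_message: bytes) -> str:
    """Return the plain-text body of the message, or '' if there is none."""
    message = message_from_bytes(raw_message, policy=policy.default)
    body = message.get_body(preferencelist=("plain",))
    return body.get_content() if body is not None else ""
```

If the mails only live inside Outlook itself, you would need a different way to get at the raw messages first; the parsing step stays the same once you have them.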


Oops, sorry for the late reply! Actually, this article suggests that is feasible: