I created a model that classifies emails into different categories, much like a spam filter. I deployed the model as a web service without any problem, but I can’t work out how to use it to predict the category of a new email. How do I preprocess the new email (subject and message body) to match the input format of the model/web service? The model I trained has about 1000 features, corresponding to the 1000 most frequent words in the training dataset. Do I vectorize the new email? Do I just search for the features/words in the new email?
There is something obvious I’m missing, I think.
I used Python, sklearn, and pandas/NumPy to preprocess the data and train the model.
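
To make the setup concrete, here is a minimal sketch of the kind of pipeline described above. It assumes the 1000 word features came from sklearn's `CountVectorizer` with `max_features=1000` and that the classifier is a `MultinomialNB`; the post does not say which vectorizer or model was actually used. The toy data and the `predict_category` helper are hypothetical. The point it illustrates is that a new email would go through the same fitted vectorizer's `transform`, so it lands in the same feature columns the model was trained on.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data (hypothetical; stands in for the real email dataset).
train_texts = [
    "win a free prize now",
    "free money click now",
    "meeting agenda for monday",
    "project status report attached",
]
train_labels = ["spam", "spam", "work", "work"]

# Assumption: features are counts of the most frequent words, capped at 1000.
# Bundling the vectorizer with the classifier keeps the vocabulary learned
# during training attached to the model.
pipeline = make_pipeline(CountVectorizer(max_features=1000), MultinomialNB())
pipeline.fit(train_texts, train_labels)


def predict_category(subject, body):
    """Hypothetical helper: join subject and body, then classify.

    The pipeline calls transform (not fit) on the vectorizer here, so the
    new email is mapped onto the training vocabulary; unseen words are
    simply ignored.
    """
    text = f"{subject} {body}"
    return pipeline.predict([text])[0]


if __name__ == "__main__":
    print(predict_category("free prize", "click now to win money"))
```

If something like this matches the real setup, the web service would need to load the fitted vectorizer together with the model (for example by serializing the whole pipeline), rather than fitting a new vectorizer on the incoming email.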