PDF data extraction

I have managed to use Tesseract OCR to read PDF files and extract specific data using regex. Is there any possibility of adding machine learning capabilities in such a situation? If so, please give me some ideas.
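
For context, the regex step looks roughly like the sketch below. The field names, patterns, and sample text are made up for illustration and are not my real documents; the input string stands in for whatever text the OCR step returns.

```python
import re

# Hypothetical field patterns; the real ones depend on the document layout.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*(\w+)", re.IGNORECASE
    ),
    "total": re.compile(r"Total\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


def extract_fields(ocr_text):
    """Return the first match for each field, or None when nothing matches."""
    results = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        results[name] = match.group(1) if match else None
    return results


# Stand-in for the text produced by the OCR step (for example, the string
# returned by pytesseract.image_to_string on a rendered page image).
sample_text = "Invoice No: A1234\nDate: 2020-01-05\nTotal: $1,250.00\n"
fields = extract_fields(sample_text)
print(fields)
```

Each field is found with a fixed pattern, so any change in layout or wording means writing a new pattern by hand.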

TIA

What technologies are you using?