I’ve got an idea, but I don’t really know how to realize it. First, let me introduce the idea/project.
I want to extract structured data from documents such as a curriculum vitae (CV), for example:
full name, date of birth, address, contact data (email, phone/mobile number), hobbies, parents
school and career stations
My first attempt was to use Elasticsearch for this. Right now I’m indexing the full unstructured text (e.g. from PDF files), which I extract with the toolkit “Apache Tika”. But for the next step I’m not sure how to analyze the text and extract the data I want.
Oh, before I forget: I want to parse German CVs.
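One possible starting point for the easier fields is a rule-based pass over the plain text that Tika already produces. The sketch below is only an illustration under stated assumptions: the regular expressions, the German label words (`Geburtsdatum`, `geboren am`), and the function name `extract_fields` are all hypothetical choices, not part of any library. It pulls out email addresses, German-style phone numbers, and a birth date written in the German `dd.mm.yyyy` format.

```python
import re
from datetime import datetime
from typing import Dict, List, Optional

# Hypothetical patterns; real CVs will need many more variants.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Assumed German phone formats, e.g. "+49 170 1234567" or "0170/1234567".
# Only spaces (no newlines) are allowed inside a number.
PHONE_RE = re.compile(r"(?:\+49|0)[\d ()/-]{6,}\d")
# Assumed German labels in front of the birth date, e.g. "Geburtsdatum: 04.07.1990".
BIRTHDAY_RE = re.compile(
    r"(?:Geburtsdatum|geboren am)\s*:?\s*(\d{1,2}\.\d{1,2}\.\d{4})",
    re.IGNORECASE,
)


def parse_german_date(value: str) -> Optional[str]:
    """Convert a German dd.mm.yyyy date to ISO format, or None if invalid."""
    try:
        return datetime.strptime(value, "%d.%m.%Y").date().isoformat()
    except ValueError:
        return None


def extract_fields(text: str) -> Dict[str, object]:
    """Extract a few simple fields from plain CV text (hypothetical helper)."""
    emails: List[str] = EMAIL_RE.findall(text)
    phones: List[str] = PHONE_RE.findall(text)
    match = BIRTHDAY_RE.search(text)
    birthday = parse_german_date(match.group(1)) if match else None
    return {"emails": emails, "phones": phones, "birthday": birthday}


if __name__ == "__main__":
    sample = (
        "Lebenslauf\n"
        "Max Mustermann\n"
        "Geburtsdatum: 04.07.1990\n"
        "E-Mail: max.mustermann@example.de\n"
        "Telefon: +49 170 1234567\n"
    )
    print(extract_fields(sample))
```

Patterns like these only cover fields with a predictable shape. Names, addresses, and career stations vary far more and are much harder to capture with regular expressions. The extracted values could then be stored as separate structured fields in the Elasticsearch document next to the full text, though that is an assumption about how the index would be designed.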
Thanks for any suggestions!