Some inspiration would be great

Hey guys,

I’ve got an idea, but I don’t really know how to implement it. First, let me introduce the idea/project. :wink:

I want to extract structured data from documents such as a curriculum vitae (CV):

  • full name, birthday, address, contact data (email, phone/mobile number), hobbies, parents

  • education and career history

  • …  

My first attempt was to use Elasticsearch for this. Right now I’m extracting the full unstructured text (e.g. from PDF files) with the “Apache Tika” toolkit and indexing it. But I’m not sure how to do the next step: analyzing the text and extracting the data I want.
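To make that step concrete: for the fields with a fixed surface form (email, phone number, dates), plain regular expressions over the extracted text might be a starting point. This is only a minimal sketch under some assumptions: the text has already been extracted to a string, dates are written as DD.MM.YYYY, phone numbers start with +49 or 0, and the first date found is the birthday. The name `extract_fields`, the patterns, and the sample text are made up for illustration. Names, hobbies, and career history would need more than regexes.

```python
import re

# Illustrative patterns only; real CVs will need broader ones.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DATE_RE = re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b")      # assumes DD.MM.YYYY
PHONE_RE = re.compile(r"(?:\+49|0)[\d /()-]{6,}\d")       # assumes +49 or leading 0


def extract_fields(text):
    """Return the first email, date, and phone number found in text (or None)."""
    def first(pattern):
        match = pattern.search(text)
        return match.group(0) if match else None

    return {
        "email": first(EMAIL_RE),
        # Assumption: the first date in the document is the birthday.
        "birthday": first(DATE_RE),
        "phone": first(PHONE_RE),
    }


# Hypothetical sample text, standing in for what the extraction step returns.
sample = (
    "Max Mustermann\n"
    "Geboren am 01.02.1990\n"
    "E-Mail: max@example.com\n"
    "Telefon: +49 30 1234567"
)
fields = extract_fields(sample)
```

The resulting dictionary could then be indexed as separate fields next to the full text, so the structured values are searchable on their own.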

Oh, before I forget: I want to parse German CVs. :confused:

Thanks for any suggestions!

Best regards