I’m making a sequence chunking deep learning program using TensorFlow. Currently, I’m stuck on the data format.
Before I get to the dataset, let me describe the program. It is based on “Neural Models for Sequence Chunking” by Feifei Zhai, Saloni Potdar, Bing Xiang, and Bowen Zhou. The model I want to use is Model 2 (the Encoder-Decoder framework).
The program works like this:
- It receives input text
- It segments the text into phrases
- It labels each phrase
So if the input is “But it could be much worse”, the output looks like this:
But - O
it - NP
could be - VP
much worse - ADJP
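One widely used way to store this kind of target is the BIO scheme used in the CoNLL-2000 chunking data: one tag per word, where `B-X` marks the first word of a chunk of type X, `I-X` marks a following word in the same chunk, and `O` marks a word outside any chunk. The sketch below converts the example above into that form. `chunks_to_bio` is a hypothetical helper written for illustration, not part of TensorFlow or the paper’s code.

```python
from typing import List, Tuple

# A chunk is (phrase_text, label); a phrase may span several words.
Chunk = Tuple[str, str]


def chunks_to_bio(chunks: List[Chunk]) -> Tuple[List[str], List[str]]:
    """Flatten (phrase, label) chunks into parallel word and BIO-tag lists."""
    words: List[str] = []
    tags: List[str] = []
    for phrase, label in chunks:
        for i, word in enumerate(phrase.split()):
            words.append(word)
            if label == "O":
                tags.append("O")  # outside any chunk: no B-/I- prefix
            else:
                # first word of the chunk gets B-, the rest get I-
                tags.append(("B-" if i == 0 else "I-") + label)
    return words, tags


example = [("But", "O"), ("it", "NP"), ("could be", "VP"), ("much worse", "ADJP")]
words, tags = chunks_to_bio(example)
for w, t in zip(words, tags):
    print(w, t)
```

Because every word gets exactly one tag, the word list and tag list always have the same length, which makes them easy to feed to a model as two aligned sequences.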
Now for the question. I can’t decide what data format to use for this program. The data has to include the sentences, but it also has to include the correct output (the phrases and their labels).
What data format should I use so that the program can read it?
These are some ideas I have, but I’m not sure about them yet.
|But it could be much worse||O NP VP ADJP|
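The format above stores one label per chunk but one entry per word, so it does not record where each chunk ends: nothing in “O NP VP ADJP” says whether the VP is “could be” or just “could”. A minimal variant that keeps the one-line-per-sentence idea gives every word its own BIO tag, with a tab between the sentence and its tags. The sketch below reads such a line and regroups it into (phrase, label) segments, i.e. the segmentation plus labeling the program is supposed to output. The tab-separated layout and the helper names `parse_line` and `bio_to_chunks` are hypothetical choices for this sketch, not a standard format or part of the paper.

```python
from typing import List, Tuple


def parse_line(line: str) -> Tuple[List[str], List[str]]:
    """Split 'sentence<TAB>tags' into word and tag lists of equal length."""
    sentence, tag_str = line.rstrip("\n").split("\t")
    words, tags = sentence.split(), tag_str.split()
    if len(words) != len(tags):
        raise ValueError(f"{len(words)} words but {len(tags)} tags: {line!r}")
    return words, tags


def bio_to_chunks(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group words back into (phrase, label) segments."""
    chunks: List[Tuple[List[str], str]] = []
    for word, tag in zip(words, tags):
        if tag == "O":
            chunks.append(([word], "O"))  # each outside word is its own segment
        elif tag[:2] not in ("B-", "I-"):
            raise ValueError(f"unexpected tag {tag!r}")
        elif tag.startswith("I-") and chunks and chunks[-1][1] == tag[2:]:
            chunks[-1][0].append(word)  # continue the current chunk
        else:
            chunks.append(([word], tag[2:]))  # B- (or a stray I-) starts a chunk
    return [(" ".join(ws), label) for ws, label in chunks]


line = "But it could be much worse\tO B-NP B-VP I-VP B-ADJP I-ADJP"
words, tags = parse_line(line)
print(bio_to_chunks(words, tags))
# → [('But', 'O'), ('it', 'NP'), ('could be', 'VP'), ('much worse', 'ADJP')]
```

The length check in `parse_line` catches lines where the tags and words have drifted out of alignment, which is exactly the kind of mistake the chunk-level format cannot detect.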