Text analysis: identifying linkage between board minutes and mission, vision, and values
I am using R for a project to determine whether organizational minutes are linked to the organization's mission, vision, and values.
I have minutes and mission, vision, and values statements from 15 organizations spanning 8 years.
I have prepared all the data using standard techniques (e.g., converting to lower case, removing punctuation and stop words, and lemmatizing the text).
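For readers following along, the preprocessing steps listed above look roughly like this. This is a minimal, dependency-free sketch written in Python for brevity (the same steps translate directly to R). The stop-word list is a hypothetical, deliberately tiny placeholder, and lemmatization is left out because it needs a lexicon or language model from a dedicated NLP library.

```python
import string

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real analysis would use a full list from a text-mining package.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "are", "our", "we"}


def preprocess(text: str) -> list[str]:
    """Lower-case, strip punctuation, split on whitespace, drop stop words.

    Lemmatization is NOT performed here; it would normally be applied to the
    resulting tokens with a dedicated NLP library.
    """
    text = text.lower()
    # Delete every ASCII punctuation character.
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()
    return [t for t in tokens if t not in STOP_WORDS]


tokens = preprocess("Our Mission: to serve the Community, with integrity!")
print(tokens)  # ['mission', 'serve', 'community', 'with', 'integrity']
```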
For each foundational document (mission, vision, values), I have calculated the proportion with which each of its words appears in the organization's minutes. Summing these proportions gives a single score indicating how heavily each organization's minutes use the words of its foundational document. Unsurprisingly, there is considerable variation between organizations (why is a separate problem for now).
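The scoring described above can be sketched as follows, again in Python for brevity. Two assumptions are built in and may not match the exact calculation used: a word's "proportion" is taken to be its relative frequency in the minutes (its count divided by the total number of tokens in the minutes), and each distinct word in the foundational document is counted once. The function names are hypothetical.

```python
from collections import Counter


def word_proportions(minutes_tokens: list[str]) -> dict[str, float]:
    """Relative frequency of every token in the (preprocessed) minutes."""
    counts = Counter(minutes_tokens)
    total = len(minutes_tokens)
    return {word: count / total for word, count in counts.items()}


def usage_score(doc_tokens: list[str], minutes_tokens: list[str]) -> float:
    """Sum of minutes proportions over the distinct words of a foundational document.

    Words that never appear in the minutes contribute 0.
    """
    props = word_proportions(minutes_tokens)
    return sum(props.get(word, 0.0) for word in set(doc_tokens))


minutes = ["budget", "community", "serve", "budget",
           "community", "vote", "community", "report"]
mission = ["serve", "community", "integrity"]
# community = 3/8, serve = 1/8, integrity = 0  ->  0.5
print(usage_score(mission, minutes))
```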
One source of variation is the length of the foundational documents: for example, some organizations have a 5-word mission statement while others have a 60-word one.
I need some way to adjust for the length of the foundational document. I could take the sum of the word-by-word proportions of occurrence in the minutes and divide it by the total number of words in the foundational document, but is there a better way?
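The length adjustment proposed above (sum of proportions divided by document length) amounts to the mean per-word proportion. A minimal sketch, in Python for brevity and with hypothetical function names, assuming proportions are relative frequencies in the minutes and that "length" means the number of distinct words in the foundational document (use `len(doc_tokens)` instead if repeated words should count):

```python
from collections import Counter


def word_proportions(minutes_tokens: list[str]) -> dict[str, float]:
    """Relative frequency of every token in the (preprocessed) minutes."""
    counts = Counter(minutes_tokens)
    total = len(minutes_tokens)
    return {word: count / total for word, count in counts.items()}


def length_adjusted_score(doc_tokens: list[str], minutes_tokens: list[str]) -> float:
    """Summed proportions divided by the number of distinct document words,
    i.e. the average proportion per foundational-document word."""
    vocab = set(doc_tokens)
    if not vocab:
        raise ValueError("foundational document has no words after preprocessing")
    props = word_proportions(minutes_tokens)
    raw = sum(props.get(word, 0.0) for word in vocab)
    return raw / len(vocab)


minutes = ["budget", "community", "serve", "budget",
           "community", "vote", "community", "report"]
short_mission = ["serve", "community"]                            # raw 0.5 / 2
long_mission = ["serve", "community", "integrity", "excellence"]  # raw 0.5 / 4
print(length_adjusted_score(short_mission, minutes))  # 0.25
print(length_adjusted_score(long_mission, minutes))   # 0.125
```

One consequence of this choice: the raw sum can only grow as a statement gets longer, whereas the averaged score falls whenever the added words rarely or never appear in the minutes, so longer statements are no longer automatically favoured.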
Any expertise or advice on this type of analysis would be appreciated.