Background

This page gives more information about the analyzer and the analyses it performs. The sections below describe the analysis process and the complexity measures used for the complexity report.


When you ask the site to perform an analysis, it applies the analyzers in the order specified in step 0 here. Each analyzer is applied only to the words that the previous analyzers failed on. Be sure to give the highest priority to the analyzer that matches the spelling practice most widely used in the text.
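The fallback order described above can be sketched roughly as follows. This is only an illustration, not the site's actual code: the function name `analyze_in_priority_order` and the idea that an analyzer is a function returning `None` on failure are assumptions made for the example.

```python
from typing import Callable, Optional

# Hypothetical analyzer type: takes a word, returns an analysis string,
# or None when the analyzer cannot handle the word.
Analyzer = Callable[[str], Optional[str]]


def analyze_in_priority_order(words, analyzers):
    """Apply each analyzer only to the words that earlier analyzers failed on.

    Returns (results, remaining), where results maps each analyzed word to
    its analysis and remaining lists the words that no analyzer handled.
    """
    results = {}
    remaining = list(words)
    for analyzer in analyzers:  # highest priority first
        still_failed = []
        for word in remaining:
            analysis = analyzer(word)
            if analysis is None:
                still_failed.append(word)  # pass on to the next analyzer
            else:
                results[word] = analysis
        remaining = still_failed
    return results, remaining
```

Because a word is handed to a lower-priority analyzer only after every higher-priority one has failed, the first analyzer in the list decides the outcome for any word it can handle. That is why the analyzer matching the text's main spelling practice should come first.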


I created three complexity scores to use for complexity-based sorting, but so far only one seems to be really useful. That score gives the average number of narrow morphosyntactic features per sentence. The idea here is that longer sentences, and sentences with more morphological information, are more complex. This score seems to provide a fairly good distribution that matches our own subjective judgements. Off the cuff, a score of about 20 seems to be pretty middle of the road.
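As a rough illustration of the feature-count score, here is a minimal sketch. It assumes each sentence has already been analyzed into words, and each word into a list of feature tags; the data layout, the tag names, and the function name `feature_count_score` are all hypothetical, and which features count as "narrow" is not decided here.

```python
def feature_count_score(sentences):
    """Average number of feature tags per sentence.

    `sentences` is a list of sentences; each sentence is a list of words;
    each word is a list of feature tags (an assumed layout).
    """
    if not sentences:
        return 0.0
    # Total the tags across all words of each sentence, then average.
    totals = [sum(len(tags) for tags in sentence) for sentence in sentences]
    return sum(totals) / len(totals)


# Two toy sentences with made-up tags: 5 tags and 1 tag, so the average is 3.0.
example = [
    [["VAI", "3", "Sg"], ["NA", "Pl"]],
    [["VTA"]],
]
print(feature_count_score(example))
```

Under this layout a sentence scores higher either by having more words or by having more tags per word, which matches the idea that both length and morphological detail add complexity.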

Another score represents the average complexity of verbs. There are five major types of verbs in Nishnaabemwin, and I score them as follows (higher numbers are more complex): VTA=4, VAIO=3, VTI=3, VAI=2, VII=1 (see the grammatical code explanation here). In a very small sample of texts, the average scores on this measure did not seem to differ much between texts. I do feel, however, that these scores are quite useful for assessing the difficulty of individual sentences.
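The verb score can be sketched like this, using the weights listed above. The function name `verb_complexity_score`, and the choice to ignore unrecognized codes and to return 0.0 when there are no verbs, are assumptions for the example rather than a description of the site's behaviour.

```python
# Weights as listed above; higher numbers are more complex.
VERB_WEIGHTS = {"VTA": 4, "VAIO": 3, "VTI": 3, "VAI": 2, "VII": 1}


def verb_complexity_score(verb_types):
    """Average weight of the verb type codes in a sentence or text.

    Codes not found in VERB_WEIGHTS are skipped (an assumption).
    Returns 0.0 when no recognized verbs are present (also an assumption).
    """
    weights = [VERB_WEIGHTS[code] for code in verb_types if code in VERB_WEIGHTS]
    if not weights:
        return 0.0
    return sum(weights) / len(weights)


# One VTA and one VII verb: (4 + 1) / 2 = 2.5
print(verb_complexity_score(["VTA", "VII"]))
```

The same function works for a single sentence or for a whole text, depending on which verbs are passed in.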

Finally, the last score represents the balance of verbs in conjunct versus independent order. If there are more conjunct order verbs, the score is positive; if 50% or more of the verbs are in independent order, the score is negative. In my view, this score does not say much about how hard a text is, because independent order and conjunct order are each tricky in their own ways. However, it seems useful for seeing how far a text leans in a particular direction.

At some point it would be good to scale the scores according to hard/easy or by grade level, but we do not currently have data about how difficult various texts are for learners or readers.


Last updated: 11/23/2024