Background

This page gives more information about the various analyses performed and the analyzer in general. See below for a description of the analysis process and of the complexity measures used for the complexity report.


What is an Analyzer?

An analyzer is a network that represents the possible words of the language. It might be helpful to think of it as a map of a road system, with towns and the roads that connect them, except that this is a map of possible words: each road is labeled with a letter, and a path through the map represents a single word. The network was built by hand to specify what the prefixes and suffixes of the language are, how they combine with the root words, and any modifications that happen to a prefix/suffix/root combination.

The site reads a word with the analyzer by starting at the beginning of the network, and spending each letter as "fuel" to get to the next town if the letter matches the letter labeling the road. Some towns in the network are designated as "pass-through-only" towns, and others are "destination" towns where you are allowed to end a trip. If the site can get all the way to a destination town by using all of the letters in the word, it says that the word is a possible word in the language. Otherwise, it reports that the trip was a failure.
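
To make the road-map picture concrete, here is a small sketch (in Python) of that kind of lookup. The network, letters, and words below are invented purely for illustration; the real analyzer is a much larger network compiled from the hand-written rules.

    # A tiny, made-up "road map": towns are states, roads are letter-labeled
    # transitions, and FINAL marks the destination towns where a trip may end.
    TRANSITIONS = {
        ("start", "m"): "town1",
        ("town1", "i"): "town2",
        ("town2", "n"): "town3",
    }
    FINAL = {"town3"}  # destination towns

    def accepts(word):
        """Spend each letter as fuel; succeed only if we stop in a destination town."""
        town = "start"
        for letter in word:
            town = TRANSITIONS.get((town, letter))
            if town is None:       # no road labeled with this letter
                return False
        return town in FINAL       # out of fuel: are we somewhere we may stop?

    print(accepts("min"))  # True  - a possible word in this toy network
    print(accepts("mi"))   # False - ends in a pass-through-only town
    print(accepts("max"))  # False - the trip fails partway through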

There's one more aspect of the analyzer: as you travel along the roads, you can pick up souvenirs. In this case the souvenirs are "tags" that show what prefixes, root word, and suffixes you have encountered on the journey. When a successful trip is completed, the analyzer presents the tags in the order they were collected. This appears in the "narrow analysis" field when you select the "full analysis (interlinear)" option. Everything else on the site is based on this narrow analysis.
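
Continuing the toy sketch above, the tag "souvenirs" can be pictured as extra labels on the roads, and a successful trip returns them in the order they were picked up. Again, the towns, letters, and tag names here are made up for illustration and are not the real analyzer's tag set.

    # Each road now carries an optional tag "souvenir" along with its letter:
    # (town, letter) -> (next town, tag or None). All names are illustrative.
    TRANSITIONS = {
        ("start", "m"): ("town1", "ROOT=min"),
        ("town1", "i"): ("town2", None),
        ("town2", "n"): ("town3", None),
        ("town3", "g"): ("town4", "PLURAL"),
    }
    FINAL = {"town3", "town4"}

    def analyze(word):
        """Return the collected tags for a successful trip, or None on failure."""
        town, tags = "start", []
        for letter in word:
            step = TRANSITIONS.get((town, letter))
            if step is None:
                return None
            town, tag = step
            if tag is not None:
                tags.append(tag)
        return tags if town in FINAL else None

    print(analyze("ming"))  # ['ROOT=min', 'PLURAL']
    print(analyze("min"))   # ['ROOT=min']
    print(analyze("mig"))   # None - not a possible word in this toy network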


Picking and Applying Analyzers

When you ask the site to perform an analysis, the site takes the text you entered and sends each word to the first analyzer you specified in step 0 here. Each analyzer after that is applied only to the words that the previous analyzer failed on. Be sure to make the highest-priority analyzer the one that best matches the text overall.
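
As a rough sketch of that fallback behaviour (the analyzer functions and result format here are stand-ins, not the site's actual code):

    def analyze_text(words, analyzers):
        """Try each analyzer in priority order; a word moves on to the next
        analyzer only if every earlier analyzer failed on it."""
        results = {}
        remaining = list(words)
        for analyzer in analyzers:         # in the priority order chosen in step 0
            still_failing = []
            for word in remaining:
                analysis = analyzer(word)  # assume None means failure
                if analysis is None:
                    still_failing.append(word)
                else:
                    results[word] = analysis
            remaining = still_failing
        return results, remaining          # remaining = words no analyzer handled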

So which analyzer to choose? The analyzers mainly differ in spelling practice, but now we also have enough analyzers that we can support different dialects!

There are currently four analyzers to choose from:


Analysis Format

The largest differences between the Nishnaabemwin analyzers and the Southwestern Anishinaabemowin analyzer are that the Nishnaabemwin analyzers are "terse" and "concrete", while the Southwestern Anishinaabemowin analyzer is what we might call "verbose" and "abstract". The Nishnaabemwin analyzers are terse because, for instance, they only mark plural and add nothing for the singular, leaving the default unstated. Saying that the Southwestern Anishinaabemowin analyzer is verbose means that a value such as "singular" or "plural" is always stated explicitly, rather than leaving defaults like "singular" unstated.

The terseness of the Nishnaabemwin analyzers derives partly from their being concrete, meaning that they tend to "follow the language". Nishnaabemwin only marks plural and adds nothing for the singular (like most languages), so the Nishnaabemwin analyzers do the same. The language also pervasively uses combinations of affixes to convey information (and some affixes in VTAs are basically instructions for how other combinations of affixes should be interpreted!). Since the Nishnaabemwin analyzers are concrete, they generally represent grammatical information as it comes up in the word, even if what might be expressed in one place in other languages is spread over multiple affixes. The Southwestern Anishinaabemowin analyzer is abstract, meaning that it combines the information from potentially multiple affixes into one place. This means that the two approaches drift even further apart in how they represent the language.

To be clear, I do not think the verbose/abstract approach to analysis is "wrong". Ultimately, the same information is being conveyed by the two systems, so a lot of this comes down to preference for the casual user. The verbose/abstract approach used in the Southwestern Anishinaabemowin analyzer is direct (everything is fully written out and each tag states everything relevant about itself). It is also approachable: since the great majority of people who have studied languages have not studied Algonquian languages, it feels more familiar to abstract away from the Algonquian-specific combination system. The terse/concrete approach used in the Nishnaabemwin analyzers is granular and faithful to the actual language. Which you prefer is entirely up to you. That said, at some point, serious students of the language should probably have a granular, faithful understanding of how the affixes work. The Nishnaabemwin analyzers obviously support this directly. How and when students should develop this understanding are important questions.


Broad Analysis

The site does some extra processing to bridge the gap between the Southwestern Anishinaabemowin and Nishnaabemwin analyzers, by making a "broad" analysis that is common to both. The broad analysis is abstract (combining information from multiple affixes into a single place), but still terse (not enumerating default values). The "broad" analyses should be consistent between the analyzers. Please contact the author of the site if you find something wrong.
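
One way to picture the broad analysis is as a normalization pass over the narrow tags: default values are dropped, and information that a concrete analyzer spreads over several tags is merged into one. The tag names and rules below are invented purely to illustrate the idea; the site's actual mapping is more involved.

    # Illustrative only: made-up tag names and merging rules.
    DEFAULT_TAGS = {"SG"}                 # defaults are left unstated (terse)
    MERGE = {("1", "PL"): "1PL"}          # combine split-up information (abstract)

    def to_broad(narrow_tags):
        tags = [t for t in narrow_tags if t not in DEFAULT_TAGS]
        broad = []
        i = 0
        while i < len(tags):
            if tuple(tags[i:i + 2]) in MERGE:
                broad.append(MERGE[tuple(tags[i:i + 2])])
                i += 2
            else:
                broad.append(tags[i])
                i += 1
        return broad

    print(to_broad(["ROOT=x", "1", "PL"]))  # ['ROOT=x', '1PL']
    print(to_broad(["ROOT=x", "SG"]))       # ['ROOT=x']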


Effect on Higher Analyses

Higher level analyses that depend on the narrow analyses might not perform identically depending on whether you use the Southwestern Anishinaabemowin analyzer or the Nishnaabemwin analyzers. For instance, the complexity scoring system (see below) counts the amount of morphological information in the analysis. With the Nishnaabemwin analyzers, the complexity score will reflect how many affixes are in the word, while with the Southwestern Anishinaabemowin analyzer, it will behave more like a traditional word-counting complexity measure as is used in English reading grade level scores (though more "informationally heavy" word categories, like VTAs, will contribute more to the complexity score of a sentence than "informationally light" words like adverbs or unpossessed nouns).


Higher Level Analyses

As neat as the full analysis of a word, sentence or story is, people probably want something further to happen besides just getting the raw analysis. Here we describe (some of) what has already been done. If you have an idea for what else should be done, please reach out!


Frequency

It can be really useful to know which words appear the most in a text. Without an analyzer, this is hard to determine in a language like Nishnaabemwin, where there can be lots of different prefixes or suffixes added on to the base, or root, word. When you select this option, the site looks at the analysis of the word instead of the word itself. All of the words that are analyzed as having the same root are grouped together and counted.
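
In other words, the counting is keyed on the analyzed root rather than the surface spelling. A rough sketch, using the same simplified ROOT= tag as in the toy examples above:

    from collections import Counter

    def root_frequencies(analyzed_tokens):
        """Count tokens by analyzed root rather than by surface spelling.
        analyzed_tokens is a list of (surface word, tag list) pairs, one per
        word in the text; the ROOT= tag is a simplified stand-in."""
        counts = Counter()
        for word, tags in analyzed_tokens:
            root = next((t for t in tags if t.startswith("ROOT=")), word)
            counts[root] += 1
        return counts.most_common()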


Sentence Sort (by complexity)

Sentences may be hard or simple. The various "sentence sort" options were created to help students/teachers zero in on the sentences at the right level for them.

"Sentence sort (by complexity)" works by counting the amount of grammatical information. That is, it counts the number of narrow morphosyntactic features in a sentence. The idea here is that longer sentences/sentences with more morphological information are more complex. This score seems to provide a fairly good distribution of scores that matches our own subjective judgements. Off the cuff, a score of 20 seems to be a pretty middle of the road score. Note you might say that the complexity sort actually punts on the question of finding sentences "at the right level" for someone, because it only measures the quantity of grammatical information, not the type of grammatical information.

At some point it would be good to scale the complexity score according to hard/easy or by grade level, but we do not currently have data about how difficult various texts are for learners or readers.


Sentence Sort (by verb type)

The verb type sorting option lets students or teachers access sentences with the same types of grammatical structures. The different verb types of Nishnaabemwin (VTA, VTI, VAIO, VAI, VII, see the grammatical code explanation here if these abbreviations are not familiar) differ a lot in what information they convey. The VII verbs describe inanimate objects and only have a handful of different suffixes that can be added to them, mostly describing whether there is one or many inanimate things being described. VAI verbs describe actions done by animate entities, and these verbs have machinery that encodes (among other things) whether I/you/someone else did the action, and how many of them there were. VAIO and VTI verbs are much the same as VAI verbs, but they also indicate some information about the thing that the action was done to. Finally VTA verbs describe things that are done to animate entities, and there is an enormous amount of machinery used to track whether I/you/someone else did the action, and whether it was done to me/you/someone else (among other things).

So that students or teachers can see sentences containing these different verb types in one place (and then compare instances of the same verb type to see how they differ), the verb type sorting option was made. The sentences within each block of verbs are sorted by their complexity score. Note that a sentence will be listed in multiple sections if it has verbs of different types in it. Also, to reduce clutter and give students something productive to struggle with, there is no indication of where the verb is in the sentence. This could obviously be changed if requested, but also see below.
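
A sketch of that grouping, assuming each sentence's analysis tells us which verb types it contains (the data layout here is a guess, not the site's internals):

    from collections import defaultdict

    def sort_by_verb_type(sentences):
        """sentences is a list of (text, verb_types, complexity) tuples, where
        verb_types is e.g. {"VTA", "VAI"}. A sentence is listed in every block
        whose verb type it contains; each block is sorted by complexity."""
        blocks = defaultdict(list)
        for text, verb_types, complexity in sentences:
            for vtype in verb_types:
                blocks[vtype].append((complexity, text))
        return {vtype: [text for _, text in sorted(items)]
                for vtype, items in blocks.items()}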

Interestingly, stories seem to use roughly the same mix of verb types. At least, verbs were given scores as follows (where higher numbers are more complex): VTA=4, VAIO=3, VTI=3, VAI=2, VII=1 (see the grammatical code explanation here). In a very small sample of texts, the average on this measure did not seem to differ much between texts.
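
As a sketch of that measure (the weights come from the list above; the averaging is my guess at how such a score would be computed):

    VERB_WEIGHTS = {"VTA": 4, "VAIO": 3, "VTI": 3, "VAI": 2, "VII": 1}

    def average_verb_weight(verb_types_in_text):
        """Average the per-verb weights over all verbs in a text.
        verb_types_in_text is a list like ["VAI", "VTA", "VII", ...]."""
        weights = [VERB_WEIGHTS[v] for v in verb_types_in_text if v in VERB_WEIGHTS]
        return sum(weights) / len(weights) if weights else 0.0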


Verb Collation

The verb collation option allows you to isolate the verbs from a story for comparison. Within each block of verbs of the same type, the verbs are sorted by the broad analysis. This means that verbs with the same subject (the doer) are grouped together; within that, verbs with the same object (the do-ee) are grouped together; within that, verbs that fit into the same context are grouped together (subordinate clauses/relative clauses/questions, aka 'conjunct order' verbs; commands, aka 'imperative order' verbs; and main clause verbs, aka 'independent order' verbs); and so on down the list of categories that verbs can show.
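
The grouping described above is essentially a multi-key sort on the fields of the broad analysis. Roughly, with invented field names for the verb type, subject, object, and order:

    def collate_verbs(verbs):
        """verbs is a list of dicts with (illustrative) broad-analysis fields.
        Sorting on a tuple of fields groups verbs by type, then subject, then
        object, then order (conjunct/imperative/independent), and so on."""
        key_fields = ("verb_type", "subject", "object", "order")
        return sorted(verbs, key=lambda v: tuple(str(v.get(f, "")) for f in key_fields))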

A small tangent, since we mentioned the conjunct order/independent order split. One possible metric to score texts on is the proportion of verbs in conjunct vs independent order. If there are more conjunct order verbs, the score is positive. The score is negative if 50% or more of the verbs are in independent order. In my view, this score does not say much about how hard a text is, because both independent order and conjunct order are tricky in their own ways. The conjunct order has a lot of irregularities, but conjunct order verbs put all of the characteristics of the doer/do-ee in one place. The independent order is very regular, and as the order that appears in main clauses, it will be something you use a lot. The thing that is hard about it is that information about the doer/do-ee is spread across multiple affixes, plus there are tricky questions about which vowels will appear. However, it may be useful to see how far a text leans in a particular direction, so you have an idea of what kind of verbs you are going to be getting.
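
For concreteness, that proportion score might be computed along these lines (the exact scaling is my guess; only the sign convention comes from the description above):

    def order_balance(verb_orders):
        """verb_orders is a list like ["conjunct", "independent", ...], one per verb.
        Positive when conjunct-order verbs are the majority; zero or negative when
        half or more of the verbs are in the independent order."""
        if not verb_orders:
            return 0.0
        conjunct = sum(1 for order in verb_orders if order == "conjunct")
        return conjunct / len(verb_orders) - 0.5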


Triage

The "triage" options are intended for easily identifying words that the analyzer failed on. This is mostly used for bug testing.

Speaking of bugs, while the analyzers are quite reliable, they are not perfect (at some point in the near future, I intend to post performance data for the analyzers, especially because the analyzers will be developed further). There could be thoughtless mistakes, and in some places I had to make educated guesses about how the language works, and I could have been wrong. The site also only presents one analysis for a word, though there may be several possible analyses.


Last updated: 8/27/2025