Background
This page gives more information about the various analyses performed and the analyzer in general. See below for a description of the analysis process, and a description of the complexity measures used for the complexity report.
What is an Analyzer?
An analyzer is a network that represents the possible words of the language. It might be helpful to think of it like a map of a road system, where there are towns and the roads that connect them, except this is a map of possible words, where each road is labeled with a letter, and a path through the map represents a single word. The network was built by hand to specify what the prefixes and suffixes of the language are, how they combine with the root words, and any modification that happens to a prefix/suffix/root combination.
The site reads a word with the analyzer by starting at the beginning of the network, and spending each letter as "fuel" to get to the next town if the letter matches the letter labeling the road. Some towns in the network are designated as "pass through only towns", and others are "destination towns" where you are allowed to end a trip. If the site can get all the way to a destination town by using all of the letters in the word, it says that the word is a possible word in the language. Otherwise, it reports that the trip was a failure.
There's one more aspect of the analyzer: as you travel along the roads, you can pick up souvenirs. In this case the souvenirs are "tags" that show what prefixes, root word, and suffixes you have encountered on the journey. When a successful trip is completed, the analyzer presents the tags in the order in which they were collected. This appears in the "narrow analysis" field when you select the "full analysis (interlinear)" option. Everything else on the site is based on this narrow analysis.
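To make the road map picture concrete, here is a minimal sketch in Python. It is not the actual analyzer: the towns, the English toy word, and the tag names are all invented for illustration, and a real network is far larger and can offer more than one road for the same letter.

```python
# Toy network. Each road is (town, letter) -> (next town, souvenir tag or None).
# Town names, the word "cat", and the tags are made up for this example.
ROADS = {
    ("start", "c"): ("q1", None),
    ("q1", "a"): ("q2", None),
    ("q2", "t"): ("q3", "cat+N"),  # finishing the root yields a tag
    ("q3", "s"): ("q4", "Pl"),     # the suffix yields another tag
}

# Towns where a trip is allowed to end; all others are pass-through only.
DESTINATIONS = {"q3", "q4"}


def analyze(word):
    """Return the tags collected along the way, or None if the trip fails."""
    town = "start"
    tags = []
    for letter in word:
        step = ROADS.get((town, letter))
        if step is None:
            return None  # no road labeled with this letter
        town, tag = step
        if tag is not None:
            tags.append(tag)
    # All letters are spent; the trip only counts if we stopped at a destination.
    return tags if town in DESTINATIONS else None


print(analyze("cats"))  # tags in the order they were collected
print(analyze("ca"))    # ends in a pass-through town, so the trip fails
```

The list returned by a successful trip plays the role of the narrow analysis described above.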
Picking and Applying Analyzers
When you ask the site to perform an analysis, the site takes the text you entered and sends each word to the first analyzer you specified in step 0 here. Each later analyzer is applied only to the words that all of the earlier analyzers failed on. Be sure to give the highest priority to the analyzer that best matches the text overall.
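The priority order can be sketched as follows. This is an illustration, not the site's code: the function name, the `(name, function)` interface, and the placeholder words and analyses are assumptions made for the example.

```python
def analyze_in_priority_order(words, analyzers):
    """Try each analyzer in order; later ones only see words earlier ones failed on.

    `analyzers` is a list of (name, function) pairs. Each function returns an
    analysis, or None on failure (a hypothetical interface).
    """
    results = []
    for word in words:
        outcome = (word, None, None)  # stays like this if every analyzer fails
        for name, analyze in analyzers:
            analysis = analyze(word)
            if analysis is not None:
                outcome = (word, name, analysis)
                break  # lower-priority analyzers never see this word
        results.append(outcome)
    return results


# Stand-in analyzers: plain lookup tables with placeholder words and analyses.
first = {"word-a": "analysis-a"}.get
second = {"word-a": "other-analysis-a", "word-b": "analysis-b"}.get

results = analyze_in_priority_order(
    ["word-a", "word-b", "word-c"],
    [("first", first), ("second", second)],
)
for row in results:
    print(row)
```

Note that `word-a` keeps the analysis from the first analyzer even though the second one also recognizes it, which is why the order you pick matters.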
So which analyzer to choose? The analyzers mainly differ in spelling practice, but now we also have enough analyzers that we can support different dialects!
There are currently four analyzers to choose from:
- Nishnaabemod
  - Dialect zone: Eastern
  - Spelling: Chuck Fiero's double vowel system (see the Eastern Ojibwa-Chippewa-Ottawa Dictionary by Richard Rhodes)
  - Grammar/Vocabulary: The vocabulary of this analyzer is based on the Nishnaabemwin Online Dictionary, and the grammar is based on Professor Rand Valentine's Nishnaabemwin Reference Grammar.
  - Other notes: These Eastern dialects have dropped many vowels and so can be called Nishnaabemwin instead of Anishinaabemowin. There has been some evolution in the designations of the spelling styles that each analyzer is tuned for. For a while I used the last name of a person associated with the system, but that was a bit cumbersome to remember and didn't quite sit right (see below). The current practice just shows how an example word Nishnaabemod/Nishnaabemat/Anishinaabemod 'if he/she speaks Nishnaabemwin' is spelled (hat tip to Professor Mary Ann Corbiere for the final, concise name idea). Hopefully this is transparent and informative.
- Nishnaabemat
  - Dialect zone: Eastern
  - Spelling: A popular modification of Chuck Fiero's double vowel system. This style is used in the Nishnaabemwin Online Dictionary.
  - Grammar/Vocabulary: The vocabulary of this analyzer is based on the Nishnaabemwin Online Dictionary, and the grammar is based on Rand Valentine's Nishnaabemwin Reference Grammar.
  - Other notes: This analyzer is the Nishnaabemod analyzer, with another layer to convert the Nishnaabemod spelling into Nishnaabemat spelling. Moving from Nishnaabemod spelling to Nishnaabemat spelling is seamless, but going the other way can produce hiccups. For a while I called this spelling system the 'Corbiere system' in honor of Professor Maanyaan/Mary Ann Corbiere, who has promoted it. However, when I asked Dr. Corbiere about this, she pointed out that the differences are fairly minor, so branding it with a totally different name seemed inappropriate.
- Anishinaabemod
  - Dialect zone: Eastern
  - Spelling: Chuck Fiero's double vowel system, but vowels have not been dropped.
  - Grammar/Vocabulary: The vocabulary of this analyzer is based on the Nishnaabemwin Online Dictionary, and the grammar is based on Rand Valentine's Nishnaabemwin Reference Grammar.
  - Other notes: This analyzer is the Nishnaabemod analyzer, but with the vowel dropping rule turned off and a couple other minor adjustments. See The Dog's Children by Angeline Williams for examples of stories from what we are calling the Eastern or Nishnaabemwin region, but with dropped vowels retained, so that instead of, for instance, kidod you will find ikidod 'if he/she says'.
- Anishinaabemod (Southwestern)
  - Dialect zone: Southwestern (Border Lakes/Minnesota/Wisconsin)
  - Spelling: Chuck Fiero's double vowel system, also without dropping vowels.
  - Grammar/Vocabulary: The vocabulary of this analyzer is based on the Ojibwe People's Dictionary, and the grammar is based on paradigms collected by Professor Chris Hammerly.
  - Divergence alert: This analyzer is not related to the various Nishnaabemwin analyzers above. It was written by a different team (though for uninteresting reasons this site is using this version). The Southwestern Anishinaabemowin analyzer uses a different set of grammatical abbreviations, so the "narrow" analysis field will look fairly different. See below for more discussion of the differences.
  - Other notes: At the moment, we do not have access to terse one word translations for Southwestern Anishinaabemowin words.
Analysis Format
The largest differences between the Nishnaabemwin analyzers and the Southwestern Anishinaabemowin analyzer are that the Nishnaabemwin analyzers are "terse" and "concrete", while the Southwestern Anishinaabemowin analyzer is what we might call "verbose" and "abstract". The Nishnaabemwin analyzers are terse because, for instance, they only mark plural and add nothing for the singular, leaving the default unstated. Saying that the Southwestern Anishinaabemowin analyzer is verbose means that either value of "singular" or "plural" is always stated, instead of leaving default values like "singular" unstated.
The terseness of the Nishnaabemwin analyzers derives partially from the Nishnaabemwin analyzers being concrete, meaning that they tend to "follow the language". Nishnaabemwin only marks plural and adds nothing for the singular (like most languages), so the Nishnaabemwin analyzers do the same. The language also pervasively uses combinations of affixes to convey information (and some affixes in VTAs are basically instructions for how other combinations of affixes should be interpreted!). Since the Nishnaabemwin analyzers are concrete, they generally represent grammatical information as it comes up in the word, even if what might be expressed in one place in other languages is spread over multiple affixes. The Southwestern Anishinaabemowin analyzer is abstract, meaning that it combines the information from potentially multiple affixes. This means that the two analyzers drift even further apart in how they represent the language.
To be clear, I do not think the verbose/abstract approach to analysis is "wrong". Ultimately, the same information is being conveyed by the two systems, so a lot of this comes down to preference for the casual user. The verbose/abstract approach used in the Southwestern Anishinaabemowin analyzer is direct (everything is fully written out and each tag states everything relevant about itself). It is also approachable, since the great majority of people who have studied languages have not studied Algonquian languages, so it feels more familiar to abstract away from the Algonquian-specific combination system. The terse/concrete approach used in the Nishnaabemwin analyzers is granular and faithful to the actual language. Which you prefer is entirely up to you. That said, at some point, serious students of the language should probably have a granular, faithful understanding of how the affixes work. The Nishnaabemwin analyzers obviously support this directly. How and when students should develop this understanding are important questions.
Broad Analysis
The site does some extra processing to bridge the gap between the Southwestern Anishinaabemowin and Nishnaabemwin analyzers, by making a "broad" analysis that is common to both. The broad analysis is abstract (combining information from multiple affixes into a single place), but still terse (not enumerating default values). The "broad" analyses should be consistent between the analyzers. Please contact the author of the site if you find something wrong.
Effect on Higher Analyses
Higher level analyses that depend on the narrow analyses might not perform identically if you use the Southwestern Anishinaabemowin analyzer or the Nishnaabemwin analyzers. For instance, the complexity scoring system (see below) counts the amount of morphological information in the analysis. With the Nishnaabemwin analyzers, the complexity score will reflect how many affixes are in the word, while with the Southwestern Anishinaabemowin analyzer, it will behave more like a traditional word-counting complexity measure as is used in English reading grade level scores (though more "informationally heavy" word categories, like VTAs, will contribute more to the complexity score of a sentence than "informationally light" words like adverbs or unpossessed nouns).
Higher Level Analyses
As neat as the full analysis of a word, sentence or story is, people probably want something further to happen besides just getting the raw analysis. Here we describe (some of) what has already been done. If you have an idea for what else should be done, please reach out!
Frequency
It can be really useful to know which words appear the most in a text. Without an analyzer, this is hard to determine in a language like Nishnaabemwin, where there can be lots of different prefixes or suffixes added on to the base, or root, word. When you select this option, the site looks at the analysis of the word instead of the word itself. All of the words that are analyzed as having the same root are grouped together and counted.
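As a rough sketch of the idea (not the site's own code), counting by root instead of by surface form could look like this. The `(word, root)` pairs stand in for whatever the analyzer returns, and the English placeholder words are only there to keep the example readable.

```python
from collections import Counter


def root_frequencies(analyzed_words):
    """Count words by their analyzed root.

    `analyzed_words` is a list of (surface word, root) pairs, with root set to
    None when the analyzer failed on the word (an assumed data shape).
    """
    return Counter(root for _, root in analyzed_words if root is not None)


# Placeholder data: three different surface forms, two roots, one failure.
analyzed = [
    ("walks", "walk"),
    ("walked", "walk"),
    ("dog", "dog"),
    ("xyzzy", None),
]

for root, count in root_frequencies(analyzed).most_common():
    print(root, count)
```

Here "walks" and "walked" are counted together under "walk", which is the grouping a plain word count would miss.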
Sentence Sort (by complexity)
Sentences may be hard or simple. The various "sentence sort" options were created to help students/teachers zero in on the sentences at the right level for them.
"Sentence sort (by complexity)" works by counting the amount of grammatical information. That is, it counts the number of narrow morphosyntactic features in a sentence. The idea here is that longer sentences/sentences with more morphological information are more complex. This score seems to provide a fairly good distribution of scores that matches our own subjective judgements. Off the cuff, a score of 20 seems to be a pretty middle of the road score. Note you might say that the complexity sort actually punts on the question of finding sentences "at the right level" for someone, because it only measures the quantity of grammatical information, not the type of grammatical information.
At some point it would be good to scale the complexity score according to hard/easy or by grade level, but we do not currently have data about how difficult various texts are for learners or readers.
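A minimal sketch of the counting idea, under stated assumptions: each word comes with a list of feature tags from its narrow analysis, unanalyzed words contribute nothing, and every tag counts equally. The tag names `F1`, `F2`, and so on are placeholders, and exactly which tags the site counts is not specified here.

```python
def sentence_complexity(sentence):
    """Sum the number of feature tags over all words in a sentence.

    `sentence` is a list of (word, tags) pairs, where `tags` is a list of
    feature tags (empty if the word has none or was not analyzed).
    """
    return sum(len(tags) for _, tags in sentence)


def sort_by_complexity(sentences):
    """Return sentences ordered from lowest to highest complexity score."""
    return sorted(sentences, key=sentence_complexity)


# Placeholder sentences with placeholder tags.
longer = [("w1", ["F1", "F2", "F3"]), ("w2", ["F4", "F5"])]
shorter = [("w3", ["F1"]), ("w4", [])]

for s in sort_by_complexity([longer, shorter]):
    print(sentence_complexity(s), [word for word, _ in s])
```

Because the score is a plain count, it grows both with sentence length and with how much morphology each word carries, which matches the intuition described above.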
Sentence Sort (by verb type)
The verb type sorting option lets students or teachers access sentences with the same types of grammatical structures. The different verb types of Nishnaabemwin (VTA, VTI, VAIO, VAI, VII, see the grammatical code explanation here if these abbreviations are not familiar) differ a lot in what information they convey. The VII verbs describe inanimate objects and only have a handful of different suffixes that can be added to them, mostly describing whether there is one or many inanimate things being described. VAI verbs describe actions done by animate entities, and these verbs have machinery that encodes (among other things) whether I/you/someone else did the action, and how many of them there were. VAIO and VTI verbs are much the same as VAI verbs, but they also indicate some information about the thing that the action was done to. Finally VTA verbs describe things that are done to animate entities, and there is an enormous amount of machinery used to track whether I/you/someone else did the action, and whether it was done to me/you/someone else (among other things).
So that students or teachers can see sentences containing these different verb types in one place (and then compare the instances of the same verb types together to see how they differ), the verb type sorting option was made. The sentences within each block of verbs are sorted by their complexity score. Note that a sentence will be listed in multiple sections if it has verbs of different types in it. Also, to reduce clutter/give students something productive to struggle with, there is no indication of where the verb is in the sentence. This could obviously be changed if requested, but also see below.
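The grouping described above can be sketched like this. The dictionary keys (`text`, `verb_types`, `complexity`) and the placeholder sentences are assumptions made for the example, not the site's actual data format.

```python
VERB_TYPES = ("VTA", "VTI", "VAIO", "VAI", "VII")


def group_by_verb_type(sentences):
    """Build one block of sentences per verb type, sorted by complexity.

    Each sentence is a dict with a 'text', a set of 'verb_types' found in it,
    and a precomputed 'complexity' score (hypothetical field names).
    """
    blocks = {}
    for verb_type in VERB_TYPES:
        members = [s for s in sentences if verb_type in s["verb_types"]]
        if members:
            blocks[verb_type] = sorted(members, key=lambda s: s["complexity"])
    return blocks


# Placeholder sentences.
sentences = [
    {"text": "sentence 1", "verb_types": {"VTA", "VAI"}, "complexity": 24},
    {"text": "sentence 2", "verb_types": {"VAI"}, "complexity": 9},
    {"text": "sentence 3", "verb_types": {"VII"}, "complexity": 5},
]

for verb_type, block in group_by_verb_type(sentences).items():
    print(verb_type, [s["text"] for s in block])
```

In this example "sentence 1" shows up in both the VTA block and the VAI block, since it contains verbs of both types.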
Interestingly, stories seem to use roughly the same mix of verb types. At least, verbs were given scores as follows (where higher numbers are more complex): VTA=4, VAIO=3, VTI=3, VAI=2, VII=1 (see the grammatical code explanation here), and in a very small sample of texts, the average on this measure did not seem to differ much between texts.
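For reference, the averaging step can be sketched as below. The scores are the ones listed above; the function name and the example list of verb types are made up for illustration.

```python
# Scores as listed above (higher numbers are more complex).
VERB_SCORES = {"VTA": 4, "VAIO": 3, "VTI": 3, "VAI": 2, "VII": 1}


def mean_verb_score(verb_types):
    """Average the verb type scores over all verbs in a text.

    `verb_types` is a list with one entry per verb; entries that are not one
    of the five verb types are ignored. Returns None if no verbs are found.
    """
    scores = [VERB_SCORES[v] for v in verb_types if v in VERB_SCORES]
    return sum(scores) / len(scores) if scores else None


# A made-up text with four verbs: (4 + 2 + 1 + 2) / 4
print(mean_verb_score(["VTA", "VAI", "VII", "VAI"]))
```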
Verb Collation
The verb collation option allows you to isolate the verbs from a story for comparison. Within each block of verbs of the same type, the verbs are sorted by the broad analysis. This means that verbs with the same subject (the doer) will be grouped together; within that, verbs with the same object (the do-ee) are grouped together; and within that, verbs that fit into the same context are grouped together (subordinate clauses/relative clauses/questions, aka 'conjunct order' verbs; commands, aka 'imperative order' verbs; and main clause verbs, aka 'independent order' verbs), and so on down the list of categories that verbs can show.
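This kind of nested grouping falls out of sorting on a tuple of categories. The sketch below assumes each verb has already been reduced to a few fields of its broad analysis; the field names, the person labels, and the placeholder verb forms are hypothetical, and the site's actual broad analysis has more categories than these.

```python
def collate(verbs):
    """Sort verbs by type, then subject, then object, then order.

    Sorting on a tuple compares the first field, and only looks at the next
    field to break ties, which produces groups within groups.
    """
    return sorted(
        verbs,
        key=lambda v: (v["type"], v["subject"], v["object"], v["order"]),
    )


# Placeholder verbs; an empty string means "no object".
verbs = [
    {"form": "verb-a", "type": "VTA", "subject": "3", "object": "1", "order": "independent"},
    {"form": "verb-b", "type": "VTA", "subject": "1", "object": "3", "order": "conjunct"},
    {"form": "verb-c", "type": "VTA", "subject": "1", "object": "3", "order": "independent"},
    {"form": "verb-d", "type": "VAI", "subject": "1", "object": "", "order": "independent"},
]

for verb in collate(verbs):
    print(verb["type"], verb["subject"], verb["object"], verb["order"], verb["form"])
```

In the output, `verb-b` and `verb-c` end up next to each other because they share a type, subject, and object, and differ only in order.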
A small tangent, since we mentioned the conjunct order/independent order split. One possible metric to score texts on is the proportion of verbs in conjunct vs independent order. If there are more conjunct order verbs, the score is positive. The score is negative if 50% or more of the verbs are in independent order. In my view, this score does not say much about how hard a text is, because both independent order and conjunct order are tricky in their own ways. The conjunct order has a lot of irregularities, but conjunct order verbs put all of the characteristics of the doer/do-ee in one place. The independent order is very regular, and as the order that appears in main clauses, it will be something you use a lot. The thing that is hard about it is that information about the doer/do-ee is spread across multiple affixes, plus there are tricky questions about which vowels will appear. However, it may be useful to see how far a text leans in a particular direction, so you have an idea of what kind of verbs you are going to be getting.
Triage
The "triage" options are intended for easily identifying words that the analyzer failed on. This is mostly used for bug testing.
Speaking of bugs, while the analyzers are quite reliable, they are not perfect (at some point in the near future, I intend to post performance data for the analyzers, especially because the analyzers will be developed further). There could be thoughtless mistakes, and in some places I had to make educated guesses about how the language works, and I could have been wrong. The site also only presents one analysis for a word, though there may be several possible analyses.
Last updated: 8/27/2025