While we know that gospel principles are eternal, we must also admit that the language used to describe them changes over time. And now we have a tool for discovering and analyzing how Church leaders have changed their descriptions of the gospel over the past 160 years.
BYU linguistics professor Mark Davies has released his Corpus of LDS General Conference Talks, a database of General Conference talks since 1850 (some 10,000 talks and 24 million words), along with robust tools for searching it and analyzing how the language of the talks has changed over time. This corpus, or collection of texts, is just the latest of several that Davies has made available to researchers, including his 400-million-word Corpus of Historical American English and his 410-million-word Corpus of Contemporary American English.
This is much more than a long word-processing document paired with an ordinary search tool that finds every occurrence of the word “green.” The texts in this corpus carry more information than just the words: each text is dated, and each word has been tagged with its part of speech. The search tools are also far more sophisticated than anything you will find in a word processor. Users can search not just for a word but also for its synonyms, in effect searching for a concept rather than a single word (a search for “sin” and its synonyms also finds evil, wickedness, iniquity, crime, transgression, immorality, transgress, err, wrongdoing, lapse, debauchery, depravity, turpitude, misdemeanor and misdeed).
Best of all, users can track the frequency of these words and concepts over time. We learn, for example, that the concept of “sin” was mentioned twice as often in the 1850s as it is now, and 50% more often than now through the 1880s, before falling in the early 1900s to a level 20% below today's. The concept was again a popular topic in the 1960s and 1970s (50% more than now) before dropping back down.
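For readers curious about the arithmetic behind statements like “twice as often as now,” here is a minimal sketch. It is not how Davies’ system works internally; it simply assumes we have each year's talks as lists of already-lemmatized word tokens (so “sins” has become “sin”), counts hits for a concept (a word plus its synonyms), normalizes by decade size to hits per million words, and then divides each decade's rate by a baseline decade's rate. The function names `concept_rate_by_decade` and `relative_to`, and the toy data, are hypothetical and for illustration only.

```python
from collections import Counter

# Illustrative concept set: "sin" plus the synonyms listed above. Matching is on
# exact lowercase tokens, so this sketch assumes tokens are already lemmatized
# (e.g. "sins" -> "sin"), something a part-of-speech-tagged corpus can provide.
SIN_CONCEPT = {
    "sin", "evil", "wickedness", "iniquity", "crime", "transgression",
    "immorality", "transgress", "err", "wrongdoing", "lapse", "debauchery",
    "depravity", "turpitude", "misdemeanor", "misdeed",
}


def concept_rate_by_decade(tokens_by_year, concept):
    """Return concept hits per million words for each decade.

    tokens_by_year maps a year (int) to the list of word tokens from that year.
    """
    hits = Counter()
    totals = Counter()
    for year, tokens in tokens_by_year.items():
        decade = year // 10 * 10
        totals[decade] += len(tokens)
        hits[decade] += sum(1 for token in tokens if token.lower() in concept)
    # Normalize by decade size so large and small decades are comparable;
    # decades with no tokens are skipped to avoid dividing by zero.
    return {
        decade: hits[decade] * 1_000_000 / total
        for decade, total in totals.items()
        if total
    }


def relative_to(rates, baseline_decade):
    """Express each decade's rate as a multiple of the baseline decade's rate.

    The baseline decade's rate must be nonzero.
    """
    baseline = rates[baseline_decade]
    return {decade: rate / baseline for decade, rate in rates.items()}


# Toy data, invented purely to show the arithmetic (not real Conference text).
talks = {
    1855: "repent of sin and evil".split(),
    2005: "come unto christ and repent of sin today".split(),
}
rates = concept_rate_by_decade(talks, SIN_CONCEPT)
print(rates)                     # {1850: 400000.0, 2000: 125000.0}
print(relative_to(rates, 2000))  # {1850: 3.2, 2000: 1.0}
```

On real data, a ratio of 2.0 would correspond to “twice as often as now” and 0.8 to “20% lower than now.” Normalizing per million words matters because the number of words spoken in Conference varies a great deal from decade to decade.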
Late last fall Google introduced a tool with some of these capabilities, the Google Books Ngram Viewer, prompting a few posts here in the bloggernacle about how it could be used for Mormon Studies (see J. Max Wilson’s post at Millennial Star and my own, on a still-unexplained Mormon literary mystery, at A Motley Vision). At the time we were unaware that Davies already had a tool that provided the same information and allowed more sophisticated searches. His Corpus of Historical American English (400 million words) is far smaller than Google Books (500 billion words), but its tools are much more sophisticated, and Davies argues that, at least for researchers, his corpus is more useful.
Unfortunately, the interface for Davies’ corpus isn’t as easy to use as Google’s, mainly because it is so much more sophisticated. It is hard to make complex tools easy to use; sophistication comes at a price. Davies’ system also doesn’t produce the nice graphs that the Google Labs project provides. However, it’s easy to copy the data from Davies’ corpus into a spreadsheet, where any spreadsheet jockey worth his salt can produce very nice graphs.
But perhaps most importantly, Davies’ General Conference corpus has one overriding advantage over any other: it is limited to Conference talks. Searching Google Books’ Ngram Viewer, or even Davies’ own Corpus of Historical American English, tells you about the overall use of words and concepts, giving you an idea of how the culture as a whole used language. The General Conference corpus helps us understand the word use of a much smaller group of people: LDS general authorities. That restriction alone makes this corpus extremely useful to Mormon Studies.
Of course, this also raises the question: what other corpora could be useful to Mormon Studies? Off the top of my head, it should be possible to put together corpora of Mormon periodicals, Mormon missionary diaries (from BYU’s collection), the Deseret News and even the collection of contemporary Mormon texts we call the bloggernacle. I wonder what we could learn if we were able to analyze these corpora as well.