“Hearing the gospel in their own tongue”
A January 2024 report on advances in AI and what they mean for the Church
ElevenLabs, the new unicorn startup on the block, has rolled out a more refined dubbing/translation service. Now one can simply upload any video under 45 minutes long and hear it in one of 27 different languages in the voice of the speaker. (Somebody should “back translate” Elder Uchtdorf speaking in German back into English to see how close it gets.) Word on the street is that it is pretty accurate.
Implications for the Church:
- A lot of the more esoteric debates about things in the Church like Egyptian papyrology, how Elder so-and-so’s remark might be interpreted by this or that group, and ancient Mesoamerican population growth rates were the purview of the sociocultural elite in the Church in English-speaking countries. Google Translate solved some of this, but increasingly even more popular content such as podcasts or YouTube channels from Church and Church-adjacent (or anti-Church) influencers is going to be widely available to the international Church. I suspect Church influencer culture going international will be a net bad thing, but I might be wrong.
- Obviously, this could make General Conference translations much, much more efficient, although if it’s still not 100% equivalent I suspect the Church will continue with traditional translation (and even if it is, institutional inertia and internal incentives will probably keep the translation department there for the time being). Still, there are a lot of non-General Conference materials such as BYU Speeches and youth fireside-type meetings that might not meet the threshold for full professional translation, but can get maybe 97% there with this new technology, and now the whole Church potentially has access to everything on the Church’s YouTube channel, for example.
- Translating it in the voice of the speaker might be worth whatever slippage there is between the dubbed version and the professionally translated version. Language and speech are intimate, and having a translator’s voice between the audience and the speaker adds another layer of distance between the messenger and the audience. It would be great for Croatian speakers to hear Sister or Elder so-and-so in their actual voice.
I never thought this was possible.
Very cool, but DEFINITELY check the results.
The first step in this process is for the computer to figure out what the speaker is saying, something I have practical experience with. I attend Church council meetings regularly via Zoom, and because I’m hard of hearing I turn on the AI-generated captions. Most of the time they’re great, and contribute to me having an easier time following the discussion in Zoom meetings than in person. But they struggle with our unique vocabulary and usage, just because that’s not most of the AI’s training data.
One it almost always gets wrong is “amen” at the end of prayers. “I’m in” is kind of fitting; “…in the name of Jesus Christ, demon” definitely is not!
There’s been some progress in teaching generative AI to “code switch” like real people do (“write for an expert in the field”). I won’t be surprised if that carries over to voice recognition (“this is a religious discussion”). It’s also getting easier to train your own model on data that’s relevant to your application. So if this tool turns out not to be great for us right now, it may be in the near future. Still, check the results.
Like RLD, I work with a couple pieces of this and stay reasonably close to what’s currently possible. I’m 99% certain that the first couple steps rely on existing speech recognition and machine translation services (very likely OpenAI’s Whisper for the first, maybe Google for the second). Both have gotten very, very good recently, and the customized models RLD mentions would definitely be useful. But – again agreeing with RLD – you’d absolutely want to check the results at each step. Conference talks would actually work pretty well most of the time since you have careful speech, clean audio and a known text to work with. Things like Q&A events or round tables get trickier. I would assume the translation department has been looking into the potential of automatic speech recognition (ASR) and machine translation (MT) for years now and is probably using them already in some form.
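To make the “known text” point concrete, here is a minimal Python sketch of one way to check a transcript against a prepared talk. It assumes you already have the speech-recognition output as a plain string; the function names (`words`, `flag_mismatches`) are made up for illustration and are not part of any particular service. It only uses Python’s standard `difflib` module to line up the two word sequences and report where they disagree.

```python
import difflib
import re


def words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def flag_mismatches(known_text, asr_transcript):
    """Return (expected, heard) pairs where the transcript departs from the known text."""
    expected = words(known_text)
    heard = words(asr_transcript)
    # autojunk=False keeps difflib from ignoring very frequent words in long texts.
    matcher = difflib.SequenceMatcher(a=expected, b=heard, autojunk=False)
    mismatches = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            mismatches.append((" ".join(expected[i1:i2]), " ".join(heard[j1:j2])))
    return mismatches


# Hypothetical example: the prepared text versus what the captions produced.
known = "In the name of Jesus Christ, amen."
heard = "in the name of Jesus Christ, demon"
for expected_words, heard_words in flag_mismatches(known, heard):
    print(f"expected {expected_words!r}, heard {heard_words!r}")
```

A list like this would only tell a human reviewer where to look before the text goes on to translation. It does nothing for unscripted events such as Q&A sessions, where there is no known text to compare against.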
I usually just lurk here, but as a translator by profession for the past 37 years, I thought I might have something of worth to offer.
This old-school translator has had to finally admit that automatic translation based on large language models (LLMs) is now producing highly usable results, especially with major world languages. However, there is a trade-off involved that will apply to the kind of service that Stephen is describing. By the way, taken as a whole I believe the trade-off is a positive.
LLM translation tools work particularly well on source content that is very straightforward and relatively simple syntactically (although I am often surprised at how well they handle long, grammatically complex sentences). This means that to optimize the use of such tools, there may be pressure on Church authors and speakers to write and speak to a certain standard of simplicity and straightforwardness, avoiding what we might call Neal Maxwell-type creativity. Be Nephi, not Isaiah. This could tend to homogenize somewhat the distinctive voices of general authorities and officers.
At the same time, LLM translation tends to further standardize the output by relying on the phraseology and vocabulary that it already knows. This factor will be mitigated in time as the models’ knowledge base expands. The likely downsides, then, are a certain blandness and homogeneity in the translated talks and articles, or possibly missing subtleties that the author might have intended by using words creatively or in a sense other than their dictionary meaning. And yes, there is still the chance of the worst case happening: of a gross distortion of meaning. I use an LLM tool frequently in my profession, and I have seen it happen, though rarely. Thus the need to check everything, especially any talk or article that could be taken as authoritative.
The upside, of course, is vastly increased access to Church teachings, both spoken and written, by people throughout the world. A good trade-off and an amazing new capability, but we will have to understand the limitations and take care in how we use it.