20 February 2026
Mother Language Day: How AI supports linguistic diversity and where it (still) falls short
AI and languages – When digital means only English
Whether voice control, chatbots, or translation: in many products, features tend to be most mature in English. One key reason is data. Where there is more digital text, more documentation, more standardization, and more publicly available training corpora, models can learn more easily. English, as the most widely spoken language in the world – including in computer science and academia – benefits from this. Its comparatively simple grammar also makes English easier for computers to process. Many current AI language models support only around 30 to at most 80 languages. Google Translate, by comparison, covers a total of 243 languages as of 2024, with the long-term goal of supporting 1,000 languages.
Can AI help? Yes – but the bottleneck is training material
The urgency of this issue is highlighted by figures from UNESCO, which are often cited on International Mother Language Day:
"Every two weeks, a language disappears, taking with it an entire cultural and intellectual heritage. UNESCO estimates that there are 8,324 languages spoken or used in sign language. Of these, around 7,000 are still in use. Only a few hundred languages have actually gained a place in education and public life, and less than a hundred are used in the digital world."
When languages are not represented in the digital space, millions of people lack access to information, education, services, and participation. The reason is not always a small number of speakers. Even widely spoken languages are often poorly supported, denying the majority of their speakers digital participation. This applies to many African languages, each with 10 to 50 million native speakers.
This is where AI can help: translation, speech recognition, and text-to-speech can make content accessible in languages that have previously received little digital support.
However, there is a problem here:
To achieve good translation results, you need large amounts of high-quality examples with correct grammar and spelling – exactly what rare or underrepresented languages often lack. And this is precisely why they lag behind in AI systems.
Research and practice are responding to this with two strategies that are currently particularly common:
- More efficient models that require less training data (e.g., through better transfer learning).
- Synthetic training data, for example through back-translation or targeted data generation (see the sketch below).
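To illustrate the second strategy, here is a minimal sketch of the back-translation loop. It assumes nothing more than some machine-translation model covering the reverse direction; the function names and data shapes are illustrative, not an existing tool. Monolingual text in the underrepresented target language is translated back into the source language, and the resulting synthetic sentence pairs are added to the training data.

```python
# Minimal back-translation sketch: monolingual target-language text becomes
# synthetic parallel data for the source -> target direction.
# "reverse_translate" is a placeholder for any MT model covering target -> source.
from typing import Callable, List, Tuple

def back_translate(
    monolingual_target_sentences: List[str],
    reverse_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Return (synthetic source sentence, real target sentence) training pairs."""
    pairs = []
    for target_sentence in monolingual_target_sentences:
        synthetic_source = reverse_translate(target_sentence)
        pairs.append((synthetic_source, target_sentence))
    return pairs
```

The synthetic source side is noisy, but the target side is authentic text, which is exactly the direction a translation model into the low-resource language needs to learn.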
A concrete example of “scaling towards diversity” is Meta's NLLB-200 (No Language Left Behind): a translation model for 200 languages that explicitly addresses low-resource languages.
The technology is used in the Wikipedia environment, among other places: the content translation tool can be used to translate articles into more than 20 languages – including languages that were not previously supported there.
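As a rough illustration only (not how Wikipedia's content translation tool integrates the model), NLLB-200 can be tried out via the Hugging Face transformers library. The distilled 600M checkpoint and the FLORES-200 language codes below are assumptions about one publicly available variant:

```python
# Sketch: translating into a lower-resource language with an NLLB-200 checkpoint
# published on Hugging Face (assumed here: facebook/nllb-200-distilled-600M).
# Language codes follow the FLORES-200 scheme, e.g. zul_Latn for isiZulu.
from transformers import pipeline

translator = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",
    src_lang="eng_Latn",  # source: English
    tgt_lang="zul_Latn",  # target: isiZulu
)

result = translator("Every language carries its own view of the world.")
print(result[0]["translation_text"])
```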
Discrimination and AI: When dialects and social linguistic diversity are disadvantaged
Language is an expression of cultural diversity and does not stop at national borders. Dialects, sociolects, and regional spellings pose problems for AI: language models such as GPT-5 or Llama show biases against dialect speakers and reproduce or reinforce negative stereotypes. Experiments conducted by Johannes Gutenberg University Mainz, the University of Hamburg, and the University of Washington show that AI is significantly more likely to attribute negative characteristics such as "uneducated" or "unfriendly" to dialect speakers.
Such biases against non-standard or regional language variants have also been observed in other languages, such as English. For German, mechanisms to counteract this discrimination still leave room for improvement; in English, many of these mitigation strategies have already been implemented to reduce prejudice against dialects and social groups.
What does this mean for companies (and for us as a tech industry)?
If we take multilingualism seriously, AI systems need more than just “more parameters”:
- Measure language coverage, don't guess: Which languages, varieties, and dialects really work well in the product? (A minimal sketch follows after this list.)
- Test quality according to target group: Standard English-only benchmarks are not enough when user groups speak and write diversely.
- Systematically check for bias: Stereotypes in generated attributes, decisions, summaries, etc.
- Plan for human-in-the-loop: Especially for languages with little training data, community and expert feedback is invaluable (and often indispensable).
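To make the first point a little more tangible, here is a toy sketch of a language-coverage check. The dimensions and the 0-2 scale mirror the internal test described below; the threshold and the code itself are purely illustrative assumptions, not an existing Accso tool.

```python
# Toy language-coverage check: flag languages whose average human rating across
# selected evaluation dimensions falls below a (purely illustrative) threshold.
# Ratings use the same 0/1/2 scale as the test described later in this article.
from statistics import mean

DIMENSIONS = ["fluency", "grammar", "precision", "technical_language"]
THRESHOLD = 1.5  # illustrative release criterion, not an official value

ratings = {
    # language: {dimension: human rating, 0 = broken, 1 = awkward, 2 = natural}
    "isiZulu": {"fluency": 1, "grammar": 1, "precision": 2, "technical_language": 2},
    "Spanish": {"fluency": 2, "grammar": 2, "precision": 2, "technical_language": 2},
}

for language, scores in ratings.items():
    avg = mean(scores[d] for d in DIMENSIONS)
    status = "OK" if avg >= THRESHOLD else "needs attention"
    print(f"{language}: average {avg:.2f} -> {status}")
```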
Languages tested at Accso
Our team at Accso speaks more than 20 different languages. So we decided to put ChatGPT to the test: How well does it work in isiZulu (the most widely spoken native language in South Africa), Spanish, Creole, and Dutch, for example?
To do this, we presented translations created by ChatGPT (model GPT-5.2) to our multilingual colleagues for evaluation. The generated sentences cover various quality characteristics such as fluency, grammar, and technical language.
Our – non-representative – insight: the differences between the 21 evaluated languages are rather small, but the differences between the evaluation dimensions are clear. Formal/informal register and precision of meaning are very good almost everywhere, while "thinking in the language," grammar, and code-switching are noticeably weaker. Overall, the quality appears robust for standard requirements but shows weaknesses in cognitively and contextually more demanding areas.
Albanian, Bulgarian, Indonesian, and Italian scored very well in our test. At the lower end—but still understandable—were Kazakh, Croatian, Arabic, Chinese, and Creole. These “weaker” languages are not necessarily bad overall—often the average is dragged down by one or two very low subscores.
Our internal test is not sufficient to determine differences between the various language families, for example.
| Language | Fluency | Formal vs. Informal | Idioms & Naturalness | Grammar | Precision | Thinking in the Language | Code Switching | Technical Language | Average |
|---|---|---|---|---|---|---|---|---|---|
| Afrikaans | 1 | 2 | 2 | 1 | 2 | 1 | 2 | 2 | 1.6 |
| Albanian | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2.0 |
| Arabic | 1 | 2 | 2 | 1 | 2 | 2 | 0 | 2 | 1.5 |
| Bulgarian | 2 | 2 | (no rating) | 2 | 2 | 2 | 2 | 2 | 2.0 |
| Chinese | 1 | 2 | 2 | 1 | 2 | 1 | 2 | 1 | 1.5 |
| Creole | 1 | 1 | 2 | 1 | 2 | 1 | 2 | 2 | 1.5 |
| Croatian | 2 | 2 | 2 | 1 | 2 | 0 | 1 | 1 | 1.4 |
| French | 2 | 2 | 2 | 2 | 1 | 2 | 2 | 2 | 1.9 |
| Greek | 2 | 2 | 2 | 2 | 2 | 1 | 2 | 2 | 1.9 |
| Indonesian | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2.0 |
| isiZulu | 1 | 2 | 1 | 1 | 2 | 2 | 2 | 2 | 1.6 |
| Italian | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2.0 |
| Kazakh | 1 | 2 | 1 | 1 | 2 | 1 | 1 | 1 | 1.3 |
| Korean | 2 | 2 | 2 | 2 | 2 | 1 | 2 | 2 | 1.9 |
| Persian | 2 | 2 | 2 | 1 | 2 | 2 | 0 | 2 | 1.6 |
| Polish | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 2 | 1.9 |
| Portuguese | 1 | 2 | 2 | 2 | 2 | 2 | 1 | 2 | 1.8 |
| Russian | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1.9 |
| Spanish | 2 | 2 | 2 | 2 | 2 | 1 | 2 | 2 | 1.9 |
| Turkish | 1 | 2 | 2 | 2 | 2 | 1 | 2 | 2 | 1.8 |
| Ukrainian | 2 | 2 | 2 | 1 | 2 | 1 | 2 | 2 | 1.8 |
| Average | 1.6 | 2.0 | 1.9 | 1.6 | 2.0 | 1.5 | 1.6 | 1.8 | |
Rating:
2 = natural, correct, appropriate register
1 = understandable, but unnatural/awkward or minor errors
0 = many errors, evasive, drifts into another language, seems “broken”
Sources:
- https://www.unesco.org/en/days/mother-language
- https://www.deutschlandfunknova.de/beitrag/ki-modell-forschende-bringen-kuenstlicher-intelligenz-seltene-sprachen-bei
- https://www.uni-saarland.de/aktuell/tag-der-muttersprache-26294.html
- https://presse.uni-mainz.de/ki-sprachmodelle-zeigen-vorurteile-gegen-regionale-deutsche-sprachvarianten/
Publications:
- Jesujoba O. Alabi, David Ifeoluwa Adelani, Marius Mosbach, and Dietrich Klakow. 2022. Adapting Pre-trained Language Models to African Languages via Multilingual Adaptive Fine-Tuning. In Proceedings of the 29th International Conference on Computational Linguistics, pages 4336–4349, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
- Minh Duc Bui, Carolin Holtermann, Valentin Hofmann, Anne Lauscher, and Katharina von der Wense. 2025. Large Language Models Discriminate Against Speakers of German Dialects. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 8212–8240, Suzhou, China. Association for Computational Linguistics.