Creating chatbots in African languages

AI Researchers Aiming to Create Chatbots for African Languages


Natural language processing (NLP) has made tremendous progress in widely used languages such as English and Russian. But an emerging body of research focuses on training AI models on African languages.

Thanks to such efforts, the dream of an African language chatbot is getting closer to reality.

Chatbot research is dominated by the English language

The natural language processing techniques and large language models that power chatbots like ChatGPT are still relatively new technologies. So far, research and development has focused on the most widely spoken languages.

For example, ChatGPT is available in English, Spanish, French, German, Portuguese, Italian, Dutch, Russian, Arabic, and Chinese.

The tendency towards language dominance in AI research is largely driven by data availability.

It is estimated that more than half of the written content available online is in English. Accordingly, the largest and most readily available datasets for training language models are in English, followed by other widely used languages.

African languages pose a challenge to AI researchers

Currently, the world's biggest AI companies are scrambling to build the most advanced chatbots for a handful of languages. But another area of research aims to develop AI tools for less widely represented languages.

The limited availability of training data for African languages poses a major challenge for AI developers.

The linguistic diversity of many African countries further complicates matters. For example, South Africa has 11 official spoken languages and 35 indigenous languages. With nearly 2,000 languages spoken across the continent, it is almost impossible to compile a library of digital content as vast as the one available in English.

Representation of African Linguistic Diversity (Source: ACL Anthology)

Moreover, a recent study revealed that the lack of basic digital language tools hinders content creation. As the authors note:

“Creating digital content in African languages is frustrating because of the lack of basic tools like dictionaries, spell checkers and keyboards.”

However, efforts are being made to increase access to African language information, for example by digitizing archival language repositories and making many databases freely accessible. The work of content creators, moderators and translators is also critical.

Multilingual models can make African language chatbots a reality

Although a lack of training data has held African language NLP research back, multilingual pre-trained language models (mPLMs) can help researchers overcome this challenge.

Pre-trained models can be thought of as the building blocks of high-performing chatbots. But they still need task-specific fine-tuning to deliver conversational results.

By acquiring broad linguistic knowledge during pre-training, multilingual models can interpret basic language structure and detail without the huge training datasets normally required.

Surprisingly, a recent study showed that language similarity improves model performance. Just as speakers of closely related languages can often understand one another, models trained on one language can more accurately interpret similar languages.
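One way to build intuition for why related languages help each other is to look at how many short character sequences their words share. The sketch below is only a toy illustration, not how the study or any mPLM measures similarity. The function names are hypothetical, and the comparison uses the word for "person" in Zulu, Xhosa and Swahili, with English for contrast.

```python
def char_ngrams(word, n=2):
    """Return the set of character n-grams in a word, with boundary markers."""
    padded = f"<{word.lower()}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items divided by all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def subword_overlap(word_a, word_b, n=2):
    """Toy similarity score: overlap of character n-grams between two words."""
    return jaccard(char_ngrams(word_a, n), char_ngrams(word_b, n))


# The word for "person" in three Bantu languages, plus English for contrast.
words = {
    "Zulu": "umuntu",
    "Xhosa": "umntu",
    "Swahili": "mtu",
    "English": "person",
}

# Compare every other language against Zulu.
for lang, word in words.items():
    if lang != "Zulu":
        score = subword_overlap(words["Zulu"], word)
        print(f"Zulu vs {lang}: {score:.2f}")
```

In this toy comparison, the Xhosa word shares the most character pairs with the Zulu word, the Swahili word shares fewer, and the English word shares none. A model that has already learned useful representations for one of these languages has, loosely speaking, a head start on its close relatives.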

Using this approach, researchers developed an mPLM called SERENGETI that covers 517 African languages and language varieties.

This represents a major technological advance: a substantial increase over the 31 African languages previously covered.

Disclaimer

Adhering to the Trust Project guidelines, BeInCrypto is committed to unbiased, transparent reporting. This news report aims to provide accurate and up-to-date information. However, readers are advised to independently verify facts and consult with professionals before making any decisions based on this content.
