Society22

SONORA Belarusian Community Project Seeks Those Who Can Help 'Connect' Belarusian Voice to BigTech

In the "TTS in Belarusian" channel, an announcement appeared about the launch of SONORA. This is a Belarusian community project where participants are recording an open Belarusian dataset for TTS — "so that Belarusian sounds natural in modern AI services," writes the Telegram channel Dzik Pic.

Illustrative photo. Photo: freepik.com

What's the problem? As stated on the website, today there are almost no high-quality Belarusian voice datasets specifically recorded for training modern TTS models. At the same time, the Belarusian language has thousands of homographs: identical spelling but different meanings depending on the stress. If the model makes a mistake with stress, both the pronunciation and the meaning are broken.

The second problem is phonetic correctness: softness, 'ў', 'дз/дж', intonation, and rhythm:

«Without high-quality studio material, models repeat errors and sound less natural».

Yes, similar initiatives already exist today, for example, Donar.by or the BexTTS model. But Sonora will continue their path and «bring it to a studio level».

Therefore, enthusiasts have launched crowdfunding to «organize professional studio recording of Belarusian speech using specially selected texts».

Researchers, enthusiasts, startups, and educational initiatives will be able to use the dataset.

In addition, the team plans partnerships with Google, OpenAI, and ElevenLabs — «so that our dataset strengthens their solutions for Belarusians».

Specifically, SONORA is looking for warm intros / direct contacts at:

  • Google;
  • OpenAI;
  • ElevenLabs;
  • Speechify;
  • Meta.

«If you know anyone in these companies and can make an intro — please write to us privately, or write a request on our behalf yourself. The text of the request is here», — the creators ask.

You can listen to how the Belarusian language sounds in technologies today on the homepage.

Comments2

  • ???
    09.03.2026
    Чаму напрыклад гэтым не займаецца Каардынацыйная Рада? Ці гендарныя квоты і лгбт важней чым беларуская мова?
  • Чарговы веласіпед
    10.03.2026
    Так а ў чым розніца паміж дадзеным праектам і Donar.by? Donar.by ж спецыяльна пашыраў функцыянал Common Voice, каб былі датасэты не толькі на распазнаванне гаворкі, але і на агучку.

    Ну а наконт "амаль не існуе якасных беларускамоўных галасавых датасэтаў", то наогул хлусня, на адным толькі Common Voice на дадзены момант запісана і праверана 1800 гадзін ад больш як 8000 чалавек. Такія вялікія датасэты ў свабодным доступе мала для якой мовы існуюць.

Now reading

"Turn off the repeaters, remove them and show us that they are removed." Zelenskyy explained why he issued an ultimatum and what Minsk should do. 27

"Turn off the repeaters, remove them and show us that they are removed." Zelenskyy explained why he issued an ultimatum and what Minsk should do.

All news →
All news

US denies Strait of Hormuz is closed again 1

Where in Europe is the cleanest bathing water? New beach ranking published

Lithuania deemed Skindzerau, a figure in Nasha Niva's investigation, a threat to national security and plans to revoke his refugee status 18

US Treasury Secretary called Zelenskyy "Mr. Bean on crack" 31

Belarusians share photos of storks visiting their yards 2

Netherlands defeated Sweden 1

Now Belarusian actors will also be Russified — through VGIK 14

Kuchma and Yushchenko also refused Polish orders 14

Unusual bollards installed at a pedestrian crossing in Vitebsk. How do they work? 4

больш чытаных навін
больш лайканых навін

"Turn off the repeaters, remove them and show us that they are removed." Zelenskyy explained why he issued an ultimatum and what Minsk should do. 27

"Turn off the repeaters, remove them and show us that they are removed." Zelenskyy explained why he issued an ultimatum and what Minsk should do.

Main
All news →

Заўвага:

 

 

 

 

Закрыць Паведаміць