Global Voices

A New Audio Uploading Tool for Crowdsourced Wiktionary Project in Odia Language

One Global Voices contributor who's passionate about the Odia language has created an open-source solution for recording and uploading words under open licenses for projects like Wiktionary.

A home recording setup for the Kathabhidhana project for Wiktionary. Image via Subhashish Panigrahi from Wikimedia Commons. CC BY-SA 4.0

Wiktionary, Wikipedia's multilingual sister project, promises a great deal. At present, there are not many open-licensed audio recordings that you can hear or download — especially if your mother tongue is not one of the major languages. Wiktionary is already available in multiple languages and in addition to the definitions of the words, many phonetic notations — at least in terms of the International Phonetic Alphabet (IPA) — are available. Now, an Odia-language community project is helping to simplify the process of volunteer contributions to the Odia Wiktionary project.

Kathabhidhana, a community project led by Global Voices contributor and Odia Wikipedian Subhashish Panigrahi, is an open-source solution for recording large batches of words. It then uploads the recordings under open licenses so that they can be used by projects like Wiktionary.

Odia, one of the state languages in India, is an Indo-Aryan language that is spoken mostly in eastern India by around 40 million native speakers. With over 5,000 years of literary heritage, it has been recognized as one of the oldest South Asian languages, and has been given the status of a classical language by the Indian government.

But because of the widespread use of non-Unicode typing systems, the language's online presence is still lagging behind. To address this, several character encoding converters, which turn text typed in various non-Unicode encoding systems into Unicode, have been incorporated into Odia Wikipedia, which now has more than 12,000 entries. The Odia Wiktionary, a free, online and completely crowdsourced dictionary in the Odia language, is also trying to bridge the gap.
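To make the idea of an encoding converter concrete, here is a minimal Python sketch. It is not the converter used on Odia Wikipedia. The legacy key mapping below is invented purely for illustration and does not match any real Odia font. The one real detail it tries to show is that legacy fonts usually store a left-side vowel sign before its consonant (visual order), while Unicode stores it after the consonant (logical order).

```python
# Hypothetical legacy-font table: the ASCII keys are made up for this
# example; only the Unicode code points on the right are real.
LEGACY_TO_UNICODE = {
    "k": "\u0B15",  # ORIYA LETTER KA
    "g": "\u0B17",  # ORIYA LETTER GA
    "r": "\u0B30",  # ORIYA LETTER RA
    "a": "\u0B3E",  # ORIYA VOWEL SIGN AA (written after the consonant)
    "e": "\u0B47",  # ORIYA VOWEL SIGN E (drawn to the left of the consonant)
}

# Vowel signs that legacy fonts place before the consonant they belong to.
PREBASE_SIGNS = {"\u0B47"}


def legacy_to_unicode(text):
    """Map legacy characters to Unicode and move pre-base vowel signs
    so that they follow the next character, as Unicode expects."""
    out = []
    pending = None  # a pre-base sign waiting for its consonant
    for ch in text:
        uni = LEGACY_TO_UNICODE.get(ch, ch)  # unknown characters pass through
        if uni in PREBASE_SIGNS:
            pending = uni
            continue
        out.append(uni)
        if pending is not None:
            out.append(pending)
            pending = None
    if pending is not None:  # a stray sign at the end of the input
        out.append(pending)
    return "".join(out)


converted = legacy_to_unicode("ek")  # KA followed by VOWEL SIGN E
```

A real converter would need a complete table for each legacy font, plus handling for conjuncts and for signs that follow something other than a consonant; this sketch skips all of that.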

The project draws its inspiration largely from other open-source software created by Shrinivasan T, who used the Python programming language to automate and simplify the process. He also posted a tutorial on YouTube.

Panigrahi was inspired to start the Kathabhidhana project because the existing method was cumbersome: you had to pronounce and record a word, export it in Ogg Vorbis format, and then upload it from your account to Wikimedia Commons, the central repository of media files for all Wikimedia projects. Once uploaded, the recording is added to the Wiktionary project. Apart from manually recorded pronunciation, there is also an open-source text-to-speech project called Dhvani that works for most Indian languages.
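The record, export and upload sequence described above gets tedious when repeated for every single word, which is why batch tooling helps. The following Python sketch is not Kathabhidhana's actual code. It only illustrates how a word list might be turned into an upload plan, with one Ogg file name and one description page per word. The function name, the `Or-<word>.ogg` naming pattern and the description layout are assumptions made for this example; recording and uploading are left out.

```python
def build_upload_plan(words, lang_code="or", license_tag="cc-by-sa-4.0"):
    """Return one dict per word with the target file name and the wikitext
    for its description page. Names and layout are illustrative only."""
    plan = []
    for word in words:
        word = word.strip()
        if not word:
            continue  # skip blank lines in the word list
        # Assumed naming pattern: capitalized language code, hyphen, word.
        filename = lang_code.capitalize() + "-" + word + ".ogg"
        description = (
            "{{Information\n"
            "|description=Pronunciation of the Odia word \"" + word + "\"\n"
            "|source={{own}}\n"
            "}}\n"
            "{{self|" + license_tag + "}}"
        )
        plan.append(
            {"word": word, "filename": filename, "description": description}
        )
    return plan


# Two Odia words and one blank entry that should be skipped.
plan = build_upload_plan(["ଘର", " ", "ପାଣି"])
for item in plan:
    print(item["filename"])
```

Keeping the plan as plain data makes it easy to review the file names and license text before anything is recorded or uploaded.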

Having audio recordings of words in Wiktionary helps non-native speakers, as well as people with visual disabilities, listen to the pronunciation of different words. The word library can also be used for several Natural Language Processing projects, like building text-to-speech and speech-to-speech engines.

You can download a copy of Kathabhidhana and find all the audio recordings made using this software.

Originally published in Global Voices.
