Kurdish Grapheme-to-Phoneme Convertor

Background Study:

Automated Grapheme-to-Phoneme Conversion for Central Kurdish based on Optimality Theory


    title={Automated grapheme-to-phoneme conversion for Central Kurdish based on optimality theory},
    author={Mahmudi, Aso and Veisi, Hadi},
    journal={Computer Speech \& Language},

Kurdish G2P Dataset: GitHub, DOI


The writing system of Central Kurdish has three cases (the unwritten short vowel /i/ and the letters “ی” and “و”) in which there is no one-to-one mapping between orthographic letters and the phonemes of the language. Consequently, written words containing these cases may be pronounced in multiple ways. Finding the correct pronunciation of a written word is called Grapheme-to-Phoneme (G2P) conversion, a key step in natural language processing tasks such as speech synthesis. Because Central Kurdish is a low-resource language, we present a G2P conversion method based on the phonological rules of the language rather than on pronunciation dictionaries and data-driven learning methods. After reviewing the phonology and alphabet of the language within the framework of Optimality Theory, we generate all possible pronunciations of each word. Then, by specifying and applying ranked constraints, we eliminate undesirable candidates so that only one well-formed pronunciation per word remains. Evaluated on two datasets, the proposed method achieved an overall Phoneme Error Rate (PER) of 0.75%, 94.71% precision in detecting the short vowel /i/, and 100% accuracy in converting the letters “ی” and “و”. These results suggest that the current orthographic system of Central Kurdish does not need any additional letters. The approach also yields a ranked suggestion list for manually checking the few ambiguous cases that remain unresolved.
Keywords: Grapheme-to-phoneme conversion, Optimality Theory, Central Kurdish, Kurdish phonology
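The generate-and-filter procedure described in the abstract can be illustrated with a toy Optimality Theory evaluator. This is a minimal sketch, not the authors' implementation: the grapheme table, the small letter subset, and the constraints `no_initial_vowel`, `no_hiatus`, `no_complex_onset`, `no_ccc` and `dep_i` (and their ranking) are hypothetical simplifications chosen for illustration. GEN reads each ambiguous letter every possible way (“ی” as /j/ or /iː/, “و” as /w/ or /u/) and optionally inserts the unwritten short vowel /i/ between adjacent letters. EVAL compares violation counts lexicographically, which models strict domination of higher-ranked constraints over lower-ranked ones; sorting all candidates by their violation profiles also gives a ranked suggestion list.

```python
from itertools import product

# Hypothetical toy grapheme table: Arabic-script letter -> possible IPA readings.
# Only a handful of letters are covered here, for illustration.
GRAPHEMES = {
    "ب": ("b",), "ر": ("r",), "د": ("d",), "ش": ("ʃ",), "ک": ("k",),
    "ا": ("aː",),
    "ی": ("j", "iː"),  # ambiguous: consonant /j/ or long vowel /iː/
    "و": ("w", "u"),   # ambiguous: consonant /w/ or vowel /u/ (simplified)
}
VOWELS = {"aː", "iː", "u", "i"}


def gen(word):
    """GEN: every reading of every letter, with or without an epenthetic /i/ in each gap."""
    options = [GRAPHEMES[ch] for ch in word]
    for readings in product(*options):
        for inserts in product((False, True), repeat=len(word) - 1):
            segs = [readings[0]]
            for insert, seg in zip(inserts, readings[1:]):
                if insert:
                    segs.append("i")  # the short vowel /i/ is never written
                segs.append(seg)
            yield tuple(segs)


# Hypothetical constraints; each returns a number of violations.
def no_initial_vowel(segs):
    return int(segs[0] in VOWELS)

def no_hiatus(segs):
    return sum(a in VOWELS and b in VOWELS for a, b in zip(segs, segs[1:]))

def no_complex_onset(segs):
    return int(len(segs) > 1 and segs[0] not in VOWELS and segs[1] not in VOWELS)

def no_ccc(segs):
    return sum(all(s not in VOWELS for s in segs[k:k + 3]) for k in range(len(segs) - 2))

def dep_i(segs):
    # No letter in this table reads as a bare /i/, so every /i/ here is epenthetic.
    return segs.count("i")

# Highest-ranked constraint first.
RANKING = (no_initial_vowel, no_hiatus, no_complex_onset, no_ccc, dep_i)


def evaluate(word):
    """EVAL: rank all candidates; tuple comparison implements strict domination."""
    scored = {c: tuple(con(c) for con in RANKING) for c in gen(word)}
    return sorted(scored.items(), key=lambda item: (item[1], item[0]))


def best(word):
    """Return the optimal candidate as a single IPA string."""
    return "".join(evaluate(word)[0][0])
```

For example, `best("برد")` returns `"bird"`: inserting one /i/ violates only the lowest-ranked `dep_i`, whereas the unbroken /brd/ violates the higher-ranked cluster constraints. Likewise, “ی” in `"شیر"` is read as the vowel /iː/, while in `"یار"` it is read as the consonant /j/. The full list returned by `evaluate` plays the role of the ranked suggestion list.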
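The Phoneme Error Rate reported in the abstract is conventionally defined as PER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions needed to turn the predicted phoneme sequence into the reference, and N is the number of reference phonemes. The sketch below assumes that standard definition, aggregated over the whole corpus; the paper's exact alignment and aggregation details are not stated here, and the function names `edit_distance` and `phoneme_error_rate` are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance over phoneme tokens (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def phoneme_error_rate(pairs):
    """Corpus-level PER: total edits divided by total reference phonemes."""
    edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    return edits / total
```

For instance, if the converter drops the unwritten /i/ in /bird/ (predicting /brd/) but gets /kurd/ right, there is one deletion out of eight reference phonemes, so the PER is 0.125 (12.5%).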