AsoSoft Speech Corpus

Speech Recognition for Kurdish

AsoSoft is the first company to work in the field of speech recognition for the Kurdish language. We develop speech recognition, speaker recognition, and speech command software and tools for Kurdish using artificial intelligence and signal processing. Kurdish speech data and related resources such as tags are among the most important language resources for NLP research and for applications such as automatic speech recognition and speaker recognition. In this project, speech data for Central Kurdish was designed and collected for use in automatic speech recognition, speaker recognition, phonological research, dialect analysis, and similar work. So far, approximately 30 hours of speech have been recorded and transcribed for this corpus.


A sample of the AsoSoft Speech Corpus can be downloaded from AsoSoft's repository on GitHub. A portion of the corpus will be made available to researchers for non-commercial research use. Please contact us via "h.veisi[at]" to request access.


If you use our speech corpus, please cite us:
    title={Jira: a Kurdish Speech Recognition System Designing and Building Speech Corpus and Pronunciation Lexicon},
    author={Veisi, Hadi and Hosseini, Hawre and Mohammadamini, Mohammad and Fathy, Wirya and Mahmudi, Aso},
    journal={arXiv preprint arXiv:2102.07412},