AsoSoft Research Group

AsoSoft is a research and business group working on natural language processing (NLP) technologies for the Kurdish language. Kurdish is a member of the Indo-Iranian branch of the Indo-European language family and is spoken by more than 40 million people in Western Asia, mainly in Iraq, Turkey, Iran, Syria, Armenia, and Azerbaijan. Kurdish also has a variety of dialects. Despite this diversity, Kurdish remains a low-resourced language, especially for computational linguistics. AsoSoft conducts research on and develops computational linguistics resources for Kurdish, with the eventual aim of building NLP technologies and applications. To those ends, AsoSoft's activities focus on, but are not limited to, the following:

  1. Linguistic resources and data
    • Text corpora
    • Speech corpora
    • Computational grammars
    • Lexicons and dictionaries
    • Parallel corpora
    • WordNet and treebanks
  2. Basic Tools
    • Tokenizer
    • POS Tagger
    • Parsers
  3. Products
    • Kurdish Automatic Speech Recognition
    • Kurdish Text-to-Speech
    • Spell checker

Knowledge Enterprise

AsoSoft is a knowledge enterprise founded and managed by experts in Artificial Intelligence, Computer Engineering, and Computational Linguistics.

Language Data and Resources

Apart from developing applications for Kurdish, AsoSoft works to provide the language data and resources needed for computer speech and language processing of Kurdish. Our activities to that end include, but are not limited to, building text corpora, speech corpora, a WordNet, lexicons, and parallel corpora.

Speech Recognition

AsoSoft is the first company to work on speech recognition for the Kurdish language. We develop speech recognition, speaker recognition, and speech command software and tools for Kurdish using artificial intelligence and signal processing. Kurdish speech data and its related resources, such as annotations (tags), are among the most important language resources required for research and applications such as automatic speech recognition and speaker recognition. In this project, a speech corpus for Central Kurdish was designed and collected for use in automatic speech recognition, speaker recognition, phonological research, dialect analysis, and similar work. So far, approximately 30 hours of speech have been recorded and transcribed to build this corpus.