Projects / Programmes source: ARIS

Modeling of spontaneous speech and highly inflectional languages for voice driven telecommunication services

Research activity

Code Science Field Subfield
2.08.00  Engineering sciences and technologies  Telecommunications   

Code Science Field
T180  Technological sciences  Telecommunication engineering 
Keywords: telecommunication services, user interface, automatic speech recognition, modeling of spontaneous speech, modeling of highly inflectional languages
source: COBISS
Researchers (5)
no. Code Name and surname Research area Role Period No. of publications
1.  27896  PhD Matej Grašič  Telecommunications  Junior researcher  2007 - 2008  59 
2.  06821  PhD Zdravko Kačič  Telecommunications  Researcher  2007 - 2008  704 
3.  21304  PhD Tomaž Rotovnik  Electronic components and technologies  Researcher  2007 - 2008  115 
4.  18168  PhD Mirjam Sepesy Maučec  Telecommunications  Researcher  2007 - 2008  249 
5.  20032  PhD Andrej Žgank  Telecommunications  Head  2007 - 2008  241 
Organisations (1)
no. Code Research organisation City Registration number No. of publications
1.  0796  University of Maribor, Faculty of Electrical Engineering and Computer Science  Maribor  5089638003  27,463 
Developments in communication networks create a constant demand for improved communication services. One of the most important characteristics of a state-of-the-art communication service is a user-friendly interface. When a service employs a voice-driven interface, it is reasonable to use automatic recognition of spontaneous speech. This is one of the most challenging tasks in language technologies, and word error rates remain relatively high. The proposed project will first focus on research into modeling spontaneous speech. The major groups of disfluencies included in the modeling are fillers, word repetitions, sentence restarts and truncated words. The main goal is to improve speech recognition results. Automatic continuous speech recognition of highly inflectional languages is a very complex task because of the high proportion of out-of-vocabulary words. The second part of the project will focus on improving acoustic modeling for highly inflectional languages based on stem-ending subword units. The main goal is again to improve speech recognition performance. The last part of the project will focus on integrating the proposed methods into a single system. A demonstrator of an advanced communication service will also be implemented.
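The link between inflection, out-of-vocabulary (OOV) words and stem-ending subword units can be illustrated with a minimal Python sketch. It is not the project's method: the ending list, the word lists and the helper names (`oov_rate`, `split_stem_ending`, `to_subwords`) are assumptions chosen only for illustration, and the toy data are deliberately contrived so that every test word form is unseen while every stem and ending has been seen.

```python
# Toy illustration only: hypothetical endings and word lists, not project data.

# Assumed list of inflectional endings (hypothetical, for illustration).
ENDINGS = ["a", "e", "i", "o"]


def oov_rate(tokens, vocabulary):
    """Return the fraction of tokens that are not in the vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


def split_stem_ending(word, endings):
    """Split a word into (stem, ending) using the longest matching ending.

    The ending is marked with a leading '+'. If no ending matches, or the
    stem would be empty, the whole word is returned as the stem.
    """
    for ending in sorted(endings, key=len, reverse=True):
        if ending and word.endswith(ending) and len(word) > len(ending):
            return word[: -len(ending)], "+" + ending
    return word, ""


def to_subwords(words, endings):
    """Convert a list of words into a flat list of stem and ending units."""
    units = []
    for word in words:
        stem, ending = split_stem_ending(word, endings)
        units.append(stem)
        if ending:
            units.append(ending)
    return units


# Inflected forms of two Slovenian nouns, split into "seen" and "unseen" forms.
train_words = ["hiša", "hiše", "mizo", "mizi"]
test_words = ["hišo", "hiši", "miza", "mize"]

word_oov = oov_rate(test_words, set(train_words))
subword_oov = oov_rate(
    to_subwords(test_words, ENDINGS),
    set(to_subwords(train_words, ENDINGS)),
)

print("word-level OOV rate:", word_oov)
print("stem-ending OOV rate:", subword_oov)
```

In this contrived case the word-level vocabulary covers none of the test forms, while the stem-ending vocabulary covers all of them. Real corpora give far less extreme figures, and a practical system would need a linguistically or statistically derived segmentation rather than a fixed list of endings.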
Significance for science
The results of this project contributed significantly to two research areas of automatic speech recognition, a technology that is also used in the development of telecommunication services. A statistically significant improvement in spontaneous speech recognition performance was achieved, especially for the modeling of filled pauses and onomatopoeias. The proposed method was evaluated using Slovenian spoken language resources, but because it relies on data-driven metrics for modeling, it is language independent. The method could therefore also be used to model spontaneous speech in other languages. Automatic speech recognition also plays a role in other areas of information and communication technologies (e.g. speech-to-speech translation, user interfaces), so an influence on these areas can also be anticipated. The analysis of spontaneous speech modeling also showed that modeling truncated words is a highly complex problem. Additional attention should therefore be given to this topic in the future, a conclusion that is also supported by recent publications at international conferences. The results on combining spoken language resources showed that this approach can significantly improve acoustic modeling for spontaneous speech and for highly inflectional languages. The improvement was especially noticeable when imperfect transcriptions were used for training. The guidelines on combining spoken language resources are very important for languages with less well-developed spoken language resources. Most present-day languages belong to this group, because developing spoken language resources is very expensive and time-consuming. The research on modeling highly inflectional languages narrowed the gap between these languages and other Western European languages in the area of speech recognition.
The research results contribute to increased interest in subword unit decoding, which is still one of the most important methods for modeling highly inflectional languages. The project's results contributed significantly to improved performance of spontaneous speech recognition. The Slovenian experimental system could, for example, be used for indexing various audio-visual content, thereby increasing the availability of information for further processing. An important interdisciplinary result of this project is also the achievement in the area of discourse studies. These results were presented in an "A1" publication according to the ARRS methodology.
Significance for the country
The methods proposed in the project are language independent, but they are very suitable for modeling highly inflectional languages such as Slovenian. The project's results are very significant for the preservation of the Slovenian language in the era of digitalization. Support for spontaneous speech recognition in the native language is necessary to increase interest in the development of various voice-driven information and communication services in Slovenia. This helps reduce the digital divide, as users can easily access various information sources through user-friendly interfaces. The results on combining spoken language resources for spontaneous speech recognition could be applied to the future development of new Slovenian speech databases. The anticipated trend is towards the growing importance of less expensive speech databases with imperfect transcriptions. This is of immense importance given the increasing penetration of broadband internet access and the resulting availability of various audio-visual content on the internet (e.g. parliamentary debates). For audio-visual content that is available without transcriptions, imperfect transcriptions can be generated manually or automatically, using an automatic speech recognition system in an unsupervised training mode. The results obtained with the experimental system pointed to the possibility of knowledge transfer within an applied project in the Slovenian telecommunication industry, specifically in the area of audio-video content indexing. Such a system could be very valuable for different content providers. From the point of view of care for the Slovenian language, the interdisciplinary results on discourse markers are very important. They were the first such results for the Slovenian language.
The following research was carried out: an analysis of the influence of human annotators, an analysis of the frequency of discourse markers in different genres, and an analysis of the influence of context on discourse markers in spontaneous speech. These results are important for future research in discourse studies.
Most important scientific results Final report, complete report on dLib.si
Most important socioeconomically and culturally relevant results Final report, complete report on dLib.si