Modeling of spontaneous speech and highly inflectional languages for voice driven telecommunication services

Research activity

Code	Science	Field	Subfield
2.08.00	Engineering sciences and technologies	Telecommunications

Code	Science	Field
T180	Technological sciences	Telecommunication engineering

Keywords

telecommunication services, user interface, automatic speech recognition, modeling of spontaneous speech, modeling of highly inflectional languages

Evaluation (methodology)

Evaluation of bibliographic research performance indicators according to ARIS methodology

Citations for bibliographic records in COBIB.SI that are linked to records in citation databases

Organisations (1), Researchers (5)

0796 University of Maribor, Faculty of Electrical Engineering and Computer Science

no.	Code	Name and surname	Research area	Role	Period	No. of publications
1.	27896	PhD Matej Grašič	Telecommunications	Young researcher	2007 - 2008	59
2.	06821	PhD Zdravko Kačič	Telecommunications	Researcher	2007 - 2008	721
3.	21304	PhD Tomaž Rotovnik	Electronic components and technologies	Researcher	2007 - 2008	121
4.	18168	PhD Mirjam Sepesy Maučec	Telecommunications	Researcher	2007 - 2008	270
5.	20032	PhD Andrej Žgank	Telecommunications	Head	2007 - 2008	256

Abstract

Developments in communication networks create a constant demand for improved communication services. One of the most important characteristics of a state-of-the-art communication service is a user-friendly interface. When a voice-driven interface is employed in the service, it is reasonable to use automatic spontaneous speech recognition. This is one of the most challenging tasks in the area of language technologies, as word error rates are still relatively high. The proposed project will focus on research in the area of modeling spontaneous speech. The major groups of disfluencies included in the modeling are fillers, word repetitions, sentence restarts and truncated words. The main goal of this part of the project is to improve speech recognition results. Automatic continuous speech recognition of highly inflectional languages is a very complex task because of the high ratio of out-of-vocabulary words. In the second part of the project, the focus will be on improving acoustic modeling for highly inflectional languages based on stem-ending subword units. The main goal is again to improve speech recognition performance. The last part of the project will focus on integrating the proposed methods into a single system. A demonstrator of an advanced communication service will also be implemented.
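The link between inflection and out-of-vocabulary (OOV) words can be illustrated with a small sketch. It is not the project's method: the ending inventory, the `min_stem` threshold, the "+"/"-" unit markers and the toy word lists are all assumptions chosen for illustration, and a real system would derive its stem-ending units from data. The sketch only shows why splitting words into stems and endings can lower the OOV rate when unseen inflected forms share stems and endings with the training vocabulary.

```python
def oov_rate(tokens, vocabulary):
    """Return the fraction of tokens that are not in the vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


# Hypothetical ending inventory (an assumption, not taken from the project).
# Longest endings are tried first so that the longest match wins.
ENDINGS = sorted(["ega", "emu", "ih", "im", "a", "e", "i", "o", "u"],
                 key=len, reverse=True)


def split_stem_ending(word, endings=ENDINGS, min_stem=3):
    """Split a word into a stem unit and an ending unit if a known ending fits.

    Stems are marked with a trailing "+" and endings with a leading "-",
    so the two units can be joined back into a word after decoding.
    """
    for ending in endings:
        if word.endswith(ending) and len(word) - len(ending) >= min_stem:
            return [word[:-len(ending)] + "+", "-" + ending]
    return [word]  # no ending matched: keep the word as a single unit


def to_subwords(words):
    """Map a list of words to a flat list of stem-ending units."""
    units = []
    for word in words:
        units.extend(split_stem_ending(word))
    return units


# Toy data: the test set contains only inflected forms unseen in training.
train = ["govor", "govora", "jezik", "jezika", "model", "modelu"]
test = ["govoru", "jeziku", "modela"]

word_oov = oov_rate(test, set(train))
subword_oov = oov_rate(to_subwords(test), set(to_subwords(train)))

print(f"word-level OOV rate:    {word_oov:.2f}")
print(f"subword-level OOV rate: {subword_oov:.2f}")
```

In this toy setting every test word is unseen at the word level, while each of its stem and ending units already occurs in the training data.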

Significance for science

The results of this project contributed significantly to two research areas of automatic speech recognition, a technology that is also used in the development of telecommunication services. A statistically significant improvement in spontaneous speech recognition performance was achieved, especially for modeling filled pauses and onomatopoeias. The proposed method was evaluated using Slovenian spoken language resources, but because it uses data-driven metrics for modeling, it is language independent. The proposed method could therefore also be used for modeling spontaneous speech in other languages. Automatic speech recognition also plays a role in other areas of information and communication technologies (e.g. speech-to-speech translation, user interfaces), so an influence on these areas can also be anticipated. The analysis of modeling spontaneous speech also showed that modeling truncated words is a highly complex problem. Additional focus should therefore be given to this topic in the future, a need that is also supported by recent publications at international conferences.

The results on combining spoken language resources showed that this approach can significantly improve acoustic modeling for spontaneous speech and for highly inflectional languages. This was especially noticeable when imperfect transcriptions were used for training. Guidelines on combining spoken language resources are very important for languages with less well-developed spoken language resources. The majority of present-day languages belong to this group, as the development of spoken language resources is very expensive and time consuming.

The research performed in the area of modeling highly inflectional languages reduced the gap by which these languages lag behind other western European languages in speech recognition. The research results contribute to increased interest in the area of subword unit decoding, which is still one of the most important methods for modeling highly inflectional languages.

The project’s results contributed significantly to improved performance of spontaneous speech recognition. The Slovenian experimental system could, for example, be used for indexing various audio-visual content, thereby increasing the availability of information for further processing. An important interdisciplinary result of this project is also the achievement in the area of discourse studies. These results were presented in an “A1” publication according to the ARRS methodology.

Significance for the country

The methods proposed in the project are language independent, but they are very suitable for modeling highly inflectional languages such as Slovenian. The project’s results are very significant for the preservation of the Slovenian language in the era of digitalization. Support for spontaneous speech recognition in the native language is necessary to increase interest in the development of various voice-driven information and communication services in Slovenia. This reduces the digital divide, as users can easily access various information sources through user-friendly interfaces.

The results on combining spoken language resources for spontaneous speech recognition could be applied to the future development of new Slovenian speech databases. The anticipated trend is the growing importance of developing less expensive speech databases with imperfect transcriptions. This is of immense importance given the increased penetration of broadband internet access and the resulting availability of various audio-visual content on the internet (e.g. parliamentary debates). For audio-visual content that is available without transcriptions, imperfect transcriptions can be generated manually or automatically, using an automatic speech recognition system in an unsupervised training mode.

The results of the experimental system pointed to the possibility of knowledge transfer within an applied project in the Slovenian telecommunication industry, specifically in the area of audio-video content indexing. Such a system could be very valuable for different content providers.

The interdisciplinary results on discourse markers are very important from the point of view of care for the Slovenian language. These were the first such results for the Slovenian language. The following research was carried out: an analysis of the influence of human annotators, an analysis of discourse marker frequency in different genres, and an analysis of the influence of context on discourse markers in spontaneous speech. These results are important for future research on discourse studies.

Most important scientific results

Final report

Most important socioeconomically and culturally relevant results

Final report
