Projects / Programmes source: ARIS

Basic Research for the Development of Spoken Language Resources and Speech Technologies for the Slovenian Language

Research activity

Code Science Field Subfield
6.05.00  Humanities  Linguistics   

Code Science Field
6.02  Humanities  Languages and Literature 
Keywords
spoken language resources, spoken language, research of speech, language technologies, speech technologies, corpus linguistics, lexicography
Evaluation (methodology)
source: COBISS
Points
21,287.51
A''
3,945.28
A'
8,406.28
A1/2
12,248.96
CI10
8,126
CImax
531
h10
41
A1
73.16
A3
23.43
Data for the last 5 years (citations for the last 10 years) on March 23, 2026; Data for score A3 calculation refer to period 2020-2024
Data for ARIS tenders ( 04.04.2019 – Programme tender, archive )
Database Linked records Citations Pure citations Average pure citations
WoS  391  6,448  5,941  15.19 
Scopus  717  11,242  10,064  14.04 
Organisations (9) , Researchers (42)
0796  University of Maribor, Faculty of Electrical Engineering and Computer Science
no. Code Name and surname Research area Role Period No. of publications
1.  53072  Špela Antloga  Linguistics  Researcher  2022 - 2026  82 
2.  54519  MSc Andreja Bizjak  Linguistics  Researcher  2022 - 2026  34 
3.  33286  PhD Gregor Donaj  Telecommunications  Researcher  2022 - 2026  94 
4.  51357  Simona Majhenič  Linguistics  Researcher  2022 - 2024  47 
5.  20044  PhD Franc Marušič  Linguistics  Researcher  2025  316 
6.  50218  PhD Grega Močnik  Telecommunications  Researcher  2022 - 2026  52 
7.  18168  PhD Mirjam Sepesy Maučec  Telecommunications  Researcher  2022 - 2026  268 
8.  23838  PhD Darinka Verdonik  Linguistics  Head  2022 - 2026  231 
9.  60453  Jasna Vidinić  Linguistics  Researcher  2025 - 2026 
10.  20032  PhD Andrej Žgank  Telecommunications  Researcher  2022 - 2026  254 
0106  Jožef Stefan Institute
no. Code Name and surname Research area Role Period No. of publications
1.  05023  PhD Tomaž Erjavec  Linguistics  Researcher  2022 - 2026  708 
2.  55962  Taja Kuzman Pungeršek  Linguistics  Researcher  2022 - 2026  119 
3.  36871  PhD Nikola Ljubešić  Linguistics  Researcher  2022 - 2026  491 
4.  56348  Peter Rupnik    Technical associate  2022 - 2026  104 
0581  University of Ljubljana, Faculty of Arts
no. Code Name and surname Research area Role Period No. of publications
1.  27674  PhD Špela Arhar Holdt  Linguistics  Researcher  2022 - 2026  310 
2.  36914  PhD Jaka Čibej  Linguistics  Researcher  2022 - 2026  225 
3.  36491  PhD Kaja Dobrovoljc  Linguistics  Researcher  2024 - 2026  215 
4.  16313  PhD Apolonija Gantar  Linguistics  Researcher  2022 - 2026  241 
5.  33796  PhD Iztok Kosem  Linguistics  Researcher  2022 - 2026  370 
6.  26166  PhD Simon Krek  Linguistics  Researcher  2022 - 2026  432 
7.  57100  Nejc Robida  Linguistics  Researcher  2022 - 2026  33 
8.  05799  PhD Vera Smole  Linguistics  Researcher  2022 - 2026  542 
9.  19059  PhD Mojca Smolej  Humanities  Researcher  2022 - 2026  398 
10.  11651  PhD Marko Stabej  Linguistics  Researcher  2022 - 2026  669 
0618  Research Centre of the Slovenian Academy of Sciences and Arts
no. Code Name and surname Research area Role Period No. of publications
1.  15689  PhD Helena Dobrovoljc  Linguistics  Researcher  2022 - 2026  425 
2.  32205  PhD Januška Gostenčnik  Linguistics  Researcher  2022 - 2026  149 
3.  37555  PhD Janoš Ježovnik  Linguistics  Researcher  2022 - 2026  134 
4.  10288  PhD Carmen Kenda-Jež  Linguistics  Researcher  2022 - 2026  324 
5.  34592  PhD Tanja Mirtič  Linguistics  Researcher  2023 - 2026  104 
6.  10353  PhD Jožica Škofic  Linguistics  Researcher  2022 - 2026  720 
1538  University of Ljubljana, Faculty of Electrical Engineering
no. Code Name and surname Research area Role Period No. of publications
1.  11805  PhD Simon Dobrišek  Computer science and informatics  Researcher  2022 - 2026  297 
2.  31985  PhD Janez Križaj  Systems and cybernetics  Researcher  2022 - 2026  51 
3.  21310  PhD Janez Perš  Systems and cybernetics  Researcher  2025 - 2026  261 
1539  University of Ljubljana, Faculty of Computer and Information Science
no. Code Name and surname Research area Role Period No. of publications
1.  16154  PhD Marko Bajec  Computer science and informatics  Researcher  2022 - 2026  512 
2.  21404  PhD Iztok Lebar Bajec  Computer science and informatics  Researcher  2022 - 2024  203 
3.  59210  PhD Melanija Vezočnik  Computer science and informatics  Researcher  2025  12 
1822  University of Primorska, Faculty of Humanities
no. Code Name and surname Research area Role Period No. of publications
1.  32126  PhD Klara Šumenjak  Linguistics  Researcher  2022 - 2026  78 
2.  27530  PhD Jana Volk  Linguistics  Researcher  2022 - 2026  142 
1986  ALPINEON R & D
no. Code Name and surname Research area Role Period No. of publications
1.  12000  PhD Jerneja Žganec Gros  Computer science and informatics  Researcher  2022 - 2026  293 
2565  University of Maribor, Faculty of Arts
no. Code Name and surname Research area Role Period No. of publications
1.  12507  PhD Mihaela Koletnik  Linguistics  Researcher  2022 - 2026  560 
2.  20763  PhD Mira Krajnc Ivič  Humanities  Researcher  2022 - 2026  254 
3.  18502  PhD Melita Zemljak Jontes  Linguistics  Researcher  2022 - 2026  521 
Abstract
Spoken language resources are scarce and underdeveloped compared to written language resources, especially for small languages like Slovenian. To perform basic research on spoken language or speech technologies with significant scientific impact, the scarcity of spoken language resources needs to be addressed first. However, the development of spoken language resources is not only a matter of applied data collection; it also opens up a number of basic research questions. These research questions will be addressed in this project, with a focus on the Slovenian language. This is a large project proposal, divided into 4 Work Packages (WPs), each including 2-4 tasks, 14 tasks altogether. 4 tasks are solely linguistic, 2 tasks are solely technical, while the majority of the tasks (8) are interdisciplinary. The specific objectives of the WPs and their corresponding tasks are as follows:

WP1: ACQUIRING RECORDINGS OF SPEECH
- Objective 1.1: Analyse the needs for spoken language resources in different linguistic and technical disciplines.
- Objective 1.2: Analyse the advantages and disadvantages of different recording techniques, with particular attention to crowdsourcing as a time- and cost-efficient technique.
- Objective 1.3: Evaluate the efficiency of speech recognition models trained on domain-specific speech data obtained with low-cost unsupervised or semi-supervised techniques, compared to general-domain data obtained with high-cost techniques.
- Objective 1.4: Identify speech/speaker tasks that need further investment in labelled data for Slovene speech recognition.

WP2: DIALECT VARIATION
- Objective 2.1: Geolinguistic analysis of selected phonetic features, creation of diachronic phonetic maps of the non-standard phonetic inventory, and creation of a proposal for the standardisation of Slovenian dialect transcription and its conversion into IPA (and SAMPA).
- Objective 2.2: Creation of synthetic synchronic phonetic maps to define the areas of non-standard phonemes in Slovenian dialects. Making recommendations to improve pronunciation-based transcription for the Slovenian spoken corpus.
- Objective 2.3: Creation and testing of diasystemic contrastive tables of phonemes (dialect vs. standard). Establishment of standards for phonetic transcription in spoken corpora.
- Objective 2.4: Definition and evaluation of an optimal Slovenian phoneme set for speech recognition, taking into account newly defined dialect phonemes, similarity metrics and various available speech data.

WP3: SPEECH SEGMENTATION AND ANNOTATION
- Objective 3.1: Evaluation of the existing speech segments/utterances in Slovene spoken language resources regarding their appropriateness as the basic units for the analysis of speech on the syntactic and semantic levels.
- Objective 3.2: Analysis of different types of disfluencies in spoken text, creation of a disfluency training corpus, and experiments in the automatic annotation of disfluencies.
- Objective 3.3: Development of a linguistic processing pipeline based on speech and transcription data (both manual and automatic), and linguistic annotation of the GOS 2.0 corpus.
- Objective 3.4: Evaluation of the GORDAN dialogue act annotation scheme, its adjustment to the ISO 24617-2 standard, and creation of a training corpus with dialogue act annotations.

WP4: SPOKEN LEXIS
- Objective 4.1: Evaluation of existing information on spoken Slovene in the Sloleks lexicon, and creation of linguistically sound guidelines for the inclusion of (non-standard) spoken data in Sloleks, comparable with machine-readable lexicons for other languages.
- Objective 4.2: Analysis of existing semantic information included in lexicographic resources for Slovene from the perspective of spoken Slovene, together with the analysis of the complementary spoken corpus data, and exploration of the principles for including the findings in lexicographic resources.