Source: ARIS

Language Resources and Technologies for Slovene

January 1, 2019 - December 31, 2024
Research activity

Code Science Field Subfield
6.05.00  Humanities  Linguistics   
2.07.00  Engineering sciences and technologies  Computer science and informatics   

CERIF classification
Code Science Field
H350  Humanities  Linguistics 

OECD (Fields of Science and Technology) classification
Code Science Field
6.02  Humanities  Languages and Literature 
1.02  Natural Sciences  Computer and information sciences 
Keywords: Slovene language, computational linguistics, corpus linguistics, language resources, language technologies, reading literacy, machine learning, data mining, data science
Evaluation (source: COBISS)
Data for the last 5 years (citations for the last 10 years) as of February 25, 2024; A3 for the period 2018-2022
Data for ARIS tenders (04.04.2019 – Programme tender, archive)
Database  Linked records  Citations  Pure citations  Average pure citations per record
WoS  131  3,578  3,413  26.05 
Scopus  192  5,259  4,953  25.8 
Researchers (14)
No.  Code  Name and surname  Research area  Role  Period  No. of publications
1.  27674  PhD Špela Arhar Holdt  Linguistics  Researcher  2019 - 2024  227 
2.  36914  PhD Jaka Čibej  Linguistics  Researcher  2019 - 2024  151 
3.  36491  PhD Kaja Dobrovoljc  Linguistics  Researcher  2019 - 2024  142 
4.  53628  Magdalena Gapsa  Linguistics  Junior researcher  2019 - 2024  12 
5.  55352  Matic Kavaš    Technical associate  2021 
6.  55754  Matej Klemen  Computer science and informatics  Junior researcher  2021 - 2024  14 
7.  33796  PhD Iztok Kosem  Linguistics  Researcher  2019 - 2024  296 
8.  26166  PhD Simon Krek  Linguistics  Head  2019 - 2024  358 
9.  37653  PhD Cyprian Adam Laskowski  Linguistics  Researcher  2019 - 2024  35 
10.  36871  PhD Nikola Ljubešić  Linguistics  Researcher  2019 - 2024  392 
11.  21612  PhD Karmen Pižorn  Linguistics  Researcher  2019 - 2024  334 
12.  15295  PhD Marko Robnik Šikonja  Computer science and informatics  Researcher  2019 - 2024  417 
13.  58381  Domen Vreš  Computer science and informatics  Technical associate  2023 - 2024 
14.  56007  Aleš Žagar  Computer science and informatics  Technical associate  2021 - 2024  26 
Organisations (3)
No.  Code  Research organisation  City  Registration number  No. of publications
1.  0581  University of Ljubljana, Faculty of Arts  Ljubljana  1627058  97,066 
2.  0588  University of Ljubljana, Faculty of Education  Ljubljana  1627082  30,716 
3.  1539  University of Ljubljana, Faculty of Computer and Information Science  Ljubljana  1627023  16,002 
The main research topic of the new programme is modern Slovene, considered especially from the point of view of the rapid digitization of languages and new developments in ICT. Without language resources and technologies comparable with those available for other languages, Slovene will be limited in its participation in the new digital reality. The objective of the programme is to research the specific features of Slovene, to enable the development of resources and technologies in line with international standards, and to incorporate the research results into the long-term development of basic language resources, thereby facilitating the development of language technologies for Slovene. In addition, the programme will research the language needs of speakers of Slovene, particularly with the aim of improving reading literacy. The proposed programme is interdisciplinary: in addition to linguistics as the primary field, it includes computer and information sciences (language technologies) and education (literacy). The programme's wider organisational framework is the Centre for Language Resources and Technologies at the University of Ljubljana (CJVT UL), which brings together the three faculties that will conduct the new programme. These faculties also offer compatible teaching programmes, which ensures the transfer of research results into education. The programme is closely connected to the activity of CJVT as part of the Network of Research Infrastructure Centres of the University of Ljubljana, which provides the necessary infrastructural basis for the research. The programme will be conducted by a team with more than 10 years of research experience in the programme topics and demonstrated international excellence in their fields of expertise. The research will cover five general areas that integrate interlinked resources and technologies into a unified research programme: language description, standardization, language technologies, terminology, and multilinguality.
These areas cover all levels of linguistic description (text linguistics, semantics, syntax, morphology, phonology), with a focus on the holistic exploration of language phenomena. The research is empirical, based on real language data from contemporary corpora and similar resources. In the fields of terminology and multilinguality, the programme also covers research into the contact between Slovene and other languages in order to facilitate the development of multilingual resources and technologies (e.g. for machine translation). The research methodology is rooted in state-of-the-art machine learning and data mining methods, as used for other languages, within the theoretical framework of computational and corpus linguistics. Literacy research also draws on other methods of investigating productive and receptive language use, such as testing the written production of target user groups and surveys. The research topics of the programme are in line with the aims of the current Resolution on the National Programme for Language Policy (2014-2018), as well as the Action Plan for Language Infrastructure and for Education (2015).
Significance for science
The impact of the research results will be visible, directly and indirectly, mainly in the field of language infrastructure for Slovene. The programme is expected to enable Slovene to participate successfully in state-of-the-art technological trends that demand automatic language processing for a range of applications, from virtual assistants (e.g. Siri, Cortana, Alexa) and machine translation systems to artificial intelligence. In these applications, Slovene will need to be on the same level as languages with considerably larger numbers of speakers; this cannot be achieved without research focused on the specific characteristics of Slovene with respect to language technology needs. Since the members of the research group are already involved in international research, especially in lexicography and machine learning, we expect the results to achieve international impact and recognition and to be relevant for other languages. The results of the literacy research will have an important impact on all fields where an individual's ability to participate and function in a democratic society requires the appropriate delivery, understanding and interpretation of language information. The most immediate impact will be on the quality of literacy acquisition in education: on the one hand, the results will provide relevant resources and materials for language teachers; on the other, they will offer students access to individualised content developed using artificial intelligence and cognitive modelling techniques. In addition, the results will facilitate the improvement of national language testing, international literacy research, and diagnostic measures for specific learning difficulties in reading and writing, and solutions applicable in teaching practice will be offered for the problems identified.
From the data mining point of view, the programme will develop methodology and tools that allow the integration of different information resources, with emphasis on textual information, and their exploitation with automatic machine learning methods. The proposed programme will improve current state-of-the-art approaches in the area of heterogeneous data networks and allow their application to knowledge databases, corpora, and linked open data. It will develop new machine learning algorithms for training deep neural networks and new feature subset selection algorithms, which will be both generally applicable and adapted to specific language technologies and to Slovene. The proposed programme addresses open and topical problems with long-term goals that have high scientific and technological potential. Successfully meeting these challenges requires strong cooperation between language and data science experts. The programme has the potential to introduce a new theoretical and methodological paradigm for addressing language problems and for semantic analysis. The developed methodology and its open-source implementations will also be of interest to industrial investors.
Significance for the country
Language technologies are among the important enabling technologies of today's information society; they are found in all applications that require interaction between humans and machines or the acquisition of knowledge from large data resources in Slovene. The research conducted in the proposed programme will make an important contribution to the integration of Slovene into products that use these services, e.g. those described in the Smart Specialisation Strategy (smart cities). Industry interest is also evidenced by the participation of language technology companies in the Consortium for Language Resources and Technologies led by CJVT UL. Language resource and technology infrastructure for Slovene is mentioned in several strategic national documents: the National Programme for Culture (pp. 98-103), the Information Society Development Strategy to 2020 – DIGITAL SLOVENIA 2020 (p. 20), the Partnership Agreement between Slovenia and the European Commission for the period 2014-2020 (p. 89), and the Resolution on the National Programme for Language Policy 2014-2018, among others. The proposed research programme aims to address one of the key challenges of the information society, namely the ability to use distributed, heterogeneous resources of information and knowledge so that scientists and other users can interactively discover and interpret new knowledge. Beyond its internationally relevant objectives, the programme is important because it aims to bring Slovene to language technology maturity, helping to keep it scientifically and economically on a par with other languages.
Most important scientific results: Interim report
Most important socioeconomically and culturally relevant results: Interim report