Language Resources and Technologies for Slovene

Code

P6-0411 (A) - included in ARIS records

Head

PhD Simon Krek

Science

Engineering sciences and technologies (5)
Humanities (10)
Other (1)

Researcher status

Researcher (15)
Junior expert or technical associate (1)

Education

Doctoral degree (9)
Other (7)

Sex

Woman (5)
Man (11)

Status

Employed at RO and RRD (14)
No data on employment in RO (2)

No. of publications

0 (2)
10–99 (6)
100–999 (8)

Projects / Programmes source: ARIS

Language Resources and Technologies for Slovene

Periods

January 1, 2019 - December 31, 2027

Research activity

Code	Science	Field	Subfield
6.05.00	Humanities	Linguistics
2.07.00	Engineering sciences and technologies	Computer science and informatics

Code	Science	Field
H350	Humanities	Linguistics

Code	Science	Field
6.02	Humanities	Languages and Literature
1.02	Natural Sciences	Computer and information sciences

Keywords

Slovene language, computational linguistics, corpus linguistics, language resources, language technologies, reading literacy, machine learning, data mining, data science

Evaluation (methodology)

Evaluation of bibliographic research performance indicators according to ARIS methodology

Points

9,309.33

A''

2,069.65

3,736.16

A1/2

4,885.65

CI10

5,766

CImax

2,152

h10

31.7

10.15

Data for the last 5 years (citations for the last 10 years) as of June 24, 2026; data for the A3 score calculation refer to the period 2020-2024

Data for ARIS tenders (04.04.2019 – Programme tender, archive)

Citations

Citations for bibliographic records in COBIB.SI that are linked to records in citation databases

Database	Linked records	Citations	Pure citations	Average pure citations
WoS	164	4,865	4,650	28.35
Scopus	249	7,219	6,796	27.29
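The "Average pure citations" column appears to be the number of pure citations divided by the number of linked records. This is an assumption inferred from the figures in the table, not a definition stated by ARIS. A minimal sketch that recomputes the column under that assumption (the function and variable names are illustrative only):

```python
# Figures copied from the citations table above.
rows = {
    "WoS": {"linked_records": 164, "pure_citations": 4650},
    "Scopus": {"linked_records": 249, "pure_citations": 6796},
}


def average_pure_citations(pure_citations: int, linked_records: int) -> float:
    """Assumed definition: pure citations per linked record, rounded to 2 decimals."""
    return round(pure_citations / linked_records, 2)


# Recompute the last column of the table for each citation database.
averages = {
    db: average_pure_citations(r["pure_citations"], r["linked_records"])
    for db, r in rows.items()
}

for db, avg in averages.items():
    print(f"{db}: {avg:.2f}")
```

Under this assumption the recomputed values match the table (28.35 for WoS and 27.29 for Scopus).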

Organisations (3), Researchers (16)

0581 University of Ljubljana, Faculty of Arts

no.	Code	Name and surname	Research area	Role	Period	No. of publications
1.	27674	PhD Špela Arhar Holdt	Linguistics	Researcher	2019 - 2026	310
2.	36914	PhD Jaka Čibej	Linguistics	Researcher	2019 - 2026	227
3.	36491	PhD Kaja Dobrovoljc	Linguistics	Researcher	2019 - 2026	215
4.	53628	Magdalena Gapsa	Linguistics	Researcher	2019 - 2024	23
5.	33796	PhD Iztok Kosem	Linguistics	Researcher	2019 - 2026	370
6.	26166	PhD Simon Krek	Linguistics	Head	2019 - 2026	433
7.	37653	PhD Cyprian Adam Laskowski	Linguistics	Researcher	2019 - 2026	43
8.	58009	Luka Terčon	Linguistics	Young researcher	2023 - 2026	61

0588 University of Ljubljana, Faculty of Education

no.	Code	Name and surname	Research area	Role	Period	No. of publications
1.	21612	PhD Karmen Pižorn	Linguistics	Researcher	2019 - 2026	393

1539 University of Ljubljana, Faculty of Computer and Information Science

no.	Code	Name and surname	Research area	Role	Period	No. of publications
1.	55352	Matic Kavaš		Technical associate	2021	0
2.	55754	Matej Klemen	Computer science and informatics	Researcher	2021 - 2026	24
3.	36871	PhD Nikola Ljubešić	Linguistics	Researcher	2019 - 2026	492
4.	15295	PhD Marko Robnik Šikonja	Computer science and informatics	Researcher	2019 - 2026	511
5.	61230	Živa Štebljaj	Computer science and informatics	Young researcher	2025 - 2026	0
6.	58381	Domen Vreš	Computer science and informatics	Researcher	2023 - 2025	18
7.	56007	Aleš Žagar	Computer science and informatics	Researcher	2021 - 2026	40

Abstract

The main research topic of the new programme is modern Slovene, considered especially from the point of view of the rapid digitization of languages and new developments in ICT. Without language resources and technologies comparable with those available for other languages, the Slovene language will be limited in its participation in the new digital reality. The objective of the programme is to conduct research into the specifics of Slovene and enable the development of resources and technologies according to international standards, and to incorporate research results into the long-term development of basic language resources in order to facilitate the development of language technologies for Slovene. In addition, the programme will research the language needs of speakers of Slovene, particularly with the aim of improving reading literacy. The proposed programme is interdisciplinary: in addition to linguistics as the primary field, it includes computer and information sciences (language technologies) and education (literacy). The programme’s wider organisational framework is the Centre for Language Resources and Technologies at the University of Ljubljana (CJVT UL), which includes the three faculties that will conduct the new programme and offer compatible teaching programmes; this ensures the transfer of research results into education. The programme is closely connected to the activity of CJVT as part of the Network of research infrastructure centres of the University of Ljubljana, which provides the necessary infrastructural basis for research. The programme will be conducted by a team with more than 10 years of experience in research on the programme topics and with demonstrated international excellence in their fields of expertise. Research will cover five general areas integrating interlinked resources and technologies into a unified research programme: language description, standardization, language technologies, terminology and multilinguality.
These areas cover all levels of description (text linguistics, semantics, syntax, morphology, phonology), focusing on holistic exploration of language phenomena. The research is empirical, based on real language data found in contemporary corpora and similar resources. In the fields of terminology and multilinguality the programme also covers research into the contact between Slovene and other languages, in order to facilitate the development of multilingual resources and technologies (e.g. for machine translation). Research methodology is rooted in state-of-the-art methods of machine learning and data mining, used for other languages under the theoretical framework of computational and corpus linguistics. In literacy research we also use other methods of investigating productive and receptive language use (testing written production of target user groups, surveys). The research topics of the programme are in line with the aims of the current Resolution on the National Programme for Language Policy (2014-2018), as well as the Action Plan for Language Infrastructure and for Education (2015).

Significance for science

The impact of the research results will be directly and indirectly visible mainly in the field of language infrastructure for Slovene. It is anticipated that the programme will enable successful participation of Slovene in state-of-the-art technological trends, which demand automatic language processing for different applications, from virtual assistants (e.g. Siri, Cortana, Alexa) and machine translation systems to artificial intelligence. In these applications, Slovene will need to be on the same level as languages with considerably higher numbers of speakers; this cannot be achieved without research focused on the specific characteristics of Slovene in terms of language technology needs. Since the members of the research group are already involved in international research, especially in lexicography and machine learning, we expect that the results will achieve international impact and recognition and be relevant for other languages.

The results of research into literacy will have an important impact on all fields where an individual’s ability to participate and function in a democratic society requires appropriate delivery, understanding and interpretation of language information. The most immediate impact will be on the quality of literacy acquisition in education: on the one hand, the results will provide relevant resources and materials for language teachers, and on the other hand, they will offer students access to individualised content developed using artificial intelligence and cognitive modelling techniques. In addition, the results will facilitate the improvement of national language testing, international literacy research and diagnostic measures of specific learning difficulties in reading and writing; solutions that can be applied in teaching practice will be offered for the identified problems.

From the data mining point of view, the programme will develop methodology and tools that allow the integration of different information resources, with emphasis on textual information, and their exploitation with automatic machine learning methods. The proposed programme will improve current state-of-the-art approaches in the area of heterogeneous data networks and allow their application in the areas of knowledge databases, corpora, and linked open data. It will develop new machine learning algorithms for learning deep neural networks and new feature subset selection algorithms, both of which will be generally applicable as well as adapted to specific language technologies and to Slovene.

The proposed programme addresses open and topical problems with long-term goals which have high scientific and technological potential. Successfully meeting the presented challenges requires strong cooperation between language and data science experts. The programme has the potential to introduce a new theoretical and methodological paradigm for addressing language problems and for semantic analysis. The developed methodology as well as its open-source implementations will be of interest to investors from the industry.

Significance for the country

Language technologies are among the important enabling technologies in today’s information society; they can be found in all applications that require interaction between humans and machines or acquisition of knowledge from large data resources in Slovene. The research conducted in the proposed programme will make an important contribution to the integration of Slovene into products that use these services, e.g. those described in the Smart specialization strategy (smart cities). The interest of the industry is also evidenced by the participation of language technology companies in the Consortium for language resources and technologies led by CLRT. Language resource and technology infrastructure for Slovene is mentioned in several strategic national documents: the National Programme for Culture (pp. 98-103), the Information Society Development Strategy to 2020 – DIGITAL SLOVENIA 2020 (p. 20), the partnership agreement between Slovenia and the European Commission for the period 2014-2020 (p. 89), the Resolution on the national programme for language policy 2014-2018, etc. The proposed research programme aims to address one of the key challenges of the information society, namely the ability to use distributed, heterogeneous resources of information and knowledge, so that scientists and other users can interactively discover and interpret new knowledge. In addition to having objectives which are internationally relevant, the importance of the research programme lies in its aim to facilitate the language technology maturity of Slovene, which will help keep it scientifically and economically equal to other languages.

Most important scientific results

Interim report

Most important socioeconomically and culturally relevant results

Interim report
