Source: ARIS

Digital humanities: resources, tools and methods

January 1, 2022 - December 31, 2027
Research activity

Code  Science  Field  Subfield
6.05.00  Humanities  Linguistics
2.07.00  Engineering sciences and technologies  Computer science and informatics

Code  Science  Field
6.02  Humanities  Languages and Literature
1.02  Natural Sciences  Computer and information sciences
Digital Humanities, Digital Editions, Historical Collections, Oral History, Corpus Linguistics, Machine Learning, Mixed Methods, Language Technologies, Speech Technologies, Computer Vision
Evaluation
source: COBISS
Data for the last 5 years (citations for the last 10 years) as of June 11, 2024; A3 for the period 2018-2022
Data for ARIS tenders (04.04.2019 – Programme tender, archive)
Database  Linked records  Citations  Pure citations  Average pure citations
WoS  99  559  490  4.95
Scopus  211  1,464  1,268  6.01
Researchers (17)
no.  Code  Name and surname  Research area  Role  Period  No. of publications
1.  53681  PhD Ksenija Bogetić Pejović  Linguistics  Researcher  2023 - 2024  76 
2.  55593  David Bordon  Linguistics  Junior researcher  2022 - 2024  13 
3.  57130  Filip Dobranić  Linguistics  Researcher  2022 - 2024 
4.  57652  Bojan Evkoski  Linguistics  Researcher  2023 
5.  26294  PhD Darja Fišer  Linguistics  Head  2022 - 2024  419 
6.  14681  PhD Vojko Gorjanc  Linguistics  Researcher  2022 - 2024  480 
7.  16131  PhD Alenka Kavčič  Computer science and informatics  Researcher  2022 - 2024  138 
8.  56952  PhD Ganna Kryvenko  Linguistics  Researcher  2022 - 2024  53 
9.  50983  PhD Jakob Lenardič  Linguistics  Researcher  2022 - 2024  62 
10.  15677  PhD Matija Marolt  Computer science and informatics  Researcher  2022 - 2024  396 
11.  53255  Kristina Pahor de Maiti  Linguistics  Researcher  2022 - 2024  31 
12.  16350  PhD Andrej Pančur  Historiography  Researcher  2022 - 2024  264 
13.  08411  PhD Jurij Perovšek  Historiography  Retired researcher  2022  841 
14.  35071  PhD Matevž Pesek  Computer science and informatics  Researcher  2022 - 2024  147 
15.  38461  PhD Ajda Pretnar Žagar  Computer science and informatics  Researcher  2022 - 2024  47 
16.  56554  Jure Skubic  Sociology  Researcher  2022 - 2023  23 
17.  17106  PhD Mojca Šorn  Historiography  Researcher  2022 - 2024  179 
Organisations (3)
no.  Code  Research organisation  City  Registration number  No. of publications
1.  0501  Institute for Contemporary History  Ljubljana  5057116000  5,306 
2.  0581  University of Ljubljana, Faculty of Arts  Ljubljana  1627058  98,457 
3.  1539  University of Ljubljana, Faculty of Computer and Information Science  Ljubljana  1627023  16,564 
Digitisation of cultural heritage in Slovenia falls below the European average. With the exception of corpus linguistics and literary history, where richly annotated corpora and critical editions are plentiful and internationally renowned, datafication work on DH collections has been highly fragmented and largely incompatible because different encoding standards have been used. This shows that the requirements for supporting state-of-the-art data-driven DH research have not yet been met in Slovenia, which significantly hinders research in the national context as well as its international visibility. The research programme addresses three broad research problems: 1) the development and integration of advanced workflows for creating and publishing reliable, interpretable, interoperable and richly annotated complex digital editions; 2) the development and promotion of novel interdisciplinary quantitative and qualitative methods for Slovenian DH; and 3) the development and evaluation of advanced technologies for the processing, enrichment and visualization of heterogeneous, multilingual and multimodal historical data. The scientific framework of the proposed research programme is highly interdisciplinary but rooted in corpus and computational linguistics and in machine learning applied to heterogeneous structured and unstructured collections from Slovenian contemporary history, periodical studies, political studies and anthropology. These collections are both mono- and multilingual and contain textual, speech and image data and metadata. Developing and integrating these methods into Slovene DH is crucial because of their potential to contribute to a comprehensive understanding of past and present cultural phenomena in the European context. The programme will also contribute novel technologies to support the digitization of DH research data and cultural heritage.
The programme will perform the complete cycle of DH research activities: capture, organization and storage; enrichment; and analysis, visualization, interpretation and dissemination of the results. While the methods and technologies developed within the programme are not limited to specific data collections, the programme will contribute six major new open-source datasets containing text, speech and images. It will improve methods for digitizing complex historical documents, develop novel methods for transcribing oral history recordings, and engage in the development of internationally novel methods for processing images in historical documents. The programme will adapt language and text enrichment technologies to historical and dialectal language and develop support for exploring multilingual documents that are important for investigating Slovenian history. It will also develop advanced mixed methods for network, geospatial and temporal analysis of DH data.
Significance for science
The impact of the proposed digital scholarship research programme will be manifold. First and foremost, it will approach traditional humanities research in a significant new way, and it will also generate novel research questions, methodological approaches, findings and theoretical paradigms at the interface between language-driven analysis of historical multimodal data and data science. Beyond ensuring the preservation and accessibility of cultural heritage data, it will enable the enrichment of DH data and metadata using state-of-the-art data science methods. The programme will also address the exploitation of the contents of the created digital resources and the adaptation and development of appropriate language technologies to search and retrieve information from the Big Data of the Past. Since the members of the research group are already involved in international DH research, especially in DH text and speech processing, we expect the results to achieve international impact and recognition and to be relevant for other languages. The proposed programme will tackle open and topical problems with long-term goals that have high scientific and technological potential. It will promote critical, transparent and reproducible research in the humanities, covering data, code, workflows, methods and documentation. Successfully addressing these challenges requires strong cooperation between humanities and data science experts. By engaging in cross-domain knowledge transfer and promoting an interdisciplinary approach that focuses on new digital methods and tools for digital humanities research and teaching, the programme will have a lasting disciplinary impact.
Significance for the country
The proposed programme will make an important contribution to the infrastructure for cultural heritage, which is also addressed by the following objectives of the Resolution of the national programme for language policy 2021-2025: the digitization, description, preservation and open access of Slovene-language cultural heritage materials. The resources, tools and methods developed in the programme will be woven into humanities and data science curricula, which will develop the interdisciplinary profiles of human data analysts that are in great demand in the public sector as well as in industry. The results of the programme will also enable cultural innovation and algorithmic creativity in the representation and mediation of the arts and humanities. With open-source historical digital collections, language models, training sets and toolchains that will enable the automated handling of complex documents, audio archives and images containing non-canonical Slovene, the proposed programme will close the gap in the processing of non-canonical (historical, dialectal, multilingual) language data, which is important for today's information society and economy. The urgent need to digitise all aspects of society and the economy has been recognized by the newly established national Strategic Council for Digitization, which shares many of the objectives of the proposed research programme.