Source: ARIS

Collocation as a basis for language description: semantic and temporal perspectives

Research activity

Code Science Field Subfield
6.05.02  Humanities  Linguistics  Theoretical and applied linguistics 

Code Science Field
H350  Humanities  Linguistics 

Code Science Field
6.02  Humanities  Languages and Literature 
Keywords
collocation, corpus, semantics, neologisms, machine learning
source: COBISS
Researchers (14)
no. Code Name and surname Research area Role Period No. of publications
1.  27674  PhD Špela Arhar Holdt  Linguistics  Researcher  2017 - 2020 
2.  36914  PhD Jaka Čibej  Linguistics  Researcher  2018 - 2020 
3.  36491  PhD Kaja Dobrovoljc  Linguistics  Researcher  2019 - 2020 
4.  16313  PhD Apolonija Gantar  Linguistics  Researcher  2017 - 2020 
5.  52176  Teja Goli    Technical associate  2019 - 2020 
6.  14681  PhD Vojko Gorjanc  Linguistics  Researcher  2017 - 2020 
7.  37694  MSc Maja Jančič  Economics  Researcher  2018 - 2020 
8.  32887  MSc Bojan Klemenc  Computer science and informatics  Technical associate  2017 - 2020 
9.  33796  PhD Iztok Kosem  Linguistics  Head  2017 - 2020 
10.  26166  PhD Simon Krek  Linguistics  Researcher  2017 - 2020 
11.  37653  PhD Cyprian Adam Laskowski  Linguistics  Researcher  2017 - 2020 
12.  36871  PhD Nikola Ljubešić  Linguistics  Researcher  2017 - 2020 
13.  20482  PhD Nataša Logar  Linguistics  Researcher  2017 - 2020 
14.  51456  PhD Eva Pori  Linguistics  Researcher  2019 - 2020 
Organisations (3)
no. Code Research organisation City Registration number No. of publications
1.  0106  Jožef Stefan Institute  Ljubljana  5051606000  18 
2.  0581  University of Ljubljana, Faculty of Arts  Ljubljana  1627058  15 
3.  0582  University of Ljubljana, Faculty of Social Sciences  Ljubljana  1626957 
Abstract
The main objective of the proposed project is to conduct basic research into the semantic and temporal aspects of collocation, as well as into the statistics for measuring it; these areas have so far been largely neglected in Slovenian linguistics and, to some extent, internationally. The second objective is to develop, and thoroughly evaluate from a linguistic perspective, machine learning methods for analysing Slovene and extracting lexical information from corpora. In doing so, we aim to foster closer cooperation and synergy in the Slovene research environment between lexicography and linguistics on the one hand and computational linguistics and natural language processing on the other. The third objective is to integrate the results of various user studies systematically into the development of the project's methods and tools, and to prepare methodological descriptions for transferring the project results into practice, so as to ensure their optimal applicability.
The project will address four aspects of collocation: statistics for measuring collocation; semantic sets or categories of collocation; the role of collocation in distinguishing between semantically related words (e.g. synonyms); and the role of collocation in detecting semantic and related changes in the use of words over time.
From the perspective of Slovenian research, every aspect of the project is original, as the proposed research studies and tools do not yet exist for Slovene. The research is also particularly timely and relevant, as several ongoing lexicographic and other projects in Slovenia would benefit significantly from its results. Each work package is expected to bring important new knowledge to the field of language description and to influence approaches to, and analyses of, collocation in other disciplines such as linguistics and language learning.
Several aspects of the proposed research are likely to be of interest to the international research community and will contribute to the development of new research directions in Slovenia and in other language communities, particularly those of morphologically rich languages. With the project results, we also aim to promote and stimulate further research into theoretical and applied aspects of collocation, colligation and multi-word expressions.
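One simple way to picture the synonym and temporal aspects listed above is as a comparison of collocate profiles: which collocates two related words share and which are distinctive of each, or how one word's collocates differ between an earlier and a later corpus. The sketch below assumes each profile is already available as a mapping from collocate to an association score. The function `compare_profiles`, the example words and their scores are hypothetical; this illustrates the general idea only and is not the project's method or tooling.

```python
"""Compare two collocate profiles (hypothetical data, for illustration only)."""


def compare_profiles(a: dict[str, float], b: dict[str, float], top_n: int = 3):
    """Return (shared collocates, top distinctive collocates of a, of b, Jaccard overlap).

    Each profile maps a collocate to an association score; distinctive
    collocates are ranked by that score, highest first.
    """
    shared = set(a) & set(b)
    union = set(a) | set(b)
    only_a = sorted(set(a) - shared, key=lambda w: a[w], reverse=True)[:top_n]
    only_b = sorted(set(b) - shared, key=lambda w: b[w], reverse=True)[:top_n]
    # Jaccard overlap: shared collocates as a share of all collocates seen.
    jaccard = len(shared) / len(union) if union else 0.0
    return sorted(shared), only_a, only_b, jaccard


# Hypothetical adjective profiles (collocate -> score); the same function could
# compare one word's profile in an earlier and a later time period.
strong = {"tea": 9.1, "coffee": 8.7, "wind": 8.0, "argument": 7.5}
powerful = {"computer": 9.3, "engine": 8.9, "wind": 7.2, "argument": 7.0}

if __name__ == "__main__":
    common, only_strong, only_powerful, overlap = compare_profiles(strong, powerful)
    print("shared:", common)
    print("distinctive of 'strong':", only_strong)
    print("distinctive of 'powerful':", only_powerful)
    print(f"Jaccard overlap: {overlap:.2f}")
```

With these invented scores, the shared collocates are `argument` and `wind`, `tea` and `coffee` are distinctive of *strong*, `computer` and `engine` of *powerful*, and the overlap is 1/3. A real comparison would also need to weigh score differences for shared collocates, which this sketch ignores.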
Significance for science
By producing original results and theoretical and methodological approaches, the project will establish a dialogue between different fields and disciplines in Slovenia, as well as between Slovenian and international experts, especially those from Europe. Over the last three decades, linguistics has seen a noticeable shift from researching the language system, typical especially of structuralism, to comprehensive, empirically based analysis of language that investigates real language use; this is also key for cross-linguistic comparison and for researching and monitoring language phenomena over time. Because language description in Slovenia is still predominantly structuralist, the gap between current descriptions of other languages and those of Slovene is widening. The project will help to reduce this gap, as it draws on state-of-the-art methods in lexicographic theory and practice used around the world, especially those that have recently led to the methodological framework of e-lexicography and that combine corpus linguistics and lexicography with computational linguistics and information technologies in a cross-disciplinary approach. The research is also timely because several lexicographic and other projects in Slovenia that are currently ongoing or being planned would benefit significantly from the project results; each work package will bring important new knowledge to the description of multi-word units. The project will incorporate natural language processing methods systematically, which makes it particularly innovative for the Slovenian research area. In addition, it will develop new analytical methods for linguistics in general, bringing Slovenian research to the level of state-of-the-art linguistics worldwide.
More specifically, the research will make an important contribution to identifying the optimal statistical measure for the automatic detection of collocativity, both for lexicographic purposes and for language description in general; furthermore, it will considerably advance the state of the art in using collocation to distinguish synonyms (semi-)automatically. On the basis of this research, recommendations for implementing analytical methods in lexicography will be prepared.
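To make "statistical measure of collocativity" concrete, the sketch below computes three association scores widely used in corpus linguistics (pointwise mutual information, t-score and logDice) from the four frequencies a corpus query returns for a word pair. It does not represent the project's method or anticipate which measure the project found optimal; the names `PairCounts`, `mutual_information`, `t_score` and `log_dice`, and all counts, are hypothetical. The two invented pairs show why the choice matters: MI scores the rare pair higher, while the t-score favours the frequent one.

```python
"""Illustrative association measures for scoring collocation candidates.

A minimal sketch: the counts are invented for illustration and are not
project data.
"""
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PairCounts:
    """Corpus frequencies for a candidate pair (node word x, collocate y)."""
    f_xy: int  # co-occurrence frequency of x and y within the chosen window
    f_x: int   # corpus frequency of the node word x
    f_y: int   # corpus frequency of the collocate y
    n: int     # corpus size in tokens


def expected(c: PairCounts) -> float:
    """Co-occurrence frequency expected if x and y were independent."""
    return c.f_x * c.f_y / c.n


def mutual_information(c: PairCounts) -> float:
    """Pointwise MI: log2 of observed over expected co-occurrence."""
    return math.log2(c.f_xy / expected(c))


def t_score(c: PairCounts) -> float:
    """t-score: (observed - expected) / sqrt(observed)."""
    return (c.f_xy - expected(c)) / math.sqrt(c.f_xy)


def log_dice(c: PairCounts) -> float:
    """logDice: 14 + log2 of the Dice coefficient; does not depend on n."""
    return 14 + math.log2(2 * c.f_xy / (c.f_x + c.f_y))


if __name__ == "__main__":
    # Hypothetical counts for two candidate pairs in a 10-million-token corpus.
    candidates = {
        "rare pair": PairCounts(f_xy=5, f_x=20, f_y=30, n=10_000_000),
        "frequent pair": PairCounts(f_xy=500, f_x=20_000, f_y=50_000, n=10_000_000),
    }
    for name, c in candidates.items():
        print(f"{name}: MI={mutual_information(c):.2f} "
              f"t={t_score(c):.2f} logDice={log_dice(c):.2f}")
```

Which score is "optimal" depends on what a lexicographer wants from a ranked list (e.g. tolerance for rare, highly exclusive pairs), which is why such measures need linguistic evaluation rather than a purely statistical choice.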
Significance for the country
The development of state-of-the-art language resources, designed mainly with digital media in mind, and the research supporting and facilitating it, are among the objectives stressed by the Resolution on the National Programme for Language Policy 2014–18 and the related Action Plan for Language Infrastructure. The proposed project therefore directly addresses a topic that has been recognized in Slovenia as key to language policy and language planning. It rests on the belief that, in a globally connected world, access to information in a given language is of vital importance, as are the links between different languages; this is why developing state-of-the-art language resources, and the related research, is crucial for the vitality of the Slovenian language community. Project results will be systematically added to the infrastructure of the Centre for Language Resources and Technologies, University of Ljubljana, and made freely available to the general public; in this way, the project responds to the calls for the systematic development of language infrastructure for Slovene and follows the European strategy for open science. The project will include a number of evaluation studies conducted by linguists of different profiles and by different user groups (translators, teachers, students, etc.). It will thus directly address the users of language description works, enter into dialogue with them, and systematically incorporate their feedback into the development and improvement of analytical methods. As a result, the research will not remain limited to the academic community but will maintain a dialogue throughout the project with the wider public, which is, in fact, the target audience of the project's language description tasks.
Although research into user needs and expectations is well established in lexicography around the world, it has only begun to be conducted systematically in Slovenia in recent years. By incorporating user studies into its tasks, the proposed project follows current lexicographic trends in this area. To facilitate communication between the academic community and the general public, one of the project objectives is to develop a tool for systematically monitoring the life cycle of words; this presents a considerable challenge for researchers and will have a direct impact on cultural development and on the development of infrastructure important for the Slovene language.
Most important scientific results Interim report, final report
Most important socioeconomically and culturally relevant results Interim report, final report