Projects / Programmes source: ARIS

Resources, methods and tools for the understanding, identification and classification of various forms of socially unacceptable discourse in the information society

Research activity

Code Science Field Subfield
6.05.02  Humanities  Linguistics  Theoretical and applied linguistics 

Code Science Field
H350  Humanities  Linguistics 

Code Science Field
6.02  Humanities  Languages and Literature 
Keywords
Socially unacceptable discourse; Computer-Mediated Communication; Corpus Linguistics; Critical Discourse Analysis; Language Technologies
Evaluation (rules)
source: COBISS
Researchers (17)
no.  Code  Name and surname  Research area  Role  Period  No. of publications
1.  28195  PhD Veronika Bajt  Social sciences  Researcher  2017 - 2020  240 
2.  30704  PhD Jernej Berzelak  Public health (occupational safety)  Researcher  2017 - 2020  122 
3.  30672  PhD Maja Bitenc  Linguistics  Researcher  2018  60 
4.  53338  Monika Bohinec  Criminology and social work  Technical associate  2020 
5.  36914  PhD Jaka Čibej  Linguistics  Researcher  2017 - 2020  152 
6.  05023  PhD Tomaž Erjavec  Linguistics  Head  2017 - 2020  636 
7.  26294  PhD Darja Fišer  Linguistics  Researcher  2017 - 2020  412 
8.  14681  PhD Vojko Gorjanc  Linguistics  Researcher  2017 - 2020  478 
9.  24365  PhD Dejan Jontes  Social sciences  Researcher  2019 - 2020  293 
10.  27894  PhD Neža Kogovšek Šalamon  Law  Researcher  2017 - 2020  384 
11.  50983  PhD Jakob Lenardič  Linguistics  Researcher  2018 - 2020  61 
12.  36871  PhD Nikola Ljubešić  Linguistics  Researcher  2017 - 2020  397 
13.  39534  Andrej Motl  Sociology  Researcher  2017 - 2020  22 
14.  03323  PhD Igor Mozetič  Computer science and informatics  Researcher  2017 - 2020  184 
15.  20544  Irena Salmič    Technical associate  2017 - 2020 
16.  37977  PhD Jasmina Smailović  Computer science and informatics  Researcher  2017  40 
17.  10155  PhD Vasja Vehovar  Sociology  Researcher  2017 - 2020  840 
Organisations (4)
no.  Code  Research organisation  City  Registration number  No. of publications
1.  0106  Jožef Stefan Institute  Ljubljana  5051606000  90,695 
2.  0366  Peace Institute  Ljubljana  5498295000  3,571 
3.  0581  University of Ljubljana, Faculty of Arts  Ljubljana  1627058  97,945 
4.  0582  University of Ljubljana, Faculty of Social Sciences  Ljubljana  1626957  40,409 
Abstract
Socially unacceptable discourse (SUD), such as hate, discriminatory, offensive or threatening speech, is by no means a new phenomenon. It has, however, recently gained significant momentum due to a number of substantial societal, cultural and economic changes. Furthermore, the boom in information and communication technology and the speed at which information spreads on the Internet have given such discourse practices an unprecedented reach and impact that can only be studied and efficiently mitigated with interdisciplinary methods and automatic approaches. The project combines state-of-the-art quantitative and qualitative multidisciplinary approaches, which will be employed to investigate the use of socially unacceptable discourse in its sociocultural context. The use of novel data-driven approaches on unstructured and semi-structured data will move the frontiers of the traditional humanities and social sciences. As a side effect, the project will also support the development of the new field of Digital Humanities and Social Sciences, which combines tools and methods from computer science with those of the humanities and social sciences. In the scope of the project we will construct large corpora of Slovene computer-mediated communication (CMC) in general and of socially unacceptable discourse in particular, which will serve as the basis for our empirical research. The collected corpora will be highly structured, and their texts will be linguistically processed and enriched with various metadata. We will develop a typology of socially unacceptable discourse and its targets, and manually annotate a representative sample of texts with this typology. This will result in a gold-standard dataset for researching such communication. By applying machine learning techniques to this dataset, an automatic method to flag and categorise SUD texts and their targets will be developed and applied to the compiled corpora.
Interdisciplinary sociolinguistic analyses will be performed on the basis of the collected and processed resources, focusing on migrants and Islamophobia, homophobia and gay rights, and sexism and misogyny. We will use the methodologies and instruments of corpus linguistics, critical discourse analysis and inferential statistics. These approaches will be supplemented with a corpus analysis of the legal aspects of socially unacceptable discourse and with surveys on its perception in Slovene society. The project will organise an international interdisciplinary workshop and publish a monograph. The project will provide free and open access to the research results through the research infrastructure CLARIN.SI and the Social Science Data Archive. The research data will consist of the developed language resources and software. All legal and ethical issues with regard to personal data distribution will be taken into account. In this way, the project will also support the move to open science, enabling reproducibility of its research results.
Significance for science
The proposed project is an important milestone in Slovene humanities and social sciences, as there have been no previous attempts at comprehensive, inter- and multidisciplinary, data-driven research into socially unacceptable discourse (SUD). The relevance and impact of the project for the development of science is four-fold. A tangible result of the project is the large, richly annotated corpora of socially unacceptable computer-mediated communication (CMC) and of general CMC, as well as manually annotated datasets specifying the type of SUD and its target. These language resources will enable a comprehensive insight into the characteristics of various forms of SUD practices in the information society and will facilitate a number of novel research approaches in the fields of linguistics, sociolinguistics, critical discourse analysis and anthropology. They will also support the development of technologies for content analysis and text analytics for Slovene that can be widely employed in the Digital Humanities and Social Sciences. The project will result in a theoretically grounded and thoroughly tested tool for automatic detection and classification of socially unacceptable web content that will be directly applicable in social science, law and criminology. Such services are becoming increasingly important in the knowledge-based, high-tech society, where Slovene is still lagging far behind most European languages, which puts it in an unequal position compared to others and consequently hinders the development of Slovene society and the Slovene language. In contrast to many Slovene projects, its research results in terms of created resources, manually annotated datasets, models and technologies will be published, taking into account legal and ethical limitations, under an open-source research licence (Creative Commons) according to the EU open science guidelines. This enables other interested researchers, in Slovenia or abroad, to reproduce and enhance the research results achieved in the scope of the project.
By combining methods from the humanities and social sciences with those from computer science, the project will support the development of the new research field of Digital Humanities and Social Sciences. In the field of legal sciences, the novel contribution of the project will be the legal analysis of the corpus, showing the extent of legally prosecutable SUD that remains unaddressed, the legal characteristics of the SUD corpus, its targets and its level of severity. This will make an important contribution to the knowledge and understanding of SUD and its prevention, which is the preferred tool as opposed to criminal prosecution.
Significance for the country
With a combination of methods and approaches from various fields of Digital Humanities and Social Sciences, the project facilitates the recognition of socially unacceptable discourse (SUD) in society and provides tools and guidelines to combat elements of extremism and intolerance in our society. Apart from society in general, the direct beneficiaries will be newspaper publishers and online content providers as well as governmental and non-governmental institutions. In the field of legal sciences, the novel contribution of the project will be the legal analysis of the corpus, showing the extent of legally prosecutable SUD that remains unaddressed, the legal characteristics of the SUD corpus, its targets and its level of severity. This will make an important contribution to the knowledge and understanding of SUD and its prevention, which is the preferred tool as opposed to criminal prosecution. The project will also produce a number of open-source language technology tools and resources for dealing with Slovene computer-mediated communication (CMC), which will significantly outperform existing ones. Furthermore, the project will compile annotated datasets of Slovene, a key resource for training language analysis tools. These tools and resources will be directly accessible for use by other researchers and, where possible, for commercial use as well. This will facilitate further development of language technologies for Slovene.
Most important scientific results Interim report, final report
Most important socioeconomically and culturally relevant results Interim report, final report