Language Resources for Slovene
Code | Science | Field | Subfield
6.05.00 | Humanities | Linguistics |

Code | Science | Field
H350 | Humanities | Linguistics
H352 | Humanities | Grammar, semantics, semiotics, syntax
H360 | Humanities | Applied linguistics, foreign languages teaching, sociolinguistics
P175 | Natural sciences and mathematics | Informatics, systems theory
P176 | Natural sciences and mathematics | Artificial intelligence
language resources, corpus linguistics, corpora, Slovene language, linguistic annotation, lemmatization, disambiguation, parsing, text mining, semantic web
Researchers (10)
Organisations (3)
Abstract
The aim of the project "Language resources for the Slovene language" is to develop text corpora and software tools for researching Slovene texts and the Slovene language in general. It is conceived as a qualitative and quantitative upgrade of the Slovene reference corpus FIDA, involving the original partners in the FIDA project (Faculty of Arts - University of Ljubljana, Jozef Stefan Institute, DZS d.d., Amebis d.o.o.) and one new partner (Faculty of Social Studies - University of Ljubljana). The upgrade will consist of several components: the size of the corpus will be doubled (to 200,000,000 words), a spoken corpus component and internet texts will be added, and new guidelines for balancing the corpus will be implemented.
In parallel with the corpus enlargement, software tools will be developed for the automatic processing of incoming texts, as well as software for the extraction and analysis of linguistic information. All results will be publicly available for research and teaching purposes; from that point of view, the project will represent a major step forward in developing research and language-policy infrastructure in linguistics, social studies and information technology.