Semantically annotated corpora are indispensable in natural language processing tasks such as automatic word sense disambiguation, information retrieval and machine translation. For Slovene, no previous attempt has been made to build such a corpus. This paper presents and discusses a project in which the most frequent nouns from a corpus of Slovene were manually annotated with wordnet senses.
COBISS.SI-ID: 43099234
The paper presents an innovative approach to extracting Slovene definition candidates from domain-specific corpora using morphosyntactic patterns, automatic terminology recognition and semantic tagging with wordnet senses. The results of the experiment are encouraging, with accuracy ranging from 67% to 71%. The paper also addresses some drawbacks of the approach and suggests ways to overcome them in future work.
COBISS.SI-ID: 43122530