Presentation of an approach for bootstrapping bilingual lexicons from comparable corpora for closely related languages
COBISS.SI-ID: 46703714
Presentation of language resources and tools for semantic processing of Slovene
COBISS.SI-ID: 50261858
Presentation of approaches for extending and cleaning the Slovene wordnet
COBISS.SI-ID: 47786850
In this paper we present a method for extracting a bilingual lexicon for closely related languages from comparable corpora. We exploit the similarity between the two languages to build a seed lexicon, use that lexicon to compare context vectors across the languages, and use cognates to rerank the translation candidates. The results are very encouraging, suggesting that other pairs of similar languages could benefit from the same approach.
COBISS.SI-ID: 47260258
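The pipeline summarized in the abstract above (seed lexicon, context-vector comparison, cognate reranking) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the window size, raw co-occurrence counts, cosine similarity, the use of `difflib.SequenceMatcher` as a stand-in cognate measure, and the mixing weight `alpha`.

```python
from collections import Counter
from difflib import SequenceMatcher
from math import sqrt


def context_vector(tokens, target, window=2):
    """Count words co-occurring with `target` within +/- `window` tokens."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


def translate_vector(vec, seed):
    """Map source-language context words to the target language via the seed lexicon."""
    out = Counter()
    for word, n in vec.items():
        if word in seed:  # context words missing from the seed lexicon are dropped
            out[seed[word]] += n
    return out


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cognate_score(a, b):
    """String similarity in [0, 1]; an assumed stand-in for a real cognate measure."""
    return SequenceMatcher(None, a, b).ratio()


def rank_candidates(src_word, src_vec, tgt_vecs, seed, alpha=0.5):
    """Rank target words by a weighted mix of context and cognate similarity.

    `alpha` is a hypothetical mixing weight, not a value from the paper.
    """
    translated = translate_vector(src_vec, seed)
    scored = [
        (alpha * cosine(translated, vec) + (1 - alpha) * cognate_score(src_word, tgt), tgt)
        for tgt, vec in tgt_vecs.items()
    ]
    return [tgt for _, tgt in sorted(scored, reverse=True)]
```

For closely related languages many seed entries map a word to an identical or near-identical form, which is why a simple string-similarity score can plausibly serve as a reranking signal in a sketch like this.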
This paper presents an approach to extract translation equivalents from comparable corpora for polysemous nouns. As opposed to the standard approaches that build a single context vector for all occurrences of a given headword, we first disambiguate the headword with third-party sense taggers and then build a separate context vector for each sense of the headword. Since state-of-the-art word sense disambiguation tools are still far from perfect, we also tried to improve the results by combining the sense assignments provided by two different sense taggers. Evaluation of the results shows that we outperform the baseline (0.473) in all the settings we experimented with, even when using only one sense tagger, and that the best-performing results are indeed obtained by taking into account the intersection of both sense taggers (0.720).
COBISS.SI-ID: 50058338
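The per-sense vectors described in the abstract above can be sketched as follows. This is one possible reading, not the authors' implementation: the input format (a tuple of context words plus one label from each tagger) is hypothetical, and "intersection" is interpreted here as keeping only the occurrences on which both taggers assign the same sense.

```python
from collections import Counter, defaultdict


def sense_vectors(occurrences, require_agreement=True):
    """Build one context vector per sense of a headword.

    `occurrences` is an iterable of (context_words, sense_a, sense_b) tuples,
    where sense_a and sense_b are the labels from two sense taggers
    (a hypothetical input format chosen for this sketch).
    """
    vectors = defaultdict(Counter)
    for context, sense_a, sense_b in occurrences:
        if require_agreement and sense_a != sense_b:
            continue  # assumed reading of "intersection": drop disagreements
        # With require_agreement=False, tagger A's label alone is used.
        vectors[sense_a].update(context)
    return dict(vectors)
```

Each resulting vector can then be compared against target-language vectors separately, so that a polysemous headword may receive a different translation candidate for each sense.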