1.

Bootstrapping bilingual lexicons from comparable corpora for closely related languages

Presentation of an approach for bootstrapping bilingual lexicons from comparable corpora for closely related languages

COBISS.SI-ID: 46703714

2.

Language resources and tools for semantically enhanced processing of Slovene

Presentation of language resources and tools for semantic processing of Slovene

COBISS.SI-ID: 50261858

3.

sloWNet 3.0: Development, extension and cleaning

Presentation of the approaches for the extension and cleaning of Slovene wordnet

COBISS.SI-ID: 47786850

4.

Automatic extraction of Croatian-Slovene lexicon from comparable corpora

In this paper we present a method for extracting a bilingual lexicon for closely related languages from comparable corpora. We take advantage of the similarities between languages to build a seed lexicon to compare context vectors in both languages and use cognates for reranking translation candidates. The results are very encouraging, suggesting that other similar languages could benefit from the same approach.

COBISS.SI-ID: 47260258
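
The method summarized in entry 4 can be illustrated with a minimal sketch. It assumes sparse context vectors stored as dictionaries, a seed lexicon made of words that are identical in both languages, cosine similarity for comparing vectors, and a character-level similarity ratio as the cognate signal. The function names, the `alpha` weight and the toy data are hypothetical and are not taken from the paper.

```python
import math
from difflib import SequenceMatcher


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def translate_vector(vec, seed_lexicon):
    """Map source-language context features to the target language.

    Features missing from the seed lexicon are dropped.
    """
    out = {}
    for feature, weight in vec.items():
        if feature in seed_lexicon:
            target = seed_lexicon[feature]
            out[target] = out.get(target, 0.0) + weight
    return out


def cognate_score(a, b):
    """Character-level similarity in [0, 1]; a stand-in for a cognate measure."""
    return SequenceMatcher(None, a, b).ratio()


def rank_candidates(word, src_vec, tgt_vecs, seed_lexicon, alpha=0.5):
    """Rank target words by context similarity, reweighted by cognate score.

    alpha is an assumed mixing weight, not a value from the paper.
    """
    translated = translate_vector(src_vec, seed_lexicon)
    scored = [
        (cand, (1 - alpha) * cosine(translated, vec) + alpha * cognate_score(word, cand))
        for cand, vec in tgt_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy, illustrative data: a Croatian headword and two Slovene candidates.
seed = {"piti": "piti", "krava": "krava"}          # identical words as seed pairs
source_vector = {"piti": 3.0, "krava": 1.0, "čaša": 1.0}  # "čaša" is not in the seed, so it is dropped
target_vectors = {
    "mleko": {"piti": 2.0, "krava": 1.0},
    "mesto": {"ulica": 3.0},
}
ranked = rank_candidates("mlijeko", source_vector, target_vectors, seed)
```

In this sketch the cognate score is mixed linearly with the context similarity; other combinations, such as reranking only the top candidates, would fit the description equally well.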

5.

Addressing polysemy in bilingual lexicon extraction from comparable corpora

This paper presents an approach to extracting translation equivalents for polysemous nouns from comparable corpora. Unlike standard approaches, which build a single context vector for all occurrences of a given headword, we first disambiguate the headword with third-party sense taggers and then build a separate context vector for each sense of the headword. Since state-of-the-art word sense disambiguation tools are still far from perfect, we also tried to improve the results by combining the sense assignments provided by two different sense taggers. Evaluation of the results shows that we outperform the baseline (0.473) in all the settings we experimented with, even when using only one sense tagger, and that the best results are obtained by taking the intersection of both sense taggers (0.720).

COBISS.SI-ID: 50058338
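
The idea in entry 5 of building one context vector per sense, keeping only occurrences on which two sense taggers agree, can be sketched as follows. This is a minimal illustration under stated assumptions: the two taggers are hypothetical keyword-based stand-ins for real third-party tools, contexts are plain lists of words with the headword removed, and raw co-occurrence counts serve as vector weights.

```python
from collections import Counter, defaultdict


def sense_vectors(occurrences, tagger_a, tagger_b):
    """Build one context vector per sense of a headword.

    Only occurrences where both taggers assign the same sense are used,
    i.e. the intersection of the two sense assignments.
    """
    vectors = defaultdict(Counter)
    for context in occurrences:
        sense_a = tagger_a(context)
        sense_b = tagger_b(context)
        if sense_a is not None and sense_a == sense_b:
            vectors[sense_a].update(context)
    return dict(vectors)


# Hypothetical toy taggers for the headword "bank"; real taggers would be external tools.
def toy_tagger_a(context):
    return "finance" if "money" in context else "river"


def toy_tagger_b(context):
    if "money" in context or "loan" in context:
        return "finance"
    if "water" in context:
        return "river"
    return None  # this tagger abstains when it sees no cue


occurrences = [
    ["money", "loan"],      # both taggers say "finance"
    ["water", "shore"],     # both taggers say "river"
    ["loan", "interest"],   # taggers disagree, so the occurrence is skipped
    ["shore", "walk"],      # second tagger abstains, so the occurrence is skipped
]
vectors = sense_vectors(occurrences, toy_tagger_a, toy_tagger_b)
```

Each resulting vector can then be compared with target-language context vectors in the same way as a single headword vector would be. Requiring agreement trades coverage for cleaner sense assignments.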

Z6-3668 — Final report
