We developed a novel service-oriented knowledge discovery framework and implemented it in Orange4WS (Orange for Web Services), a service-oriented data mining environment. Orange4WS builds on the existing Orange data mining toolbox and its visual programming environment, which supports manual composition of data mining workflows. It adds the following features: simple use of web services as remote components that can be included in a data mining workflow; simple incorporation of relational data mining algorithms; and a knowledge discovery ontology that describes workflow components (data, knowledge and data mining services) in an abstract, machine-interpretable way and is used by a planner to compose data mining workflows automatically. These features are showcased in three real-world scenarios.
COBISS.SI-ID: 25004071
The paper presents the MULTEXT-East language resources, a multilingual dataset for language engineering research, focused on the morphosyntactic level of linguistic description. The dataset includes the morphosyntactic specifications, lexica and a sentence-aligned manually annotated parallel corpus. The resources are encoded using the Text Encoding Initiative Guidelines, TEI P5, and cover 16 languages, mainly from Central and Eastern Europe. This dataset, unique in terms of languages covered and the wealth of encoding, is extensively documented, and freely available for research purposes.
COBISS.SI-ID: 25372199
The paper proposes the OntoPlus methodology for semi-automatic ontology extension based on text mining methods. It supports the effective extension of large ontologies by providing, for a new concept to be inserted in the ontology, a ranked list of potentially relevant concepts and relationships. Experiments with the Cyc ontology that evaluate measures for ranking correspondences on real-world data from the financial and aquaculture domains show that the best results are achieved by combining ontology content, structure and co-occurrence information.
COBISS.SI-ID: 25127463