1.

Neurodegenerative disease data ontology (NDDO)

Within the project, we developed NDDO, an ontology for representing brain disease data. NDDO facilitates the semantic annotation of datasets containing neurodegenerative diagnostic data (e.g., clinical, imaging, and biomarker data) and disease progression data collected from patients at different hospitals. Rich semantic annotation of datasets is essential for efficiently supporting data mining, for example in identifying suitable algorithms for data analytics, in text mining, and in reasoning over distributed data and knowledge sources. To address the data analytics perspective, we reused and extended our previous work on the ontology of data types (OntoDT) and the ontology of core data mining entities (OntoDM-core) to represent the specific datatypes that occur in the domain datasets. We demonstrated the utility of NDDO in two use cases: the semantic annotation of datasets, and the incorporation of information about the clinical procedures used to produce neurodegenerative data.

COBISS.SI-ID: 32864807
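
To illustrate how datatype annotations can support the identification of suitable algorithms, the following minimal Python sketch annotates each column of a hypothetical dataset with a domain concept and a primitive datatype, then keeps only the algorithms that accept all of the dataset's datatypes. The column names, concept labels, datatype labels, and algorithm entries are hypothetical placeholders, not actual NDDO, OntoDT, or OntoDM-core terms; a real annotation would reference ontology IRIs rather than plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnAnnotation:
    """Semantic annotation of one dataset column (illustrative only)."""
    name: str
    domain_concept: str  # hypothetical NDDO-style term: what was measured
    datatype: str        # hypothetical OntoDT-style term: how it is represented


@dataclass(frozen=True)
class Algorithm:
    """A data mining algorithm and the datatypes it accepts as input."""
    name: str
    supported_datatypes: frozenset


def suitable_algorithms(columns, algorithms):
    """Return names of algorithms whose inputs cover every column datatype."""
    required = {c.datatype for c in columns}
    return [a.name for a in algorithms if required <= a.supported_datatypes]


# Hypothetical annotated dataset mixing imaging, biomarker and clinical data.
columns = [
    ColumnAnnotation("hippocampal_volume", "imaging_measurement", "real"),
    ColumnAnnotation("amyloid_positive", "biomarker_status", "boolean"),
    ColumnAnnotation("diagnosis", "clinical_diagnosis", "discrete"),
]
algorithms = [
    Algorithm("linear_regression", frozenset({"real"})),
    Algorithm("decision_tree", frozenset({"real", "boolean", "discrete"})),
]

print(suitable_algorithms(columns, algorithms))  # ['decision_tree']
```

Because the dataset mixes real, boolean, and discrete columns, only the algorithm declared to handle all three is returned; annotating a single real-valued column would make both candidates eligible.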

2.

Towards reusable models of dynamical systems: Ontology for Process-Based Modelling of Dynamical Systems (Onto-PBM)

Storing metadata about models of dynamical systems in a machine-readable form is one of the key steps towards making them accessible and reusable. In process-based modeling of dynamical systems, the task is to construct an explanatory model of a dynamical system from domain knowledge and data. Within the project, we developed a workflow for annotating, storing, and querying process-based models, specifically in the domain of aquatic ecosystems. To provide a vocabulary of the key terms of the process-based modeling paradigm, we developed the Ontology for Process-Based Modeling of Dynamical Systems (OntoPBM). To capture domain-specific characteristics, we then extended OntoPBM with terms specific to aquatic ecosystems. The annotations of each process-based model are stored in an RDF triple store. This enables us to execute SPARQL queries both over facts asserted in the annotations and over facts inferred from the domain knowledge encoded in the ontology. Finally, by following the proposed workflow, we generated the minimal information about a model, which brings us one step closer to reusable research.

COBISS.SI-ID: 32541991
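
The distinction between asserted and inferred facts can be illustrated with a minimal, dependency-free Python sketch. Instead of an actual RDF triple store and SPARQL engine, it keeps triples in a Python set, materialises two RDFS entailment rules (subclass transitivity and type inheritance) by forward chaining, and evaluates a conjunctive triple-pattern query similar to a SPARQL basic graph pattern. All class, property, and model names (e.g. `aquatic:Phytoplankton`, `pbm:hasEntity`, `model:M1`) are hypothetical and do not come from OntoPBM.

```python
# Hypothetical vocabulary: the prefixed names below are illustrative stand-ins,
# not actual OntoPBM or aquatic-extension terms.
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

# Asserted facts: a tiny class hierarchy plus the annotation of one model.
asserted = {
    ("aquatic:Phytoplankton", SUBCLASS, "aquatic:PrimaryProducer"),
    ("aquatic:PrimaryProducer", SUBCLASS, "pbm:Entity"),
    ("ent:Diatoms", TYPE, "aquatic:Phytoplankton"),
    ("model:M1", "pbm:hasEntity", "ent:Diatoms"),
}


def rdfs_closure(triples):
    """Forward-chain two RDFS rules until no new triples appear:
    subClassOf transitivity (rdfs11) and type inheritance (rdfs9)."""
    inferred = set(triples)
    while True:
        pairs = [(s, o) for s, p, o in inferred if p == SUBCLASS]
        new = set()
        for s, p, o in inferred:
            for sub, sup in pairs:
                # (s p sub) and (sub subClassOf sup) => (s p sup) for both rules
                if o == sub and p in (SUBCLASS, TYPE):
                    new.add((s, p, sup))
        if new <= inferred:
            return inferred
        inferred |= new


def _unify(pattern, triple, binding):
    """Extend binding so that pattern matches triple; None if impossible."""
    b = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if b.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return b


def query(patterns, triples):
    """Evaluate a conjunction of triple patterns (like a SPARQL basic graph pattern)."""
    results = [{}]
    for pattern in patterns:
        next_results = []
        for binding in results:
            for triple in triples:
                extended = _unify(pattern, triple, binding)
                if extended is not None:
                    next_results.append(extended)
        results = next_results
    return results


# Roughly: SELECT ?model ?e WHERE { ?model pbm:hasEntity ?e .
#                                   ?e rdf:type aquatic:PrimaryProducer . }
patterns = [("?model", "pbm:hasEntity", "?e"), ("?e", TYPE, "aquatic:PrimaryProducer")]
print(query(patterns, asserted))                # []
print(query(patterns, rdfs_closure(asserted)))  # [{'?model': 'model:M1', '?e': 'ent:Diatoms'}]
```

The query finds nothing over the asserted triples alone, because the annotation only states that the entity is a phytoplankton; the match appears once the subclass axioms are applied. In practice, a triple store with a reasoner or an RDFS entailment regime would perform this step instead of the hand-written closure.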

3.

Web genre classification with methods for structured output prediction

The number of available web pages keeps increasing, and searching through them is typically done by submitting keywords to a search engine, which returns a ranked list of results. Users can, however, obtain more precise results by also specifying the web genres they are looking for. We investigated 10 different semantic representations of web pages, with features ranging from expert-derived features (context, presentation, etc.) to character n-grams and paragraph vector embeddings. Moreover, web genre prediction is typically addressed as a multi-class classification task. Here, we advocate and demonstrate that it is a structured output prediction task: a web page can be labelled with multiple genres (multi-label classification, MLC), and the genres can be organized into a hierarchical taxonomy (hierarchical MLC). We executed an extensive set of experiments that confirm our position and further reveal that: 1) structured output prediction yields superior predictive performance, 2) a hierarchy of web genres constructed in a data-driven manner performs as well as an expert-constructed hierarchy, and 3) surface features and paragraph vector embeddings offer the best performance.

COBISS.SI-ID: 32528679
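
The difference between the multi-class, multi-label, and hierarchical multi-label formulations lies in how the target of each web page is represented. The minimal Python sketch below, assuming a small hypothetical genre taxonomy, expands a page's genre set with all ancestor genres (the usual hierarchy constraint in hierarchical MLC), checks that constraint, and encodes the result as a binary target vector. It only shows the label representation; it does not implement the structured output prediction methods or the data-driven hierarchy construction evaluated in the study.

```python
# Hypothetical two-level genre taxonomy (child -> parent, None for roots).
# The genre names are illustrative, not the taxonomy used in the experiments.
PARENT = {
    "discussion": None,
    "commercial": None,
    "blog": "discussion",
    "forum": "discussion",
    "shop": "commercial",
    "product_page": "commercial",
}
VOCABULARY = sorted(PARENT)


def ancestors(genre):
    """All ancestors of a genre, from its parent up to the root."""
    result = []
    parent = PARENT[genre]
    while parent is not None:
        result.append(parent)
        parent = PARENT[parent]
    return result


def hierarchical_labels(genres):
    """Close a multi-label set upward: a page in a genre is also in its ancestors."""
    labels = set(genres)
    for genre in genres:
        labels.update(ancestors(genre))
    return labels


def is_consistent(labels):
    """Hierarchy constraint: every predicted genre's parent is predicted too."""
    return all(PARENT[g] is None or PARENT[g] in labels for g in labels)


def to_binary_vector(labels):
    """Encode a label set as a 0/1 target vector over VOCABULARY."""
    return [1 if g in labels else 0 for g in VOCABULARY]


# Multi-class would force a single genre; multi-label keeps both "blog" and
# "shop", and the hierarchical variant adds their ancestors as well.
labels = hierarchical_labels({"blog", "shop"})
print(sorted(labels))            # ['blog', 'commercial', 'discussion', 'shop']
print(to_binary_vector(labels))  # [1, 1, 1, 0, 0, 1]
print(is_consistent({"blog"}))   # False
```

Encoding targets this way lets a single structured output model predict all genres of a page at once, while the consistency check expresses the requirement that predictions respect the taxonomy.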

J2-9230 — Interim report
