1.

Wordification: propositionalization by unfolding relational data into bags of words

We have developed a new relational data mining technique, called wordification, which transforms a given relational database into a corpus of text documents. Wordification constructs simple, easy-to-understand features that act as words in the transformed bag-of-words representation. The paper presents the wordification methodology, together with an experimental comparison of several propositionalization approaches on seven relational datasets. The main advantages of the approach are accuracy comparable to that of competitive methods and greater scalability, as it runs several times faster on all experimental databases. The wordification methodology and the evaluation procedure have been implemented as executable workflows in our novel web-based data mining platform ClowdFlows. The implemented workflows also include several other ILP and RDM algorithms, as well as utility components that were added to the platform to make these techniques accessible to a wider research audience, which contributes to open science and experiment repeatability. The developed workflow is publicly available at http://clowdflows.org/workflow/4018/.
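As a rough illustration of the transformation described above, the following Python sketch turns each row of a main table into a bag of words, adding one word for every attribute value of the row itself and of the rows linked to it in a related table. The `table_attribute_value` word format, the function names `witem` and `wordify`, and the toy train and car tables are assumptions made for this example; they are not taken from the report or from the ClowdFlows implementation.

```python
from collections import Counter


def witem(table, attribute, value):
    """Build one word of the (assumed) form table_attribute_value."""
    return f"{table}_{attribute}_{value}"


def wordify(main_table, main_rows, related, key="id"):
    """Return {row id: Counter of words} for every row of the main table.

    `related` maps a table name to (foreign key column, list of row dicts).
    """
    documents = {}
    for row in main_rows:
        # Words contributed by the main-table row itself.
        words = [witem(main_table, a, v) for a, v in row.items() if a != key]
        # Words contributed by rows of related tables that point to this row.
        for table, (fk, rows) in related.items():
            for r in rows:
                if r[fk] == row[key]:
                    words += [witem(table, a, v) for a, v in r.items() if a != fk]
        documents[row[key]] = Counter(words)
    return documents


# Hypothetical toy database: trains and the cars that belong to them.
trains = [{"id": 1, "size": "long"}, {"id": 2, "size": "short"}]
cars = [
    {"train_id": 1, "shape": "rectangle", "roof": "none"},
    {"train_id": 1, "shape": "rectangle", "roof": "peaked"},
    {"train_id": 2, "shape": "hexagon", "roof": "flat"},
]
docs = wordify("train", trains, {"car": ("train_id", cars)})
```

Each resulting document can then be passed to any bag-of-words learner; weighting the word counts (for example with TF-IDF) would be a natural next step, but is left out of this sketch.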

COBISS.SI-ID: 28609575

2.

Sentiment of Emojis

We established the first emoji sentiment lexicon and drew a sentiment map of the 751 most frequently used emojis. The sentiment of emojis was computed from the sentiment of the tweets in which they occur. We engaged 83 human annotators to label over 1.6 million tweets in 13 European languages with their sentiment polarity (negative, neutral, or positive). It turns out that most emojis are positive, especially the most popular ones. The sentiment distribution of tweets with emojis differs significantly from that of tweets without them, and the inter-annotator agreement on tweets with emojis is higher. Emojis tend to occur at the end of tweets, and their sentiment polarity increases with the distance. We observe no significant differences in emoji rankings between the 13 languages.
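To make the idea of deriving emoji sentiment from tweet sentiment concrete, here is a minimal Python sketch. It assumes one simple scoring rule, the share of positive tweets minus the share of negative tweets among those containing the emoji; this rule, the function name `emoji_sentiment`, and the three example tweets are illustrative assumptions and may differ from the computation used for the actual lexicon.

```python
from collections import defaultdict


def emoji_sentiment(labeled_tweets, emojis):
    """Score each emoji from the labels of the tweets that contain it.

    labeled_tweets: iterable of (text, label) pairs, where label is
    "negative", "neutral", or "positive".
    Returns {emoji: score in [-1, 1]} (assumed rule: p(positive) - p(negative)).
    """
    counts = defaultdict(lambda: {"negative": 0, "neutral": 0, "positive": 0})
    for text, label in labeled_tweets:
        for e in emojis:
            if e in text:  # count a tweet once per emoji, even if repeated
                counts[e][label] += 1
    scores = {}
    for e, c in counts.items():
        n = sum(c.values())
        scores[e] = (c["positive"] - c["negative"]) / n
    return scores


# Hypothetical annotated tweets.
tweets = [
    ("great day 😀", "positive"),
    ("back at work 😀", "neutral"),
    ("missed the bus 😢", "negative"),
]
scores = emoji_sentiment(tweets, ["😀", "😢"])
```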

COBISS.SI-ID: 29085223

3.

Interdisciplinarity of scientific fields and its evolution based on graph of project collaboration and co-authoring

The paper investigates the interdisciplinarity of scientific fields based on a graph of collaboration between researchers. A new measure of interdisciplinarity is proposed that takes into account both graph content and structure. Similarity between science categories is estimated from the text similarity of their descriptions. The proposed measure is applied in an exploratory analysis of the research community in Slovenia. We found that Biotechnology and Natural sciences are the most interdisciplinary in their publications and collaborations on research projects. In addition, we observe the evolution of interdisciplinarity of scientific fields in Slovenia, which shows that over the last decade interdisciplinarity has increased the fastest in Medical sciences, mainly due to collaborations with Natural and Technical sciences.
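The text-similarity step mentioned above can be illustrated with a small Python sketch. It assumes cosine similarity over raw word counts, which is only one common choice; the report does not state which similarity function was used, and the function name `cosine_similarity` and the two category descriptions are made up for the example.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical category descriptions.
biotech = "genetics of plants and microbial processes"
medicine = "genetics of human disease and clinical processes"
similarity = cosine_similarity(biotech, medicine)
```

A value near 1 indicates that two categories are described with largely the same vocabulary, and a value of 0 that they share no words.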

COBISS.SI-ID: 28426791

4.

Dynamic system modeling using ensembles of model trees

We addressed the task of discrete-time modeling of nonlinear dynamic systems with multiple outputs using measured data. We proposed, implemented and empirically evaluated three extensions of fuzzy linear model trees built with the LoLiMoT (Local Linear Model Trees) algorithm. These extensions are multi-output models, ensembles of such models, and a search heuristic based on simulation error. We performed an empirical evaluation and compared these extensions on a variety of dynamic system case studies. We showed that ensembles improve the performance of both single-output and multi-output trees, and we provided an overall recommendation to use bagging of single-output LoLiMoT models with the simulation error as the search heuristic.
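The difference between one-step-ahead prediction error and simulation error can be sketched in Python as follows. The sketch assumes a first-order model of the form y(k) = f(y(k-1), u(k-1)) and mean squared error; the function names and the toy system are hypothetical and do not reproduce the LoLiMoT implementation.

```python
def one_step_error(model, u, y):
    """Mean squared error when each prediction uses the measured previous output."""
    errors = [(y[k] - model(y[k - 1], u[k - 1])) ** 2 for k in range(1, len(y))]
    return sum(errors) / len(errors)


def simulation_error(model, u, y):
    """Mean squared error when the model feeds back its own previous output."""
    y_hat = y[0]  # start from the measured initial state
    errors = []
    for k in range(1, len(y)):
        y_hat = model(y_hat, u[k - 1])
        errors.append((y[k] - y_hat) ** 2)
    return sum(errors) / len(errors)


# Hypothetical system y(k) = 0.5 * y(k-1) + u(k-1), driven by a constant input.
u = [1.0, 1.0, 1.0, 1.0]
y = [0.0, 1.0, 1.5, 1.75, 1.875]


def exact_model(y_prev, u_prev):
    return 0.5 * y_prev + u_prev


def biased_model(y_prev, u_prev):
    return 0.6 * y_prev + u_prev


exact_sim = simulation_error(exact_model, u, y)
biased_one_step = one_step_error(biased_model, u, y)
biased_sim = simulation_error(biased_model, u, y)
```

For the slightly biased model, the simulation error is larger than the one-step error because the model's own deviations accumulate over the run, which is why simulation error is a stricter criterion for models intended to be run without measured feedback.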

COBISS.SI-ID: 28967207

5.

Active learning for sentiment analysis on data streams

We have expanded the ClowdFlows data mining platform to enable the analysis of data streams and active learning. By utilizing the data and workflow sharing capabilities of ClowdFlows, we have shown that the labeling of examples can be distributed through crowdsourcing. The platform was stress-tested, and the limits of processing multiple concurrent data streams were determined. We have implemented an active learning scenario for sentiment analysis on data streams and enabled its use and reuse via the web application. Furthermore, we have shown that machine learning methods are suitable for sentiment analysis and that active learning improves the accuracy of sentiment classification.
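One common way to choose which stream examples to send to annotators is uncertainty sampling, sketched below in Python. The report does not say which query strategy was used, so the strategy, the function name `select_for_labeling`, and the toy probabilities are all assumptions for illustration.

```python
def select_for_labeling(batch, positive_probability, budget):
    """Pick the `budget` items whose predicted probability is closest to 0.5.

    positive_probability: callable returning the classifier's estimated
    probability that an item is positive (a stand-in for a real model).
    """
    ranked = sorted(batch, key=lambda item: abs(positive_probability(item) - 0.5))
    return ranked[:budget]


# Hypothetical batch from a tweet stream with made-up classifier outputs.
toy_probabilities = {
    "great phone": 0.95,
    "it is a phone": 0.55,
    "terrible phone": 0.05,
}
batch = list(toy_probabilities)
to_label = select_for_labeling(batch, toy_probabilities.get, budget=1)
```

The selected items would be labeled by annotators and added to the training set before the classifier is updated for the next batch.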

COBISS.SI-ID: 28251943

P2-0103 — Annual report 2015
