1.

Foundations of rule learning

The monograph "Foundations of Rule Learning" (Springer 2012, 334 pages), co-authored by Prof. Nada Lavrač as the leading author, is the result of several years of research in machine learning. It presents the fundamentals of rule learning as investigated in classical machine learning and modern data mining. The book can be used as a comprehensive reference to research on inductive rule learning and can also serve as a textbook for teaching machine learning. Parts of the book are available on the Springer website http://link.springer.com/book/10.1007/9783540751977/.

COBISS.SI-ID: 26327591

2.

Computational protein function prediction

We developed a new method for computational gene (or protein) function prediction. (i) The method, published in PLOS Computational Biology, is based on the principles of homology and phyletic profiles and uses ensembles of trees for hierarchical multi-label classification. It was also evaluated in wet-lab experiments. The results show that the confidence estimates produced by the method can be used to make informed decisions about which computational predictions to validate experimentally. (ii) The method was used in the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment, published in Nature Methods, in which 54 state-of-the-art protein function prediction methods were evaluated on a target set of 866 proteins from 11 organisms.

COBISS.SI-ID: 26510887

3.

Class imbalance and the curse of minority hubs

The paper evaluates the impact of hubness on learning under class imbalance with nearest neighbor methods. Our results show that, contrary to common belief, minority class hubs might be responsible for most misclassifications in many high-dimensional datasets.

COBISS.SI-ID: 27022119
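
As an illustration of the hubness notion used above, the following minimal sketch counts how often each point appears in the k-nearest-neighbor lists of other points, and how many of those occurrences involve a neighbor with a different class label ("bad" occurrences). It is not the code from the paper: the function name `k_occurrences`, the Euclidean distance, and the brute-force neighbor search are assumptions made for this example.

```python
from math import dist


def k_occurrences(points, labels, k):
    """Return (total, bad) occurrence counts for every point.

    total[j] = number of other points that have j among their k nearest
               neighbors (points with a high count are called hubs).
    bad[j]   = how many of those occurrences come from a point whose
               label differs from the label of j.
    Brute-force search with Euclidean distance; illustration only.
    """
    n = len(points)
    total = [0] * n
    bad = [0] * n
    for i in range(n):
        # Indices of all other points, ordered by distance to point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: dist(points[i], points[j]),
        )
        for j in others[:k]:
            total[j] += 1
            if labels[j] != labels[i]:
                bad[j] += 1
    return total, bad


# Hypothetical toy data: three majority-class points and one minority point.
toy_points = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (10.0, 0.0)]
toy_labels = [0, 0, 1, 0]
toy_total, toy_bad = k_occurrences(toy_points, toy_labels, k=1)
```

Comparing the `bad` counts of minority-class points with those of majority-class points is one simple way to see which class contributes more label mismatches among neighbors.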

4.

Multi-target regression with rule ensembles

We developed the FIRE algorithm for multi-target regression, which employs the rule ensemble approach. The accuracy of the algorithm was improved by adding simple linear functions to the ensemble. We also evaluated the algorithm extensively, and the results show that multi-target regression rule ensembles are more accurate than, for instance, multi-target regression trees, but not quite as accurate as multi-target random forests. Rule ensembles have the advantage of being significantly more concise than random forests, and it is also possible to create compact rule sets that are smaller than single regression trees but still comparably accurate.

COBISS.SI-ID: 26134055
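
To make the idea of a rule ensemble extended with linear functions concrete, the sketch below shows how such a model could produce a multi-target prediction: every rule whose condition holds adds its weighted target vector, and each linear term adds a coefficient times one feature value to one target. This is only an assumed, simplified representation and not the FIRE implementation; the function `predict_ensemble` and the tuple layouts for rules and linear terms are hypothetical.

```python
def predict_ensemble(x, intercept, rules, linear_terms):
    """Predict a vector of targets for one example.

    x            : dict mapping feature name -> numeric value
    intercept    : list with one baseline value per target
    rules        : list of (condition, weight, target_vector), where
                   condition is a function x -> bool (assumed layout)
    linear_terms : list of (feature, target_index, coefficient)
                   (assumed layout)
    """
    prediction = list(intercept)
    # Each rule that covers the example contributes its weighted vector.
    for condition, weight, target_vector in rules:
        if condition(x):
            for t, value in enumerate(target_vector):
                prediction[t] += weight * value
    # Simple linear functions: one feature affecting one target.
    for feature, t, coefficient in linear_terms:
        prediction[t] += coefficient * x[feature]
    return prediction


# Hypothetical model with two targets, two rules, and one linear term.
example = {"a": 2.0, "b": 5.0}
model_rules = [
    (lambda x: x["a"] > 1, 0.5, [2.0, 4.0]),    # covers the example
    (lambda x: x["b"] < 3, 1.0, [10.0, 10.0]),  # does not cover it
]
model_linear = [("b", 1, 0.1)]
result = predict_ensemble(example, [1.0, 0.0], model_rules, model_linear)
```

Because the model is a short list of rules and linear terms, it can be read directly, which is the conciseness advantage over random forests noted above.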

5.

Supervised descriptive rule discovery: a unifying survey of contrast set, emerging pattern and subgroup mining

This paper gives a survey of contrast set mining (CSM), emerging pattern mining (EPM), and subgroup discovery (SD) in a unifying framework named supervised descriptive rule discovery. The paper contributes a novel understanding of these sub-areas of data mining by presenting a unified terminology, by explaining the apparent differences between the learning tasks and by exploring the apparent differences between the approaches. The paper also provides a critical survey of existing supervised descriptive rule discovery visualization methods.

COBISS.SI-ID: 22475303

P2-0103 — Final report
