Z2-1867 — Interim report
1.
Identifying practical significance through statistical comparison of meta-heuristic stochastic optimization algorithms

We propose practical Deep Statistical Comparison (pDSC), which takes practical significance into account when statistically comparing meta-heuristic stochastic optimization algorithms for single-objective optimization. To capture practical significance, two variants of the standard DSC ranking scheme are proposed. The first, sequential pDSC, accounts for practical significance by preprocessing the independent optimization runs in sequential order. The second, Monte Carlo pDSC, removes any dependence of the practical significance on the ordering of the optimization runs. The analysis on benchmark tests for single-objective problems shows that in some cases both pDSC variants reach conclusions that differ from those of the Chess Rating System for Evolutionary Algorithms (CRS4EAs). Although the preprocessing for practical significance is carried out in a similar way, there are cases in which the conclusions about practical significance differ, which stems from the different statistical concepts used to identify it.
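
A minimal Python sketch of the kind of preprocessing the two pDSC variants perform before the standard DSC ranking is applied is given below; the threshold name practical_epsilon, the pairing rule, and the number of shuffles are illustrative assumptions made for the sketch, not the published algorithm.

    import random

    def preprocess_sequential(runs_a, runs_b, practical_epsilon):
        # Pair the runs in their given (sequential) order and treat differences
        # smaller than the practical threshold as equal by copying one value.
        a, b = list(runs_a), list(runs_b)
        for i, (x, y) in enumerate(zip(a, b)):
            if abs(x - y) < practical_epsilon:  # not practically different
                b[i] = x
        return a, b

    def preprocess_monte_carlo(runs_a, runs_b, practical_epsilon, n_shuffles=100, seed=0):
        # Remove the dependence on run ordering by repeating the sequential
        # preprocessing over many random pairings of the runs.
        rng = random.Random(seed)
        samples = []
        for _ in range(n_shuffles):
            shuffled_b = list(runs_b)
            rng.shuffle(shuffled_b)
            samples.append(preprocess_sequential(runs_a, shuffled_b, practical_epsilon))
        return samples  # each sample is then ranked with the standard DSC scheme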

COBISS.SI-ID: 32837159
2.
Mix and Rank: A Framework for Benchmarking Recommender Systems

Recommender systems rely on big-data methods and are widely used on social-network, e-commerce, and content platforms. With their increased relevance, online platforms and developers need better ways to choose the systems most suitable for their use cases. At the same time, the research literature on recommender systems describes a multitude of measures for evaluating the performance of different algorithms. For the end-user, however, the large number of available measures does not provide much help in deciding which algorithm to deploy. Some of the measures are correlated, while others address different aspects of recommendation performance, such as accuracy and coverage. To address this problem, we propose a novel benchmarking framework that mixes different evaluation measures in order to rank the recommender systems on each benchmark dataset separately. Additionally, our approach discovers sets of correlated evaluation measures as well as sets of measures that are least correlated. We investigate the robustness of the proposed methodology using published results from an experimental study involving multiple big datasets and evaluation measures. Our work provides a general framework that can handle an arbitrary number of evaluation measures and help end-users rank the systems available to them.
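
A rough Python illustration of one way such a mixing step could look: per-measure rankings are aggregated by mean rank and measure correlations are estimated with Spearman's coefficient. Both choices are assumptions made for this sketch and are not the aggregation or correlation analysis used in the paper.

    import numpy as np
    from scipy.stats import rankdata, spearmanr

    def mix_and_rank(scores):
        # scores: (n_systems, n_measures) array; higher is assumed better for
        # every measure. Rank the systems separately under each measure (1 = best).
        per_measure_ranks = np.column_stack(
            [rankdata(-scores[:, j]) for j in range(scores.shape[1])]
        )
        # Mix the measures by averaging the per-measure ranks, then rank the
        # systems on the dataset by that mixture.
        final_ranking = rankdata(per_measure_ranks.mean(axis=1))
        # Rank correlations between measures hint at which measures form a
        # correlated set and which are least correlated with the rest.
        measure_corr, _ = spearmanr(per_measure_ranks)
        return final_ranking, measure_corr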

COBISS.SI-ID: 33220391
3.
DSCTool: A web-service-based framework for statistical comparison of stochastic optimization algorithms

DSCTool is a statistical tool for comparing the performance of stochastic optimization algorithms on a single benchmark function (i.e., single-problem analysis) or on a set of benchmark functions (i.e., multiple-problem analysis). DSCTool implements a recently proposed approach, called Deep Statistical Comparison (DSC), and its variants. DSC ranks optimization algorithms by comparing the distributions of the solutions obtained for a problem instead of using a simple descriptive statistic such as the mean or the median. The rankings obtained for an individual problem give the relations between the performances of the applied algorithms. To compare optimization algorithms in the multiple-problem scenario, an appropriate statistical test must be applied to the rankings obtained for a set of problems. The main advantage of DSCTool is its set of REST web services, which means that all its functionalities can be accessed from any programming language. In this paper, we present DSCTool in detail, together with examples of its usage.
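
Because the functionalities are exposed as REST web services, they can be invoked from any language with an HTTP client. A hedged Python sketch of such a call follows; the URL, endpoint path, and JSON field names are hypothetical placeholders chosen for illustration, not the documented DSCTool API.

    import requests

    # Placeholder URL, endpoint, and payload schema; the actual DSCTool service
    # defines its own endpoints and JSON format.
    payload = {
        "method": {"name": "AD", "alpha": 0.05},  # assumed: two-sample test and significance level
        "data": [
            {"algorithm": "AlgA", "problem": "f1", "result": [0.12, 0.10, 0.15]},
            {"algorithm": "AlgB", "problem": "f1", "result": [0.20, 0.18, 0.21]},
        ],
    }

    response = requests.post("https://example.org/dsctool/rank", json=payload, timeout=30)
    response.raise_for_status()
    print(response.json())  # per-problem DSC rankings of the compared algorithms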

COBISS.SI-ID: 32930343