1.

On estimation in relative survival

Estimation of relative survival has become the first and most basic step when reporting cancer survival statistics. Standard estimators are in routine use by all cancer registries. However, it has recently been noted that these estimators do not provide information on cancer mortality that is independent of the national general population mortality. Thus they are not suitable for comparison between countries. Furthermore, the commonly used interpretation of the relative survival curve is vague and misleading. The present article attempts to remedy these basic problems. The population quantities of the traditional estimators are carefully described and their interpretation discussed. We then propose a new estimator of net survival probability that enables the desired comparability between countries. The new estimator requires no modeling and is accompanied by a straightforward variance estimate. The methods are illustrated on real as well as simulated data.

COBISS.SI-ID: 28569561

2.

A measure of explained variation for event history data

There is no shortage of proposed measures of the prognostic value of survival models in the statistical literature. They come under different names, including explained variation, correlation, explained randomness, and information gain, but their goal is common: to define something analogous to the coefficient of determination R² in linear regression. None, however, have been uniformly accepted, none have been extended to general event history data, including recurrent events, and many cannot incorporate time-varying effects or covariates. We present here a measure specifically tailored for use with general dynamic event history regression models. The measure is applicable and interpretable in discrete or continuous time; with tied data or otherwise; with time-varying, time-fixed, or dynamic covariates; with time-varying or time-constant effects; with single or multiple event times; with parametric or semiparametric models; and under general independent censoring/observation. For single-event survival data with neither censoring nor time dependency it reduces to the concordance index. We give expressions for its population value and the variance of the estimator and explore its use in simulations and applications. A web link to R software is provided.

COBISS.SI-ID: 27655129

3.

Class prediction for high-dimensional class-imbalanced data

The goal of class prediction studies is to develop rules to accurately predict the class membership of new samples. The rules are derived using the values of the variables available for each subject; the main characteristic of high-dimensional data is that the number of variables greatly exceeds the number of samples. Frequently the classifiers are developed using class-imbalanced data, i.e., data sets where the number of samples in each class is not equal. Standard classification methods used on class-imbalanced data often produce classifiers that do not accurately predict the minority class; the prediction is biased towards the majority class. In this paper we investigate whether high dimensionality poses additional challenges when dealing with class-imbalanced prediction. We evaluate the performance of six types of classifiers on class-imbalanced data, using simulated data and a publicly available data set from a breast cancer gene-expression microarray study. We also investigate the effectiveness of some strategies that are available to overcome the effect of class imbalance. Our results show that the evaluated classifiers are highly sensitive to class imbalance and that variable selection introduces an additional bias towards classification into the majority class. Most new samples are assigned to the majority class from the training set, unless the difference between the classes is very large. As a consequence, the class-specific predictive accuracies differ considerably. When the class imbalance is not too severe, down-sizing and asymmetric bagging embedding variable selection work well, while over-sampling does not. Variable normalization can further worsen the performance of the classifiers. (Abstract truncated at 2000 characters)

COBISS.SI-ID: 27503321
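
The down-sizing strategy named in the abstract above can be pictured with a small sketch. It assumes that down-sizing means randomly discarding samples until every class is as large as the smallest one; the function name `downsize` and the toy data are hypothetical and are not taken from the paper, which may implement the strategy differently.

```python
import random
from collections import Counter


def downsize(samples, labels, seed=0):
    """Randomly keep the same number of samples from every class.

    Assumption: the kept number equals the size of the smallest class.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # Size of the smallest (minority) class.
    n_min = min(len(group) for group in by_class.values())

    kept_samples, kept_labels = [], []
    for label, group in by_class.items():
        # Draw n_min samples without replacement from each class.
        for sample in rng.sample(group, n_min):
            kept_samples.append(sample)
            kept_labels.append(label)
    return kept_samples, kept_labels


# Toy data: 90 majority-class samples and 10 minority-class samples.
toy_samples = list(range(100))
toy_labels = [0] * 90 + [1] * 10
balanced_samples, balanced_labels = downsize(toy_samples, toy_labels)
print(Counter(balanced_labels))
```

In this sketch the balanced set would be passed on to variable selection and classifier training; the order of those steps is an assumption here, not a statement about the study design.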

4.

On standardization of the activity index

The Relative Specialization Index (RSI) was introduced as a simple transformation of the Activity Index (AI), the aim of this transformation being standardization of AI and therefore a more straightforward interpretation. RSI is believed to have values between -1 and 1, with -1 meaning no activity of the country (institution) in a certain scientific field, and 1 meaning that the country is active only in the given field. While it is obvious from the definition of RSI that it can never be 1, it is less obvious, and essentially unknown, that its upper limit can be quite far from 1, depending on the scientific field. This is a consequence of the fact that AI has different upper limits for different scientific fields. This means that comparisons of RSIs, or AIs, across fields can be misleading. We therefore believe that RSI should not be used at all. We also show how an appropriate standardization of AI can be achieved.

COBISS.SI-ID: 31282137
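
The dependence of the RSI upper limit on the field can be made concrete with a short calculation. The abstract above does not state the formulas, so this sketch assumes the commonly used definitions: AI is the field's share in the country's output divided by the field's share in world output, and RSI = (AI - 1) / (AI + 1). It also treats the world share of the field as given; the function names are hypothetical.

```python
def rsi(ai):
    """RSI under the assumed definition (AI - 1) / (AI + 1)."""
    return (ai - 1.0) / (ai + 1.0)


def max_ai(world_share):
    """AI of a country that publishes only in one field.

    The country's own share of the field is then 1, so AI equals
    1 / world_share (assumed definition, world share taken as given).
    """
    if not 0.0 < world_share <= 1.0:
        raise ValueError("world_share must lie in (0, 1]")
    return 1.0 / world_share


def max_rsi(world_share):
    """Upper limit of RSI for a field with the given world share."""
    return rsi(max_ai(world_share))


# The upper limit differs between large and small fields.
for share in (0.25, 0.05, 0.01):
    print(f"world share {share:.2f}: max AI {max_ai(share):.1f}, "
          f"max RSI {max_rsi(share):.3f}")
```

Under these assumptions a field that accounts for a quarter of world output caps RSI at 0.6, while a very small field lets it approach, but never reach, 1.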

5.

Comparison of the citation distribution and h-index between groups of different sizes

Evaluating the performance of institutions with different resources is not easy, because any comparison of citation distributions is strongly affected by differences in the number of articles published. The paper introduces a method for comparing citation distributions of research groups that differ in size. The citation distribution of a larger group is reduced by a certain factor and compared with the original distribution of a smaller group. Expected values and tolerance intervals of the reduced set of citations are calculated. A comparison of both distributions can be conveniently viewed in a graph. The size-independent reduced Hirsch index - a function of the reducing factor that allows the comparison of groups within a scientific field - is calculated in the same way. The method can be used for comparing groups or units differing in full-time equivalent, funding or the number of researchers, or for comparing countries by population, gross domestic product, etc. It is shown that for the calculation of the reduced Hirsch index, the upper part of the original citation distribution is sufficient. The method is illustrated through several case comparisons.

COBISS.SI-ID: 30069465
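
The idea of reducing a larger group's citation distribution can be approximated by simulation. The abstract above speaks of calculated expected values and tolerance intervals; the sketch below swaps in a plain Monte Carlo approximation instead, assuming that reduction by a factor k means drawing 1/k of the papers at random without replacement. The names `h_index` and `reduced_h_index` and the percentile interval are illustrative choices, not the paper's procedure.

```python
import random


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def reduced_h_index(citations, factor, draws=2000, seed=0):
    """Monte Carlo sketch of a size-reduced h-index.

    Draws len(citations) / factor papers without replacement many times
    and returns the mean h-index together with the 2.5th and 97.5th
    percentiles of the simulated values.
    """
    rng = random.Random(seed)
    size = max(1, round(len(citations) / factor))
    values = sorted(h_index(rng.sample(citations, size))
                    for _ in range(draws))
    mean = sum(values) / draws
    lower = values[int(0.025 * draws)]
    upper = values[min(draws - 1, int(0.975 * draws))]
    return mean, lower, upper


# Hypothetical citation counts of a larger group, reduced by a factor of 4.
large_group = list(range(100))
print(h_index(large_group), reduced_h_index(large_group, factor=4))
```

The reduced value of the larger group could then be set against the ordinary h-index of a group a quarter of its size.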

P3-0154 — Final report
