Methodology for data analysis in medical sciences

Code     Science           Field                                Subfield
3.08.00  Medical sciences  Public health (occupational safety)

Code  Science              Field
B110  Biomedical sciences  Bioinformatics, medical informatics, biomathematics, biometrics
biostatistics, survival analysis, Cox model, explained variation, frailties, relative survival, logistic regression, goodness of fit, scientometrics, OLAP, data mining, information retrieval, generating scientific hypotheses, search algorithms, scientific indicators
The topic of our research program is methodology for discovering actual or possible patterns, trends and associations in medical data. These are methods for discovering new knowledge or generating new hypotheses that may lead to knowledge. The data that we analyse arise mainly from research, but also from routine medical practice. In recent years, we have paid special attention to methods for generating hypotheses from bibliographic data. We are also cautiously expanding the scope of our research, presently into the field of electrical stimulation of smooth muscles and the associated electromyography.

In brief, our research can be divided into three sub-fields:
1. Biostatistics
2. Scientometrics
3. Data mining in bibliographic databases

The focus of our research in biostatistics is on regression models for survival analysis, especially the Cox model. In addition to explained variation, prognostic value and frailties, which have so far been our main topics related to the Cox model, during the forthcoming five-year period we will concentrate on time-varying coefficients and on testing specific alternative hypotheses, such as crossing hazards. The presently available tests do not distinguish the latter situations from the null hypothesis. Besides the Cox model, we will continue to study intensively the field of relative survival, where we have recently developed an entirely new method. We will also extend our research beyond survival analysis by investigating methods for assessing the goodness of fit of the logistic regression model. The present approaches are based on grouping of units, which has several disadvantages. Our approach will be based on applying results from the theory of stochastic processes, especially Brownian motion.

Research in scientometrics is relatively new in general, and it is virtually nonexistent in Slovenia.
It is almost impossible to conduct such research without an adequate bibliographic database, so Biomedicina Slovenica is of fundamental importance to us. Another indispensable tool is a system for automated citation analysis, which we have also developed. The third key factor is the selection of appropriate indicators, which has also been a field of our expertise for a number of years. Only the combination of these three components makes it possible to work on research evaluation, and even then the methodological problems do not end. Bibliographic databases are organised in a way that prevents the use of standard data-analytic approaches. Hence, we have developed a system based on OLAP (On-Line Analytical Processing) methodology that transforms data from bibliographic databases into a multidimensional orthogonal structure, which can then be analysed by the usual statistical methods. Two of our staff recently published an article on this in Scientometrics, the leading journal in the field. During the next five years, we will be mainly interested in research trends in Slovene medicine, as well as in the influence of the number of authors, inter-institutional co-operation, authors' citation history and other factors on the impact of publications.

Data mining in bibliographic databases is a novel approach to browsing such databases. So far, we have developed a system for supporting biomedical discovery. The system aids researchers in creating new hypotheses, which can then be tested using established research methods. Our approach treats hypotheses as relations between biomedical concepts that have not yet been published in the scientific literature. The core of the system is the Medline bibliographic database, which in the present version is joined with the LocusLink, HUGO, OMIM and UniGene genetic databases.
This makes the system particularly useful for discovering new relations in the field of genetics, such as predicting candidate genes for a new disease.
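
The idea of treating a hypothesis as a relation between concepts that has not yet been published can be illustrated with a small sketch. The Python example below is not the system described above. It assumes a simple co-occurrence scheme in the style of Swanson's ABC model, and the records, concept names and the functions `build_cooccurrence` and `candidate_relations` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(records):
    """Map each concept to the set of concepts it shares a record with."""
    linked = defaultdict(set)
    for concepts in records:
        for a, b in combinations(sorted(set(concepts)), 2):
            linked[a].add(b)
            linked[b].add(a)
    return linked


def candidate_relations(records, start):
    """Rank concepts never published together with `start` by the number
    of intermediate concepts that link them to `start`."""
    linked = build_cooccurrence(records)
    direct = linked.get(start, set())
    support = defaultdict(set)
    for middle in direct:
        for target in linked[middle]:
            # Keep only targets with no direct (published) link to `start`.
            if target != start and target not in direct:
                support[target].add(middle)
    return sorted(
        ((target, sorted(middles)) for target, middles in support.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )


# Hypothetical toy records: each set holds the concepts indexed for one article.
RECORDS = [
    {"disease X", "symptom A"},
    {"disease X", "pathway B"},
    {"symptom A", "gene G1"},
    {"pathway B", "gene G1"},
    {"pathway B", "gene G2"},
    {"disease X", "gene G3"},
]

if __name__ == "__main__":
    for target, middles in candidate_relations(RECORDS, "disease X"):
        print(target, "via", middles)
```

With these toy records, gene G1 ranks above gene G2 because two intermediate concepts connect it to disease X, while gene G3 is excluded because its relation to disease X is already published. Counting shared intermediates is only one possible ranking; it is used here because it is easy to inspect by hand.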
Significance for science
We have for some time been among the world's leading groups in the field of measures of explained variation in survival analysis, and we are probably the only team with software tools for calculating all the important measures. The same can now be said for the field of relative survival. Our package relsurv (authored by Maja Pohar Perme) is the only comprehensive package in this field worldwide. It is included in the CRAN repository for R and is widely used. We have also made important theoretical contributions to the methodology of relative survival, as shown by the fact that Janez Stare was an invited lecturer at three international conferences (ISCB 2004, RoeS 2005 and ELN 2007) and will be an invited discussant at the ISI 2009 conference, where Maja Pohar Perme will be an invited lecturer. Four invited lectures at the Universities of Milano, Torino (Stare), Vienna and Copenhagen (Pohar Perme) should also be added here.

The tools that we are developing in the field of knowledge discovery in literature databases actively help researchers to search for relevant literature and to generate new research hypotheses. Our tools are well known and cited worldwide in the medical informatics research community, but we would like to spread knowledge and use of these tools among end users: scientists and practitioners in the biomedical field. At present we co-operate most closely with the genetics community in Slovenia.
Significance for the country
All our research stems from our basic maxim: to offer the best possible support to research work in Slovenian medicine. Without our research, support in the fields of biostatistics and scientific informatics would be much poorer. Our contribution to the methodology of research assessment was undoubtedly crucial to the fact that, in the past, medicine had the most transparent and objectively regulated assessment of research projects and programmes. A good part of our experience has been used in developing the criteria presently used by ARRS. Our work supports research and routine work in medicine and, consequently, technological development in Slovenian medicine. Every activity with a wide international response strengthens national identity. Our research undoubtedly resonates in circles that had never before been addressed from Slovenia.
