Projects / Programmes source: ARIS

Statistical methods for high-dimensional “omics” data integrated with semantic relations from the biomedical literature

Research activity

Code Science Field Subfield
3.08.00  Medical sciences  Public health (occupational safety)   

Code Science Field
B110  Biomedical sciences  Bioinformatics, medical informatics, biomathematics, biometrics 

Code Science Field
3.05  Medical and Health Sciences  Other medical sciences 
Bioinformatics, medical informatics, biomathematics, biometrics
Evaluation (rules)
source: COBISS
Researchers (16)
no.  Code  Name and surname  Research area  Role  Period  No. of publications
1.  24866  Samo Belavič Pučnik  Public health (occupational safety)  Researcher  2014  88 
2.  30722  PhD Rok Blagus  Systems and cybernetics  Researcher  2011 - 2014  195 
3.  20134  PhD Mojca Čižek Sajko  Human reproduction  Researcher  2011 - 2013  106 
4.  22621  PhD Polonca Ferk  Metabolic and hormonal disorders  Researcher  2011  141 
5.  11373  PhD Dimitar Hristovski  Computer science and informatics  Researcher  2011 - 2014  145 
6.  29102  Irena Janjić    Technical associate  2011 - 2014 
7.  29860  PhD Voyko Kavcic  Neurobiology  Researcher  2014  153 
8.  24344  PhD Nataša Kejžar  Systems and cybernetics  Researcher  2011 - 2014  156 
9.  15355  PhD Branimir Leskošek  Public health (occupational safety)  Researcher  2011 - 2013  181 
10.  29917  PhD Lara Lusa  Public health (occupational safety)  Researcher  2011 - 2014  250 
11.  30409  PhD Stojan Pečlin  Information science and librarianship  Researcher  2011 - 2012  18 
12.  23437  PhD Maja Pohar Perme  Public health (occupational safety)  Researcher  2011 - 2013  291 
13.  34292  Gregor Ramovš    Technical associate  2012 - 2014 
14.  32048  PhD Anamarija Rebolj Kodre  Systems and cybernetics  Junior researcher  2011 - 2014  23 
15.  08992  PhD Janez Stare  Public health (occupational safety)  Head  2011 - 2014  277 
16.  32581  PhD Minja Zorc  Computer science and informatics  Researcher  2011  180 
Organisations (1)
no.  Code  Research organisation  City  Registration number  No. of publications
1.  0381  University of Ljubljana, Faculty of Medicine  Ljubljana  1627066  46,179 
The high-throughput technologies developed in the late 1990s for measuring gene expression triggered the research and development of similar tools in all the “omics” fields (genomics, proteomics, …). The main characteristic of the high-dimensional data produced by high-throughput technologies is that the number of measured variables greatly exceeds the number of samples included in the experiment. Over the last ten years the production and use of high-dimensional data in biomedical research has increased, and many statistical methods have been developed to deal with them appropriately.
Two common scientific goals of biomedical studies that use high-dimensional data are (i) to develop rules to accurately predict the class membership of new samples (class prediction) and (ii) to identify the variables whose distributions differ between some pre-specified classes (class comparison). A third emerging goal is (iii) the integration of different data sources to obtain more reliable results from class prediction or class comparison studies. The different data sources can be either previously published data addressing the same scientific question, or measurements of different “omics” and possibly clinical aspects of the same samples (for example, when gene expression measurements and copy number analysis are performed on the same subjects and clinical information is available).
The aim of this project is to continue our ongoing statistical methodological research for high-dimensional class prediction and class comparison studies, and to evaluate strategies and develop new, much-needed methods for the integration of high-dimensional data deriving from different sources. Additionally, we will integrate semantic relations extracted from the biomedical literature. The biomedical literature is still the largest source of biomedical knowledge. However, because of its huge volume, computer-based methods are needed to process it and to make it available to other applications. We will extract semantic relations from the literature using natural language processing methods. The extracted semantic relations represent the facts expressed in the literature in a machine-processable form that is also suitable for integration with our statistical methods.
Significance for science
The members of our research group have been working in the field of time-to-event data, producing many scientific contributions documented in our bibliography. In the last few years we have also been working on methodological and applied projects with high-dimensional data. Furthermore, we developed publicly available tools for the analysis and interpretation of genomic data. This project gave us the opportunity to combine the varied expertise of our group and to continue our research in this fast-developing field. We believe that, as expected, this project contributed many important results from both the methodological and the applied point of view. Our aim was to provide a better understanding of the properties of some existing methods and to develop new methods and tools for the analysis and interpretation of high-dimensional “omics” data. We think that we achieved this goal because we contributed methodologically innovative results and developed new publicly available tools for this purpose. Moreover, we collaborated with biomedical researchers, jointly planning their studies and analyzing their high-dimensional “omics” data.
Significance for the country
The technologies used in “omics” research are rapidly evolving, producing data that require the development of new statistical methods and of new tools for their analysis and interpretation. The number of biomedical researchers who use “omics” data has been rapidly increasing. It is well known that many research papers that use novel technologies have flaws in the design and analysis of the data, because the understanding of problems related to a new type of data is initially very limited. The statisticians and bioinformaticians involved in this project provided support to Slovenian researchers. We analyzed high-dimensional “omics” data generated by Slovenian researchers and helped them with the planning of their experiments and with the interpretation of the results. We are convinced that the methodological knowledge we acquired during the project significantly improved the quality of the collaborations and of the scientific outputs. This is one of the missions of our institute: to provide the highest-quality support to medical research in Slovenia.
Most important scientific results: annual reports 2011, 2012, 2013; final report; complete report on dLib.si
Most important socioeconomically and culturally relevant results: annual reports 2012, 2013; final report; complete report on dLib.si