Projects / Programmes source: ARIS

Development and applications of new semantic data mining methods in life sciences

Research activity

Code Science Field Subfield
2.07.07  Engineering sciences and technologies  Computer science and informatics  Intelligent systems - software 

Code Science Field
P176  Natural sciences and mathematics  Artificial intelligence 

Code Science Field
1.02  Natural Sciences  Computer and information sciences 
Data mining, knowledge discovery, semantic data mining, semantic web services, workflows
Evaluation (rules)
source: COBISS
Researchers (18)
no. Code Name and surname Research area Role Period No. of publications
1.  19116  PhD Špela Baebler  Biotechnology  Researcher  2013 - 2016  313 
2.  06989  PhD Andrej Blejec  Mathematics  Researcher  2013 - 2016  287 
3.  28806  PhD Miha Grčar  Computer intensive methods and applications  Researcher  2013 - 2016  85 
4.  12688  PhD Kristina Gruden  Biotechnology  Researcher  2013 - 2016  985 
5.  28291  PhD Petra Kralj Novak  Computer science and informatics  Researcher  2013 - 2016  130 
6.  34098  PhD Janez Kranjc  Computer science and informatics  Researcher  2013 - 2016  25 
7.  08949  PhD Nada Lavrač  Computer science and informatics  Head  2013 - 2016  867 
8.  36912  PhD Dragana Miljković  Computer science and informatics  Researcher  2014 - 2016  71 
9.  21397  PhD Helena Motaln  Biochemistry and molecular biology  Researcher  2013 - 2016  207 
10.  03323  PhD Igor Mozetič  Computer science and informatics  Researcher  2013 - 2016  184 
11.  29617  PhD Marko Petek  Biotechnology  Researcher  2013 - 2014  168 
12.  29539  PhD Vid Podpečan  Computer science and informatics  Researcher  2013 - 2016  103 
13.  34502  PhD Živa Ramšak  Biology  Researcher  2013 - 2016  118 
14.  27503  PhD Ana Rotter  Biotechnology  Researcher  2013 - 2014  328 
15.  07736  PhD Bojan Sedmak  Biochemistry and molecular biology  Researcher  2013 - 2016  232 
16.  34262  PhD Anže Vavpetič  Computer science and informatics  Junior researcher  2013 - 2016  30 
17.  32811  PhD Urška Verbovšek  Biotechnology  Junior researcher  2013 - 2015  30 
18.  23582  PhD Martin Žnidaršič  Computer science and informatics  Researcher  2013 - 2016  165 
Organisations (2)
no. Code Research organisation City Registration number No. of publications
1.  0105  National Institute of Biology  Ljubljana  5055784  13,239 
2.  0106  Jožef Stefan Institute  Ljubljana  5051606000  90,600 
Knowledge discovery in databases is the area of computer science concerned with the automatic search and exploration of large volumes of data, with the goal of finding new hypotheses in the form of models and patterns automatically induced from the data. The discovered models and patterns are especially interesting if they are unexpected or if they help confirm as yet unproven hypotheses. A limitation of current publicly available data mining and knowledge discovery platforms is that they can handle only simple tabular data. Motivated by the increasing volume of semi-structured, heterogeneous and distributed data, the proposed SemDM project addresses this limitation by enhancing currently available data mining platforms with the ability to use distributed, heterogeneous information and knowledge sources, as required for data analysis in knowledge-intensive domains.

The project has the following objectives:
- To develop new algorithms for Semantic Data Mining (SemDM) which will enable knowledge discovery from data stored in heterogeneous (structured, semi-structured and unstructured) and distributed data and knowledge sources, including semantically annotated data stored in publicly available ontologies (Gene Ontology and other knowledge sources available in the Linked Open Data cloud).
- To develop a novel, science-oriented data mining platform, ClowdFlows, which will upgrade our recently developed Orange4WS platform to enable browser-based construction of innovative data mining workflows from local and distributed data processing and mining services.
- To apply and validate the proposed service-oriented Semantic Data Mining approach in two case studies, one in breast cancer data analysis and another in the discovery of glioma patient subgroups to validate novel molecular markers.
In the glioma case study, JSI and NIB researchers will jointly seek new findings on glioblastoma (GBM), the most common and most aggressive form of glioma. Recently, several biomarkers have been proposed as prognostic and predictive factors with respect to the patient’s response to therapy, but so far none of them has been applied in therapeutics. There is a need to decipher the interactions among contributing genes in the clinical setting in order to diagnose tumor grade quickly and accurately and to predict the prognosis of a particular patient. We argue that this can be achieved by a systems biology approach based on discovering subgroups of GBM patients, most likely based on their cell of origin (stem cells) and infiltrating stromal (stem) cells, resulting in distinct patterns of tumor progression. The project aims to take advantage of studying GBM cancer stem cells and stromal supporting cells to identify genes (biomarkers) that are relevant for GBM prognosis and targeting. The project will contribute to the development of new Semantic Data Mining algorithms, to the improvement of their public accessibility through the web-based ClowdFlows platform, and to the generation of new knowledge in the medical and bioinformatics domains. The work on this project will be performed in close collaboration between data mining experts from the Jožef Stefan Institute (JSI) and domain experts from the National Institute of Biology (NIB).
Significance for science
The importance of the SemDM project is demonstrated by the development of a new knowledge discovery paradigm, which was initially implemented in the Orange4WS system and then transferred to the ClowdFlows web-based platform for data mining. Compared to current data mining technology, the paradigm shift was achieved by the development of the following approaches:
1. The semantic data mining systems g-SEGS and SDM-SEGS, which use ontologies as background knowledge in the learning process and are available in the Orange4WS data mining platform.
2. The Hedwig algorithm for semantic data mining, which offers an improved search for semantic rules and was used in a new domain for interpreting groups of cancer patients.

By using the new algorithms for semantic data mining and the ClowdFlows platform, which was developed at JSI, we have already improved results in several application domains, with a focus on medicine and bioinformatics. The results were evaluated by experts from the National Institute of Biology in Ljubljana. The development of the ClowdFlows platform has enabled further research in this area. The new ClowdFlows platform can be used for building and executing data mining workflows in all modern web browsers.
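The idea of using an ontology as background knowledge in rule learning can be illustrated with a minimal sketch. It is a toy example only and does not reproduce g-SEGS, SDM-SEGS or Hedwig. The ontology terms, gene identifiers, function names and the choice of weighted relative accuracy (WRAcc) as the rule quality measure are all assumptions made for illustration. Each example is annotated with ontology terms, the annotations are extended with all ancestor terms via the is-a hierarchy, and every term is then scored as a candidate rule "annotated with term => target class".

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy ontology (child term -> parent terms); not real Gene Ontology data.
IS_A: Dict[str, Set[str]] = {
    "dna_repair": {"dna_metabolism"},
    "dna_replication": {"dna_metabolism"},
    "dna_metabolism": {"metabolic_process"},
    "lipid_metabolism": {"metabolic_process"},
    "apoptosis": {"cell_death"},
    "metabolic_process": set(),
    "cell_death": set(),
}

# Hypothetical examples (e.g. genes) with their most specific annotations.
ANNOTATIONS: Dict[str, Set[str]] = {
    "g1": {"dna_repair"},
    "g2": {"dna_replication"},
    "g3": {"dna_repair"},
    "g4": {"apoptosis"},
    "g5": {"apoptosis"},
    "g6": {"lipid_metabolism"},
}
POSITIVE: Set[str] = {"g1", "g2", "g3"}  # assumed target class


def ancestors(term: str, is_a: Dict[str, Set[str]]) -> Set[str]:
    """Return the term together with all of its ancestors in the is-a hierarchy."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(is_a.get(current, ()))
    return seen


def generalize(annotations: Dict[str, Set[str]],
               is_a: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Extend each example's annotations with all ancestor terms."""
    return {
        example: set().union(*(ancestors(t, is_a) for t in terms))
        for example, terms in annotations.items()
    }


def rank_terms(annotations: Dict[str, Set[str]], positive: Set[str],
               is_a: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Score each term as a rule 'has term => positive' by WRAcc, best first."""
    general = generalize(annotations, is_a)
    n = len(general)
    base_rate = len(positive & set(general)) / n
    all_terms = set().union(*general.values())
    scored = []
    for term in all_terms:
        covered = {ex for ex, terms in general.items() if term in terms}
        precision = len(covered & positive) / len(covered)
        # WRAcc = coverage * (precision - base rate of the target class)
        scored.append((term, (len(covered) / n) * (precision - base_rate)))
    # Sort by descending score, then by name so the order is deterministic.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for term, score in rank_terms(ANNOTATIONS, POSITIVE, IS_A):
        print(f"{term}: {score:.3f}")
```

In this toy data the intermediate term `dna_metabolism` ranks highest: it covers all three positive examples, whereas the more specific terms each cover only some of them and the most general term also covers a negative example. This is the kind of generalization that a hierarchy of background knowledge makes available to the learner.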
Significance for the country
The project on semantic data mining in life sciences is multidisciplinary and has successfully integrated the work of two research groups (JSI and NIB) from two different scientific disciplines (computer science and biology). The Hedwig algorithm, various propositionalization methods, the approach to constructing biological models based on expert knowledge and scientific literature, and the new approach to incremental development of biological networks based on natural language processing enabled the discovery of new biological knowledge. The developed approaches are actively used at the National Institute of Biology in Ljubljana. The project has also enabled high-quality education of young researchers, their integration into current research activities, and their active international collaboration. In addition, the basic research contributes to advances in the field of information technologies, while the applications in interdisciplinary areas, such as bioinformatics, contribute to the creation of new ideas in applied fields of research and to raising the practical utility of advanced information technologies.
Most important scientific results Annual report 2013, 2014, 2015, final report
Most important socioeconomically and culturally relevant results Annual report 2013, 2014, 2015, final report