Development and applications of new semantic data mining methods in life sciences
Code | Science | Field | Subfield
2.07.07 | Engineering sciences and technologies | Computer science and informatics | Intelligent systems - software
Code | Science | Field
P176 | Natural sciences and mathematics | Artificial intelligence
Code | Science | Field
1.02 | Natural Sciences | Computer and information sciences
Data mining, knowledge discovery, semantic data mining, semantic web services, workflows
Researchers (18)
no. | Code | Name and surname | Research area | Role | Period | No. of publications
1. | 19116 | PhD Špela Baebler | Biotechnology | Researcher | 2013 - 2016 | 320
2. | 06989 | PhD Andrej Blejec | Mathematics | Researcher | 2013 - 2016 | 292
3. | 28806 | PhD Miha Grčar | Computer intensive methods and applications | Researcher | 2013 - 2016 | 85
4. | 12688 | PhD Kristina Gruden | Biotechnology | Researcher | 2013 - 2016 | 999
5. | 28291 | PhD Petra Kralj Novak | Computer science and informatics | Researcher | 2013 - 2016 | 130
6. | 34098 | PhD Janez Kranjc | Computer science and informatics | Researcher | 2013 - 2016 | 25
7. | 08949 | PhD Nada Lavrač | Computer science and informatics | Head | 2013 - 2016 | 873
8. | 36912 | PhD Dragana Miljković | Computer science and informatics | Researcher | 2014 - 2016 | 71
9. | 21397 | PhD Helena Motaln | Biochemistry and molecular biology | Researcher | 2013 - 2016 | 213
10. | 03323 | PhD Igor Mozetič | Computer science and informatics | Researcher | 2013 - 2016 | 184
11. | 29617 | PhD Marko Petek | Biotechnology | Researcher | 2013 - 2014 | 179
12. | 29539 | PhD Vid Podpečan | Computer science and informatics | Researcher | 2013 - 2016 | 106
13. | 34502 | PhD Živa Ramšak | Biology | Researcher | 2013 - 2016 | 123
14. | 27503 | PhD Ana Rotter | Biotechnology | Researcher | 2013 - 2014 | 335
15. | 07736 | PhD Bojan Sedmak | Biochemistry and molecular biology | Researcher | 2013 - 2016 | 237
16. | 34262 | PhD Anže Vavpetič | Computer science and informatics | Junior researcher | 2013 - 2016 | 30
17. | 32811 | PhD Urška Verbovšek | Biotechnology | Junior researcher | 2013 - 2015 | 30
18. | 23582 | PhD Martin Žnidaršič | Computer science and informatics | Researcher | 2013 - 2016 | 168
Organisations (2)
Abstract
Knowledge discovery in databases is the area of computer science concerned with the automatic search and exploration of large volumes of data, with the goal of finding new hypotheses in the form of models and patterns induced automatically from the data. The discovered models and patterns are especially interesting if they are unexpected or if they help confirm as yet unproven hypotheses. A limitation of current publicly available data mining and knowledge discovery platforms is that they can handle only simple tabular data. Motivated by the increasing volume of semi-structured, heterogeneous and distributed data, the proposed SemDM project addresses this limitation by enhancing currently available data mining platforms with the ability to use distributed, heterogeneous information and knowledge sources, as required for data analysis in knowledge-intensive domains.
The project has the following objectives:
- To develop new algorithms for Semantic Data Mining (SemDM) which will enable knowledge discovery from data stored in heterogeneous (structured, semi-structured and unstructured) and distributed data and knowledge sources, including semantically annotated data stored in publicly available ontologies (Gene Ontology and other knowledge sources available in the Linked Open Data cloud).
- To develop a novel, science-oriented data mining platform, ClowdFlows, which will upgrade our recently developed Orange4WS platform to enable browser-based construction of innovative data mining workflows from local and distributed data processing and mining services.
- To apply and validate the proposed service-oriented Semantic Data Mining approach in two case studies, one in breast cancer data analysis and the other in the discovery of glioma patient subgroups to validate novel molecular markers.
In the glioma case study, JSI and NIB researchers will jointly seek new discoveries concerning glioblastoma (GBM), the most common and most aggressive form of glioma. Recently, several biomarkers have been proposed as prognostic and predictive factors of a patient's response to therapy, but so far none has been applied in therapeutics. There is a need to decipher the interactions among contributing genes in the clinical setting in order to diagnose tumor grade quickly and accurately and to predict the prognosis of a particular patient. We argue that this can be achieved by a systems biology approach based on discovering subgroups of GBM patients, most likely based on their cell of origin (stem cells) and infiltrating stromal (stem) cells, resulting in distinct patterns of tumor progression. The project aims to take advantage of studying GBM cancer stem cells and stromal supporting cells to identify gene biomarkers that are relevant for GBM prognosis and targeting.
The project will contribute to the development of new Semantic Data Mining algorithms, to the improvement of their public accessibility through the web-based ClowdFlows platform, and to the generation of new knowledge in the medical and bioinformatics domains. The work on this project will be performed in close collaboration between data mining experts from the Jožef Stefan Institute (JSI) and domain experts from the National Institute of Biology (NIB).
Significance for science
The importance of the SemDM project is demonstrated by the development of a new knowledge discovery paradigm, which was initially implemented in the Orange4WS system and then transferred to the ClowdFlows web-based platform for data mining. Relative to current data mining technology, the paradigm shift was achieved through the following approaches: 1. The semantic data mining systems g-SEGS and SDM-SEGS, which use ontologies as background knowledge in the learning process and are available in the Orange4WS data mining platform. 2. The Hedwig algorithm for semantic data mining, which offers improved search for semantic rules and was applied in a new domain to interpret groups of cancer patients. By using the new algorithms for semantic data mining together with the ClowdFlows platform, which was developed at JSI, we have already improved results in several application domains, with a focus on medicine and bioinformatics. The results were evaluated by experts from the National Institute of Biology in Ljubljana. The development of the ClowdFlows platform has enabled further research in this area. The platform can be used for building and executing data mining workflows in all modern web browsers.
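As a rough illustration of how an ontology can act as background knowledge when searching for rules, the minimal Python sketch below generalizes each example's annotations to their ontology ancestors and ranks candidate terms as descriptions of a target group. It is not the g-SEGS, SDM-SEGS or Hedwig implementation: the toy ontology, the gene identifiers and all function names are hypothetical, and weighted relative accuracy (WRAcc) is assumed here only as a commonly used subgroup quality measure.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy ontology: each term maps to its direct parents (is-a links).
ONTOLOGY: Dict[str, List[str]] = {
    "biological_process": [],
    "cell_cycle": ["biological_process"],
    "apoptosis": ["biological_process"],
    "mitosis": ["cell_cycle"],
    "dna_replication": ["cell_cycle"],
}

# Hypothetical examples: (annotations of one gene, belongs to the target group?).
EXAMPLES: List[Tuple[Set[str], bool]] = [
    ({"mitosis"}, True),          # g1
    ({"dna_replication"}, True),  # g2
    ({"mitosis"}, True),          # g3
    ({"apoptosis"}, False),       # g4
    ({"apoptosis"}, False),       # g5
    ({"apoptosis"}, False),       # g6
]


def ancestors(term: str) -> Set[str]:
    """Return the term itself plus every ancestor reachable via is-a links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(ONTOLOGY.get(current, []))
    return seen


def generalize(annotations: Set[str]) -> Set[str]:
    """Expand a set of annotations with all of their ontology ancestors."""
    expanded: Set[str] = set()
    for term in annotations:
        expanded |= ancestors(term)
    return expanded


def wracc(covered: List[bool], labels: List[bool]) -> float:
    """Weighted relative accuracy: coverage * (precision - prior)."""
    n = len(labels)
    n_covered = sum(covered)
    if n == 0 or n_covered == 0:
        return 0.0
    prior = sum(labels) / n
    hits = sum(1 for c, l in zip(covered, labels) if c and l)
    return (n_covered / n) * (hits / n_covered - prior)


def rank_terms(examples: List[Tuple[Set[str], bool]]) -> List[Tuple[str, float]]:
    """Score each rule 'annotated_with(term) -> target' and sort best first."""
    labels = [label for _, label in examples]
    expanded = [generalize(terms) for terms, _ in examples]
    candidates = sorted(set().union(*expanded))
    scored = [(t, wracc([t in e for e in expanded], labels)) for t in candidates]
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for term, score in rank_terms(EXAMPLES):
        print(f"{term}: {score:.3f}")
```

In this toy data the general term `cell_cycle` scores higher than the more specific `mitosis` or `dna_replication`, because it covers all target examples at once; this is the kind of generalization that ontology-based background knowledge makes possible.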
Significance for the country
The project on semantic data mining in life sciences is multidisciplinary and has successfully integrated the work of two research groups (JSI and NIB) from two different scientific disciplines (computer science and biology). The Hedwig algorithm, various propositionalization methods, the approach to constructing biological models based on expert knowledge and scientific literature, and the new approach to incremental development of biological networks based on natural language processing enabled the discovery of new biological knowledge. The developed approaches are actively used at the National Institute of Biology in Ljubljana. The project has also enabled high-quality education of young researchers, their integration into current research activities, and their active international collaboration. In addition, the basic research contributes to advances in the field of information technologies, while the applications in interdisciplinary areas, such as bioinformatics, contribute to the creation of new ideas in applied fields of research and to raising the practical utility of advanced information technologies.
Most important scientific results
Annual report 2013, 2014, 2015, final report
Most important socioeconomically and culturally relevant results
Annual report 2013, 2014, 2015, final report