Projects / Programmes source: ARIS

Analysis of heterogeneous information networks for knowledge discovery in life-sciences

Research activity

Code Science Field Subfield
7.00.00  Interdisciplinary research     

Code Science Field
P176  Natural sciences and mathematics  Artificial intelligence 

Code Science Field
1.02  Natural Sciences  Computer and information sciences 
Keywords
Data mining, knowledge discovery, semantic data mining, workflows, heterogeneous networks, plant immune signalling
Evaluation (rules)
source: COBISS
Researchers (16)
no. Code Name and surname Research area Role Period No. of publications
1.  19116  PhD Špela Baebler  Biotechnology  Researcher  2016 - 2018  313 
2.  34130  PhD Anna Coll Rius  Biochemistry and molecular biology  Researcher  2016 - 2018  163 
3.  12688  PhD Kristina Gruden  Biotechnology  Researcher  2016 - 2018  985 
4.  36355  PhD Jan Kralj  Communications technology  Researcher  2017 - 2018  38 
5.  08949  PhD Nada Lavrač  Computer science and informatics  Head  2016 - 2018  867 
6.  50070  PhD Matej Martinc  Linguistics  Researcher  2017 - 2018  84 
7.  36836  PhD Biljana Mileva Boshkoska  Computer science and informatics  Researcher  2017 - 2018  156 
8.  36912  PhD Dragana Miljković  Computer science and informatics  Researcher  2016 - 2018  71 
9.  35475  PhD Matic Perovšek  Computer science and informatics  Researcher  2016  15 
10.  29539  PhD Vid Podpečan  Computer science and informatics  Researcher  2016 - 2018  103 
11.  31844  PhD Senja Pollak  Linguistics  Researcher  2016 - 2018  288 
12.  18467  PhD Maruša Pompe Novak  Biotechnology  Researcher  2016 - 2018  291 
13.  34502  PhD Živa Ramšak  Biology  Researcher  2016 - 2018  118 
14.  37679  Andraž Repar  Linguistics  Researcher  2018  33 
15.  04586  PhD Tanja Urbančič  Computer science and informatics  Researcher  2018  290 
16.  34262  PhD Anže Vavpetič  Computer science and informatics  Researcher  2016 - 2017  30 
Organisations (2)
no. Code Research organisation City Registration number No. of publications
1.  0105  National Institute of Biology  Ljubljana  5055784  13,256 
2.  0106  Jožef Stefan Institute  Ljubljana  5051606000  90,682 
Abstract
The proposal addresses knowledge discovery in complex data mining scenarios in life-sciences. With the development of high-throughput molecular biology techniques, the volume of generated data is reaching the range of so-called Big Data. Information relevant to a given biological question is scattered across different public resources, in heterogeneous formats and in a form inaccessible to typical biologists. To overcome this, the information needs to be fused into a single data source that can be mined.

The aim of the proposed project is to develop, implement, evaluate and apply a new methodology for analyzing large heterogeneous data in the area of life-sciences. The methodology is motivated by a tremendous increase in data generation within life-sciences research, while the means for explanatory knowledge discovery from these large heterogeneous data sources still lag behind. We aim to improve existing data analysis approaches by extending and combining text mining, relational data mining and information fusion methods. To evaluate the proposed methodology we will use several benchmark and real-world problems in the area of life-sciences, aiming to advance translational research in agriculture by extracting novel knowledge on plant immune signaling.

The project has the following objectives:
1. Development of a new methodology that will enable fusing texts and complex relational background knowledge into a large heterogeneous information network. This will be achieved by extending our own methodology for mining heterogeneous information networks through contextualizing the information on data instances in terms of available semantic background knowledge (domain taxonomies and ontologies), and by adapting the methodology to big data and complex life-science scenarios.
2. Implementation of the methodology in ClowdFlows or TextFlows and experimental evaluation of the proposed methodology on publicly available benchmark data sets, including selected medical problems for which large public heterogeneous data sets exist.
3. Application of the methodology to three life-science application scenarios: (i) cross-domain knowledge discovery from documents on two unrelated life-science problems, aiming to uncover as yet unknown relations between "redox status" and "plant immune signaling", (ii) mining a time-stamped stream of heterogeneous experimental data in the domain of plant immune signaling, and (iii) identification of key components in plant immune signaling that determine the outcome of a disease.

The project will contribute to the development of new algorithms for mining large heterogeneous data. Accessibility of the developed methodology will be ensured by implementing it in one of our web data mining platforms, ClowdFlows or TextFlows, which will make the developed technology available to a broader research audience and increase its relevance for life-science experts. The research will be performed in close collaboration between data mining experts from JSI and domain experts from NIB.
Significance for science
This project addresses the open problem of assisting scientists with the increasingly daunting task of heterogeneous and distributed information fusion and knowledge discovery. Solving this problem requires the development of a new computational paradigm that integrates ideas from different supporting domains. An adequate solution will result in new technologies relevant to a range of applications, some of which are also mentioned in the EU FP7 ICT work programme, such as Challenge 4 on Content and Challenge 5 on Healthcare. The project covers issues such as knowledge management and creation, but goes beyond them in assisting users (particularly scientists) in knowledge discovery across distributed information repositories. It will advance the state of the art by developing a framework for mining heterogeneous information networks, new data mining algorithms and a new approach to interactively formulating and refining powerful knowledge discovery workflows. The proposed project thus tackles an open problem and pursues a long-term objective with high technological potential. Successful results of the MinHIN project can contribute to Europe’s knowledge industry, enabling it to become more effective, efficient and competitive. The challenges addressed by the MinHIN project cannot be adequately met with existing ICT methodologies or their incremental improvements, since the methods developed within MinHIN will be substantially different from existing information fusion and knowledge discovery technologies. They will require the collaboration of scientists with diverse backgrounds to tackle challenges in innovative information fusion, data mining, distributed information retrieval, and sophisticated user interfaces.
A successful outcome of the project may have, firstly, a significant impact on data mining technology and on science and, in the longer term, when adapted to knowledge discovery, a considerable impact on the ability of Europe’s private and public sectors to analyse public data. The proposed MinHIN project has the potential to implement and demonstrate a paradigm shift in information and knowledge management, discovery, fusion and understanding. The MinHIN prototype will establish a strong scientific and technological basis for a broader, interdisciplinary research community, and will help mature the underlying methodologies to a level at which they can attract investment from industry, especially in the pharmaceutical and biotechnology sectors.
Significance for the country
Since the project analyses heterogeneous information networks for potato, its results will directly influence the food industry. Potato is currently the third most important food crop worldwide. It produces high amounts of non-allergenic vegetable proteins per hectare, contains many vitamins and health-promoting compounds, and is therefore of increasing significance as a food crop in the developing world. Yet its production is currently suboptimal, both because of the high input costs needed during cultivation to achieve adequate yield and because of its susceptibility to biotic and abiotic factors. The EU potato industry is very competitive and is continuously gaining market share worldwide. Hundreds of cultivars are used, many with close cultural and regional ties. In Slovenia in the 1980s, the PVY epidemic completely eliminated the sensitive, but at that time leading, potato cultivars, which virtually ended Slovenian seed potato production. Currently there are only a few completely resistant cultivars, and growing them is problematic both because of Slovenia's specific climate and from the perspective of genetic diversity. The research findings of the proposed project will provide a basis for precision breeding of environmentally resilient cultivars.
Most important scientific results Interim report, final report
Most important socioeconomically and culturally relevant results Interim report, final report