Projects / Programmes source: ARIS

BIOMEDICAL DATA FUSION BY NONNEGATIVE MATRIX TRI-FACTORIZATION

Research activity

Code Science Field Subfield
1.07.02  Natural sciences and mathematics  Computer intensive methods and applications  Optimisations 

Code Science Field
P160  Natural sciences and mathematics  Statistics, operations research, programming, actuarial mathematics 

Code Science Field
1.01  Natural Sciences  Mathematics 
Keywords
non-negative matrix factorization; p-matching problem; data co-clustering; patient subtyping; drug repurposing; high-performance computing
Evaluation (rules)
source: COBISS
Researchers (14)
no. Code Name and surname Research area Role Period No. of publications
1.  25993  PhD Sergio Cabello Justo  Mathematics  Researcher  2017 - 2020 
2.  35058  PhD Primož Drešar  Process engineering  Junior researcher  2017 
3.  02859  PhD Jože Duhovnik  Mechanical design  Retired researcher  2017 - 2020 
4.  24802  PhD Tomaž Finkšt  Mechanical design  Researcher  2019 - 2020 
5.  29631  PhD Boštjan Gabrovšek  Mathematics  Researcher  2019 
6.  50783  PhD Timotej Hrga  Computer intensive methods and applications  Junior researcher  2018 - 2020 
7.  22314  PhD Peter Korošec  Computer science and informatics  Researcher  2017 - 2020 
8.  18291  PhD Gregor Papa  Computer science and informatics  Researcher  2017 - 2020 
9.  24328  PhD Aljoša Peperko  Mathematics  Researcher  2018 - 2019 
10.  22649  PhD Janez Povh  Computer intensive methods and applications  Researcher  2017 - 2020 
11.  34728  PhD Nataša Pržulj  Computer science and informatics  Head  2017 - 2020 
12.  51223  Laurentino Quiroga Moreno    Technical associate  2018 - 2020 
13.  30891  PhD Vida Vukašinović  Computer science and informatics  Researcher  2017 - 2020 
14.  03430  PhD Janez Žerovnik  Mathematics  Researcher  2017 - 2020 
Organisations (3)
no. Code Research organisation City Registration number No. of publications
1.  0101  Institute of Mathematics, Physics and Mechanics  Ljubljana  5055598000 
2.  0106  Jožef Stefan Institute  Ljubljana  5051606000  18 
3.  0782  University of Ljubljana, Faculty of Mechanical Engineering  Ljubljana  1627031 
Abstract
The central research objective of the proposed project is to develop new, efficient and accurate methods for non-negative matrix factorization problems applied to real-world, complex biomedical data. The goal is to help answer the foremost biomedical questions of precision medicine: patient stratification, biomarker discovery and drug repurposing. The central mathematical optimization problem that we will study is penalized nonnegative matrix tri-factorization (PNMTF), a non-convex, high-dimensional optimization problem for which (unless P=NP) there is no efficient algorithm that solves it to optimality. We will therefore focus on developing state-of-the-art approximate algorithms using the Fixed Point Method and the Coordinate Descent Method, combined with variants of first- and second-order methods. An analysis of their theoretical and practical performance will be provided. The central data science problem that we will work on is how to use the near-optimal solutions of PNMTF to obtain good co-clustering and good multi-associations within the underlying heterogeneous networked data points. The bottleneck of this part is another NP-hard problem, the k-partite matching problem for k ≥ 3. We will design new heuristics for k-partite matching (k ≥ 3), carefully tuned to approximately solve this problem for our particular applications, and finally develop state-of-the-art algorithms for co-clustering and multi-association detection in biomedical networked data. The new methods will be applied to Rheumatoid Arthritis (RA) patients' data and to cancer patients' data to co-cluster them and infer new relations from the wealth of all data collectively.
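As a rough illustration of the kind of fixed-point scheme referred to above, the following is a minimal sketch of plain (unpenalized) nonnegative matrix tri-factorization, X ≈ F S Gᵀ, using standard multiplicative updates. It is not the project's PNMTF algorithm: the penalty terms are omitted, and the function name `nmtf`, its parameters and the random initialization are assumptions made only for this example.

```python
import numpy as np

def nmtf(X, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Approximate X (n x m, nonnegative) as F @ S @ G.T with F, S, G >= 0.

    Hypothetical helper for illustration: multiplicative fixed-point updates
    for the unpenalized objective ||X - F S G^T||_F^2.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Strictly positive random start so the multiplicative updates can move.
    F = rng.random((n, k1)) + eps
    S = rng.random((k1, k2)) + eps
    G = rng.random((m, k2)) + eps
    errors = []
    for _ in range(n_iter):
        # Each update rescales one factor by the ratio of the negative and
        # positive parts of the gradient; eps guards against division by zero.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        errors.append(float(np.linalg.norm(X - F @ S @ G.T) ** 2))
    return F, S, G, errors

if __name__ == "__main__":
    X = np.random.default_rng(1).random((30, 20))
    F, S, G, errors = nmtf(X, k1=3, k2=4, n_iter=100)
    # A simple co-clustering readout: assign each row and each column
    # to the factor in which it has the largest weight.
    row_clusters = F.argmax(axis=1)
    col_clusters = G.argmax(axis=1)
    print(errors[0], errors[-1], row_clusters[:5], col_clusters[:5])
```

In this sketch the rows of F and G act as soft cluster indicators for the rows and columns of X, which is one common way a tri-factorization is read as a co-clustering.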
I expect to: identify novel genes and single-nucleotide polymorphisms (SNPs) important for the Methotrexate treatment of RA (possibly new biomarkers) and characterize the patients at higher risk of discontinuing this treatment due to adverse effects (patient stratification); provide the best patient stratification thus far by identifying clusters of patients with significantly different clinical outcomes, based on simultaneous co-clustering of all heterogeneous data; better identify the clusters of genes that are enriched in known driver mutations for particular cancers, hence identifying potential new biomarkers for cancer patients that would be taken back to the clinic by my medical collaborators; predict new drug-target relations and thereby identify new drug candidates that could be repurposed for the treatment of specific cancer patients or patient groups, hence aiding personalized treatment. All new medical predictions revealed within this project will be medically validated and taken back to the clinic by my medical collaborators from the University of Ljubljana and University College London. The participating institutions already possess state-of-the-art knowledge in mathematical optimization, computer science and data science, and also have state-of-the-art mid-size high-performance computing infrastructure available for the purposes of this project. I will therefore merge these competencies and enhance them with my experience and expertise in building and applying new data mining algorithms, carefully tuned for the extraction of biomedical knowledge from large and complex networked heterogeneous biomedical data, to achieve the goals of the project.
A special advantage of this project is that the new methods for solving PNMTF and for performing data co-clustering and multi-association detection will be coded to run efficiently on a high-performance computing facility and will be made available to the scientific community as a free, open-source, user-friendly software package.
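To make the k-partite matching bottleneck mentioned in the abstract concrete, here is a minimal sketch of a generic greedy baseline for k = 3. It is not one of the project's heuristics: the function name `greedy_tripartite_matching` and the dense weight tensor `W`, where `W[i, j, k]` scores the triple (i, j, k), are assumptions made only for this example, and greedy selection gives no optimality guarantee.

```python
import numpy as np

def greedy_tripartite_matching(W):
    """Greedy heuristic for maximum-weight 3-partite matching.

    W[i, j, k] is the (hypothetical) weight of matching item i of the first
    part with item j of the second and item k of the third. Triples are taken
    in order of decreasing weight, skipping any triple that reuses an item.
    """
    W = np.asarray(W, dtype=float)
    order = np.argsort(W, axis=None)[::-1]  # flat indices, heaviest first
    used = (set(), set(), set())
    limit = min(W.shape)  # a matching cannot contain more triples than this
    matching = []
    for flat in order:
        i, j, k = (int(v) for v in np.unravel_index(flat, W.shape))
        if i in used[0] or j in used[1] or k in used[2]:
            continue
        matching.append((i, j, k))
        used[0].add(i)
        used[1].add(j)
        used[2].add(k)
        if len(matching) == limit:
            break
    return matching

if __name__ == "__main__":
    W = np.zeros((2, 2, 2))
    W[0, 0, 0] = 5.0
    W[0, 1, 1] = 4.5
    W[1, 1, 1] = 4.0
    print(greedy_tripartite_matching(W))
```

The sort over all entries makes this sketch practical only for small dense tensors; it is meant as a reference point against which tuned heuristics could be compared.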
Significance for science
I believe that we are currently at a unique time in the history of science, having accumulated large amounts of diverse, systems-level, complex and interconnected molecular and clinical data, so that the advances in data integration methods proposed within this project will contribute to biomedical understanding and therapeutics, potentially having ground-breaking impacts on public health. Besides this, the proposed project will also contribute to mathematics, computer science and data science by: providing world-class evidence that these fields can benefit substantially from using high-performance computers, which is aligned with the priorities of Mathematics for Digital Science (http://www.euro-math-soc.eu/system/files/news/Mathematics for Digital Science.pdf), published by the EC; providing new algorithms and efficient (parallelised) code to solve special variants of the non-negative matrix tri-factorization and the related p-matching problem; providing new methods for data fusion based on the solutions of (penalised) non-negative matrix tri-factorizations; demonstrating a pathway by which mathematics, computer science and data science can jointly contribute to solving the foremost open problems in medicine, such as precision medicine and drug repurposing.
Significance for the country
A particular strength of this proposal is that it will be guided by the needs of my medical and pharmaceutical collaborators, who will biologically validate our in-silico findings related to precision medicine and drug repurposing. I already have a working collaboration with Prof. Vita Dolžan from the University of Ljubljana, who has legally provided us with her Rheumatoid Arthritis patient and molecular data. My University College London (UCL)-based medical collaborators, including Prof. Mark Emberton, the dean of the UCL Faculty of Medical Sciences, will medically test my predictions and bring them back to the clinic. My industrial partners, including Diagenomi d.o.o. from Ljubljana, GSK and J&J, will also be involved in testing the outcomes of the project. The project will make broader impacts in several areas. It will increase Slovenian and EU competence in Big Data Analytics and underpin research in mathematical optimization, computer science and data science, allowing these fields to stake a claim and make a larger impact at the heart of the emerging interdisciplinary field of precision medicine and drug repurposing. The project will also demonstrate that medicine is an area where researchers from mathematics, computer science, data science and medicine should cooperate and use state-of-the-art computer infrastructure to achieve real impact. This is a direct response to the challenges posed by the EC in the document Mathematics for Digital Science (http://www.euro-math-soc.eu/system/files/news/Mathematics for Digital Science.pdf).
Most important scientific results Interim report, final report
Most important socioeconomically and culturally relevant results Interim report, final report