BIOMEDICAL DATA FUSION BY NONNEGATIVE MATRIX TRI-FACTORIZATION
Code | Science | Field | Subfield
1.07.02 | Natural sciences and mathematics | Computer intensive methods and applications | Optimisations
Code | Science | Field
P160 | Natural sciences and mathematics | Statistics, operations research, programming, actuarial mathematics
Code | Science | Field
1.01 | Natural Sciences | Mathematics
Keywords: non-negative matrix factorization; p-matching problem; data co-clustering; patient subtyping; drug repurposing; high-performance computing
Researchers (14)
no. | Code | Name and surname | Research area | Role | Period | No. of publications
1. | 25993 | PhD Sergio Cabello Justo | Natural sciences and mathematics | Researcher | 2017 - 2020 | 214
2. | 35058 | PhD Primož Drešar | Engineering sciences and technologies | Junior researcher | 2017 | 24
3. | 02859 | PhD Jože Duhovnik | Engineering sciences and technologies |  | 2017 - 2020 | 1,027
4. | 24802 | PhD Tomaž Finkšt | Engineering sciences and technologies | Researcher | 2019 - 2020 | 30
5. | 29631 | PhD Boštjan Gabrovšek | Natural sciences and mathematics | Researcher | 2019 | 68
6. | 50783 | PhD Timotej Hrga | Natural sciences and mathematics | Junior researcher | 2018 - 2020 | 23
7. | 22314 | PhD Peter Korošec | Engineering sciences and technologies | Researcher | 2017 - 2020 | 229
8. | 18291 | PhD Gregor Papa | Engineering sciences and technologies | Researcher | 2017 - 2020 | 349
9. | 24328 | PhD Aljoša Peperko | Natural sciences and mathematics | Researcher | 2018 - 2019 | 186
10. | 22649 | PhD Janez Povh | Natural sciences and mathematics | Researcher | 2017 - 2020 | 324
11. | 34728 | PhD Nataša Pržulj | Engineering sciences and technologies | Principal Researcher | 2017 - 2020 | 95
12. | 51223 | Laurentino Quiroga Moreno |  | Technician | 2018 - 2020 | 0
13. | 30891 | PhD Vida Vukašinović | Engineering sciences and technologies | Researcher | 2017 - 2020 | 58
14. | 03430 | PhD Janez Žerovnik | Natural sciences and mathematics | Researcher | 2017 - 2020 | 794
Organisations (3)
Abstract
The central research objective of the proposed project is the development of new, efficient and accurate methods for non-negative matrix factorization problems applied to real-world, complex biomedical data. The goal is to help answer the foremost biomedical questions of precision medicine: patient stratification, biomarker discovery and drug repurposing.
The central mathematical optimization problem that we will study is penalized nonnegative matrix tri-factorization (PNMTF), a non-convex, high-dimensional and NP-hard optimization problem; hence, unless P=NP, there is no efficient algorithm that solves it to optimality. We will therefore focus on developing state-of-the-art approximate algorithms based on the Fixed Point Method and the Coordinate Descent Method, combined with variants of first- and second-order methods. We will analyze both the theoretical and the practical performance of these algorithms.
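The multiplicative-update scheme below illustrates one common fixed-point iteration for plain (unpenalized) non-negative matrix tri-factorization, X ≈ F S G^T with F, S, G ≥ 0. It is a minimal sketch, not the project's PNMTF solver: the penalty term is omitted because the text does not specify it, and the function names nmtf and relative_error, the rank parameters k1 and k2, and the use of NumPy are assumptions made for illustration. Each update multiplies a factor by the ratio of the negative and positive parts of the gradient of ||X - F S G^T||_F^2, which keeps every entry non-negative.

```python
import numpy as np


def nmtf(X, k1, k2, n_iter=500, eps=1e-10, seed=0):
    """Approximate X ~ F @ S @ G.T with F, S, G >= 0 (no penalty term).

    Minimal sketch: multiplicative (fixed-point) updates. Each factor is
    multiplied by the ratio of the negative and positive parts of the
    gradient of ||X - F S G^T||_F^2, so all entries stay non-negative.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k1))   # row-cluster factor
    S = rng.random((k1, k2))  # association matrix between row and column clusters
    G = rng.random((m, k2))   # column-cluster factor
    for _ in range(n_iter):
        # eps guards against division by zero in the update ratios
        F *= (X @ G @ S.T) / (F @ S @ (G.T @ G) @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ (F.T @ F) @ S + eps)
        S *= (F.T @ X @ G) / ((F.T @ F) @ S @ (G.T @ G) + eps)
    return F, S, G


def relative_error(X, F, S, G):
    """Relative Frobenius-norm reconstruction error of X ~ F S G^T."""
    X = np.asarray(X, dtype=float)
    return np.linalg.norm(X - F @ S @ G.T) / np.linalg.norm(X)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic non-negative data that admits an exact tri-factorization
    # with k1 = 3 and k2 = 2
    X = rng.random((20, 3)) @ rng.random((3, 2)) @ rng.random((2, 15))
    F, S, G = nmtf(X, k1=3, k2=2)
    print("relative error:", relative_error(X, F, S, G))
    print("row clusters:", F.argmax(axis=1))
```

Row and column co-clusters are often read off the factors, for example as F.argmax(axis=1) for rows and G.argmax(axis=1) for columns; whether the project uses this assignment rule is not stated.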
The central data science problem that we will work on is how to use near-optimal solutions of PNMTF to obtain good co-clusterings and good multi-associations within the underlying heterogeneous networked data. The bottleneck of this part is another NP-hard problem, the k-partite matching problem for k ≥ 3. We will design new heuristics for k-partite matching (k ≥ 3), carefully tuned to solve this problem approximately for our particular applications, and, finally, develop state-of-the-art algorithms for co-clustering and multi-association detection in biomedical networked data.
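To make the k-partite matching step concrete for k = 3, the sketch below treats it as an axial three-dimensional assignment problem: given a score W[i][j][l] for associating cluster i of the first data type with cluster j of the second and cluster l of the third (each type having n clusters), choose n disjoint triples with maximum total score. The project's own heuristics are not described in the text, so this is a generic baseline under stated assumptions: a greedy pass followed by a pairwise-exchange local search. The score tensor W and the functions score, greedy_3d_matching and improve_by_swaps are hypothetical, and how such scores would be derived from PNMTF factors is not specified.

```python
from itertools import combinations, product


def score(W, matching):
    """Total score of a list of (i, j, l) triples under tensor W."""
    return sum(W[i][j][l] for i, j, l in matching)


def greedy_3d_matching(W):
    """Greedy pass: take triples in decreasing score order, skipping any
    triple that reuses an index already taken in its dimension.

    W[i][j][l] is a hypothetical association score; all three data types
    are assumed to have the same number n of clusters.
    """
    n = len(W)
    triples = sorted(product(range(n), repeat=3),
                     key=lambda t: W[t[0]][t[1]][t[2]], reverse=True)
    used = (set(), set(), set())
    matching = []
    for i, j, l in triples:
        if i not in used[0] and j not in used[1] and l not in used[2]:
            matching.append((i, j, l))
            used[0].add(i)
            used[1].add(j)
            used[2].add(l)
    return matching  # n disjoint triples covering every index exactly once


def improve_by_swaps(W, matching, tol=1e-12):
    """Pairwise-exchange local search: for two triples, swap their
    second-type or third-type indices if that raises the total score."""
    matching = list(matching)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(range(len(matching)), 2):
            (i1, j1, l1), (i2, j2, l2) = matching[a], matching[b]
            old = W[i1][j1][l1] + W[i2][j2][l2]
            for t1, t2 in (((i1, j2, l1), (i2, j1, l2)),   # swap j indices
                           ((i1, j1, l2), (i2, j2, l1))):  # swap l indices
                if W[t1[0]][t1[1]][t1[2]] + W[t2[0]][t2[1]][t2[2]] > old + tol:
                    matching[a], matching[b] = t1, t2
                    improved = True
                    break
    return matching


if __name__ == "__main__":
    # Toy instance where the greedy pass alone is not optimal
    W = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    W[0][0][0], W[0][1][0], W[1][0][1] = 10.0, 8.0, 8.0
    g = greedy_3d_matching(W)
    best = improve_by_swaps(W, g)
    print(g, score(W, g))        # [(0, 0, 0), (1, 1, 1)] 10.0
    print(best, score(W, best))  # [(0, 1, 0), (1, 0, 1)] 16.0
```

For non-negative scores the greedy pass alone already reaches at least one third of the optimal total, because each chosen triple can block at most three triples of an optimal solution, none heavier than itself; the swap phase can only raise the score, but neither phase guarantees optimality.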
The new methods described above will be applied to data from Rheumatoid Arthritis (RA) patients and from cancer patients, in order to co-cluster these data and to infer new relations from all of the data collectively. I expect to:
identify novel genes and single-nucleotide polymorphisms (SNPs) important for Methotrexate treatment of RA (possibly new biomarkers), and characterize the patients at higher risk of discontinuing this treatment due to adverse effects (patient stratification);
provide the best patient stratification thus far by identifying the clusters of patients with significantly different clinical outcomes based on simultaneous co-clustering of all heterogeneous data;
better identify the clusters of genes that are enriched in known driver mutations for particular cancers, thereby identifying potential new biomarkers for cancer patients that my medical collaborators would take back to the clinic;
predict new drug-target relations and thereby identify new drug candidates that could be repurposed for the treatment of specific cancer patients or patient groups, hence aiding personalized treatment.
All new medical predictions made within this project will be medically validated and taken back to the clinic by my medical collaborators from the University of Ljubljana and University College London.
The participating institutions already possess state-of-the-art knowledge in mathematical optimization, computer science and data science, and also have state-of-the-art, medium-sized high-performance computing infrastructure available for the purposes of this project. To achieve the goals of the project, I will merge these competencies and enhance them with my experience and expertise in building and applying new data-mining algorithms carefully tuned to extract biomedical knowledge from large, complex, heterogeneous networked biomedical data.
A special advantage of this project is that the new methods for solving PNMTF and for performing data co-clustering and multi-association detection will be implemented to run efficiently on a high-performance computing facility and will be made available to the scientific community as a free, open-source, user-friendly software package.
Significance for science
I believe that we are at a unique time in the history of science: we have accumulated large amounts of diverse, interconnected, systems-level molecular and clinical data, so the advances in data integration methods proposed within this project will contribute to biomedical understanding and therapeutics and could have a ground-breaking impact on public health.
In addition, the proposed project will contribute to mathematics, computer science and data science by:
providing world-class evidence that these fields can benefit substantially from high-performance computing, in line with the priorities set out in Mathematics for Digital Science (http://www.euro-math-soc.eu/system/files/news/Mathematics for Digital Science.pdf), published by the EC;
providing new algorithms and efficient (parallelised) code to solve special variants of non-negative matrix tri-factorization and the related p-matching problem;
providing new methods for data fusion based on the solutions of (penalised) non-negative matrix tri-factorizations;
demonstrating how mathematics, computer science and data science can jointly contribute to solving the foremost open problems in medicine, such as precision medicine and drug repurposing.
Significance for the country
A particular strength of this proposal is that it will be guided by the needs of my medical and pharmaceutical collaborators, who will biologically validate our in-silico findings related to precision medicine and drug repurposing. I already have a working collaboration with Prof. Vita Dolžan from the University of Ljubljana, who has provided us, through legal channels, with her Rheumatoid Arthritis patient and molecular data. My medical collaborators based at University College London (UCL), including Prof. Mark Emberton, the dean of the UCL Faculty of Medical Sciences, will medically test my predictions and bring them back to the clinic. My industrial partners, including Diagenomi d.o.o. from Ljubljana, GSK and J&J, will also be involved in testing the outcomes of the project.
The project will have broader impact in several areas. It will increase Slovenian and EU competence in Big Data Analytics and enable research in mathematical optimization, computer science and data science to stake a claim and make a larger impact at the heart of the emerging interdisciplinary field of precision medicine and drug repurposing.
The project will also demonstrate that medicine is an area where researchers from mathematics, computer science, data science and medicine should cooperate and use state-of-the-art computing infrastructure to achieve real impact. This is a direct response to the challenges posed by the EC in the document Mathematics for Digital Science (http://www.euro-math-soc.eu/system/files/news/Mathematics for Digital Science.pdf).
Most important scientific results
Interim report, final report
Most important socioeconomically and culturally relevant results
Interim report, final report