Source: ARIS

Using Literature-based Discovery for Interpretation of Next Generation Sequencing Results

Research activity

Code Science Field Subfield
5.13.00  Social sciences  Information science and librarianship   

Code Science Field
H100  Humanities  Documentation, information, library science, archivistics 

Code Science Field
5.08  Social Sciences  Media and communications 
Keywords
information science; text mining; literature-based discovery; clinical diagnostic support system
source: COBISS
Researchers (13)
no. Code Name and surname Research area Role Period No. of publications
1.  50804  Gaber Bergant  Human reproduction  Researcher  2019 - 2022  20 
2.  54468  Tomaž Bratanič    Technical associate  2020 - 2022 
3.  22621  PhD Polonca Ferk  Metabolic and hormonal disorders  Researcher  2019 - 2020  141 
4.  11373  PhD Dimitar Hristovski  Computer science and informatics  Head  2019 - 2022  145 
5.  26484  PhD Andrej Kastrin  Medical sciences  Researcher  2020 - 2022  145 
6.  10467  PhD Matevž Kovačič  Medical sciences  Researcher  2021 - 2022  14 
7.  30697  PhD Anja Kovanda  Neurobiology  Researcher  2019 - 2022  74 
8.  15355  PhD Branimir Leskošek  Public health (occupational safety)  Researcher  2019 - 2020  181 
9.  33230  PhD Nina Ružić Gorenjec  Mathematics  Researcher  2019 - 2020  51 
10.  08992  PhD Janez Stare  Public health (occupational safety)  Researcher  2019 - 2020  277 
11.  54467  Petar Statevski    Technical associate  2020 - 2022 
12.  36368  PhD Marko Vidak  Medical sciences  Researcher  2019 - 2020  22 
13.  56196  PhD Klemen Žiberna  Medical sciences  Researcher  2022  27 
Organisations (2)
no. Code Research organisation City Registration number No. of publications
1.  0312  University Medical Centre Ljubljana  Ljubljana  5057272000  76,282 
2.  0381  University of Ljubljana, Faculty of Medicine  Ljubljana  1627066  46,263 
Abstract
Literature-based discovery (LBD) is a text mining technology for automatically generating research hypotheses. The main aim of LBD is to uncover hidden, previously unknown relationships in the scientific literature (existing domain knowledge). The LBD approach rests on the assumption that there are two nonintersecting scientific domains: knowledge in one domain may be related to knowledge in the other without the relationship being known. Next Generation Sequencing (NGS) is a collective term for several different technologies that use a massively parallel sequencing approach, making it possible to sequence whole human genomes within reasonable timescales. The development of NGS technologies has greatly broadened the use of (clinical) DNA sequencing by reaching unprecedented speed at reduced cost, enabling widespread clinical and research use and fueling the rapid growth of the genomic sciences. This presents a challenge: keeping up with newly discovered data and incorporating it into analytical pipelines becomes increasingly difficult. To address this issue, we propose to use the LBD paradigm to improve the interpretation of NGS results.
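The two-domain assumption is classically illustrated by Swanson's ABC model: if concept A co-occurs with B in one body of literature and B co-occurs with C in another, while A and C never co-occur directly, then the A-C link is a candidate hypothesis (Swanson's fish oil and Raynaud's disease example is the usual textbook case). The sketch below is a minimal, hypothetical illustration of this "open discovery" setting over a plain co-occurrence dictionary; the function name `abc_open_discovery`, the scoring by number of supporting B concepts, and the toy data are all assumptions for illustration, not the project's method.

```python
from __future__ import annotations

from collections import Counter


def abc_open_discovery(cooccurrence: dict[str, set[str]], a: str) -> list[tuple[str, int]]:
    """Rank candidate C concepts that are linked to A only through intermediate B concepts."""
    b_terms = cooccurrence.get(a, set())  # B: concepts already co-occurring with A
    scores: Counter[str] = Counter()
    for b in b_terms:
        for c in cooccurrence.get(b, set()):
            # A candidate C must differ from A and must not co-occur with A directly,
            # i.e. the A-C link is "hidden" between the two literatures.
            if c != a and c not in b_terms:
                scores[c] += 1  # number of B concepts supporting the A-B-C chain
    # Highest support first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy co-occurrence data, invented for illustration only (not real literature counts).
toy = {
    "Raynaud disease": {"blood viscosity", "platelet aggregation", "vasoconstriction"},
    "blood viscosity": {"Raynaud disease", "fish oil"},
    "platelet aggregation": {"Raynaud disease", "fish oil", "aspirin"},
    "vasoconstriction": {"Raynaud disease"},
}
print(abc_open_discovery(toy, "Raynaud disease"))
# [('fish oil', 2), ('aspirin', 1)]
```

Counting supporting B concepts is only one simple ranking heuristic; real LBD systems typically use weighted association measures and semantic-type filters instead.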
The major research problem addressed in this project proposal has the following components: (i) theoretical analysis of the MEDLINE bibliographic database and of SemMedDB, a database of semantic relations extracted from MEDLINE, as large-scale networks of biomedical concepts that can reveal novel characteristics important for interpreting NGS data with LBD methodology; (ii) development of a theoretical model and a data model for interpreting NGS data with LBD methodology; (iii) development of an open-source Web application for interactive interpretation of NGS data with LBD methodology, which will serve as a clinical genetics diagnostic support tool; (iv) development of a methodology for filtering false-positive relations during LBD processing using machine learning methods; (v) modelling the LBD process as a link prediction problem on heterogeneous networks, extending and validating the proposed LBD approach on a derived knowledge network using meta-paths and network embedding to further boost the predictive performance of LBD; and (vi) validation of the developed methodology by domain experts. We deal with the data of one patient at a time. The input for each patient consists of two data sets: the genotype, i.e. the discovered genomic variants, and the phenotype as observed by the clinical geneticist. The genotype set X contains the genes with mutations found by diagnostic NGS. The phenotype set Z contains the clinical observations provided by the clinical geneticist, described using Human Phenotype Ontology (HPO) terms. After gathering the relevant datasets, we constructed a graph database in Neo4j. The graph database contains two major types of nodes: patients, and concepts of several types, including phenotypes, genes, proteins, cell functions, genetic disorders, and many other biomedical types. These nodes are connected by several different relationship types.
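Component (v) treats LBD as link prediction on a heterogeneous network built from meta-paths. One simple, commonly used baseline for meta-path features is to count, for a candidate (gene, phenotype) pair, how many walks follow each typed relation sequence. The sketch below shows that counting step on a tiny in-memory graph; the class `TypedGraph`, the functions `meta_path_count` and `meta_path_features`, and the toy nodes are hypothetical, the relation labels merely reuse SemRep-style predicate names, and the network-embedding part of (v) is not shown.

```python
from __future__ import annotations

from collections import Counter, defaultdict


class TypedGraph:
    """Minimal heterogeneous graph: directed edges are stored per relation type."""

    def __init__(self) -> None:
        # adjacency[relation][head] -> set of tail nodes
        self.adjacency: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))

    def add_edge(self, head: str, relation: str, tail: str) -> None:
        self.adjacency[relation][head].add(tail)


def meta_path_count(graph: TypedGraph, source: str, target: str, meta_path: list[str]) -> int:
    """Count walks from source to target that follow the relation sequence in meta_path."""
    frontier: Counter[str] = Counter({source: 1})
    for relation in meta_path:
        step: Counter[str] = Counter()
        edges = graph.adjacency.get(relation, {})
        for node, n_walks in frontier.items():
            for neighbour in edges.get(node, ()):
                step[neighbour] += n_walks  # propagate walk counts one hop further
        frontier = step
    return frontier[target]


def meta_path_features(graph: TypedGraph, source: str, target: str,
                       meta_paths: list[list[str]]) -> list[int]:
    """One path-count feature per meta-path, e.g. as input to a link prediction classifier."""
    return [meta_path_count(graph, source, target, mp) for mp in meta_paths]


# Hypothetical toy graph; node names are placeholders.
g = TypedGraph()
g.add_edge("GENE_A", "INTERACTS_WITH", "GENE_B")
g.add_edge("GENE_A", "INTERACTS_WITH", "GENE_C")
g.add_edge("GENE_B", "ASSOCIATED_WITH", "PHENO_1")
g.add_edge("GENE_C", "ASSOCIATED_WITH", "PHENO_1")
paths = [["ASSOCIATED_WITH"], ["INTERACTS_WITH", "ASSOCIATED_WITH"]]
print(meta_path_features(g, "GENE_A", "PHENO_1", paths))  # [0, 2]
```

Propagating walk counts hop by hop avoids enumerating individual paths, which matters on a network the size of SemMedDB; the resulting count vectors could then feed any standard classifier.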
For example, the PHENO relationship connects patients with their corresponding phenotype nodes, and the GENO relationship connects patients with their respective mutated genes. Additionally, we have included the 30 different types of semantic relations extracted by SemRep from the whole of MEDLINE, which serve as the backbone connecting patient and phenotype nodes. The output of the algorithm is a set of relevant intermediate concepts Y (such as genetic functions and/or diseases) that link the genotype X to the phenotype Z. These Y concepts should provide hypotheses that explain the mechanisms behind the novel associations linking the genotype to the phenotype. This project will formalize and strengthen our longstanding and recognized research work on LBD.
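The X to Y to Z search described above can be sketched as a small "closed discovery" routine: given a patient's genes X (GENO) and phenotypes Z (PHENO), keep every concept Y that is linked by some semantic relation to at least one gene in X and at least one phenotype in Z. This is a minimal sketch under stated assumptions: it uses an in-memory Python structure instead of the Neo4j database, treats relations as undirected, and ranks Y by the number of X-Z pairs it bridges, a hypothetical scoring choice. The names `Patient` and `closed_discovery` and all concept identifiers are placeholders.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Patient:
    patient_id: str
    geno: set[str] = field(default_factory=set)   # X: mutated genes (GENO relationships)
    pheno: set[str] = field(default_factory=set)  # Z: HPO phenotype terms (PHENO relationships)


def closed_discovery(patient: Patient, relations: list[tuple[str, str, str]]):
    """Return intermediate concepts Y linked to at least one gene in X and one phenotype in Z.

    Each result is (y, (genes_in_X, phenotypes_in_Z)), ranked by the number of X-Z pairs y bridges.
    """
    # Assumption: (subject, predicate, object) triples are treated as undirected links.
    neighbours: dict[str, set[str]] = defaultdict(set)
    for subject, _predicate, obj in relations:
        neighbours[subject].add(obj)
        neighbours[obj].add(subject)

    candidates = {}
    for y, linked in neighbours.items():
        if y in patient.geno or y in patient.pheno:
            continue  # Y must be an intermediate concept, not an input gene or phenotype
        xs, zs = linked & patient.geno, linked & patient.pheno
        if xs and zs:
            candidates[y] = (xs, zs)

    def bridged_pairs(item):
        xs, zs = item[1]
        return (-(len(xs) * len(zs)), item[0])  # most bridged pairs first, then by name

    return sorted(candidates.items(), key=bridged_pairs)


# Hypothetical toy patient and SemRep-style triples (placeholders, not real biology).
patient = Patient("P1", geno={"GENE1", "GENE2"}, pheno={"PHENO_A"})
relations = [
    ("GENE1", "AFFECTS", "FUNCTION_F"),
    ("FUNCTION_F", "CAUSES", "PHENO_A"),
    ("GENE1", "ASSOCIATED_WITH", "DISEASE_D"),
    ("GENE2", "ASSOCIATED_WITH", "DISEASE_D"),
    ("DISEASE_D", "CAUSES", "PHENO_A"),
    ("GENE2", "INTERACTS_WITH", "GENE3"),
]
ranked = closed_discovery(patient, relations)
print([y for y, _ in ranked])  # ['DISEASE_D', 'FUNCTION_F']
```

In a graph database the same idea would be expressed as a pattern query from the patient's GENO genes through one intermediate node to its PHENO phenotypes; the in-memory version only makes the logic explicit.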
Significance for science and for the country
The analysis of textual data is experiencing a remarkable upswing worldwide. On the one hand, such data are readily available and processing capabilities keep growing; on the other hand, institutions and industry need to deal with complex problems related to understanding complex systems. Knowledge and technologies for managing complex (i.e., relational) datasets are all the more important because they serve as a basis for other scientific fields (e.g., analysis of the Semantic Web, bioinformatics, economics, and linguistics). We expect the proposed research to achieve excellence in the fields of literature-based discovery, text mining, relational data management, and network analysis. We strongly believe that the results of the proposed project will contribute significantly to global knowledge in information technologies, to further establishing Slovenian science in text mining, literature-based discovery, and network analysis on the European and global scale, and to the transfer of scientific knowledge into practice.