Analysis of large text datasets
Code | Science | Field | Subfield |
2.07.07 | Engineering sciences and technologies | Computer science and informatics | Intelligent systems - software |
Code | Science | Field |
T171 | Technological sciences | Microelectronics |
machine learning, text learning, learning on the Web, information retrieval
Researchers (1)
no. | Code | Name and surname | Research area | Role | Period | No. of publications |
1. | 12570 | PhD Dunja Mladenić | Computer science and informatics | Head | 1999 - 2001 | 662 |
Organisations (1)
no. | Code | Research organisation | City | Registration number | No. of publications |
1. | 0106 | Jožef Stefan Institute | Ljubljana | 5051606000 | 90,724 |
Abstract
The research will focus on developing new computer methods, and improving existing ones, for the analysis of large text datasets. Special emphasis will be placed on the analysis of Slovenian text. The methods developed will enable automatic categorization of Slovenian documents, adaptation of existing text-learning methods to Slovenian texts, analysis of text datasets based on a new, extended document representation, and better Web browsing through a personal browsing assistant built on the new text analysis methods. They will also enable the development of various applications, including automatic updating of existing document categorizations that are currently updated manually, such as the categorization of Slovenian Web documents named "Mat Kurja" or the specialized categorization of Slovenian text documents "Biomedicina Slovenica", a national bibliography for biomedicine.