Projects / Programmes source: ARIS

Structured output prediction with applications in sustainable agricultural production

Research activity

Code Science Field Subfield
2.07.07  Engineering sciences and technologies  Computer science and informatics  Intelligent systems - software 

Code Science Field
T000  Technological sciences   

Code Science Field
1.02  Natural Sciences  Computer and information sciences 
Keywords
machine learning, data mining, structured output prediction, multi-label classification, multi-target regression, sustainable agriculture
Evaluation (rules)
source: COBISS
Researchers (7)
no.  Code  Name and surname  Research area  Role  Period  No. of publications
1.  15660  PhD Marko Debeljak  Biology  Researcher  2016 - 2018  313 
2.  11130  PhD Sašo Džeroski  Computer science and informatics  Head  2016 - 2018  1,204 
3.  32282  PhD Aneta Ivanovska  Computer science and informatics  Researcher  2016 - 2018  125 
4.  31050  PhD Dragi Kocev  Computer science and informatics  Researcher  2016 - 2018  204 
5.  36356  PhD Aljaž Osojnik  Computer science and informatics  Researcher  2018  47 
6.  27759  PhD Panče Panov  Computer science and informatics  Researcher  2016 - 2018  155 
7.  22279  PhD Bernard Ženko  Computer science and informatics  Researcher  2016 - 2018  172 
Organisations (2)
no.  Code  Research organisation  City  Registration number  No. of publications
1.  0106  Jožef Stefan Institute  Ljubljana  5051606000  90,724 
2.  2338  Jožef Stefan International Postgraduate School  Ljubljana  1917544  11,430 
Abstract
Given the increased pervasiveness of data analysis tasks in almost every area of life and their increasing complexity, we now often face tasks of structured output prediction (SOP).
Situated within the areas of data mining and machine learning, SOP is concerned with predicting complex structured values (e.g., vectors of real numbers) rather than a scalar value (e.g., a single real number). Examples of SOP tasks can be encountered in many different areas, including agriculture, which is facing an ever-increasing array of seemingly conflicting demands: the fast increase of the Earth's population calls for high yields of quality crops, while increasing environmental pressures on Earth's ecosystems dictate sustainable agricultural production that burdens the environment as little as possible.
SOP approaches can be roughly divided into local and global. The former decompose the structured output into scalar components and use standard classification/regression approaches to learn a set of models, each predicting one component. The latter adapt standard approaches for predicting scalar values to handle structured outputs directly and learn a single model that predicts complete structured values.
Despite recent advances, current SOP approaches leave much to be desired. The most successful SOP approaches include tree ensembles for global SOP, which are both effective and efficient. Unfortunately, they produce complex models that are not easy to inspect and understand. There is also no clear understanding of whether, when and why global SOP methods perform better than local ones. This question is still far from resolved, despite recent investigations concerning interdependencies between output space components and their influence on the relative performance of the two approaches.
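The distinction between local and global SOP drawn in the abstract can be illustrated with a minimal sketch for multi-target regression. It is an illustration only, not the project's method: it assumes a single numeric input feature and uses a one-split regression stump as the base learner, and all function names (fit_stump, fit_local, fit_global and so on) are hypothetical. The local variant fits one stump per target, while the global variant fits a single stump whose split minimises the squared error summed over all targets and whose leaves predict whole target vectors.

```python
from statistics import mean


def column_means(rows):
    """Per-target means of a list of target vectors."""
    return [mean(col) for col in zip(*rows)]


def sse(rows):
    """Squared error of target vectors around their per-target means,
    summed over all targets."""
    if not rows:
        return 0.0
    means = column_means(rows)
    return sum((v - m) ** 2 for row in rows for v, m in zip(row, means))


def fit_stump(x, Y):
    """Fit a one-split stump on a single numeric feature x.

    Y is a list of target vectors. The split minimises the SSE summed over
    every target in Y. Assumes x has at least two distinct values.
    Returns (threshold, left_means, right_means).
    """
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [y for xi, y in zip(x, Y) if xi <= thr]
        right = [y for xi, y in zip(x, Y) if xi > thr]
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, thr, column_means(left), column_means(right))
    _, thr, left_means, right_means = best
    return thr, left_means, right_means


def predict_stump(stump, xi):
    """Return the vector of leaf means for input value xi."""
    thr, left_means, right_means = stump
    return left_means if xi <= thr else right_means


def fit_local(x, Y):
    """Local approach: one independent stump per target component."""
    n_targets = len(Y[0])
    return [fit_stump(x, [[y[t]] for y in Y]) for t in range(n_targets)]


def predict_local(stumps, xi):
    """Assemble the output vector from the per-target models."""
    return [predict_stump(s, xi)[0] for s in stumps]


def fit_global(x, Y):
    """Global approach: a single stump predicting the whole target vector."""
    return fit_stump(x, Y)


def predict_global(stump, xi):
    return predict_stump(stump, xi)


# Toy data (made up for illustration): target 0 changes between x=4 and x=5,
# target 1 changes between x=2 and x=3.
x = [1, 2, 3, 4, 5, 6]
Y = [[0.0, 0.0], [0.0, 0.0], [0.0, 5.0], [0.0, 5.0], [10.0, 5.0], [10.0, 5.0]]

local_models = fit_local(x, Y)
global_model = fit_global(x, Y)
print("local thresholds:", [m[0] for m in local_models])
print("global threshold:", global_model[0])
print("local prediction at x=3:", predict_local(local_models, 3))
print("global prediction at x=3:", predict_global(global_model, 3))
```

On this toy data the local models are free to pick a different split for each target, whereas the global model must compromise on one split for both. The global model is a single, smaller model, but here it fits the second target less closely, which is one simple way in which the relative performance of the two approaches can differ.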
The goals of the proposed project are to develop novel methods for SOP that will overcome the shortcomings of current SOP methods and to apply the newly developed methods to practically relevant problems from the area of sustainable agricultural production. To develop new SOP methods, we will first find ways to estimate the dependencies between the dimensions of the input space (features) and output space (targets), and to discover interdependencies in the output space. We will then use the latter to structure and/or decompose the output space and propose methods that exploit this structure and are situated on the spectrum between local and global methods for SOP. We will also develop novel methods for SOP (such as rule ensembles and option trees) to learn models that are both accurate and understandable. Finally, we will apply the developed approaches to SOP problems from the area of sustainable agricultural production, where predictive models need to be built that relate agricultural practices to many different aspects of water pollution and different functions of agricultural soils.
We will learn models that can predict the quantity of different types of water outflows from agricultural fields, the content of active substances of phyto-pharmaceuticals therein, and the different aspects/overall risk of exceeding pollution thresholds for these substances. We will also consider the complex task of predicting multiple aspects of agricultural soil functions, including i) primary productivity; ii) water regulation and purification; iii) carbon sequestration and regulation; iv) habitat for functional and intrinsic biodiversity; and v) nutrient cycling and provision. All of these tasks are of direct practical relevance to the end-user and co-financer of the project, ARVALIS - Institut du vegetal (France), which will exploit their results. The results will clearly contribute towards achieving its mission of bringing the latest scientific knowledge to French farmers and advising them on selecting the best agricultural practices in the context of sustainable agricultural production.
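The idea of using interdependencies in the output space to decompose it, and thereby land between purely local and purely global SOP, can be sketched as follows. This is a hedged illustration under stated assumptions, not the method developed in the project: dependence between targets is measured here with absolute Pearson correlation, the threshold of 0.8 is arbitrary, and the names pearson and group_targets are hypothetical. Targets connected by a strong correlation end up in the same group. Each group could then be given its own global model, while separate groups are handled locally.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def group_targets(Y, threshold=0.8):
    """Partition target indices into groups of mutually dependent targets.

    Y is a list of target vectors. Two targets are linked when the absolute
    Pearson correlation of their columns is at least `threshold`; groups are
    the connected components of the resulting graph (via union-find).
    """
    columns = list(zip(*Y))
    n_targets = len(columns)
    parent = list(range(n_targets))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n_targets):
        for j in range(i + 1, n_targets):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for t in range(n_targets):
        groups.setdefault(find(t), []).append(t)
    return sorted(groups.values())


# Toy data (made up for illustration): target 1 is exactly twice target 0,
# target 2 is uncorrelated with both.
Y = [
    [1.0, 2.0, 1.0],
    [2.0, 4.0, -1.0],
    [3.0, 6.0, -1.0],
    [4.0, 8.0, 1.0],
]
print(group_targets(Y))
```

With a threshold of 1.0 or a data set without dependent targets, every target forms its own group (the local extreme). With a threshold of 0.0, all targets fall into one group (the global extreme).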
Significance for science
The proposed research will significantly advance the state of the art in the general area of computer science, the specific area of machine learning and data mining, and particularly the topic of structured output prediction. It will develop new methods for estimating feature relevance, decomposing the output space and learning ensemble models, all in the context of SOP. It will improve the understanding of the relative performance of global and local SOP methods and propose novel methods that profit from this understanding. It will also contribute to the development of the scientific fields of agricultural ecology and environmental protection. The models learned from data will represent new knowledge about the processes of pollutant transfer to water outflows from agricultural fields. Models will also be learned from data that contribute new knowledge about the effects of agricultural practices on different ecosystem functions of agricultural soils. Finally, the research will contribute to the development of other scientific disciplines where the developed methods for SOP can be applied. These include other environmental sciences, such as forest ecology, where one might study the relations between environmental factors and community structure, e.g., in the context of predicting the influence of climate change. They also include other life sciences, such as functional genomics: tree ensembles for SOP (in particular HMLC) are already among the very best methods for predicting gene function, and the improved methods to be developed would push the envelope of accurate gene function prediction even further. Predicted gene functions, e.g., in bacterial genomes, can in turn have many different, practically relevant uses, e.g., in personalized medicine.
Significance for the country
The results of the project will be directly relevant to the end-user/co-financer ARVALIS. Some of the results of applying SOP methods to data from agricultural ecosystems will be exploited immediately and deployed for everyday use by a wide circle of ARVALIS advisors. Others will be exploited in the medium to longer term. The models for predicting water outflows and the risks of pesticide/active substance transfer to water outflows will be deployed for use by the consortium of ARVALIS advisors. They will be part of a decision support system (DSS) for recommending appropriate plant protection measures that are both effective in protecting crops and as environmentally friendly as possible, minimizing the impact on water pollution. The DSS will be used by ARVALIS advisors (on desktop and hand-held devices) who give advice to farmers on plant protection measures, taking into account the characteristics of the specific farm and cropping system, as well as their immediate context. In a wider societal context, the project will make it easier for agriculture to meet the many, increasingly conflicting, demands that it faces. On the one hand, it will help feed the increasing human population of the Earth. On the other hand, it will contribute to the more rational use of natural resources and the preservation of a clean environment.
Most important scientific results Interim report, final report
Most important socioeconomically and culturally relevant results Interim report, final report