Projects / Programmes source: ARIS

Learning a large number of visual object categories for content-based retrieval in image and video databases

Research activity

Code Science Field Subfield
2.07.07  Engineering sciences and technologies  Computer science and informatics  Intelligent systems - software 

Code Science Field
P176  Natural sciences and mathematics  Artificial intelligence 

Code Science Field
1.02  Natural Sciences  Computer and information sciences 
Keywords
Computer vision, modeling visual categories of objects, learning visual categories, visual categorization, hierarchical shape modeling, image databases, video databases, interactive user interfaces.
source: COBISS
Researchers (7)
no. Code Name and surname Research area Role Period No. of publications
1.  19284  PhD Marko Boben  Computer intensive methods and applications  Researcher  2010 - 2013  84 
2.  29381  PhD Luka Čehovin Zajc  Computer science and informatics  Researcher  2010 - 2013  124 
3.  24057  PhD Sanja Fidler  Mathematics  Researcher  2010  32 
4.  30155  PhD Matej Kristan  Computer science and informatics  Researcher  2010 - 2013  323 
5.  05896  PhD Aleš Leonardis  Computer science and informatics  Head  2010 - 2013  455 
6.  18198  PhD Danijel Skočaj  Computer science and informatics  Researcher  2010 - 2013  309 
7.  34398  PhD Domen Tabernik  Computer science and informatics  Researcher  2011 - 2013  50 
Organisations (1)
no. Code Research organisation City Registration number No. of publications
1.  1539  University of Ljubljana, Faculty of Computer and Information Science  Ljubljana  1627023  16,226 
Abstract
In the last decade we have witnessed significant growth in the number and size of digital image and video databases. Due to the increasing importance of visual information, efficient organization of and access to desired content is becoming a crucial issue. However, the exploitation of image databases is still far from optimal. The reason is that we still lack general means of machine-based image interpretation that would reduce the need for laborious manual annotation yet enable retrieval with semantic keys. Image-based concepts thus have to be general and should comply with human visual interpretations of images. We therefore need to develop methods that are able to learn from large amounts of data and from user interaction, and that produce semantically rich visual representations which can be used for indexing into image and video databases. Current approaches are predominantly based on extracting low-level features and classifying entire images and video segments. These kinds of approaches are relatively ineffective, and generally unintuitive for the user, due to the "semantic gap", i.e., the lack of meaningful connections between the low-level image features perceived by the computer and the semantic meaning of the visual content as perceived by the user. We thus require new models which, by modeling directly on the level of context, would bridge this semantic gap. In this project we will develop large-scale hierarchical object class models for visual retrieval that are based on the intuitive principle of compositionality. Preliminary steps towards such representations have been made within the EU project POETICON, where we developed methods for autonomous learning of compositional hierarchies for a few object classes.
The main focus of this project, however, will be on modeling and learning a larger number of visual object categories within a hierarchical compositional framework that allows for computationally efficient recognition, online object learning and semantic visual retrieval. This approach will enable continuous learning of novel object categories through user interaction and autonomous indexing of object categories in image databases. It will also open up new views of computer-user interaction in terms of continuous user-in-the-loop semantic queries and queries at different levels of detail which retain their semantic meaning. The major contribution of the project is thus three-fold: we will extend a hierarchical compositional representation of object shape to model a larger number of object classes, develop algorithms that learn these representations in an online fashion through continuous interaction with the user, and use the learned representation for vision-based retrieval in large image and video databases. Since the compositional architecture of the representation reflects the human perception of objects and their parts, the user will be able to interact with the proposed model on a highly semantic level, guide online learning of novel concepts and build on existing knowledge to pose increasingly complex semantic queries. The interaction with the user will play an important role, since it is the user who structures the semantic representations and judges their suitability. The project is positioned at the forefront of both computer vision and artificial cognitive vision. It is the first holistic proposal for using hierarchical categorical representations for learning, indexing and querying in visual databases. For this reason, it has very high relevance and scientific excellence within both scientific areas.
The results of this project will be published in major journals in the field of computer vision and visual databases. We also foresee an immediate application of the project's results to the media and telecommunications industries as well as in the emerging area of cognitive robotics.
Significance for science
Research and development of methods for recognizing a large number of categories is currently a highly active research field with many application opportunities. The goals achieved in this project are a step forward in research on categorization and detection of objects in images and video sequences, as they are based on a compressed object representation which allows for quick image analysis with small space requirements. This also opens up possibilities for new research areas such as visual querying using mobile devices, which have low computational capabilities. The improvements to the hierarchical methods are to a large extent applicable to a wide spectrum of hierarchical compositional models. The proposed methods for improved shape representations are an important step towards integrating articulated categories into compositional models. With the generality of our methods for automatic taxonomy construction using coarse-to-fine procedures and parallel implementations, we showed a possible direction for the efficient implementation of a wide spectrum of otherwise slower compositional methods. Our methods for online learning represent progress in the broad field of online learning, as they can be applied to generative as well as discriminative models. The proposed histogram of compositions descriptor and the methods for selecting discriminative parts contribute to research on feature selection and, in part, to the specific area of object categorization and detection. Additionally, our project results are a step towards bridging the semantic gap between the features used by computers and the semantic meaning of the image as interpreted by the user.
Significance for the country
The successful upgrade and speed-up of the hierarchical models used for the detection of visual categories is a step towards fast and accurate methods for visual querying over image collections. This will have positive economic benefits and could also lead to new business models. Besides direct economic effects stemming from the knowledge of efficient representations of visual categories, we also expect positive effects for broader society. As a demonstration of visual object detection, we have published our distributed platform for querying visual information in the form of a dedicated web service. As an example, we have also developed and published the same service as an application for mobile devices running the Android operating system. We are also transferring the knowledge obtained during the project to industry by collaborating with companies on the development of computer vision applications for mobile devices. The approaches for massive image processing on parallel architectures developed during the project are also a basis for providing computer vision as a service in the cloud, which is currently a commercially very interesting area. The implemented system points towards modern methods for querying image collections without any intermediate textual metadata. The technologies we are developing are among the enabling technologies for new economies based on services that provide visual querying, and are a good basis for founding new high-tech companies.
Most important scientific results: annual reports 2010, 2011, 2012; final report; complete report on dLib.si
Most important socioeconomically and culturally relevant results: annual reports 2010, 2011, 2012; final report; complete report on dLib.si