Learning a large number of visual object categories for content-based retrieval in image and video databases

Code

J2-3607 (B) - included in ARIS records

Head

PhD Aleš Leonardis

Period

5/1/2010 - 4/30/2013

Range in 2013

0.55 FTE

Science

Natural sciences and mathematics (2)
Engineering sciences and technologies (5)

Researcher status

Researcher (7)
Junior expert or technical associate (0)

Education

Doctoral degree (7)

Sex

Woman (1)
Man (6)

Status

Employed at RO and RRD (6)
No data on employment in RO (1)

No. of publications

10–99 (3)
100–999 (4)

Projects / Programmes source: ARIS

Learning a large number of visual object categories for content-based retrieval in image and video databases

Research activity

Code	Science	Field	Subfield
2.07.07	Engineering sciences and technologies	Computer science and informatics	Intelligent systems - software

Code	Science	Field
P176	Natural sciences and mathematics	Artificial intelligence

Code	Science	Field
1.02	Natural Sciences	Computer and information sciences

Keywords

Computer vision, modeling visual categories of objects, learning visual categories, visual categorization, hierarchical shape modeling, image databases, video databases, interactive user interfaces.

Evaluation (methodology)

Evaluation of bibliographic research performance indicators according to ARIS methodology

Citations for bibliographic records in COBIB.SI that are linked to records in citation databases

Organisations (1) , Researchers (7)

1539 University of Ljubljana, Faculty of Computer and Information Science

no.	Code	Name and surname	Research area	Role	Period	No. of publications
1.	19284	PhD Marko Boben	Computer intensive methods and applications	Researcher	2010 - 2013	84
2.	29381	PhD Luka Čehovin Zajc	Computer science and informatics	Researcher	2010 - 2013	158
3.	24057	PhD Sanja Fidler	Mathematics	Researcher	2010	32
4.	30155	PhD Matej Kristan	Computer science and informatics	Researcher	2010 - 2013	368
5.	05896	PhD Aleš Leonardis	Computer science and informatics	Head	2010 - 2013	457
6.	18198	PhD Danijel Skočaj	Computer science and informatics	Researcher	2010 - 2013	357
7.	34398	PhD Domen Tabernik	Computer science and informatics	Researcher	2011 - 2013	59

Abstract

In the last decade we have witnessed significant growth in the number and size of digital image and video databases. Due to the increasing importance of visual information, efficient organization of and access to desired content is becoming a crucial issue. However, the exploitation of image databases is still far from optimal. The reason is that we still lack general means of machine-based image interpretation which would reduce the need for laborious manual annotation yet enable retrieval with semantic keys. Image-based concepts thus have to be general and should comply with human visual interpretations of images. We therefore need to develop methods which are able to learn from large amounts of data and from user interaction and produce semantically rich visual representations that can be used for indexing into image and video databases. Current approaches are predominantly based on extracting low-level features and classifying entire images and video segments. These kinds of approaches are relatively ineffective, and generally unintuitive for the user, due to the "semantic gap", i.e., the lack of meaningful connections between the low-level image features, which are perceived by the computer, and the semantic meaning of the visual content, as perceived by the user. We thus require new models which, by modeling directly on the level of context, would bridge this semantic gap. In this project we will develop large-scale hierarchical object class models that are based on the intuitive principle of compositionality for the purpose of visual retrieval. Preliminary steps towards such representations have been made within the EU project POETICON, where we developed methods for autonomous learning of compositional hierarchies for a few object classes.
The main focus of this project, however, will be on modeling and learning a larger number of visual object categories within a hierarchical compositional framework which will allow for computationally efficient recognition, online object learning and semantic visual retrieval. This approach will enable continuous learning of novel object categories through user interaction and autonomous indexing of object categories in image databases. It will also open up new views of computer-user interaction in terms of continuous user-in-the-loop semantic queries and queries at different levels of detail which retain their semantic meaning. The major contribution of the project is thus three-fold: we will extend a hierarchical compositional representation of object shape to model a larger number of object classes, develop algorithms that learn these representations in an online fashion through continuous interaction with the user, and use the learned representation for vision-based retrieval in large image and video databases. Since the compositional architecture of the representation reflects the human perception of objects and their parts, the user will be able to interact with the proposed model on a highly semantic level, guide online learning of novel concepts and build on the existing knowledge to make increasingly complex and semantic queries. The interaction with the user will play an important role, since it is the user who structures and measures the suitability of semantic representations. The project sits at the forefront of the scientific fields of computer vision and artificial cognitive vision. It is the first holistic proposal for using hierarchical categorical representations for learning, indexing and querying in visual databases. For this reason, it has very high relevance and scientific excellence within both scientific areas.
The results of this project will be published in major journals in the field of computer vision and visual databases. We also foresee an immediate application of the project's results to the media and telecommunications industries as well as in the emerging area of cognitive robotics.

Significance for science

The research and development of methods for recognizing a large number of categories is currently a highly active research field with many application opportunities. Our achieved project goals are a step forward in research on categorization and detection of objects in images and video sequences, as they are based on a compressed object representation which allows for quick image analysis with small storage requirements. This also opens up new research areas such as visual querying on mobile devices, which have low computational capabilities. The improvements of the hierarchical methods are largely applicable to a wide spectrum of hierarchical compositional models. The proposed methods for improved shape representations are an important step towards integrating articulated categories into compositional models. With the generality of our methods for automatic taxonomy construction using coarse-to-fine procedures and parallel implementations, we showed a possible direction for efficient implementation of a wide spectrum of otherwise slower compositional methods. Our methods for online learning represent progress in the large field of online learning methods, as they can be applied to generative as well as discriminative models. The proposed histogram of compositions descriptor and the methods for selecting discriminative parts contribute to the research area of feature selection, and partially also to the specific area of object categorization and detection. Additionally, our project results are a step towards bridging the semantic gap between the features used by computers and the semantic meaning of the image as interpreted by the user.

Significance for the country

The successful upgrade and speed-up of the hierarchical models used for the detection of visual categories is a step towards fast and accurate methods for visual querying over image collections. This will have positive economic benefits and could also lead to new business models. Besides direct economic effects stemming from the knowledge of efficient representations of visual categories, we also expect positive effects for broader society. As a demonstration of visual object detection, we have published our distributed platform for querying visual information in the form of a dedicated web service. As an example, we have also developed and published the same service as an application for mobile devices running the Android operating system. We are also transferring the knowledge obtained during the project to industry by collaborating with companies on the development of computer vision applications for mobile devices. The approaches for massive image processing on parallel architectures developed during the project are also a basis for providing computer vision as a service in the cloud, which is currently a highly interesting commercial area. The implemented system points towards modern methods for querying image collections without any intermediate textual metadata. The technologies we are developing are part of the enabling technologies for new economies based on services that provide visual querying, and are a good basis for founding new high-tech companies.

Most important scientific results

Annual report 2010, 2011, 2012, final report

Most important socioeconomically and culturally relevant results

Annual report 2010, 2011, 2012, final report
