Reliable visual tracking benefits many motion-analysis techniques: it can serve as a preprocessing step for region stabilization, removing large-scale motion and enabling detailed low-level motion analysis. But the question of what constitutes "good" tracking has not been fully addressed so far. The field of visual tracking evaluation thus offers an abundance of performance measures, yet largely suffers from a lack of consensus about which measures should be preferred. This hampers cross-paper tracker comparison and slows the advancement of the field. The paper provides a critical analysis of the popular measures for short-term tracking performance evaluation and evaluates them in a large-scale tracking experiment. Various visualizations of the performance measures are analyzed as well. The results show that several measures are equivalent in terms of the information they provide for comparing trackers as mid-level motion estimators and, crucially, that some are more brittle than others. Based on this analysis, the spectrum of available measures is narrowed down to only a few complementary ones, moving toward a homogenization of the evaluation methodology.
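For illustration, one widely used measure in such evaluations is the region overlap between the predicted and the ground-truth bounding boxes, averaged over a sequence. A minimal sketch, assuming axis-aligned boxes in (x, y, width, height) form; the function name and toy data below are ours, not the paper's:

    def region_overlap(pred, gt):
        """Intersection-over-union of two axis-aligned (x, y, w, h) boxes."""
        px, py, pw, ph = pred
        gx, gy, gw, gh = gt
        iw = max(0.0, min(px + pw, gx + gw) - max(px, gx))
        ih = max(0.0, min(py + ph, gy + gh) - max(py, gy))
        inter = iw * ih
        union = pw * ph + gw * gh - inter
        return inter / union if union > 0 else 0.0

    # Toy per-frame predictions vs. ground truth; the average overlap is a
    # common summary statistic for a whole sequence.
    predictions  = [(10, 10, 40, 30), (12, 11, 40, 30)]
    ground_truth = [(11, 10, 38, 32), (15, 12, 38, 32)]
    overlaps = [region_overlap(p, g) for p, g in zip(predictions, ground_truth)]
    print(sum(overlaps) / len(overlaps))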
We present a novel approach to activity recognition based on primitive features that encode pure motion. These features are coupled with a hierarchical scheme that learns motion patterns (compositions) from a single short training video. During inference, the learned patterns are extracted from the analyzed videos and fed to a chi-square-kernel SVM classifier in a "bag of compositions" approach. The process is computationally efficient, and the method is well suited to implementation on massively parallel architectures. Due to their compositional nature, motion patterns can be trained incrementally (layer by layer) and stored efficiently. Inference is fast and the final feature vectors are of relatively low dimension, enabling fast classifier training. On the standard UCF Sports Action Dataset, the presented method outperforms pure-motion-based state-of-the-art approaches.
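The paper does not include code; below is a minimal sketch of the classification step only, assuming each video is summarized as a non-negative "bag of compositions" histogram, and using scikit-learn's chi-square kernel with a precomputed-kernel SVM (toy data and parameter values are our choices):

    import numpy as np
    from sklearn.metrics.pairwise import chi2_kernel
    from sklearn.svm import SVC

    # Toy histograms: one row per video, one column per learned composition.
    rng = np.random.default_rng(0)
    X_train = rng.random((20, 50))
    y_train = np.array([0, 1] * 10)
    X_test = rng.random((5, 50))

    # Precompute the chi-square kernel and train an SVM on it.
    K_train = chi2_kernel(X_train, gamma=0.5)
    clf = SVC(kernel="precomputed").fit(K_train, y_train)

    # At test time the kernel is evaluated between test and training histograms.
    K_test = chi2_kernel(X_test, X_train, gamma=0.5)
    print(clf.predict(K_test))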
This paper describes a parallel asynchronous master-slave implementation of DEMO, an evolutionary algorithm for multiobjective optimization. The implementation extends DEMO from single-processor use to multiple interconnected multiprocessor computers and achieves high efficiency even on heterogeneous computer architectures. The paper describes the parallel algorithm and its differences from the serial algorithm, and introduces a new measure of parallelization efficiency for evolutionary algorithms.
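A minimal sketch of the asynchronous master-slave pattern such an implementation builds on: the master farms out expensive solution evaluations and processes results as soon as they arrive, so faster workers on a heterogeneous system never idle waiting for slower ones. This is a generic illustration in Python, not the DEMO code itself:

    import time
    from concurrent.futures import ProcessPoolExecutor, as_completed

    def evaluate(candidate):
        """Stand-in for an expensive two-objective evaluation."""
        time.sleep(0.01 * (1 + candidate % 3))  # simulate heterogeneous workers
        return candidate, (candidate ** 2, (candidate - 2) ** 2)

    if __name__ == "__main__":
        population = list(range(16))
        with ProcessPoolExecutor(max_workers=4) as pool:
            futures = [pool.submit(evaluate, c) for c in population]
            # Asynchronous master: handle each result as soon as it is ready,
            # instead of synchronizing on the whole generation.
            for fut in as_completed(futures):
                candidate, objectives = fut.result()
                # In DEMO, the offspring would be compared with its parent
                # here and the population updated immediately.
                print(candidate, objectives)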
Autonomous robots that operate in unstructured environments must be able to seamlessly expand their knowledge base. To identify and manipulate previously unknown objects, a robot should be able to acquire new object knowledge when no prior information about the objects or the environment is available. We propose to improve incremental visual object learning and recognition by exploiting motion cues induced by interactive manipulation, together with foveated vision. We propose two methods for validating object hypotheses in the foveal view and experimentally show the advantage of foveated vision for object learning.
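As a generic illustration of the kind of motion cue involved (not the paper's method): when the robot pushes an object, the object moves while the background stays static, so even textbook frame differencing yields a candidate object region:

    import numpy as np

    def motion_mask(prev_frame, frame, threshold=25):
        """Pixels that changed between two consecutive grayscale frames;
        plain frame differencing, shown only to illustrate a motion cue."""
        diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
        return diff > threshold

    # Toy 8x8 frames: a bright 2x2 "object" shifts by one pixel after a push.
    prev_frame = np.zeros((8, 8), dtype=np.uint8); prev_frame[2:4, 2:4] = 200
    frame = np.zeros((8, 8), dtype=np.uint8); frame[2:4, 3:5] = 200
    print(np.argwhere(motion_mask(prev_frame, frame)))  # changed pixels seed a hypothesis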
This work focuses on a detailed analysis of the discriminative capabilities of a hierarchical compositional model, where we identify purely generative learning, combined with excessive feature sharing, as an important factor contributing to poor discriminative performance. We introduce the Histogram of Compositions (HoC) as a viable solution to this issue. It is independent of specific modalities, which allows its application to shape features as well as to 3D, motion, or music features. HoC improves discriminative power by merging high-level detection information with the low-level features that are important for discrimination. We apply our solution to the problem of object detection using shape-specific features, where an extensive evaluation on five datasets shows a significant improvement in overall detection performance. An extended version of this technical report has been submitted to the journal Computer Vision and Image Understanding and has passed the first round of reviewing.
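A toy sketch of the descriptor idea, assuming composition activations are given as (x, y, type) triplets with coordinates normalized to a detection window; the spatial-grid layout and L1 normalization here are our assumptions, not the paper's exact formulation:

    import numpy as np

    def histogram_of_compositions(activations, n_types, grid=(2, 2)):
        """Spatial histogram of composition activations inside a detection
        window: counts per (cell, composition type), L1-normalized."""
        gx, gy = grid
        hist = np.zeros((gx, gy, n_types))
        for x, y, t in activations:
            i = min(int(x * gx), gx - 1)
            j = min(int(y * gy), gy - 1)
            hist[i, j, t] += 1
        h = hist.ravel()
        return h / h.sum() if h.sum() > 0 else h

    # Three activations of two composition types in a unit window.
    desc = histogram_of_compositions([(0.2, 0.3, 0), (0.7, 0.8, 1), (0.6, 0.2, 0)], n_types=2)
    print(desc)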