Face recognition systems exploiting Gabor filters are currently among the most robust and efficient face-based biometric systems. Commonly, these systems adopt a bank of complex Gabor filters and use the magnitude responses of the filtering operation to derive features for the recognition task. In this paper, we extend this common approach and introduce a novel feature type derived from the Gabor phase responses. We show that our Gabor phase features carry information complementary to Gabor magnitude features and that the two feature types yield competitive recognition performance.
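As a rough sketch of the two feature types the abstract contrasts, the snippet below builds a single complex Gabor kernel and splits its filter response into magnitude and phase. The filter parameters (size, sigma, orientation, wavelength) are illustrative defaults, not the paper's; a real system would apply a bank of several scales and orientations.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(size=31, sigma=4.0, theta=0.0, lam=8.0):
    """Complex Gabor kernel: a Gaussian envelope modulating a
    complex sinusoid at orientation `theta`, wavelength `lam`."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)      # rotated coordinate
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.exp(2j * np.pi * xr / lam)
    return envelope * carrier

# Filter a toy image and separate the two feature types.
img = np.random.rand(64, 64)
resp = fftconvolve(img, gabor_kernel(theta=np.pi / 4), mode="same")
magnitude = np.abs(resp)   # the conventional Gabor magnitude feature
phase = np.angle(resp)     # the phase response, in (-pi, pi]
```

The phase array is what the abstract's novel features would be derived from; unlike the magnitude, it wraps around, which is why phase features usually need an encoding step (e.g. quantization) before comparison.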
Due to the widespread use of webcams and camera-equipped mobile devices, facial video recognition is now feasible, rather than relying on still images alone. This paper presents an evaluation of person identity verification using facial video data. It involves 18 systems made available by seven academic institutes, including IDIAP, the University of Surrey, and the University of Ljubljana. These systems embody a diverse set of assumptions, allowing us to assess the effect of differences in approach on video-to-video face authentication.
This paper presents the results of our efforts to obtain the minimum possible finite-state representation of a pronunciation dictionary. Finite-state transducers are widely used to encode word pronunciations, and our experiments revealed that the conventional redundancy-reduction algorithms developed within this framework yield suboptimal solutions. We found that incremental construction and redundancy reduction of acyclic finite-state transducers creates considerably smaller models (up to 60% smaller) than the conventional, non-incremental (batch) algorithms implemented in the OpenFST toolkit.
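The incremental construction and redundancy reduction mentioned above can be sketched with the classic algorithm for lexicographically sorted input (Daciuk et al.): each new word is added and the previous word's divergent suffix is immediately merged into a register of equivalent states. This toy version builds a minimal acyclic acceptor over plain strings rather than a full transducer with pronunciation outputs, and it is not OpenFST's API; all names are illustrative.

```python
class State:
    def __init__(self):
        self.trans = {}     # char -> State, insertion-ordered
        self.final = False

    def signature(self):
        # Two states are equivalent iff finality and outgoing
        # transitions (to already-canonical states) match.
        return (self.final, tuple((c, id(s)) for c, s in self.trans.items()))

def build_minimal(words):
    """Incrementally build a minimal acyclic automaton.
    `words` must be sorted lexicographically."""
    root, register = State(), {}

    def replace_or_register(state):
        # Minimize the most recently added child subtree bottom-up.
        c, child = next(reversed(state.trans.items()))
        if child.trans:
            replace_or_register(child)
        sig = child.signature()
        if sig in register:
            state.trans[c] = register[sig]   # reuse equivalent state
        else:
            register[sig] = child
    for word in words:
        # Walk the longest common prefix already in the automaton.
        state, i = root, 0
        while i < len(word) and word[i] in state.trans:
            state = state.trans[word[i]]
            i += 1
        # Merge the not-yet-minimized suffix of the previous word.
        if state.trans:
            replace_or_register(state)
        # Append the new word's remaining suffix.
        for c in word[i:]:
            nxt = State()
            state.trans[c] = nxt
            state = nxt
        state.final = True
    if root.trans:
        replace_or_register(root)
    return root, register
```

Because suffix sharing happens as each word arrives, the intermediate machine never grows to full trie size, which is the source of the memory advantage the abstract reports over batch minimization.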
We present a gradient descent transformation for decoupling the emotion and speaker information contained in the acoustic features. The Interspeech ’09 Emotion Challenge feature set serves as the baseline for the audio part. For the video signal, nuisance attribute projection is used to derive the transformation matrix representing emotional-state features. The audio and video sub-systems are combined at the matching-score level. The presented system is assessed on the eNTERFACE ’05 database, where improved recognition performance is observed compared to state-of-the-art systems.
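Nuisance attribute projection, as used for the video features above, can be sketched as follows: treat within-class scatter (variation among samples sharing the attribute to be kept, here the emotion label) as nuisance variability, take its top-k principal directions, and project them out. This is a generic sketch under that assumption, not the paper's exact recipe; the function name and parameters are illustrative.

```python
import numpy as np

def nap_projection(X, labels, k=2):
    """Nuisance attribute projection (sketch).
    X: (n_samples, d) features; labels: the attribute to preserve
    (e.g. emotion class). Within-class deviations are attributed to
    the nuisance factor (e.g. speaker identity) and removed."""
    deviations = []
    for c in np.unique(labels):
        Xc = X[labels == c]
        deviations.append(Xc - Xc.mean(axis=0, keepdims=True))
    D = np.vstack(deviations)             # within-class (nuisance) deviations
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    U = Vt[:k].T                          # (d, k) orthonormal nuisance basis
    return np.eye(X.shape[1]) - U @ U.T   # projector P = I - U U^T
```

Features are then cleaned as `X @ P` (P is symmetric) before matching, so the subsequent score-level fusion operates on representations with the dominant nuisance directions suppressed.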