Study On Audio-Visual Speaker Localization And Tracking

Posted on:2009-07-14

Degree:Doctor

Type:Dissertation

Country:China

Candidate:N G Jin

GTID:1118360242984633

Subject:Signal and Information Processing

Abstract/Summary:

Speaker localization and tracking is an active research topic with growing applications in multimedia, including human-computer interaction, video conferencing, and robot navigation. Traditional approaches to the problem include face tracking based on computer vision and sound source localization based on microphone arrays. However, face tracking suffers from changes in illumination and pose, and sound source localization is affected by background noise and room reverberation. These single-modal approaches are not robust in complex dynamic environments, and improving the precision and robustness of localization and tracking systems remains an open problem.

Audio-visual speaker localization and tracking is an important research problem at the intersection of computer vision and computer audition. Its aim is to estimate the position of the speaker using both audio and visual information. In this thesis, multi-sensor data fusion is employed to solve the audio-visual speaker localization and tracking problem.

The main contributions of this thesis can be summarized as follows:

(1) Multi-sensor data fusion is applied to the speaker tracking problem, and a novel audio-visual speaker tracking approach based on a dynamic Bayesian network is proposed. Exploiting the complementarity and redundancy between the speaker's speech and image, three perception methods are proposed to acquire tracking information: sound source localization based on a microphone array, face detection based on skin color information, and mutual information maximization based on audio-visual synchronization. Within the dynamic Bayesian network framework, particle filtering is used to fuse the tracking information, and perception management based on information entropy is used to improve tracking efficiency.

(2) A new method based on weighted subspace fitting is presented for sound source localization and tracking. Within a Bayesian estimation framework, a dynamical model of speaker motion and a likelihood function suited to wide-band speech signals are constructed, and the sound source location is estimated by particle filtering.

(3) A new microphone-array-based speaker localization and tracking method is presented for noisy and reverberant environments. Within the particle filtering framework, the echo-free onset signal is extracted according to the echo-avoidance model of the precedence effect, and the likelihood function is constructed from the beamformer output power. Considering...
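The abstract does not give the thesis's actual models. As a rough illustration only, the particle filtering idea that recurs in the contributions above can be sketched as a generic bootstrap particle filter. Everything in this sketch is an assumption made for illustration: the function names `particle_filter_step` and `track`, the random-walk motion model, the 5 m by 5 m room, the noise parameters, and the Gaussian likelihood, which stands in for the beamformer-power or wide-band likelihoods the abstract mentions.

```python
import math
import random


def particle_filter_step(particles, observation, rng,
                         motion_std=0.1, obs_std=0.3):
    """One predict/update/resample cycle of a bootstrap particle filter.

    particles:   list of (x, y) speaker-position hypotheses, in metres.
    observation: noisy (x, y) position measurement from some sensor.
    """
    # Predict: random-walk motion model (an assumption, not the thesis model).
    predicted = [(x + rng.gauss(0.0, motion_std),
                  y + rng.gauss(0.0, motion_std)) for x, y in particles]

    # Update: weight each particle with a Gaussian likelihood
    # (a stand-in for a likelihood built from beamformer output power).
    weights = []
    for x, y in predicted:
        d2 = (x - observation[0]) ** 2 + (y - observation[1]) ** 2
        weights.append(math.exp(-0.5 * d2 / obs_std ** 2))
    total = sum(weights)
    if total == 0.0:
        # Every weight underflowed; fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]

    # Estimate: weighted mean of the particle positions.
    est_x = sum(w * p[0] for w, p in zip(weights, predicted))
    est_y = sum(w * p[1] for w, p in zip(weights, predicted))

    # Resample: multinomial resampling proportional to the weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, (est_x, est_y)


def track(observations, num_particles=500, seed=0):
    """Run the filter over a sequence of (x, y) observations."""
    rng = random.Random(seed)
    # Initialise particles uniformly over a hypothetical 5 m x 5 m room.
    particles = [(rng.uniform(0.0, 5.0), rng.uniform(0.0, 5.0))
                 for _ in range(num_particles)]
    estimates = []
    for obs in observations:
        particles, est = particle_filter_step(particles, obs, rng)
        estimates.append(est)
    return estimates
```

Calling `track([(2.0, 3.0)] * 10)` returns one position estimate per observation. In an audio-visual setting, the single Gaussian likelihood would be replaced by a product of per-sensor likelihoods, which is one common way to fuse modalities; whether the thesis does it this way is not stated in the abstract.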

Keywords/Search Tags:

Audio-Visual

Related items

1	Study On Generation Of Spatial Audio Using Audio-Visual Cues
2	Research On Semantic Analysis And Understanding Of Multimodal Video
3	Multimodal Cognitive Learning For Audio-visual Data
4	Research On Algorithm Of Audio-visual Event Recognition And Sound Source Localization Based On Audio-visual Fusion
5	Audio And Visual Expression Of Words Profile-The Analysis Of The TV Program Chasing The World In Books
6	Speech Endpoint Detection Based On Audio And Visual Features
7	Study On Audio-Visual Speaker Localization And Tracking
8	The Development And Evolution Of Network Audio Visual Regulation In China
9	Audio-Visual Speech Recognition And Its Applications
10	Robust and efficient techniques for audio-visual speech recognition