| Speaker localization and tracking is an active research topic with growing applications in multimedia, including human-computer interaction, video conferencing, and robot navigation. Traditional approaches to the problem include face tracking based on computer vision and sound source localization based on microphone arrays. However, face tracking suffers from changes in illumination and pose, and sound source localization is affected by background noise and room reverberation. These single-modal approaches are not robust in complex, dynamic environments, and improving the precision and robustness of localization and tracking systems remains an open problem. Audio-visual speaker localization and tracking is an important research area at the intersection of computer vision and computer audition. Its aim is to estimate the position of the speaker using both audio and visual information. In this thesis, multi-sensor data fusion is employed to solve the audio-visual speaker localization and tracking problem. The main contributions of this thesis can be summarized as follows: (1) Multi-sensor data fusion is applied to the speaker tracking problem, and a novel audio-visual speaker tracking approach based on a dynamic Bayesian network is proposed. Exploiting the complementarity and redundancy between the speaker's speech and image, three perception methods are proposed to acquire tracking information: sound source localization based on a microphone array, face detection based on skin color information, and mutual information maximization based on audio-visual synchronization. Within the dynamic Bayesian network framework, particle filtering is used to fuse the tracking information, and perception management based on information entropy theory is used to improve tracking efficiency. (2) A new method based on weighted subspace fitting is presented for sound source localization and tracking. 
In the framework of Bayesian estimation, a dynamical model of speaker motion and a likelihood function suited to wide-band speech signals are constructed, and the sound source location is estimated by particle filtering. (3) A new microphone-array-based speaker localization and tracking method is presented for noisy and reverberant environments. In the framework of particle filtering, the echo-free onset signal is extracted according to the echo-avoidance model of the precedence effect, and the likelihood function is constructed from the output power of a beamformer. Considering... |
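
Particle filtering recurs in all three contributions above, so a minimal sketch of one bootstrap particle filter may help fix the idea. It is not the method of the thesis: the one-dimensional state, the random-walk motion model, the Gaussian likelihood on a direct position measurement, and the names `particle_filter_step`, `track`, `motion_std`, and `meas_std` are all assumptions made for illustration. In the thesis the likelihood is built from array signals (for example, beamformer output power), which is not reproduced here.

```python
import math
import random


def particle_filter_step(particles, measurement, motion_std=0.1, meas_std=0.3, rng=random):
    """One predict/update/resample step of a bootstrap particle filter (1-D position)."""
    # Predict: propagate each particle through a random-walk motion model (assumed).
    predicted = [p + rng.gauss(0.0, motion_std) for p in particles]
    # Update: weight particles by a Gaussian likelihood of the measurement
    # (assumed stand-in; the thesis derives its likelihood from array signals).
    weights = [math.exp(-0.5 * ((measurement - p) / meas_std) ** 2) for p in predicted]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed: fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]
    # Estimate: weighted mean of the particle positions.
    estimate = sum(w * p for w, p in zip(weights, predicted))
    # Resample: draw an equally weighted particle set in proportion to the weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate


def track(measurements, n_particles=500, init_range=(-5.0, 5.0), seed=0):
    """Run the filter over a sequence of noisy position measurements."""
    rng = random.Random(seed)
    # Initialise particles uniformly over the assumed search range.
    particles = [rng.uniform(*init_range) for _ in range(n_particles)]
    estimates = []
    for z in measurements:
        particles, est = particle_filter_step(particles, z, rng=rng)
        estimates.append(est)
    return estimates
```

Calling `track([2.0] * 20)` returns one position estimate per measurement, and the estimates should settle close to 2.0. Swapping the Gaussian weight for a likelihood computed from microphone-array data would move the sketch closer to the setting described above.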