
The Research Of Lip-Reading And Its Application

Posted on: 2006-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: G M Jie
Full Text: PDF
GTID: 2144360182475104
Subject: Biomedical engineering

Abstract/Summary:
Human speech perception is inherently a multimodal process. Visual information from the speaker's mouth region is an important source of speech information in addition to the acoustic signal. Visual speech has many potential applications and has motivated significant interest in the automatic recognition of visual speech, formally known as automatic lipreading, or speechreading. Past work in this area has mainly focused on integrating speechreading with automatic speech recognition to improve the performance of speech recognition systems. In this paper, we are interested in applying speechreading technology to the rehabilitation of the speech-handicapped, and we design a speechreading system based on isolated words.

We first review the development of speechreading technology and then propose a novel speechreading system. The system consists of three components: a visual front-end module, a visual feature extraction module, and a speech recognizer module.

The visual front-end module captures video sequences for speechreading and locates the lip movements of a given speaker. To this end, we design an image capturing system built around a dedicated video IC, a CPLD, a DSP, and a USB interface; it offers high transfer speed, plug-and-play operation, and other desirable properties.

The visual feature extraction module extracts the relevant speech features. Since lip movements carry most of the visual speech information, visual analysis focuses mainly on lip feature extraction. We design a chromatic filter to enhance the raw images and then use a novel active contour model to extract visual features in the chroma space. Our active contour model adds new adaptive forces on the snake points to push and bend the closed snake curve. With this modification, the snake does not depend on the position of the initial contour and converges quickly on the real target.
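The chroma-space lip enhancement step can be sketched as follows. The thesis does not specify the exact chromatic filter, so this example uses the common pseudo-hue transform R/(R+G), under which the redder lip pixels score higher than the surrounding skin; the function names and the `thresh` value are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def pseudo_hue(rgb):
    """Pseudo-hue transform h = R / (R + G).

    Lips are redder than skin, so they receive higher values; this is a
    common chroma-space enhancement and stands in for the thesis's
    (unspecified) chromatic filter.
    """
    rgb = rgb.astype(np.float64)
    r, g = rgb[..., 0], rgb[..., 1]
    return r / (r + g + 1e-9)  # small epsilon avoids division by zero

def lip_mask(rgb, thresh=0.55):
    """Threshold the pseudo-hue map to get a rough lip-region mask,
    which could then seed an active contour (snake) initialization."""
    return pseudo_hue(rgb) > thresh
```

A mask like this gives the snake a starting region near the true lip boundary, which is consistent with the goal of reducing the snake's dependence on its initial contour.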
The recognizer trains on, and recognizes, each word using the visual features extracted by the visual feature extraction module. Our recognizer is based on continuous hidden Markov models (HMMs). The hidden Markov model is a powerful stochastic tool for modeling and identifying sequential signals: an HMM is a finite state automaton with two concurrent stochastic processes, in which the state sequence models the temporal evolution of speech and the output distributions attached to the states model the visual speech features observed in each state. We mainly discuss the training and recognition problems of the HMM. With visual information alone, our system achieves 61% recognition accuracy on a vocabulary of 5 isolated words from a single speaker. The last part of this paper gives concluding remarks.
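The recognition step above can be made concrete with a minimal sketch: one HMM per word, and the word whose model assigns the highest likelihood to the observed feature sequence wins. For brevity this sketch uses discrete (table-lookup) emissions rather than the continuous Gaussian output densities of the thesis; the function names and toy models are illustrative assumptions.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Forward algorithm: log P(obs | HMM) for a discrete-output HMM.

    obs -- sequence of observation symbol indices
    pi  -- (N,) initial state probabilities
    A   -- (N, N) transition matrix, A[i, j] = P(state j | state i)
    B   -- (N, M) emission matrix, B[i, k] = P(symbol k | state i)
    """
    alpha = pi * B[:, obs[0]]
    log_p = 0.0
    for t in range(1, len(obs)):
        # Rescale alpha at each step to avoid numerical underflow,
        # accumulating the log of the scale factors.
        c = alpha.sum()
        log_p += np.log(c)
        alpha = (alpha / c) @ A * B[:, obs[t]]
    return log_p + np.log(alpha.sum())

def recognize(obs, word_models):
    """Isolated-word recognition: return the word whose HMM scores
    the observation sequence highest."""
    return max(word_models,
               key=lambda w: forward_log_likelihood(obs, *word_models[w]))
```

In a continuous-density HMM, `B[:, obs[t]]` would be replaced by evaluating each state's Gaussian mixture at the real-valued lip feature vector; the decoding rule is otherwise the same.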
Keywords/Search Tags:Speechreading, Image Capturing, USB, Active Contour Model, Hidden Markov Model