| Among the broad research domains of Computer Science, Human-Computer Interaction techniques have received much attention from experts and researchers around the world over the last few decades. A machine that can understand human language and take instructions directly would be more efficient and powerful, and many approaches have been tried to achieve that goal. At present, speech recognition appears to be a good solution, and it has been used successfully in several applications. Unfortunately, in noisy or multi-speaker environments, the performance of current speech recognition systems is not good enough for practical use. One important reason is that, when people exchange information, visual information is useful in addition to acoustic information. Lip-Reading technology is therefore considered a reasonable way to improve performance, and this is the original motivation for Lip-Reading research. Lip-Reading means obtaining information from a speaker according to the movement of his or her lips. The aim of Lip-Reading research is to find the underlying rules and make computers able to read our lips. To achieve this goal, many difficult problems have to be solved; some of them come from our language and some from the capabilities of our machines. For example, it is very hard to define the relationship between lip shapes and meaning, because different words may be spoken with the same lip shapes. Furthermore, a complete Lip-Reading function depends on many other technologies, such as image processing, and the performance of a Lip-Reading system is limited by these technologies. Building on current achievements in this area, this paper analyzes the main stages of constructing a Lip-Reading system. The key part of such a system is the recognition module, and we introduce several ways to implement it. Considering its success in speech recognition, we use the Hidden Markov Model (HMM). We explain the definition and main idea of the HMM, and its power and advantages as a language recognition module.
We also describe the main problems of the HMM and their solutions; these algorithms are crucial for applying HMMs. In addition, we introduce the categories of HMMs and their structures. The paper then introduces the technologies for constructing a Lip-Reading system part by part. Generally speaking, the work can be divided into three parts: object recognition in digital images to locate the lips, feature extraction, and HMM training. Because several HMM implementations are already available, this paper focuses on the first two parts for the time being. For object recognition in digital images, our method is based on the face detection method of Paul Viola and Michael Jones. First, video data must be prepared: the recordings consist of a specific person speaking several words. Each frame of the video is then processed, one by one, as an image in which the lip region is located. We describe in detail the cascade classifier proposed by Viola and Jones and used successfully in face detection. Roughly, its main idea is to apply Haar-like features to sub-windows of the image. A cascade of classifiers, each built from simple weak classifiers, achieves good performance, and the integral image algorithm makes the computation fast. Compared with previous methods, it is both more accurate and faster. Moreover, the algorithm is easily adapted to detect other objects with other sets of features, so we use it to detect the lip region. Although its performance here is not as good as in face detection, the method still works. Once the region of interest (ROI) is located, features must be extracted from it. For feature extraction, we introduce the main methods that have been or could be used in Lip-Reading systems and describe their advantages and disadvantages.
Based on that comparison, we use a method that extracts vectors directly from the ROI pixels. To do this, the ROI image is converted to gray-scale and the value of each pixel is stored as one element of the feature vector. The most notable advantage is that this keeps as much information as we need, possibly even more. Since we currently have little background knowledge from linguistics or lip-movement research, this offers a simple but effective way to do the work. The disadvantage is that such methods are sensitive to translation, scaling, illumination changes, and speaker changes. These issues can be set aside for the time being, but a further problem is that such vectors are usually redundant. To compress the vectors, we use PCA and LDA. Principal Component Analysis (PCA) projects the vectors onto the directions of largest variance and discards the components that carry little variance, such as an element that has the same value in every vector; discarding them makes the data easier to handle with almost no loss of information. Linear Discriminant Analysis (LDA) aims to maximize the differences between samples from different classes and minimize the differences among samples from the same class. In this paper, we describe these algorithms and their implementation in detail. For the discrete HMM (DHMM), the compressed feature vectors must first be clustered before they are used to train or test the recognition module; the cluster centers are selected with the k-means algorithm. For the continuous HMM, which is the more general case, the feature vectors only need to be converted to the required format. For the recognition module, we use two HMM implementations to complete the system: a DHMM for simple tests and HTK for a more complex system. In conclusion, our work mainly concerns constructing a Lip-Reading system for research and testing. For each step, we describe the method we used and other methods that could be used. Much work remains to be done to achieve better performance.
This system provides some basic data and gives us a clearer idea of what to do next. The data and conclusions can serve as a reference for further research and development. |
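To make the HMM evaluation problem concrete, here is a minimal sketch of the scaled forward algorithm for a discrete HMM, together with isolated-word recognition that picks the word model with the highest likelihood. It assumes each model is a tuple `(pi, A, B)` of initial, transition and emission probabilities over integer observation symbols. The function names are hypothetical, and this is a generic textbook version, not the DHMM or HTK code used in the work.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical model representation: (initial probs, transition matrix, emission matrix)
Model = Tuple[Sequence[float], Sequence[Sequence[float]], Sequence[Sequence[float]]]


def forward_log_likelihood(pi, A, B, obs: Sequence[int]) -> float:
    """Return log P(obs | model) using the forward algorithm with per-step scaling."""
    n = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction: alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * b_j(o_t)
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under this model
        # Rescale to avoid underflow; the logs of the scale factors add up to log P(obs)
        alpha = [a / scale for a in alpha]
        log_lik += math.log(scale)
    return log_lik


def recognize(models: Dict[str, Model], obs: Sequence[int]) -> str:
    """Return the word whose HMM assigns the highest likelihood to obs."""
    return max(models, key=lambda word: forward_log_likelihood(*models[word], obs))
```

For example, with `pi = [1, 0]`, `A = [[0.5, 0.5], [0, 1]]` and `B = [[0.9, 0.1], [0.2, 0.8]]`, the sequence `[0, 1]` has probability 0.9 × 0.5 × 0.1 + 0.9 × 0.5 × 0.8 = 0.405. Scaling is used because raw forward probabilities shrink geometrically with sequence length and would underflow for long utterances.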
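The decoding problem of the HMM (finding the most likely hidden state sequence for an observation sequence) is normally solved with the Viterbi algorithm. The sketch below works in the log domain and assumes the same hypothetical `(pi, A, B)` representation over integer symbols. It is a generic version, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def _log(x: float) -> float:
    """Logarithm that maps zero probability to -inf instead of raising."""
    return math.log(x) if x > 0 else float("-inf")


def viterbi(pi, A, B, obs: Sequence[int]) -> Tuple[List[int], float]:
    """Return the most likely state path and its log-probability."""
    n = len(pi)
    # delta[i]: best log-probability of any path that ends in state i at the current time
    delta = [_log(pi[i]) + _log(B[i][obs[0]]) for i in range(n)]
    backpointers: List[List[int]] = []
    for o in obs[1:]:
        ptr, new_delta = [], []
        for j in range(n):
            # Best predecessor state for reaching state j at this step
            best_i = max(range(n), key=lambda i: delta[i] + _log(A[i][j]))
            ptr.append(best_i)
            new_delta.append(delta[best_i] + _log(A[best_i][j]) + _log(B[j][o]))
        backpointers.append(ptr)
        delta = new_delta
    # Backtrack from the most probable final state
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]
```

Working with log-probabilities turns products into sums, which avoids the underflow problem without the explicit rescaling needed in the forward algorithm.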
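The integral image used by the Viola and Jones detector is what makes Haar-like features cheap. After one pass over the image, the sum of any rectangle takes four look-ups regardless of its size. The sketch below shows this for a gray-scale image stored as a list of rows, plus one two-rectangle (top minus bottom) feature. The function names and the choice of feature are illustrative assumptions.

```python
from typing import List, Sequence


def integral_image(img: Sequence[Sequence[int]]) -> List[List[int]]:
    """ii[y][x] = sum of img[0..y-1][0..x-1], padded with a zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y), using four look-ups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_edge_top_bottom(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Two-rectangle Haar-like feature: top half minus bottom half (h should be even)."""
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)
```

For the 3×3 image `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, `rect_sum(ii, 1, 1, 2, 2)` gives 5 + 6 + 8 + 9 = 28. The padding row and column remove the special cases at the image border.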
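The cascade structure can be sketched as follows. The sketch assumes AdaBoost-style stages: each weak classifier thresholds one feature value, and each stage compares its weighted vote with a stage threshold. The feature functions, thresholds and weights are hypothetical placeholders that would normally come from training. For brevity, features are evaluated on the raw sub-window rather than through an integral image, and only one window size is scanned.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Window = Sequence[Sequence[float]]  # hypothetical: a square sub-window of pixel values


@dataclass
class WeakClassifier:
    feature: Callable[[Window], float]  # e.g. a Haar-like feature evaluated on the window
    threshold: float
    polarity: int  # +1 or -1, selects the direction of the inequality
    alpha: float   # vote weight, as assigned by boosting

    def vote(self, window: Window) -> int:
        return 1 if self.polarity * self.feature(window) < self.polarity * self.threshold else 0


@dataclass
class Stage:
    weak: List[WeakClassifier]
    stage_threshold: float

    def passes(self, window: Window) -> bool:
        return sum(c.alpha * c.vote(window) for c in self.weak) >= self.stage_threshold


def cascade_accepts(stages: List[Stage], window: Window) -> bool:
    # all() stops at the first failing stage, so most background windows
    # are rejected after evaluating only the first few cheap stages.
    return all(stage.passes(window) for stage in stages)


def detect(image: Window, stages: List[Stage], size: int, step: int = 1) -> List[Tuple[int, int]]:
    """Return the top-left corners of all size-by-size sub-windows accepted by the cascade."""
    hits = []
    for y in range(0, len(image) - size + 1, step):
        for x in range(0, len(image[0]) - size + 1, step):
            window = [row[x:x + size] for row in image[y:y + size]]
            if cascade_accepts(stages, window):
                hits.append((x, y))
    return hits
```

The early rejection is the design point. Stages are ordered from cheapest to most selective, so the average cost per window stays low even though the full cascade may contain many weak classifiers.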
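The pixel-based feature vector can be sketched as below. The sketch assumes the ROI is given as rows of `(r, g, b)` tuples. It also assumes every ROI has already been resized to the same dimensions, so all vectors have equal length. The ITU-R BT.601 luma weights are one common gray-scale conversion, not necessarily the one used in the paper.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def roi_to_feature_vector(roi: Sequence[Sequence[RGB]]) -> List[float]:
    """Convert an RGB ROI to gray-scale and concatenate its rows into one vector."""
    vector: List[float] = []
    for row in roi:
        for r, g, b in row:
            # BT.601 luma weights (an assumed choice of gray-scale conversion)
            vector.append(0.299 * r + 0.587 * g + 0.114 * b)
    return vector
```

A 32×16 ROI would therefore give a 512-element vector. This shows why the redundancy and dimensionality of such vectors quickly become a problem.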
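Here is a minimal PCA sketch with NumPy, assuming the feature vectors are stacked as the rows of a matrix `X`. It takes the principal directions from the SVD of the mean-centred data and keeps the first `k`. A column that is constant across all vectors has zero variance, so it contributes nothing to the leading components. The function names and `k` are illustrative.

```python
import numpy as np


def fit_pca(X: np.ndarray, k: int):
    """X has shape (n_samples, n_features); returns (mean, components of shape (k, n_features))."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]


def pca_transform(X: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project the vectors onto the retained principal directions."""
    return (X - mean) @ components.T
```

In practice `k` is often chosen so that the retained singular values account for a fixed fraction of the total variance. The SVD route avoids forming the (possibly very large) covariance matrix explicitly.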
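A corresponding sketch of Fisher's LDA, assuming class labels `y` are available for the training vectors. It builds the within-class scatter `S_w` and the between-class scatter `S_b`, then keeps the leading eigenvectors of `S_w^{-1} S_b`. With raw pixel vectors, `S_w` is usually singular because there are more dimensions than samples. That is one reason LDA is commonly applied after PCA; the pseudo-inverse here is only a simple safeguard.

```python
import numpy as np


def fit_lda(X: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Return an (n_features, k) projection maximizing between-class over within-class scatter."""
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        # Within-class scatter: spread of samples around their own class mean
        Sw += (Xc - mean_c).T @ (Xc - mean_c)
        # Between-class scatter: spread of class means around the global mean
        diff = (mean_c - mean_all).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    # Solve Sw^{-1} Sb w = lambda w; pinv avoids failing outright when Sw is singular
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:k]].real
```

Since `S_b` has rank at most C − 1 for C classes, at most C − 1 of the returned directions carry discriminative information.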
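For the DHMM, each compressed vector has to be mapped to a discrete observation symbol. Below is a plain Lloyd's k-means sketch with NumPy. The codebook size `k`, the random initialization and the iteration cap are assumptions. The index of the nearest centre serves as the symbol fed to the DHMM.

```python
import numpy as np


def quantize(X: np.ndarray, centres: np.ndarray) -> np.ndarray:
    """Map each vector to the index of its nearest centre (its DHMM observation symbol)."""
    dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(X: np.ndarray, k: int, iters: int = 100, seed: int = 0) -> np.ndarray:
    """Lloyd's k-means; returns a (k, n_features) codebook of cluster centres."""
    rng = np.random.default_rng(seed)
    # Initialize with k distinct training vectors chosen at random
    centres = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = quantize(X, centres)
        # Move each centre to the mean of its assigned vectors; keep empty clusters in place
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return centres
```

The same codebook must be used for training and testing, so it is computed once on the training vectors and then applied to every new utterance with `quantize`.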
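For the continuous HMM trained with HTK, one plausible reading of converting the feature vectors to the required format is writing them as HTK parameter files. The sketch below assumes the standard HTK layout. That layout is a 12-byte big-endian header (number of frames, frame period in 100 ns units, bytes per frame, parameter kind) followed by 32-bit floats. It also assumes the `USER` parameter kind (code 9) for custom features. The 40 ms default frame period (25 frames per second) and the function names are hypothetical.

```python
import struct
from typing import Sequence

HTK_USER = 9  # HTK parameter kind code for user-defined features


def htk_bytes(frames: Sequence[Sequence[float]], frame_period_s: float = 0.04) -> bytes:
    """Serialize feature vectors (one per video frame) in HTK parameter-file format."""
    n_samples = len(frames)
    dim = len(frames[0])
    samp_period = int(round(frame_period_s * 1e7))  # HTK measures time in 100 ns units
    samp_size = 4 * dim                             # bytes per frame, stored as float32
    header = struct.pack(">iihh", n_samples, samp_period, samp_size, HTK_USER)
    body = b"".join(struct.pack(f">{dim}f", *frame) for frame in frames)
    return header + body


def write_htk(path: str, frames: Sequence[Sequence[float]], frame_period_s: float = 0.04) -> None:
    """Write one utterance's feature vectors to an HTK parameter file."""
    with open(path, "wb") as f:
        f.write(htk_bytes(frames, frame_period_s))
```

Keeping serialization separate from file I/O makes the byte layout easy to check in isolation before the files are passed to the HTK tools.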