Research and Implementation on Sign Language-Lip Language Conversion System Based on Monocular Vision

Posted on: 2021-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z Zhou
Full Text: PDF
GTID: 2415330620473749
Subject: Control Science and Engineering
Abstract/Summary:
In language teaching at special schools for hearing-impaired students, the bilingual teaching mode can effectively improve the language learning efficiency of hearing-impaired children, but it demands more patience, time, and energy from special education teachers. Given the current shortage of teachers in Chinese special education schools, sign language recognition technology can help these teachers carry out language teaching: deaf students record a sign language video as input to a computer and learn from the output, which consists of Chinese characters and lip language, without much help from teachers. This personalized teaching also supports the study of written Chinese. In addition, the computer recognizes only standard sign language (based on Chinese Sign Language), which can also help correct the sign language dialects of deaf children. This paper studies a sign language to lip language conversion system based on monocular vision. The main difficulties lie in sign language recognition, and the specific work is as follows:

1. Extraction of video key frames. First, four common video key frame extraction methods are briefly analyzed. To eliminate as many redundant frames as possible while still extracting a complete set of key frames, a cluster-based video key frame optimization and extraction algorithm is proposed. Deep features of the video frames are extracted by a convolutional autoencoder (CAE) neural network. After the extracted features are clustered by K-means, the clearest video frames are selected as the initial key frames. The point density method is then used for secondary optimization. Experimental results show that the algorithm eliminates a large number of redundant frames while preserving the integrity of the key frames.

2. Gesture recognition on the key frames. Because the hand is a small target, several improvements are made to the SSD target detection network: the weights of important channels are increased by embedding SE-Net into the feature layer of SSD; the imbalance between positive and negative samples is addressed by changing the loss function; and network training is optimized with mixup and normalization. Experimental results show that the improved SSD achieves higher recognition accuracy.

3. Realization of the sign language to lip language conversion system. To make the system practical and easy to popularize, a color sign language video recorded by a monocular camera serves as the system input. So that signing remains natural, signers do not need to wear any equipment or place any markers on their hands. The first output of the system is Chinese characters and pinyin, and the second is a lip language video corresponding to the Chinese characters. Finally, Vue.js and Spring Boot are used to build a web page that presents the whole system.

The users of this system are deaf children. It is hoped that they can learn Chinese, both written and spoken, through the sign language they already know, without repetitive instruction from teachers. The system could play an auxiliary role in language teaching at schools for the deaf. It needs only a monocular camera, with no other devices or aids, which makes it more practical and easier to popularize and gives it broader application prospects.
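The clustering stage of the key frame extraction described in the abstract might be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the per-frame feature vectors stand in for CAE features, the sharpness scores are assumed to be computed elsewhere, the rule of keeping one clearest frame per cluster is an assumption, and the point density refinement is omitted. The names `kmeans_labels` and `select_key_frames` are hypothetical.

```python
from typing import Dict, List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(features: List[List[float]], k: int,
                  iterations: int = 20) -> List[int]:
    """Plain Lloyd's k-means; returns one cluster label per frame."""
    n = len(features)
    # Assumption: deterministic start from evenly spaced frames.
    centroids = [list(features[(i * n) // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(iterations):
        # Assignment step: attach each frame to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(f, centroids[c]))
            for f in features
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [features[i] for i in range(n) if labels[i] == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


def select_key_frames(features: List[List[float]],
                      sharpness: Sequence[float], k: int) -> List[int]:
    """Return frame indices: the sharpest frame of each cluster."""
    labels = kmeans_labels(features, k)
    best: Dict[int, int] = {}
    for idx, label in enumerate(labels):
        if label not in best or sharpness[idx] > sharpness[best[label]]:
            best[label] = idx
    return sorted(best.values())
```

Called with six frames whose features form two tight groups, the function returns one frame index per group, namely the frame with the highest sharpness score in that group. In a real pipeline the number of clusters and the sharpness measure would need to be chosen to suit the video material.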
Keywords/Search Tags: sign language recognition, K-means, key frame extraction, target detection network SSD