
Research And Implementation Of Sign Language Recognition Based On Deep Learning

Posted on: 2024-09-12    Degree: Master    Type: Thesis
Country: China    Candidate: H B Zhang    Full Text: PDF
GTID: 2568306926967969    Subject: Electronic Science and Technology
Abstract/Summary:
Sign language recognition is a major task at the intersection of artificial intelligence and computer vision. Sign language recognition algorithms are widely applied in human-computer interaction, text translation of images and videos, multimodal recognition, motion prediction, and everyday communication between spoken-language and sign-language users. The goal of sign language recognition is the accurate conversion of action information into text: detecting the hand movements of people in images and videos and translating them into natural-language text. Such algorithms build a communication bridge between sign language users and others, so more and more researchers are investing time and effort in this field.

The research on sign language recognition in this thesis covers two main areas: the research and implementation of isolated-word sign language recognition, and the research and implementation of continuous-sentence sign language recognition. Accurate recognition of isolated-word actions is achieved by extracting skeletal key points. For continuous-sentence recognition, suitable backbone network models are explored, and the recognition error rate is further reduced by adding a temporal pyramid structure and a feature fusion algorithm based on the self-attention mechanism. The specific work of this thesis includes the following:

(1) To address the slow detection speed that makes real-time detection and translation difficult in sign language recognition, this thesis evaluates several feature-extraction backbone networks, designs a lightweight recognition network, and explores different recognition methods, such as recognition based on hand joint-point extraction and recognition based on image feature extraction.

(2) In isolated-word recognition, sign language motions are fast and have large amplitudes, which makes feature information difficult to extract. This thesis therefore uses a hand skeletal-point extraction method: frames captured in real time by the camera are fed into the isolated-word recognition network, which extracts the coordinates of 21 hand key points in each frame. The coordinate data are normalized and stored as a dataset of NumPy-format files, which is then used to train the isolated-word sign language recognition model.

(3) In continuous-sentence sign language recognition, redundant information in the video frame sequence lowers recognition accuracy. This thesis uses a three-dimensional convolutional neural network and a (2+1)-dimensional convolutional network to enhance feature extraction from videos, reducing the impact of redundant information on the overall translation performance of the network.

(4) To address the inconsistency between sign language grammar and natural language grammar in continuous sign language translation, this thesis uses a convolutional bidirectional recurrent neural network to establish contextual temporal relationships between video frames, capturing the relationships among sign language vocabulary along both the forward and backward timelines. An auxiliary classifier and auxiliary loss function, built with an SVM-style classifier, strengthen feature extraction and improve accuracy. The misalignment between image information and label information is resolved with the CTC (Connectionist Temporal Classification) algorithm.

(5) In continuous sign language translation, different vocabulary items have different action durations, so the segmentation boundaries of a video sequence are hard to determine accurately. This thesis introduces a temporal pyramid segmentation method and a feature fusion algorithm based on the attention mechanism. By segmenting and extracting features over video-frame windows of different lengths, the model can attend to sign language information at different scales, capturing both fine-grained features and large action changes. Combining features of various granularities through the attention mechanism reduces the model's dependence on accurate segmentation of gesture frames and increases recognition accuracy.
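Point (2) above normalizes the 21 hand key-point coordinates of each frame before saving them as NumPy files. A minimal sketch of one such normalization, assuming the key points have already been produced by a hand detector (the thesis does not name its extractor; the wrist-centered scaling below is an illustrative choice, not the thesis's exact scheme):

```python
import numpy as np

def normalize_hand_keypoints(keypoints):
    """Normalize a (21, 2) array of hand key-point coordinates.

    Translates the points so the wrist (landmark 0) is the origin, then
    scales by the largest wrist-to-landmark distance so all coordinates
    fall in [-1, 1], making the data invariant to hand position and size.
    """
    pts = np.asarray(keypoints, dtype=np.float64)
    pts = pts - pts[0]                       # wrist becomes the origin
    scale = np.linalg.norm(pts, axis=1).max()
    if scale > 0:
        pts = pts / scale
    return pts

# Example: one fake frame of 21 (x, y) pixel coordinates
frame = np.random.default_rng(0).uniform(100, 300, size=(21, 2))
norm = normalize_hand_keypoints(frame)
# np.save("frame_0001.npy", norm)  # one .npy file per frame, as in point (2)
```

Saving one normalized array per frame yields exactly the kind of NumPy-format dataset the isolated-word model is trained on.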
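Point (4) resolves the misalignment between frame sequences and label sequences with CTC. The core alignment idea can be shown with CTC's greedy decoding rule: merge repeated symbols, then drop the blank, so many frame-level alignments map to one word sequence. This standalone sketch uses integer class IDs with 0 as the blank, an assumption for illustration:

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a per-frame label sequence the CTC way.

    Consecutive repeats are merged first, then blank symbols are removed.
    A blank between two identical labels keeps them as separate tokens.
    """
    out = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

# 0,3,3,0,3,5,5 -> [3, 3, 5]: the run of 3s merges, the blank splits
# the next 3 into a new token, and the run of 5s merges.
print(ctc_greedy_decode([0, 3, 3, 0, 3, 5, 5]))  # -> [3, 3, 5]
```

During training the CTC loss sums over all frame alignments that decode to the target sentence, which is what lets the network learn without per-frame gesture boundaries.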
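Point (5)'s temporal pyramid and attention-based fusion can be sketched as pooling a frame-feature sequence over windows of several lengths and then weighting the per-scale descriptors with softmax attention. The window sizes and the single-query attention form here are illustrative assumptions, not the thesis's exact design:

```python
import numpy as np

def temporal_pyramid_features(seq, window_sizes=(2, 4, 8)):
    """Mean-pool a (T, D) frame-feature sequence at several window lengths.

    Each window size yields one (D,) descriptor, so fine-grained and
    coarse temporal granularities are both represented.
    """
    T, D = seq.shape
    scales = []
    for w in window_sizes:
        n = max(T // w, 1)
        pooled = [seq[i * w:(i + 1) * w].mean(axis=0) for i in range(n)]
        scales.append(np.mean(pooled, axis=0))
    return np.stack(scales)                  # (num_scales, D)

def attention_fuse(scales, query):
    """Fuse per-scale descriptors with scaled-dot-product attention weights."""
    logits = scales @ query / np.sqrt(scales.shape[1])
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ scales                  # (D,) fused feature

rng = np.random.default_rng(1)
seq = rng.normal(size=(16, 32))              # 16 frames, 32-dim features
scales = temporal_pyramid_features(seq)
fused = attention_fuse(scales, query=seq.mean(axis=0))
```

Because the attention weights are learned from the features themselves (here the query is just the sequence mean for demonstration), the model can emphasize whichever temporal scale matches a given sign's duration instead of requiring exact segmentation.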
Keywords/Search Tags: sign language translation, temporal network, feature extraction, attention mechanism, bidirectional recurrent neural network, three-dimensional convolutional neural network