| Sign language is the most natural and effective means of communication for hearing-impaired individuals. Efficient sign language recognition systems help others understand the intentions of deaf and mute individuals and make it easier for them to communicate with hearing people, and the topic has therefore received widespread research attention. Vision-based sign language recognition systems offer fast recognition and high accuracy and have been applied in many fields. However, existing methods fall short in their use of gesture features and in data preprocessing, and problems such as background interference and gesture deformation make the recognition of similar and small gestures less effective, which limits the generalization ability of the model. This thesis studies vision-based sign language recognition, covering image-based static sign language recognition and video-based dynamic sign language recognition. The following work has been done: (1) For static sign language recognition, to address the low accuracy on similar and small gestures caused by image background interference, this thesis proposes an image segmentation method based on depth distance information (SD Segment). The method achieves pixel alignment when the camera's intrinsic parameters are unknown and uses image depth information to separate the background from the sign language image, removing background interference. To further improve recognition accuracy on similar and small gestures, this thesis proposes a Dual path Feature fusion Attention Network (DFANet) to address the insufficient utilization of gesture features. The network extracts gesture image features and depth features separately and uses the spatial relationships in the depth feature map to strengthen its use of discriminative gesture features, improving recognition performance. The experimental
results show that removing the background significantly improves sign language recognition accuracy, and that exploiting discriminative gesture features improves the generalization ability of the network. (2) For dynamic sign language recognition, this thesis proposes an adaptive frame fetching algorithm (AFFA) to address the insufficient preprocessing of dynamic sign language data, which leads to an excessive training workload. The method computes the similarity of adjacent image frames and applies similarity discrimination rules to filter out most invalid and duplicate frames, which reduces their impact on dynamic sign language recognition and significantly speeds up model training. To address the difficulty of extracting spatiotemporal features from dynamic sign language, a 3D Feature Attention Convolutional Neural Networks Long Short Term Memory (3D FACNN-LSTM) network is proposed, which effectively improves the accuracy of dynamic sign language recognition. In addition, a new ten-person dynamic sign language dataset was built for algorithm evaluation. The experimental results show that the proposed method effectively reduces model training time and significantly improves recognition accuracy, achieving good recognition results on multiple datasets. |
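
The similarity-based frame screening described in (2) can be illustrated with a minimal sketch. The abstract does not state which similarity measure or discrimination rules AFFA uses, so everything below is an assumption made for illustration: similarity is taken as one minus the normalized mean absolute pixel difference, each frame is compared with the most recently kept frame, and the names `frame_similarity`, `select_frames`, and the `threshold` value are hypothetical.

```python
import numpy as np


def frame_similarity(a, b):
    """Similarity in [0, 1] for two 8-bit frames of equal shape.

    Assumed measure (not taken from the thesis): 1 minus the mean
    absolute pixel difference, normalized by the 8-bit range.
    """
    diff = np.abs(a.astype(np.float32) - b.astype(np.float32))
    return 1.0 - float(diff.mean()) / 255.0


def select_frames(frames, threshold=0.98):
    """Return the indices of frames kept after dropping near-duplicates.

    A frame is kept only if its similarity to the most recently kept
    frame is below `threshold` (a hypothetical rule and value).
    """
    if len(frames) == 0:
        return []
    kept = [0]  # always keep the first frame as the initial reference
    for i in range(1, len(frames)):
        if frame_similarity(frames[kept[-1]], frames[i]) < threshold:
            kept.append(i)
    return kept
```

Comparing against the last kept frame, instead of the immediately preceding one, is a design choice in this sketch: it prevents a slow, gradual motion from being discarded entirely because each individual step looks nearly identical to the previous frame.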