| Sign language uses hand and body movements to convey information. It is the main means of communication for deaf and hearing-impaired people. To help people understand sign language and to facilitate the daily life of deaf people, sign language recognition technology has received extensive attention. The study of sign language recognition (SLR) can not only advance computer vision and artificial intelligence technology, but also improve the level of intelligent service and humanistic care in society, promoting its harmonious progress. SLR includes isolated SLR (ISLR) and continuous SLR (CSLR). CSLR is the ultimate goal and the key to practical application. Since sign language semantics depend only on gestures and actions, skeleton data, which effectively excludes the influence of background, illumination, and camera angle, has played an increasingly important role in SLR research in recent years. The non-Euclidean nature of skeleton data also calls for the introduction of graph structures. Therefore, this thesis focuses on CSLR methods based on graph attention. The main contents and contributions are as follows: (1) To address the problems of complex backgrounds and the difficulty of extracting hand details in sign language applications, a CSLR method based on a spatial-temporal graph attention network is proposed. In the data preprocessing stage, a pose estimation algorithm is used to extract the human skeleton nodes. A spatial-temporal graph attention network, composed of a graph attention network and a temporal convolution network, is then designed so that the model pays more attention to the detailed features of the hands and to temporal relationships. In the sequence learning stage, a bidirectional LSTM (BLSTM) is used to learn the long-term context dependence across all frames, and connectionist temporal classification (CTC) is used to align input and output, avoiding temporal segmentation. Experimental results on the public sign language datasets CSL and ConGD show that the proposed method can extract the spatial-temporal
features of skeleton data well and thereby realize CSLR. (2) To address the problems of multi-modal fusion efficiency, interaction between skeleton nodes, and redundant information, a two-stream SLR method based on an interactive attention mechanism and an improved graph convolutional network is proposed. To compensate for the sensitivity of RGB data to illumination and camera angle, skeleton features are used as queries in the multi-head attention mechanism, and RGB features are used as keys and values, explicitly constructing and detecting the important dependencies between the features of the two streams. A cascaded attention shift graph convolutional network is proposed to extract skeleton features. The features of the nodes of both hands are shifted by a hand shift decoupling graph convolution network to optimize the spatial features extracted from the skeleton data, and the decoupling mechanism is used to increase the representation ability of the network. To eliminate redundancy, cascaded spatial, temporal, and channel attention modules make the model pay more attention to the information that matters for semantic recognition. The recognition accuracy on the CSL dataset is higher than that of the first method, and competitive results are obtained on the public dataset PHOENIX 2014. Extensive ablation experiments show that this method can effectively capture the essential features of sign language. |
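
The interactive attention described in contribution (2), with skeleton features as queries and RGB features as keys and values, can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes a single attention head instead of multiple heads, omits the learned projection matrices, and uses toy two-dimensional feature vectors. The names `cross_attention`, `skeleton_feats`, and `rgb_feats` are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(skeleton_feats, rgb_feats):
    """Single-head scaled dot-product cross-attention (illustrative only).

    skeleton_feats: list of query vectors, one per skeleton frame.
    rgb_feats: list of vectors used both as keys and as values.
    Learned query/key/value projections are omitted (an assumption made
    to keep the sketch short).
    """
    dim = len(skeleton_feats[0])
    fused, weights = [], []
    for query in skeleton_feats:
        # Similarity of this skeleton query to every RGB key.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in rgb_feats
        ]
        attn = softmax(scores)
        # Attention-weighted sum of the RGB values.
        out = [
            sum(a * value[c] for a, value in zip(attn, rgb_feats))
            for c in range(dim)
        ]
        fused.append(out)
        weights.append(attn)
    return fused, weights


# Toy inputs: 2 skeleton frames and 3 RGB frames, feature dimension 2.
skeleton = [[1.0, 0.0], [0.0, 1.0]]
rgb = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
fused, weights = cross_attention(skeleton, rgb)
```

Each row of `weights` indicates how strongly one skeleton frame attends to each RGB frame, which is one way to read the "dependency relationship between the features of two streams" mentioned in the abstract.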