Gesture recognition is an important area of human-computer interaction that uses computer vision and machine learning techniques to transform human gestures into commands that computers can recognize and understand. Gesture recognition is divided into dynamic gesture recognition and static gesture recognition. Dynamic gesture recognition mainly processes gesture videos and involves technologies such as key gesture tracking and frame extraction. Static gesture recognition recognizes and analyzes gesture commands from features such as hand posture, position, and shape. At present, static gesture recognition still faces challenges such as gesture variation, noise, and background interference; in particular, for gesture images with complex backgrounds, the robustness and real-time performance of existing algorithms need further improvement. In static gesture recognition, feature extraction is a crucial step: it converts the gesture information in the original image into a recognizable feature representation, and the quality of the extracted gesture features directly affects the performance of the final classifier. The emergence of deep learning has brought new opportunities to image recognition, and improving the feature extraction of gesture images with convolutional neural networks can be effective for gesture recognition. Therefore, we study static gesture recognition based on deep learning. The main work is as follows:

(1) To accurately extract key gesture features from gesture images, we propose SK-DC-Res2Net29, a multiscale feature extraction network based on dense connections and Res2Net. Firstly, to enhance the feature extraction capability of the network, the Res2Net module is improved using dense connections and group convolution, and a DC-Res2Net (Densely Connected Res2Net) module is proposed. Next, multiple DC-Res2Net modules are stacked to construct a DC-Res2Net29 network based on Res2Net29, forming the preliminary architecture of the feature extraction network. Then, the SK-Net module is added to DC-Res2Net29 to form the SK-DC-Res2Net29 network, which performs feature selection during feature extraction and enhances the expression of effective features. Finally, this network is used to extract the low-level and high-level features of gesture images.

(2) To optimize and fuse the low-level and high-level gesture features, we design a Feature Fusion Attention (FFA) module. The FFA module focuses on effective gesture features through attention mechanisms and suppresses unimportant redundant information. By feature weighting, it maps high-level semantic features onto low-level positional features, thereby removing redundant information from the low-level features while retaining key information from the high-level features, improving feature expression and achieving the fusion of high-level and low-level features. The proposed model achieves 98.68%, 99.56%, and 98.98% accuracy on the OUHANDS, ASL, and NUS-II datasets, respectively. The experimental results show the advantage of the SK-DC-Res2Net29 network in feature extraction; by resolving the issue of redundant information, the FFA module enables the proposed model to attain a high gesture recognition accuracy.
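To make the fusion idea behind the FFA module more concrete, the following PyTorch-style sketch shows one plausible way to derive channel weights from high-level semantic features and use them to re-weight low-level positional features before fusion. The class name, layer sizes, and the squeeze-and-excitation form of the attention are illustrative assumptions, not the exact FFA design described above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFASketch(nn.Module):
    """Illustrative feature-fusion-attention block (assumed structure, not the exact FFA design).

    High-level semantic features are pooled into channel weights that re-weight the
    low-level positional features before the two streams are fused by addition.
    """
    def __init__(self, low_channels, high_channels):
        super().__init__()
        # Project high-level features to the low-level channel count.
        self.project = nn.Conv2d(high_channels, low_channels, kernel_size=1)
        # Squeeze-and-excitation style channel attention computed from high-level features.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(low_channels, low_channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(low_channels // 4, low_channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, low_feat, high_feat):
        # Upsample high-level features to the spatial size of the low-level features.
        high = self.project(high_feat)
        high = F.interpolate(high, size=low_feat.shape[2:], mode="bilinear", align_corners=False)
        # Channel weights from high-level semantics suppress redundant low-level channels.
        weights = self.attn(high)
        return high + weights * low_feat

# Usage with dummy feature maps (low-level 64ch @ 56x56, high-level 256ch @ 14x14).
low = torch.randn(1, 64, 56, 56)
high = torch.randn(1, 256, 14, 14)
fused = FFASketch(64, 256)(low, high)
print(fused.shape)  # torch.Size([1, 64, 56, 56])
```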
(3) To address the low accuracy of lightweight models in gesture recognition, this article improves the lightweight GhostNet model. Firstly, to enhance the information exchange between feature channels, channel shuffle is used to optimize the Ghost module, and a channel shuffle Ghost module is proposed to improve the feature extraction ability of the model. Subsequently, to expedite the learning of gesture features by the lightweight model during training, the ReLU function is replaced with the SMU (Smooth Maximum Unit) activation function, enhancing feature learning during back-propagation. Then, to remove noise information from the gesture features, the lightweight ECA channel attention module is added to the model. Finally, experiments were conducted on the OUHANDS, ASL, and NUS-II datasets, achieving accuracies of 97.98%, 98.82%, and 99.36%, respectively. The proposed model has 1.2 M parameters and 0.29 G FLOPs, both lower than those of the other models in the experiments. The empirical findings indicate that the proposed method attains a high gesture recognition accuracy while preserving the model's lightweight attributes.
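As a concrete illustration of the channel shuffle Ghost idea, the sketch below shows a minimal GhostNet-style Ghost module with a channel shuffle applied after the primary and cheap feature maps are concatenated. The layer sizes, expansion ratio, and shuffle group count are assumptions for illustration; ReLU is kept here as a stand-in where the proposed model uses SMU.

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups):
    """Interleave channels across groups so later layers mix information from both branches."""
    b, c, h, w = x.shape
    x = x.view(b, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(b, c, h, w)

class ShuffleGhostModule(nn.Module):
    """Illustrative Ghost module with channel shuffle (assumed structure, not the exact design).

    A small primary convolution produces part of the output channels; a cheap depthwise
    convolution generates the remaining 'ghost' features; channel shuffle then exchanges
    information between the two channel groups.
    """
    def __init__(self, in_channels, out_channels, ratio=2):
        super().__init__()
        primary_channels = out_channels // ratio
        cheap_channels = out_channels - primary_channels
        self.primary = nn.Sequential(
            nn.Conv2d(in_channels, primary_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(primary_channels),
            nn.ReLU(inplace=True),  # the proposed model substitutes the SMU activation here
        )
        self.cheap = nn.Sequential(
            nn.Conv2d(primary_channels, cheap_channels, kernel_size=3, padding=1,
                      groups=primary_channels, bias=False),  # depthwise "cheap" operation
            nn.BatchNorm2d(cheap_channels),
            nn.ReLU(inplace=True),  # likewise SMU in the proposed model
        )

    def forward(self, x):
        primary = self.primary(x)
        ghost = self.cheap(primary)
        out = torch.cat([primary, ghost], dim=1)
        return channel_shuffle(out, groups=2)

# Usage with a dummy input.
y = ShuffleGhostModule(16, 32)(torch.randn(1, 16, 32, 32))
print(y.shape)  # torch.Size([1, 32, 32, 32])
```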