| Gesture recognition uses computer technology to capture and analyze the movements and gestures of human hands and convert them into corresponding computer information. With the rapid development of neural networks in machine vision, many gesture recognition applications based on neural network algorithms have emerged. However, neural network models have large parameter counts and computational costs, which makes them difficult to run effectively on embedded devices with limited computing resources. For this reason, some researchers have designed or deployed lightweight neural networks. A lightweight neural network has fewer parameters and computations while preserving the expressiveness of the model, which makes it possible to run neural network applications on embedded devices. However, in object or keypoint detection tasks, there is inevitably a gap between the detection accuracy of some lightweight neural networks and that of large neural networks, so the detection accuracy of lightweight networks needs further improvement. Meanwhile, researchers often choose specific hardware platforms for deploying neural networks, and the inference process relies heavily on the hardware environment specific to those platforms. As a result, the algorithms are difficult to port to other embedded platforms, and the high price of such hardware limits the adoption of neural network applications. To address these problems, this paper builds on two existing lightweight neural networks, improves them mainly in terms of the prediction method, and proposes a real-time, general-purpose gesture recognition scheme for embedded platforms. Specifically, the main research contents are as follows: (1) To address the defects of the Anchor-Based prediction method in the YOLO-Fastest V2 network, a lightweight hand detection network based on 
Anchor-Free prediction is studied. This network is responsible for locating the hand in the RGB camera image; it has only 0.227 M parameters and 0.222 giga floating-point operations (GFLOPs). The Anchor-Free method effectively alleviates the negative impact of introducing anchor boxes in the Anchor-Based method, and simple optimal transport assignment is adopted as the positive and negative sample matching strategy during training. A dedicated dataset, Lite Gesture Dataset, was established for gesture interaction scenarios, and the network was trained using Mosaic data augmentation. Experiments show that the improved network has slightly fewer parameters and slightly lower computational cost while achieving a significant improvement in mean average precision (mAP) on Lite Gesture Dataset. (2) The heatmap-based keypoint prediction method in the lightweight high-resolution network produces large quantization errors under low-resolution input, which degrades accuracy. To solve this problem, a lightweight hand keypoint detection network based on simple coordinate classification is studied. The network is responsible for locating 21 hand keypoints in the hand region, with only 0.955 M parameters and 0.111 GFLOPs. Two additional fully connected layers are added after the heatmap output layer, each pixel of the input image is divided into several sub-pixels along the horizontal and vertical axes, and the keypoint localization task is recast as a classification task over sub-pixel coordinates. The horizontal and vertical coordinates of each keypoint are located at the sub-pixel level according to the maximum classification probability, which compensates for the accuracy loss caused by the reduced resolution. The experimental results confirm that the improved scheme significantly improves the hand keypoint localization accuracy of the original network. (3) In response to the issue that researchers currently use specific hardware platforms in the deployment 
of neural network applications while ignoring the generality and portability of software, a gesture recognition scheme for embedded platforms with high generality and portability is proposed. The NCNN inference framework is used to convert the neural network models and deploy them on the Rock pi 3A embedded platform equipped with the domestically produced RK3568 SoC (System on a Chip). Using only the CPU for inference, the inference time is 23.85 ms for the hand detection network and 31.35 ms for the hand keypoint detection network, meeting high real-time requirements. The gesture recognition scheme consists of the hand detection network and the hand keypoint detection network. It recognizes static gestures from the hand keypoints, recognizes dynamic gestures from the trajectories of a subset of keypoints using the dynamic time warping algorithm, and triggers different human-computer interaction functions according to the recognition results. Moreover, the scheme does not rely on specific hardware and can be easily deployed to other hardware platforms through different compilation methods. |
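
The dynamic gesture step described in (3) matches keypoint trajectories with dynamic time warping (DTW). The abstract does not give the matching procedure, so the following is only a minimal sketch of textbook DTW with nearest-template classification. The function names, the template names (`swipe_right`, `swipe_up`), the use of a single 2D keypoint per frame, and the Euclidean point distance are all assumptions made for illustration, not the paper's implementation.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW cost between two sequences of 2D points.

    Assumption: each element is an (x, y) position of one tracked
    keypoint in one frame; point distance is Euclidean.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: step in a, step in b, step in both
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


def classify_trajectory(trajectory, templates):
    """Return the name of the template with the smallest DTW cost."""
    return min(templates,
               key=lambda name: dtw_distance(trajectory, templates[name]))


# Hypothetical templates, for illustration only
TEMPLATES = {
    "swipe_right": [(0, 0), (1, 0), (2, 0), (3, 0)],
    "swipe_up": [(0, 0), (0, 1), (0, 2), (0, 3)],
}

# A slightly noisy rightward motion sampled at a different rate
observed = [(0, 0), (0.5, 0.1), (1.5, 0), (2, 0.1), (3, 0)]
label = classify_trajectory(observed, TEMPLATES)
```

Because DTW aligns sequences non-linearly in time, the same gesture performed faster or slower still matches its template, which is why it suits trajectories captured at varying frame rates. The quadratic table is acceptable for short trajectories; a windowed variant would be a natural refinement on constrained hardware.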