| Detecting faces and recognizing the emotions conveyed by facial expressions through visual sensors, and identifying the emotions conveyed by human gestures from extracted skeleton points, are important ways for intelligent robots to carry out human-computer interaction. High-accuracy facial emotion perception and human gesture emotion perception methods are therefore key to effective human-computer interaction. In practical applications, however, the face or body may be partially occluded or may not change for a short time, which can easily cause emotion perception based on facial expression alone or on human gesture alone to fail. It is therefore significant to further explore dual-modal emotion fusion perception methods that combine expression and human gesture. This project aims to explore high-accuracy expression emotion perception and human gesture emotion perception methods, together with dual-modal emotion fusion perception methods for expression and human gesture, and carries out in-depth research on four aspects: face detection; expression emotion perception; skeleton point extraction and human gesture emotion perception; and bimodal emotion fusion perception of facial expression and human gesture. The main contents are as follows. (1) Face detection based on YOLOv5. The YOLO series and R-CNN series face detection algorithms are compared. The network structure and optimization method of the YOLOv5 object detection algorithm are analyzed in detail, and the algorithm is migrated to the face detection task. The comparative experimental results show that the face detection algorithm based on YOLOv5 achieves higher accuracy and better real-time performance. (2) Construction of a ghost asymmetric residual attention ResNet53 network and expression emotion perception. ResNet50 loses part of its feature information after dimensionality reduction by 1×1 convolution, which lowers the accuracy of the model. To address this problem, depthwise separable convolution and the Ghost module are introduced to replace the 3×3 convolution and the 1×1 convolution in the bottleneck structure, respectively, and an asymmetric residual attention module is added after the improved bottleneck structure to improve the feature representation ability of the network. The result is a ghost asymmetric residual attention ResNet53 network for expression emotion perception. The comparative experimental results show that the expression emotion perception method proposed in this paper significantly improves recognition accuracy and computational efficiency. (3) Construction of the PSA_DST Simple Baseline network for skeleton point extraction and human gesture emotion perception. The Simple Baseline network cannot effectively use the spatial information of the feature map during down-sampling, and the model has a large number of parameters and a high computational cost. To address these problems, a new bottleneck structure, GPSAneck, is constructed by introducing the PSA attention mechanism and the Ghost module, and it replaces the Bottleneck in the original network; the three transposed convolutions in the up-sampling stage are replaced by depthwise separable transposed convolution modules. The resulting PSA_DST Simple Baseline network is proposed for human skeleton point extraction. On this basis, human gesture emotion perception is realized in combination with the PoseC3D network. The comparative experimental results show that the human gesture emotion perception method based on the PSA_DST Simple Baseline model and the PoseC3D model achieves high recognition accuracy with a low computational cost. (4) Bimodal emotion fusion perception of expression and human gesture. In view of the differences and complementarity between expression and human gesture, three fusion methods are compared: data-level fusion, decision-level fusion and feature-level fusion. Combined with the experimental results under each single modality, a dual-modal emotion fusion perception model of expression and human gesture based on weighted summation at the decision level is established. The experimental results show that the proposed dual-modal emotion fusion perception method of expression and human gesture achieves high accuracy. |
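
The weighted summation at the decision level described in item (4) can be illustrated with a minimal sketch. It assumes that each single-modality model outputs a probability vector over the same emotion classes. The function names, the class labels, the example probabilities and the weight of 0.6 are all hypothetical; the abstract does not state the weights or the label set actually used.

```python
from typing import List, Sequence


def fuse_decisions(
    face_probs: Sequence[float],
    gesture_probs: Sequence[float],
    face_weight: float = 0.6,  # hypothetical value, not taken from the source
) -> List[float]:
    """Decision-level fusion: weighted sum of two per-class probability vectors."""
    if len(face_probs) != len(gesture_probs):
        raise ValueError("both modalities must score the same set of classes")
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must lie in [0, 1]")
    gesture_weight = 1.0 - face_weight
    return [
        face_weight * f + gesture_weight * g
        for f, g in zip(face_probs, gesture_probs)
    ]


def predict_label(fused: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label whose fused score is highest."""
    best = max(range(len(fused)), key=lambda i: fused[i])
    return labels[best]


# Hypothetical three-class example; real label sets depend on the dataset.
LABELS = ["happy", "sad", "angry"]
face = [0.5, 0.3, 0.2]      # assumed output of the expression model
gesture = [0.2, 0.7, 0.1]   # assumed output of the gesture model

fused = fuse_decisions(face, gesture, face_weight=0.6)
print(predict_label(fused, LABELS))  # prints "sad"
```

Because the two weights sum to one, the fused vector remains a probability distribution. Lowering `face_weight` is one simple way to reduce the influence of the expression branch when the face is occluded, although whether the proposed model adapts its weights in this way is not stated in the abstract.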