Object detection, as an important technique in computer vision, has been widely applied in somatosensory interaction. However, single-modal object detection using only RGB images has clear limitations: it is susceptible to environmental factors such as lighting, shadows, occlusion, and background clutter, leading to suboptimal detection results. Somatosensory-interaction applications demand not only high detection accuracy but also a precise description of the target's shape and spatial position. This paper therefore proposes a multimodal object detection method based on RGB-D images and designs a dual-stream asymmetric object detection algorithm that outputs detection boxes for hands and faces. Taking the positional relationship of these detection boxes as input, a somatosensory-based target system is designed, and a complete system is implemented on the basis of the two algorithms.

Firstly, a dual-stream asymmetric RGB-D object detection algorithm is proposed. After the depth images are preprocessed, a dual-stream asymmetric feature extraction network is designed that adopts a different backbone for each of the RGB and depth streams, and an attention mechanism is introduced to balance the commonalities and differences between the two modalities while reducing network complexity. A channel-adaptive fusion module, built on the channel attention mechanism, adaptively learns to adjust channel feature responses, achieving efficient fusion of RGB-D multimodal features. Experiments on the Hand Gesture dataset and a custom dataset verify the accuracy and real-time performance of the algorithm.

Secondly, for the somatosensory-based target system, a throwing accuracy judgment algorithm, a throwing speed judgment algorithm, and a scoring algorithm are designed, collectively referred to as the somatosensory-based target system algorithm. After obtaining
the detection box coordinates of the face and hand at the end of the throw from the RGB-D object detection results, the depth of each box's center point is extracted first. The movement speed of the hand's center point at the end of the throw is then calculated and compared with the minimum throwing speed to determine whether the throw is off-target. Next, the distance between the two center points is calculated, and throwing accuracy is judged from this distance. A difficulty factor is defined according to the throwing distance, and the final score is obtained by multiplying the difficulty factor by the number of rings scored on the throw. Comparison experiments against real throwing situations verify the effectiveness of the somatosensory-based target system algorithm.

Finally, a requirement analysis of the somatosensory-based target system is performed, and the system modules, including image preprocessing, object detection, and throwing judgment, are designed. The system's UI is also designed, and all system functions are tested. The results show that the designed somatosensory-based target system accomplishes its tasks effectively and demonstrates the practical application value of RGB-D multimodal object detection in the field of somatosensory interaction.
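To make the channel-adaptive fusion idea concrete, the following is a minimal sketch of how channel attention can reweight and merge RGB and depth feature maps. It assumes global average pooling followed by a sigmoid gate per modality; the function name, shapes, and gating form are illustrative assumptions, not the thesis's actual module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_adaptive_fusion(rgb_feat, depth_feat):
    """Fuse (C, H, W) feature maps from the RGB and depth streams.

    Each modality's channel weights come from its own global-average-pooled
    descriptor; the two weight vectors are normalized so that, per channel,
    the RGB and depth contributions sum to one (adaptive balance).
    """
    # Global average pooling: (C, H, W) -> (C,)
    rgb_desc = rgb_feat.mean(axis=(1, 2))
    depth_desc = depth_feat.mean(axis=(1, 2))

    # Channel attention scores for each modality.
    rgb_w = sigmoid(rgb_desc)
    depth_w = sigmoid(depth_desc)

    # Normalize per channel so the two weights sum to 1.
    total = rgb_w + depth_w + 1e-8
    rgb_w, depth_w = rgb_w / total, depth_w / total

    # Broadcast (C,) weights over the spatial dimensions and fuse.
    return (rgb_w[:, None, None] * rgb_feat
            + depth_w[:, None, None] * depth_feat)

fused = channel_adaptive_fusion(np.ones((8, 4, 4)), np.zeros((8, 4, 4)))
print(fused.shape)  # (8, 4, 4)
```

In a real network the pooled descriptors would pass through a small learned bottleneck (as in squeeze-and-excitation-style attention) rather than a bare sigmoid; the sketch keeps only the reweighting structure.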
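The judgment and scoring logic described above can be sketched as follows. The speed threshold, the linear difficulty-factor formula, and all names are illustrative assumptions (the abstract does not give the thesis's actual values), and the (x, y, z) centers are assumed to have already been recovered from each detection box's center pixel and its depth.

```python
import math

# Assumed minimum valid throwing speed (m/s); the thesis's actual
# threshold is not stated in the abstract.
MIN_THROW_SPEED = 0.8

def hand_speed(p_prev, p_curr, dt):
    """Speed of the hand's 3D center point (metres) across dt seconds."""
    return math.dist(p_prev, p_curr) / dt

def judge_throw(speed, hand_center, face_center, rings):
    """Return (valid, score) for one throw, per the logic in the abstract."""
    # A throw below the minimum speed is judged off-target.
    if speed < MIN_THROW_SPEED:
        return (False, 0.0)
    # Throwing distance: distance between the hand and face center points.
    throw_dist = math.dist(hand_center, face_center)
    # Difficulty factor grows with throwing distance (illustrative linear form).
    difficulty = 1.0 + 0.5 * throw_dist
    # Final score: difficulty factor multiplied by the rings scored.
    return (True, difficulty * rings)

v = hand_speed((0.0, 0.0, 2.0), (0.0, 0.3, 2.0), 0.1)
print(judge_throw(v, (0.0, 0.3, 2.0), (0.0, 0.7, 2.0), rings=3))
```

A slow release returns `(False, 0.0)` immediately, so the scoring branch only runs for valid throws; the difficulty factor then rewards longer throwing distances.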