
Millimeter-Wave Radar Chinese Isolated Sign Language Recognition Based On Cross-Modal Supervision Network

Posted on: 2024-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: X Wang
Full Text: PDF
GTID: 2568307052965949
Subject: Circuits and Systems
Abstract/Summary:
Sign language is the primary communication tool for people who are hard of hearing or otherwise unable to communicate verbally. Compared with existing sign language recognition equipment, millimeter-wave radar offers small size, non-contact sensing, independence from lighting conditions, strong anti-interference ability, and all-weather operation, so it has attracted increasing attention. At present, sign language recognition based on millimeter-wave radar mainly suffers from two problems: radar sign language data are difficult to obtain, and recognition accuracy is low. In response, this thesis combines a cross-modal supervision network with a camera-radar joint data acquisition system. By using a camera, whose technology is comparatively mature, in place of manual work, radar sign language data can be automatically intercepted and labeled, effectively easing the difficulty of acquiring and annotating Chinese sign language radar data. At the same time, this thesis combines radar signal processing with deep learning to study Chinese isolated sign language recognition algorithms based on the fusion of range-Doppler and micro-Doppler features. The main work and innovations are as follows:

(1) A sign language recognition method based on radar bimodal feature fusion is proposed.

Sign language movements carry rich semantic information and are complex and variable. Traditional deep learning methods extract features from only a single signal domain and therefore cannot recognize sign language effectively. To address this, the 2D Fast Fourier Transform (2D-FFT) and the Short-Time Fourier Transform (STFT) are applied to the intermediate-frequency (IF) signal of the FMCW radar, yielding a multi-frame Range-Doppler Map (RDM) sequence and a single-frame micro-Doppler Feature Map (mDFM) that respectively capture the spatial and time-frequency characteristics of the sign; interference and noise in the radar echo are also suppressed. This thesis then proposes a new network architecture that fuses the two modal features, RDM and mDFM: a 3D-CNN based on 3D convolution extracts features from the RDM sequence, a DeepNet8 network based on 2D convolution extracts features from the mDFM, and the two are fused at the feature level. The proposed bimodal feature fusion achieves 98% recognition accuracy on eight sign-language words, an 8% improvement in accuracy over the traditional single-domain method.
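The two feature modalities in (1) follow standard FMCW processing, so they can be sketched concretely. Below is a minimal NumPy/SciPy illustration, assuming an IF sample cube of shape (frames, chirps, samples); the array shapes, sampling rate, STFT window, and single-range-bin extraction are illustrative assumptions, not the thesis's actual configuration.

```python
import numpy as np
from scipy.signal import stft

# Minimal sketch: derive the two radar feature modalities described above
# from an FMCW intermediate-frequency (IF) cube.
# if_cube: complex IF samples, shape (n_frames, n_chirps, n_samples)

def range_doppler_maps(if_cube):
    """2D-FFT per frame: fast-time axis -> range, slow-time axis -> Doppler."""
    # FFT over samples (range) and chirps (Doppler) for every frame at once,
    # then center the Doppler axis so zero velocity sits in the middle.
    rdm = np.fft.fftshift(np.fft.fft2(if_cube, axes=(-2, -1)), axes=-2)
    return np.abs(rdm)  # (n_frames, n_chirps, n_samples) magnitude RDMs

def micro_doppler_map(if_cube, fs_slow, range_bin):
    """STFT along slow time at one range bin -> single micro-Doppler map."""
    # Range FFT first, then follow the chosen range bin across all chirps.
    range_profiles = np.fft.fft(if_cube, axis=-1)            # fast-time FFT
    slow_time = range_profiles[:, :, range_bin].reshape(-1)  # concat frames
    f, t, Zxx = stft(slow_time, fs=fs_slow, nperseg=128, noverlap=96,
                     return_onesided=False)                  # complex input
    return np.abs(Zxx)  # time-frequency magnitude (the mDFM)

# Example with synthetic data (32 frames x 64 chirps x 256 samples)
cube = np.random.randn(32, 64, 256) + 1j * np.random.randn(32, 64, 256)
rdms = range_doppler_maps(cube)            # multi-frame RDM input
mdfm = micro_doppler_map(cube, 4000, 10)   # single-frame mDFM input
```

The multi-frame RDM sequence then feeds the 3D-convolution branch, while the single mDFM feeds the 2D-convolution branch before feature-level fusion.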
(2) A sign language recognition method based on a cross-modal supervision network is proposed.

Under the traditional supervised learning paradigm, radar sign language data must be tagged by hand, which makes labeling difficult and labor-intensive. To address this, this thesis proposes a sign language recognition method based on a cross-modal supervision network. The method constructs two recognition networks of different modalities: an R(2+1)D network, built on residual connections and a decomposition of 3D convolution, serves as the "teacher network", while the bimodal feature fusion network described above serves as the "student network". The "teacher network" is pretrained on video sign language data, which is comparatively mature and easy to obtain, so that the "student network" can be trained under its supervision on an unlabeled radar dataset. Once training is complete, the "student network" can be decoupled from the "teacher network" and perform sign language recognition on its own. The resulting cross-modally supervised radar recognition model achieves 95% recognition accuracy on eight sign-language words. Experiments demonstrate that the method effectively overcomes the difficulty of labeling radar sign language data and still yields a highly reliable recognition model while reducing manual effort.

(3) A camera-radar joint data acquisition system based on the cross-modal supervision network was built.

Traditional methods of acquiring radar sign language datasets require manual interception and annotation, which is inflexible and labor-intensive. To address this, this thesis builds a camera-radar joint data acquisition system whose core is the cross-modally supervised sign language recognition algorithm. The teacher channel is expanded into an online real-time sign language recognition system by adding a monocular ranging module and an action detection module. At the same time, a video-radar dual data-stream channel is constructed, and a temporal-spatial alignment module automatically intercepts and labels radar sign language data through the cross-modal supervision network. The system was finally used to acquire radar data for 16 sign-language words, and radar signal processing was performed to obtain mDFM maps to validate acquisition quality. Experiments show that the system effectively reduces the difficulty of acquiring radar sign language data, which has practical significance and social value.
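Both the recognition method in (2) and the acquisition system in (3) rest on the same loop: the video "teacher" supplies labels that supervise the radar "student". The following PyTorch sketch illustrates that loop under stated assumptions: the inputs are paired, time-aligned video clips and radar features, the student shown is only a stand-in for the bimodal fusion network of (1), and all layer sizes and hyper-parameters are placeholders rather than the thesis's actual design.

```python
import torch
import torch.nn as nn
from torchvision.models.video import r2plus1d_18

NUM_SIGNS = 8

# "Teacher": R(2+1)D video network; in practice it would carry weights
# fine-tuned on video sign language data (assumption: weights available).
teacher = r2plus1d_18(weights=None)
teacher.fc = nn.Linear(teacher.fc.in_features, NUM_SIGNS)
teacher.eval()

# "Student": placeholder for the bimodal RDM + mDFM fusion network from (1).
class RadarStudent(nn.Module):
    def __init__(self):
        super().__init__()
        self.rdm_branch = nn.Sequential(   # 3D conv over the RDM sequence
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool3d(1))
        self.mdfm_branch = nn.Sequential(  # 2D conv over the single mDFM
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(32, NUM_SIGNS)  # feature-level fusion

    def forward(self, rdm, mdfm):
        fused = torch.cat([self.rdm_branch(rdm).flatten(1),
                           self.mdfm_branch(mdfm).flatten(1)], dim=1)
        return self.head(fused)

student = RadarStudent()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def train_step(video, rdm, mdfm):
    """Teacher's soft predictions on video supervise the radar student."""
    with torch.no_grad():
        soft_labels = teacher(video).softmax(dim=1)  # no manual labels needed
    logits = student(rdm, mdfm)
    loss = nn.functional.kl_div(logits.log_softmax(dim=1),
                                soft_labels, reduction='batchmean')
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

After training, decoupling the student amounts to calling student(rdm, mdfm) alone, with no video stream or teacher required at inference time.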
Keywords/Search Tags: SLR (Sign Language Recognition), Millimeter-Wave Radar, Cross-Modal Supervised Learning, Multi-Modal Fusion, Convolutional Neural Network