As big data and artificial intelligence continue to advance, speech recognition technology is being integrated into smart homes, vehicle systems, smartphones, and other scenarios thanks to its efficiency, accuracy, and convenience, and it has become an indispensable part of daily life. As a simple and efficient means of human-computer interaction, speech recognition is a hot research topic in the field of machine hearing. Under ideal conditions, current speech recognition systems can match human performance, but under harsh conditions their performance degrades sharply. Applying speech recognition in practical scenarios, such as reconnaissance vehicles in noisy environments and embedded devices with low power consumption and limited computing power, has therefore become a major research trend. Traditional speech recognition mainly relies on isolated-word recognition based on Dynamic Time Warping (DTW), isolated-word recognition based on Hidden Markov Models (HMMs), or keyword recognition with filler templates; these algorithms perform poorly and achieve low recognition rates. With the development of deep learning, speech recognition based on deep neural networks has emerged, greatly improving recognition accuracy and performance. However, deep-neural-network-based speech recognition relies on network resources and high-performance server platforms and cannot run on low-performance hardware with limited computing power; such computing tasks are better suited to the cloud. Yet network bandwidth becomes a bottleneck for applications with strict real-time requirements. This thesis studies a high-performance, high-accuracy speech recognition scheme that combines low-performance, low-computing-power embedded devices with cloud devices in specific noise scenarios. At the same time, to achieve truly hands-free operation, it is necessary to
switch between different modes. The system needs to recognize different types of mode-switching voice commands and distinguish urgent from non-urgent voices; urgent voices have a higher priority and must be handled as soon as possible. High performance and high accuracy require greater computing power, while switching between high- and low-priority voices must meet real-time constraints. This thesis designs and implements a speech recognition system based on cloud-edge collaborative computing. At the edge, a small RNN-based noise reduction front end denoises the voice, and a CGRU-based keyword-spotting model implements voice wake-up and voice-command-controlled mode switching. In the cloud, a priority queue distinguishes high- and low-priority voices, and a thread pool with concurrent processing accelerates speech recognition. The deep neural network model combines a deep convolutional neural network with the connectionist temporal classification criterion (DCNN-CTC). This design not only addresses the low speech recognition accuracy of edge devices but also enables real-time mode switching and avoids recognition delays for high-priority voices in the cloud. Cloud-edge collaboration tightly combines cloud computing and edge computing, making up for the shortcomings of either solution alone.
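The cloud-side scheduling idea, a priority queue ordering urgent utterances ahead of non-urgent ones, drained into a thread pool for concurrent recognition, can be sketched as follows. This is a minimal illustration only: the function `recognize` is a hypothetical placeholder for the thesis's DCNN-CTC recognizer, and the priority levels and worker count are assumed values, not details from the thesis.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

# Lower number = higher priority, so urgent voices are dequeued first.
URGENT, NORMAL = 0, 1

def recognize(audio):
    """Placeholder for the cloud-side DCNN-CTC recognizer."""
    return f"transcript({audio})"

def serve(requests, workers=4):
    """Order requests by priority, then recognize them on a thread pool.

    `requests` is a list of (priority, audio) pairs. A sequence number
    breaks ties so same-priority requests keep FIFO order.
    """
    pq = queue.PriorityQueue()
    for seq, (priority, audio) in enumerate(requests):
        pq.put((priority, seq, audio))

    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submission order follows priority order: urgent work is
        # handed to the pool (and thus started) before normal work.
        while not pq.empty():
            _priority, _seq, audio = pq.get()
            futures.append(pool.submit(recognize, audio))
    # Results come back in submission (i.e. priority) order.
    return [f.result() for f in futures]
```

For example, `serve([(NORMAL, "a"), (URGENT, "b"), (NORMAL, "c")])` submits `"b"` first, then `"a"` and `"c"` in arrival order. A real deployment would feed the queue from a network listener thread rather than a prebuilt list, but the ordering and concurrency mechanism is the same.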