| Alongside the visual system, the auditory system is one of the most important means by which humans perceive their environment. It receives external sound signals, and the brain processes them to extract the information they carry. The sounds encountered in daily life can be roughly divided into three categories: speech, music, and environmental sounds. Among these, environmental sounds carry a wide range of information: they may characterize a specific scene, such as the sound of rain indicating a rainy day, or accompany a specific action, such as the gunshots produced when a firearm is discharged. It is therefore of great significance to fully mine the information contained in environmental sounds and use it effectively to monitor the ecological environment, maintain safety in public places, and help hearing-impaired people perceive their surroundings. With the advent of the big-data era and the maturing of software and hardware for artificial intelligence, deep learning methods for environmental sound recognition have increasingly been favored by researchers. However, many aspects of the environmental sound recognition problem remain to be solved and optimized. First, even under noise-free conditions, environmental sound lacks the rich structural features of speech and music and instead contains many non-stationary, unstructured features. It also exhibits large intra-class differences and small inter-class differences; that is, audio clips belonging to the same category may differ considerably in their time-domain, frequency-domain, and time-frequency-domain characteristics, which makes environmental sound recognition very difficult. Second, the environmental sound audio data used in most existing studies is relatively clean, with weak noise, so these methods
can generally achieve good recognition results, but there are relatively few studies on environmental sound recognition under noisy conditions, and many issues there remain worth investigating. Third, apart from those built by technology companies, few systems exist that give environmental sound recognition researchers a practical application platform, and most research still leans toward theoretical simulation and verification on a computer. In summary, this thesis focuses on the following three aspects: (1) Under noise-free conditions, the complex and variable time-frequency characteristics of environmental sounds make it difficult to extract key salient features accurately; to address this, an environmental sound recognition method based on an improved compact bilinear network is studied. The compact bilinear network architecture is kept unchanged, while a densely connected convolutional network is used for feature extraction, the dimensionality-reduction mapping function is replaced, and a collaborative attention module is introduced that jointly weights and enhances key salient features along the horizontal and vertical directions. Tests on public environmental sound recognition datasets and on real-world audio datasets demonstrate the effectiveness of the proposed method. Furthermore, the effect of the collaborative attention module is shown intuitively by visualizing the regions that contribute most to the model's decisions. (2) Under noisy conditions, noise interference degrades recognition performance; to address this, an environmental sound recognition method is studied that is based on a network combining a deep residual shrinkage module with channel-wise thresholds and a bidirectional long short-term memory module. Noise is added to the public environmental sound recognition dataset, and eight noisy datasets
with different signal-to-noise ratios are obtained. Two aspects are mainly studied: the impact of different noise levels on the feature distribution of the environmental sound recognition dataset, and their impact on model recognition performance. The results show that, regarding feature distribution, a moderate amount of noise can make the feature distribution of the dataset relatively loose and enhance separability; regarding recognition performance, the model's performance shows an overall downward trend as the noise intensity gradually increases, with some fluctuations along the way. (3) An environmental sound recognition system is designed and implemented. The front end is built with the Vue framework; MySQL is used for data management; and the back end uses Python, TensorFlow, and Django, a free and open-source web framework written in Python. The system provides five functions: user registration and login, audio feature visualization, model training and optimization, mode switching, and display of recognition results. The system is easy to operate and provides a reference for the practical application of environmental sound recognition models. |
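
The construction of noisy datasets at several signal-to-noise ratios, mentioned in aspect (2), can be illustrated with a minimal sketch. The abstract does not state the noise type or the SNR values used, so the white Gaussian noise, the synthetic 440 Hz test tone, the eight SNR levels, and the function names `add_noise_at_snr` and `measured_snr_db` below are all assumptions made for illustration, not the procedure used in the thesis.

```python
import numpy as np


def add_noise_at_snr(clean, snr_db, rng):
    """Return `clean` plus white Gaussian noise scaled to a target SNR in dB.

    Assumes `clean` is a non-silent 1-D signal (non-zero power).
    """
    clean = np.asarray(clean, dtype=np.float64)
    signal_power = np.mean(clean ** 2)
    noise = rng.standard_normal(clean.shape)
    noise_power = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise), so solve for the noise power.
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = noise * np.sqrt(target_noise_power / noise_power)
    return clean + noise


def measured_snr_db(clean, noisy):
    """Compute the SNR in dB that was actually realized in `noisy`."""
    noise = noisy - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))


# Hypothetical stand-in for one audio clip: 1 s of a 440 Hz tone at 16 kHz.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2.0 * np.pi * 440.0 * t)

# Hypothetical SNR levels; the thesis reports eight but does not list them here.
snr_levels_db = [-10, -5, 0, 5, 10, 15, 20, 25]
noisy_versions = {snr: add_noise_at_snr(clean, snr, rng) for snr in snr_levels_db}

for snr in snr_levels_db:
    realized = measured_snr_db(clean, noisy_versions[snr])
    print(f"target {snr:>4} dB, realized {realized:.2f} dB")
```

Scaling the noise by its measured power, rather than by its nominal variance, makes each clip hit the target SNR exactly. In practice, recorded background noise could be substituted for the Gaussian samples without changing the scaling step.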