| As an important part of natural ecosystems, bird species provide a key basis for understanding changes in regional biodiversity, climate, and the environment. Birdsong is a distinctive individual characteristic of birds and has been widely used in research on bird species identification. In recent years, with the rapid development of signal processing and sound recognition technology, bird species monitoring based on birdsong recognition has shown great application prospects owing to its low cost, wide detection range, and few restrictions. Against this background, and motivated by the needs of bird species monitoring and analysis, this paper addresses three problems: reliance on a single feature extraction method, insufficient use of feature information, and the limited amount of birdsong data collected in field monitoring. The main work of this paper is as follows: (1) To address the low classification accuracy that traditional birdsong recognition algorithms suffer because of a single feature extraction method and insufficient use of feature information, a birdsong recognition method combining a Transformer network with a convolutional neural network is proposed. The convolutional neural network, which is strong at capturing local features, extracts the time-domain and frequency-domain information in the spectrogram feature, while the multi-head attention mechanism in the Transformer encoder network, which relates contextual information, extracts the global temporal information in the Mel differential feature. Finally, the local and global features are fused and fed into a Softmax classifier to obtain the recognition results. Experiments on the Birdsdata dataset and the Xeno-canto database achieved highest accuracies of 97.81% and 89.47%, respectively. The results show that the birdsong feature parameters obtained after feature fusion achieve better results 
in the birdsong recognition test. (2) To address the uneven distribution of birdsong audio data collected in field monitoring and the low recognition accuracy caused by over-fitting during neural network training, a small-sample-optimized birdsong recognition method based on a bridging Transformer is proposed. The model takes the Mel spectrogram as its input feature. Building on (1), an attention module and a convolution module are combined into a bridging Transformer module, which preserves the interaction between the model's local and global information while optimizing the model's overall complexity. Finally, the cross-attention mechanism of the sample loss optimization module models the relationships among the output features and performs internal sample expansion. The method was tested on the Birdsdata dataset and the Xeno-canto database after small-sample processing, achieving highest accuracies of 91.34% and 82.63%, respectively. The experimental results show that the model improves birdsong recognition in small-sample settings and raises recognition efficiency. |
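
The fusion step described in (1), where a local feature and a global feature are combined and passed to a Softmax classifier, can be illustrated with a minimal sketch. The abstract does not state how the features are fused, so plain concatenation followed by one linear layer is assumed here. The function name `fuse_and_classify`, the vector sizes, and all weight values are hypothetical toy choices, not the paper's trained model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(local_feat, global_feat, weights, bias):
    """Concatenate two feature vectors, apply a linear layer, then softmax.

    Assumption: fusion is simple concatenation; the abstract only says
    that the local and global features are fused.
    """
    fused = local_feat + global_feat  # list concatenation
    if any(len(row) != len(fused) for row in weights):
        raise ValueError("each weight row must match the fused feature length")
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Toy stand-ins for the two branch outputs (hypothetical values).
local_feat = [0.2, 0.8]        # e.g. pooled output of the CNN branch
global_feat = [0.5, 0.1, 0.4]  # e.g. pooled output of the Transformer encoder branch

# One weight row per class (three classes here), hypothetical values.
weights = [
    [1.0, 0.0, 0.5, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]
bias = [0.0, 0.0, 0.0]

probs = fuse_and_classify(local_feat, global_feat, weights, bias)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted)
```

In a real system the two input vectors would come from trained networks and the linear layer would be learned jointly with them; the sketch only shows how the two feature streams meet before classification.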