| Sound event detection (SED) is an important technology in audio content analysis and processing. Its goal is to determine the classes of the events that occur in an audio recording and to detect the start and end time of each event. The performance of an SED algorithm has a significant impact on subsequent audio processing and analysis tasks. SED draws on machine learning, pattern recognition, and related fields, and it has a wide range of practical applications, such as security monitoring, smart homes, and multimedia retrieval. SED supports a variety of label formats, each with corresponding learning algorithms. Building on deep learning, this thesis studies SED algorithms based on sequential labels in depth, proposes an improved method, and constructs corresponding datasets to analyze the algorithm's performance. The main contributions of the thesis are: (1) For SED based on sequential labels, a new sequential label format and loss function are proposed. They make use of the accurate sequence information of the events in an audio segment, and the concept of a composite state is introduced to describe this sequential information in clear mathematical terms. The connectionist temporal classification (CTC) algorithm is then modified so that the model can be trained with the proposed composite states and loss function. (2) An audio dataset was collected and constructed that covers a variety of event types, has sufficient active duration, and comes with accurate strong annotations. Multiple datasets with different statistical characteristics were then generated from it using audio processing operations such as mixing, and these were used to analyze how the statistical characteristics of the audio affect the performance of each algorithm. To verify the effectiveness of the proposed algorithm, comparative experiments were conducted on three public datasets: TUT Sound Events Synthetic 2016, TUT Sound Events 2016 Development, and TUT Sound Events 2017. The experimental results show that the algorithm proposed in this thesis achieves better performance and stability than the original SED algorithm based on sequential labels. |
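
As a rough illustration of what a sequential label can look like, the sketch below converts a strong annotation (event class with onset and offset times) into an ordered sequence of boundary tokens. The token naming (`<class>_on`, `<class>_off`), the function name `strong_to_sequential`, and the example events are assumptions made for illustration only; this is not the label format or the composite states proposed in the thesis.

```python
from typing import List, Tuple

# One strong annotation entry: (event class, onset in seconds, offset in seconds).
StrongEvent = Tuple[str, float, float]


def strong_to_sequential(events: List[StrongEvent]) -> List[str]:
    """Turn strongly annotated events into an ordered list of boundary tokens.

    Timestamps are discarded; only the order of onsets and offsets is kept.
    """
    boundaries = []
    for label, onset, offset in events:
        if offset <= onset:
            raise ValueError(f"offset must follow onset for event {label!r}")
        # The middle element is a tie-breaker: 0 for offsets, 1 for onsets.
        boundaries.append((onset, 1, f"{label}_on"))
        boundaries.append((offset, 0, f"{label}_off"))
    # Sort by time; at equal times, offsets are placed before onsets.
    boundaries.sort(key=lambda b: (b[0], b[1]))
    return [token for _, _, token in boundaries]


# Hypothetical clip with two overlapping events.
example = [("dog_bark", 0.5, 2.0), ("car_horn", 1.2, 3.1)]
sequence = strong_to_sequential(example)
print(sequence)
```

A sequence of this kind could serve as the target of a CTC-style loss, which aligns frame-level model outputs with an ordered label sequence without requiring timestamps.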