| In sound event detection research, good detection results require a large number of manually annotated strong labels, which specify both the type of each sound event and its exact time of occurrence. However, strong labeling is time-consuming and labor-intensive. As a result, publicly available strongly labeled datasets generally contain only about 2 hours of audio and cover few audio events and event classes. To reduce the dependence on manual labeling and overcome the limitations caused by insufficient frame-level strong labels, this paper explores, based on semi-supervised learning, how to effectively learn and infer information about acoustic events from insufficient and inaccurate labels. In this research, the missing information about sound events falls into three cases: ① the time information of the events is missing, but their sequence information is available; ② the time and sequence information are missing, but the type information is clear; ③ the time and sequence information are missing, and even the type information is uncertain. To classify and detect sound events effectively when event information is insufficient, this paper explores a processing framework for insufficient and inaccurate sound event information. When the labels lack time information but retain sequence information, this paper proposes a detection scheme based on sequential labels, which uses sequence learning to mine the time information of events from their sequence information. When both time and sequence information are missing but class information is available, this paper proposes a large-scale weak label detection scheme, which combines semi-supervised learning and weak supervision to learn and infer hidden time information from data that carry no event timing information. Finally, when the reliability of the labels is low, this paper proposes a detection scheme based on probabilistic labels, which aims to infer the underlying true labels from unreliable, noisy labels using weak supervision alone. In this paper, the labels that carry inaccurate audio event information (sequential labels, weak labels, and probabilistic labels) are collectively referred to as fuzzy labels. Fuzzy labels are intended to reduce the labeling workload and the dependence on manual labeling in acoustic event detection research. Experimental results show that audio tagging and sound event detection based on fuzzy labels are feasible and that, in some respects, models based on fuzzy labels perform better than models based on strong labels. The adoption of fuzzy labels will therefore greatly alleviate the constraints that the lack of strongly labeled data places on research and development in practical applications. Based on semi-supervised learning, this paper carries out a series of explorations on audio tagging and acoustic event detection. This work not only shows that audio tagging and sound event detection based on fuzzy labels are feasible, but also explores an approach different from strong labels, giving researchers in audio tagging and sound event detection more options. |
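
To make the distinction between the label types concrete, the following minimal sketch shows how sequential, weak, and probabilistic labels can each be seen as a strong label with progressively less information. It is an illustration only, not the paper's implementation: the function names, the event names, the timestamps, and the confidence values are all hypothetical.

```python
from typing import Dict, List, Tuple

# A strong label lists (event_type, onset_seconds, offset_seconds) per event.
# The events and times below are made-up illustrative values.
StrongLabel = List[Tuple[str, float, float]]

def to_sequential(strong: StrongLabel) -> List[str]:
    """Keep the order of events (sorted by onset) but drop the timestamps."""
    return [name for name, _onset, _offset in sorted(strong, key=lambda e: e[1])]

def to_weak(strong: StrongLabel) -> List[str]:
    """Keep only which event types occur; drop timestamps and order."""
    return sorted({name for name, _onset, _offset in strong})

def to_probabilistic(weak: List[str], confidence: Dict[str, float]) -> Dict[str, float]:
    """Attach a (hypothetical) confidence to each type; unlisted types default to 1.0."""
    return {name: confidence.get(name, 1.0) for name in weak}

strong = [("dog_bark", 0.5, 1.2), ("speech", 2.0, 4.5), ("dog_bark", 5.0, 5.6)]

sequential = to_sequential(strong)   # order of events, no times
weak = to_weak(strong)               # clip-level event types only
probabilistic = to_probabilistic(weak, {"speech": 0.6})  # uncertain types

print(sequential)
print(weak)
print(probabilistic)
```

In practice the fuzzy labels would be annotated directly, not derived from strong labels; the derivation here only shows which information each label type keeps and which it discards.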