| Speaker change point detection addresses the question of "who speaks when": it marks the time points at which the speaker changes, based on the individual characteristics of different speakers. It has a wide range of applications, including telephone recordings, broadcast news, and meetings, and is commonly used as a front end for systems such as automatic speech recognition and speaker recognition. A complete speaker change point detection system consists mainly of two modules, speaker embedding and clustering, which jointly determine its performance. With the continuing development of deep learning, many new speaker embedding and clustering methods have been applied to these two modules and have achieved strong performance. This paper focuses on these two modules and optimizes an ECAPA-TDNN-based speaker change point detection system according to the requirements of downstream tasks, designing an online system and an offline system: 1. For the clustering module, an unbounded interleaved-state recurrent neural network (UIS-RNN) replaces traditional clustering. Unlike traditional clustering methods, this approach makes full use of label information through supervised training and does not require the number of speakers in a session to be known in advance. The resulting system operates online, which makes it suitable as the front end of a real-time system, and it outperforms existing online speaker change point detection systems on CALLHOME. 2. For the speaker embedding module, each conversation is represented as a graph whose nodes are the speaker embeddings extracted by ECAPA-TDNN and whose edges are determined by the similarity between those embeddings. New speaker embeddings are learned on this graph through supervised learning, and the speaker change point detection results are then obtained by clustering. The system achieves state-of-the-art offline speaker change point detection results on the AMI dataset. |
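
The graph construction in the second contribution can be illustrated with a minimal sketch. It assumes cosine similarity and a fixed threshold for deciding whether two nodes are connected; the abstract specifies neither, so the function name `build_similarity_graph`, the `threshold` value, and the toy two-dimensional vectors (standing in for ECAPA-TDNN embeddings) are all hypothetical.

```python
import numpy as np


def build_similarity_graph(embeddings: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Return a binary adjacency matrix linking segments with similar embeddings.

    Assumption: edges are decided by cosine similarity against a fixed
    threshold. The source only says edges depend on embedding similarity.
    """
    # L2-normalise each row so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    similarity = unit @ unit.T

    # Connect two nodes when their similarity reaches the threshold.
    adjacency = (similarity >= threshold).astype(np.int8)
    np.fill_diagonal(adjacency, 0)  # drop self-loops
    return adjacency


# Toy stand-ins for per-segment speaker embeddings (one row per segment).
segments = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
adjacency = build_similarity_graph(segments, threshold=0.5)
```

Here the first two segments point in nearly the same direction and are linked, while the third is left isolated. In the system described above, a graph of this kind would then be the input to the supervised step that produces the refined embeddings.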