Research On Topic Segmentation Techniques In Dialogue Text

Posted on: 2017-03-01 | Degree: Master | Type: Thesis
Country: China | Candidate: B H Wang | Full Text: PDF
GTID: 2348330503987205 | Subject: Computer Science and Technology
Abstract/Summary:
In the era of big data, massive amounts of dialogue are recorded and stored, and using these data to assist people is an urgent problem. Topic segmentation divides documents such as news broadcasts, lectures, and meetings into topically coherent parts according to shifts in topic. The topic is consistent within each segment, while any two adjacent segments differ in topic. Segmentation is a necessary and important step for navigating, indexing, extracting information from, and summarizing lengthy text and audio content. This thesis studies topic segmentation of dialogue text from two main angles: unsupervised methods and supervised methods.
The first model introduced in this thesis is based on a topic model. Topic-model-based approaches are the mainstream in current research on unsupervised topic segmentation. Previous work introduced cue phrases into unsupervised methods. Building on that idea, the proposed model uses a special topic to incorporate cue phrases, which are explicit markers of topic boundaries, into a structured topic segmentation model. Segmentation results are obtained by sampling the hidden boundary indicator variables during inference. Experimental results on an English meeting corpus show that the proposed method produces better topic segmentations than other unsupervised methods.
Second, this thesis analyzes a supervised topic segmentation approach that achieves state-of-the-art performance on dialogue text. That approach suffers from a sparse lexical feature space and relies on a single type of feature. To address both problems, this thesis presents a new supervised topic segmentation model based on a support vector machine (SVM). The new model outperforms the baseline.
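The abstract does not list the SVM model's features or training setup. As a rough illustration only, the Python sketch below treats supervised segmentation as classifying each gap between consecutive utterances as a boundary or a non-boundary. The three feature types used here are hypothetical stand-ins, not the thesis's features: a lexical-shift score, a speaker-change flag, and a cue-phrase flag. The names `CUE_PHRASES`, `boundary_features`, `train_pegasos`, and `predict` are also hypothetical. The linear SVM is trained with the Pegasos stochastic sub-gradient method, which may differ from the SVM toolkit the author used.

```python
import math
import random
from collections import Counter

# Hypothetical cue-phrase inventory (NOT the list used in the thesis).
CUE_PHRASES = {"okay", "so", "anyway", "alright", "next"}

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(v * b[w] for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def bag(utts):
    """Word counts over a list of (speaker, text) utterances."""
    return Counter(w for _, text in utts for w in text.lower().split())

def boundary_features(utts, i, window=2):
    """Hypothetical features for a candidate boundary just before utterance i.
    utts is a list of (speaker, text) pairs and 1 <= i < len(utts)."""
    # High when the vocabulary before and after the gap differs.
    lexical_shift = 1.0 - cosine(bag(utts[max(0, i - window):i]), bag(utts[i:i + window]))
    speaker_change = float(utts[i][0] != utts[i - 1][0])
    tokens = utts[i][1].lower().split()
    cue = float(bool(tokens) and tokens[0].strip(",.") in CUE_PHRASES)
    return [lexical_shift, speaker_change, cue, 1.0]  # trailing 1.0 acts as a bias feature

def train_pegasos(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos; y must hold -1/+1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for j in rng.sample(range(len(X)), len(X)):  # one shuffled pass per epoch
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[j] * sum(wk * xk for wk, xk in zip(w, X[j]))
            w = [(1.0 - eta * lam) * wk for wk in w]  # shrink step from the L2 term
            if margin < 1.0:  # hinge loss is active: move toward the example
                w = [wk + eta * y[j] * xk for wk, xk in zip(w, X[j])]
    return w

def predict(w, x):
    """+1 means 'a new topic starts at this utterance'."""
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) > 0 else -1
```

Here is how to use the sketch. Given a dialogue `utts = [(speaker, text), ...]`, build one feature vector per gap with `[boundary_features(utts, i) for i in range(1, len(utts))]`. Label each gap +1 at a gold boundary and -1 elsewhere. Appending a constant 1.0 feature lets Pegasos learn a bias term, at the cost of also regularizing that bias.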
Furthermore, the contribution of each of the three kinds of features is verified quantitatively.
Finally, deep-learning-based representation learning is applied to topic segmentation of dialogue text. This thesis proposes a topic segmentation approach based on a long short-term memory (LSTM) recurrent neural network. The method automatically learns representations of sentences, speakers, and contexts from data. This avoids the manual feature engineering that traditional supervised machine learning models require. Experimental results show that the proposed approach is more effective than the traditional machine learning method based on feature engineering.
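The Python sketch below is a minimal forward pass that makes the LSTM idea concrete. It rests on several assumptions that do not come from the thesis. Each utterance is encoded as the mean of toy random word vectors, concatenated with a per-speaker vector; the thesis instead learns these representations from data. A single unidirectional LSTM reads the utterance sequence to build context. A logistic output then gives the probability that each utterance starts a new topic. The weights are random and untrained, so the probabilities carry no meaning until the model is trained with backpropagation, which is omitted here. The names `build_inputs` and `LSTMTagger` are hypothetical.

```python
import math
import random

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def build_inputs(utts, word_dim=4, spk_dim=2, seed=1):
    """Hypothetical encoder: mean of random word vectors + a per-speaker vector."""
    rng = random.Random(seed)
    words, speakers, inputs = {}, {}, []
    for spk, text in utts:
        toks = text.lower().split() or ["<empty>"]
        for t in toks:
            if t not in words:
                words[t] = [rng.uniform(-1, 1) for _ in range(word_dim)]
        if spk not in speakers:
            speakers[spk] = [rng.uniform(-1, 1) for _ in range(spk_dim)]
        sent = [sum(col) / len(toks) for col in zip(*(words[t] for t in toks))]
        inputs.append(sent + speakers[spk])
    return inputs

class LSTMTagger:
    """Forward-only, untrained LSTM that scores each utterance as a topic start."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g).
        self.W = {k: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                      for _ in range(hidden_dim)] for k in "ifog"}
        self.b = {k: [0.0] * hidden_dim for k in "ifog"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]

    def step(self, x, h, c):
        z = x + h  # concatenate the current input with the previous hidden state
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])] for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["g"]]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]  # new cell state
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]                  # new hidden state
        return h, c

    def forward(self, inputs):
        """Return one boundary probability per utterance."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        probs = []
        for x in inputs:
            h, c = self.step(x, h, c)
            probs.append(sigmoid(sum(w * v for w, v in zip(self.w_out, h))))
        return probs
```

After training, a model like this would place boundaries by thresholding the probabilities, for example at 0.5. A bidirectional LSTM is a common variant when the context after an utterance should also inform each decision.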
Keywords/Search Tags: Topic Segmentation, Dialogue Text Segmentation, Topic Model, Long Short-Term Memory