| In recent years, the need to analyze and summarize dialogue data has become increasingly urgent. Dialogue summarization differs from the summarization of news reports and similar text in several ways: multiple participants, frequent topic shifts, scarce annotated data, long inputs, and pronounced colloquialism. As a result, existing summarization techniques designed for news do not transfer well to dialogue summarization. Dialogue summarization currently faces the following problems. First, modeling the topic structure of a conversation depends on topic segmentation data or on separate topic segmentation algorithms, which raises the human cost and causes error accumulation. Second, whereas news websites provide a rich source of summarization data, building a dialogue summarization dataset requires writing a summary manually for each original dialogue, so annotated datasets are very scarce. Third, the maximum likelihood estimation training with cross-entropy loss used by abstractive dialogue summarization models maximizes only the likelihood of the annotated summary and treats every other candidate summary as having zero probability, which is inconsistent with the fact that a summarization task admits multiple reasonable outputs. This thesis explores the use of dialogue structure information to address these problems. Building on a thorough survey of existing research, it carries out the following work: 1) An abstractive dialogue summarization method based on topic segmentation structure modeling is proposed. Two self-supervised contrastive learning subtasks, topic coherence detection and sub-summary generation, are designed. The former detects the topic segmentation structure of the input conversation, and the latter generates a sub-summary for each topic segment. The model learns these two subtasks jointly with the 
dialogue summarization task, without requiring separate topic segmentation or topic-labeled data. Experimental results show that the proposed method outperforms state-of-the-art models in both automatic and human evaluation. 2) A data augmentation method for dialogue summarization based on an important/unimportant discrete discourse structure is proposed. An unsupervised method for selecting the important utterances in a conversation is designed, and three methods for constructing augmented samples are then proposed. Experimental results show that, compared with the baseline model, this method improves summary quality, raising the ROUGE-1 score on the SAMSum dataset by 1.94 percentage points. 3) A dialogue summarization training method based on topic cluster structure is proposed. A two-step sampling procedure, first over topic clusters and then over candidate sub-summaries, constructs a diverse set of candidate summaries, and the proposed margin-based contrastive learning objective aligns the likelihood the model assigns to its outputs with their ROUGE evaluation scores. Experiments show that this method generates summaries with higher ROUGE scores than the baseline models, raising the ROUGE-L score on the SAMSum dataset by 2.15 percentage points. |
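
The margin-based contrastive objective in contribution 3) can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so what follows is an assumed, generic pairwise ranking loss rather than the thesis's own objective: candidates are ranked by ROUGE, and a penalty is incurred whenever a lower-ranked candidate's model score is not below a higher-ranked one's by a rank-scaled margin. The function name `margin_ranking_loss`, its parameters, and the default margin are all hypothetical.

```python
from typing import List


def margin_ranking_loss(
    log_probs: List[float], rouge: List[float], margin: float = 0.01
) -> float:
    """Pairwise margin loss over candidate summaries (illustrative assumption).

    log_probs[k] is the model's (length-normalized) log-likelihood of
    candidate k; rouge[k] is that candidate's ROUGE score against the
    reference summary.
    """
    # Rank candidates from highest to lowest ROUGE.
    order = sorted(range(len(rouge)), key=lambda k: rouge[k], reverse=True)
    ranked = [log_probs[k] for k in order]

    loss = 0.0
    for i in range(len(ranked)):
        for j in range(i + 1, len(ranked)):
            # Candidate i outranks candidate j, so its score should exceed
            # candidate j's score by at least (j - i) * margin.
            loss += max(0.0, ranked[j] - ranked[i] + (j - i) * margin)
    return loss
```

When the model's likelihoods already order the candidates the same way ROUGE does, with sufficient gaps, the loss is zero; any inversion contributes a positive term, so minimizing it pushes the likelihood ranking toward the ROUGE ranking.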