
Research On The Analysis Technology Of Chinese Discourse Content Coherence Based On Sentence Clusters

Posted on: 2019-12-19
Degree: Master
Type: Thesis
Country: China
Candidate: T K Wei
GTID: 2435330569496482
Subject: Computer technology
Abstract/Summary:
In recent years, artificial intelligence has made waves in all walks of life. With basic intelligence achieved, more scholars are devoting their efforts to making artificial intelligence more human-like. Text coherence plays an important role in many fields and supports related research. For example, to make articles generated by intelligent writing systems read less stiffly, the coherence between sentences must be analyzed. Likewise, in reading comprehension, recognizing the coherence between sentences well makes it possible to locate the relevant passage in the original text when choosing answers. In short, coherence is a focus of text analysis and a cornerstone of further research. However, studying coherence at the sentence level loses much contextual information, while studying it over the whole text raises many technical problems because the granularity is too coarse. This thesis therefore studies text coherence at the level of the sentence group. It examines the difficulty of dividing text into sentence groups, the distribution characteristics of sentence groups in corpora of different genres, the automatic segmentation of sentence groups, and the automatic recognition of sentence relations within a sentence group. The main work is as follows.

First, this thesis proposes a new computational model for estimating the difficulty of annotating text coherence, based on the different statistical features of the annotations produced by two independent annotators. Experimental results show that the model can accurately distinguish news texts of differing annotation difficulty. This work lays a good foundation for the future analysis and understanding of related text content.

Second, this thesis analyzes the coherence of four corpora of different genres (news, application, prose and encyclopedia), again based on the different statistical features of two independent annotators. On this basis, the coherence distribution characteristics of sentence groups in the four corpora are analyzed, and the differences in sentence group coherence across genres are compared in detail. This work lays a good foundation for the future automatic segmentation of sentence group boundaries and automatic analysis of the relations between sentences.

Third, this thesis proposes a new method for automatically segmenting Chinese text into sentence groups. Its key points are to classify adjacent sentence pairs with a CNN and attention, and to incorporate the theme of the sentence group to improve accuracy. The experiments use a large-scale, weakly labeled paragraph dataset to compensate for the shortage of annotated sentence group data. The results show that the method can effectively recognize sentence group boundaries and thereby segment sentence groups automatically.

Finally, to identify double-nucleus relations automatically, this thesis combines a CNN with word sequence features, takes both semantic and structural characteristics into account, and adds attention to mine the double-nucleus relations. Experiments show that the method identifies double-nucleus relations effectively and is portable.
Keywords/Search Tags: text coherence, text annotation difficulty, text structure analysis, sentence group segmentation, recognition of Chinese sentence-pair relations
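
The abstract describes segmenting sentence groups by classifying adjacent sentence pairs. The following is a minimal sketch of only the last step of such a pipeline: turning per-pair boundary scores into sentence groups. It is not the thesis's implementation. The function name `segment_sentence_groups`, the `boundary_probs` input (assumed to come from some pair classifier, such as a CNN with attention) and the `threshold` value of 0.5 are all assumptions made for illustration.

```python
from typing import List, Sequence


def segment_sentence_groups(
    sentences: Sequence[str],
    boundary_probs: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the thesis
) -> List[List[str]]:
    """Split sentences into groups using adjacent-pair boundary scores.

    boundary_probs[i] is the (hypothetical) classifier's probability that a
    sentence group boundary lies between sentences[i] and sentences[i + 1].
    """
    if not sentences:
        return []
    if len(boundary_probs) != len(sentences) - 1:
        raise ValueError("need exactly one boundary score per adjacent pair")

    groups: List[List[str]] = [[sentences[0]]]
    for sentence, prob in zip(sentences[1:], boundary_probs):
        if prob >= threshold:
            # The pair was classified as a boundary: start a new group.
            groups.append([sentence])
        else:
            # The pair belongs together: extend the current group.
            groups[-1].append(sentence)
    return groups
```

For example, calling `segment_sentence_groups(["s1", "s2", "s3", "s4"], [0.1, 0.9, 0.2])` places a boundary only between `s2` and `s3`, giving two groups of two sentences each.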