Research On Event-oriented Multi-document Automatic Summarization

Posted on: 2011-09-28  Degree: Master  Type: Thesis
Country: China  Candidate: P Sun  Full Text: PDF
GTID: 2178330332472255  Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
A subject can cover multiple events: the subject is abstract, whereas an event is concrete, and events under the same subject resemble one another. To address the low quality, low summary-sentence coverage, and poor readability of multi-document automatic summarization, this thesis identifies events in different news reports on the same subject, clusters the texts that describe the same event, and studies event-oriented multi-document automatic summarization. The main work covers four aspects:

(1) Similar events are difficult to tell apart in subject-oriented multi-document summarization, so we focus on distinguishing different events under the subject of unexpected events. The ICTCLAS system is embedded into GATE (General Architecture for Text Engineering), and GATE is used to recognize event triggers.

(2) To improve the efficiency of Chinese similarity computation, we present a system model for word similarity and Chinese sentence similarity based on HNC. The model uses HNC's semantic analysis and sentence category analysis system. Experiments show that the model has advantages over traditional similarity computation for Chinese, and it serves as the basis for clustering.

(3) We cluster events by computing the similarity of their triggers with the HNC-based word similarity, which addresses the problem of recognizing different but similar events. For different documents about the same event, we identify the side information and extract candidate sentences. We then compute sub-theme importance and sentence importance to select summary sentences. This helps improve the quality of multi-document summaries and meets readers' requirements to some extent.

(4) We design and implement a prototype event-oriented multi-document summarization system and evaluate it on information coverage, readability, and precision, combined with manual scoring. Experiments show that, compared with a summarization system based on tf*idf, this system partially resolves the problems of low quality, low summary-sentence coverage, and poor readability in automatic summarization.
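
The abstract names a tf*idf-based summarizer as the comparison baseline but does not describe it. As a rough illustration only, a baseline of that general kind can be sketched as below. Everything here is an assumption rather than the thesis's implementation: the function names, the regex tokenizer (a real Chinese pipeline would use a segmenter such as ICTCLAS), the smoothed idf formula, and the choice to score a sentence by the mean weight of its tokens.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphanumeric runs (a stand-in for real word segmentation)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(document):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def tfidf_summary(documents, max_sentences=2):
    """Pick the highest-scoring sentences from a document collection.

    Assumed scoring: weight(t) = tf(t) * log(1 + N / df(t)), where tf is the
    term count over the whole collection, df the number of documents that
    contain t, and N the number of documents. A sentence's score is the mean
    weight of its tokens.
    """
    doc_tokens = [tokenize(d) for d in documents]
    n_docs = len(documents)
    tf = Counter(t for tokens in doc_tokens for t in tokens)
    df = Counter(t for tokens in doc_tokens for t in set(tokens))
    weight = {t: tf[t] * math.log(1 + n_docs / df[t]) for t in tf}

    scored = []
    for doc_index, doc in enumerate(documents):
        for sent_index, sentence in enumerate(split_sentences(doc)):
            tokens = tokenize(sentence)
            if not tokens:
                continue
            score = sum(weight[t] for t in tokens) / len(tokens)
            scored.append((score, doc_index, sent_index, sentence))

    # Keep the best sentences, then restore their original reading order.
    top = sorted(scored, key=lambda item: -item[0])[:max_sentences]
    top.sort(key=lambda item: (item[1], item[2]))
    return [sentence for _, _, _, sentence in top]
```

A baseline like this scores sentences purely on term statistics and has no notion of events or sub-themes, which is the gap the event-oriented approach in the abstract is meant to address.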
Keywords/Search Tags: multi-document, automatic summarization, event, HNC, natural language processing