The Research Of Automatic Summarization Based On S2AFCM And Text Content Structure Analysis

Posted on:2012-03-30

Degree:Master

Type:Thesis

Country:China

Candidate:S C Wang

Full Text:PDF

GTID:2218330368480936

Subject:Computer software and theory

Abstract/Summary:

With the spread of computers and the Internet, more and more information is uploaded to the network for exchange. This sharp increase in the amount of information has greatly facilitated communication between people and has contributed substantially to the development of human civilization and the economy. However, information is generated and disseminated much faster than people can process it. Faced with this mass of emerging information, people cannot cover all of it even if they read continuously, and finding the information they need is very difficult. Summarization is an effective way to ease this problem, but manual summarization is slow and subjective. Automatic summarization arose in this context.

This thesis first introduces the research background and significance of automatic summarization and the state of research in China and abroad. It then introduces the definition and classification of automatic summarization and divides existing techniques into five categories: automatic extraction, understanding-based automatic summarization, information extraction, structure-based automatic summarization, and user-query-based automatic summarization. Their advantages and disadvantages are analyzed. By comparing them on seven aspects, the thesis concludes that structure-based (discourse-based) automatic summarization is better than the other four kinds of methods.

On this basis, the thesis designs and implements an automatic summarization system based on discourse analysis, namely automatic summarization based on S2AFCM and text content structure analysis. The basic ideas of the system are:
(1) Use S2AFCM to cluster the paragraphs of a document and, according to the membership matrix, obtain the division into sub-topics and the transition paragraphs.
(2) Analyze the transition paragraphs by combining research theory on complex sentences, Rhetorical Structure Theory, and the discourse structure features of Chinese, and build the discourse content structure tree.
(3) Taking each sub-topic as a unit, select candidate summary sentences to generate the sub-topic summaries, then generate the final summary based on the discourse content structure tree.

The author implemented the system using Visual Studio 2008 as the development tool and Oracle as the back-end database. TREC data sets were used as test data. A comparison of the experimental data shows that the system's Recall, Precision, and F-measure values are better than those of the other systems compared, indicating that the summaries it generates are of higher quality.
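Step (1) above relies on a fuzzy membership matrix to separate sub-topic paragraphs from transition paragraphs. The abstract does not give the details of S2AFCM, so the following is only a minimal sketch that substitutes plain fuzzy c-means (FCM) for it. The function names `fuzzy_c_means` and `split_paragraphs`, the toy two-dimensional paragraph vectors, the fuzzifier `m = 2.0`, and the `0.75` membership threshold are all assumptions made for illustration and do not come from the thesis.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means (a stand-in for S2AFCM, whose details are not given).

    Returns (centers, U), where U[i][k] is the membership of point i in cluster k.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        U.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Cluster centres: means of the points weighted by membership ** m.
        centers = []
        for k in range(c):
            w = [U[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total for d in range(dim)]
            )
        # Membership update: u_ik is proportional to d_ik ** (-2 / (m - 1)).
        new_U = []
        for i in range(n):
            dists = [math.dist(points[i], centers[k]) for k in range(c)]
            if min(dists) == 0.0:
                # The point coincides with one or more centres.
                zeros = [k for k, d in enumerate(dists) if d == 0.0]
                row = [1.0 / len(zeros) if k in zeros else 0.0 for k in range(c)]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                row = [v / total for v in inv]
            new_U.append(row)
        change = max(abs(new_U[i][k] - U[i][k]) for i in range(n) for k in range(c))
        U = new_U
        if change < tol:
            break
    return centers, U


def split_paragraphs(U, threshold=0.75):
    """Assign paragraphs with a dominant membership to a sub-topic.

    Paragraphs whose largest membership is below the (assumed) threshold
    are treated as transition paragraphs.
    """
    topics, transitions = {}, []
    for i, row in enumerate(U):
        k = max(range(len(row)), key=row.__getitem__)
        if row[k] >= threshold:
            topics.setdefault(k, []).append(i)
        else:
            transitions.append(i)
    return topics, transitions


# Hypothetical paragraph vectors: two tight groups plus one paragraph in between.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (1.0, 1.0), (0.9, 1.0), (1.0, 0.9),
          (0.5, 0.5)]
centers, U = fuzzy_c_means(points, c=2)
topics, transitions = split_paragraphs(U)
print("sub-topics:", topics)
print("transition paragraphs:", transitions)
```

In this toy input the last vector lies midway between the two groups, so its memberships in both clusters are close to 0.5 and it is flagged as a transition paragraph. A real system would use paragraph feature vectors (for example term weights) rather than two-dimensional points.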

Keywords/Search Tags:

automatic summarization, fuzzy clustering, transition paragraphs, Section Set Adaptive Fuzzy C-Means (S2AFCM) clustering method, text content structure analysis

Related items

1	Automatic Text Summarization And Fuzzy Topics Identification Methods Towards The Design Of Public Opinion Information System
2	The Application Of Fuzzy C-means Clustering In The Stock Investment
3	A Study Of Based On LSA And Paragraph Clustering Of Automatic Abstracting System
4	Research Of Key Techniques In Fuzzy Clustering Based On Objective Function
5	Multi-document Summarization Based On Improved Fuzzy C-means Clustering Algorithm
6	Fuzzy C-means And K-means Clustering Algorithm And Its Parallel
7	Research Of New Fuzzy Clustering Algorithms Based On Objective Function And Its Applications
8	Research On The Segmentation Method Based On Fuzzy Clustering
9	Improved Fuzzy C Means Clustering Algorithm And Its Application
10	Study Of Auto-Adaption Fuzzy C-Means Clustering Algorithm