| With the spread of computers and the Internet, more and more information is uploaded to networks to be shared. This sharp increase in the amount of information has greatly facilitated exchange and communication between people and has contributed substantially to the development of human civilization and the economy. However, information is generated and disseminated much faster than people can process it. Faced with this mass of information, people cannot cover all of it even if they read continuously, and finding the information they need is very difficult. Summarization is an effective way to address this problem, but manual summarization is slow and subjective; automatic summarization emerged in this context. This paper first introduces the research background and significance of automatic summarization and the state of research at home and abroad. It then introduces the definition and classification of automatic summarization and groups existing techniques into five categories: automatic extraction, understanding-based summarization, information extraction, structure-based summarization, and user-query-based summarization. The advantages and disadvantages of each category are analyzed. A comparison on seven aspects shows that summarization based on discourse structure is better than the other four kinds of methods. On this basis, this paper designs and implements an automatic summarization system based on discourse analysis, namely automatic summarization based on S2AFCM and text content structure analysis. The basic ideas of this system are: (1) clustering the paragraphs of a document with S2AFCM and using the resulting membership matrix to divide the document into sub-topics and identify the transition paragraphs;
(2) analyzing the transition paragraphs by combining the theory of complex sentences, Rhetorical Structure Theory, and the discourse structure features of Chinese, and building the discourse content structure tree; (3) taking each sub-topic as a unit, selecting candidate summary sentences to generate sub-topic summaries, and then generating the final summary based on the discourse content structure tree. The author used VS2008 as the development tool and Oracle as the backend database to complete the coding of this system, which was tested on TREC data sets. The experimental data show that its Recall, Precision, and F-measure values are better than those of other systems, indicating that the summaries generated by this system are of higher quality. |
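The description above does not say exactly how the membership matrix is turned into sub-topics and transition paragraphs. The following Python sketch shows one plausible post-processing step, assuming a fuzzy membership matrix (one row per paragraph, one column per cluster) has already been produced by S2AFCM or any other fuzzy C-means variant; it does not implement S2AFCM itself. The rule that flags a paragraph as transitional when its two largest memberships differ by less than `ambiguity_margin`, the threshold value, and the function names `split_subtopics` and `segments` are all hypothetical.

```python
from typing import List, Tuple


def split_subtopics(
    membership: List[List[float]], ambiguity_margin: float = 0.2
) -> Tuple[List[int], List[int]]:
    """Assign paragraphs to clusters and flag likely transition paragraphs.

    membership[i][k] is the degree to which paragraph i belongs to cluster k.
    A paragraph is flagged as a transition when its two largest memberships
    differ by less than ambiguity_margin (a hypothetical criterion).
    """
    labels: List[int] = []
    transitions: List[int] = []
    for i, row in enumerate(membership):
        # Rank clusters for this paragraph by membership, highest first.
        ranked = sorted(range(len(row)), key=lambda k: row[k], reverse=True)
        best = ranked[0]
        labels.append(best)
        if len(ranked) > 1 and row[best] - row[ranked[1]] < ambiguity_margin:
            transitions.append(i)
    return labels, transitions


def segments(labels: List[int]) -> List[Tuple[int, int, int]]:
    """Group runs of consecutive paragraphs with the same label as (start, end, label)."""
    spans: List[Tuple[int, int, int]] = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the list or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            spans.append((start, i - 1, labels[start]))
            start = i
    return spans


membership = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.55, 0.45],
    [0.20, 0.80],
    [0.10, 0.90],
]
labels, transitions = split_subtopics(membership)
print(labels, transitions)  # [0, 0, 0, 1, 1] [2]
print(segments(labels))     # [(0, 2, 0), (3, 4, 1)]
```

In this toy matrix, paragraph 2 is split almost evenly between the two clusters, so it is flagged as a transition paragraph, and the document divides into one sub-topic covering paragraphs 0 to 2 and another covering paragraphs 3 and 4.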
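Step (3) selects candidate sentences per sub-topic, but the abstract does not give the scoring rule. As a hedged illustration only, the sketch below swaps in a common centroid heuristic: each sentence is scored by the cosine similarity between its bag of words and the term counts of its whole sub-topic, and the top `per_topic` sentences are kept. The scoring rule, whitespace tokenization, and the names `cosine` and `summarize_subtopics` are assumptions, and simply concatenating sub-topics in document order stands in for the discourse content structure tree, which this sketch does not model.

```python
from collections import Counter
from math import sqrt
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize_subtopics(subtopics: List[List[str]], per_topic: int = 1) -> List[str]:
    """Pick the sentences closest to each sub-topic's centroid (hypothetical rule).

    subtopics: sentences grouped by sub-topic, in document order.
    Sub-topic order stands in for the discourse content structure tree here.
    """
    summary: List[str] = []
    for sentences in subtopics:
        bags = [Counter(s.lower().split()) for s in sentences]
        centroid = sum(bags, Counter())  # term counts over the whole sub-topic
        ranked = sorted(range(len(sentences)),
                        key=lambda i: cosine(bags[i], centroid), reverse=True)
        # Keep the chosen sentences in their original order within the sub-topic.
        summary.extend(sentences[i] for i in sorted(ranked[:per_topic]))
    return summary


subtopics = [
    ["the cat sat", "the cat ran fast", "dogs bark"],
    ["stocks rose today", "stocks fell", "weather is nice"],
]
print(summarize_subtopics(subtopics))  # ['the cat ran fast', 'stocks rose today']
```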
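The abstract reports Recall, Precision, and F-measure without defining them. A common sentence-level formulation treats the system summary $S$ and a reference summary $R$ as sets of sentences, with $P = |S \cap R| / |S|$, $R_{ec} = |S \cap R| / |R|$, and the balanced F-measure $F = 2 P R_{ec} / (P + R_{ec})$. The Python sketch below assumes sentences are matched by identifier against a reference extract; the thesis may have used a different unit of comparison, and the function name `precision_recall_f` is hypothetical.

```python
from typing import Iterable, Tuple


def precision_recall_f(
    system: Iterable[str], reference: Iterable[str]
) -> Tuple[float, float, float]:
    """Sentence-level Precision, Recall and balanced F-measure (F1)."""
    sys_set, ref_set = set(system), set(reference)
    hits = len(sys_set & ref_set)  # sentences shared with the reference summary
    precision = hits / len(sys_set) if sys_set else 0.0
    recall = hits / len(ref_set) if ref_set else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


p, r, f = precision_recall_f(["s1", "s3", "s5", "s7"], ["s1", "s2", "s3"])
print(round(p, 3), round(r, 3), round(f, 3))  # 0.5 0.667 0.571
```

Two of the four system sentences appear in the three-sentence reference, giving $P = 1/2$, $R_{ec} = 2/3$, and $F = 4/7$.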