With the rapid development of web technology come the problems of data explosion and information overload, which have made automatic text summarization a research hotspot in computer science. Compared with other NLP tasks, automatic summarization faces two particular challenges: judging summary quality is highly subjective, and result summaries often contain substantial redundancy. Most existing models score sentences against a set of predefined features and select the top-k sentences as the summary. However, these ranking models score each sentence independently, without considering the relationships between sentences. Moreover, the predefined features are usually lexical or statistical and cannot capture the semantic meaning of the text. To address these shortcomings, we assume that a good summary can reconstruct the original document, and we propose a semantic reconstruction model based on this assumption: the model selects the sentences that best reconstruct the original document as the summary. Our work in this paper consists of two parts. (1) Semantic sentence representation. Since bag-of-words vectors cannot capture semantic meaning, we learn compact, semantic sentence representations in two ways: a weighted mean of word embeddings, and deep coding. These representations serve as the input to the reconstruction model. (2) Reconstruction strategy, the core of semantic reconstruction, which aims to find the most relevant sentences. Our strategies include a simple linear function and a flexible nonlinear function, based on quadratic programming and a neural network respectively. In addition, a redundancy reduction algorithm removes redundant sentences to improve summary quality. Experiments on the DUC datasets validate the effectiveness of our model.
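
To make the first part concrete, below is a minimal sketch of a weighted mean of word embeddings. The abstract does not specify the weighting scheme, so this sketch assumes a smooth-inverse-frequency style weight a/(a + p(w)); the `emb` and `word_freq` inputs are hypothetical placeholders for a pretrained embedding table and corpus unigram probabilities.

```python
import numpy as np

def sentence_embedding(tokens, emb, word_freq, a=1e-3):
    """Weighted mean of word embeddings for one sentence.

    tokens    -- list of word strings in the sentence
    emb       -- dict mapping word -> np.ndarray embedding (hypothetical input)
    word_freq -- dict mapping word -> unigram probability p(w)
    a         -- smoothing constant; weight = a / (a + p(w)) is an ASSUMED
                 scheme, since the abstract only says "weighted mean"
    """
    dim = len(next(iter(emb.values())))
    vecs, weights = [], []
    for w in tokens:
        if w in emb:
            vecs.append(emb[w])
            weights.append(a / (a + word_freq.get(w, 0.0)))
    if not vecs:
        return np.zeros(dim)          # no known words: fall back to zero vector
    V = np.stack(vecs)                # (n_words, dim)
    w = np.asarray(weights)[:, None]  # (n_words, 1)
    return (w * V).sum(axis=0) / w.sum()
```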
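For the second part, the sketch below illustrates one way the linear, quadratic-programming-based reconstruction strategy could look. The abstract does not give the exact objective or constraints, so this sketch assumes a nonnegative least-squares formulation (a simple QP instance): find nonnegative sentence weights whose linear combination best reconstructs a document vector, here assumed to be the mean of the sentence vectors, and keep the k most heavily weighted sentences.

```python
import numpy as np
from scipy.optimize import nnls

def select_summary_sentences(sent_vecs, k=3):
    """Pick k sentences whose nonnegative linear combination best
    reconstructs the document vector.

    sent_vecs -- np.ndarray of shape (n_sentences, dim), one row per
                 sentence representation
    Returns the indices of the k sentences with the largest weights.
    Both the document vector (mean of sentence vectors) and the NNLS
    objective are assumptions; the paper's exact formulation may differ.
    """
    doc_vec = sent_vecs.mean(axis=0)       # assumed document representation
    A = sent_vecs.T                        # (dim, n_sentences)
    # Solve min_x ||A x - doc_vec||^2  subject to  x >= 0
    weights, _residual = nnls(A, doc_vec)
    return np.argsort(weights)[::-1][:k]
```

A redundancy reduction step, in the spirit of the algorithm mentioned above, could then drop any selected sentence whose cosine similarity to an already chosen one exceeds a threshold.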