| With the arrival of the information era, the explosive growth of online information has overloaded search engines with data. How to mine useful knowledge efficiently and effectively, and how to extract users’ emotions, opinions and attitudes about events, has become an important research topic. Automatic summarization is one of the main technologies for addressing the information overload problem. It aims to extract information automatically from one or more source documents and to generate a brief, coherent text that covers the core content. However, automatic summarization is a challenging task. It comprises several subtasks, such as reduction, the time dimension, sentence ranking and summary optimization, each of which can be very complex, especially for generated summaries. At present, summary optimization is one of the most important subtasks, and the maximal marginal relevance (MMR) algorithm and integer linear programming (ILP) are the classical methods for it. In recent years, summary optimization methods based on submodular functions have become an active research topic in this field. Within the framework of submodularity, many combinatorial optimization problems can be solved optimally or near-optimally in polynomial time. In this paper, we introduce a method based on submodular functions for selecting summary sentences and optimizing summaries. Under specific constraints, the maximization of a monotone submodular function can be solved near-optimally with a greedy algorithm, so the generated machine summary comes with a constant-factor approximation guarantee relative to the ideal standard summary. The work consists mainly of the following two parts. First, we propose an improved method for multi-document summary optimization based on a joint submodular function. 
We build an undirected graph over the documents, in which vertices represent sentences and edges represent the relationships between sentences. Both content relevance and diversity are taken into account to build a submodular set function, and a greedy algorithm is then used to select sentences and optimize the summary. In addition, starting from the traditional TF-IDF cosine similarity, we exploit the semantic relations between words and improve the sentence similarity measure using WordNet semantics and the word mover’s distance (WMD), respectively. Experiments on the standard multi-document summarization dataset DUC2004 demonstrate that the proposed method is feasible and effective. Second, we propose an improved method for opinion summarization based on submodular functions. We first build a movie ontology tree covering all aspects using a WordNet sense propagation algorithm, and classify sentences according to this ontology tree. We then construct a class of submodular objective functions that balance objective content (content relevance and diversity) against subjective sentiment (subjective coverage), and extract important sentences to form the summary candidate set using a greedy algorithm based on partial enumeration. With this method, we can extract opinion sentences from movie reviews that cover multiple aspects and subjective sentiment. Experiments on Pang’s polarity dataset demonstrate that the proposed method is feasible and effective in terms of both summary quality and sentiment correctness. |
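The greedy selection step described in the abstract can be illustrated with a minimal sketch. It is not the objective proposed in this work: it assumes a simple facility-location coverage function, a cardinality limit `k` in place of the unspecified constraint, and a hand-made similarity matrix `sim`. The names `coverage` and `greedy_select` are hypothetical. For a monotone submodular function under a cardinality constraint, this kind of greedy procedure is known to reach at least a (1 - 1/e) fraction of the optimal objective value.

```python
from typing import List, Sequence


def coverage(sim: Sequence[Sequence[float]], selected: List[int]) -> float:
    """Facility-location score (assumed objective, monotone submodular):
    each sentence is credited with its best similarity to a selected one."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_select(sim: Sequence[Sequence[float]], k: int) -> List[int]:
    """Pick up to k sentences, each time adding the largest marginal gain."""
    selected: List[int] = []
    remaining = set(range(len(sim)))
    current = 0.0
    while remaining and len(selected) < k:
        best, best_gain = None, 0.0
        for j in sorted(remaining):
            gain = coverage(sim, selected + [j]) - current
            if gain > best_gain:
                best, best_gain = j, gain
        if best is None:  # no candidate improves the objective
            break
        selected.append(best)
        remaining.remove(best)
        current += best_gain
    return selected


# Toy symmetric sentence-similarity matrix (illustrative values only):
# sentences 0-2 share a topic, sentence 3 covers a different one.
sim = [
    [1.0, 0.8, 0.6, 0.1],
    [0.8, 1.0, 0.7, 0.2],
    [0.6, 0.7, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
]

summary = greedy_select(sim, k=2)
print(summary)  # [1, 3]
```

In this toy run the first pick is the most central sentence of the larger topic, and the second pick is the sentence from the other topic rather than a near-duplicate, which is the diminishing-returns behaviour that makes submodular objectives favour diverse summaries.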