
Automatic Abstraction Based On Combination Of Hidden Markov Algorithm And Singlepass

Posted on: 2018-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Hou
GTID: 2348330518982363
Subject: Software engineering
Abstract/Summary:
Internet technology is developing ever more rapidly, and the information age has become part of everyday life. People cannot live without information. The news media have grown along with information technology, and news coverage has enriched people's lives. Different outlets often report the same events at about the same time, so people face a large volume of raw news every day and cannot read all of it in a timely manner. As a result, they miss important information simply because they cannot keep up. There is therefore an increasingly urgent need for a tool that sorts the major reports, classifies all the information about news events according to certain rules so that people can browse and read the news quickly, and consolidates the different reports of the same event by keeping what they have in common and setting aside their differences. With such a tool, people could look at the one or more categories of news they care about. They could read the aggregated information quickly and, most importantly, save valuable time. This thesis targets major news events on the Internet and uses topic detection and single-event clustering to develop a single-event multi-document summarization system. The system clusters the news reports on single events, consolidates and compresses the news information available online, and presents it to the user.
(1) Design of the main algorithms for single-event clustering. In several places the classification uses the SVM algorithm, that is, a similarity-weighted vote over the single-event set. Next, the LDA model is studied. During learning, the documents can be modeled at the same time, but each document must first be classified with the SVM algorithm before it is modeled.
After classification, the main task is computing similarity, so this thesis computes similarity with an algorithm that combines the LDA model with SVM. Finally, a combination of single-pass clustering and a Markov model aggregates the news documents obtained from classification into single events.
(2) Design and implementation of a multi-document automatic summarization system based on single events. The system involves many basic technologies, one of which is text representation. In this module, the CNKI representation is added to the traditional vector space. This representation groups the related feature words together to build synonym sets, and the SVM model is then constructed from those synonym sets. Sentence weights are calculated with the LexRank algorithm. The weighting module first takes the features of each sentence itself and then forms a linear combination of the results to obtain the final sentence weights. The extraction module uses the MMR algorithm, also known as maximal marginal relevance, which extracts summary sentences while removing redundancy. The sentences it selects are then sorted and output according to a fixed rule, which yields a relatively fluent summary.
In view of the above, research on an automatic summarization system meets the needs described. For the experiments, we took the corpus used by the Harbin Institute of Technology Research Institute, developed an automatic summarization system, and ran experiments and tests on it. The experimental results show that the method proposed in this thesis has clear advantages for topic extraction and multi-document summarization and achieves satisfactory results.
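To make the extraction step concrete, the following is a minimal sketch of greedy MMR selection over sentences that already have weights. It is an illustration under stated assumptions, not the thesis implementation: the function names, the bag-of-words cosine similarity, and the trade-off parameter lam are all hypothetical choices, and the weights are simply passed in rather than computed by LexRank.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(sentences, weights, k, lam=0.7):
    """Greedily pick up to k sentence indices.

    Each step takes the candidate that maximizes
        lam * weight - (1 - lam) * (max similarity to sentences already picked).
    `weights` are precomputed sentence weights (assumed to come from a
    LexRank-style scorer); `lam` is a hypothetical trade-off value.
    """
    vectors = [Counter(s.lower().split()) for s in sentences]
    selected = []
    candidates = set(range(len(sentences)))
    while candidates and len(selected) < k:

        def score(i):
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected),
                default=0.0,
            )
            return lam * weights[i] - (1 - lam) * redundancy

        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data (invented for illustration): sentences 0 and 1 are near-duplicates.
sentences = [
    "the earthquake struck the city at dawn",
    "the earthquake struck the city at dawn today",
    "rescue teams arrived from neighboring provinces",
]
weights = [0.9, 0.85, 0.6]
print(mmr_select(sentences, weights, k=2, lam=0.5))  # prints [0, 2]
```

With lam=0.5 the near-duplicate sentence is skipped even though its weight is higher than that of the third sentence. Setting lam=1.0 ignores redundancy and ranks by weight alone.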
Keywords/Search Tags: Single event, Conversation analysis, Multi-document summarization, SVM, LDA topic model