
Research On Automatic Abstract Formation Of Private Micro-blog

Posted on: 2015-01-19 | Degree: Master | Type: Thesis
Country: China | Candidate: W Y Guo | Full Text: PDF
GTID: 2298330422490285 | Subject: Computer technology
Abstract/Summary:
Since 2007, microblogging has spread rapidly around the world thanks to its convenient publishing, real-time and fluid communication, and low barrier to entry. With strong growth in recent years, it has become an indispensable part of people's lives. In China, the number of microblog users has surged into the hundreds of millions, producing a large volume of microblog data every day. Most of this content is highly colloquial and comes with many comments. As microblogging grows, finding posts that match a user's interests and carry useful information among this huge and varied mass of data has become a major problem.
In this thesis, we take source data from Sina Weibo and use the microblogs that one user published over a past period as the unit of study. Drawing on text representation, clustering algorithms, and related topics, we study automatic summarization techniques and the characteristics of microblog data. We designed and implemented a complete system that runs from data acquisition through data processing to the final automatic summary. The process consists mainly of the following steps: data acquisition; data preprocessing; text representation; feature selection; an improved similarity calculation; the clustering algorithm and its improved version; and formation of the integrated automatic summary.
The main work of this thesis is as follows. First, the original data are obtained through the Sina Weibo open platform. Second, the microblog data are analyzed: based on the characteristics of personal microblogs, the text of each microblog and its comments are merged into a pseudo-document, and a series of preprocessing steps is applied. After word segmentation, the text is converted into a data format and represented with a relational data model, on which the text similarity calculation is based.
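The pseudo-document and similarity steps described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not specify the similarity measure or its improvement, so plain cosine similarity over term counts is used here, whitespace splitting stands in for real word segmentation, and the function names are hypothetical.

```python
import math
from collections import Counter


def make_pseudo_document(post, comments):
    """Merge a post and its comments into one text (hypothetical helper)."""
    return " ".join([post] + list(comments))


def term_vector(text):
    """Bag-of-words term counts.

    Assumption: whitespace tokenization; real microblog text would need
    a proper word segmenter.
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


doc1 = make_pseudo_document("rain in beijing today", ["heavy rain", "stay dry"])
doc2 = make_pseudo_document("beijing rain again", ["so much rain"])
similarity = cosine_similarity(term_vector(doc1), term_vector(doc2))
```

Merging comments into the post gives each short microblog more terms to compare, which is the usual motivation for pseudo-documents when individual texts are very short.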
Third, K-means is used as the clustering algorithm. Specifying the value of K is the main difficulty with K-means, and it is usually chosen by experience. The choice of initial centers also strongly affects accuracy: a center should be representative, that is, it should lie in a region of higher density. We improve the algorithm so that it determines K adaptively and selects the initial centers. The weight of each microblog within a cluster is then determined from its content, timeliness, and popularity; a summary is first produced for each cluster, and the cluster summaries are combined into the final summary of the personal microblog. Finally, the clustering algorithm and its improved version are analyzed experimentally. Compared with the original algorithm, the improved algorithm shows better accuracy and applicability. Through the development of the whole system, automatic summaries of personal microblogs are produced.
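One way to picture a K-means variant that picks K and its initial centers from dense regions is sketched below. The abstract does not describe the thesis's actual procedure, so everything here is an assumption for illustration: density is taken as the number of points within a fixed `radius`, seeds are chosen greedily from the densest remaining region, and `radius` and `min_neighbors` are hypothetical parameters.

```python
import math


def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def density_seeds(points, radius, min_neighbors):
    """Pick initial centers from dense regions; the number found becomes K."""
    remaining = list(points)
    seeds = []
    while remaining:
        # Density = number of remaining points within `radius` (itself included).
        densities = [sum(1 for q in remaining if dist(p, q) <= radius)
                     for p in remaining]
        best = max(range(len(remaining)), key=densities.__getitem__)
        if densities[best] < min_neighbors:
            break  # only sparse points are left; stop adding centers
        seed = remaining[best]
        seeds.append(seed)
        # Drop the seed's neighborhood so the next seed comes from another region.
        remaining = [q for q in remaining if dist(seed, q) > radius]
    return seeds


def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    centers = list(centers)
    if not centers:
        raise ValueError("at least one initial center is required")
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return centers, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (13, 13)]
seeds = density_seeds(points, radius=2.0, min_neighbors=2)
centers, clusters = kmeans(points, seeds)
```

In this sketch K is never passed in: it is simply the number of seeds that pass the density threshold, and sparse points such as `(13, 13)` are assigned to the nearest cluster without becoming centers themselves.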
Keywords/Search Tags: Personal microblogging, Automatic summarization, Clustering algorithm, Pseudo-document