Research On Automatic Abstract Formation Of Private Micro-blog

Posted on:2015-01-19

Degree:Master

Type:Thesis

Country:China

Candidate:W Y Guo

Full Text:PDF

GTID:2298330422490285

Subject:Computer technology

Abstract/Summary:

Since 2007, microblogs have spread rapidly around the world thanks to their convenient way of publishing, their real-time and fluid mode of communication, and their low barrier to entry. With this strong momentum, microblogging has in recent years become an indispensable part of people's lives. In China, the number of microblog users has surged into the hundreds of millions, producing a large volume of microblog data every day. Most of this content is highly colloquial and is accompanied by many comments. As microblogging grows, finding posts that match a person's interests and carry useful information among this multitude of varied microblog data has become a major problem. In this thesis, we obtain source data from Sina Weibo and take the microblogs that one person published over a historical period as the unit of study. Drawing on text representation, clustering algorithms, and related topics, we study automatic summarization techniques and the characteristics of microblog data. We designed and implemented a complete system that runs from data acquisition through data processing to the generation of the final automatic summary. The process consists of the following steps: data acquisition; data preprocessing; text representation; feature selection; an improved similarity calculation; the clustering algorithm and its improvement; and generation of the integrated automatic summary. The main work of this thesis is as follows. First, the original data are obtained through the Sina Weibo open platform. Second, based on an analysis of microblog data and the characteristics of personal microblogs, the text of each post and its comments are merged into a pseudo-document, which then goes through a series of preprocessing steps. After word segmentation, the text is converted into a structured data format that reflects a relational data model, and text similarity is calculated on that representation.
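The pseudo-document and similarity steps described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names are hypothetical, whitespace tokenization stands in for a real Chinese word segmenter, and plain cosine similarity over term frequencies stands in for the improved similarity calculation, which the abstract does not specify.

```python
import math
from collections import Counter


def build_pseudo_document(post_text, comments):
    """Merge a post and its comments into one pseudo-document string."""
    return " ".join([post_text, *comments])


def term_frequencies(text):
    """Count terms. Whitespace splitting is a placeholder assumption;
    Chinese microblog text would need a word segmenter here."""
    return Counter(text.lower().split())


def cosine_similarity(tf_a, tf_b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * tf_b[term] for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Merging comments into the post before counting terms gives very short posts more vocabulary to compare, which is presumably the motivation for the pseudo-document.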
Then, the K-means algorithm is used for clustering. Choosing the value of K is a long-standing difficulty with K-means and is usually done by experience. The choice of initial center points also strongly affects the accuracy of the algorithm; a good center should be representative, that is, lie in a region of high density. We improve the algorithm so that it obtains the value of K adaptively and selects the center points itself. Based on the content, timeliness, and popularity of each microblog, we determine the weight of the microblogs in each cluster, first produce a summary for each cluster, and finally combine the cluster summaries into the final summary of the personal microblog. Finally, the improved clustering algorithm is verified and analyzed experimentally. Compared with the original algorithm, it improves accuracy and applicability. With the development of the whole system, automatic summary generation for personal microblogs is achieved.
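One way to make K adaptive and to pick dense, representative centers is sketched below. This is an illustrative assumption, not the algorithm from the thesis, whose details the abstract does not give: density is taken as the number of points within a hypothetical `radius`, seeds are chosen greedily from the densest points while staying more than `2 * radius` apart, and K is simply the number of seeds found. Standard K-means (Lloyd) iterations then refine those seeds. Points are plain numeric tuples, and a non-empty input is assumed.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pick_centers(points, radius, min_density=2):
    """Greedy density-based seeding; K is the number of seeds returned.
    A point's density counts the point itself plus neighbors within radius."""
    density = [sum(1 for q in points if dist(p, q) <= radius) for p in points]
    order = sorted(range(len(points)), key=lambda i: -density[i])
    centers = []
    for i in order:
        if density[i] < min_density:
            break  # remaining points are too isolated to seed a cluster
        if all(dist(points[i], c) > 2 * radius for c in centers):
            centers.append(points[i])
    return centers or [points[order[0]]]  # fall back to the densest point


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(len(centers)), key=lambda k: dist(p, centers[k]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[k])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers
```

A typical call would be `labels, centers = kmeans(points, pick_centers(points, radius=1.5))`. The `radius` and `min_density` values still have to be chosen, so this sketch trades the choice of K for the choice of a density scale.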

Keywords/Search Tags:

Personal microblogging, Automatic summarization, Clustering algorithm, Pseudo-document

Related items

1	Research Of Document Summarization Based On Topic Analysis
2	Citation Clustering Based Automatic Multi-Document Summarization
3	Research On Automatic Text Summarization Technique Of News Documents
4	Automatic Summarization Of Multimedia Information And Related Technology Research
5	Research On Automatic Multi-document Summarization Based On Statistics And Semantic Analysis
6	A Study Of Chinese Multi-document Summarization Based On Adaptive Clustering Algorithm
7	Design And Implementation Of Multi Document Automatic Summarization System In Biomedical
8	Statistic-based Automatic Keyphrase Extraction And Summarization From Multi-document
9	Chinese Multi-document Automatic Summarization Extraction Based On The Combination Of LDA And TextRank
10	Research And Implementation On Chinese Web Pages Summarization