Research On Automatic Generation Of Tags For Microblog Users

Posted on: 2013-08-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Xie
Full Text: PDF
GTID: 2268330392967963
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, microblogging services have attracted growing attention as a new type of web application. Research on microblogs is gradually accumulating in related fields such as natural language processing, information retrieval, and social computing. Tags for a microblog user, which describe his or her interests, concerns, and occupational characteristics, play an important role in user indexing and search, personalization, and other microblog functions.

This work focuses on user tagging based on microblog content: user tags are generated automatically by analyzing that content.

Millions of records of user-tag data were crawled via the Sina microblog API and then used to analyze the statistical and semantic characteristics of user tags. In addition, experiments and analysis were carried out on the text-based sources of tags, namely the content a user writes, reposts, comments on, or favorites, examining their semantic similarity and how much each contributes to reflecting user interests. The results show that the content users repost reflects their interests most, and the content they comment on least. The content used to generate user tags is then selected based on these experimental results.

Keyword-based and category-based approaches are proposed, corresponding to different granularities of user tags. Generation results are evaluated against two criteria: generated tags should accurately reflect user interests, and they should be suitable as user tags.

In the keyword-based approach, generation based on TextRank is introduced first. Important words are extracted for user tagging by analyzing the co-occurrence of words in microblogs and constructing a word network from them. Generation based on cluster analysis is then proposed to discover user interests along more dimensions, mainly by extracting representative terms from clusters for user tagging. The experimental results show that both keyword-based approaches perform better than the baseline approach. The two approaches are also compared and discussed.

In the category-based approaches, the categories a user is concerned with are used as user tags. Generation based on short text classification is proposed, in which target classes and a microblog corpus are constructed to recognize user interests. A Baidu Encyclopedia based approach is then introduced, which tags users using the three-level category information of encyclopedia entries. The experimental results show that the user tagging accuracy of these two category-based approaches reaches 70 percent on test data. The two approaches are also compared and discussed.
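
As an illustration of the TextRank-based generation mentioned in the abstract, the following is a minimal sketch of the general technique: build a word co-occurrence network from a user's posts, then rank words with a PageRank-style iteration. It is not the thesis's implementation. The function names, the window size, the damping factor, the iteration count, and the assumption that posts are already tokenized (word segmentation and stop-word filtering are left out) are all illustrative choices that the abstract does not specify.

```python
from collections import defaultdict


def build_cooccurrence_graph(posts, window=2):
    """Link words that occur within `window` positions of each other in a post.

    `posts` is a list of token lists; tokenization is assumed to be done already.
    """
    graph = defaultdict(set)
    for tokens in posts:
        for i, word in enumerate(tokens):
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    graph[word].add(other)
                    graph[other].add(word)
    return graph


def textrank(graph, damping=0.85, iterations=50):
    """Score each word with an unweighted PageRank-style update on the graph."""
    scores = {word: 1.0 for word in graph}
    for _ in range(iterations):
        new_scores = {}
        for word in graph:
            # Each neighbor shares its score equally among its own neighbors.
            rank = sum(scores[n] / len(graph[n]) for n in graph[word])
            new_scores[word] = (1 - damping) + damping * rank
        scores = new_scores
    return scores


def top_tags(posts, k=3, window=2):
    """Return the k highest-scoring words as candidate user tags."""
    graph = build_cooccurrence_graph(posts, window)
    scores = textrank(graph)
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


if __name__ == "__main__":
    # Hypothetical, already-tokenized posts for one user.
    example_posts = [
        ["music", "concert", "guitar"],
        ["music", "guitar", "festival"],
        ["music", "film"],
    ]
    print(top_tags(example_posts, k=2))
```

In this sketch a word scores highly when it co-occurs with many other well-connected words, which is why frequently shared topic words tend to surface as candidate tags. A fuller version would likely weight edges by co-occurrence count and filter candidates by part of speech.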
Keywords/Search Tags: Tags for Microblog Users, TextRank, Cluster Analysis, Text Classification, Baidu Encyclopedia