Detection Of Hot Topics In Certain Field Using Paper Abstracts And Citation Information

Posted on:2013-11-02

Degree:Master

Type:Thesis

Country:China

Candidate:H Cheng

Full Text:PDF

GTID:2298330392469323

Subject:Computer technology

Abstract/Summary:

In this information era, it is not easy to find the information you really need among the vast amount of Internet sources. Scientific researchers face much the same situation. They usually need to learn in which direction the hot topics in their research field are heading; in addition, it is quite common for them to need to enter another research area quickly. One possible solution is to read the published papers of that field, but this is a painful process because each research field has a large body of scientific literature and a seemingly endless supply of papers. A system that can process the papers of a given field and automatically generate the hot topics in that field would be of great help to these researchers. This thesis studies techniques for detecting hot topics in a given field using paper abstracts and citation information. Its main aspects are as follows:

First, using the citation relations between papers in the ACL Anthology and basic information about these papers, such as their authors and the year in which they were written, this thesis extracts the abstract texts and the citation information texts of 876 papers, and then builds an abstract corpus and a citation information corpus. Each of these 876 papers is cited at least 20 times by other ACL papers.

Second, this thesis proposes a method for detecting hot topics in a given field using paper abstracts and citation information. The method applies the following key operations to the corpus texts: filtering out stopwords, which do not help in detecting hot topics; extracting features with a modified TF method that covers nearly all of the content of the text; and generating vectors that describe the original text in mathematical form. The vectors are then clustered with a procedure that combines the AP algorithm and the K-means algorithm, so that each cluster summarizes part of the text's content.

Third, the problem of detecting the hot topics of a whole field can, in its simpler form, be reduced to detecting the topic of one particular paper, so this thesis solves that problem first. From each cluster in the clustering results, it extracts the sentence that best describes the cluster. A new summary is then generated that takes into account both the author's own view and the views of the other authors who cite the paper. The topic of the paper is detected from this new summary.

In the last step, this thesis collects the topics of all papers to compose the hot topics of the field. A more detailed collection of hot topics is then generated by taking into account the year in which each paper in the corpus was written. The final results show that the hot topics detected by this method cover most of the essence of the field, which indicates that the method is effective.
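
The pipeline described above (stopword filtering, TF vectors, clustering, one representative sentence per cluster) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it uses plain length-normalised term frequency rather than the modified TF method, plain K-means with a fixed number of clusters rather than the combined AP and K-means procedure, and a tiny made-up stopword list. All function names here (tokenize, tf_vectors, kmeans, representative_sentences) are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption, not from the thesis).
STOPWORDS = {"a", "an", "the", "of", "to", "in", "is", "are", "and",
             "for", "we", "this", "that", "on", "with"}


def tokenize(sentence):
    """Lowercase the sentence, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]


def tf_vectors(sentences):
    """Return (vocabulary, vectors) using plain term frequency per sentence."""
    token_lists = [tokenize(s) for s in sentences]
    vocab = sorted({w for toks in token_lists for w in toks})
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        total = max(len(toks), 1)  # avoid division by zero for empty sentences
        vectors.append([counts[w] / total for w in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def kmeans(vectors, k, iterations=20):
    """Plain K-means under cosine similarity, seeded with the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


def representative_sentences(sentences, k):
    """Pick, for each non-empty cluster, the sentence closest to its centroid."""
    _, vectors = tf_vectors(sentences)
    labels, centroids = kmeans(vectors, k)
    picks = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            best = max(members, key=lambda i: cosine(vectors[i], centroids[c]))
            picks.append(sentences[best])
    return picks
```

Calling representative_sentences(sentences, k) on the sentences of an abstract together with its citing sentences returns up to k sentences, one per cluster, which could then be joined into a combined summary. Seeding K-means with the first k vectors keeps the sketch deterministic; a real system would need a better way to choose both the seeds and the number of clusters.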

Keywords/Search Tags:

paper abstracts, citation information, clustering, hot topic detection

Related items

1	Research Of Paper Originality Detection Technology Based On The Citation
2	Interdisciplinary Topic Identification Based On Citation Relation And Citation Content
3	Literature Topic Extracting Based On Weighted Semantic And Citation Relation
4	Research On Domain-Specific Web Information Collection And Topic Detection And Its Application
5	Research On Algorithm Of Topic Detection And Tracking
6	Research On Topic Clustering Algorithm Based On Topic Models
7	Research On Citation Recommendation Based On Paper Keyphrase
8	Design And Implementation Of An Academic Paper Recommendation System Based On Community Detection
9	Design And Implementation Of The Micro-blog Topic Detection System Based On Incremental Clustering
10	Technology And Research Of The Academic Community Detecting Based On Paper Citation Relations