| In this information era, it is not easy to find the information one really needs among the vast amount of Internet sources. Scientific researchers face much the same situation. They usually need to learn in which direction the hot topics in their research field are heading; additionally, it is quite common for them to need to enter another research area quickly. One possible solution is to read the published papers of that field, but this is a painful process because each research field has a large amount of scientific literature and a seemingly endless number of papers. A system that can process the papers of a certain field and automatically generate the hot topics in that field would be of great help to these researchers. This article studies techniques related to the problem of detecting hot topics in a certain field using paper abstracts and citation information. The main aspects on which this article focuses are as follows: First, using the citation relations between the papers in the ACL Anthology and the basic information of these papers, such as the authors and the year in which they were written, this article extracts the abstract texts and the citation information texts of 876 papers, then establishes the abstract corpus and the citation information corpus. 
Each of these 876 papers is cited at least 20 times by other ACL papers. Second, this article proposes a method to detect hot topics in a certain field using paper abstracts and citation information. In the process of detecting hot topics, the method performs the following crucial operations on the corpus texts: filtering out stopwords that do not help detect hot topics; extracting features with a modified TF method that can cover nearly all of the content of the text; and generating vectors that describe the original text in mathematical language. This article then performs a clustering operation on these vectors that combines the AP algorithm and the K-means algorithm, so that each cluster summarizes part of the text's content. Third, the problem of detecting a whole field's hot topics may, in its simpler form, reduce to detecting the topic of one particular paper, so this article first solves the problem of detecting one paper's topic. From each cluster in the clustering results, this article extracts the sentence that best describes the cluster. A new summary is then generated that takes into account both the author's opinion and other authors' opinions, and from this new summary the article detects the topic of the paper. In the last step, this article collects every paper's topic to compose the hot topics of the field; a more detailed collection of the hot topics in the field is generated by taking into account the year in which each paper in the corpus was written. The final results show that the hot topics detected by this method cover almost all of the essence of the field, which indicates that the method is effective. |
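
As a rough illustration of the pipeline summarized above (stopword filtering, term-frequency vectors, clustering, and picking one representative sentence per cluster), the following Python sketch may help. It is not this article's implementation. It uses plain term frequency instead of the modified TF method, and plain K-means with a fixed number of clusters instead of the combination of the AP algorithm and K-means. The stopword list, the function names, and the choice of cosine similarity are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption, not the article's list).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "for", "on", "we", "this", "with"}


def tokenize(sentence):
    """Lowercase the sentence, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def tf_vectors(sentences):
    """Build plain term-frequency vectors over a vocabulary shared by all sentences."""
    token_lists = [tokenize(s) for s in sentences]
    vocab = sorted({w for tokens in token_lists for w in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty sentences
        vectors.append([counts[w] / total for w in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def kmeans(vectors, k, iterations=20):
    """Cosine-similarity K-means, seeded with the first k vectors so runs are deterministic."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its most similar centroid.
        assignment = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def representative_sentences(sentences, k):
    """Cluster the sentences and return, per cluster, the sentence closest to the centroid."""
    _, vectors = tf_vectors(sentences)
    assignment, centroids = kmeans(vectors, k)
    representatives = {}
    for c in range(k):
        indices = [i for i, a in enumerate(assignment) if a == c]
        if indices:
            best = max(indices, key=lambda i: cosine(vectors[i], centroids[c]))
            representatives[c] = sentences[best]
    return assignment, representatives
```

Calling `representative_sentences(sentences, k=2)` on a short list of abstract or citation sentences returns the cluster index of each sentence and one representative sentence per non-empty cluster. In this sketch the number of clusters is supplied by hand, whereas a method that incorporates AP clustering would not necessarily need it fixed in advance.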