
Research On Sentiment Analysis Method Based On Short Text Feature Extension And Fused-KNN Algorithm

Posted on: 2020-12-26
Degree: Master
Type: Thesis
Country: China
Candidate: C S Jiang
GTID: 2428330623466994
Subject: Computer Science and Technology
Abstract/Summary:
Information technology is developing rapidly and is used extensively across many fields. Mining the emotional information contained in short news comments, for purposes such as the control of public opinion, has become a research hotspot in recent years. However, short texts carry few inherent emotional features, so performing sentiment analysis on them directly gives poor results. In addition, the K-Nearest Neighbor (KNN) algorithm, which is commonly used for sentiment classification, has defects in how it selects neighbor samples and how it judges the category. To address these problems, this thesis improves the classification performance of the KNN algorithm and extends the emotional features of short news comments. The main contents are as follows:

Firstly, when Euclidean distance is used to measure the distance between samples, the traditional KNN algorithm treats the differences between all sample attributes equally, which leads to inaccurate selection of neighbor samples. In response, this thesis proposes an improved sample selection strategy based on correlation distance. The category judgment mechanism known as “the minority is subordinate to the majority” makes the result susceptible to inequality among samples; to address this, the thesis proposes an improved strategy based on the sum of polarity influence factors. Applying these two strategies to the traditional KNN algorithm yields the Fused-K-Nearest Neighbor (Fused-KNN) algorithm, which is used as the sentiment classifier for short news comments. The experimental results show that the algorithm selects neighbor samples more accurately and obtains better results.

Secondly, existing algorithms underutilize the information contained in Wikipedia when extending the features of short comment texts. To solve this problem, the thesis proposes a method that uses this information comprehensively. The method measures the semantic relevance between a subject word and its candidate extended words with the Similarity Calculation Algorithm Based on Page and Structure (PS-SIM), which draws on the page content, the article reference network and the category tree. The results are then sorted in descending order, and the candidate words with the highest relevance are selected as the final extended words. The experimental results show that the algorithm effectively extends emotional features and gives the classification algorithm more useful information, which improves its performance.

Finally, to verify the practical effect of the research results, a prototype system called the Online News Comments' Sentiment Analysis System is designed and implemented in combination with Tencent News. The thesis analyzes the system's structure and operating process and describes the implementation of modules such as Comment Collection, Feature Expansion and Sentiment Analysis. The Result Display module then presents the sentiment analysis results visually. The system provides a reference for the implementation of related systems in the future.
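
The two KNN changes described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. Correlation distance is taken here as one minus the Pearson correlation, which is the usual definition. The abstract does not define the polarity influence factor, so the weight `1 / (1 + d)` below is a hypothetical stand-in, as are the function names `correlation_distance` and `fused_knn_predict`.

```python
import math
from collections import defaultdict


def correlation_distance(u, v):
    """Return 1 - Pearson correlation of two equal-length vectors (range 0..2)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du)) * math.sqrt(sum(b * b for b in dv))
    if den == 0.0:
        # Assumption: a constant vector is treated as uncorrelated.
        return 1.0
    return 1.0 - num / den


def fused_knn_predict(train_X, train_y, x, k=3):
    """Pick the k nearest samples by correlation distance, then sum a
    per-neighbor weight for each class instead of counting votes."""
    ranked = sorted(
        ((correlation_distance(row, x), label) for row, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    scores = defaultdict(float)
    for dist, label in ranked[:k]:
        # Hypothetical influence factor: closer neighbors count for more.
        scores[label] += 1.0 / (1.0 + dist)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    X = [[1, 2, 3], [3, 2, 1], [6, 4, 2]]
    y = ["pos", "neg", "neg"]
    print(fused_knn_predict(X, y, [2, 4, 6], k=3))
```

In this toy data the query has one strongly correlated "pos" neighbor and two anti-correlated "neg" neighbors. A plain majority vote over the three neighbors would return "neg", whereas summing the weights favors the single close neighbor.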
Keywords/Search Tags: KNN Algorithm, Correlation Distance, Short Text Feature Extension, Wikipedia Knowledge Base