Application Of Text Mining Technology In Drug Review

Posted on: 2020-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: J P Lu
GTID: 2404330602950934
Subject: Applied statistics
Abstract/Summary:
The rapid development of the Internet has led to the emergence of medical social networking sites. More and more patients comment on drugs (therapeutic effects, side effects, etc.) and share their experience of using them online. This explosive growth in user comments has produced massive amounts of text data. Because analyzing such data manually is time-consuming and labor-intensive, researchers have turned to sentiment analysis. Applying sentiment analysis to drug reviews can help us understand the true feelings of the drug's end users (consumers). However, drug reviews are extremely challenging to study because of misspellings, personalized writing styles and indirect expressions of emotion. At present, research on review data is concentrated in fields such as e-commerce and film reviews, and there are relatively few studies on drug reviews. This paper applies existing sentiment analysis techniques from those fields to drug reviews in order to find the best model.

This paper first crawls drug reviews from an overseas health website and then preprocesses the data: deduplication, missing-value handling, punctuation removal, case conversion, acronym expansion, lemmatization and stop-word filtering. It then analyzes the data with two approaches, lexicon-based sentiment analysis and machine learning. For the lexicon-based approach, since existing dictionaries are not suited to the medical domain, this paper first combines the WordNet dictionary, the GI dictionary and HowNet's English dictionary into a basic dictionary, and then adds words specific to the medical domain. Because negation words play an important role in determining emotional polarity, this paper also constructs a dictionary of negation words. In addition, it records the position index of positive words, negative words and negation words in the text. Building on traditional lexicon-based sentiment analysis, this paper constructs sentiment score calculation rules that take both sentiment words and negation words into account. Empirical results show that the new algorithm is more accurate than the traditional one.

For the machine learning approach, this paper first uses grid search with cross-validation to find the optimal n-gram setting, and then compares four single models (logistic regression, support vector machine, naive Bayes and random forest) through cross-validation. Evaluating the four models by accuracy, recall and F-score, this paper finds that naive Bayes gives the best predictions. To improve performance further, this paper tries the stacking algorithm. First, two "good and different" single models, naive Bayes and support vector machine, are selected as the base models in the first layer. Then a logistic regression model is used as the second-layer model. The results show that fusion based on the stacking algorithm improves the prediction performance of the model, with accuracy reaching about 90%.
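
The abstract does not spell out the thesis's score calculation rules, so the following is only a minimal sketch of the general idea of combining sentiment words with negation words by position. The tiny word lists, the `WINDOW` size and the rule that a negation only affects the next sentiment word within that window are all assumptions made for illustration, not the author's actual dictionaries or algorithm.

```python
import re

# Toy lexicons for illustration only (assumed). The thesis builds its
# dictionary from WordNet, GI, HowNet and medical-domain terms.
POSITIVE = {"effective", "relief", "better", "helped", "improved"}
NEGATIVE = {"nausea", "headache", "worse", "pain", "dizzy"}
NEGATIONS = {"not", "no", "never", "without"}
WINDOW = 3  # assumed: look back at most 3 tokens for negation words


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Sum word polarities, flipping a word's polarity when an odd number
    of negation words precedes it inside the look-back window."""
    tokens = tokenize(text)
    score = 0
    last_sentiment_idx = -1  # position of the previous sentiment word
    for i, tok in enumerate(tokens):
        if tok in POSITIVE:
            polarity = 1
        elif tok in NEGATIVE:
            polarity = -1
        else:
            continue
        # Only count negations located after the previous sentiment word,
        # so one negation is never applied to two sentiment words.
        start = max(i - WINDOW, last_sentiment_idx + 1)
        flips = sum(1 for t in tokens[start:i] if t in NEGATIONS)
        if flips % 2 == 1:
            polarity = -polarity
        score += polarity
        last_sentiment_idx = i
    return score


def classify(text):
    """Map the numeric score to a coarse sentiment label."""
    s = sentiment_score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"
```

Under these assumptions, `classify("It was not effective")` is labeled negative because "not" flips the polarity of "effective", while a scorer that ignores negation words would label the same review positive.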
Keywords/Search Tags: Sentiment analysis, Sentiment lexica, Drug reviews, Machine learning