
Research On Using Text Mining Technology To Analyze Diabetes Tweets

Posted on: 2021-04-20
Degree: Master
Type: Thesis
Institution: University
Candidate: Idemudia Christian Uwa
Full Text: PDF
GTID: 2404330623983976
Subject: Computer application technology
Abstract/Summary:
The number of diabetes patients is increasing every day all over the world, so finding a way to treat and care for these patients efficiently and successfully has become a global challenge. Experts have turned to developing efficient management and care systems to help reduce diabetes fatalities. As information technology has improved, more research on the use of computer systems has been carried out to discover care procedures that help diabetes patients live safer and stress-free lives. Past research used patient data stored in electronic health devices or systems, but attention has recently moved to social media data, which exist in text format. The question has been how to use these data sets efficiently for the design and development of support systems for diabetes patients. In this research work, discussions from Twitter and web search results from Google and Baidu are selected as the data sources to be investigated. Using the Latent Dirichlet Allocation (LDA) topic modelling technique and the SVM classification algorithm, the paper implements a text mining procedure that provides an efficient means of discovering insights about diabetes. The major work carried out is as follows:

(1) Tweet Download and Transformation. The Python Twitter API is used to download tweets from the Twitter microblogging website into a CSV file. Stemming of the text data is performed with the spaCy library, and the TF-IDF algorithm is applied to weight terms by their frequency in the data set. Principal Component Analysis (PCA) is applied to reduce the dimensionality of the data set. To ensure that only diabetes-related tweets are analyzed, the popularity score of each hashtag over a certain period of time is calculated as a measure of its correlation with diabetes discussion. After the hashtag popularity test, the 9 hashtags with the highest popularity values were selected in order to obtain a quality data set.

(2) Data Annotation and Labeling. A double annotation procedure is developed for the annotation of tweets. Manual annotators with medical experience in dealing with diabetics were selected to tag tweets using a web classification framework designed for the study. The reliability of their annotation is evaluated using the Fleiss' kappa statistic and the F-score. The LDA topic modelling technique is applied to automatically group tweets into topics of interest, with each topic assigned a label based on its dominant words. The degree of semantic similarity among the words in each constructed topic is evaluated using coherence measures (UCI and UMass) designed for LDA topic modelling. The experimental results show that using the LDA topic modelling method to analyze diabetes text information can provide users with reliable reference opinions.

(3) Support Vector Machine (SVM), Naive Bayes (NB) and logistic regression algorithms are used to automatically classify tweets into two categories (depressive and non-depressive). By tuning parameters, the prediction accuracy of each model was analyzed over four different iterations. The SVM algorithm showed higher performance than the Naive Bayes and logistic regression models: its classification accuracy is 92%, which outperforms the other classification algorithms tested. Similarly, the agreement of the manual annotation process measured with the Fleiss' kappa statistic and the F-score was 84% and 78%, respectively. Finally, the Spearman rank correlation coefficients for the association of tweets with Google and Baidu search data are 0.667 and 0.600, respectively. The correlation between tweets and Google and Baidu search data verifies the research analysis at the 95% significance level.
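The two agreement statistics named in the abstract, Fleiss' kappa for the manual annotation and the Spearman rank correlation for the tweet and search-volume comparison, can be illustrated with a minimal sketch. This is not the thesis code: the function names, the toy labels and the toy counts below are all hypothetical and chosen only to show how each statistic is computed, assuming every item is rated by the same number of annotators.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items that were each labeled by the same number of raters.

    ratings: list of lists; ratings[i] holds the labels all raters gave item i.
    categories: list of every possible label.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(item) for item in ratings]
    # Share of all assignments that went to each category.
    p_cat = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    # Observed agreement per item, then averaged over items.
    p_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_item) / n_items
    # Agreement expected by chance.
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1.0:
        return 1.0  # every rating fell in one category; treated here as full agreement
    return (p_bar - p_exp) / (1 - p_exp)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical labels from three annotators on four tweets.
    toy_labels = [
        ["depressive", "depressive", "depressive"],
        ["non-depressive", "non-depressive", "depressive"],
        ["non-depressive", "non-depressive", "non-depressive"],
        ["depressive", "depressive", "non-depressive"],
    ]
    print(fleiss_kappa(toy_labels, ["depressive", "non-depressive"]))

    # Hypothetical weekly tweet counts and search-interest scores.
    toy_tweets = [120, 95, 140, 110, 160]
    toy_search = [55, 60, 70, 50, 80]
    print(spearman_rho(toy_tweets, toy_search))
```

Both functions use only the standard library so the arithmetic stays visible. In practice, established implementations in statistical libraries would usually be preferred.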
Keywords/Search Tags:Diabetes, Social Media, LDA Topic Modelling, Natural Language Processing, SVM Algorithm