| With the development of the Internet, medical forums have come to play an important role in social networking and in medical and health knowledge services, and they have also become a popular target for the online "water army" (paid posters). Many unscrupulous merchants hire water army accounts to publish bait posts and promote substandard products and treatments, which, if left unchecked, can cause personal and financial harm to Internet users, including patients. It is therefore especially important to identify and remove these accounts accurately. Water army accounts in medical forums have different characteristics from those on other online platforms. With the spread of real-name registration policies, they have also shifted from blatant, unrestrained posting to imitating the behavior of normal users, which makes identification more difficult and more demanding; many traditional detection methods no longer fit this setting or meet its requirements. In this thesis, I build several water army detection models for medical forums, including traditional machine learning classification models and graph neural network classification models, by analyzing and extracting the behavioral and relational characteristics of medical forum users, and I compare the models experimentally. The specific work in this thesis includes: (1) designing and deploying a web crawler to collect a large volume of real user data from the Sweet Home forum; (2) formatting, cleaning, and analyzing the collected data from four perspectives (user information features, user behavior features, user forum social network features, and user behavior relationships) and extracting feature values, with new features such as the number of replies per unit of online time, the user's active time period, and the speaking interval proposed and used in the detection model; (3) introducing a graph neural network model based on a bipartite graph of user behavior relations to mine the relationship between users' replying behavior and main posts. A fully connected graph is constructed with posting users and replying users as nodes and user behavior features as feature vectors, and a multi-layer graph neural network is used to build a classification model based on both user behavior features and relationship features. The classification model designed in this thesis is compared with a traditional classification model based on user behavior features alone. Experiments on a real dataset show that the graph neural network approach with added relational features achieves better classification results, reaching an AUC of 0.902, which is higher than that of the traditional classification models, and provides a new reference for research on water army identification. |
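
The three behavioral features named in item (2) can be illustrated with a minimal sketch. The abstract does not define them precisely, so the definitions here are assumptions: replies per unit of online time is taken as reply count divided by online hours, the speaking interval as the mean gap between consecutive replies, and the active time period as the hour of day with the most replies. The function name `behavior_features` and its inputs are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from datetime import datetime
from statistics import mean


def behavior_features(reply_times, online_hours):
    """Compute three illustrative behavioral features for one user.

    reply_times: list of datetime objects, one per reply (hypothetical input).
    online_hours: total online time in hours (hypothetical input).
    """
    times = sorted(reply_times)
    # Replies per unit of online time; guard against zero online time.
    replies_per_hour = len(times) / online_hours if online_hours > 0 else 0.0
    # Speaking interval: mean gap in seconds between consecutive replies.
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    mean_interval = mean(gaps) if gaps else 0.0
    # Active time period: the hour of day (0-23) with the most replies.
    active_hour = Counter(t.hour for t in times).most_common(1)[0][0] if times else None
    return {
        "replies_per_hour": replies_per_hour,
        "mean_interval_s": mean_interval,
        "active_hour": active_hour,
    }


# Made-up timestamps, used only to exercise the function.
example = behavior_features(
    [
        datetime(2021, 3, 1, 9, 0),
        datetime(2021, 3, 1, 9, 10),
        datetime(2021, 3, 1, 9, 30),
        datetime(2021, 3, 2, 14, 0),
    ],
    online_hours=2.0,
)
```

Features of this kind could serve either as inputs to a traditional classifier or as per-node feature vectors in a graph model; the thesis itself may define and normalize them differently.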