With the integration of various medical devices into the diagnosis and treatment of diabetic patients, medical institutions have collected large amounts of diabetes data. Anomaly detection in this data is a long-standing concern for data analysts and medical staff: some abnormally high or low blood glucose, insulin, and blood pressure values in daily diabetes monitoring data differ significantly from the rest of the collected data. Meanwhile, with the development of artificial intelligence, unsupervised learning methods have been widely applied to anomaly detection. This thesis therefore uses unsupervised learning methods to detect anomalies in different types of diabetes data. The work comprises three parts.

First, static diabetes data (records from different patients taken at the same point in time) has many feature dimensions, few anomalies, and too little labeled abnormal data. To address this, we propose an anomaly detection method based on feature selection and an autoencoder (FSAE). The maximum-relevance, minimum-redundancy feature selection method is first used to rank the features of the high-dimensional diabetes data, and the features that are most relevant to the abnormality and least redundant with one another are extracted. An autoencoder is then trained on normal daily monitoring data, such as blood glucose, insulin, and blood pressure values. Finally, the anomaly probability of each record is computed from its reconstruction error: records with a high reconstruction error are treated as abnormal, and the rest as normal. The experimental results show that the proposed method detects abnormal data points effectively, and that it adapts well, maintaining good detection performance across different static diabetes data sets.

Second, diabetes time series data is generated rapidly and exhibits concept drift, in that its statistical distribution changes over time. To address this, we propose an LSTM-based anomaly detection method for diabetes time series data (AELSTM). Two different LSTM networks each predict the time series, and the difference between each predicted value and the actual value is computed. For the difference data of each LSTM network, we then select an appropriate sliding window, model the distribution of the difference data within the window, and compute the probability density of the current difference under that distribution. Finally, the mean of the probability densities under the two distributions is used to compute the anomaly likelihood of the data point. The experimental results show that the method detects abnormal data effectively while maintaining a low false positive rate.

Finally, based on the two proposed methods, a prototype anomaly detection system for diabetes data is designed and implemented. It provides function selection, personal information collection, anomaly discrimination, and anomaly warning. We first explain the design and implementation of the system, then describe each functional module and verify the system's functions experimentally. The experimental results show that the prototype system can detect anomalies in both static and time series diabetes data, which confirms the effectiveness of the methods proposed in this thesis.
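The sliding-window scoring step of the time series method can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes that the difference data in each window is modeled as a Gaussian, that each density is divided by the Gaussian's peak density so that it lies in (0, 1], and that the anomaly likelihood is one minus the mean of the two scaled densities. The function names, the `window` size, and the `min_std` floor are all hypothetical, and the two residual lists stand in for the prediction errors of the two LSTM networks.

```python
import math
from statistics import fmean, pstdev


def relative_density(window, value, min_std=1e-6):
    """Gaussian density of `value` under a normal fit to `window`,
    divided by that Gaussian's peak density (assumption: Gaussian model).
    The result lies in (0, 1]; it can underflow to 0.0 for extreme values."""
    mu = fmean(window)
    sigma = max(pstdev(window), min_std)  # floor avoids division by zero
    z = (value - mu) / sigma
    return math.exp(-0.5 * z * z)


def anomaly_scores(residuals_a, residuals_b, window=20):
    """Score each time step from two residual streams (prediction minus
    actual value, one stream per predictor). Steps without a full
    preceding window get a score of 0.0."""
    if len(residuals_a) != len(residuals_b):
        raise ValueError("residual streams must have the same length")
    scores = []
    for t in range(len(residuals_a)):
        if t < window:
            scores.append(0.0)
            continue
        # Fit each distribution on the `window` residuals before step t.
        d_a = relative_density(residuals_a[t - window:t], residuals_a[t])
        d_b = relative_density(residuals_b[t - window:t], residuals_b[t])
        # Assumption: likelihood = 1 - mean of the two scaled densities.
        scores.append(1.0 - (d_a + d_b) / 2.0)
    return scores


# Synthetic residuals with one injected spike at step 45.
res_a = [0.1 * math.sin(i) for i in range(60)]
res_b = [0.1 * math.cos(i) for i in range(60)]
res_a[45], res_b[45] = 3.0, 2.5
scores = anomaly_scores(res_a, res_b, window=20)
print(max(range(len(scores)), key=scores.__getitem__))
```

Because each distribution is refit on the most recent window only, the score follows slow changes in the residual distribution, which is one simple way to tolerate concept drift. A side effect visible in this sketch is that a spike inflates the window's standard deviation, so the steps immediately after it are scored more leniently.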