Epilepsy is a chronic, non-infectious brain disease and the second most common neurological disorder, and its diagnosis relies mainly on EEG data. Individual time-domain or frequency-domain features of EEG data carry little information gain for seizure classification, so traditional statistical methods struggle to predict epileptic seizures. Machine learning methods can both process large amounts of data and fit nonlinear relationships within them, and applying these methods to public health and biomedical datasets has significantly improved results. Many effective methods have now been applied to epileptic seizure prediction. They differ mainly in how researchers pre-select data segments, which features they extract, and which classifier they choose. Two problems remain, however. First, public datasets contain limited information, and most researchers have no way to obtain more effective data, so models trained on these limited datasets overfit easily. Second, most approaches build a separate model for each patient. A new patient must therefore have at least two seizures recorded as training data before the model can correctly identify that patient's seizure pattern.

This thesis investigates whether existing well-performing methods can be applied to the design of a real-time monitoring and timely prediction system for the epilepsy department of a hospital in Nanjing. Given the specific characteristics of this hospital's epilepsy dataset, it proposes data processing improvements that keep the system's false alarm rate low while meeting the recall and accuracy required for clinical prediction. The main work is as follows.

First, the similarities and differences between the Nanjing hospital EEG dataset and the public datasets used in existing seizure research are analyzed. The data are then cleaned, and time-domain and frequency-domain features are extracted to explore their inherent characteristics. To reduce subsequent computation, strongly correlated channels are fused. To ensure that the classifier can correctly identify seizure patterns, the class proportions of the samples are balanced, and data from independent seizures are held out as the test set wherever possible.

Second, Random Forest, which performed well on public datasets, is used as the classifier for the Nanjing hospital epilepsy data, and the model is evaluated on the test set. Because of the size of this dataset, the Random Forest model performs quite differently here than on the public datasets. On this basis, a more advanced model is proposed: epileptic seizure prediction based on LightGBM. This model improves accuracy, sensitivity, and false alarm rate on the test set.

Third, a simple fully connected neural network and an LSTM network are each trained on the dataset. The LSTM outperforms the fully connected network on every metric on the test set. Even with Dropout layers, however, the complex neural network structure still overfits easily. Compared with the LightGBM model, the LSTM model produces far fewer false alarms but has lower accuracy and sensitivity.

A comparison of these four models shows that simple models that performed well on public datasets fall short on the Nanjing hospital dataset. On this basis, the better-performing LightGBM model and a three-layer LSTM model are proposed. Both outperform the fully connected neural network and Random Forest models in accuracy, sensitivity, and false alarm rate on the test set. Following the idea of model fusion, a simple voting method combines the two strong models, LightGBM and LSTM, and yields a slight improvement on every metric. The final model achieves an average accuracy of 97.33%, an average recall of 87.23%, and a false alarm rate of 0.21/h. The LightGBM model reaches a more appropriate complexity through Bayesian hyperparameter optimization. The LSTM model's structure is chosen by comparing results across several levels of complexity.

Taken together, the test results suggest that the bottleneck in existing seizure prediction research is not model complexity but the amount of effective information in the dataset. Future work should therefore focus on increasing that information. One possible way to do this is to add one dimension of spatial information to the labeled data, which would require the help of medical experts or researchers.
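The abstract does not say how strongly correlated channels are fused or how class proportions are balanced. The sketch below is one plausible reading and is not the thesis's documented method. It greedily groups channels whose Pearson correlation with a seed channel meets an assumed threshold of 0.9, averages each group, and balances classes by random undersampling. The function names `fuse_correlated_channels` and `undersample`, and the threshold value, are hypothetical.

```python
import numpy as np

def fuse_correlated_channels(eeg, threshold=0.9):
    """Hypothetical channel fusion: greedily group channels whose (signed)
    Pearson correlation with a group's seed channel is >= threshold,
    then replace each group by its mean signal.

    eeg: array of shape (n_channels, n_samples).
    Returns (fused array of shape (n_groups, n_samples), list of groups).
    """
    corr = np.corrcoef(eeg)  # channel-by-channel correlation matrix
    unassigned = list(range(eeg.shape[0]))
    groups = []
    while unassigned:
        seed = unassigned.pop(0)
        # Signed correlation avoids averaging anti-correlated channels,
        # which would cancel each other out.
        members = [c for c in unassigned if corr[seed, c] >= threshold]
        unassigned = [c for c in unassigned if c not in members]
        groups.append([seed] + members)
    fused = np.stack([eeg[g].mean(axis=0) for g in groups])
    return fused, groups

def undersample(X, y, seed=0):
    """Hypothetical class balancing: randomly keep as many samples of each
    class as the smallest class has."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), n_min, replace=False)
        for c in classes
    ])
    rng.shuffle(idx)
    return X[idx], y[idx]
```

Undersampling is only one option. Oversampling the seizure class, or weighting classes in the classifier's loss, keeps more interictal data and may suit a large hospital dataset better.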
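The abstract describes a "simple voting method" that combines LightGBM and LSTM, and it reports accuracy, recall, and false alarms per hour. With only two models, a hard majority vote can tie. The sketch below therefore assumes soft voting, which averages the two models' seizure probabilities with equal weights and a 0.5 threshold. It also counts every false-positive window as one false alarm and divides by the hours of interictal data. This is a simplification, since published evaluations often merge consecutive alarms. The names `soft_vote` and `evaluate` and the `window_sec` parameter are hypothetical.

```python
import numpy as np

def soft_vote(p_lgbm, p_lstm, weights=(0.5, 0.5), threshold=0.5):
    """Assumed soft voting: weighted average of the two models' seizure
    probabilities, thresholded into 0/1 window predictions."""
    p = weights[0] * np.asarray(p_lgbm) + weights[1] * np.asarray(p_lstm)
    return (p >= threshold).astype(int)

def evaluate(y_true, y_pred, window_sec):
    """Window-level accuracy, recall (sensitivity), and false alarms per hour.
    Each false-positive window is counted as one alarm (a simplification)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    accuracy = float(np.mean(y_true == y_pred))
    recall = float(tp / (tp + fn)) if tp + fn else 0.0
    # Hours of interictal (non-seizure) recording covered by the test windows.
    interictal_hours = np.sum(y_true == 0) * window_sec / 3600
    fa_per_hour = float(fp / interictal_hours) if interictal_hours else 0.0
    return {"accuracy": accuracy, "recall": recall,
            "false_alarms_per_hour": fa_per_hour}
```

For example, with LightGBM probabilities `[0.9, 0.2, 0.6, 0.7]`, LSTM probabilities `[0.7, 0.4, 0.3, 0.5]`, labels `[1, 0, 1, 0]`, and 30-minute windows, the fused predictions are `[1, 0, 0, 1]`. That gives accuracy 0.5, recall 0.5, and 1.0 false alarm per hour.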