Background and Objective

Significant progress has been made worldwide in preventing and controlling infectious diseases that pose major threats to human health. However, as new infectious diseases continue to emerge and old ones resurge, the limitations of existing infectious disease monitoring and early warning systems have gradually been exposed, particularly their limited ability to predict new infectious diseases of unknown origin. It is therefore necessary to integrate new predictive techniques and feature factors into the traditional infectious disease monitoring and early warning system, and to make full use of technologies such as big data analysis and artificial intelligence to improve overall warning capability. This paper studies early warning of infectious diseases using multi-source network big data. Various network data sources related to infectious diseases are obtained, an ensemble model suited to high-dimensional feature big data is constructed, and the data mining techniques are integrated into a visualization software system, with the ultimate aim of improving the accuracy, robustness, and scalability of the infectious disease early warning system.

Contents and Methods

(1) Network big data feature analysis: Monitoring data for influenza and COVID-19 are included, and three types of multi-source network big data related to these diseases are collected: network search, population migration, and network media. The correlation between network big data and infectious disease monitoring data is analyzed to verify that using network big data for infectious disease warning is scientifically sound.
(2) Improved swarm intelligence optimization algorithm: To address the slow convergence and tendency to fall into local optima of traditional swarm intelligence optimization algorithms, three strategies (chaotic mapping, inertia weight, and dimension-wise Gaussian mutation) are integrated
into the original algorithm. A self-adaptive chaotic coronavirus herd immunity optimization algorithm (ICHIO) based on dimension-wise Gaussian mutation is proposed to perform optimization tasks efficiently and accurately.
(3) Construction of infectious disease early warning model: Targeting the high-dimensional features of network big data, guided by the ICHIO algorithm, and taking into account the sparsity and class imbalance of high-dimensional feature big data, a swarm intelligence synchronous evolution ensemble model (SISEEM) is constructed, followed by application research and generalization verification.
(4) Development of intelligent analysis system platform: Supported by key technologies such as the synchronous evolution method, swarm intelligence optimization, and machine learning, and incorporating data preprocessing, high-dimensional visualization, classification tasks, regression tasks, and one-step analysis (One-Step), a one-stop, visually operated service platform is developed for accessing, processing, analyzing, and storing high-dimensional data.

Results

(1) Characteristic analysis results: In terms of temporal characteristics, network big data correlate with the trend of the epidemic monitoring data. The correlation between influenza monitoring data and network big data is r = 0.853 (p < 0.001), and the correlation between COVID-19 monitoring data and network big data is r = 0.736 (p < 0.001). The violin plot shows more outliers in the search index during the epidemic spread stage of influenza A and the hidden transmission stage of COVID-19.
(2) Improved algorithm results: In simulation tests on standard test functions, ICHIO showed significant improvement in global optimization accuracy and convergence over the original algorithm, achieving a good balance between exploration and exploitation. In the simulation test on the multiple traveling
salesman problem, compared with several recent swarm intelligence optimization algorithms, ICHIO found the shortest path for all three settings of the number of salesmen (2, 3, and 5), with path lengths of 364.6716, 271.4355, and 162.5532, respectively.
(3) Model construction results: Based on the characteristics of using network big data for infectious disease warning, two types of swarm intelligence synchronous evolution ensemble models were constructed: classification (USEE) and regression (BSEE). In the influenza application research, USEE was used for influenza typing warning, and two models, USEE1 and USEE2, were trained on different sets of data sources. BSEE was used for influenza epidemic intensity warning, and two models, BSEE1 and BSEE2, were trained with different combinations of base learners. The optimal models were USEE2 (ROC-AUC = 0.9858, PR-AUC = 0.9555), which included multi-source data, and BSEE2 (R² = 0.8602), which used a "nonlinear + linear" fusion method. In the COVID-19 application research, USEE (USEE3, USEE4, and USEE5) was trained on the occurrence of hidden transmission, and BSEE (BSEE3 and BSEE4) was trained on the development of hidden transmission. The optimal models were USEE5 (ROC-AUC = 0.9553, PR-AUC = 0.8327), which included multi-source data, and BSEE4 (R² = 0.8698), which used a "nonlinear + linear" fusion method. On external validation data (Shaanxi Province), the predicted time point of hidden transmission was 13 days earlier than the official announcement.
(4) System development results: The "High-dimensional data synchronization evolutionary intelligence analysis platform" was developed with MATLAB App Designer. The system is stable and highly portable.

Conclusion

(1) Expanding research perspective: The warning model is validated on both traditional and emerging infectious diseases, and information is mined from multi-source network big data, which
helps improve public health monitoring capabilities and advance high-dimensional big data analysis technology.
(2) Innovative data mining: A comprehensive study is conducted on the class imbalance problem of mixed high-dimensional big data, and strategies including feature subset sampling, feature selection, and optimization of hyperparameters and base learner weights are integrated simultaneously into the ensemble model. The improved swarm intelligence optimization algorithm guides the synchronous evolution, yielding a novel method suited to mining high-dimensional feature big data.
(3) Improving the warning system: Including multi-source network big data and innovative data mining techniques can effectively improve the sensitivity and timeliness of the infectious disease warning system, optimizing and extending traditional infectious disease monitoring and warning.
(4) Promoting practical application: Warning models with strong generalization ability are constructed for different types of infectious diseases, and the data mining techniques proposed in this research are integrated through visualization software. This offers reference and practical value for future data mining, prediction, and warning work involving high-dimensional big data.
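
Illustrative sketch: The three strategies named in Contents and Methods (2), chaotic mapping, inertia weight, and dimension-wise Gaussian mutation, can be illustrated with a minimal Python example. This is not the ICHIO algorithm itself, whose update rules are not given in this abstract. The logistic map, the linearly decreasing weight schedule, the generic "move toward the best solution" update, and every function name and parameter value below are assumptions chosen only for illustration.

```python
import numpy as np


def sphere(x):
    """Standard test function with global minimum 0 at the origin."""
    return float(np.sum(x ** 2))


def logistic_map_population(n, dim, low, high, x0=0.7):
    """Chaotic initialization using the logistic map x <- 4x(1 - x).

    The choice of map and of x0 is an assumption for this sketch.
    """
    seq = np.empty(n * dim)
    x = x0
    for i in range(n * dim):
        x = 4.0 * x * (1.0 - x)
        seq[i] = x
    return low + (high - low) * seq.reshape(n, dim)


def inertia_weight(t, t_max, w_start=0.9, w_end=0.4):
    """Linearly decreasing inertia weight (assumed schedule)."""
    return w_start - (w_start - w_end) * t / t_max


def gaussian_mutation_by_dimension(best, best_fit, f, low, high, rng, sigma=0.1):
    """Perturb one dimension at a time; keep a change only if it improves fitness."""
    best = best.copy()
    for d in range(best.size):
        trial = best.copy()
        step = sigma * (high - low) * rng.standard_normal()
        trial[d] = np.clip(trial[d] + step, low, high)
        trial_fit = f(trial)
        if trial_fit < best_fit:
            best, best_fit = trial, trial_fit
    return best, best_fit


def optimize(f, dim=5, n=20, t_max=200, low=-5.0, high=5.0, seed=0):
    rng = np.random.default_rng(seed)
    pop = logistic_map_population(n, dim, low, high)
    fit = np.array([f(p) for p in pop])
    i_best = int(np.argmin(fit))
    best, best_fit = pop[i_best].copy(), float(fit[i_best])
    history = [best_fit]
    for t in range(t_max):
        w = inertia_weight(t, t_max)
        # Placeholder update (not the CHIO rule): step toward the best
        # solution, with the step size scaled by the inertia weight.
        r = rng.random((n, dim))
        cand = np.clip(pop + w * r * (best - pop), low, high)
        cand_fit = np.array([f(c) for c in cand])
        improved = cand_fit < fit
        pop[improved] = cand[improved]
        fit[improved] = cand_fit[improved]
        i_best = int(np.argmin(fit))
        if fit[i_best] < best_fit:
            best, best_fit = pop[i_best].copy(), float(fit[i_best])
        # Refine the best solution one coordinate at a time.
        best, best_fit = gaussian_mutation_by_dimension(
            best, best_fit, f, low, high, rng
        )
        history.append(best_fit)
    return best, best_fit, history


if __name__ == "__main__":
    best, best_fit, history = optimize(sphere)
    print("initial best:", history[0], "final best:", best_fit)
```

Because both the population update and the mutation accept a candidate only when it improves fitness, the recorded best value never increases from one iteration to the next. Mutating one dimension at a time means a good change in one coordinate is not masked by a bad change in another, which is the usual motivation for the dimension-wise variant.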