Research On Heterogeneous Data Mining Methods For Active Surveillance Of Infectious Diseases

Posted on: 2019-07-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H C Chen
GTID: 1364330572450441
Subject: Computer software and theory
Abstract/Summary:
Active surveillance is the most effective method for the prevention and control of infectious diseases. The spread of an infectious disease can be fundamentally suppressed by sending medical personnel to comprehensively screen the individuals in a monitoring area. Because medical resources are limited, active surveillance is usually carried out only in a subset of high-risk areas. Accurately estimating the transmission risk in the areas to be monitored, and thereby providing a reliable basis for allocating limited resources, is therefore crucial to the effectiveness of an active surveillance strategy. Human mobility is the main driving force of disease transmission, and the motivation and decision-making behind it are affected by multiple factors, e.g., economy, environment, weather, and transportation. If the relationship between these factors and mobility behavior can be found, the transmission process of infectious diseases can be modeled and the potential transmission risk can be predicted.

In the era of Big Data, the theory and technology of heterogeneous data mining have developed rapidly. This provides new opportunities for transmission risk prediction, but also raises the following challenges:

(1) How to discover and model the mobility pattern that dominates the transmission trend and identify the driving factors behind it, so as to explain why infected cases occur and the mechanism that generates them.
(2) How to formulate a globally optimal strategy for allocating limited materials that maximizes resource availability while preserving the predictive ability of the model in future rounds of active surveillance, i.e., model sustainability.
(3) How to reflect the real-time influence of dynamic mobility behavior on the transmission trend and handle the way the influence of driving factors varies across time and space, i.e., spatiotemporal heterogeneity.

To address these problems, this thesis proposes accurate, sustainable, and real-time transmission risk prediction methods, respectively, using heterogeneous data mining. The main contributions are as follows:

(1) An active surveillance method based on a spatiotemporal diffusion network. This method integrates heterogeneous data to model the human mobility process, so as to predict transmission risk and provide accurate reference information for formulating a reasonable active surveillance strategy in practice. Specifically, by analyzing human mobility behavior, the importation and transmission of an infectious disease is divided into four phases: whether to go out, where to go, whether to be infected, and when to go back. These phases are modeled as a Spatiotemporal Diffusion Network. On this basis, a new active surveillance framework for infectious diseases (ASPII) is proposed, which integrates multiple models (e.g., a machine learning model, a population radiation model, and a malaria transmission model) and various kinds of data (e.g., meteorology, environment, physiology, population, geography, social economy, and surveillance records). In addition, human mobility behavior is affected by multiple factors, and its driving forces differ across monitoring areas, i.e., spatial heterogeneity. To handle this, a hybrid optimization algorithm is proposed that automatically classifies the monitoring areas while optimizing the parameters of the proposed model.

(2) A sustainable active surveillance method based on reinforcement learning. Active surveillance is usually limited to some high-risk areas, so the surveillance data fed back to the predictive model is incomplete. This can easily lead to large deviations in subsequent infection risk prediction, and the model may even lose its capacity for sustainable prediction. In view of this, the thesis uses a reinforcement learning algorithm to allocate surveillance materials dynamically and proposes a Sustainable Active Surveillance (SAS) framework. The framework consists of a Predictor, a Classifier, and a Planner, which cooperate to complete active surveillance tasks. The Predictor evaluates the infection risk of the monitoring areas from both positive and negative perspectives to ensure the stability of the model. The Classifier groups candidate regions with similar attributes so that data collected from monitored regions can be shared with unmonitored regions. The Planner produces a globally optimal resource allocation strategy that considers both the availability of limited materials and the sustainability of the model in subsequent infection risk prediction.

(3) A real-time active surveillance method based on online learning. The influence of driving factors on mobility behavior, such as the change of seasons and weather, is sequential and real-time. As a result, trends in the spread of infectious diseases are equally sequential and real-time, e.g., malaria spreads faster in summer and on sunny days. In addition, the influence of driving factors on infection risk differs between monitoring areas not only in space but also over time, i.e., spatiotemporal heterogeneity. In view of this, the thesis proposes a Real-time Active Surveillance (RAS) method based on online learning, which uses the FTRL-Proximal algorithm to update the parameters of the model. FTRL-Proximal not only reflects the real-time impact of driving factors on transmission risk but also keeps the coefficients of the driving factors sparse, which helps reveal hidden dominant factors. The thesis also proposes a dynamic classification method for candidate regions, which automatically optimizes the number of categories and the number of areas to be monitored in each category, and thereby addresses the problem of spatiotemporal heterogeneity.
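
The abstract names FTRL-Proximal as the online update rule behind the RAS method but gives no details of the model it is applied to. As a minimal sketch only, and not the thesis's implementation, the following shows the commonly published per-coordinate FTRL-Proximal update for a logistic regression over dense features. The class name `FTRLProximal`, the hyperparameter defaults, and the three feature names in the demo are assumptions made for illustration.

```python
import math
import random


class FTRLProximal:
    """Per-coordinate FTRL-Proximal for logistic regression (illustrative sketch)."""

    def __init__(self, dim, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        # alpha, beta: learning-rate parameters; l1, l2: regularization strengths.
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * dim  # accumulated (adjusted) gradients
        self.n = [0.0] * dim  # accumulated squared gradients

    def weights(self):
        # Closed-form weights; coordinates with |z_i| <= l1 are exactly zero,
        # which is where the sparsity of the coefficients comes from.
        w = []
        for z_i, n_i in zip(self.z, self.n):
            if abs(z_i) <= self.l1:
                w.append(0.0)
            else:
                sign = 1.0 if z_i > 0 else -1.0
                denom = (self.beta + math.sqrt(n_i)) / self.alpha + self.l2
                w.append(-(z_i - sign * self.l1) / denom)
        return w

    def predict(self, x):
        # Predicted probability of the positive class for feature vector x.
        score = sum(w_i * x_i for w_i, x_i in zip(self.weights(), x))
        score = max(-35.0, min(35.0, score))  # guard against overflow in exp
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, x, y):
        # One online step on a single example (x, y) with y in {0, 1}.
        w = self.weights()
        p = self.predict(x)
        for i, x_i in enumerate(x):
            g = (p - y) * x_i  # gradient of the log loss for coordinate i
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w[i]
            self.n[i] += g * g
        return p


if __name__ == "__main__":
    # Hypothetical driving factors; only the first one affects the synthetic label.
    names = ["temperature", "rainfall", "road_density"]
    rng = random.Random(0)
    model = FTRLProximal(dim=len(names), l1=0.5)
    for _ in range(2000):
        x = [rng.uniform(-1, 1) for _ in names]
        y = 1 if x[0] > 0 else 0
        model.update(x, y)
    print(dict(zip(names, model.weights())))
```

Each call to `update` consumes one new observation, so the coefficients follow the data stream as it arrives. The L1 threshold in `weights` is what zeroes out weak factors; raising `l1` gives a sparser model at the cost of possibly dropping factors with a small but real effect.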
Keywords/Search Tags: Active Surveillance, Spatiotemporal Pattern, Sustainable Active Surveillance, Reinforcement Learning, Real-time Active Surveillance, Online Learning