Detecting And Predicting Civil Unrest Events Based On Frequent Subgraph Pattern Mining

Posted on: 2019-05-05 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: F C Qiao | Full Text: PDF
GTID: 1366330623450379 | Subject: Management Science and Engineering
Abstract/Summary:
Civil unrest events are parades, sit-ins, strikes, occupations and other forms of protest that take place at a specific time and place and are premeditated or spontaneously organized by social groups to express opposition to governments, politicians, policies, regulations, or the behavior of large organizations (such as enterprises). Because such events are frequent and carry high social costs, government agencies around the world attach great importance to improving their ability to control and make decisions about these public security incidents, and have invested substantial manpower and resources in studying their internal dynamics, especially their evolution mechanisms and mechanisms for early detection and early warning. GDELT has been publicly available since 2013. It automatically codes and archives data on all conflict and mediation events mentioned in global news, TV broadcasts, newspapers and even academic papers from 1979 to the present. It monitors almost every country in the world, covers more than 100 languages, and provides a rich data foundation for research on group protests using data mining and machine learning methods. Based on GDELT open-source big data, this dissertation studies the detection and prediction of civil unrest events using frequent subgraph pattern mining, and carries out research in the following four areas.

First, a GDELT big data warehouse was built on the Hadoop+Hive+Spark architecture. As the world's largest data source for conflict and mediation events, GDELT currently holds more than 2 billion raw records totaling 7.5 TB. Collecting, storing, and querying these data is the most basic task. GDELT data are first collected and stored in the HDFS distributed file system in real time, then Hive is used to perform high-performance ETL on the raw data, and the results are finally loaded into the Hive data warehouse. Because the "lazy computing" feature of the Hive data warehouse makes real-time queries inefficient, Spark SQL is used to connect to the GDELT data warehouse, and a unified access interface is provided through ThriftServer, which greatly improves the practicality of the GDELT big data warehouse.

Second, two parallel algorithms for large-scale frequent subgraph mining are proposed: PTrGraM for transaction graph data and SSiGraM for a single large graph. This dissertation uses frequent subgraph mining to discover feature patterns in GDELT big data. However, current frequent subgraph mining algorithms run on a single machine, which cannot meet the requirements of large-scale input graphs and low support thresholds, so parallel mining of frequent subgraphs is proposed. Because frequent subgraph mining on transaction graphs has relatively low complexity, PTrGraM is designed as a multithreaded parallel algorithm on a single machine. Frequent subgraph mining on a single large graph has higher complexity and requires distributed mining across multiple computers, so SSiGraM is designed as a distributed algorithm based on the Spark computing framework. The algorithm distributes both subgraph extension and support computation and introduces three optimizations. Its performance is verified on four large graphs with different densities.

Third, a civil unrest event detection method based on frequent subgraph feature engineering is proposed. To address the shortcomings of current heuristic feature selection strategies in distinguishability and interpretability, frequent subgraphs mined from transaction graphs are used to describe the interaction patterns of participants in mass protests. In addition, ISDP, a metric for the distinguishability of frequent subgraph features, is proposed. Finally, the strong classifier SVM and the ensemble learning classifiers AdaBoost and Gradient Boosting are used to learn from the subgraph features and train event detection models. The experiments focus on mass protests that have historically been reported by authorities to have had a major impact. The validity of the detection model is verified on two data sets, "Occupy Central" and "Occupy Wall Street."

Fourth, a prediction framework for civil unrest events based on the hidden semi-Markov model (HSMM) is proposed, motivated by the multi-stage evolutionary characteristics of mass protest events. It consists of four main steps: ground-truth extraction, BoEAG feature extraction, HSMM model training, and online testing by sequence classification. With this framework, data for a country or region can be automatically selected from GDELT, and the characteristics of a large number of civil unrest events are captured with bags of event association graphs (BoEAG). The HSMM is then used to learn how such events develop and evolve. Finally, the likelihood of an event occurring in a future period is calculated by sequence classification with Bayes decision. In the experiments, test data sets from five countries in Southeast Asia (Thailand, Indonesia, Malaysia, the Philippines, and Cambodia) are used to compare the effectiveness of four methods: the HSMM, the HMM, logistic regression, and baseline methods.

In summary, this dissertation addresses the detection and prediction of civil unrest events based on GDELT. It first builds a reliable and easy-to-use big data warehouse, then mines frequent subgraph patterns, and finally trains detection and prediction models for civil unrest events through feature learning. It covers the full chain of data ETL, feature mining, application, and analysis for open-source big data, providing a feasible approach to analyzing GDELT and other big data sources with data mining and machine learning methods. The work has theoretical significance and application value.
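
To illustrate the idea of turning frequent patterns in transaction graphs into classifier features, here is a minimal sketch in Python. It is not PTrGraM or the feature engineering method of the dissertation: it covers only the simplest case, counting the support of single-edge patterns, which is the usual first level of frequent subgraph mining. The actor codes ("CVL", "GOV", "COP"), the event labels, the toy daily graphs, and the function names are all hypothetical and chosen only for this example.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical representation: an edge pattern is
# (source actor label, event label, target actor label),
# and a transaction graph is the set of such edges seen in one time window.
Edge = Tuple[str, str, str]
Graph = Set[Edge]


def frequent_edge_patterns(graphs: List[Graph], min_support: int) -> Dict[Edge, int]:
    """Return the edge patterns that occur in at least `min_support` graphs.

    Support is counted per transaction graph: a pattern contributes at most
    once per graph, no matter how often it would appear inside that graph.
    """
    support: Counter = Counter()
    for graph in graphs:
        support.update(set(graph))
    return {pattern: count for pattern, count in support.items() if count >= min_support}


def to_feature_vector(graph: Graph, patterns: List[Edge]) -> List[int]:
    """Encode a graph as a binary vector: 1 if it contains the pattern, else 0."""
    return [1 if pattern in graph else 0 for pattern in patterns]


# Toy data (invented for illustration): one small graph per day.
daily_graphs: List[Graph] = [
    {("CVL", "PROTEST", "GOV"), ("COP", "COERCE", "CVL")},
    {("CVL", "PROTEST", "GOV"), ("GOV", "APPEAL", "CVL")},
    {("CVL", "PROTEST", "GOV"), ("COP", "COERCE", "CVL"), ("GOV", "APPEAL", "CVL")},
    {("GOV", "CONSULT", "GOV")},
]

frequent = frequent_edge_patterns(daily_graphs, min_support=2)
pattern_list = sorted(frequent)  # fixed order so feature columns are stable
features = [to_feature_vector(g, pattern_list) for g in daily_graphs]
```

The resulting binary vectors could be passed to a classifier such as an SVM. A full frequent subgraph miner would go on to extend these frequent edges into larger candidate subgraphs and test each candidate with subgraph isomorphism, which this sketch omits.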
Keywords/Search Tags: Civil Unrest Event, Frequent Subgraph Pattern, Event Detection, Event Prediction, HSMM, GDELT