
Research On Twitter Emerging Event Detection Based On High Utility Pattern And Multi-assignment Graph Partition

Posted on: 2020-07-12  Degree: Master  Type: Thesis
Country: China  Candidate: M M Huang  Full Text: PDF
GTID: 2428330575496966  Subject: Computer Science and Technology
Abstract/Summary:
Twitter emerging event detection is the process of detecting information about emerging events from the tweet data stream, and it can be widely used in news acquisition, public opinion analysis, and disaster event detection. However, traditional emerging event detection methods have two problems. First, frequent itemset mining algorithms simply mine all itemsets whose frequency exceeds the minimum support, ignoring the fact that different words carry different weights. Second, a burst word can belong to multiple events, whereas traditional single-assignment clustering algorithms assign each burst word to only one cluster. This dissertation studies the detection of emerging events on Twitter. The main research work is as follows:

(1) Time information extraction for event detection. Time information is one of the important elements of an event and is widely used in event detection and tracking research. Because the traditional rule-based recognition method has a low recall rate and cannot recognize event-type Chinese temporal expressions, this dissertation presents a Chinese temporal expression recognition method that combines rules with a statistical model. First, Chinese temporal expressions are divided into 7 categories, and time primitives are used as the smallest unit of recognition to simplify rule-making. Then, regular-expression rules are used to recognize Chinese temporal expressions and label the training set automatically, while the event-type temporal expressions that the rule-based method cannot recognize are labelled manually. Finally, the labelled training set is used to train a Conditional Random Field model. Experimental results show that the method significantly reduces the amount of annotation work and effectively improves recognition recall. The F1 value reaches 87.46%, which is 6.13% higher than that of the rule-based method.

(2) Emerging event detection based on high utility patterns and multi-assignment graph partitioning. To address the problems of traditional frequent itemset mining and single-assignment clustering algorithms, this dissertation proposes an emerging event detection method based on high utility pattern mining (HUPM) and multi-assignment graph partitioning. It first defines the utility of words in tweets and calculates the utility of each word. It then determines the minimum word utility threshold and uses the HUPM algorithm to mine high utility patterns (itemsets). Finally, it uses the multi-assignment graph partitioning algorithm to cluster the itemsets and sorts the clusters by df-idf_t. Experimental results show that the proposed method not only achieves good detection results but also has better time performance.
Keywords/Search Tags:Chinese Temporal Expressions, Conditional Random Fields, Twitter, Emerging Event Detection, High Utility Pattern, Multi-assignment Graph Partitioning
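
The contrast the abstract draws between frequency-based and utility-based mining can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm: the abstract does not give the word utility definition, so the `word_utility` values, the toy `tweets`, the `min_utility` threshold, and both function names are hypothetical. The utility of an itemset is assumed here to be the sum of its word utilities, accumulated over every tweet that contains all of its words, and the search is a brute-force enumeration rather than an optimized HUPM algorithm.

```python
from itertools import combinations


def itemset_utility(itemset, tweets, word_utility):
    """Total utility of an itemset (assumed definition): for every tweet that
    contains all words of the itemset, add the utilities of those words."""
    items = set(itemset)
    per_tweet = sum(word_utility[w] for w in items)
    return sum(per_tweet for tweet in tweets if items <= tweet)


def mine_high_utility_itemsets(tweets, word_utility, min_utility, max_size=3):
    """Brute-force enumeration of itemsets up to max_size words.

    Utility is not anti-monotone (a superset can score higher than its
    subset), so Apriori-style frequency pruning does not apply directly;
    practical HUPM algorithms use tighter upper bounds to prune instead.
    """
    vocabulary = sorted(set().union(*tweets))
    result = {}
    for size in range(1, max_size + 1):
        for itemset in combinations(vocabulary, size):
            utility = itemset_utility(itemset, tweets, word_utility)
            if utility >= min_utility:
                result[itemset] = utility
    return result


# Hypothetical toy data: each tweet is a set of words.
tweets = [
    {"earthquake", "japan", "tsunami"},
    {"earthquake", "japan"},
    {"earthquake", "today"},
    {"japan", "today"},
]
# Hypothetical per-word utilities (weights).
word_utility = {"earthquake": 3.0, "japan": 2.0, "tsunami": 4.0, "today": 0.5}

patterns = mine_high_utility_itemsets(tweets, word_utility, min_utility=9.0)
for itemset, utility in patterns.items():
    print(itemset, utility)
```

In this toy data, "japan" occurs as often as "earthquake" but falls below the threshold because of its lower weight, while the three-word itemset occurs in only one tweet yet qualifies on utility. A pure frequency threshold would rank these the other way around.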