
Design And Implementation Of Tobacco Intelligence Aided Judging System In Internet

Posted on: 2020-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
GTID: 2416330590482225
Subject: Computer technology
Abstract/Summary:
With the rapid growth of the Internet, gangs selling counterfeit cigarettes have published a large amount of illegal information about tobacco business and smuggling online. We refer to this information as tobacco intelligence. Collecting, screening, and analyzing Internet tobacco intelligence is the basis on which tobacco enforcement officers decide how to combat tobacco-related illegal activity online. In response to the current state of such activity, this thesis proposes a scheme for an Internet tobacco intelligence aided judging system. The scheme consists of the following steps:

(1) Existing data crawling tools are combined with the Python crawler designed in this thesis to capture the required text and image data from websites, forums, post bars, etc. After the data is cleaned, natural language processing steps such as word segmentation, part-of-speech tagging, named entity recognition, and dependency parsing are performed, and the data is manually labeled to construct the tobacco-related data set.

(2) Text and images are used as input to the tobacco-related data screening model, whose output is the tobacco-related text and images. Text semantic features are vectorized with the term frequency-inverse document frequency (TF-IDF) method. A simplified convolutional neural network model is constructed, the semantic value of each image is represented by its classification probability, and the text and image features are fused. Experiments show that, for tobacco-related data screening, the classification accuracy based on the fused features proposed in this thesis is 2.65% higher than that based on text features alone. This indicates that, compared with a single text feature, the fused text and image feature carries more semantic information and compensates for the incompleteness of the text feature.

(3) The tobacco-related text is used as input to the tobacco intelligence event extraction model, and the tobacco-related information extraction task is recast as a tobacco-related event extraction task. The output of the model is the tobacco intelligence on tobacco business and smuggling. The thesis proposes an improved event seed clustering algorithm based on Word2vec sentence semantic similarity. After the event extraction patterns are generalized and filtered, the F value of event extraction increases by 0.9% on the ACE corpus and by 3.7% on our tobacco-related dataset. Experiments show that the proposed method can complete the task of extracting tobacco intelligence.

(4) Finally, the Internet tobacco intelligence aided judging system was designed and implemented. Its main functional modules are the data collection module, the data preprocessing module, the tobacco-related data screening module, and the tobacco intelligence extraction module. Trial use by the tobacco department demonstrated the feasibility and effectiveness of the scheme proposed in the thesis.
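The feature fusion in step (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes that fusion means appending the image classifier's tobacco-class probability to the TF-IDF text vector, it tokenizes on whitespace instead of using a real word segmenter, and the sample posts, the `image_probs` values, and the function names `tfidf_vectors` and `fuse` are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors) with weights tf * log(N / df).

    Whitespace tokenization is a simplifying assumption; the thesis
    applies word segmentation during preprocessing.
    """
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    # Document frequency: number of documents containing each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors.append(
            [(counts[tok] / total) * math.log(n_docs / df[tok]) for tok in vocab]
        )
    return vocab, vectors


def fuse(text_vec, image_prob):
    """Append the image class probability to the text vector.

    Concatenation is an assumed fusion strategy, not one confirmed
    by the abstract.
    """
    return list(text_vec) + [float(image_prob)]


# Hypothetical posts and hypothetical CNN outputs (probability that the
# attached image is tobacco-related).
posts = [
    "cheap cigarettes wholesale contact me",
    "weekend football match report",
    "cigarettes wholesale low price",
]
image_probs = [0.91, 0.05, 0.80]

vocab, text_vecs = tfidf_vectors(posts)
fused = [fuse(vec, prob) for vec, prob in zip(text_vecs, image_probs)]
```

Each row of `fused` has one more dimension than the vocabulary and could be passed to any downstream classifier. Under these assumptions, a post whose text is ambiguous can still be separated by the image-derived last component.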
Keywords/Search Tags: intelligence analysis, convolutional neural network, feature extraction, event extraction