
Chat Text Mining Of Drug-related Personnel Based On Semantic Analysis Model

Posted on: 2022-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
GTID: 2516306527968049
Subject: Mathematics
Abstract/Summary:
Semantic analysis of the chat text of drug-related personnel can identify such personnel within massive, complex networks so that they can be investigated quickly and accurately. This paper selects data from the real-time data collected by the anti-drug research and judgment platform, using chat text of drug-related personnel that has dialect characteristics and chat text from specific contexts. A general text classification model is used as the starting point for training on the chat text of drug-related personnel, and the BERT model, which is based on contextual semantic learning, is then examined through theoretical analysis and experimental verification; the remaining problems are analyzed and corresponding improvements are made. The BERT model, which can learn context, is markedly effective for mining drug-related data from chat text and outperforms the general classification model in accuracy, recall and F1 value. The specific research work and results of this paper are as follows:

(1) Discrete and distributed text representations are studied, and the traditional word vector model, TF-IDF and a Bayesian classification model are used to analyze the chat text data of drug-related personnel. It is observed that when polysemous words occur many times in different contexts in this kind of chat text, the model discriminates poorly and struggles to classify the text correctly, so polysemous words need to be disambiguated.

(2) To take the influence of context into account, the BERT model is introduced. The pre-trained BERT model is fine-tuned, and the best learning rate obtained is used for text classification. On the test text, the accuracy of the BERT model is 7 percentage points higher than that of the Bayesian model, and BERT performs better overall on the anti-drug text classification task.

(3) Analysis of the content structure of the data misjudged by the BERT model shows that the model discriminates weakly when sensitive words are scattered through a sentence, so the effect of adding sensitive words to the text encoding is considered. With the help of a sensitive word database, the sensitive words in the text are extracted and output, and then integrated into the BERT pre-trained model. The resulting BERT-sen pre-trained model relearns and outputs the vector representation of words in specific scenes. After learning from the sentences that the BERT model misclassified, the BERT-sen pre-trained model achieves accuracy 3% higher than the BERT model on the test text. It is more sensitive and effective than the BERT model when learning multi-word text.
Keywords/Search Tags:Text Classification, BERT Model, Drug-Related Personnel Mining, Self Attention Mechanism
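
The baseline in point (1) of the abstract, TF-IDF features fed to a Bayesian classifier and scored by recall and F1, can be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: the function and class names (`fit_idf`, `to_tfidf`, `NaiveBayes`, `recall_and_f1`), the smoothed IDF formula, the add-one smoothing, and the tiny English toy corpus with its placeholder labels are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def fit_idf(docs):
    """Smoothed inverse document frequency for each token in tokenized docs."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def to_tfidf(doc, idf):
    """Map one tokenized doc to {token: tf * idf}, dropping unseen tokens."""
    return {tok: cnt * idf[tok] for tok, cnt in Counter(doc).items() if tok in idf}


class NaiveBayes:
    """Multinomial naive Bayes over TF-IDF weights with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        vocab = {tok for vec in vectors for tok in vec}
        self.log_prior, self.log_like = {}, {}
        for c in self.classes:
            rows = [vec for vec, y in zip(vectors, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(vectors))
            totals = defaultdict(float)
            for vec in rows:
                for tok, w in vec.items():
                    totals[tok] += w
            denom = sum(totals.values()) + self.alpha * len(vocab)
            self.log_like[c] = {
                tok: math.log((totals.get(tok, 0.0) + self.alpha) / denom)
                for tok in vocab
            }
        return self

    def predict(self, vec):
        # Each token contributes independently of its neighbours: no context.
        def score(c):
            like = self.log_like[c]
            return self.log_prior[c] + sum(
                w * like[tok] for tok, w in vec.items() if tok in like
            )
        return max(self.classes, key=score)


def recall_and_f1(y_true, y_pred, positive=1):
    """Recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, f1


# Hypothetical, already-tokenized toy corpus; label 1 = flagged, 0 = ordinary.
train_docs = [
    ["meet", "tonight", "bring", "goods"],
    ["goods", "price", "same", "place"],
    ["dinner", "tonight", "with", "family"],
    ["movie", "ticket", "price", "cheap"],
]
train_labels = [1, 1, 0, 0]

idf = fit_idf(train_docs)
model = NaiveBayes().fit([to_tfidf(d, idf) for d in train_docs], train_labels)

test_docs = [["goods", "tonight"], ["family", "dinner"]]
predictions = [model.predict(to_tfidf(d, idf)) for d in test_docs]
print(predictions)
```

Because every token is scored independently of its neighbours, a word such as "price" that appears in both classes carries the same weight whatever surrounds it. That is the kind of limitation with polysemous words that the abstract reports for the Bayesian baseline, and which motivates the move to a context-aware model.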